Re: Basic git-archive --remote question

2013-06-26 Thread Jeff King
On Mon, Jun 24, 2013 at 03:53:56PM -0400, Greg Freemyer wrote:

> I'm trying to create a tarball from a git tag and I can't get the
> syntax right.  The documentation is not very clear.
> [...]
> > git archive --format=tar --remote=github.com:dkovar/analyzeMFT.git v2.0.4

Your remote should be g...@github.com:dkovar/analyzeMFT.git.

But...

> > git archive --format=tar --remote=git://github.com/dkovar/analyzeMFT v2.0.4
> 
> fatal: remote error
>Your Git client has made an invalid request:
>003agit-upload-archive /dkovar/analyzeMFT

Your syntax is fine, but GitHub does not support remote git-archive
requests. Not over git://, and not over ssh. In both cases the error
messages are rather confusing and unhelpful, though.

> > git archive --format=tar --remote=//github.com/dkovar/analyzeMFT v2.0.4

This is wrong, as it is looking for //github.com... in the local
filesystem.

> The github page also says I can use ssh with git as the user, but that
> complains I don't have the private key (which I don't):
> 
> git archive --format=tar
> --remote=ssh://git@github/com/dkovar/analyzeMFT.git v2.0.4

If you have a GitHub account and register your public key, then you can
use ssh to fetch from the repository (using the g...@github.com:user/repo
syntax). However, remote git-archive is not currently supported, so it
would not help you.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/5] Reroll patches against v1.8.3.1

2013-06-26 Thread Alexey Shumkin
On Tue, Jun 25, 2013 at 12:28:02PM -0700, Junio C Hamano wrote:
> Alexey Shumkin  writes:
> 
> > 4. [PATCH v6 4/5] pretty: Add failing tests: --format output should honor 
> > logOutputEncoding
> > iso8859-5 encoding reverted back to cp1251 encoding (as it was in v4 
> > series)
> 
> Thanks for a reroll, but why this change?
> 
> The reason I asked you to avoid 8859-5 is because our test do not
> use that encoding and I do not want to add new dependency to people
> when they run test.  cp1251 shares exactly the same issue, doesn't
> it?  So in that sense, this change does not make anything better.
> 
> That is why I asked you if the breakage you are trying to
> demonstrate about non-ASCII non-UTF8 encoding was specific to
> Cyrillic/Russian.  I do not recall seeing your answer, but what is
> the right thing to do depend on it.
> 
>  - If the answer is yes, then we would need to add dependency either
>way, and 8859-5 is just as fine as cp1251.
> 
>  - If the breakage is not specific to Cyrillic, it should be
>reproducible using 8859-1 (latin-1), and our tests already depend
>on 8859-1, so we wouldn't be adding new dependencies on people.
> 

I suppose you've missed my answer somehow. It was:
---8<---
> Alexey Shumkin  writes:
> 
> > @@ -19,7 +23,8 @@ add_file () {
> > echo "$name" >"$name" &&
> > git add "$name" &&
> > test_tick &&
> > -   git commit -m "Add $name" || exit
> > +   msg_added_iso88595=$(echo "Add $name ($added $name)" | 
> > iconv -f utf-8 -t iso88595) &&
> > +   git -c 'i18n.commitEncoding=iso88595' commit -m 
> > "$msg_added_iso88595"
> 
> Hmph.  Do we know 8859-5 is available or do these need to be
> protected with prereq?
Opps, this encoding is absent even in my Cygwin box :)
Actually, previuosly, there was a Windows-1251 encoding,
you said we'd prefer to limit different encodings used in tests,
And I've made an attempt to avoid this but I'm way off the mark.

> 
> Can these tests be done with 8859-1 i.e. something we already depend
> on, by changing that $added message to latin-1, or is there something
> very Russian specific breakage we want to test here?
Well, actually, most popular Russian 8-bit codepages are Windows-1251 (aka 
cp1251) and KOI8-R.
Cygwin (as a "Linux box on Windows") uses cp1251.
Iconv cannot even convert Russian messages from cp1251(or UTF-8) to latin1.
It fails.
  $ echo "коммит" | LANG= iconv -t latin1 -f UTF-8 
  iconv: illegal input sequence at position 0
("коммит" is a "commit" in Russian)

> 
> If the former, please redo this entire patch (not just t4041) with
> 8859-1.
> 
> If there is some Russian specific breakage you cannot demonstrate
> with 8859-1, then please explain it in the log message.
Well, it's not a "Russian specific". It's "codepage conversion" specific,
but I cannot use latin1 codepage because I do not know any language that
uses iso8859-1/latin1 codepage (as German, Spanish, for example) to
write a commit message in it.
If someone can do the same with latin1, I'd be happy.
> 
> > diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
> > index d32e65e..36e4cc0 100755
> > --- a/t/t6006-rev-list-format.sh
> > +++ b/t/t6006-rev-list-format.sh
> > ...
> > -# usage: test_format name format_string  > +# usage: test_format [failure] name format_string  >  test_format () {
> > +   local must_fail=0
> 
> This breaks tests for non-bash users.  "local" is not even in POSIX.
---8<---

But today I've taken a look to Cygwin's locales more closely and found
out that I've used incorrect encoding name (`iso88595` instead of "canonic"
`iso-8859-5` that Cygwin has and "understands")

Nevertheless, as I've already said that is not a Russian locale specific
issue.
The problem in tests for me now is a language (that uses iso-8859-1
encoding) I do not speak or even write ;)

-- 
Alexey Shumkin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 1/5] t6006 (rev-list-format): don't hardcode SHA-1 in expected outputs

2013-06-26 Thread Alexey Shumkin
The expected SHA-1 digests are always available in variables. Use
them instead of hardcoding.

Signed-off-by: Alexey Shumkin 
---
 t/t6006-rev-list-format.sh | 140 +
 1 file changed, 77 insertions(+), 63 deletions(-)

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index 0393c9f..cc1008d 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -7,8 +7,19 @@ test_description='git rev-list --pretty=format test'
 
 test_tick
 test_expect_success 'setup' '
-touch foo && git add foo && git commit -m "added foo" &&
-  echo changed >foo && git commit -a -m "changed foo"
+   : >foo &&
+   git add foo &&
+   git commit -m "added foo" &&
+   head1=$(git rev-parse --verify HEAD) &&
+   head1_short=$(git rev-parse --verify --short $head1) &&
+   tree1=$(git rev-parse --verify HEAD:) &&
+   tree1_short=$(git rev-parse --verify --short $tree1) &&
+   echo changed >foo &&
+   git commit -a -m "changed foo" &&
+   head2=$(git rev-parse --verify HEAD) &&
+   head2_short=$(git rev-parse --verify --short $head2) &&
+   tree2=$(git rev-parse --verify HEAD:) &&
+   tree2_short=$(git rev-parse --verify --short $tree2)
 '
 
 # usage: test_format name format_string foo && git commit -a -F commit-msg
+   git config i18n.commitencoding iso8859-1 &&
+   echo change2 >foo && git commit -a -F commit-msg &&
+   head3=$(git rev-parse --verify HEAD) &&
+   head3_short=$(git rev-parse --short $head3)
 '
 
-test_format complex-encoding %e <<'EOF'
-commit 1ed88da4a5b5ed8c449114ac131efc62178734c3
+test_format complex-encoding %e expect commit $head3 &&
echo >>expect fooQbar &&
git rev-list -1 --format=foo%x00bar HEAD >actual.nul &&
nul_to_q actual &&
-- 
1.8.3.1.15.g5c23c1e

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 0/5] Reroll patches against v1.8.3.1

2013-06-26 Thread Alexey Shumkin
v7 of this patch series includes the following changes against v6:
1. [PATCH v7 1/5] t6006 (rev-list-format): don't hardcode SHA-1 in expected 
outputs
untouched
2. [PATCH v7 2/5] t7102 (reset): don't hardcode SHA-1 in expected outputs
untouched
3. [PATCH v7 3/5] t4205 (log-pretty-formats): don't hardcode SHA-1 in expected 
outputs
untouched
4. [PATCH v7 4/5] pretty: Add failing tests: --format output should honor 
logOutputEncoding
cp1251 encoding changed to iso-8859-1 encoding already used in tests
Test commit messages contain words ("changed" and "added") in German
(which is covered by iso-8859-1 encoding)
They are translated from English (verified in Russian) with Google Translate
5. [PATCH v7 5/5] pretty: --format output should honor logOutputEncoding
builtin/reset.c:
"const char ..., *msg;" declaration replaced with "char *msg;"
to avoid compiler warning on the line "logmsg_free(msg, commit);"

P.S.
It's all started here 
[http://thread.gmane.org/gmane.comp.version-control.git/177634]

Alexey Shumkin (5):
  t6006 (rev-list-format): don't hardcode SHA-1 in expected outputs
  t7102 (reset): don't hardcode SHA-1 in expected outputs
  t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs
  pretty: Add failing tests: --format output should honor
logOutputEncoding
  pretty: --format output should honor logOutputEncoding

 builtin/reset.c  |   5 +-
 builtin/rev-list.c   |   1 +
 builtin/shortlog.c   |   1 +
 log-tree.c   |   1 +
 submodule.c  |   1 +
 t/t4041-diff-submodule-option.sh |  25 +++--
 t/t4205-log-pretty-formats.sh| 179 -
 t/t6006-rev-list-format.sh   | 207 ---
 t/t7102-reset.sh |  37 ++-
 9 files changed, 293 insertions(+), 164 deletions(-)

-- 
1.8.3.1.15.g5c23c1e

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 2/5] t7102 (reset): don't hardcode SHA-1 in expected outputs

2013-06-26 Thread Alexey Shumkin
The expected SHA-1 digests are always available in variables. Use
them instead of hardcoding.

Signed-off-by: Alexey Shumkin 
---
 t/t7102-reset.sh | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
index df82ec9..05dfb27 100755
--- a/t/t7102-reset.sh
+++ b/t/t7102-reset.sh
@@ -192,7 +192,8 @@ test_expect_success \
'changing files and redo the last commit should succeed' '
echo "3rd line 2nd file" >>secondfile &&
git commit -a -C ORIG_HEAD &&
-   check_changes 3d3b7be011a58ca0c179ae45d94e6c83c0b0cd0d &&
+   head4=$(git rev-parse --verify HEAD) &&
+   check_changes $head4 &&
test "$(git rev-parse ORIG_HEAD)" = \
$head5
 '
@@ -211,7 +212,7 @@ test_expect_success \
git reset --hard HEAD~2 &&
check_changes ddaefe00f1da16864591c61fdc7adb5d7cd6b74e &&
test "$(git rev-parse ORIG_HEAD)" = \
-   3d3b7be011a58ca0c179ae45d94e6c83c0b0cd0d
+   $head4
 '
 
 >.diff_expect
@@ -326,10 +327,11 @@ test_expect_success '--hard reset to HEAD should clear a 
failed merge' '
git checkout branch2 &&
echo "3rd line in branch2" >>secondfile &&
git commit -a -m "change in branch2" &&
+   head3=$(git rev-parse --verify HEAD) &&
 
test_must_fail git pull . branch1 &&
git reset --hard &&
-   check_changes 77abb337073fb4369a7ad69ff6f5ec0e4d6b54bb
+   check_changes $head3
 '
 
 >.diff_expect
-- 
1.8.3.1.15.g5c23c1e

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 3/5] t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs

2013-06-26 Thread Alexey Shumkin
The expected SHA-1 digests are always available in variables. Use
them instead of hardcoding.

Signed-off-by: Alexey Shumkin 
---
 t/t4205-log-pretty-formats.sh | 48 +++
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 26fbfde..73ba5e8 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -101,7 +101,11 @@ test_expect_failure 'NUL termination with --stat' '
 
 test_expect_success 'setup more commits' '
test_commit "message one" one one message-one &&
-   test_commit "message two" two two message-two
+   test_commit "message two" two two message-two &&
+   head1=$(git rev-parse --verify --short HEAD~0) &&
+   head2=$(git rev-parse --verify --short HEAD~1) &&
+   head3=$(git rev-parse --verify --short HEAD~2) &&
+   head4=$(git rev-parse --verify --short HEAD~3)
 '
 
 test_expect_success 'left alignment formatting' '
@@ -117,18 +121,18 @@ EOF
test_cmp expected actual
 '
 
-test_expect_success 'left alignment formatting at the nth column' '
-   git log --pretty="format:%h %<|(40)%s" >actual &&
+test_expect_success 'left alignment formatting at the nth column' "
+   git log --pretty='format:%h %<|(40)%s' >actual &&
# complete the incomplete line at the end
echo >>actual &&
qz_to_tab_space <<\EOF >expected &&
-fa33ab1 message twoZ
-7cd6c63 message oneZ
-1711bf9 add barZ
-af20c06 initialZ
+$head1 message twoZ
+$head2 message oneZ
+$head3 add barZ
+$head4 initialZ
 EOF
test_cmp expected actual
-'
+"
 
 test_expect_success 'left alignment formatting with no padding' '
git log --pretty="format:%<(1)%s" >actual &&
@@ -195,18 +199,18 @@ EOF
test_cmp expected actual
 '
 
-test_expect_success 'right alignment formatting at the nth column' '
-   git log --pretty="format:%h %>|(40)%s" >actual &&
+test_expect_success 'right alignment formatting at the nth column' "
+   git log --pretty='format:%h %>|(40)%s' >actual &&
# complete the incomplete line at the end
echo >>actual &&
qz_to_tab_space <<\EOF >expected &&
-fa33ab1  message two
-7cd6c63  message one
-1711bf9  add bar
-af20c06  initial
+$head1  message two
+$head2  message one
+$head3  add bar
+$head4  initial
 EOF
test_cmp expected actual
-'
+"
 
 test_expect_success 'right alignment formatting with no padding' '
git log --pretty="format:%>(1)%s" >actual &&
@@ -234,18 +238,18 @@ EOF
test_cmp expected actual
 '
 
-test_expect_success 'center alignment formatting at the nth column' '
-   git log --pretty="format:%h %><|(40)%s" >actual &&
+test_expect_success 'center alignment formatting at the nth column' "
+   git log --pretty='format:%h %><|(40)%s' >actual &&
# complete the incomplete line at the end
echo >>actual &&
qz_to_tab_space <<\EOF >expected &&
-fa33ab1   message two  Z
-7cd6c63   message one  Z
-1711bf9 add barZ
-af20c06 initialZ
+$head1   message two  Z
+$head2   message one  Z
+$head3 add barZ
+$head4 initialZ
 EOF
test_cmp expected actual
-'
+"
 
 test_expect_success 'center alignment formatting with no padding' '
git log --pretty="format:%><(1)%s" >actual &&
-- 
1.8.3.1.15.g5c23c1e

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 5/5] pretty: --format output should honor logOutputEncoding

2013-06-26 Thread Alexey Shumkin
One can set an alias
$ git config [--global] alias.lg "log --graph 
--pretty=format:'%Cred%h%Creset
-%C(yellow)%d%Creset %s %Cgreen(%cd) %C(bold blue)<%an>%Creset'
--abbrev-commit --date=local"

to see the log as a pretty tree (like *gitk* but in a terminal).

However, log messages written in an encoding i18n.commitEncoding which differs
from terminal encoding are shown corrupted even when i18n.logOutputEncoding
and terminal encoding are the same (e.g. log messages committed on a Cygwin box
with Windows-1251 encoding seen on a Linux box with a UTF-8 encoding and vice 
versa).

To simplify an example we can say the following two commands are expected
to give the same output to a terminal:

$ git log --oneline --no-color
$ git log --pretty=format:'%h %s'

However, the former pays attention to i18n.logOutputEncoding
configuration, while the latter does not when it formats "%s".

The same corruption is true for
$ git diff --submodule=log
and
$ git rev-list --pretty=format:%s HEAD
and
$ git reset --hard

This patch makes pretty --format honor logOutputEncoding when it formats
log message.

Signed-off-by: Alexey Shumkin 
---
 builtin/reset.c  |  5 -
 builtin/rev-list.c   |  1 +
 builtin/shortlog.c   |  1 +
 log-tree.c   |  1 +
 submodule.c  |  1 +
 t/t4041-diff-submodule-option.sh | 10 +-
 t/t4205-log-pretty-formats.sh| 34 +-
 t/t6006-rev-list-format.sh   |  8 
 t/t7102-reset.sh |  2 +-
 9 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 6032131..afa6e02 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -93,10 +93,12 @@ static int reset_index(const unsigned char *sha1, int 
reset_type, int quiet)
 static void print_new_head_line(struct commit *commit)
 {
const char *hex, *body;
+   char *msg;
 
hex = find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV);
printf(_("HEAD is now at %s"), hex);
-   body = strstr(commit->buffer, "\n\n");
+   msg = logmsg_reencode(commit, NULL, get_log_output_encoding());
+   body = strstr(msg, "\n\n");
if (body) {
const char *eol;
size_t len;
@@ -107,6 +109,7 @@ static void print_new_head_line(struct commit *commit)
}
else
printf("\n");
+   logmsg_free(msg, commit);
 }
 
 static void update_index_from_diff(struct diff_queue_struct *q,
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 67701be..a5ec30d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -111,6 +111,7 @@ static void show_commit(struct commit *commit, void *data)
ctx.date_mode = revs->date_mode;
ctx.date_mode_explicit = revs->date_mode_explicit;
ctx.fmt = revs->commit_format;
+   ctx.output_encoding = get_log_output_encoding();
pretty_print_commit(&ctx, commit, &buf);
if (revs->graph) {
if (buf.len) {
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 1fd6f8a..1434f8f 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -137,6 +137,7 @@ void shortlog_add_commit(struct shortlog *log, struct 
commit *commit)
ctx.subject = "";
ctx.after_subject = "";
ctx.date_mode = DATE_NORMAL;
+   ctx.output_encoding = get_log_output_encoding();
pretty_print_commit(&ctx, commit, &ufbuf);
buffer = ufbuf.buf;
} else if (*buffer) {
diff --git a/log-tree.c b/log-tree.c
index 1946e9c..5277d3e 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -616,6 +616,7 @@ void show_log(struct rev_info *opt)
ctx.fmt = opt->commit_format;
ctx.mailmap = opt->mailmap;
ctx.color = opt->diffopt.use_color;
+   ctx.output_encoding = get_log_output_encoding();
pretty_print_commit(&ctx, commit, &msgbuf);
 
if (opt->add_signoff)
diff --git a/submodule.c b/submodule.c
index 1821a5b..78734e1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -226,6 +226,7 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
while ((commit = get_revision(rev))) {
struct pretty_print_context ctx = {0};
ctx.date_mode = rev->date_mode;
+   ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(&sb, 0);
strbuf_addstr(&sb, line_prefix);
if (commit->object.flags & SYMMETRIC_LEFT) {
diff --git a/t/t4041-diff-submodule-option.sh b/t/t4041-diff-submodule-option.sh
index 2a7877d..0a4f496 100755
--- a/t/t4041-diff-submodule-option.sh
+++ b/t/t4041-diff-submodule-option.sh
@@ -94,7 +94,7 @@ test_expect_success 'diff.submodule does not affect plumbing' 
'
 commit_file sm1 &&
 head2=$(add_file sm1 foo3)
 
-test_ex

[PATCH v7 4/5] pretty: Add failing tests: --format output should honor logOutputEncoding

2013-06-26 Thread Alexey Shumkin
One can set an alias
$ git config alias.lg "log --graph --pretty=format:'%Cred%h%Creset
-%C(yellow)%d%Creset %s %Cgreen(%cd) %C(bold blue)<%an>%Creset'
--abbrev-commit --date=local"

to see the log as a pretty tree (like *gitk* but in a terminal).

However, log messages written in an encoding i18n.commitEncoding which differs
from terminal encoding are shown corrupted even when i18n.logOutputEncoding
and terminal encoding are the same (e.g. log messages committed on a Cygwin box
with Windows-1251 encoding seen on a Linux box with a UTF-8 encoding and vice 
versa).

To simplify an example we can say the following two commands are expected
to give the same output to a terminal:

$ git log --oneline --no-color
$ git log --pretty=format:'%h %s'

However, the former pays attention to i18n.logOutputEncoding
configuration, while the latter does not when it formats "%s".

The same corruption is true for
$ git diff --submodule=log
and
$ git rev-list --pretty=format:%s HEAD
and
$ git reset --hard

This patch adds failing tests for the next patch that fixes them.

Signed-off-by: Alexey Shumkin 
---
 t/t4041-diff-submodule-option.sh |  35 +
 t/t4205-log-pretty-formats.sh| 149 ---
 t/t6006-rev-list-format.sh   |  83 +++---
 t/t7102-reset.sh |  29 +++-
 4 files changed, 199 insertions(+), 97 deletions(-)

diff --git a/t/t4041-diff-submodule-option.sh b/t/t4041-diff-submodule-option.sh
index 32d4a60..2a7877d 100755
--- a/t/t4041-diff-submodule-option.sh
+++ b/t/t4041-diff-submodule-option.sh
@@ -1,6 +1,7 @@
 #!/bin/sh
 #
 # Copyright (c) 2009 Jens Lehmann, based on t7401 by Ping Yin
+# Copyright (c) 2011 Alexey Shumkin (+ non-UTF-8 commit encoding tests)
 #
 
 test_description='Support for verbose submodule differences in git diff
@@ -10,6 +11,9 @@ This test tries to verify the sanity of the --submodule 
option of git diff.
 
 . ./test-lib.sh
 
+# String "added" in German (translated with Google Translate), encoded in 
UTF-8,
+# used in sample commit log messages in add_file() function below.
+added=$(printf "hinzugef\303\274gt")
 add_file () {
(
cd "$1" &&
@@ -19,7 +23,8 @@ add_file () {
echo "$name" >"$name" &&
git add "$name" &&
test_tick &&
-   git commit -m "Add $name" || exit
+   msg_added_iso88591=$(echo "Add $name ($added $name)" | 
iconv -f utf-8 -t iso-8859-1) &&
+   git -c 'i18n.commitEncoding=iso-8859-1' commit -m 
"$msg_added_iso88591"
done >/dev/null &&
git rev-parse --short --verify HEAD
)
@@ -89,29 +94,29 @@ test_expect_success 'diff.submodule does not affect 
plumbing' '
 commit_file sm1 &&
 head2=$(add_file sm1 foo3)
 
-test_expect_success 'modified submodule(forward)' '
+test_expect_failure 'modified submodule(forward)' '
git diff-index -p --submodule=log HEAD >actual &&
cat >expected <<-EOF &&
Submodule sm1 $head1..$head2:
- > Add foo3
+ > Add foo3 ($added foo3)
EOF
test_cmp expected actual
 '
 
-test_expect_success 'modified submodule(forward)' '
+test_expect_failure 'modified submodule(forward)' '
git diff --submodule=log >actual &&
cat >expected <<-EOF &&
Submodule sm1 $head1..$head2:
- > Add foo3
+ > Add foo3 ($added foo3)
EOF
test_cmp expected actual
 '
 
-test_expect_success 'modified submodule(forward) --submodule' '
+test_expect_failure 'modified submodule(forward) --submodule' '
git diff --submodule >actual &&
cat >expected <<-EOF &&
Submodule sm1 $head1..$head2:
- > Add foo3
+ > Add foo3 ($added foo3)
EOF
test_cmp expected actual
 '
@@ -138,25 +143,25 @@ head3=$(
git rev-parse --short --verify HEAD
 )
 
-test_expect_success 'modified submodule(backward)' '
+test_expect_failure 'modified submodule(backward)' '
git diff-index -p --submodule=log HEAD >actual &&
cat >expected <<-EOF &&
Submodule sm1 $head2..$head3 (rewind):
- < Add foo3
- < Add foo2
+ < Add foo3 ($added foo3)
+ < Add foo2 ($added foo2)
EOF
test_cmp expected actual
 '
 
 head4=$(add_file sm1 foo4 foo5)
-test_expect_success 'modified submodule(backward and forward)' '
+test_expect_failure 'modified submodule(backward and forward)' '
git diff-index -p --submodule=log HEAD >actual &&
cat >expected <<-EOF &&
Submodule sm1 $head2...$head4:
- > Add foo5
- > Add foo4
- < Add foo3
- < Add foo2
+ > Add foo5 ($added foo5)
+ > Add foo4 ($added foo4)
+ < Add foo3 ($added foo3)
+ < Add foo2 ($added foo2)
EOF
test_cmp expected actual
 '
diff --git a/t/t4205

Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Torsten Bögershausen
On 2013-06-25 23.18, Junio C Hamano wrote:
> Johannes Sixt  writes:
> 
>> Some context: This is about a patch by Ramsay that removes the
>> "schizophrenic lstat" hack for Cygwin. Junio, can you please queue that
>> patch in pu?
> 
> Sure.  Thanks.

First of all,
thanks for the work.

Here some "benchmark" results, 
(The test run of the test suite did the same amout of time).

But:
git status -uno in real life takes double the time,
git 1.8.3 compared against "pu with the vanilla l/stat"
   
1 second ->  2 seconds on linux kernel
0.2 seconds -> 0.4 seconds on git.git 

Do we have any known problems with the current implementation ?
Does speed matter ?

One vote to keep the special cygwin functions.
(And have a look how to improve the core.filemode)

/Torsten

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/5] Reroll patches against v1.8.3.1

2013-06-26 Thread Junio C Hamano
Alexey Shumkin  writes:

> On Tue, Jun 25, 2013 at 12:28:02PM -0700, Junio C Hamano wrote:
> ...
> If someone can do the same with latin1, I'd be happy.
> ...
> But today I've taken a look to Cygwin's locales more closely and found
> out that I've used incorrect encoding name (`iso88595` instead of "canonic"
> `iso-8859-5` that Cygwin has and "understands")
>
> Nevertheless, as I've already said that is not a Russian locale specific
> issue.
> The problem in tests for me now is a language (that uses iso-8859-1
> encoding) I do not speak or even write ;)

Many of the people on thee list don't, either, and that is perfectly
OK.

For the purpose of the test, "áëîõù" should be sufficient. In a
sense, using such an artificial string would make it even clearer
than a real message that what we are testing, no?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: QNX support

2013-06-26 Thread David Ondřich
Hi,

sorry for long delay, I was busy on some important project so far. However, 
we’ve found out really bad thing recently.

On Feb 21, 2013, at 7:04 PM, "Kraai, Matt"  wrote:
> Junio C Hamano writes:
>> "David Ondřich"  writes:
>>> I've read [1] recently, there's been some QNX port being
>>> initiated. Does that involve also old versions of QNX 4?
> 
> No, I haven't been working on QNX 4 support.  I've been targeting QNX 6.3.2, 
> with a little testing on QNX 6.5.0.  I doubt what I've done would work on QNX 
> 4 since it's so different from QNX 6.

It certainly wouldn't work, since there’s limitation to 48 characters for 
filename in filesystem – it can be beaten up by changes in git index / pack / 
keep mechanism but it requires on-the-fly transformation when replicating a 
repository from / to other place. I've thought we would be able to implement 
such translation but what came up was its cost would be too high. It is 
definitely doable but for us it's currently the showstopper.

dond--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git diff returns fatal error with core.safecrlf is set to true.

2013-06-26 Thread Junio C Hamano
Torsten Bögershausen  writes:

>> +++ b/diff.c
>> @@ -2647,6 +2647,10 @@ static int diff_populate_gitlink(struct diff_filespec 
>> *s, int size_only)
>>  int diff_populate_filespec(struct diff_filespec *s, int size_only)
>>  {
>>  int err = 0;
>> +enum safe_crlf crlf_warn = (safe_crlf != SAFE_CRLF_FAIL
>> +? safe_crlf
>> +: SAFE_CRLF_WARN);
>
> Thanks, 
> Does it makes sense to write it the other way around?
>
> enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL 
>? SAFE_CRLF_WARN 
>: safe_crlf);

I didn't see much difference either way, but between "FAIL needs to
be demoted to WARN, everything else goes as-is" and the original "We
do not care about anything other than FAIL, so use it as-is, but
demote FAIL to WARN", yours look shorter.  Will replace.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [submodule] Add depth to submodule update

2013-06-26 Thread Fredrik Gustafsson
On Wed, Jun 26, 2013 at 12:11:32AM +0200, Heiko Voigt wrote:
> On Tue, Jun 25, 2013 at 12:49:25AM +0200, Fredrik Gustafsson wrote:
> > Used only when a clone is initialized. This is useful when the submodule(s)
> > are huge and you're not really interested in anything but the latest commit.
> > 
> > Signed-off-by: Fredrik Gustafsson 
> 
> I this is a valid use case. But this option only makes sense when a
> submodule is newly cloned so I am not sure whether submodule update is
> the correct place. Let me think about this a little more. Since we do
> not have any extra command that initiates the clone this is probably the
> only place we can put this option. But at the moment it does not feel
> completely right.
> 
> Apart from that the code looks good. If the user does a checkout of a
> revision that was not fetched submodule update will error out the same
> way as if someone forgot to push his submodule changes. So that should
> not be a problem.
> 

I agree and would love to say that I've a more beautiful solution, but
I haven't.

The only other solution I can think about is to add a git submodule
clone that will do only clones of non-cloned submodules.

I'm no UI expert so I don't know what's best. Maybe that's more
intuitive.

-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iv...@iveqy.com
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [submodule] Add depth to submodule update

2013-06-26 Thread Junio C Hamano
Heiko Voigt  writes:

> On Tue, Jun 25, 2013 at 12:49:25AM +0200, Fredrik Gustafsson wrote:
>> Used only when a clone is initialized. This is useful when the submodule(s)
>> are huge and you're not really interested in anything but the latest commit.
>> 
>> Signed-off-by: Fredrik Gustafsson 
>
> I this is a valid use case. But this option only makes sense when a
> submodule is newly cloned so I am not sure whether submodule update is
> the correct place. Let me think about this a little more. Since we do
> not have any extra command that initiates the clone this is probably the
> only place we can put this option. But at the moment it does not feel
> completely right.

I could imagine why people would not want to truncate the history
when they "submodule update" a submodule that has been already
initialized and cloned long time ago, but the new option is ignored
in the patch for an already cloned module, so that is not a problem.

The only possible confusion factor I can see is that the option is
ignored silently, but I do not think it is a grave enough offence to
error out when the user says "git submodule update --depth=N $path"
for a submodule at $path that has already been cloned.  It may not
even deserve a wraning, so in that sense the patch may be fine as-is.

> Apart from that the code looks good. If the user does a checkout of a
> revision that was not fetched submodule update will error out the same
> way as if someone forgot to push his submodule changes. So that should
> not be a problem.

True.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 0/5] Reroll patches against v1.8.3.1

2013-06-26 Thread Junio C Hamano
Alexey Shumkin  writes:

> v7 of this patch series includes the following changes against v6:
> 1. [PATCH v7 1/5] t6006 (rev-list-format): don't hardcode SHA-1 in expected 
> outputs
>   untouched
> 2. [PATCH v7 2/5] t7102 (reset): don't hardcode SHA-1 in expected outputs
>   untouched
> 3. [PATCH v7 3/5] t4205 (log-pretty-formats): don't hardcode SHA-1 in 
> expected outputs
>   untouched
> 4. [PATCH v7 4/5] pretty: Add failing tests: --format output should honor 
> logOutputEncoding
> cp1251 encoding changed to iso-8859-1 encoding already used in tests
> Test commit messages contain words ("changed" and "added") in German
> (which is covered by iso-8859-1 encoding)
> They are translated from English (verified in Russian) with Google 
> Translate
> 5. [PATCH v7 5/5] pretty: --format output should honor logOutputEncoding
> builtin/reset.c:
> "const char ..., *msg;" declaration replaced with "char *msg;"
> to avoid compiler warning on the line "logmsg_free(msg, commit);"
>
> P.S.
> It's all started here 
> [http://thread.gmane.org/gmane.comp.version-control.git/177634]

Thanks for a quick reroll and reference to previous threads in the
cover letter.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 03/16] quote.c: substitute path_relative with relative_path

2013-06-26 Thread Junio C Hamano
Jiang Xin  writes:

> Substitute the function path_relative in quote.c with the function
> relative_path. Function relative_path can be treated as an enhanced
> and robust version of path_relative.
> ...
> And if prefix has no trailing slash, path_relative can not work properly
> either. But since prefix always has a trailing slash, so it's not a
> problem.

Nicely explained.

>  static void write_name(const char* name, size_t len)
>  {
> - write_name_quoted_relative(name, len, prefix, prefix_len, stdout,
> - line_terminator);
> + /* Since prefix_len is ignored in write_name_quoted_relative, we
> +  * should turn off prefix here in case of running "git ls-files"
> +  * with "--full-name" option */
> + write_name_quoted_relative(name, len, prefix_len ? prefix : NULL,
> + prefix_len, stdout, line_terminator);
>  }

I'd tweak this function and write_naem_quoted_relative(), and
explain the reason for doing so in the log message.  Please check
what I'll push out on 'pu' later today.

-- >8 --
Subject: [PATCH] fix write_name() and write_name_quoted_relative() signature

The write_name_quoted_relative() function used to accept its two
parameters as counted strings.  Since it now uses relative_path(),
which requires both input strings to be NUL-terminated, the API of
this function need to be audited carefully.

Luckily, the only one caller is write_name() in builtin/ls-files.c,
and it in turn has only three callers.  They pass the string to be
made relative at this function and all of these strings happen to be
NUL terminated.  We can safely lose "len" parameter of write_name(),
and write_name_quoted_relative() can safely lose the length of its
name parameter as well.

The "prefix_len" parameter of write_name_quoted_relative() is either
0 (when "ls-files --full-name" is used), or the length of the prefix
string (i.e. the path to the current subdirectory).  By checking the
"--full-name" case in the caller, i.e. write_name(), we can make
write_name_quoted_relative() not to require the prefix as a counted
string.

Signed-off-by: Junio C Hamano 
---
 builtin/ls-files.c | 17 -
 quote.c|  3 +--
 quote.h|  5 ++---
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 247a8a4..d87e136 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -46,13 +46,12 @@ static const char *tag_modified = "";
 static const char *tag_skip_worktree = "";
 static const char *tag_resolve_undo = "";
 
-static void write_name(const char* name, size_t len)
+static void write_name(const char *name)
 {
-   /* Since prefix_len is ignored in write_name_quoted_relative, we
-* should turn off prefix here in case of running "git ls-files"
-* with "--full-name" option */
-   write_name_quoted_relative(name, len, prefix_len ? prefix : NULL,
-   prefix_len, stdout, line_terminator);
+   /* (prefix_len == 0) is for "--full-name" output */
+   write_name_quoted_relative(name,
+  prefix_len ? prefix : NULL,
+  stdout, line_terminator);
 }
 
 static void show_dir_entry(const char *tag, struct dir_entry *ent)
@@ -66,7 +65,7 @@ static void show_dir_entry(const char *tag, struct dir_entry 
*ent)
return;
 
fputs(tag, stdout);
-   write_name(ent->name, ent->len);
+   write_name(ent->name);
 }
 
 static void show_other_files(struct dir_struct *dir)
@@ -166,7 +165,7 @@ static void show_ce_entry(const char *tag, struct 
cache_entry *ce)
   find_unique_abbrev(ce->sha1,abbrev),
   ce_stage(ce));
}
-   write_name(ce->name, ce_namelen(ce));
+   write_name(ce->name);
if (debug_mode) {
printf("  ctime: %d:%d\n", ce->ce_ctime.sec, ce->ce_ctime.nsec);
printf("  mtime: %d:%d\n", ce->ce_mtime.sec, ce->ce_mtime.nsec);
@@ -199,7 +198,7 @@ static void show_ru_info(void)
printf("%s%06o %s %d\t", tag_resolve_undo, ui->mode[i],
   find_unique_abbrev(ui->sha1[i], abbrev),
   i + 1);
-   write_name(path, len);
+   write_name(path);
}
}
 }
diff --git a/quote.c b/quote.c
index 64ff344..6af3c83 100644
--- a/quote.c
+++ b/quote.c
@@ -312,8 +312,7 @@ void write_name_quotedpfx(const char *pfx, size_t pfxlen,
fputc(terminator, fp);
 }
 
-void write_name_quoted_relative(const char *name, size_t len,
-   const char *prefix, size_t prefix_len,
+void write_name_quoted_relative(const char *name, const char *prefix,
FILE *fp, int terminator)
 {
struct strbuf sb = STRBUF_INIT;
diff --git a/quote.h b/quote.h
index 133155a..8586bcd 100644
--- a/quote.h
+++ b/quote.h
@@ -60,9 +60,8 @@ e

Re: [PATCH v15 02/16] path.c: refactor relative_path(), not only strip prefix

2013-06-26 Thread Junio C Hamano
Jiang Xin  writes:

> Original design of relative_path() is simple, just strip the prefix
> (*base) from the absolute path (*abs). In most cases, we need a real
> relative path, such as: ../foo, ../../bar. That's why there is another
> reimplementation (path_relative()) in quote.c.
>
> Borrowed some codes from path_relative() in quote.c to refactor

s/Borrowed/Borrow/; will locally amend.

> relative_path() in path.c, so that it could return real relative path,
> and user can reuse this function without reimplement his/her own.

s/reimplement/&ing/; will locally amend.

> The function path_relative() in quote.c will be substituted, and I
> would use the new relative_path() function when implement the

s/implement/&ing/; will locally amend.

> interactive git-clean later.
>
> Different results for relative_path() before and after this refactor:

I'd remove the rows that show the same result in the latter two
columns (will locally amend, if you do not object).

> abs path  base path  relative (original)  relative (refactor)
>   =  ===  ===
> /a/b  /a/b   ../
> /a/b/ /a/b   ../
> /a/a/b/  /a   ../
> / /a/b/  /../../
> /a/c  /a/b/  /a/c ../c
> /x/y  /a/b/  /x/y ../../x/y
>
> a/b/  a/b/   ../
> a/b/  a/b../
> a a/ba../
> x/y   a/b/   x/y  ../../x/y
> a/c   a/ba/c  ../c
>
> (empty)   (null) (empty)  ./
> (empty)   (empty)(empty)  ./
> (empty)   /a/b   (empty)  ./
> (null)(null) (null)   ./
> (null)(empty)(null)   ./
> (null)/a/b   (segfault)   ./

> You may notice that return value "." has been changed to "./".
> It is because:
>
>  * Function quote_path_relative() in quote.c will show the relative
>path as "./" if abs(in) and base(prefix) are the same.
>
>  * Function relative_path() is called only once (in setup.c), and
>it will be OK for the return value as "./" instead of ".".

Good explanation.  Thanks.

> Signed-off-by: Jiang Xin 
> Signed-off-by: Junio C Hamano 
> ---
>  cache.h   |   2 +-
>  path.c| 116 
> +-
>  setup.c   |   5 ++-
>  t/t0060-path-utils.sh |  39 -
>  test-path-utils.c |   4 +-
>  5 files changed, 113 insertions(+), 53 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index dd0fb..c2886 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -758,7 +758,7 @@ int is_directory(const char *);
>  const char *real_path(const char *path);
>  const char *real_path_if_valid(const char *path);
>  const char *absolute_path(const char *path);
> -const char *relative_path(const char *abs, const char *base);
> +const char *relative_path(const char *in, const char *prefix, struct strbuf 
> *sb);
>  int normalize_path_copy(char *dst, const char *src);
>  int longest_ancestor_length(const char *path, struct string_list *prefixes);
>  char *strip_path_suffix(const char *path, const char *suffix);
> diff --git a/path.c b/path.c
> index 04ff..fbe52c9b 100644
> --- a/path.c
> +++ b/path.c
> @@ -441,42 +441,100 @@ int adjust_shared_perm(const char *path)
>   return 0;
>  }
>  
> +/*
> + * Give path as relative to prefix.
> + *
> + * The strbuf may or may not be used, so do not assume it contains the
> + * returned path.
> + */
> +const char *relative_path(const char *in, const char *prefix,
> +   struct strbuf *sb)
>  {
> + int in_len = in ? strlen(in) : 0;
> + int prefix_len = prefix ? strlen(prefix) : 0;
> + int in_off = 0;
> + int prefix_off = 0;
> + int i = 0,j = 0;

s/0,j/0, j/; will locally amend.

> +
> + if (!in_len)
> + return "./";
> + else if (!prefix_len)
> + return in;
> +
> + while (i < prefix_len && j < in_len && prefix[i] == in[j]) {
> + if (is_dir_sep(prefix[i])) {
> + while (is_dir_sep(prefix[i]))
>   i++;
> + while (is_dir_sep(in[j]))
>   j++;

These i++/j++ are not checked for lengths of strings, but it is OK
because is_dir_sep() will never return true for NUL.

> + prefix_off = i;
> + in_off = j;

(mental note) *_off keeps track of the last directory separator of
the common part, i.e. where the difference begins.

> + } else {
> + i++;
> + j++;
> + }
> + }
> +
> + if (
> + /* "prefix" seems like prefix of "in" */
> + i >= prefix_len &&

So shouldn

Re: [PATCH v15 01/16] test: add test cases for relative_path

2013-06-26 Thread Junio C Hamano
Jiang Xin  writes:

> Add subcommand "relative_path" in test-path-utils, and add test cases
> in t0060.
>
> Johannes tested this commit on Windows, and found that some relative_path

"this commit", or "an earlier version of this patch"?  I am guessing
it is the latter (if so, I can easily amend locally without a need
for rerolling).

> tests should be skipped on Windows. This is because the bash on Windows
> rewrites arguments of regular Windows programs, such as git and the
> test helpers, if the arguments look like absolute POSIX paths. As a
> consequence, the actual tests performed are not what the tests scripts
> expect.
>
> The tests that need *not* be skipped are those where the two paths passed
> to 'test-path-utils relative_path' have the same prefix and the result is
> expected to be a relative path. This is because the rewriting changes
> "/a/b" to "D:/Src/MSysGit/a/b", and when both inputs are extended the same
> way, this just cancels out in the relative path computation.
>
> Signed-off-by: Jiang Xin 
> Signed-off-by: Johannes Sixt 
> Signed-off-by: Junio C Hamano 

I somehow lost track, but does the above list of sign-offs reflect
the origins of the changes contained in this patch, or is the second
one meant to be helped-by or something (if so, I can easily amend
locally without a need for rerolling)?

> +relative_path /  /a/b/   /   POSIX
> +relative_path /a/c   /a/b/   /a/cPOSIX
> +relative_path /a/c   /a/b/a/cPOSIX

(mental note, not a complaint): These are notable, as some may
expect to see "..", "../.."  and "../c" for these cases, but the
rule is "ignore base and return it if it is absolute", so they are
understandable.

> +relative_path a  a/b a   # TODO: should be: ..
> +relative_path x/ya/b x/y # TODO: should be: ../../x/y
> +relative_path a/ca/b a/c # TODO: should be: ../c

(mental note): OK.  Let's see how they evolve in later patches.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 04/16] Refactor quote_path_relative, remove unused params

2013-06-26 Thread Junio C Hamano
Jiang Xin  writes:

> After substitute path_relative() in quote.c with relative_path() from
> path.c, parameters (such as len and prefix_len) are obsolete in function
> quote_path_relative(). Remove unused parameters and change the order of
> parameters for quote_path_relative() function.
>
> Signed-off-by: Jiang Xin 
> Signed-off-by: Junio C Hamano 
> ---

OK.  I'll amend the log message a bit to say that we verified that
this conversion is safe.

> diff --git a/quote.c b/quote.c
> index 64ff3..ebb8 100644

You seem to be using unusually short abbrev length.

Please don't, at least in format-patch output.

"ebb8" may be unique within your repository, but may not be unique
in repositories of other people who attempt to apply your patches.

Offtopic.  Fixing this, without forcing you to use the default
abbrev length everywhere, _might_ require a bit of changes to the
configuration mechanism, or "git format-patch", or both.  I've also
seen in some other thread that --numstat was used in addition to the
usual --stat, which should not be added to format-patch output only
because the user configured it for other "diff/log" uses, which may
also need to be fixed in the same way.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 00/16] Interactive git clean

2013-06-26 Thread Junio C Hamano
People may have comments and improvements on the actual
"interactive" UI part, but I think the earlier parts up to "into two
phases" is ready for 'next'.  Let's queue them and have them
advance.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Colby Ranger
> Pinning the bitmap index on the reverse index adds complexity (lookups
> are two-step: first find the entry in the reverse index, and then find
> the SHA1 in the index) and is measurably slower, in both loading and
> lookup times. Since Git doesn't have a memory problem, it's very hard
> to make an argument for design that is more complex and runs slower to
> save memory.

Sorting by SHA1 will generate a random distribution. This will require
you to inflate the entire bitmap on every fetch request, in order to
do the "contains" operation.  Sorting by pack offset allows us to
inflate only the bits we need as we are walking the graph, since they
are usually at the start of the bitmap.

What is the general size in bytes of the SHA1 sorted bitmaps?  If they
are much larger, the size of the bitmap has an impact on how fast you
can perform bitwise operations on them, which is important for fetch
when doing wants AND NOT haves.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/5] git-remote-mediawiki: new git bin-wrapper for developement

2013-06-26 Thread benoit . person
From: Benoit Person 

The introduction of the Git::Mediawiki package makes it impossible to test,
without installation, git-remote-mediawiki and git-mw.

Using a git bin-wrapper enables us to define proper $GITPERLLIB to force the
use of the developement version of the Git::Mediawiki package, bypassing its
installed version if any.

An alternate solution was to 'install' all the files required at each build
but it pollutes the toplevel with untracked files.

Signed-off-by: Benoit Person 
Signed-off-by: Matthieu Moy 

---
 contrib/mw-to-git/Makefile |  6 ++
 contrib/mw-to-git/git  | 25 +
 2 files changed, 31 insertions(+)
 create mode 100755 contrib/mw-to-git/git

diff --git a/contrib/mw-to-git/Makefile b/contrib/mw-to-git/Makefile
index a6f8b24..2a80d56 100644
--- a/contrib/mw-to-git/Makefile
+++ b/contrib/mw-to-git/Makefile
@@ -2,6 +2,12 @@
 # Copyright (C) 2013
 # Matthieu Moy 
 #
+# To build and test:
+#
+#   make:
+# ./git mw preview Some_page.mw
+# ./git clone mediawiki::http://example.com/wiki/
+#
 # To install, run Git's toplevel 'make install' then run:
 #
 #   make install
diff --git a/contrib/mw-to-git/git b/contrib/mw-to-git/git
new file mode 100755
index 000..25b60ad
--- /dev/null
+++ b/contrib/mw-to-git/git
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+# git executable wrapper script for Git-Mediawiki to run tests without
+# installing all the scripts and perl packages.
+
+# based on $GIT_ROOT_DIR/wrap-for-bin.sh
+
+
+GIT_ROOT_DIR=../..
+GIT_EXEC_PATH=$(cd "$(dirname "$0")" && cd ${GIT_ROOT_DIR} && pwd)
+
+if test -n "$NO_SET_GIT_TEMPLATE_DIR"
+then
+   unset GIT_TEMPLATE_DIR
+else
+   GIT_TEMPLATE_DIR="$GIT_EXEC_PATH"'/templates/blt'
+   export GIT_TEMPLATE_DIR
+fi
+# Hack to make the `use lib` call works with multiple paths
+GITPERLLIB="$GIT_EXEC_PATH"'/contrib/mw-to-git:'"$GIT_EXEC_PATH"'/perl/blib/lib'
+GIT_TEXTDOMAINDIR="$GIT_EXEC_PATH"'/po/build/locale'
+PATH="$GIT_EXEC_PATH"'/contrib/mw-to-git:'"$GIT_EXEC_PATH"'/bin-wrappers:'"$PATH"
+export GIT_EXEC_PATH GITPERLLIB PATH GIT_TEXTDOMAINDIR
+
+exec "${GIT_EXEC_PATH}/git" "$@"
\ No newline at end of file
-- 
1.8.3.GIT

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 4/5] git-remote-mediawiki: Adding git-mw command

2013-06-26 Thread benoit . person
From: Benoit Person 

For now, git-remote-mediawiki is only a remote-helper. This patch adds a new
toolset script in which we will be able to build new tools for
git-remote-mediawiki.

This toolset uses a subcommand-mechanism to launch the proper action. For now
only the 'help' subcommand is implemented. It also provides some generic code
for the verbose and help command line options.

Signed-off-by: Benoit Person 
Signed-off-by: Matthieu Moy 

---
 contrib/mw-to-git/Makefile|  7 ++---
 contrib/mw-to-git/git-mw.perl | 60 +++
 2 files changed, 64 insertions(+), 3 deletions(-)
 create mode 100755 contrib/mw-to-git/git-mw.perl

diff --git a/contrib/mw-to-git/Makefile b/contrib/mw-to-git/Makefile
index 2a80d56..37afbbf 100644
--- a/contrib/mw-to-git/Makefile
+++ b/contrib/mw-to-git/Makefile
@@ -14,6 +14,7 @@
 
 GIT_MEDIAWIKI_PM=Git/Mediawiki.pm
 SCRIPT_PERL=git-remote-mediawiki.perl
+SCRIPT_PERL+=git-mw.perl
 GIT_ROOT_DIR=../..
 HERE=contrib/mw-to-git/
 
@@ -27,15 +28,15 @@ install_pm:
install $(GIT_MEDIAWIKI_PM) $(INSTLIBDIR)/$(GIT_MEDIAWIKI_PM)
 
 build:
-   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL=$(SCRIPT_PERL_FULL) \
+   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL="$(SCRIPT_PERL_FULL)" \
 build-perl-script
 
 install: install_pm
-   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL=$(SCRIPT_PERL_FULL) \
+   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL="$(SCRIPT_PERL_FULL)" \
 install-perl-script
 
 clean:
-   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL=$(SCRIPT_PERL_FULL) \
+   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL="$(SCRIPT_PERL_FULL)" \
 clean-perl-script
rm $(INSTLIBDIR)/$(GIT_MEDIAWIKI_PM)
 
diff --git a/contrib/mw-to-git/git-mw.perl b/contrib/mw-to-git/git-mw.perl
new file mode 100755
index 000..4a3e4a9
--- /dev/null
+++ b/contrib/mw-to-git/git-mw.perl
@@ -0,0 +1,60 @@
+#!/usr/bin/perl
+
+# Copyright (C) 2013
+# Benoit Person 
+# Celestin Matte 
+# License: GPL v2 or later
+
+# Set of tools for git repo with a mediawiki remote.
+# Documentation & bugtracker: https://github.com/moy/Git-Mediawiki/
+
+use strict;
+use warnings;
+
+use Getopt::Long;
+
+# By default, use UTF-8 to communicate with Git and the user
+binmode STDERR, ':encoding(UTF-8)';
+binmode STDOUT, ':encoding(UTF-8)';
+
+# Global parameters
+my $verbose = 0;
+sub v_print {
+   if ($verbose) {
+   return print {*STDERR} @_;
+   }
+   return;
+}
+
+my %commands = (
+   'help' =>
+   [\&help, {}, \&help]
+);
+
+# Search for sub-command
+my $cmd = $commands{'help'};
+for (0..@ARGV-1) {
+   if (defined $commands{$ARGV[$_]}) {
+   $cmd = $commands{$ARGV[$_]};
+   splice @ARGV, $_, 1;
+   last;
+   }
+};
+GetOptions( %{$cmd->[1]},
+   'help|h' => \&{$cmd->[2]},
+   'verbose|v'  => \$verbose);
+
+# Launch command
+&{$cmd->[0]};
+
+## Help Functions 
##
+
+sub help {
+   print {*STDOUT} <<'END';
+usage: git mw  
+
+git mw commands are:
+helpDisplay help information about git mw
+END
+   exit;
+}
-- 
1.8.3.GIT

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 0/5] git-remote-mediawiki: new tool to preview local changes without pushing

2013-06-26 Thread benoit . person
From: Benoit Person 

The #7 issue on git-mediawiki's issue tracker [1] states that the ability to
preview content without pushing would be a nice thing to have.

changes from v4:
  - Rebase on latest master
  - Typos in commits messages and code
  - Comments in Makefile
  - Factoring the conversion from relatives links to absolute ones in
`git mw preview`
  - Updating that "conversion" mechanism to not convert links with an
anchor '#'.
  - git-mw should be executable now.

changes from v3:
  - Rewrite all commit messages.
  - No more "\ No newline at end of file".
  - Rename GitMediawiki.pm into Git::Mediawiki.pm (so it moves GitMedawiki.pm
into Git/Mediawiki.pm).
  - Remove from the Makefile the copy_pm target (see below 'Add a bin-wrapper').
  - Use of 'install' insted of 'cp' in the Makefile.
  - Comment on the install_pm target in the Makefile.
  - Add a bin-wrapper for git to test scripts without 'make install'-ing them.
  - Move verbose option handling from previous v3-4/4 (introduction of preview
tool) into v4-4/5 (introduction of git-mw).
  - Refactor some code into subroutines to clean the global 'preview'
subroutine.
  - Rewrite some error messages to make them more concise while still giving
the same amount of information.
  - Use 'remote.${remote_name}.mwIDcontent' instead of 'mediawiki.IDContent'
as config item for the lookup ID used to combine template + new content.
  - Remove comments about what's going on in the preview subroutine.
  - Use 'clean_filename' (and not 'smudge_filename') in the preview tool to find
the correct mediawiki page name based on a filename.
  - Remove space/tab mixup in the 'help' subroutine.

changes from v2:
  - Add a way to test, without installation, code that uses GitMediawiki.pm.
  - Move more constants to GitMediawiki.pm
  - Remove the encapsulation of Git::config calls into a git_cmd_try one.
  - Remove the --blob option, distinction between files and blobs is now 
automatic.
  - Add a --verbose option to output more information on what's going on.
  - Rewrote the doc and the commit message.
  - Rewrote of the template retrieving code (see 'get_template' sub).
  - Use a configuration variable to define the content ID search in the
template. Default value set as 'bodyContent' since it seems more standard
than 'mw-content-text'.
  - Final content is now saved as utf-8 to solve encoding issues.
  - Perlcritic changes: 
- Update for loops style to a more perlish one.
- All 'print's specify their output streams.
--> Same useless warnings left in git-remote-mediawiki.perl after 
célestin's 
work and git-mw.perl after this patch :) .

changes from v1:
  - add new package GitMediawiki
- move some of git-remote-mediawiki functions into the package
- update git-remote-mediawiki to use those "moved" functions
- add a hacky-way to install it in the Makefile
- use it in the new git mw tool
  - add a way to give to the preview tool blobs as argument
  - add a fallback when the upstream's branch remote is not a mediawiki remote
  - update the `autoload` option to use `git web--browse` and not `xdg-open`
  - update the way we find the upstream's branch remote name

This serie is based on the 'master' branch merged with célestin's patch series.

[1] https://github.com/moy/Git-Mediawiki/issues/7

Benoit Person (5):
  git-remote-mediawiki: Introduction of Git::Mediawiki.pm
  git-remote-mediawiki: new git bin-wrapper for developement
  git-remote-mediawiki: factoring code between git-remote-mediawiki and
Git::Mediawiki
  git-remote-mediawiki: Adding git-mw command
  git-remote-mediawiki: Add preview subcommand into git mw.

 contrib/mw-to-git/Git/Mediawiki.pm  | 100 
 contrib/mw-to-git/Makefile  |  33 ++-
 contrib/mw-to-git/git   |  25 ++
 contrib/mw-to-git/git-mw.perl   | 368 
 contrib/mw-to-git/git-remote-mediawiki.perl |  85 +--
 5 files changed, 535 insertions(+), 76 deletions(-)
 create mode 100644 contrib/mw-to-git/Git/Mediawiki.pm
 create mode 100755 contrib/mw-to-git/git
 create mode 100755 contrib/mw-to-git/git-mw.perl

-- 
1.8.3.GIT

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/5] git-remote-mediawiki: Introduction of Git::Mediawiki.pm

2013-06-26 Thread benoit . person
From: Benoit Person 

Currently, the mw-to-git project contains only a remote helper
(git-remote-mediawiki.perl). To improve the user experience while
working with mediawiki remotes, new tools, designed for such cases,
should be created. To achieve this goal, the project needs a way to
share code between several scripts (remote helper, commands, ... ).

A perl package offers the best way to handle such case: Each script
can select what should be imported in its namespace.  The package
namespacing limits the use of side effects in the shared code.

An alternate solution is to concatenate a "toolset" file with each
*.perl when 'make'-ing the project. In that scheme, everything is
imported in the script's namespace. Plus, files should be renamed in
order to chain to Git's toplevel makefile. Hence, this solution is not
acceptable.

Signed-off-by: Benoit Person 
Signed-off-by: Matthieu Moy 

---
 contrib/mw-to-git/Git/Mediawiki.pm | 24 
 contrib/mw-to-git/Makefile | 24 +---
 2 files changed, 45 insertions(+), 3 deletions(-)
 create mode 100644 contrib/mw-to-git/Git/Mediawiki.pm

diff --git a/contrib/mw-to-git/Git/Mediawiki.pm 
b/contrib/mw-to-git/Git/Mediawiki.pm
new file mode 100644
index 000..805f42a
--- /dev/null
+++ b/contrib/mw-to-git/Git/Mediawiki.pm
@@ -0,0 +1,24 @@
+package Git::Mediawiki;
+
+use 5.008;
+use strict;
+use Git;
+
+BEGIN {
+
+our ($VERSION, @ISA, @EXPORT, @EXPORT_OK);
+
+# Totally unstable API.
+$VERSION = '0.01';
+
+require Exporter;
+
+@ISA = qw(Exporter);
+
+@EXPORT = ();
+
+# Methods which can be called as standalone functions as well:
+@EXPORT_OK = ();
+}
+
+1; # Famous last words
diff --git a/contrib/mw-to-git/Makefile b/contrib/mw-to-git/Makefile
index 1fb2424..a6f8b24 100644
--- a/contrib/mw-to-git/Makefile
+++ b/contrib/mw-to-git/Makefile
@@ -2,18 +2,36 @@
 # Copyright (C) 2013
 # Matthieu Moy 
 #
-## Build git-remote-mediawiki
+# To install, run Git's toplevel 'make install' then run:
+#
+#   make install
 
+GIT_MEDIAWIKI_PM=Git/Mediawiki.pm
 SCRIPT_PERL=git-remote-mediawiki.perl
 GIT_ROOT_DIR=../..
 HERE=contrib/mw-to-git/
 
 SCRIPT_PERL_FULL=$(patsubst %,$(HERE)/%,$(SCRIPT_PERL))
+INSTLIBDIR=$(shell $(MAKE) -C $(GIT_ROOT_DIR)/perl \
+-s --no-print-directory instlibdir)
 
 all: build
 
-build install clean:
+install_pm:
+   install $(GIT_MEDIAWIKI_PM) $(INSTLIBDIR)/$(GIT_MEDIAWIKI_PM)
+
+build:
+   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL=$(SCRIPT_PERL_FULL) \
+build-perl-script
+
+install: install_pm
$(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL=$(SCRIPT_PERL_FULL) \
-$@-perl-script
+install-perl-script
+
+clean:
+   $(MAKE) -C $(GIT_ROOT_DIR) SCRIPT_PERL=$(SCRIPT_PERL_FULL) \
+clean-perl-script
+   rm $(INSTLIBDIR)/$(GIT_MEDIAWIKI_PM)
+
 perlcritic:
perlcritic -2 *.perl
-- 
1.8.3.GIT

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 5/5] git-remote-mediawiki: Add preview subcommand into git mw.

2013-06-26 Thread benoit . person
From: Benoit Person 

In the current state, a user of git-remote-mediawiki can edit the markup text
locally, but has to push to the remote wiki to see how the page is rendererd.
Add a new 'git mw preview' command that allows rendering the markup text on
the remote wiki without actually pushing any change on the wiki.

This uses Mediawiki's API to render the markup and inserts it in an actual
HTML page from the wiki so that CSS can be rendered properly. Most links
should work when the page exists on the remote.

Signed-off-by: Benoit Person 
Signed-off-by: Matthieu Moy 

---
 contrib/mw-to-git/Git/Mediawiki.pm |   3 +-
 contrib/mw-to-git/git-mw.perl  | 310 -
 2 files changed, 311 insertions(+), 2 deletions(-)

diff --git a/contrib/mw-to-git/Git/Mediawiki.pm 
b/contrib/mw-to-git/Git/Mediawiki.pm
index 47fe4f4..d13c4df 100644
--- a/contrib/mw-to-git/Git/Mediawiki.pm
+++ b/contrib/mw-to-git/Git/Mediawiki.pm
@@ -19,7 +19,7 @@ require Exporter;
 
 # Methods which can be called as standalone functions as well:
 @EXPORT_OK = qw(clean_filename smudge_filename connect_maybe
-   EMPTY HTTP_CODE_OK);
+   EMPTY HTTP_CODE_OK HTTP_CODE_PAGE_NOT_FOUND);
 }
 
 # Mediawiki filenames can contain forward slashes. This variable decides by 
which pattern they should be replaced
@@ -30,6 +30,7 @@ use constant EMPTY => q{};
 
 # HTTP codes
 use constant HTTP_CODE_OK => 200;
+use constant HTTP_CODE_PAGE_NOT_FOUND => 404;
 
 sub clean_filename {
my $filename = shift;
diff --git a/contrib/mw-to-git/git-mw.perl b/contrib/mw-to-git/git-mw.perl
index 4a3e4a9..28df3ee 100755
--- a/contrib/mw-to-git/git-mw.perl
+++ b/contrib/mw-to-git/git-mw.perl
@@ -12,6 +12,14 @@ use strict;
 use warnings;
 
 use Getopt::Long;
+use URI::URL qw(url);
+use LWP::UserAgent;
+use HTML::TreeBuilder;
+
+use Git;
+use MediaWiki::API;
+use Git::Mediawiki qw(clean_filename connect_maybe
+   EMPTY HTTP_CODE_PAGE_NOT_FOUND);
 
 # By default, use UTF-8 to communicate with Git and the user
 binmode STDERR, ':encoding(UTF-8)';
@@ -26,9 +34,26 @@ sub v_print {
return;
 }
 
+# Preview parameters
+my $file_name = EMPTY;
+my $remote_name = EMPTY;
+my $preview_file_name = EMPTY;
+my $autoload = 0;
+sub file {
+   $file_name = shift;
+   return $file_name;
+}
+
 my %commands = (
'help' =>
-   [\&help, {}, \&help]
+   [\&help, {}, \&help],
+   'preview' =>
+   [\&preview, {
+   '<>' => \&file,
+   'output|o=s' => \$preview_file_name,
+   'remote|r=s' => \$remote_name,
+   'autoload|a' => \$autoload
+   }, \&preview_help]
 );
 
 # Search for sub-command
@@ -47,6 +72,288 @@ GetOptions( %{$cmd->[1]},
 # Launch command
 &{$cmd->[0]};
 
+# Preview Functions 

+
+sub preview_help {
+   print {*STDOUT} <<'END';
+USAGE: git mw preview [--remote|-r ] [--autoload|-a]
+  [--output|-o ] [--verbose|-v]
+   | 
+
+DESCRIPTION:
+Preview is an utiliy to preview local content of a mediawiki repo as if it was
+pushed on the remote.
+
+For that, preview searches for the remote name of the current branch's
+upstream if --remote is not set. If that remote is not found or if it
+is not a mediawiki, it lists all mediawiki remotes configured and asks
+you to replay your command with the --remote option set properly.
+
+Then, it searches for a file named 'filename'. If it's not found in
+the current dir, it will assume it's a blob.
+
+The content retrieved in the file (or in the blob) will then be parsed
+by the remote mediawiki and combined with a template retrieved from
+the mediawiki.
+
+Finally, preview will save the HTML result in a file. and autoload it
+in your default web browser if the option --autoload is present.
+
+OPTIONS:
+-r , --remote 
+If the remote is a mediawiki, the template and the parse engine
+used for the preview will be those of that remote.
+If not, a list of valid remotes will be shown.
+
+-a, --autoload
+Try to load the HTML output in a new tab (or new window) of your
+default web browser.
+
+-o , --output 
+Change the HTML output filename. Default filename is based on the
+input filename with its extension replaced by '.html'.
+
+-v, --verbose
+Show more information on what's going on under the hood.
+END
+   exit;
+}
+
+sub preview {
+   my $wiki;
+   my ($remote_url, $wiki_page_name);
+   my ($new_content, $template);
+   my $file_content;
+
+   if ($file_name eq EMPTY) {
+   die "Missing file argument, see `git mw help`\n";
+   }
+
+   v_print("### Selecting remote\n");
+   if ($remote_name eq EMPTY) {
+   $remote_name = find_upstream_rem

[PATCH v5 3/5] git-remote-mediawiki: factoring code between git-remote-mediawiki and Git::Mediawiki

2013-06-26 Thread benoit . person
From: Benoit Person 

For now, Git::Mediawiki contains nothing.

This first patch moves some of git-remote-mediawiki.perl's factorisable code
into Git::Mediawiki. In the same time, it removes the side effects of that code
and renames the fucntions and constants moved to expose a better API.

Signed-off-by: Benoit Person 
Signed-off-by: Matthieu Moy 

---
 contrib/mw-to-git/Git/Mediawiki.pm  | 77 +-
 contrib/mw-to-git/git-remote-mediawiki.perl | 85 +
 2 files changed, 89 insertions(+), 73 deletions(-)

diff --git a/contrib/mw-to-git/Git/Mediawiki.pm 
b/contrib/mw-to-git/Git/Mediawiki.pm
index 805f42a..47fe4f4 100644
--- a/contrib/mw-to-git/Git/Mediawiki.pm
+++ b/contrib/mw-to-git/Git/Mediawiki.pm
@@ -18,7 +18,82 @@ require Exporter;
 @EXPORT = ();
 
 # Methods which can be called as standalone functions as well:
-@EXPORT_OK = ();
+@EXPORT_OK = qw(clean_filename smudge_filename connect_maybe
+   EMPTY HTTP_CODE_OK);
+}
+
+# Mediawiki filenames can contain forward slashes. This variable decides by 
which pattern they should be replaced
+use constant SLASH_REPLACEMENT => '%2F';
+
+# Used to test for empty strings
+use constant EMPTY => q{};
+
+# HTTP codes
+use constant HTTP_CODE_OK => 200;
+
+sub clean_filename {
+   my $filename = shift;
+   $filename =~ s{@{[SLASH_REPLACEMENT]}}{/}g;
+   # [, ], |, {, and } are forbidden by MediaWiki, even URL-encoded.
+   # Do a variant of URL-encoding, i.e. looks like URL-encoding,
+   # but with _ added to prevent MediaWiki from thinking this is
+   # an actual special character.
+   $filename =~ s/[\[\]\{\}\|]/sprintf("_%%_%x", ord($&))/ge;
+   # If we use the uri escape before
+   # we should unescape here, before anything
+
+   return $filename;
+}
+
+sub smudge_filename {
+   my $filename = shift;
+   $filename =~ s{/}{@{[SLASH_REPLACEMENT]}}g;
+   $filename =~ s/ /_/g;
+   # Decode forbidden characters encoded in clean_filename
+   $filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
+   return $filename;
+}
+
+sub connect_maybe {
+   my $wiki = shift;
+   if ($wiki) {
+   return $wiki;
+   }
+
+   my $remote_name = shift;
+   my $remote_url = shift;
+   my ($wiki_login, $wiki_password, $wiki_domain);
+
+   $wiki_login = Git::config("remote.${remote_name}.mwLogin");
+   $wiki_password = Git::config("remote.${remote_name}.mwPassword");
+   $wiki_domain = Git::config("remote.${remote_name}.mwDomain");
+
+   $wiki = MediaWiki::API->new;
+   $wiki->{config}->{api_url} = "${remote_url}/api.php";
+   if ($wiki_login) {
+   my %credential = (
+   'url' => $remote_url,
+   'username' => $wiki_login,
+   'password' => $wiki_password
+   );
+   Git::credential(\%credential);
+   my $request = {lgname => $credential{username},
+  lgpassword => $credential{password},
+  lgdomain => $wiki_domain};
+   if ($wiki->login($request)) {
+   Git::credential(\%credential, 'approve');
+   print {*STDERR} qq(Logged in mediawiki user 
"$credential{username}".\n);
+   } else {
+   print {*STDERR} qq(Failed to log in mediawiki user 
"$credential{username}" on ${remote_url}\n);
+   print {*STDERR} '  (error ' .
+   $wiki->{error}->{code} . ': ' .
+   $wiki->{error}->{details} . ")\n";
+   Git::credential(\%credential, 'reject');
+   exit 1;
+   }
+   }
+
+   return $wiki;
 }
 
 1; # Famous last words
diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index 71baf8a..e40c034 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -14,6 +14,8 @@
 use strict;
 use MediaWiki::API;
 use Git;
+use Git::Mediawiki qw(clean_filename smudge_filename connect_maybe
+   EMPTY HTTP_CODE_OK);
 use DateTime::Format::ISO8601;
 use warnings;
 
@@ -23,9 +25,6 @@ binmode STDOUT, ':encoding(UTF-8)';
 
 use URI::Escape;
 
-# Mediawiki filenames can contain forward slashes. This variable decides by 
which pattern they should be replaced
-use constant SLASH_REPLACEMENT => '%2F';
-
 # It's not always possible to delete pages (may require some
 # privileges). Deleted pages are replaced with this content.
 use constant DELETED_CONTENT => "[[Category:Deleted]]\n";
@@ -40,8 +39,6 @@ use constant NULL_SHA1 => 
'';
 # Used on Git's side to reflect empty edit messages on the wiki
 use constant EMPTY_MESSAGE => '*Empty MediaWiki Message*';
 
-us

Re: [PATCH v5 5/5] git-remote-mediawiki: Add preview subcommand into git mw.

2013-06-26 Thread Matthieu Moy
benoit.per...@ensimag.fr writes:

> +do you want ? Use the -r option to specify the remote.

Not that it really matters, but there should be no space before ? in
English (although there is in French).

(Shouldn't prevent merging)

Other than that, the series looks good to me. Good work splitting
patches 1 to 4, the history looks much nicer now.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [submodule] Add depth to submodule update

2013-06-26 Thread Junio C Hamano
Fredrik Gustafsson  writes:

> On Wed, Jun 26, 2013 at 12:11:32AM +0200, Heiko Voigt wrote:
>> On Tue, Jun 25, 2013 at 12:49:25AM +0200, Fredrik Gustafsson wrote:
>> > Used only when a clone is initialized. This is useful when the submodule(s)
>> > are huge and you're not really interested in anything but the latest 
>> > commit.
>> > 
>> > Signed-off-by: Fredrik Gustafsson 
>> 
>> I this is a valid use case. But this option only makes sense when a
>> submodule is newly cloned so I am not sure whether submodule update is
>> the correct place. Let me think about this a little more. Since we do
>> not have any extra command that initiates the clone this is probably the
>> only place we can put this option. But at the moment it does not feel
>> completely right.
>> 
>> Apart from that the code looks good. If the user does a checkout of a
>> revision that was not fetched submodule update will error out the same
>> way as if someone forgot to push his submodule changes. So that should
>> not be a problem.
>
> I agree and would love to say that I've a more beautiful solution, but
> I haven't.
>
> The only other solution I can think about is to add a git
> submodule clone that will do only clones of non-cloned submodules.

The "update" subcommand already has "--init" to do "init && update",
and it would not complain if a given submodule is what you already
have shown interest in, so in that sense, I do not think what the
posted patch does is too bad---if it is already cloned, it just
ignores the depth altogether and makes sure the repository is there.
A separate "submodule clone" would only make it more cumbersome to
use, I suspect.

So let's queue the patch posted as-is for now; we can replace it
when/if somebody smarter than those who have spoken so far comes up
a more elegant approach.

The patch seems to lack any test on its own, by the way.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] remote: Add warnings about mixin --mirror and other remotes

2013-06-26 Thread Dennis Kaarsemaker
On zo, 2013-06-23 at 15:33 -0700, Junio C Hamano wrote:
> Dennis Kaarsemaker  writes:
> 
> > On zo, 2013-06-23 at 14:22 -0700, Junio C Hamano wrote:
> >> Dennis Kaarsemaker  writes:
> >> 
> >> > Equality for
> >> > wildcards is allowed and tested for, so do we really want to 'outlaw'
> >> > equality of non-wildcard refspecs?
> >> 
> >> I am not sure what you mean by "equality for wildcards is allowed".
> >> Do you mean this pair of remote definition is sane and not warned?
> >> 
> >>[remote "one"]
> >>fetch = refs/heads/*:refs/remotes/mixed/*
> >> 
> >>[remote "two"]
> >>fetch = refs/heads/*:refs/remotes/mixed/*
> >
> > I personally don't consider them very sane and didn't originally support
> > that. But this behavior is tested for in t5505-remote.sh test 27, which
> > started failing until I stopped warning for equal refspecs. This support
> > for "alt remotes" in prune was added by c175a7ad in 2008. The commit
> > message for that commit give a plausible reason for using them.
> 
> I actually do not read it that way.  What it wanted to do primarily
> was to avoid pruning "refs/remotes/alt/*" based on what it observed
> at the remote named "alt", when the refs fetched from that remote is
> set to update "refs/remotes/origin/*".
>
> The example in the log message is a special case where two
> physically different remotes are actually copies of a single logical
> repository, so in that narrow use case, it may be OK, but it is an
> unusual thing to do and we should "warn" about it, I think.

Apart from the exactly matching refspecs, does git in any other way
treat this as a special case?

> In any case, I've been assuming in this discussion "allow" is to
> silently accept, and overlaps are "warned" but not "rejected".  So
> if you meant by 'outlaw' to die and refuse to run, that is not what
> I meant.

Well, 1/3 is a warning on add, 3/3 is a warning and refusing to prune.
Should 3/3 do something else instead? Perhaps prompt for confirmation or
require some sort of --force?

-- 
Dennis Kaarsemaker
www.kaarsemaker.net

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rebase -i: fixup fixup! fixup!

2013-06-26 Thread Andrew Pimlott
Excerpts from Andrew Pimlott's message of Tue Jun 25 16:03:52 -0700 2013:
> Thomas's patch didn't do this: fixup! or squash! after the first is
> simply discarded, so you see:
> 
> pick d78c915 original
> fixup 0c6388e fixup! original
> fixup d15b556 fixup! original
> fixup 1e39bcd fixup! original
> 
> But it will be a simple change to keep all the fixup!s and squash!s.  I
> will do this (and try to make up for the carelessness of my previous
> patch).

In order to test this, I wrote a helper function to dump the rebase -i
todo list.  Would you like this introduced in its own patch, or
combined?  See below.

Andrew

---8<---
Subject: [PATCH] lib-rebase: set_cat_todo_editor

Add a helper for testing rebase -i todo lists.  This can be used to verify
the expected user experience, even for todo list changes that do not affect
the outcome of rebase.

Signed-off-by: Andrew Pimlott 
---
 t/lib-rebase.sh |   13 +
 1 file changed, 13 insertions(+)

diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
index 4b74ae4..d118dd6 100644
--- a/t/lib-rebase.sh
+++ b/t/lib-rebase.sh
@@ -66,6 +66,19 @@ EOF
chmod a+x fake-editor.sh
 }
 
+# After set_cat_todo_editor, rebase -i will write the todo list (ignoring
+# blank lines and comments) to stdout, and exit failure.
+
+set_cat_todo_editor () {
+   echo "#!$SHELL_PATH" >fake-editor.sh
+   cat >> fake-editor.sh <<\EOF
+grep "^[^#]" "$1"
+exit 1
+EOF
+   chmod a+x fake-editor.sh
+   test_set_editor "$(pwd)/fake-editor.sh"
+}
+
 # checks that the revisions in "$2" represent a linear range with the
 # subjects in "$1"
 test_linear_range () {
-- 
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Ramsay Jones
Torsten Bögershausen wrote:
> On 2013-06-25 23.18, Junio C Hamano wrote:
>> Johannes Sixt  writes:
>>
>>> Some context: This is about a patch by Ramsay that removes the
>>> "schizophrenic lstat" hack for Cygwin. Junio, can you please queue that
>>> patch in pu?
>>
>> Sure.  Thanks.
> 
> First of all,
> thanks for the work.
> 
> Here some "benchmark" results, 
> (The test run of the test suite did the same amout of time).

The test suite runs noticeably faster for me.

> 
> But:
> git status -uno in real life takes double the time,
> git 1.8.3 compared against "pu with the vanilla l/stat"
>
> 1 second ->  2 seconds on linux kernel
> 0.2 seconds -> 0.4 seconds on git.git 

Hmm, OK, I guess I will have to try something else. Sigh :(

> Do we have any known problems with the current implementation ?

Yes. The next branch is currently broken. (see reply to Junio)

> Does speed matter ?
> 
> One vote to keep the special cygwin functions.
> (And have a look how to improve the core.filemode)

I don't understand this (parenthetical) comment; could you
elaborate on this.

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Ramsay Jones
Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
>> Michael Haggerty and Jeff King have been re-vamping the reference
>> handling code. The failures noted above were provoked by patches
>> in the 'mh/ref-races' branch. At the time I wrote this patch, that
>> branch was only included in 'pu', but I notice that this topic has
>> now progressed to 'next' (see commit 71f1a182).
> 
> I had an impression that up to 98eeb09e (for_each_ref: load all
> loose refs before packed refs, 2013-06-20) that is now in 'next'
> does not agressively use the lstat timestamp of the packed-refs
> file, and the "optional" bit 5d478f5c (refs: do not invalidate the
> packed-refs cache unnecessarily, 2013-06-20), and the one in 'next'
> should be safe with the cheating-lstat.  Isn't it the case?

The next branch (from a couple of days ago, namely commit 4f488db) is
currently broken on cygwin, like so:

  $ cd t
  $ ./t3211-peel-ref.sh -i -v
  
  [ ... ]
  
  ok 7 - refs are peeled outside of refs/tags (old packed)
  
  expecting success:
  git pack-refs --all &&
  cp .git/packed-refs fully-peeled &&
  git branch yadda &&
  git pack-refs --all &&
  git branch -d yadda &&
  test_cmp fully-peeled .git/packed-refs
  
  fatal: internal error: packed-ref cache cleared while locked
  not ok 8 - peeled refs survive deletion of packed ref
  #
  #   git pack-refs --all &&
  #   cp .git/packed-refs fully-peeled &&
  #   git branch yadda &&
  #   git pack-refs --all &&
  #   git branch -d yadda &&
  #   test_cmp fully-peeled .git/packed-refs
  #
  $ cd trash\ directory.t3211-peel-ref/
  $ ls
  actual  base.t  expect
  $ ../../bin-wrappers/git pack-refs --all
  fatal: internal error: packed-ref cache cleared while locked
  $ gdb ../../git.exe
  GNU gdb 6.5.50.20060706-cvs (cygwin-special)
  Copyright (C) 2006 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB.  Type "show warranty" for details.
  This GDB was configured as "i686-pc-cygwin"...
  (gdb) b stat_validity_check
  Breakpoint 1 at 0x48a33f: file read-cache.c, line 1965.
  (gdb) run pack-refs --all
  Starting program: /home/ramsay/git/git.exe pack-refs --all
  Loaded symbols for /cygdrive/c/WINDOWS/system32/ntdll.dll
  Loaded symbols for /cygdrive/c/WINDOWS/system32/kernel32.dll
  Loaded symbols for /usr/bin/cygcrypto-0.9.8.dll
  Loaded symbols for /usr/bin/cygwin1.dll
  Loaded symbols for /cygdrive/c/WINDOWS/system32/advapi32.dll
  Loaded symbols for /cygdrive/c/WINDOWS/system32/rpcrt4.dll
  Loaded symbols for /cygdrive/c/WINDOWS/system32/secur32.dll
  Loaded symbols for /usr/bin/cygiconv-2.dll
  Loaded symbols for /usr/bin/cygz.dll
  
  Breakpoint 1, stat_validity_check (sv=0xa611a4,
  path=0x57d98c ".git/packed-refs") at read-cache.c:1965
  1965if (stat(path, &st) < 0)
  (gdb) n
  1967if (!sv->sd)
  (gdb) p sv
  $1 = (struct stat_validity *) 0xa611a4
  (gdb) p *sv
  $2 = {sd = 0xa62478}
  (gdb) p sv->sd
  $3 = (struct stat_data *) 0xa62478
  (gdb) p *sv->sd
  $4 = {sd_ctime = {sec = 1372194879, nsec = 671875000}, sd_mtime = {
  sec = 1372194879, nsec = 65625}, sd_dev = 2899104371,
sd_ino = 180184, sd_uid = 1005, sd_gid = 513, sd_size = 296}
  (gdb) p st
  $5 = {st_dev = 0, st_ino = 0, st_mode = 33152, st_nlink = 1, st_uid = 0,
st_gid = 0, st_rdev = 0, st_size = 296, st_atim = {tv_sec = 1372195048,
  tv_nsec = 890625000}, st_mtim = {tv_sec = 1372194879,
  tv_nsec = 65625}, st_ctim = {tv_sec = 1372194879,
  tv_nsec = 15625000}, st_blksize = 5636381, st_blocks = 1, st_spare4 = {
  10883480, 5757324}}
  (gdb) c
  Continuing.
  fatal: internal error: packed-ref cache cleared while locked
  
  Program exited with code 0200.
  (gdb) quit
  $ ../../test-stat .git/packed-refs
  stat for '.git/packed-refs':
  *dev:   -1395862925, 0
  *ino:   180184, 0
  *mode:  100644 -rw-, 100600 -rw-
   nlink: 1, 1
  *uid:   1005, 0
  *gid:   513, 0
  *rdev:  -1395862925, 0
   size:  296, 296
   atime: 1372195048, 1372195048 Tue Jun 25 22:17:28 2013
   mtime: 1372194879, 1372194879 Tue Jun 25 22:14:39 2013
   ctime: 1372194879, 1372194879 Tue Jun 25 22:14:39 2013
  $
  
Note that, as noted before, (at least) the following tests fail
due to the 'mh/ref-races' branch:

  t3210-pack-refs.sh(Wstat: 256 Tests: 16 Failed: 12)
Failed tests:  4-9, 11-16
Non-zero exit status: 1
  t3211-peel-ref.sh (Wstat: 256 Tests: 8 Failed: 1)
Failed test:  8
Non-zero exit status: 1
  t5500-fetch-pack.sh   (Wstat: 256 Tests: 54 Failed: 1)
Failed test:  43
Non-zero exit status: 1

> In any case, if removing the cheating-lstat will improve th

Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Ramsay Jones
Michael Haggerty wrote:
> On 06/25/2013 07:07 AM, Junio C Hamano wrote:
>> Ramsay Jones  writes:
>>
>>> Michael Haggerty and Jeff King have been re-vamping the reference
>>> handling code. The failures noted above were provoked by patches
>>> in the 'mh/ref-races' branch. At the time I wrote this patch, that
>>> branch was only included in 'pu', but I notice that this topic has
>>> now progressed to 'next' (see commit 71f1a182).
>>
>> I had an impression that up to 98eeb09e (for_each_ref: load all
>> loose refs before packed refs, 2013-06-20) that is now in 'next'
>> does not agressively use the lstat timestamp of the packed-refs
>> file, and the "optional" bit 5d478f5c (refs: do not invalidate the
>> packed-refs cache unnecessarily, 2013-06-20), and the one in 'next'
>> should be safe with the cheating-lstat.  Isn't it the case?
>>
>> In any case, if removing the cheating-lstat will improve the
>> robustness without hurting performance, I am all for it.
>>
>> The less platform specific hacks, the better ;-).
> 
> The following patch in the "non-optional" commits (which has already
> been merged to next) uses stat() via the new stat_validity API:
> 
> ca9199300e get_packed_ref_cache: reload packed-refs file when it changes
> 
> This patch adds some *extra* cache invalidation that was heretofore
> missing.  If stat() is broken it could
> 
> (a) cause a false positive, resulting in some unnecessary cache
> invalidation and re-reading of packed-refs, which will hurt performance
> but not correctness; or
> 
> (b) cause a false negative, in which case the stale cache might be used
> for reading (but not writing), just as was *always* the case before this
> patch.
> 
> As far as I understand, the concern for cygwin is (a).  I will leave it
> to others to measure and/or decide whether the performance loss is too
> grave to endure until the cygwin stat() situation is fixed.
> 

Hmm, I'm not sure I understand ... However, I can confirm that the
'mh/ref-races' branch in next is broken on cygwin. (i.e. it is not
just a speed issue; it provokes fatal errors).

See reply to Junio.

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Colby Ranger
>> Pinning the bitmap index on the reverse index adds complexity (lookups
>> are two-step: first find the entry in the reverse index, and then find
>> the SHA1 in the index) and is measurably slower, in both loading and
>> lookup times. Since Git doesn't have a memory problem, it's very hard
>> to make an argument for design that is more complex and runs slower to
>> save memory.
>
> Sorting by SHA1 will generate a random distribution. This will require
> you to inflate the entire bitmap on every fetch request, in order to
> do the "contains" operation.  Sorting by pack offset allows us to
> inflate only the bits we need as we are walking the graph, since they
> are usually at the start of the bitmap.
>
> What is the general size in bytes of the SHA1 sorted bitmaps?  If they
> are much larger, the size of the bitmap has an impact on how fast you
> can perform bitwise operations on them, which is important for fetch
> when doing wants AND NOT haves.

Furthermore, JGit primarily operates on the bitmap representation,
rarely converting bitmap id -> SHA1 during clone. When the bitmap of
objects to include in the output pack contains all of the objects in
the bitmap'd pack, we only do the translation of the bitmap ids of new
objects, not in the bitmap index, and it is just a lookup in an array.
Those objects are put at the front of the stream. The rest of the
objects are streamed directly from the pack, with some header munging,
since it is guaranteed to be a fully connected pack. Most of the time
this works because JGit creates 2 packs during GC: a heads pack, which
is bitmap'd, and an everything else pack.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Jeff King
On Wed, Jun 26, 2013 at 10:45:48PM +0100, Ramsay Jones wrote:

> > This patch adds some *extra* cache invalidation that was heretofore
> > missing.  If stat() is broken it could
> > 
> > (a) cause a false positive, resulting in some unnecessary cache
> > invalidation and re-reading of packed-refs, which will hurt performance
> > but not correctness; or
> > 
> > (b) cause a false negative, in which case the stale cache might be used
> > for reading (but not writing), just as was *always* the case before this
> > patch.
> > 
> > As far as I understand, the concern for cygwin is (a).  I will leave it
> > to others to measure and/or decide whether the performance loss is too
> > grave to endure until the cygwin stat() situation is fixed.
> 
> Hmm, I'm not sure I understand ... However, I can confirm that the
> 'mh/ref-races' branch in next is broken on cygwin. (i.e. it is not
> just a speed issue; it provokes fatal errors).

I think Michael's assessment above is missing one thing. It is true that
a false positive is just a performance problem in most cases, as we
unnecessarily reload the file, thinking it has changed.

However, when we have taken a lock on the file, there is an additional
safety measure: if we find the file is changed, we abort, as that should
never happen (it means somebody else modified the file while we had it
locked). But of course Cygwin's false positive here triggers the safety
valve, and we die without even realizing that nothing changed.

In theory we can drop the safety valve; it should never actually happen.
But I'd like to keep it there for working systems. Perhaps it is worth
doing something like this:

diff --git a/refs.c b/refs.c
index 4302206..7cac42d 100644
--- a/refs.c
+++ b/refs.c
@@ -1080,6 +1080,16 @@ static struct packed_ref_cache 
*get_packed_ref_cache(struct ref_cache *refs)
packed_refs_file = git_path("packed-refs");
 
if (refs->packed &&
+#ifdef STAT_VALIDITY_GIVES_FALSE_POSITIVES
+   /*
+* If we get false positives from our stat calls on this platform,
+* then we must assume that once we have locked the packed-refs
+* file it does not change. Otherwise it looks like somebody else
+* has changed it out from under us (while we have it locked!), and
+* we die().
+*/
+   !refs->packed->lock &&
+#endif
!stat_validity_check(&refs->packed->validity, packed_refs_file))
clear_packed_ref_cache(refs);
 

and then we can add that define to Cygwin in the Makefile. I verified
the issue on Linux by "breaking" stat_validity_check to always return 0
(i.e., to always give a false positive that the file has changed, which
is what Cygwin is doing).

I am curious how often Cygwin gives us the false positive. If it is
every time, then the check is not doing much good at all. Is it possible
for you to instrument stat_validity_check to report how often it does or
does not do anything useful?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Jeff King
On Wed, Jun 26, 2013 at 06:35:52PM -0400, Jeff King wrote:

> I am curious how often Cygwin gives us the false positive. If it is
> every time, then the check is not doing much good at all. Is it possible
> for you to instrument stat_validity_check to report how often it does or
> does not do anything useful?

Maybe like this:

diff --git a/read-cache.c b/read-cache.c
index d5201f9..19dcb69 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1958,6 +1958,14 @@ void stat_validity_clear(struct stat_validity *sv)
sv->sd = NULL;
 }
 
+static int unchanged;
+static int attempts;
+static void print_stats(void)
+{
+   fprintf(stderr, "stat data was unchanged %d/%d\n",
+   unchanged, attempts);
+}
+
 int stat_validity_check(struct stat_validity *sv, const char *path)
 {
struct stat st;
@@ -1966,7 +1974,16 @@ int stat_validity_check(struct stat_validity *sv, const 
char *path)
return sv->sd == NULL;
if (!sv->sd)
return 0;
-   return S_ISREG(st.st_mode) && !match_stat_data(sv->sd, &st);
+   if (!S_ISREG(st.st_mode))
+   return 0;
+   if (!attempts++)
+   atexit(print_stats);
+   if (!match_stat_data(sv->sd, &st)) {
+   unchanged++;
+   return 1;
+   }
+   else
+   return 0;
 }
 
 void stat_validity_update(struct stat_validity *sv, int fd)

Running "t3211 -v", I see things like:

  stat data was unchanged 3/3
  stat data was unchanged 20/20
  stat data was unchanged 2/2
  Deleted branch yadda (was d1ff1c9).
  stat data was unchanged 8/8
  ok 8 - peeled refs survive deletion of packed ref

I am curious if you will see 0/20 or 19/20 there on Cygwin.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Thomas Rast
Vicent Martí  writes:

> On Tue, Jun 25, 2013 at 5:58 PM, Thomas Rast  wrote:
>>
>> Please document the RLW format here.
>
> Har har. I was going to comment on your review of the Ewah patchset,
> but might as well do it here: the only thing I know about Ewah bitmaps
> is that they work. And I know this because I did extensive fuzz
> testing of my C port. Unfortunately, the original Java code I ported
> from has 0 comments, so any documentation here would have to be
> reverse-engineered.

I think the below would be a reasonable documentation, to be appended
after your description of the EWAH format.  Maybe Colby can correct me
if I got anything wrong.  You can basically read this off from the
implementation of ewah_each_bit() and the helper functions it uses.

-- 8< --
The compressed bitmap is stored in a form of run-length encoding, as
follows.  It consists of a concatenation of an arbitrary number of
chunks.  Each chunk consists of one or more 64-bit words

 H  L_1  L_2  L_3  L_M

H is called RLW (run length word).  It consists of (from lower to higher
order bits):

 - 1 bit: the repeated bit B

 - 32 bits: repetition count K (unsigned)

 - 31 bits: literal word count M (unsigned)

The bitstream represented by the above chunk is then:

 - K repetitions of B

 - The bits stored in `L_1` through `L_M`.  Within a word, bits at
   lower order come earlier in the stream than those at higher
   order.

The next word after `L_M` (if any) must again be a RLW, for the next
chunk.  For efficient appending to the bitstream, the EWAH stores a
format to the last RLW in the stream.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/16] rev-list: add bitmap mode to speed up lists

2013-06-26 Thread Thomas Rast
Vicent Martí  writes:

> I'm afraid I cannot reproduce the segfault locally (assuming you're
> performing the rev-list on the git/git repository). Could you please
> send me more information, and a core dump if possible?

Sure, but isn't the core dump useless if you don't have the same
executable?  And since I'm building "custom" git, you won't have that.

Here's a semi-full backtrace (I left out the spammy output in the
outermost frames).  Some variables in #2 and #3 seem to have gone off
the rails.

#0  0x772b06fb in __memset_sse2 () from /lib64/libc.so.6
No symbol table info available.
#1  0x0054c31c in bitmap_set (self=0x89c360, pos=18446744072278122040) 
at ewah/bitmap.c:46
old_size = 7666
block = 288230376129345656
#2  0x004e6c70 in add_to_include_set (data=0x7fffcd00, 
sha1=0x85c014 
"\230\062˝M\311i\373\372\317\321\370\224\017\313\336\301\213\271\060", 
bitmap_pos=-1431429576) at pack-bitmap.c:428
hash_pos = 512
#3  0x004e6cd6 in should_include (commit=0x85c010, 
_data=0x7fffcd00) at pack-bitmap.c:443
data = 0x7fffcd00
bitmap_pos = -1431429576
#4  0x0050cf1d in add_parents_to_list (revs=0x7fffce30, 
commit=0x85c010, list=0x7fffce30, cache_ptr=0x0) at revision.c:784
parent = 0x88c260
left_flag = 32767
cached_base = 0x0
#5  0x00512b66 in get_revision_1 (revs=0x7fffce30) at 
revision.c:2857
entry = 0x8f9ce0
commit = 0x85c010
#6  0x00512dcf in get_revision_internal (revs=0x7fffce30) at 
revision.c:2964
c = 0x0
l = 0x1000
#7  0x00512fe1 in get_revision (revs=0x7fffce30) at revision.c:3040
c = 0xb92608
reversed = 0x89c360
#8  0x004d2a24 in traverse_commit_list (revs=0x7fffce30, 
show_commit=0x4e6b72 , show_object=0x4e6afa , 
data=0x89c360) at list-objects.c:179
i = -1
commit = 0xb92608
base = {
  alloc = 4097, 
  len = 0, 
  buf = 0x87bbe0 ""
}
#9  0x004e6fa4 in find_objects (revs=0x7fffce30, roots=0x0, 
seen=0x85b760) at pack-bitmap.c:549
incdata = {
  base = 0x89c360, 
  seen = 0x85b760
}
base = 0x89c360
needs_walk = true
not_mapped = 0x8f9dc0
#10 0x004e747b in prepare_bitmap_walk (revs=0x7fffce30, 
result_size=0x0) at pack-bitmap.c:679
i = 2
pending_nr = 2
pending_alloc = 64
pending_e = 0x853e10
wants = 0x8545b0
haves = 0x854820
wants_bitmap = 0x0
haves_bitmap = 0x85b760
#11 0x00474bb3 in cmd_rev_list (argc=2, argv=0x7fffd6e8, 
prefix=0x0) at builtin/rev-list.c:356
#12 0x00405820 in run_builtin (p=0x7c3ef8 , 
argc=4, argv=0x7fffd6e8) at git.c:291
#13 0x004059b3 in handle_internal_command (argc=4, argv=0x7fffd6e8) 
at git.c:454
#14 0x00405b87 in main (argc=4, av=0x7fffd6e8) at git.c:544


This is with a version of your series that you can find at

  https://github.com/trast/git.git vm/ewah

I am'd your patches on top of Junio's master at the time, except for the
parts to the Makefile that did not apply, which I fixed up manually.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Thomas Rast
Thomas Rast  writes:

[...]
> The next word after `L_M` (if any) must again be a RLW, for the next
> chunk.  For efficient appending to the bitstream, the EWAH stores a
> format to the last RLW in the stream.
  ^^

I have no idea what Freud did there, but "pointer" or some such is
probably a saner choice.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] remote: Add warnings about mixin --mirror and other remotes

2013-06-26 Thread Junio C Hamano
Dennis Kaarsemaker  writes:

> Apart from the exactly matching refspecs, does git in any other way
> treat this as a special case?

Sorry, I do not quite understand, especially the "exactly matching"
part, which as far as I know is not special (in other words, I am
not sure what kind of "special casing" you are discussing).

>> In any case, I've been assuming in this discussion "allow" is to
>> silently accept, and overlaps are "warned" but not "rejected".  So
>> if you meant by 'outlaw' to die and refuse to run, that is not what
>> I meant.
>
> Well, 1/3 is a warning on add, 3/3 is a warning and refusing to prune.

By "refusing to prune" do you mean "error out with die()"?

I was trying to make sure your "outlaw" was not something else
(e.g. "die without not pruning anything that are not even
overlapping"), which is probably too strong.

I think what your patch did was "keep them around not pruned because
we do not know where they came from" (I just checked $gmane/228589
again); that is in line with "warned but not rejected", so I think
we are OK.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rebase -i: fixup fixup! fixup!

2013-06-26 Thread Junio C Hamano
Andrew Pimlott  writes:

> In order to test this, I wrote a helper function to dump the rebase -i
> todo list.  Would you like this introduced in its own patch, or
> combined?  See below.

Depends on how involved the addition of the tests that actually use
the helper, but in general it would be a good idea to add it in the
first patch that actually uses it.  Unused code added in a separate
patch will not point at that patch when bisecting, if that unused
code was broken from the beginning (not that I see anything
immediately broken in the code the following adds).


> ---8<---
> Subject: [PATCH] lib-rebase: set_cat_todo_editor
>
> Add a helper for testing rebase -i todo lists.  This can be used to verify
> the expected user experience, even for todo list changes that do not affect
> the outcome of rebase.
>
> Signed-off-by: Andrew Pimlott 
> ---
>  t/lib-rebase.sh |   13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
> index 4b74ae4..d118dd6 100644
> --- a/t/lib-rebase.sh
> +++ b/t/lib-rebase.sh
> @@ -66,6 +66,19 @@ EOF
>   chmod a+x fake-editor.sh
>  }
>  
> +# After set_cat_todo_editor, rebase -i will write the todo list (ignoring
> +# blank lines and comments) to stdout, and exit failure.
> +
> +set_cat_todo_editor () {
> + echo "#!$SHELL_PATH" >fake-editor.sh
> + cat >> fake-editor.sh <<\EOF
> +grep "^[^#]" "$1"
> +exit 1
> +EOF
> + chmod a+x fake-editor.sh

These days we should use write_script to do this kind of thing, I
think.

> + test_set_editor "$(pwd)/fake-editor.sh"
> +}
> +
>  # checks that the revisions in "$2" represent a linear range with the
>  # subjects in "$1"
>  test_linear_range () {
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rebase -i: fixup fixup! fixup!

2013-06-26 Thread Andrew Pimlott
Excerpts from Junio C Hamano's message of Wed Jun 26 16:48:57 -0700 2013:
> Andrew Pimlott  writes:
> > In order to test this, I wrote a helper function to dump the rebase -i
> > todo list.  Would you like this introduced in its own patch, or
> > combined?  See below.
> 
> Depends on how involved the addition of the tests that actually use
> the helper, but in general it would be a good idea to add it in the
> first patch that actually uses it.  Unused code added in a separate
> patch will not point at that patch when bisecting, if that unused
> code was broken from the beginning (not that I see anything
> immediately broken in the code the following adds).

Ok, here is the complete commit, incorporating all feedback.

Andrew

---8<---
Subject: [PATCH 1/3] rebase -i: handle fixup! fixup! in --autosquash

In rebase -i --autosquash, ignore all "fixup! " or "squash! " after the
first.  This supports the case when a git commit --fixup/--squash referred
to an earlier fixup/squash instead of the original commit (whether
intentionally, as when the user expressly meant to note that the commit
fixes an earlier fixup; or inadvertently, as when the user meant to refer to
the original commit with :/msg; or out of laziness, as when the user could
remember how to refer to the fixup but not the original).

In the todo list, the full commit message is preserved, in case it provides
useful cues to the user.  A test helper set_cat_todo_editor is introduced to
check this.

Helped-by: Thomas Rast 
Helped-by: Junio C Hamano 
Signed-off-by: Andrew Pimlott 
---
 Documentation/git-rebase.txt |4 ++-
 git-rebase--interactive.sh   |   25 ++
 t/lib-rebase.sh  |   14 +++
 t/t3415-rebase-autosquash.sh |   57 ++
 4 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index c84854a..6b2e1c8 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -389,7 +389,9 @@ squash/fixup series.
the same ..., automatically modify the todo list of rebase -i
so that the commit marked for squashing comes right after the
commit to be modified, and change the action of the moved
-   commit from `pick` to `squash` (or `fixup`).
+   commit from `pick` to `squash` (or `fixup`).  Ignores subsequent
+   "fixup! " or "squash! " after the first, in case you referred to an
+   earlier fixup/squash with `git commit --fixup/--squash`.
 +
 This option is only valid when the '--interactive' option is used.
 +
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index f953d8d..169e876 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -689,8 +689,22 @@ rearrange_squash () {
case "$message" in
"squash! "*|"fixup! "*)
action="${message%%!*}"
-   rest="${message#*! }"
-   echo "$sha1 $action $rest"
+   rest=$message
+   prefix=
+   # skip all squash! or fixup! (but save for later)
+   while :
+   do
+   case "$rest" in
+   "squash! "*|"fixup! "*)
+   prefix="$prefix${rest%%!*},"
+   rest="${rest#*! }"
+   ;;
+   *)
+   break
+   ;;
+   esac
+   done
+   echo "$sha1 $action $prefix $rest"
# if it's a single word, try to resolve to a full sha1 
and
# emit a second copy. This allows us to match on both 
message
# and on sha1 prefix
@@ -699,7 +713,7 @@ rearrange_squash () {
if test -n "$fullsha"; then
# prefix the action to uniquely 
identify this line as
# intended for full sha1 match
-   echo "$sha1 +$action $fullsha"
+   echo "$sha1 +$action $prefix $fullsha"
fi
fi
esac
@@ -714,7 +728,7 @@ rearrange_squash () {
esac
printf '%s\n' "$pick $sha1 $message"
used="$used$sha1 "
-   while read -r squash action msg_content
+   while read -r squash action msg_prefix msg_content
do
case " $used" in
*" $squash "*) continue ;;
@@ -730,7 +744,8 @@ rearrange_squash () {
case "$message" in "$msg_content"*) emit=1;; 
esac ;;
  

Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Colby Ranger
> +  Generating this reverse index at runtime is **not** free (around 900ms
> +  generation time for a repository like `torvalds/linux`), and once again,
> +  this generation time needs to happen every time `pack-objects` is
> +  spawned.

If generating the reverse index is expensive, it is probably
worthwhile to create a ".revidx" or extend the ".idx" with the
information sorted by offset.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] am: replace uses of --resolved with --continue

2013-06-26 Thread Kevin Bracey
git am was previously modified to provide --continue for consistency
with rebase, merge etc, and the documentation changed to showing
--continue as the primary form.

Complete the work by replacing remaining uses of --resolved by
--continue, most notably in suggested command reminders.

Signed-off-by: Kevin Bracey 
---
 Documentation/git-am.txt  | 4 ++--
 Documentation/user-manual.txt | 2 +-
 git-am.sh | 8 
 t/t7512-status-help.sh| 4 ++--
 wt-status.c   | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-am.txt b/Documentation/git-am.txt
index 5bbe7b6..54d8461 100644
--- a/Documentation/git-am.txt
+++ b/Documentation/git-am.txt
@@ -132,7 +132,7 @@ default.   You can use `--no-utf8` to override this.
 --resolvemsg=::
When a patch failure occurs,  will be printed
to the screen before exiting.  This overrides the
-   standard message informing you to use `--resolved`
+   standard message informing you to use `--continue`
or `--skip` to handle the failure.  This is solely
for internal use between 'git rebase' and 'git am'.
 
@@ -176,7 +176,7 @@ aborts in the middle.  You can recover from this in one of 
two ways:
 
 . hand resolve the conflict in the working directory, and update
   the index file to bring it into a state that the patch should
-  have produced.  Then run the command with the '--resolved' option.
+  have produced.  Then run the command with the '--continue' option.
 
 The command refuses to process new mailboxes until the current
 operation is finished, so if you decide to start over from scratch,
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index e831cc2..8218cf9 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -1835,7 +1835,7 @@ Once the index is updated with the results of the conflict
 resolution, instead of creating a new commit, just run
 
 -
-$ git am --resolved
+$ git am --continue
 -
 
 and Git will create the commit for you and continue applying the
diff --git a/git-am.sh b/git-am.sh
index 9f44509..7ea40fe 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -6,7 +6,7 @@ SUBDIRECTORY_OK=Yes
 OPTIONS_KEEPDASHDASH=
 OPTIONS_SPEC="\
 git am [options] [(|)...]
-git am [options] (--resolved | --skip | --abort)
+git am [options] (--continue | --skip | --abort)
 --
 i,interactive   run interactively
 b,binary*   (historical option -- no-op)
@@ -102,7 +102,7 @@ stop_here_user_resolve () {
printf '%s\n' "$resolvemsg"
stop_here $1
 fi
-eval_gettextln "When you have resolved this problem, run \"\$cmdline 
--resolved\".
+eval_gettextln "When you have resolved this problem, run \"\$cmdline 
--continue\".
 If you prefer to skip this patch, run \"\$cmdline --skip\" instead.
 To restore the original branch and stop patching, run \"\$cmdline --abort\"."
 
@@ -523,7 +523,7 @@ Use \"git am --abort\" to remove it.")"
esac
fi
 
-   # Make sure we are not given --skip, --resolved, nor --abort
+   # Make sure we are not given --skip, --continue, nor --abort
test "$skip$resolved$abort" = "" ||
die "$(gettext "Resolve operation not in progress, we are not 
resuming.")"
 
@@ -670,7 +670,7 @@ do
#  - patch is the patch body.
#
# When we are resuming, these files are either already prepared
-   # by the user, or the user can tell us to do so by --resolved flag.
+   # by the user, or the user can tell us to do so by --continue flag.
case "$resume" in
'')
if test -f "$dotest/rebasing"
diff --git a/t/t7512-status-help.sh b/t/t7512-status-help.sh
index 4f09bec..bd8aab0 100755
--- a/t/t7512-status-help.sh
+++ b/t/t7512-status-help.sh
@@ -510,7 +510,7 @@ test_expect_success 'status in an am session: file already 
exists' '
cat >expected <<-\EOF &&
# On branch am_already_exists
# You are in the middle of an am session.
-   #   (fix conflicts and then run "git am --resolved")
+   #   (fix conflicts and then run "git am --continue")
#   (use "git am --skip" to skip this patch)
#   (use "git am --abort" to restore the original branch)
#
@@ -532,7 +532,7 @@ test_expect_success 'status in an am session: file does not 
exist' '
cat >expected <<-\EOF &&
# On branch am_not_exists
# You are in the middle of an am session.
-   #   (fix conflicts and then run "git am --resolved")
+   #   (fix conflicts and then run "git am --continue")
#   (use "git am --skip" to skip this patch)
#   (use "git am --abort" to restore the original branch)
#
diff --git a/wt-status.c b/wt-status.c
index 438a40d..b191c65 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -826,7 +826,7 @@ static void show_am_in_p

Re: [PATCH v15 01/16] test: add test cases for relative_path

2013-06-26 Thread Jiang Xin
2013/6/27 Junio C Hamano 
>
> Jiang Xin  writes:
>
> > Add subcommand "relative_path" in test-path-utils, and add test cases
> > in t0060.
> >
> > Johannes tested this commit on Windows, and found that some relative_path
>
> "this commit", or "an earlier version of this patch"?  I am guessing
> it is the latter (if so, I can easily amend locally without a need
> for rerolling).

Sorry, my English. I should say: Johannes helped to test these test cases on
Windows, and found that ...

>
> > tests should be skipped on Windows. This is because the bash on Windows
> > rewrites arguments of regular Windows programs, such as git and the
> > test helpers, if the arguments look like absolute POSIX paths. As a
> > consequence, the actual tests performed are not what the tests scripts
> > expect.
> >
> > The tests that need *not* be skipped are those where the two paths passed
> > to 'test-path-utils relative_path' have the same prefix and the result is
> > expected to be a relative path. This is because the rewriting changes
> > "/a/b" to "D:/Src/MSysGit/a/b", and when both inputs are extended the same
> > way, this just cancels out in the relative path computation.
> >
> > Signed-off-by: Jiang Xin 
> > Signed-off-by: Johannes Sixt 
> > Signed-off-by: Junio C Hamano 
>
> I somehow lost track, but does the above list of sign-offs reflect
> the origins of the changes contained in this patch, or is the second
> one meant to be helped-by or something (if so, I can easily amend
> locally without a need for rerolling)?

Johannes offered a nice patch based on the last one of this
patch series. I move his patch right after the first patch
(patch 01: test: add test cases for relative_path). But I think
as a cooking topic, it should not has a commit to fix another.
So I squash Johannes' commit to the previous commit, and
add his signed-off-by.


--
Jiang Xin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Shawn Pearce
On Tue, Jun 25, 2013 at 4:08 PM, Vicent Martí  wrote:
> On Tue, Jun 25, 2013 at 11:17 PM, Junio C Hamano  wrote:
>> What case are you talking about?
>>
>> The n-th object must be one of these four types and can never be of
>> more than one type at the same time, so a natural expectation from
>> the reader is "If you OR them together, you will get the same set".
>> If you say "If you XOR them", that forces the reader to wonder when
>> these bitmaps ever can overlap at the same bit position.
>
> I guess this is just wording. I don't particularly care about the
> distinction, but I'll change it to OR.

Hmm, OK. If you think XOR and OR are the same operation, I also have a
bridge to sell you. Its in Brooklyn. Its a great value.

The correct operation is OR. Not XOR. OR. Drop the X.

> It cannot be mmapped not particularly because of endianness issues,
> but because the original format is not indexed and requires a full
> parse of the whole index before it can be accessed programatically.
> The wrong endianness just increases the parse time.

Wrong endianness has nothing to do with the parse time. Modern CPUs
can flip a word around very quickly. In JGit we chose to parse the
file at load time because its simpler than having an additional index
segment, and we do what you did which is to toss the object SHA-1s
into a hashtable for fast lookup. By the time we look for the SHA-1s
and toss them into a hashtable we can stride through the file and find
the bitmap regions. Simple.

In other words, the least complex solution possible that still
provides good performance. I'd say we have pretty good performance.

>>> and I'm going to try to make it run fast enough in that
>>> encoding.
>>
>> Hmph.  Is it an option to start from what JGit does, so that people
>> can use both JGit and your code on the same repository?

I'm afraid I agree here with Junio. The JGit format is already
shipping in JGit 3.0, Gerrit Code Review 2.6, and in heavy production
use for almost a year on android.googlesource.com, and Google's own
internal Git trees.

I would prefer to see a series adding bitmap support to C Git start
with the existing format, make it run, taking advantage of the
optimizations JGit uses (many of which you ignored and tried to "fix"
in other ways), and then look at improving the file format itself if
load time is still the largest low hanging fruit in upload-pack. I'm
guessing its not. You futzed around with the object table, but JGit
sped itself up considerably by simply not using the object table when
the bitmap is used. I think there are several such optimizations you
missed in your rush to redefine the file format.

>>  And then if
>> you do not succeed, after trying to optimize in-core processing
>> using that on-disk format to make it fast enough, start thinking
>> about tweaking the on-disk format?
>
> I'm afraid this is not an option. I have an old patchset that
> implements JGit v1 bitmap loading (and in fact that's how I initially
> developed these series -- by loading the bitmaps from JGit for
> debugging), but I discarded it because it simply doesn't pan out in
> production. ~3 seconds time to spawn `upload-pack` is not an option
> for us. I did not develop a tweaked on-disk format out of boredom.

I think your code or experiments are bogus. Even on our systems with
JGit a cold start for the Linux kernel doesn't take 3s. And this is
JGit where Java is slow because "Jesus it has a lot of factories", and
without mmap'ing the file into the server's address space. Hell the
file has to come over the network from a remote disk array.

> I could dig up the patch if you're particularly interested in
> backwards compatibility, but since it was several times slower than
> the current iteration, I have no interest (time, actually) to maintain
> it, brush it up, and so on. I have already offered myself to port the
> v2 format to JGit as soon as it's settled. It sounds like a better
> investment of all our times.

Actually, I think the format you propose here is inferior to the JGit
format. In particular the idx-ordering means the EWAH code is useless.
You might as well not use the EWAH format and just store 2.6M bits per
commit. The idx-ordering also makes *much* harder to emit a pack file
a reasonable order for the client. Colby and I tried idx-ordering and
discarded it when it didn't perform as well as the pack-ordering that
JGit uses.

> Following up on Shawn's comments, I removed the little-endian support
> from the on-disk format and implemented lazy loading of the bitmaps to
> make up for it. The result is decent (slowed down from 250ms to 300ms)
> and it lets us keep the whole format as NWO on disk. I think it's a
> good tradeback.

The maintenance burden of two endian formats in a single file is too
high to justify. I'm glad to see you saw that.

> As it stands right now, the only two changes from v1 of the on-disk format 
> are:
>
> - There is an index at the end. This is a good idea.

I don't think the index is

Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Shawn Pearce
On Tue, Jun 25, 2013 at 11:11 PM, Jeff King  wrote:
> On Tue, Jun 25, 2013 at 09:33:11PM +0200, Vicent Martí wrote:
>
>> > One way we side-stepped the size inflation problem in JGit was to only
>> > use the bitmap index information when sending data on the wire to a
>> > client. Here delta reuse plays a significant factor in building the
>> > pack, and we don't have to be as accurate on matching deltas. During
>> > the equivalent of `git repack` bitmaps are not used, allowing the
>> > traditional graph enumeration algorithm to generate path hash
>> > information.
>>
>> OH BOY HERE WE GO. This is worth its own thread, lots to discuss here.
>> I think peff will have a patchset regarding this to upstream soon,
>> we'll get back to it later.
>
> We do the same thing (only use bitmaps during on-the-wire fetches).  But
> there a few problems with assuming delta reuse.
>
> For us (GitHub), the foremost one is that we pack many "forks" of a
> repository together into a single packfile. That means when you clone
> torvalds/linux, an object you want may be stored in the on-disk pack
> with a delta against an object that you are not going to get. So we have
> to throw out that delta and find a new one.

Gerrit Code Review ran into the same problem a few years ago with the
refs/changes namespace. Objects reachable from a branch were often
delta compressed against dropped code review revisions, making for
some slow transfers. We fixed this by creating a pack of everything
reachable from refs/heads/* and then another pack of the other stuff.

I would encourage you to do what you suggest...

> I'm dealing with that by adding an option to respect "islands" during
> packing, where an island is a set of common objects (we split it by
> fork, since we expect those objects to be fetched together, but you
> could use other criteria). The rule is that an object cannot delta
> against another object that is not in all of its islands. So everybody
> can delta against shared history, but objects in your fork can only
> delta against other objects in the fork.  You are guaranteed to be able
> to reuse such deltas during a full clone of a fork, and the on-disk pack
> size does not suffer all that much (because there is usually a good
> alternate delta base within your reachable history).

Yes, exactly. I want to do the same thing on our servers, as we have
many forks of some popular open source repositories that are also not
small (Linux kernel, WebKit). Unfortunately Google has not had the
time to develop the necessary support into JGit.

> So with that series, we can get good reuse for clones. But there are
> still two cases worth considering:
>
>   1. When you fetch a subset of the commits, git marks only the edges as
>  preferred bases, and does not walk the full object graph down to
>  the roots. So any object you want that is delta'd against something
>  older will not get reused. If you have reachability bitmaps, I
>  don't think there is any reason that we cannot use the entire
>  object graph (starting at the "have" tips, of course) as preferred
>  bases.

In JGit we use the reachability bitmap to provide proof a client has
an object. Even if its not in the edges. This allows us much better
delta reuse, as often frequently deltas will be available pointing to
something behind the edge, but that the client certainly has given the
edges we know about.

We also use the reachability bitmap to provide proof a client does not
need an object. We found a reduction in number of objects transferred
because the "want AND NOT have" subtracted out a number of objects not
in the edge. Apparently merges, reverts and cherry-picks happen often
enough in the repositories we host that this particular optimization
helps reduce data transfer, and work at both server and client ends of
the connection. Its a nice freebie the bitmap algorithm gives us.

>   2. The server is not necessarily fully packed. In an active repo, you
>  may have a large "base" pack with bitmaps, with several recently
>  pushed packs on top. You still need to delta the recently pushed
>  objects against the base objects.

Yes, this is unfortunate. One way we avoid this in JGit is to keep
everything in pack files, rather than exploding loose. The
reachability bitmap often proves the client has the delta base the
pusher used to make the object, allowing us to reuse the delta. It may
not be the absolute best delta in the world, but reuse is faster than
inflate()+delta()+deflate(), and the delta is probably "good enough"
until the server can do a real GC in the background.

We combine small packs from pushes together by almost literally just
concat'ing the packs together and creating a new .idx. Newer pushed
data is put in front of the older data, the pack is clustered by
"commit, tree, blob" ordering, duplicates are removed, and its written
back to disk. Typically we complete this "pack concat" operation mere
seconds after a push finishes, 

Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Shawn Pearce
On Wed, Jun 26, 2013 at 6:53 PM, Colby Ranger  wrote:
>> +  Generating this reverse index at runtime is **not** free (around 900ms
>> +  generation time for a repository like `torvalds/linux`), and once again,
>> +  this generation time needs to happen every time `pack-objects` is
>> +  spawned.

900ms is fishy. Creating the revidx should not take that long. But if it is...

> If generating the reverse index is expensive, it is probably
> worthwhile to create a ".revidx" or extend the ".idx" with the
> information sorted by offset.

Colby is probably right that a cached copy of the revidx would help.
Or update the .idx format to have two additional sections that stores
the length of each packed object and the delta base of each packed
object, allowing pack-objects to avoid creating the revidx. This would
be an additional ~8 bytes per object, so ~19.8M for the Linux kernel
(given ~2.6M objects).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 04/16] Refactor quote_path_relative, remove unused params

2013-06-26 Thread Jiang Xin
2013/6/27 Junio C Hamano :
>> diff --git a/quote.c b/quote.c
>> index 64ff3..ebb8 100644
>
> You seem to be using unusually short abbrev length.
>
> Please don't, at least in format-patch output.
>
> "ebb8" may be unique within your repository, but may not be unique
> in repositories of other people who attempt to apply your patches.
>
> Offtopic.  Fixing this, without forcing you to use the default
> abbrev length everywhere, _might_ require a bit of changes to the
> configuration mechanism, or "git format-patch", or both.  I've also
> seen in some other thread that --numstat was used in addition to the
> usual --stat, which should not be added to format-patch output only
> because the user configured it for other "diff/log" uses, which may
> also need to be fixed in the same way.

Thank you for your notice. I find that I have a setting: core.abbrev=4,
which overrides the default. Maybe I set it two years ago, when I wrote
my book on Git in Chinese.

-- 
Jiang Xin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Mark Levedahl

On 06/26/2013 10:19 AM, Torsten Bögershausen wrote:

On 2013-06-25 23.18, Junio C Hamano wrote:

Johannes Sixt  writes:


Some context: This is about a patch by Ramsay that removes the
"schizophrenic lstat" hack for Cygwin. Junio, can you please queue that
patch in pu?

Sure.  Thanks.

First of all,
thanks for the work.

Here some "benchmark" results,
(The test run of the test suite did the same amout of time).

But:
git status -uno in real life takes double the time,
git 1.8.3 compared against "pu with the vanilla l/stat"

 1 second ->  2 seconds on linux kernel

0.2 seconds -> 0.4 seconds on git.git

Do we have any known problems with the current implementation ?
Does speed matter ?

One vote to keep the special cygwin functions.
(And have a look how to improve the core.filemode)

/Torsten


There have been threads on the cygwin mailing lists for at least a 
decade looking to speed up cygwin's posix stat / lstat (and fork). If 
improvement were merely difficult, it would have been done long ago. As 
git cares about things like execute bits, file / repository permissions, 
and soft links, whatever stat / lstat git uses needs to fully support 
those under cygwin, either by using what cygwin provides or providing a 
complete replacement. Note my other reply - with Ramsay's patch I can 
complete the test suite (except for t0008.sh that has a known hang) 
while without it I find the test suite randomly (unrepeatable) hangs in 
many tests. So, this stat/lstat replacement is at least implicated in 
current troubles.


Mark
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Vicent Martí
That was a very rude reply. :(

Please refrain from interacting with me in the ML in the future. I'l
do accordingly.

Thanks!
vmg

On Thu, Jun 27, 2013 at 3:11 AM, Shawn Pearce  wrote:
> On Tue, Jun 25, 2013 at 4:08 PM, Vicent Martí  wrote:
>> On Tue, Jun 25, 2013 at 11:17 PM, Junio C Hamano  wrote:
>>> What case are you talking about?
>>>
>>> The n-th object must be one of these four types and can never be of
>>> more than one type at the same time, so a natural expectation from
>>> the reader is "If you OR them together, you will get the same set".
>>> If you say "If you XOR them", that forces the reader to wonder when
>>> these bitmaps ever can overlap at the same bit position.
>>
>> I guess this is just wording. I don't particularly care about the
>> distinction, but I'll change it to OR.
>
> Hmm, OK. If you think XOR and OR are the same operation, I also have a
> bridge to sell you. Its in Brooklyn. Its a great value.
>
> The correct operation is OR. Not XOR. OR. Drop the X.
>
>> It cannot be mmapped not particularly because of endianness issues,
>> but because the original format is not indexed and requires a full
>> parse of the whole index before it can be accessed programatically.
>> The wrong endianness just increases the parse time.
>
> Wrong endianness has nothing to do with the parse time. Modern CPUs
> can flip a word around very quickly. In JGit we chose to parse the
> file at load time because its simpler than having an additional index
> segment, and we do what you did which is to toss the object SHA-1s
> into a hashtable for fast lookup. By the time we look for the SHA-1s
> and toss them into a hashtable we can stride through the file and find
> the bitmap regions. Simple.
>
> In other words, the least complex solution possible that still
> provides good performance. I'd say we have pretty good performance.
>
 and I'm going to try to make it run fast enough in that
 encoding.
>>>
>>> Hmph.  Is it an option to start from what JGit does, so that people
>>> can use both JGit and your code on the same repository?
>
> I'm afraid I agree here with Junio. The JGit format is already
> shipping in JGit 3.0, Gerrit Code Review 2.6, and in heavy production
> use for almost a year on android.googlesource.com, and Google's own
> internal Git trees.
>
> I would prefer to see a series adding bitmap support to C Git start
> with the existing format, make it run, taking advantage of the
> optimizations JGit uses (many of which you ignored and tried to "fix"
> in other ways), and then look at improving the file format itself if
> load time is still the largest low hanging fruit in upload-pack. I'm
> guessing its not. You futzed around with the object table, but JGit
> sped itself up considerably by simply not using the object table when
> the bitmap is used. I think there are several such optimizations you
> missed in your rush to redefine the file format.
>
>>>  And then if
>>> you do not succeed, after trying to optimize in-core processing
>>> using that on-disk format to make it fast enough, start thinking
>>> about tweaking the on-disk format?
>>
>> I'm afraid this is not an option. I have an old patchset that
>> implements JGit v1 bitmap loading (and in fact that's how I initially
>> developed these series -- by loading the bitmaps from JGit for
>> debugging), but I discarded it because it simply doesn't pan out in
>> production. ~3 seconds time to spawn `upload-pack` is not an option
>> for us. I did not develop a tweaked on-disk format out of boredom.
>
> I think your code or experiments are bogus. Even on our systems with
> JGit a cold start for the Linux kernel doesn't take 3s. And this is
> JGit where Java is slow because "Jesus it has a lot of factories", and
> without mmap'ing the file into the server's address space. Hell the
> file has to come over the network from a remote disk array.
>
>> I could dig up the patch if you're particularly interested in
>> backwards compatibility, but since it was several times slower than
>> the current iteration, I have no interest (time, actually) to maintain
>> it, brush it up, and so on. I have already offered myself to port the
>> v2 format to JGit as soon as it's settled. It sounds like a better
>> investment of all our times.
>
> Actually, I think the format you propose here is inferior to the JGit
> format. In particular the idx-ordering means the EWAH code is useless.
> You might as well not use the EWAH format and just store 2.6M bits per
> commit. The idx-ordering also makes *much* harder to emit a pack file
> a reasonable order for the client. Colby and I tried idx-ordering and
> discarded it when it didn't perform as well as the pack-ordering that
> JGit uses.
>
>> Following up on Shawn's comments, I removed the little-endian support
>> from the on-disk format and implemented lazy loading of the bitmaps to
>> make up for it. The result is decent (slowed down from 250ms to 300ms)
>> and it lets us keep the whole format as NWO on disk. I thi

Re: [PATCH 09/16] documentation: add documentation for the bitmap format

2013-06-26 Thread Jeff King
On Thu, Jun 27, 2013 at 04:36:54AM +0200, Vicent Martí wrote:

> That was a very rude reply. :(
> 
> Please refrain from interacting with me in the ML in the future. I'l
> do accordingly.

I agree that the pointer arithmetic thing may have been a little much,
but I think there are some points we need to address in Shawn's email.

In particular, it seems like the slowness we saw with the v1 bitmap
format is not what Shawn and Colby have experienced. So it's possible
that our test setup is bad or different. Or maybe the C v1 reading
implementation had some problems that are fixable. It's hard to say
because we haven't shown any code that can be timed and compared.

And the pack-order versus idx-order for the bitmaps is still up in the
air. Do we have numbers on the on-disk sizes of the resulting EWAHs? The
pack-order ones should be more amenable to run-length encoding,
especially as you get further down into history (the tip ones would
mostly be 1's, no matter how you order them).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 02/16] path.c: refactor relative_path(), not only strip prefix

2013-06-26 Thread Jiang Xin
2013/6/27 Junio C Hamano :
>> + } else {
>> + i++;
>> + j++;
>> + }
>> + }
>> +
>> + if (
>> + /* "prefix" seems like prefix of "in" */
>> + i >= prefix_len &&
>
> So shouldn't this be "i == prefix_len"?
>
>> + /*
>> +  * but "/foo" is not a prefix of "/foobar"
>> +  * (i.e. prefix not end with '/')
>> +  */
>> + prefix_off < prefix_len) {
>> + if (j >= in_len) {
>
> Again, "j == in_len", isn't it?  Or can i and j overrun in_len and
> prefix_len?

Yes, better write as ==. But both will pass the test cases.
Since this commit has been merged to next, so let it be?


-- 
Jiang Xin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


device files should be handled by git

2013-06-26 Thread Perry Wagle
Hi --

I have a disk image of a small embedded device whose root file system I'd like 
to check-in to git as a means of distributing its GPL'd software.  In that disk 
image are device files, which GIT studiously ignores.  If symlinks are handled 
(contents being the path that the symlink points at), I don't see why device 
files can't be handled (contents being the type (char or block) and the major 
and minor device number).  TAR, for example, handles this fine, except that 
using tar in git sort-of goes against the granularity of the objects being 
modified (like adding a bunch of extra "sd??" devices), such that you are 
modifying a whole tar ball instead of the individual (device) files.

Is there a reason not to handle device files other than "its not traditional"?  
That's the only reason given in google or the IRC channel.

Thanks!

-- Perry

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


“git remote status” incorrect result

2013-06-26 Thread Hyunho Cho
i have configured like this

git remote add myremote1 ...
git config --global push.default upstream
git branch --set-upstream-to=myremote1/master remote1-master

and git pull, git push in remote1-master work i expected

but "git remote status myremote1" display..

Local ref configured for 'git push':
   master pushes to master (local out of date)

section always display master to master. and status incorrectly

it would be correct:

Local ref configured for 'git push':
   remote1-master pushes to master (up to date)

if i add another remote "myremote2" there is also exist "master" branch

"Local ref configured for 'git push':" section have to display

according to local branch and upstream setting

and remote status correctly i think
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 0/1] cygwin: Remove the Win32 l/stat() functions

2013-06-26 Thread Michael Haggerty
On 06/27/2013 12:35 AM, Jeff King wrote:
> On Wed, Jun 26, 2013 at 10:45:48PM +0100, Ramsay Jones wrote:
> 
>>> This patch adds some *extra* cache invalidation that was heretofore
>>> missing.  If stat() is broken it could
>>>
>>> (a) cause a false positive, resulting in some unnecessary cache
>>> invalidation and re-reading of packed-refs, which will hurt performance
>>> but not correctness; or
>>>
>>> (b) cause a false negative, in which case the stale cache might be used
>>> for reading (but not writing), just as was *always* the case before this
>>> patch.
>>>
>>> As far as I understand, the concern for cygwin is (a).  I will leave it
>>> to others to measure and/or decide whether the performance loss is too
>>> grave to endure until the cygwin stat() situation is fixed.
>>
>> Hmm, I'm not sure I understand ... However, I can confirm that the
>> 'mh/ref-races' branch in next is broken on cygwin. (i.e. it is not
>> just a speed issue; it provokes fatal errors).
> 
> I think Michael's assessment above is missing one thing.

Peff is absolutely right; for some unknown reason I was thinking of the
consistency check as having been already fixed.

> However, when we have taken a lock on the file, there is an additional
> safety measure: if we find the file is changed, we abort, as that should
> never happen (it means somebody else modified the file while we had it
> locked). But of course Cygwin's false positive here triggers the safety
> valve, and we die without even realizing that nothing changed.
> 
> In theory we can drop the safety valve; it should never actually happen.
> But I'd like to keep it there for working systems. Perhaps it is worth
> doing something like this:
> 
> [...#ifdef out consistency check on cygwin when lock is held...]

Yes, this would work.

But, taking a step back, I think it is a bad idea to have an unreliable
stat() masquerading as a real stat().  If we want to allow the use of an
unreliable stat for certain purposes, let's have two stat() interfaces:

* the true stat() (in this case I guess cygwin's slow-but-correct
implementation)

* some fast_but_maybe_unreliable_stat(), which would map to stat() on
most platforms but might map to the Windows stat() on cygwin when so
configured.

By default the true stat() would always be used.  It should have to be a
conscious decision, taken only in specific, vetted scenarios, to use the
unreliable stat.

For example, I can't imagine that checking the freshness of the index or
of the packed-refs file is ever going to be a bottleneck, so there is no
reason at all to use an unreliable stat() here.

On the other hand, stat() seems definitely to be a bottleneck when
testing for changes in a 100,000 file working tree, and here occasional
mistakes might be considered acceptable.  So for this purpose the
unreliable stat() might be used.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/16] compat: add endinanness helpers

2013-06-26 Thread Peter Krefting

Vicent Martí:

I'm aware of that, but Git needs to build with glibc 2.7+ (or was it 
2.6?), hence the need for this compat layer.


Right. But perhaps the compatibility layer could provide the 
functionality with the names available in the later glibc versions 
(and on *BSD)? That would make it easier to read the code that is 
using it.


--
\\// Peter - http://www.softwolves.pp.se/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html