Re: [PATCH 2/4] t5310: test delta reuse with bitmaps

2018-09-01 Thread Jeff King
On Sat, Sep 01, 2018 at 10:29:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Anyway. Not that exciting, and kind of obviously dumb in retrospect. But
> > I think it was worth analyzing to see what went wrong. If there's an
> > immediate lesson, it is probably: add tests even for changes that aren't
> > really user-visible to make sure the code is exercised.
> 
> Test-wise, isn't the problem rather that that we didn't have something
> like what's described in t/README as "Running tests with special setups"
> for bitmaps? I.e. stuff like GIT_TEST_SPLIT_INDEX=, or running it
> with GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all to stress the fsmonitor
> code.

I'm a little skeptical of that as a general approach, for two reasons:

 - the test suite is filled with toy examples, so they often don't
   trigger the interesting cases

 - the tests are often looking for very particular outcomes or
   repository state; munging that state is going to confuse them

> So we could add some option to the test suite to e.g. run a custom
> command before every "git push" or "git fetch", and then just do a gc
> with a repack/commit graph write/midx write etc. in that codepath, along
> with (in the case of stuff like midx) setting any neede config knobs to
> turn it on.

We can try that out with something like this:

diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 42dc4da5a1..f40e0b7a04 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -30,6 +30,14 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   /*
+* This environment variable hackery is necessary because repack in a
+* partial clone might actually try to fetch, spawning an infinite
+* recursion.
+*/
+   system("test -z \"$SUPPRESS_BITMAP_MAGIC\" && "
+  "SUPPRESS_BITMAP_MAGIC=1 git repack -adb >/dev/null 2>&1");
+
packet_trace_identity("upload-pack");
read_replace_refs = 0;
 

It actually _does_ find the bug in question (which I wasn't at all sure
it would). But it also turns up several other unrelated test failures.

And it's only triggering in some limited cases (fetches). If we tried to
get better coverage of bitmaps in general, I'm not sure where we would
slip in such a repack command. But I think you'd get even more false
positives.

> Of course the utility of that sort of thing is limited unless we have
> some dedicated smoke testers or CI capacity to run the various
> combinations of those options. But FWIW when I build our own in-house
> git I build the package with:

Yes, it also gets really expensive. That can be helped with more CI
machines, but even neglecting cost, I'm not sure our CI workflow is that
great right now (for example, I still haven't figured out the simple
feature of: if I push up a commit that fails, I'd like to get an email).

> So if there was a "test bitmaps everywhere" mode that would have been
> caught during the build, unless I've misunderstood how this particular
> bug manifests, but then again, it happened on just a plain git.git after
> repack, so wasn't any bitmap + push pretty much all that was needed?, I
> haven't read your patches in any detail.

The test patch describes the minimal scenario you need. Which would be
pretty common on any decent sized repo, but rare on toy ones (though as
I said above, there are a few cases in the test suite big enough to
trigger this).

-Peff


Re: [PATCH v2 2/3] Add test for commit --dry-run --short.

2018-09-01 Thread Stephen & Linda Smith
On Saturday, September 1, 2018 7:18:34 PM MST Eric Sunshine wrote:
> > diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
> > @@ -682,4 +682,12 @@ test_expect_success '--dry-run with conflicts fixed
> > from a merge' ' +test_expect_failure '--dry-run --short' '
> > +   # setup two branches with conflicting information
> > +   # in the same file, resolve the conflict
> 
> What is this comment all about? It doesn't seem to have any relation
> to what the test itself is actually doing. (I see that it was copied
> from an earlier test in this script, but that doesn't help me
> understand what it is trying to say.)

Agreed.   

I saw your other email about not being worth a re-roll, but I've made the 
change locally in case Junio wants me to do so.  

Additionally if there are other comments I can wrap them into a single set.





Re: [BUG] index corruption with git commit -p

2018-09-01 Thread Jeff King
On Sun, Sep 02, 2018 at 12:17:53AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Here are the steps to reproduce it:
> >   $ git clone git://github.com/lucvoo/sparse-dev.git 
> >   $ cd 
> >   $ git co index-corruption
> >   $ git rm -r validation/ Documentation/
> >   $ git commit -m  -p
> >   $ git status
> > error: index uses $?+? extension, which we do not understand
> > fatal: index file corrupt
> >
> > The 'extension' pattern '$?+?', can vary a bit, sometimes
> > it's just '', but always seems 4 chars.
> > If the commit command doesn't use the '-p' flag, there is no
> > problem. The repository itself is not corrupted, it's only
> > the index. It happends with git 2.18.0 and 2.17.0
> 
> Yeah this is a bug, I didn't dig much but testing with this script down
> to 2.8.0:
> [...]
> I found that the first bad commit was: 680ee550d7 ("commit: skip
> discarding the index if there is no pre-commit hook", 2017-08-14)

I think it's much older than that. I set up my test repo like this:

  git clone git://github.com/lucvoo/sparse-dev.git
  cd sparse-dev
  git checkout --detach

and then bisected with this script:

  cd /path/to/sparse-dev
  rm .git/index
  git reset --hard index-corruption &&
  git rm -q -r validation/ Documentation/ &&
  git commit -qm foo -p &&
  git status

Since a33fc72fe9 (read-cache: force_verify_index_checksum, 2017-04-14),
that produces the corrupt extension error. But before that, I
consistently get:

  error: bad index file sha1 signature
  fatal: index file corrupt

from git-commit. And that bisects back to 9c4d6c0297 (cache-tree: Write
updated cache-tree after commit, 2014-07-13).

If I revert that commit (which takes some untangling, see below), then
the problem seems to go away. Here's the patch I tried on top of the
current master, though I think it is actually the first hunk that is
making the difference.

---
diff --git a/builtin/commit.c b/builtin/commit.c
index 0d9828e29e..779c5e2cb5 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -359,13 +359,6 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
 
discard_cache();
read_cache_from(get_lock_file_path(_lock));
-   if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) {
-   if (reopen_lock_file(_lock) < 0)
-   die(_("unable to write index file"));
-   if (write_locked_index(_index, _lock, 0))
-   die(_("unable to update temporary index"));
-   } else
-   warning(_("Failed to update main cache tree"));
 
commit_style = COMMIT_NORMAL;
ret = get_lock_file_path(_lock);
@@ -408,9 +401,6 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
if (!only && !pathspec.nr) {
hold_locked_index(_lock, LOCK_DIE_ON_ERROR);
refresh_cache_or_die(refresh_flags);
-   if (active_cache_changed
-   || !cache_tree_fully_valid(active_cache_tree))
-   update_main_cache_tree(WRITE_TREE_SILENT);
if (write_locked_index(_index, _lock,
   COMMIT_LOCK | SKIP_IF_UNCHANGED))
die(_("unable to write new_index file"));
@@ -457,7 +447,6 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
hold_locked_index(_lock, LOCK_DIE_ON_ERROR);
add_remove_files();
refresh_cache(REFRESH_QUIET);
-   update_main_cache_tree(WRITE_TREE_SILENT);
if (write_locked_index(_index, _lock, 0))
die(_("unable to write new_index file"));
 

-Peff


Re: [PATCH v2 2/3] Add test for commit --dry-run --short.

2018-09-01 Thread Eric Sunshine
On Sat, Sep 1, 2018 at 10:18 PM Eric Sunshine  wrote:
> On Sat, Sep 1, 2018 at 7:53 PM Stephen P. Smith  wrote:
> > The test demonstrates that the setting of the commitable flag is
> > broken.
>
> s/commitable/committable/

Looking at patch 3/3, I see that this misspelling exists in the code
itself, so I guess my recommended spelling correction isn't needed. It
might make sense, though, to quote this word in the commit message to
make it more clear that that is the literal spelling in the code. That
is:

   ...demonstrates that the setting of the 'commitable' flag...

(Not itself worth a re-roll.)


Re: [PATCH v2 2/3] Add test for commit --dry-run --short.

2018-09-01 Thread Eric Sunshine
On Sat, Sep 1, 2018 at 7:53 PM Stephen P. Smith  wrote:
> Add test for commit --dry-run --short.

The first line of a commit message typically mentions the area or
module being touched. It's also customary not to capitalize the first
word and to omit the final period. So, for instance:

t7501: add test of "--dry-run --short" setting 'committable' flag

> Add test for commit with --dry-run --short for a new file of zero
> length.
>
> The test demonstrates that the setting of the commitable flag is
> broken.

s/commitable/committable/

> Signed-off-by: Stephen P. Smith 
> ---
> diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
> @@ -682,4 +682,12 @@ test_expect_success '--dry-run with conflicts fixed from 
> a merge' '
> +test_expect_failure '--dry-run --short' '
> +   # setup two branches with conflicting information
> +   # in the same file, resolve the conflict

What is this comment all about? It doesn't seem to have any relation
to what the test itself is actually doing. (I see that it was copied
from an earlier test in this script, but that doesn't help me
understand what it is trying to say.)

> +   >test-file &&
> +   git add test-file &&
> +   git commit --dry-run --short
> +'


Re: [BUG] A part of an edge from an octopus merge gets colored, even with --color=never

2018-09-01 Thread Noam Postavsky
On 6 August 2018 at 17:26, Jeff King  wrote:

> I suspect it still has a bug, which is that it is handling this
> first-parent-goes-left case, but probably gets the straight-parent case
> wrong. But at least in this form, I think it is obvious to see where
> that bug is (the "three" in the comment is not accurate in that latter
> case, and it should be two).

Yes, thanks, it makes a lot more sense this way. I believe the
attached handles both parent types correctly.
From a841a50b016c0cfc9183384e6c3ca85a23d1e11f Mon Sep 17 00:00:00 2001
From: Noam Postavsky 
Date: Sat, 1 Sep 2018 20:07:16 -0400
Subject: [PATCH v3] log: Fix coloring of certain octupus merge shapes

For octopus merges where the first parent edge immediately merges into
the next column to the left:

| *-.
| |\ \
|/ / /

then the number of columns should be one less than the usual case:

| *-.
| |\ \
| | | *

Also refactor the code to iterate over columns rather than dashes,
building from an initial patch suggestion by Jeff King.

Signed-off-by: Noam Postavsky 
---
 graph.c | 48 
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/graph.c b/graph.c
index e1f6d3bdd..478c86dfb 100644
--- a/graph.c
+++ b/graph.c
@@ -848,21 +848,45 @@ static int graph_draw_octopus_merge(struct git_graph *graph,
 struct strbuf *sb)
 {
 	/*
-	 * Here dashless_commits represents the number of parents
-	 * which don't need to have dashes (because their edges fit
-	 * neatly under the commit).
+	 * Here dashless_commits represents the number of parents which don't
+	 * need to have dashes (because their edges fit neatly under the
+	 * commit).  And dashful_commits are the remaining ones.
 	 */
 	const int dashless_commits = 2;
-	int col_num, i;
-	int num_dashes =
-		((graph->num_parents - dashless_commits) * 2) - 1;
-	for (i = 0; i < num_dashes; i++) {
-		col_num = (i / 2) + dashless_commits + graph->commit_index;
-		strbuf_write_column(sb, >new_columns[col_num], '-');
+	int dashful_commits = graph->num_parents - dashless_commits;
+
+	/*
+	 * Usually, each parent gets its own column, like this:
+	 *
+	 * | *-.
+	 * | |\ \
+	 * | | | *
+	 *
+	 * Sometimes the first parent goes into an existing column, like this:
+	 *
+	 * | *-.
+	 * | |\ \
+	 * |/ / /
+	 *
+	 */
+	int parent_in_existing_cols = graph->num_parents -
+		(graph->num_new_columns - graph->num_columns);
+
+	/*
+	 * Draw the dashes.  We start in the column following the
+	 * dashless_commits, but subtract out the parent which goes to an
+	 * existing column: we've already counted that column in commit_index.
+	 */
+	int first_col = graph->commit_index + dashless_commits
+		- parent_in_existing_cols;
+	int i;
+
+	for (i = 0; i < dashful_commits; i++) {
+		strbuf_write_column(sb, >new_columns[i+first_col], '-');
+		strbuf_write_column(sb, >new_columns[i+first_col],
+i == dashful_commits-1 ? '.' : '-');
 	}
-	col_num = (i / 2) + dashless_commits + graph->commit_index;
-	strbuf_write_column(sb, >new_columns[col_num], '.');
-	return num_dashes + 1;
+	return 2 * dashful_commits;
 }
 
 static void graph_output_commit_line(struct git_graph *graph, struct strbuf *sb)
-- 
2.11.0



[PATCH v2 0/3] wt-status.c: commitable flag

2018-09-01 Thread Stephen P. Smith
A couple of years ago, during a patch review Junio found that the
commitable bit as implemented in wt-status.c was broken.

Stephen P. Smith (3):
  Move has_unmerged earlier in the file.
  Add test for commit --dry-run --short.
  wt-status.c: Set the commitable flag in the collect phase.

 t/t7501-commit.sh | 12 ++--
 wt-status.c   | 39 ---
 2 files changed, 34 insertions(+), 17 deletions(-)

-- 
2.18.0



[PATCH v2 1/3] wt-status.c: Move has_unmerged earlier in the file.

2018-09-01 Thread Stephen P. Smith
Move has_unmerged up in the file so that has_unmerged can be called in
wt_status_collect where we need to place a merge check.

Signed-off-by: Stephen P. Smith 
---
 wt-status.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 5ffab6101..180faf6ba 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -724,6 +724,19 @@ static void wt_status_collect_untracked(struct wt_status 
*s)
s->untracked_in_ms = (getnanotime() - t_begin) / 100;
 }
 
+static int has_unmerged(struct wt_status *s)
+{
+   int i;
+
+   for (i = 0; i < s->change.nr; i++) {
+   struct wt_status_change_data *d;
+   d = s->change.items[i].util;
+   if (d->stagemask)
+   return 1;
+   }
+   return 0;
+}
+
 void wt_status_collect(struct wt_status *s)
 {
wt_status_collect_changes_worktree(s);
@@ -1063,19 +1076,6 @@ static void wt_longstatus_print_tracking(struct 
wt_status *s)
strbuf_release();
 }
 
-static int has_unmerged(struct wt_status *s)
-{
-   int i;
-
-   for (i = 0; i < s->change.nr; i++) {
-   struct wt_status_change_data *d;
-   d = s->change.items[i].util;
-   if (d->stagemask)
-   return 1;
-   }
-   return 0;
-}
-
 static void show_merge_in_progress(struct wt_status *s,
struct wt_status_state *state,
const char *color)
-- 
2.18.0



[PATCH v2 3/3] wt-status.c: Set the commitable flag in the collect phase.

2018-09-01 Thread Stephen P. Smith
In an update to fix a bug with "commit --dry-run" it was found that
the commitable flag was broken. The update was, at the time, accepted
as it was better than the previous version. [1]

Since the setting of the commitable flag had been done in
wt_longstatus_print_updated, move it to wt_status_collect_updated_cb.

Set the commitable flag in wt_status_collect_changes_initial to keep
from introducing a rebase regression.

Instead of setting the commitable flag in show_merge_in_progress, in
wt_status_cllect check for a merge that has not been committed. If
present then set the commitable flag.

Change the tests to expect success since updates to the wt-status
broken code section is being fixed.

[1] https://public-inbox.org/git/xmqqr3gcj9i5@gitster.mtv.corp.google.com/

Signed-off-by: Stephen P. Smith 
---
 t/t7501-commit.sh |  6 +++---
 wt-status.c   | 13 +++--
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 5812dc2f8..4cda088cc 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -99,12 +99,12 @@ test_expect_success '--dry-run with stuff to commit returns 
ok' '
git commit -m next -a --dry-run
 '
 
-test_expect_failure '--short with stuff to commit returns ok' '
+test_expect_success '--short with stuff to commit returns ok' '
echo bongo bongo bongo >>file &&
git commit -m next -a --short
 '
 
-test_expect_failure '--porcelain with stuff to commit returns ok' '
+test_expect_success '--porcelain with stuff to commit returns ok' '
echo bongo bongo bongo >>file &&
git commit -m next -a --porcelain
 '
@@ -682,7 +682,7 @@ test_expect_success '--dry-run with conflicts fixed from a 
merge' '
git commit -m "conflicts fixed from merge."
 '
 
-test_expect_failure '--dry-run --short' '
+test_expect_success '--dry-run --short' '
# setup two branches with conflicting information
# in the same file, resolve the conflict
>test-file &&
diff --git a/wt-status.c b/wt-status.c
index 180faf6ba..578328979 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -540,10 +540,12 @@ static void wt_status_collect_updated_cb(struct 
diff_queue_struct *q,
/* Leave {mode,oid}_head zero for an add. */
d->mode_index = p->two->mode;
oidcpy(>oid_index, >two->oid);
+   s->commitable = 1;
break;
case DIFF_STATUS_DELETED:
d->mode_head = p->one->mode;
oidcpy(>oid_head, >one->oid);
+   s->commitable = 1;
/* Leave {mode,oid}_index zero for a delete. */
break;
 
@@ -561,6 +563,7 @@ static void wt_status_collect_updated_cb(struct 
diff_queue_struct *q,
d->mode_index = p->two->mode;
oidcpy(>oid_head, >one->oid);
oidcpy(>oid_index, >two->oid);
+   s->commitable = 1;
break;
case DIFF_STATUS_UNMERGED:
d->stagemask = unmerged_mask(p->two->path);
@@ -665,11 +668,13 @@ static void wt_status_collect_changes_initial(struct 
wt_status *s)
 * code will output the stage values directly and not 
use the
 * values in these fields.
 */
+   s->commitable = 1;
} else {
d->index_status = DIFF_STATUS_ADDED;
/* Leave {mode,oid}_head zero for adds. */
d->mode_index = ce->ce_mode;
oidcpy(>oid_index, >oid);
+   s->commitable = 1;
}
}
 }
@@ -739,6 +744,7 @@ static int has_unmerged(struct wt_status *s)
 
 void wt_status_collect(struct wt_status *s)
 {
+   struct wt_status_state state;
wt_status_collect_changes_worktree(s);
 
if (s->is_initial)
@@ -746,6 +752,11 @@ void wt_status_collect(struct wt_status *s)
else
wt_status_collect_changes_index(s);
wt_status_collect_untracked(s);
+
+   memset(, 0, sizeof(state));
+   wt_status_get_state(, s->branch && !strcmp(s->branch, "HEAD"));
+   if (state.merge_in_progress && !has_unmerged(s))
+   s->commitable = 1;
 }
 
 static void wt_longstatus_print_unmerged(struct wt_status *s)
@@ -786,7 +797,6 @@ static void wt_longstatus_print_updated(struct wt_status *s)
continue;
if (!shown_header) {
wt_longstatus_print_cached_header(s);
-   s->commitable = 1;
shown_header = 1;
}
wt_longstatus_print_change_data(s, WT_STATUS_UPDATED, it);
@@ -1089,7 +1099,6 @@ static void show_merge_in_progress(struct wt_status *s,
 

[PATCH v2 2/3] Add test for commit --dry-run --short.

2018-09-01 Thread Stephen P. Smith
Add test for commit with --dry-run --short for a new file of zero
length.

The test demonstrates that the setting of the commitable flag is
broken.

Signed-off-by: Stephen P. Smith 
---
 t/t7501-commit.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 4cae92804..5812dc2f8 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -682,4 +682,12 @@ test_expect_success '--dry-run with conflicts fixed from a 
merge' '
git commit -m "conflicts fixed from merge."
 '
 
+test_expect_failure '--dry-run --short' '
+   # setup two branches with conflicting information
+   # in the same file, resolve the conflict
+   >test-file &&
+   git add test-file &&
+   git commit --dry-run --short
+'
+
 test_done
-- 
2.18.0



Re: [PATCH 2/4] t5310: test delta reuse with bitmaps

2018-09-01 Thread Ben Peart




On 9/1/2018 4:29 PM, Ævar Arnfjörð Bjarmason wrote:



B.t.w. for Ben or anyone else who knows about the fsmonitor part of
this: I've long been running the whole test suite with
`GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all prove ...` (also along with
GIT_TEST_SPLIT_INDEX=) after all the main tests pass as additional
stress testing.

It's not documented under the "special setups" section. So I was going
to add it, but I see that in 5c8cdcfd80 ("fsmonitor: add test cases for
fsmonitor extension", 2017-09-22) it's documented that you should also
set GIT_FORCE_PRELOAD_TEST=true, is that needed for GIT_FSMONITOR_TEST?
Or is it yet another mode, and if so to be combined with fsmonitor in
particular, or stand-alone?



I was unaware of the "special setups" section until recently so if you 
would add the fsmonitor information that would be great.


I added the GIT_FORCE_PRELOAD_TEST as when I went to test fsmonitor I 
realized there was no way to manually override the default behavior and 
force preload to happen. I added tests into t7519-status-fsmonitor.sh 
that use this capability to test fsmonitor with and without preload to 
ensure they play nice together as I had to add logic into the preload 
code to mark cache entries as clean for fsmonitor.


GIT_FORCE_PRELOAD_TEST should probably be added to the "special setups" 
as well so that the entire test suite can be used to exercise that code 
path.


Ultimately, GIT_FSMONITOR_TEST and GIT_FORCE_PRELOAD_TEST can be used to 
test their respective code paths using the test suite.  The challenge is 
that the number of potential combinations (including split index, 
uncommon pack modes, etc) gets large pretty quickly.



There may be a larger lesson about tracking code coverage, but I don't
know that most general code coverage tools would have helped (any
overall percentage number would be too large to move). A tool that
looked at the diff and said "of the N lines you added/touched, this
percent is exercised in the test suite" might have been useful.


This would be very useful.



Re: [BUG] index corruption with git commit -p

2018-09-01 Thread Ævar Arnfjörð Bjarmason


On Sat, Sep 01 2018, Luc Van Oostenryck wrote:

> Hi,
>
> I've just had a scary error:
>   error: index uses $?+? extension, which we do not understand
>   fatal: index file corrupt
>
> Things were quickly recovered by deleting the index but it clearly
> looks to a but to me.
>
> Here are the steps to reproduce it:
>   $ git clone git://github.com/lucvoo/sparse-dev.git 
>   $ cd 
>   $ git co index-corruption
>   $ git rm -r validation/ Documentation/
>   $ git commit -m  -p
>   $ git status
> error: index uses $?+? extension, which we do not understand
> fatal: index file corrupt
>
>
> The 'extension' pattern '$?+?', can vary a bit, sometimes
> it's just '', but always seems 4 chars.
> If the commit command doesn't use the '-p' flag, there is no
> problem. The repository itself is not corrupted, it's only
> the index. It happends with git 2.18.0 and 2.17.0

Yeah this is a bug, I didn't dig much but testing with this script down
to 2.8.0:

#!/bin/sh

cd ~/g/git
make -j $(parallel --number-of-cores) USE_LIBPCRE2=YesPlease CFLAGS="-O0 -g 
-ggdb3" DEVELOPER=1 DEVOPTS=no-error NO_OPENSSL=Y all

(
rm -rf /tmp/x;
~/g/git/git --exec-path=/home/avar/g/git clone 
git://github.com/lucvoo/sparse-dev.git /tmp/x &&
cd /tmp/x &&
~/g/git/git --exec-path=/home/avar/g/git checkout index-corruption &&
~/g/git/git --exec-path=/home/avar/g/git rm -r validation/ 
Documentation/ &&
~/g/git/git --exec-path=/home/avar/g/git commit -p
)

~/g/git/git --exec-path=/home/avar/g/git -C /tmp/x status

if ~/g/git/git --exec-path=/home/avar/g/git -C /tmp/x status
then
exit 0
else
exit 1
fi

I found that the first bad commit was: 680ee550d7 ("commit: skip
discarding the index if there is no pre-commit hook", 2017-08-14)

Now, note the two invocations of "status" in my script. Before we'd
already been complaining about a bad index, but after that commit is the
first time we started getting a persistent error, and indeed even
reverting it now on top of master makes the error non-persistent.

So not a fix, but a strong signal to see where we should start
looking. I.e. the index file handling around discard_cache() &
"interactive" in commit.c is likely what's broken.


[BUG] index corruption with git commit -p

2018-09-01 Thread Luc Van Oostenryck
Hi,

I've just had a scary error:
error: index uses $?+? extension, which we do not understand
fatal: index file corrupt

Things were quickly recovered by deleting the index but it clearly
looks to a but to me.

Here are the steps to reproduce it:
  $ git clone git://github.com/lucvoo/sparse-dev.git 
  $ cd 
  $ git co index-corruption
  $ git rm -r validation/ Documentation/
  $ git commit -m  -p
  $ git status
error: index uses $?+? extension, which we do not understand
fatal: index file corrupt


The 'extension' pattern '$?+?', can vary a bit, sometimes
it's just '', but always seems 4 chars.
If the commit command doesn't use the '-p' flag, there is no
problem. The repository itself is not corrupted, it's only
the index. It happends with git 2.18.0 and 2.17.0


-- Luc Van Oostenryck


Re: [PATCH] Makefile: enable DEVELOPER by default

2018-09-01 Thread Kaartic Sivaraam
On Mon, 2018-08-06 at 13:02 -0400, Randall S. Becker wrote:
> 
> Any idea when this is going to be in an official release, and exactly
> what the settings will be for "Not Developer". I assume DEVELOPER=0
> and DEVOPTS=error, which is the current behaviour, correct? I am the
> platform maintainer for HPE NonStop and need to make sure I'm not
> packaging DEV builds to anyone, since I'm the only one doing this for
> the platform. It's another hoop, but hopefully not a bad one. The
> question is the best place to set this, assuming we are using Jenkins
> for our builds, and I'd rather keep the existing config.mak.uname the
> same, since at least it seems stable.

Just a FYI and in case you aren't aware, you could create a
"config.mak" to store your custom configurations. You can be sure it's
used due to the following Makefile part:

...

include config.mak.uname
-include config.mak.autogen
-include config.mak
...

It's just not a hard dependency.

Hope that helps,
Sivaraam



Re: [PATCH 2/4] t5310: test delta reuse with bitmaps

2018-09-01 Thread Ævar Arnfjörð Bjarmason


On Sat, Sep 01 2018, Jeff King wrote:

> On Sat, Sep 01, 2018 at 03:48:13AM -0400, Jeff King wrote:
>
>> Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for
>> thin "have" objects, 2018-08-21) taught pack-objects a new
>> optimization trick. Since this wasn't meant to change
>> user-visible behavior, but only produce smaller packs more
>> quickly, testing focused on t/perf/p5311.
>>
>> However, since people don't run perf tests very often, we
>> should make sure that the feature is exercised in the
>> regular test suite. This patch does so.
>
> This, by the way, is the crux of how such an obvious and severe bug made
> it to 'next'.
>
> The original series was tested quite extensively via t/perf and in
> production at GitHub. When I re-rolled v2, the only change was the
> addition of the assertion, so I didn't bother re-doing the perf tests,
> since they're slow and there wouldn't be a measurable impact.
>
> I did run the normal test suite (as I'm sure Junio did, too) as a
> double-check for correctness, but as we noticed, the code wasn't
> actually exercised there.
>
> Nor had I yet backported the revised series to the version we run at
> GitHub, so it hadn't been run there, either.
>
> And all of that coupled with the fact that it only triggers with
> bitmaps, so day-to-day use of the buggy Git (like Junio trying to push
> out the result ;) ) wouldn't show it.
>
> Anyway. Not that exciting, and kind of obviously dumb in retrospect. But
> I think it was worth analyzing to see what went wrong. If there's an
> immediate lesson, it is probably: add tests even for changes that aren't
> really user-visible to make sure the code is exercised.

Test-wise, isn't the problem rather that that we didn't have something
like what's described in t/README as "Running tests with special setups"
for bitmaps? I.e. stuff like GIT_TEST_SPLIT_INDEX=, or running it
with GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all to stress the fsmonitor
code.

That comment b.t.w. is not meant as a "you should have done that!"
blame, but just musings on how we could make things better.

Git has things like bitmaps, midx, commit graph, and probably a few
other things I'm forgetting which all have their own tests, but really
fall more in the category of something like the split index in that they
can potentially impact every test in some unexpected way.

So we could add some option to the test suite to e.g. run a custom
command before every "git push" or "git fetch", and then just do a gc
with a repack/commit graph write/midx write etc. in that codepath, along
with (in the case of stuff like midx) setting any neede config knobs to
turn it on.

Of course the utility of that sort of thing is limited unless we have
some dedicated smoke testers or CI capacity to run the various
combinations of those options. But FWIW when I build our own in-house
git I build the package with:

# Set "false" to test the build procedure itself
if true
then
export BKNG_GIT_HARNESS_OPTIONS="%{?_smp_mflags} 
--state=failed,slow,save --timer"
echo Testing without any custom options:
(cd t && /usr/bin/prove $BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh)

echo Testing while roundtripping everything through the fsmonitor 
codepath:
(cd t && GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all 
GIT_SKIP_TESTS="t3404.7 t7411.3 t7411.4" /usr/bin/prove 
$BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh)

echo Testing split index
(cd t && GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77" 
/usr/bin/prove $BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh)

echo Testing uncommon pack modes. See ci/run-tests.sh in git
(cd t && GIT_TEST_FULL_IN_PACK_ARRAY=true GIT_TEST_OE_SIZE=10 
/usr/bin/prove $BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh)
fi

Those skipped tests are various intermittent bugs related to those
codpaths which I haven't had time to track down / report yet.

So if there was a "test bitmaps everywhere" mode that would have been
caught during the build, unless I've misunderstood how this particular
bug manifests, but then again, it happened on just a plain git.git after
repack, so wasn't any bitmap + push pretty much all that was needed?, I
haven't read your patches in any detail.

B.t.w. for Ben or anyone else who knows about the fsmonitor part of
this: I've long been running the whole test suite with
`GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all prove ...` (also along with
GIT_TEST_SPLIT_INDEX=) after all the main tests pass as additional
stress testing.

It's not documented under the "special setups" section. So I was going
to add it, but I see that in 5c8cdcfd80 ("fsmonitor: add test cases for
fsmonitor extension", 2017-09-22) it's documented that you should also
set GIT_FORCE_PRELOAD_TEST=true, is that needed for GIT_FSMONITOR_TEST?
Or is it yet another mode, and if so to be combined with fsmonitor in
particular, or stand-alone?

> There may be a larger lesson about tracking code coverage, but I don't
> know that most 

Re: [PATCH v3 05/11] t0027: make hash size independent

2018-09-01 Thread brian m. carlson
On Fri, Aug 31, 2018 at 02:40:12PM -0400, Eric Sunshine wrote:
> On Fri, Aug 31, 2018 at 2:21 PM Torsten Bögershausen  wrote:
> > Out of interest: why do we use a "tmp" file here?
> > Would it make more sense  to chain the 'tr' with 'sed' and skip the
> > tmp file ?
> >
> > tr '\015\000abcdef0123456789' QN0 <"$2" |
> > sed -e "s/*/$ZERO_OID/"  >"$exp" &&
> >
> > Yes, we will loose the exit status of 'tr', I think.
> > How important is the exit status ?
> 
> As far as I understand, it is only Git commands for which we worry
> about losing the exit status upstream in pipes. System utilities, on
> the other hand, are presumed to be bug-free, thus we don't mind having
> them upstream.

If that's the case, that's fine, and I can make that change.  I know
that we do often want to preserve the exit status of a system command,
but presumably the tr and sed here would exit 0, so I'm happy to assume
that for the test.

> A different question is why does this need to run both 'tr' and 'sed'
> when 'sed itself could do the entire job since 'sed' has 'tr'
> functionality built in (see sed's "y" command)?

It doesn't.  I went for a minimal change, but I could switch to using
both s/// and y/// in sed instead.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Possible bug: identical lines added/removed in git diff

2018-09-01 Thread Stefan Beller
On Fri, Aug 31, 2018 at 2:39 PM Johannes Schindelin
 wrote:
>
> Hi,
>
> On Thu, 30 Aug 2018, Stefan Beller wrote:
>
> > On Thu, Aug 30, 2018 at 12:20 PM Jeff King  wrote:
> > >
> > > [...] Myers does not promise to find the absolute minimal diff. [...]
> >
> > The `Myers` (our default) diff algorithm is really the Myers algorithm +
> > a heuristic that cuts off the long tail when it is very costly to compute
> > the minimal diff.
>
> IIRC Myers

(the original, which we spell `minimal`)

> promises minimal diffs only for -U0. As soon as you add
> context, Myers might in fact end up with a suboptimal diff, even without
> that heuristic.

ah, the last 4 words made it clear.

I have debated the cost function for diffs some time ago and
came to the conclusion that having one line added/removed costing
a flat price of 1 in the search of the 'lowest cost diff' is pretty good
as it does an ok job in the broad world, it is not 'overfitting' assumptions
on a problem.

For example in some patches I pay more attention to the lines
removed than to the lines added, or vice versa, so assigning
different costs to added/removed lines would make sense.
(It depend on the type of patch, but that cannot be deduced
at the low level of diff machinery)

Starting a new hunk (i.e. add cost to -U for n >0) could be
costly. In fact we have an after-Myers optimization in the xdiff
code that checks if hunks can be coalesced together, but
if we could have that cost at diff computation time, this might
make for better diffs already.

Another example is number of changes between
added/removed/context parts. If I have to review only
one large added part and one large removed part, it may
be more understandable than having interleaved adds/removes.
(I think --patience goes in that direction, but does optimize for
a different metric)

Stefan


Re: non-smooth progress indication for git fsck and git gc

2018-09-01 Thread Ævar Arnfjörð Bjarmason


On Sat, Sep 01 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Aug 16 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Thu, Aug 16 2018, Jeff King wrote:
>>
>>> On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote:
>>>
 I'd like to point out some minor issue observed while processing some
 5-object repository with many binary objects, but most are rather
 small:

 Between the two phases of "git fsck" (checking directories and
 checking objects) there was a break of several seconds where no
 progress was indicated.

 During "git gc" the writing objects phase did not update for some
 seconds, but then the percentage counter jumped like from 15% to 42%.

 I understand that updating the progress output too often can be a
 performance bottleneck, while upating it too rarely might only bore
 the user... ;-)
>>>
>>> We update the counter integer for every object we process, and then
>>> actually update the display whenever the percentage increases or a
>>> second has elapsed, whichever comes first.
>>>
>>> What you're seeing is likely an artifact of _what_ we're counting:
>>> written objects. Not all objects are the same size, so it's not uncommon
>>> to see thousands/sec when dealing with small ones, and then several
>>> seconds for one giant blob.
>>>
>>> The only way to solve that is to count bytes. We don't have a total byte
>>> count in most cases, and it wouldn't always make sense (e.g., the
>>> "Compressing objects" meter can show the same issue, but it's not really
>>> putting through bytes in a linear way).  In some cases we do show
>>> transmitted size and throughput, but that's just for network operations.
>>> We could do the same for "gc" with the patch below. But usually
>>> throughput isn't all that interesting for a filesystem write, because
>>> bandwidth isn't the bottleneck.
>>>
>>> Possibly we could have a "half throughput" mode that counts up the total
>>> size written, but omits the speed indicator. That's not an unreasonable
>>> thing to show for a local pack, since you end up with the final pack
>>> size. The object counter would still be jumpy, but you'd at least have
>>> one number updated at least once per second as you put through a large
>>> blob.
>>>
>>> If you really want a smooth percentage, then we'd have to start counting
>>> bytes instead of objects. Two reasons we don't do that are:
>>>
>>>   - we often don't know the total byte size exactly. E.g., for a
>>> packfile write, it depends on the result of deflating each object.
>>> You can make an approximation and just pretend at the end that you
>>> hit 100%.  Or you can count the pre-deflate sizes, but then your
>>> meter doesn't match the bytes from the throughput counter.
>>>
>>>   - for something like fsck, we're not actually writing out bytes.  So I
>>> guess you'd be measuring "here's how many bytes of objects I
>>> fsck-ed". But is that on-disk compressed bytes, or decompressed
>>> bytes?
>>>
>>> If the former, that's only marginally better as a measure of effort,
>>> since delta compression means that a small number of on-disk bytes
>>> may require a big effort (imagine processing a 100 byte blob versus
>>> a 100 byte delta off of a 100MB blob).
>>>
>>> The latter is probably more accurate. But it's also not free to
>>> pre-generate the total. We can get the number of objects or the size
>>> of the packfile in constant-time, but totaling up the uncompressed
>>> size of all objects is O(n). So that's extra computation, but it
>>> also means a potential lag before we can start the progress meter.
>>>
>>> I'm also not sure how meaningful a byte count is for a user there.
>>>
>>> So there. That's probably more than you wanted to know about Git's
>>> progress code. I think it probably _could_ be improved by counting
>>> more/different things, but I also think it can be a bit of a rabbit
>>> hole. Which is why AFAIK nobody's really looked too seriously into
>>> changing it.
>>>
>>> -Peff
>>
>> This is all interesting, but I think unrelated to what Ulrich is talking
>> about. Quote:
>>
>> Between the two phases of "git fsck" (checking directories and
>> checking objects) there was a break of several seconds where no
>> progress was indicated
>>
>> I.e. it's not about the pause you get with your testcase (which is
>> certainly another issue) but the break between the two progress bars.
>>
>> Here's a test case you can clone:
>> https://github.com/avar/2015-04-03-1M-git (or might already have
>> "locally" :)
>>
>> If you fsck this repository it'll take around (on my spinning rust
>> server) 30 seconds between 100% of "Checking object directories" before
>> you get any output from "Checking objects".
>>
>> The breakdown of that is (this is from approximate eyeballing):
>>
>>  * We spend 1-3 seconds just on this:
>>
>> 

Re: non-smooth progress indication for git fsck and git gc

2018-09-01 Thread Ævar Arnfjörð Bjarmason


On Thu, Aug 16 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Aug 16 2018, Jeff King wrote:
>
>> On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote:
>>
>>> I'd like to point out some minor issue observed while processing some
>>> 5-object repository with many binary objects, but most are rather
>>> small:
>>>
>>> Between the two phases of "git fsck" (checking directories and
>>> checking objects) there was a break of several seconds where no
>>> progress was indicated.
>>>
>>> During "git gc" the writing objects phase did not update for some
>>> seconds, but then the percentage counter jumped like from 15% to 42%.
>>>
>>> I understand that updating the progress output too often can be a
>>> performance bottleneck, while upating it too rarely might only bore
>>> the user... ;-)
>>
>> We update the counter integer for every object we process, and then
>> actually update the display whenever the percentage increases or a
>> second has elapsed, whichever comes first.
>>
>> What you're seeing is likely an artifact of _what_ we're counting:
>> written objects. Not all objects are the same size, so it's not uncommon
>> to see thousands/sec when dealing with small ones, and then several
>> seconds for one giant blob.
>>
>> The only way to solve that is to count bytes. We don't have a total byte
>> count in most cases, and it wouldn't always make sense (e.g., the
>> "Compressing objects" meter can show the same issue, but it's not really
>> putting through bytes in a linear way).  In some cases we do show
>> transmitted size and throughput, but that's just for network operations.
>> We could do the same for "gc" with the patch below. But usually
>> throughput isn't all that interesting for a filesystem write, because
>> bandwidth isn't the bottleneck.
>>
>> Possibly we could have a "half throughput" mode that counts up the total
>> size written, but omits the speed indicator. That's not an unreasonable
>> thing to show for a local pack, since you end up with the final pack
>> size. The object counter would still be jumpy, but you'd at least have
>> one number updated at least once per second as you put through a large
>> blob.
>>
>> If you really want a smooth percentage, then we'd have to start counting
>> bytes instead of objects. Two reasons we don't do that are:
>>
>>   - we often don't know the total byte size exactly. E.g., for a
>> packfile write, it depends on the result of deflating each object.
>> You can make an approximation and just pretend at the end that you
>> hit 100%.  Or you can count the pre-deflate sizes, but then your
>> meter doesn't match the bytes from the throughput counter.
>>
>>   - for something like fsck, we're not actually writing out bytes.  So I
>> guess you'd be measuring "here's how many bytes of objects I
>> fsck-ed". But is that on-disk compressed bytes, or decompressed
>> bytes?
>>
>> If the former, that's only marginally better as a measure of effort,
>> since delta compression means that a small number of on-disk bytes
>> may require a big effort (imagine processing a 100 byte blob versus
>> a 100 byte delta off of a 100MB blob).
>>
>> The latter is probably more accurate. But it's also not free to
>> pre-generate the total. We can get the number of objects or the size
>> of the packfile in constant-time, but totaling up the uncompressed
>> size of all objects is O(n). So that's extra computation, but it
>> also means a potential lag before we can start the progress meter.
>>
>> I'm also not sure how meaningful a byte count is for a user there.
>>
>> So there. That's probably more than you wanted to know about Git's
>> progress code. I think it probably _could_ be improved by counting
>> more/different things, but I also think it can be a bit of a rabbit
>> hole. Which is why AFAIK nobody's really looked too seriously into
>> changing it.
>>
>> -Peff
>
> This is all interesting, but I think unrelated to what Ulrich is talking
> about. Quote:
>
> Between the two phases of "git fsck" (checking directories and
> checking objects) there was a break of several seconds where no
> progress was indicated
>
> I.e. it's not about the pause you get with your testcase (which is
> certainly another issue) but the break between the two progress bars.
>
> Here's a test case you can clone:
> https://github.com/avar/2015-04-03-1M-git (or might already have
> "locally" :)
>
> If you fsck this repository it'll take around (on my spinning rust
> server) 30 seconds between 100% of "Checking object directories" before
> you get any output from "Checking objects".
>
> The breakdown of that is (this is from approximate eyeballing):
>
>  * We spend 1-3 seconds just on this:
>
> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181
>
>  * We spend the majority of the ~30s on this:
>
> 

Re: [PATCH 2/3] Add test for commit --dry-run --short.

2018-09-01 Thread SZEDER Gábor
On Thu, Aug 30, 2018 at 10:39:20PM -0700, Stephen P. Smith wrote:
> diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
> index 810d4cea7..fc69da816 100755
> --- a/t/t7501-commit.sh
> +++ b/t/t7501-commit.sh
> @@ -682,4 +682,14 @@ test_expect_success '--dry-run with conflicts fixed from 
> a merge' '
>   git commit -m "conflicts fixed from merge."
>  '
>  
> +test_expect_success '--dry-run --short with conflicts fixed from a merge' '
> + # setup two branches with conflicting information
> + # in the same file, resolve the conflict,
> + # call commit with --dry-run --short

I think the last line of the comment is unnecessary: it doesn't say
anything that is not obvious from the test's last line.

> + rm -f test-file &&
> + touch testfile &&

That filename should be 'test-file', i.e. with a dash, shouldn't it?

Anyway, if you want to truncate the file, then please use '>test-file'
instead of 'rm' and 'touch'.

> + git add test-file &&
> + git commit --dry-run --short
> +'
> +
>  test_done
> -- 
> 2.18.0
> 


Re: [PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers

2018-09-01 Thread Thomas Gummerer
On 08/29, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Yeah that makes sense.  Maybe something like this?
> >
> > (replying to  here to keep
> > the patches in one thread)
> >
> >  Documentation/technical/rerere.txt | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/Documentation/technical/rerere.txt 
> > b/Documentation/technical/rerere.txt
> > index e65ba9b0c6..8fefe51b00 100644
> > --- a/Documentation/technical/rerere.txt
> > +++ b/Documentation/technical/rerere.txt
> > @@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for 
> > the outer and the
> >  inner conflict.  This is done recursively, so any number of nested
> >  conflicts can be handled.
> >  
> > +Note that this only works for conflict markers that "cleanly nest".  If
> > +there are any unmatched conflict markers, rerere will fail to handle
> > +the conflict and record a conflict resolution.
> > +
> >  The only difference is in how the conflict ID is calculated.  For the
> >  inner conflict, the conflict markers themselves are not stripped out
> >  before calculating the sha1.
> 
> Looks good to me except for the line count on the @@ line.  The
> preimage ought to have 6 (not 7) lines and adding 4 new lines makes
> it a 10 line postimage.  I wonder who miscounted the hunk---it is
> immediately followed by the signature cut mark "-- \n" and some
> tools (including Emacs's patch editing mode) are known to
> misinterpret it as a preimage line that was removed.

Sorry about that.  Yeah Emacs's patch editing mode doing that would
explain it.  I did a round of proof-reading in my editor, and spotted
a typo.  Since it was trivial to fix I just edited the patch
directly, and Emacs changed the line count.  Sorry about that, I'll be
more careful about this in the future.

> What is curious is that your 2/2 counts the preimage lines
> correctly.

I only added some text after the '---' line in 2/2, but did not edit
the patch directly.  Emacs's patch editing mode only seems to change
the line numbers of the patch that's being edited, not if anything
surrounding that is changed, so the line count stayed the same as what
format-patch put in the file in the first place.

> In any case, both patches look good.  Will apply.

Thanks!

> Thanks.


Re: Git in Outreachy Dec-Mar?

2018-09-01 Thread Jeff King
On Fri, Aug 31, 2018 at 10:16:49AM +0200, Christian Couder wrote:

> >   2. To get our landing page and list of projects in order (and also
> >  micro-projects for applicants). This can probably build on the
> >  previous round at:
> >
> >https://git.github.io/Outreachy-15/
> >
> >  and on the project/microprojects lists for GSoC (which will need
> >  some updating and culling).
> 
> Ok to take a look at that.

Thanks. I think sooner is better for this (for you or anybody else who's
interested in mentoring). The application period opens on September
10th, but I think the (still growing) list of projects is already being
looked at by potential candidates.

> >   3. To figure out funding (unlike GSoC, the intern stipend comes from
> >  the projects). I can look into getting outside funds (which is what
> >  we did last year). Worst case, we do have enough project money to
> >  cover an intern. Last year[1] opinions were that this was a
> >  reasonable use of project money, but of course new opinions are
> >  welcome.
> 
> I can also look at getting outside funds.
> 
> My opinion though is that it is probably better if the Git project can
> use its own fund for this, as it makes it easier for possible mentors
> if they don't need to look at getting outside funds.

I disagree. An internship costs more than we generally take in over the
course of a year. So we would eventually run out of money doing this.

I also think it doesn't need to be the mentor's responsibility to find
the funding. That can be up to an "org admin", and I don't think it
should be too big a deal (I had no trouble getting funding from GitHub
last year, and I don't expect any this year; I just didn't want to start
that process until I knew we were serious about participating).

So if you (or anybody else) wants to mentor, please focus on the project
list and application materials.

-Peff


Re: Git in Outreachy Dec-Mar?

2018-09-01 Thread Jeff King
On Fri, Aug 31, 2018 at 01:30:05PM +0300, Оля Тележная wrote:

> I was Outreachy intern last winter. I guess I need to speak up: I will
> be happy if my feedback helps you.
> At first, I want to repeat all thanks to Outreachy organizers and Git
> mentors. That was unique experience and I am so proud of being a part
> of this project. But, I need to say that my internship wasn't ideal.
> Mentors, please do not feel guilty: I just want to improve the quality
> of future internships and give some advises.

As one of those mentors, let me first say: thank you for this feedback.
It's very valuable to get your honest perspective.

I more or less agree with everything you said. A few specific comments:

> 1. The main problem of Outreachy internship is positioning. I mean, I
> had strong confidence that it's an internship for newbies in
> programming. All my friends had the same confidence, and that's the
> reason why 2 my friends failed in the middle of the Outreachy
> internship. Load was so big for them, noone explained this fact in the
> beginning, noone helped with this situation during the internship. I
> was thinking I could be overqualified and I took someone's place (I
> had 2 other SWE internships before Outreachy). The truth is that my
> skills were barely enough.

Some of this may be due to Outreachy (I'm not very familiar with the
materials they use to get applicants). But we propose the individual
projects, and I don't think there's anything stopping us from something
that might be smaller in scope (i.e., to focus more on "soft" skills
like participating in the project, and less time on system design or
tricky coding).

I think ideally we'd have various project options with a range of
difficulties, and part of the application period could involve steering
candidates to the right project.

> 2. Please tell more about minimal requirements: write it down on a
> landing page in the beginning and maybe repeat them in every task. I
> guess it would be the same this year: good knowledge of C, gdb, Git
> (as a user: intern needs to know how to work with forks, git remote,
> git rebase -i, etc), Shell, base understanding of Linux terminal,
> being ready to work remotely. It's good idea to mention that it's not
> 100% requirement, but anyway at least 60% from the list must be
> familiar.

Yes, I agree that we don't really communicate the expected skills very
well. That's something we should be able to fix pretty immediately for
the next round.

> 3. If you decide to be a mentor - at first, thanks a lot. Please be
> ready to spend A LOT OF time on it. You need to explain not only the
> task to your intern, but also how to split the task into subtasks, how
> to look for solutions, how to work with the terminal, how to debug
> better and many other questions. It's not only about solving
> internship task. It's about learning something new. And I did not
> mention code reviews: there would be many stupid errors and it's a
> talent not to be angry about that.

I'd agree with this. I think one of the biggest mistakes I made for your
internship was not being focused and spending enough time. Johannes
mentioned that he actually does online pair-programming with his GSoC
students, and I think that would have helped a lot in our case.

Ironically, I was actually worried about being _too_ involved (which is
obviously dumb in retrospect). Since there were some interesting design
problems, I didn't want to just dictate "here's what your design should
look like, go code it and get back to me". I wanted to give you the
space to explore the problem, maybe even make some mistakes, and be
there to "unstick" you when you got stuck. But with basically weekly
check-ins, 3 months goes by _really_ fast.

I think we probably needed to be talking things through and working in
real-time at least an hour a day.

> 4. I fully sure that you need to talk with your intern by the voice. I
> mean regular calls, at least once a week. It's good idea to share the
> desktop and show how you are working, what are you using, etc.
> Ask your intern to share the desktop: you need to feel confident that
> they understand how to work with the task. Help them with the
> shortcuts.

Yeah. I think it would have helped a lot to have a real-time session
where we're actually working on the problem collaboratively, and not
just discussing problems you might have run into. That gives the
opportunity to reveal workflow issues: the intern can see how the mentor
does things (and ask "how/why did you do that neat thing?") and the
mentor can see how the intern does things ("I see you're doing it this
way; did you know you can also do it this way, which is easier?").

> 5. In the ideal world, I want to force all mentors to get special
> courses (it will solve problems 2-3-4). Great developer is not equal
> to great mentor. And, if you work with really newbie, it becomes so
> necessary.

I definitely agree with the "not equal" thing. It might even be
inversely 

Re: [PATCH 2/4] t5310: test delta reuse with bitmaps

2018-09-01 Thread Jeff King
On Sat, Sep 01, 2018 at 03:48:13AM -0400, Jeff King wrote:

> Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for
> thin "have" objects, 2018-08-21) taught pack-objects a new
> optimization trick. Since this wasn't meant to change
> user-visible behavior, but only produce smaller packs more
> quickly, testing focused on t/perf/p5311.
> 
> However, since people don't run perf tests very often, we
> should make sure that the feature is exercised in the
> regular test suite. This patch does so.

This, by the way, is the crux of how such an obvious and severe bug made
it to 'next'.

The original series was tested quite extensively via t/perf and in
production at GitHub. When I re-rolled v2, the only change was the
addition of the assertion, so I didn't bother re-doing the perf tests,
since they're slow and there wouldn't be a measurable impact.

I did run the normal test suite (as I'm sure Junio did, too) as a
double-check for correctness, but as we noticed, the code wasn't
actually exercised there.

Nor had I yet backported the revised series to the version we run at
GitHub, so it hadn't been run there, either.

And all of that coupled with the fact that it only triggers with
bitmaps, so day-to-day use of the buggy Git (like Junio trying to push
out the result ;) ) wouldn't show it.

Anyway. Not that exciting, and kind of obviously dumb in retrospect. But
I think it was worth analyzing to see what went wrong. If there's an
immediate lesson, it is probably: add tests even for changes that aren't
really user-visible to make sure the code is exercised.

There may be a larger lesson about tracking code coverage, but I don't
know that most general code coverage tools would have helped (any
overall percentage number would be too large to move). A tool that
looked at the diff and said "of the N lines you added/touched, this
percent is exercised in the test suite" might have been useful.

-Peff


[PATCH 4/4] pack-bitmap: drop "loaded" flag

2018-09-01 Thread Jeff King
In the early days of the bitmap code, there was a single
static bitmap_index struct that was used behind the scenes,
and any bitmap-related functions could lazily check
bitmap_git.loaded to see if they needed to read the on-disk
data.

But since 3ae5fa0768 (pack-bitmap: remove bitmap_git global
variable, 2018-06-07), the caller is responsible for the
lifetime of the bitmap_index struct, and we return it from
prepare_bitmap_git() and prepare_bitmap_walk(), both of
which load the on-disk data (or return NULL).

So outside of these functions, it's not possible to have a
bitmap_index for which the loaded flag is not true. Nor is
it possible to accidentally pass an already-loaded
bitmap_index to the loading function (which is static-local
to the file).

We can drop this unnecessary and confusing flag.

Signed-off-by: Jeff King 
---
 pack-bitmap.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8c1af1cca2..40debd5e20 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -91,8 +91,6 @@ struct bitmap_index {
 
/* Version of the bitmap index */
unsigned int version;
-
-   unsigned loaded : 1;
 };
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
@@ -306,7 +304,7 @@ static int open_pack_bitmap_1(struct bitmap_index 
*bitmap_git, struct packed_git
 
 static int load_pack_bitmap(struct bitmap_index *bitmap_git)
 {
-   assert(bitmap_git->map && !bitmap_git->loaded);
+   assert(bitmap_git->map);
 
bitmap_git->bitmaps = kh_init_sha1();
bitmap_git->ext_index.positions = kh_init_sha1_pos();
@@ -321,7 +319,6 @@ static int load_pack_bitmap(struct bitmap_index *bitmap_git)
if (load_bitmap_entries_v1(bitmap_git) < 0)
goto failed;
 
-   bitmap_git->loaded = 1;
return 0;
 
 failed:
@@ -336,7 +333,7 @@ static int open_pack_bitmap(struct bitmap_index *bitmap_git)
struct packed_git *p;
int ret = -1;
 
-   assert(!bitmap_git->map && !bitmap_git->loaded);
+   assert(!bitmap_git->map);
 
for (p = get_packed_git(the_repository); p; p = p->next) {
if (open_pack_bitmap_1(bitmap_git, p) == 0)
@@ -738,7 +735,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info 
*revs)
 * from disk. this is the point of no return; after this the rev_list
 * becomes invalidated and we must perform the revwalk through bitmaps
 */
-   if (!bitmap_git->loaded && load_pack_bitmap(bitmap_git) < 0)
+   if (load_pack_bitmap(bitmap_git) < 0)
goto cleanup;
 
object_array_clear(>pending);
-- 
2.19.0.rc1.549.g6ce4dc0f08


[PATCH 3/4] traverse_bitmap_commit_list(): don't free result

2018-09-01 Thread Jeff King
Since it was introduced in fff42755ef (pack-bitmap: add
support for bitmap indexes, 2013-12-21), this function has
freed the result after traversing it. That is an artifact of
the early days of the bitmap code, when we had a single
static "struct bitmap_index". Back then, it was intended
that you would do:

  prepare_bitmap_walk();
  traverse_bitmap_commit_list();

Since the actual bitmap_index struct was totally behind the
scenes, it was convenient for traverse_bitmap_commit_list()
to clean it up, clearing the way for another traversal.

But since 3ae5fa0768 (pack-bitmap: remove bitmap_git global
variable, 2018-06-07), the caller explicitly manages the
bitmap_index struct itself, like this:

  b = prepare_bitmap_walk();
  traverse_bitmap_commit_list(b, );
  free_bitmap_index(b);

It no longer makes sense to auto-free the result after the
traversal. If you want to do another traversal, you'd just
create a new bitmap_index. And while nobody tries to call
traverse_bitmap_commit_list() twice, the fact that it throws
away the result might be surprising, and is better avoided.

Note that in the "old" way it was possible for two walks to
amortize the cost of opening the on-disk .bitmap file (since
it was stored in the global bitmap_index), but we lost that
in 3ae5fa0768. However, no code actually does this, so it's
not worth addressing now. The solution might involve a new:

  reset_bitmap_walk(b, );

call. Or we might even attach the bitmap data to its
matching packed_git struct, so that multiple
prepare_bitmap_walk() calls could use it. That can wait
until somebody actually has need of the optimization (and
until then, we'll do the correct, unsurprising thing).

Signed-off-by: Jeff King 
---
 pack-bitmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 76fd93a3de..8c1af1cca2 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -848,9 +848,6 @@ void traverse_bitmap_commit_list(struct bitmap_index 
*bitmap_git,
OBJ_TAG, show_reachable);
 
show_extended_objects(bitmap_git, show_reachable);
-
-   bitmap_free(bitmap_git->result);
-   bitmap_git->result = NULL;
 }
 
 static uint32_t count_object_type(struct bitmap_index *bitmap_git,
-- 
2.19.0.rc1.549.g6ce4dc0f08



[PATCH 2/4] t5310: test delta reuse with bitmaps

2018-09-01 Thread Jeff King
Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for
thin "have" objects, 2018-08-21) taught pack-objects a new
optimization trick. Since this wasn't meant to change
user-visible behavior, but only produce smaller packs more
quickly, testing focused on t/perf/p5311.

However, since people don't run perf tests very often, we
should make sure that the feature is exercised in the
regular test suite. This patch does so.

Signed-off-by: Jeff King 
---
Side note: If we do squash patch 1 into the eariler series, that will
invalidate the commit id mentioned in the commit message here.

 t/t5310-pack-bitmaps.sh | 93 +
 1 file changed, 93 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 557bd0d0c0..af38776054 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -342,4 +342,97 @@ test_expect_success 'truncated bitmap fails gracefully' '
test_i18ngrep corrupt stderr
 '
 
+# have_delta  
+#
+# Note that because this relies on cat-file, it might find _any_ copy of an
+# object in the repository. The caller is responsible for making sure
+# there's only one (e.g., via "repack -ad", or having just fetched a copy).
+have_delta () {
+   echo $2 >expect &&
+   echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
+   test_cmp expect actual
+}
+
+# Create a state of history with these properties:
+#
+#  - refs that allow a client to fetch some new history, while sharing some old
+#history with the server; we use branches delta-reuse-old and
+#delta-reuse-new here
+#
+#  - the new history contains an object that is stored on the server as a delta
+#against a base that is in the old history
+#
+#  - the base object is not immediately reachable from the tip of the old
+#history; finding it would involve digging down through history we know the
+#other side has
+#
+# This should result in a state where fetching from old->new would not
+# traditionally reuse the on-disk delta (because we'd have to dig to realize
+# that the client has it), but we will do so if bitmaps can tell us cheaply
+# that the other side has it.
+test_expect_success 'set up thin delta-reuse parent' '
+   # This first commit contains the buried base object.
+   test-tool genrandom delta 16384 >file &&
+   git add file &&
+   git commit -m "delta base" &&
+   base=$(git rev-parse --verify HEAD:file) &&
+
+   # These intermediate commits bury the base back in history.
+   # This becomes the "old" state.
+   for i in 1 2 3 4 5
+   do
+   echo $i >file &&
+   git commit -am "intermediate $i" || return 1
+   done &&
+   git branch delta-reuse-old &&
+
+   # And now our new history has a delta against the buried base. Note
+   # that this must be smaller than the original file, since pack-objects
+   # prefers to create deltas from smaller objects to larger.
+   test-tool genrandom delta 16300 >file &&
+   git commit -am "delta result" &&
+   delta=$(git rev-parse --verify HEAD:file) &&
+   git branch delta-reuse-new &&
+
+   # Repack with bitmaps and double check that we have the expected delta
+   # relationship.
+   git repack -adb &&
+   have_delta $delta $base
+'
+
+# Now we can sanity-check the non-bitmap behavior (that the server is not able
+# to reuse the delta). This isn't strictly something we care about, so this
+# test could be scrapped in the future. But it makes sure that the next test is
+# actually triggering the feature we want.
+#
+# Note that our tools for working with on-the-wire "thin" packs are limited. So
+# we actually perform the fetch, retain the resulting pack, and inspect the
+# result.
+test_expect_success 'fetch without bitmaps ignores delta against old base' '
+   test_config pack.usebitmaps false &&
+   test_when_finished "rm -rf client.git" &&
+   git init --bare client.git &&
+   (
+   cd client.git &&
+   git config transfer.unpackLimit 1 &&
+   git fetch .. delta-reuse-old:delta-reuse-old &&
+   git fetch .. delta-reuse-new:delta-reuse-new &&
+   have_delta $delta $ZERO_OID
+   )
+'
+
+# And do the same for the bitmap case, where we do expect to find the delta.
+test_expect_success 'fetch with bitmaps can reuse old base' '
+   test_config pack.usebitmaps true &&
+   test_when_finished "rm -rf client.git" &&
+   git init --bare client.git &&
+   (
+   cd client.git &&
+   git config transfer.unpackLimit 1 &&
+   git fetch .. delta-reuse-old:delta-reuse-old &&
+   git fetch .. delta-reuse-new:delta-reuse-new &&
+   have_delta $delta $base
+   )
+'
+
 test_done
-- 
2.19.0.rc1.549.g6ce4dc0f08



[PATCH 1/4] bitmap_has_sha1_in_uninteresting(): drop BUG check

2018-09-01 Thread Jeff King
Commit 30cdc33fba (pack-bitmap: save "have" bitmap from
walk, 2018-08-21) introduced a new function for looking at
the "have" side of a bitmap walk. Because it only makes
sense to do so after we've finished the walk, we added an
extra safety assertion, making sure that bitmap_git->result
is non-NULL.

However, this safety is misguided. It was trying to catch
the case where we had called prepare_bitmap_walk() to give
us a "struct bitmap_index", but had not yet called
traverse_bitmap_commit_list() to walk it. But all of the
interesting computation (including setting up the result and
"have" bitmaps) happens in the first function! The latter
function only delivers the result to a callback function.

So the case we were worried about is impossible; if you get
a non-NULL result from prepare_bitmap_walk(), then its
"have" field will be fully formed.

But much worse, traverse_bitmap_commit_list() actually frees
the result field as it finishes. Which means that this
assertion is worse than useless: it's almost guaranteed to
trigger!

Our test suite didn't catch this because the function isn't
actually exercised at all. The only caller comes from
6a1e32d532 (pack-objects: reuse on-disk deltas for thin
"have" objects, 2018-08-21), and that's triggered only when
you fetch or push history that contains an object with a
base that is found deep in history. Our test suite fetches
and pushes either don't use bitmaps, or use too-small
example repositories. But any reasonably-sized real-world
push or fetch (with bitmaps) would trigger this.

This patch drops the harmful assertion and tweaks the
docstring for the function to make the precondition clear.
The tests need to be improved to exercise this new
pack-objects feature, but we'll do that in a separate
commit.

Signed-off-by: Jeff King 
---
 pack-bitmap.c | 2 --
 pack-bitmap.h | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index c3231ef9ef..76fd93a3de 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1128,8 +1128,6 @@ int bitmap_has_sha1_in_uninteresting(struct bitmap_index 
*bitmap_git,
 
if (!bitmap_git)
return 0; /* no bitmap loaded */
-   if (!bitmap_git->result)
-   BUG("failed to perform bitmap walk before querying");
if (!bitmap_git->haves)
return 0; /* walk had no "haves" */
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index c633bf5238..189dd68ad3 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -54,7 +54,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct 
packing_data *mapping
 void free_bitmap_index(struct bitmap_index *);
 
 /*
- * After a traversal has been performed on the bitmap_index, this can be
+ * After a traversal has been performed by prepare_bitmap_walk(), this can be
  * queried to see if a particular object was reachable from any of the
  * objects flagged as UNINTERESTING.
  */
-- 
2.19.0.rc1.549.g6ce4dc0f08



[PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-01 Thread Jeff King
On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote:

> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > On Tue, Aug 21 2018, Jeff King wrote:
> > 
> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
> > > +  const unsigned char *sha1)
> > > +{
> > > + int pos;
> > > +
> > > + if (!bitmap_git)
> > > + return 0; /* no bitmap loaded */
> > > + if (!bitmap_git->result)
> > > + BUG("failed to perform bitmap walk before querying");
> > 
> > Some part of what calls this completely breaks pushing from the "next"
> > branch when you have local bitmaps (we *really* should have some tests
> > for this...).
> 
> Yikes, thanks for reporting. I agree we need better tests here.

OK, here is the fix. Since the problem is in 'next', this is done as a
patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to
rewind 'next' post-release anyway, we could squash it directly into
30cdc33fba from the original series. That would help later bisections
from running into it, which may be worth it as it's a pretty severe
breakage.  Or maybe not:

  1. The test suite doesn't actually fail, because it's toy repos are
 too small.

  2. It only triggers in the real-world if you have bitmaps turned on,
 which are not the default.

So it may not be that likely in practice to bother a hypothetical future
bisecting developer.

> [1] Actually, there is also prepare_bitmap_git(), but it is not really
> for general use by callers. It should be made static, or better yet,
> I suspect it can be folded into its callers.

This actually turned out not to work. There's a caller over in
pack-bitmap-write.c, and it makes things worse to try to expand the
logic there. So it technically _is_ possible to have a bitmap_index
without a "have" field, but it also doesn't make sense to ask about
"uninteresting" objects there. You haven't done (and cannot do) a
traversal on such an object.

Which I think goes back to Stefan's original question: is this just a
crappy API? And the answer is "yes, to some degree". There are really
two uses of bitmaps:

  - you want to do a traverse_commit_list() walk, but faster

  - you want to selectively query the on-disk bitmaps (e.g., you are
walking for --contains and want to ask "do we have a bitmap for this
object?"

Those currently use the same struct bitmap_index, but with two different
constructors (prepare_bitmap_git and prepare_bitmap_walk). It probably
ought to be two different ones (with the "walk" variant using the
"query" variant under the hood). I've punted on that full conversion for
now, but did clean up a few confusing bits.

  [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check

The actual fix. This should get merged to next ASAP (or the original
topic just reverted).

  [2/4]: t5310: test delta reuse with bitmaps

I did this separately to give us flexibility to squash or merge
quickly. But it does find Ævar's bug on a git without patch 1.

  [3/4]: traverse_bitmap_commit_list(): don't free result

The original assert should have simply been useless, but it was the
surprising behavior of this function that turned it into a bug.

  [4/4]: pack-bitmap: drop "loaded" flag

And this is just an annoyance I ran into, which is a fallout from
our conversion to using an allocated bitmap_index struct.

 pack-bitmap.c   | 14 ++-
 pack-bitmap.h   |  2 +-
 t/t5310-pack-bitmaps.sh | 93 +
 3 files changed, 97 insertions(+), 12 deletions(-)

-Peff


Re: Git in Outreachy Dec-Mar?

2018-09-01 Thread Christian Couder
Hi Olga,

On Fri, Aug 31, 2018 at 12:30 PM, Оля Тележная  wrote:
> Hi everyone,
>
> I was Outreachy intern last winter. I guess I need to speak up: I will
> be happy if my feedback helps you.
> At first, I want to repeat all thanks to Outreachy organizers and Git
> mentors. That was unique experience and I am so proud of being a part
> of this project. But, I need to say that my internship wasn't ideal.
> Mentors, please do not feel guilty: I just want to improve the quality
> of future internships and give some advises.

Thanks a lot for this feedback! I think it can be very useful and I
don't feel guilty as I think no one is really to blame in these kinds
of situations. We all have to learn how we can improve.

I think a part of the problem as just that the work is really very
difficult for interns even if it doesn't seem that it should be so
difficult which makes the intern/mentor relationship tricky.

> I guess some of the problems aren't related to Git, and it's Outreachy
> weak points. Please forward this email to Outreachy organizers if you
> want.
>
> 1. The main problem of Outreachy internship is positioning. I mean, I
> had strong confidence that it's an internship for newbies in
> programming. All my friends had the same confidence, and that's the
> reason why 2 my friends failed in the middle of the Outreachy
> internship. Load was so big for them, noone explained this fact in the
> beginning, noone helped with this situation during the internship. I
> was thinking I could be overqualified and I took someone's place (I
> had 2 other SWE internships before Outreachy). The truth is that my
> skills were barely enough.

Yeah, but will it be better if Outreachy scares too many people away
from applying? I am not really sure.

Also I think success depends not so much on the students/interns
technical skills but on their willingness to ask questions and their
ability to get the help the need. Maybe Outreachy and Git should state
that more clearly.

For a long time I thought that it could be enough to just tell
students/interns that mentors are here to help, so they should not be
afraid of asking even the most basic questions. But over time I have
been realizing that mentors should actively try to understand what
help the student/intern need.

> 2. Please tell more about minimal requirements: write it down on a
> landing page in the beginning and maybe repeat them in every task. I
> guess it would be the same this year: good knowledge of C, gdb, Git
> (as a user: intern needs to know how to work with forks, git remote,
> git rebase -i, etc), Shell, base understanding of Linux terminal,
> being ready to work remotely. It's good idea to mention that it's not
> 100% requirement, but anyway at least 60% from the list must be
> familiar.

On our project page (https://git.github.io/Outreachy-15/) we tried a
bit to do that, for example there are things like:

Language: C
Difficulty: medium to hard

for each project. And there were no "easy" tasks.

So yeah instead of "Language: C" we could have something like:

Requirements: very good knowledge and practice of C, shell, gdb,
Git, Linux terminal, ...

and for difficulty we could remove "medium" and just select between
"hard", "very hard" and "impossible" :-)

But this could scare possible students/interns away and that's not
really what we want.

We think that if someone can successfully complete a micro project,
they should have enough basic technical skills to get started and then
we can teach them what they need. Maybe we could state that more
clearly?

I also think that the Git project doesn't make enough effort to be
newcomer friendly. Maybe we could start by adding a document somewhere
that could contain basic useful information for newcomers? Perhaps
this could be based on the presentation that Peff gave at the
beginning of the Bloomberg Hackathon last November?

> 3. If you decide to be a mentor - at first, thanks a lot. Please be
> ready to spend A LOT OF time on it. You need to explain not only the
> task to your intern, but also how to split the task into subtasks, how
> to look for solutions, how to work with the terminal, how to debug
> better and many other questions. It's not only about solving
> internship task. It's about learning something new. And I did not
> mention code reviews: there would be many stupid errors and it's a
> talent not to be angry about that.

I think mentors are ready to do that. Often the problem is that we
just don't know how we could help or how we can make the
students/interns confident enough to tell us how we could help or what
is blocking them.

> 4. I fully sure that you need to talk with your intern by the voice. I
> mean regular calls, at least once a week. It's good idea to share the
> desktop and show how you are working, what are you using, etc.
> Ask your intern to share the desktop: you need to feel confident that
> they understand how to work with the task. Help them with the
> shortcuts.