Re: [PATCH] pull: do not segfault when HEAD refers to missing object file

2017-03-05 Thread Jeff King
On Mon, Mar 06, 2017 at 07:52:10AM +0100, Johannes Sixt wrote:

> > Yeah, it is. You can do it easily with 'sed', of course, but if you want
> > to avoid the extra process and do it in pure shell, it's more like:
> > 
> >   last38=${REV#??}
> >   first2=${REV%$last38}
> >   rm -f .git/objects/$first2/$last38
> 
> Is it "HEAD points to non-existent object" or ".git/HEAD contains junk"? In
> both cases there are simpler solutions than to remove an object. For
> example, `echo "$_x40" >.git/HEAD` or `echo "this is junk" >.git/HEAD`?

Good point. Unfortunately, it's a little tricky. You can't use "this is
junk" because then git does not recognize it as a git repo at all. You
can't use $_x40 because the null sha1 is used internally to signal an
unborn branch, so we mix it up with that case.

Something like "111..." for 40 characters would work, though at that
point I think just simulating an actual corruption might be less
convoluted.
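To make the trade-offs concrete, here is a minimal sketch of the three
candidate HEAD contents discussed above (the variable names are mine, not
from the patch):

```shell
# Three ways to break HEAD, as discussed above (names are illustrative):
junk="this is junk"                       # 1: git stops recognizing the repo
null_sha1=$(printf '%040d' 0)             # 2: reads as an unborn branch
fake_sha1=$(printf '%040d' 0 | tr 0 1)    # 3: well-formed but nonexistent

# Only option 3 reaches the "bad object HEAD" code path; in a scratch
# repository you would simulate it with:
#   echo "$fake_sha1" >.git/HEAD
echo "$fake_sha1"
```

Peff's point stands, though: removing the loose object behind a real
commit simulates the corruption more faithfully than any synthetic HEAD
content.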

-Peff


Re: [PATCH] pull: do not segfault when HEAD refers to missing object file

2017-03-05 Thread Jeff King
On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:

> git pull --rebase on a corrupted HEAD used to segfault; it has been
> corrected to error out with a message. A test has also been added to
> verify this fix.

Your commit message mostly tells us the "what" that we can see from the
diff. But why are we segfaulting? What assumption is being violated, and
why is the fix the right thing?

You've stuck some of the details in your notes, and they should probably
make it into the commit message.

> When add_head_to_pending fails to add a pending object, git pull
> --rebase segfaults. This can happen if HEAD is referring to a corrupt
> or missing object.

The other obvious time when HEAD is not valid is when you are on an
unborn branch. So we should also consider how such a case interacts with
the callsites you are touching.

I think it is primarily this hunk:

> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -2252,6 +2252,11 @@ int has_uncommitted_changes(int ignore_submodules)
>   DIFF_OPT_SET(&rev_info.diffopt, IGNORE_SUBMODULES);
>   DIFF_OPT_SET(&rev_info.diffopt, QUICK);
>   add_head_to_pending(&rev_info);
> +
> + /* The add_head_to_pending call might not have added anything. */
> + if (!rev_info.pending.nr)
> + die("bad object %s", "HEAD");
> +
>   diff_setup_done(&rev_info.diffopt);
>   result = run_diff_index(&rev_info, 1);
>   return diff_result_code(&rev_info.diffopt, result);

that is the fix. We assume that add_head_to_pending() put something into
rev_info.pending, but it might not.

In the case of corruption, "bad object HEAD" is a reasonable outcome.

In the case of an unborn branch, we'd probably want to compare against
the empty tree. This trick is used elsewhere in wt-status.c, as in
wt_status_collect_changes_index().

I'm not sure if we need to deal with that here or not. I wasn't able to
trigger this code with an unborn branch. If you have no index, then the
is_cache_unborn() check triggers earlier in the function. If you do have
an index, then we have an earlier check:

  if (is_null_sha1(orig_head) && !is_cache_unborn())
          die(_("Updating an unborn branch with changes added to the index."));

that covers this case. I am not 100% sure that check is correct, though.
If I do:

  git init
  echo content >foo
  git add foo
  git rm -f foo

then I have an index which is not "unborn", but also does not contain
any changes. I think the conditional above should just be checking
"active_nr". If we were to fix that, then I think
has_uncommitted_changes() would need to be adjusted as well.

I admit that this is probably an absurd corner case.  So I think I am OK
with your fix for now, and we can revisit the logic later if anybody
starts to care.

> I discovered this segfault on my machine after pulling a repo from
> GitHub, but I have been unable to reproduce the sequence of events
> that lead to the corrupted HEAD (I think it may have been caused by a
> lost network connection in my case).

Yikes. That should never lead to a corrupted HEAD (unless you are on a
networked filesystem). I can't think of another reason add_head_to_pending
would fail, though, aside from a missing or corrupted object.

Arguably add_head_to_pending() should die loudly if it can't parse the
object pointed to by HEAD, rather than quietly returning without adding
anything.

> diff --git a/diff-lib.c b/diff-lib.c
> index 52447466b..9d26b18c3 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -512,7 +512,7 @@ int run_diff_index(struct rev_info *revs, int cached)
>   struct object_array_entry *ent;
>  
>   ent = revs->pending.objects;
> - if (diff_cache(revs, ent->item->oid.hash, ent->name, cached))
> + if (!ent || diff_cache(revs, ent->item->oid.hash, ent->name, cached))
>   exit(128);

So if I understand correctly, this hunk should not trigger anymore,
because we would never call run_diff_index() without something in
pending.

I think it would be a programming error elsewhere to do so, and we
should treat this as an assertion by complaining loudly (for this and
any other confusing cases):

  if (revs->pending.nr != 1)
          die("BUG: run_diff_index requires exactly one rev (got %d)",
              revs->pending.nr);

Lastly, I think this is your first patch. So welcome, and thank you. :)
I know there was a lot of discussion and critique above, but I think
overall your patch is going in the right direction.

-Peff



Re: [PATCH] pull: do not segfault when HEAD refers to missing object file

2017-03-05 Thread Johannes Sixt

Am 06.03.2017 um 04:51 schrieb Jeff King:

On Sun, Mar 05, 2017 at 11:52:22PM +, brian m. carlson wrote:


On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:

+test_expect_success 'git pull --rebase with corrupt HEAD does not segfault' '
+   mkdir corrupted &&
+   (cd corrupted &&


We usually indent this like so:

	(
		cd corrupted &&
		echo one >file &&
		git add file &&
		...
	) &&


+   git init &&
+   echo one >file && git add file &&
+   git commit -m one &&
+   REV=$(git rev-parse HEAD) &&
+   rm -f .git/objects/${REV:0:2}/${REV:2} &&


I think this is a bashism.  On dash, I get the following:

  genre ok % dash -c 'foo=abcdefg; echo ${foo:0:2}; echo ${foo:2}'
  dash: 1: Bad substitution


Yeah, it is. You can do it easily with 'sed', of course, but if you want
to avoid the extra process and do it in pure shell, it's more like:

  last38=${REV#??}
  first2=${REV%$last38}
  rm -f .git/objects/$first2/$last38


Is it "HEAD points to non-existent object" or ".git/HEAD contains junk"? 
In both cases there are simpler solutions than to remove an object. For 
example, `echo "$_x40" >.git/HEAD` or `echo "this is junk" >.git/HEAD`?


-- Hannes



[PATCH v2] git svn: fix authentication with 'branch'

2017-03-05 Thread Hiroshi Shirosaki
Authentication fails with 'git svn branch', while 'git svn rebase' and
'git svn dcommit' work fine without authentication failures.

$ git svn branch v7_3
Copying https://xxx at r27519
to https:///v7_3...
Can't create session: Unable to connect to a repository at URL
'https://': No more
credentials or we tried too many times.
Authentication failed at
C:\Program Files\Git\mingw64/libexec/git-core\git-svn line 1200.

We add auth configuration to SVN::Client->new() to fix the issue.

Signed-off-by: Hiroshi Shirosaki 
---
 git-svn.perl | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index fa42364..d240418 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1175,10 +1175,10 @@ sub cmd_branch {
::_req_svn();
require SVN::Client;
 
+   my ($config, $baton, undef) = Git::SVN::Ra::prepare_config_once();
my $ctx = SVN::Client->new(
-   config => SVN::Core::config_get_config(
-   $Git::SVN::Ra::config_dir
-   ),
+   auth => $baton,
+   config => $config,
log_msg => sub {
${ $_[0] } = defined $_message
? $_message
-- 
2.7.4



Re: Git download

2017-03-05 Thread Torsten Bögershausen

On 03/05/2017 09:26 PM, Cory Kilpatrick wrote:

I have downloaded Git and cannot find the application on my Mac. Should I try 
to download it again?


I don't think so.

It would be helpful if we could get some more information:

- Could you open the terminal application and type

      which git
      git --version

  and post the results here?

It may be worth mentioning that Git is a command line tool, so you may
not see anything in the "Applications" folder.



Re: [PATCH] pull: do not segfault when HEAD refers to missing object file

2017-03-05 Thread Jeff King
On Sun, Mar 05, 2017 at 11:52:22PM +, brian m. carlson wrote:

> On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:
> > +test_expect_success 'git pull --rebase with corrupt HEAD does not 
> > segfault' '
> > +   mkdir corrupted &&
> > +   (cd corrupted &&
> > +   git init &&
> > +   echo one >file && git add file &&
> > +   git commit -m one &&
> > +   REV=$(git rev-parse HEAD) &&
> > +   rm -f .git/objects/${REV:0:2}/${REV:2} &&
> 
> I think this is a bashism.  On dash, I get the following:
> 
>   genre ok % dash -c 'foo=abcdefg; echo ${foo:0:2}; echo ${foo:2}'
>   dash: 1: Bad substitution

Yeah, it is. You can do it easily with 'sed', of course, but if you want
to avoid the extra process and do it in pure shell, it's more like:

  last38=${REV#??}
  first2=${REV%$last38}
  rm -f .git/objects/$first2/$last38

-Peff


Re: Delta compression not so effective

2017-03-05 Thread Linus Torvalds
On Sat, Mar 4, 2017 at 12:27 AM, Marius Storm-Olsen  wrote:
>
> I reran the repack with the options above (dropping the zlib=9, as you
> suggested)
>
> $ time git -c pack.threads=4 repack -a -d -F \
>--window=350 --depth=250 --window-memory=30g
>
> and ended up with
> $ du -sh .
205G    .
>
> In other words, going from 6G to 30G window didn't help a lick on finding
> deltas for those binaries.

Ok.

> I did
> fprintf(stderr, "%s %u %lu\n",
>         sha1_to_hex(delta_list[i]->idx.sha1),
>         delta_list[i]->hash,
>         delta_list[i]->size);
>
> I assume that's correct?

Looks good.

> I've removed all commit messages, and "sanitized" some filepaths etc, so
> name hashes won't match what's reported, but that should be fine. (the
> object_entry->hash seems to be just a trivial uint32 hash for sorting
> anyways)

Yes. I see your name list and your pack-file index.

> BUT, if I look at the last 3 entries of the sorted git verify-pack output,
> and look for them in the 'git log --oneline --raw -R --abbrev=40' output, I
> get:
...
> while I cannot find ANY of them in the delta_list output?? \

Yes. You have a lot of object names in that log file you sent in
private that aren't in the delta list.

Now, we never even try to delta objects smaller than 50 bytes. I
can't see the object sizes when they don't show up in the delta list,
but looking at some of those filenames I'd expect them to not fall in
that category.

I guess you could do the printout a bit earlier (on the
"to_pack.objects[]" array - to_pack.nr_objects is the count there).
That should show all of them. But the small objects shouldn't matter.

But if you have a file like

   extern/win/FlammableV3/x64/lib/FlameProxyLibD.lib

I would have assumed that it has a size that is > 50. Unless those
"extern" things are placeholders?

> You might get an idea for how to easily create a repo which reproduces the
> issue, and which would highlight it more easily for the ML.

Looking at your sorted object list ready for packing, it doesn't look
horrible. When sorting for size, it still shows a lot of those large
files with the same name hash, so they sorted together in that form
too.

I do wonder if your dll data just simply is absolutely horrible for
xdelta. We've also limited the delta finding a bit, simply because it
had some O(m*n) behavior that gets very expensive on some patterns.
Maybe your blobs trigger some of those cases.

The diff-delta work all goes back to 2005 and 2006, so it's a long time ago.

What I'd ask you to do is to try to see if you could make a repository
of just one of the bigger DLLs with its history, particularly if you
can find one that you don't think is _that_ sensitive.

Looking at it, for example, I see that you have that file

   extern/redhat-5/FlammableV3/x64/plugins/libFlameCUDA-3.0.703.so

that seems to have changed several times, and is a largish blob. Could
you try creating a repository with git fast-import that *only*
contains that file (or pick another one), and see if that deltas
well?
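A low-tech way to build such a single-file repository, if fast-import
feels like overkill, is to replay each historical version of the file as
a fresh commit. This is only a sketch: SRC and FILE are placeholders,
and a real fast-import stream would preserve timestamps and authorship
where this does not.

```shell
# Placeholders -- point these at the real repository and path:
SRC=/path/to/big-repo
FILE=extern/redhat-5/FlammableV3/x64/plugins/libFlameCUDA-3.0.703.so

git init -q one-file-repo
for rev in $(git -C "$SRC" rev-list --reverse HEAD -- "$FILE")
do
        # Skip revisions where the file was deleted rather than changed.
        git -C "$SRC" cat-file -e "$rev:$FILE" 2>/dev/null || continue
        git -C "$SRC" cat-file blob "$rev:$FILE" >one-file-repo/blob
        git -C one-file-repo add blob
        git -C one-file-repo commit -q -m "import $rev"
done

# Then see how well the isolated history deltas:
git -C one-file-repo repack -adf --window=350 --depth=250
git -C one-file-repo count-objects -v
```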

And if you find some case that doesn't xdelta well, and that you feel
you could make available outside, we could have a test-case...

 Linus


Re: RFC: Another proposed hash function transition plan

2017-03-05 Thread brian m. carlson
On Sat, Mar 04, 2017 at 06:35:38PM -0800, Linus Torvalds wrote:
> On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder  wrote:
> >
> > This document is still in flux but I thought it best to send it out
> > early to start getting feedback.
> 
> This actually looks very reasonable if you can implement it cleanly
> enough. In many ways the "convert entirely to a new 256-bit hash" is
> the cleanest model, and interoperability was at least my personal
> concern. Maybe your model solves it (devil in the details), in which
> case I really like it.

If you think you can do it, I'm all for it.

> Btw, I do think the particular choice of hash should still be on the
> table. sha-256 may be the obvious first choice, but there are
> definitely a few reasons to consider alternatives, especially if it's
> a complete switch-over like this.
> 
> One is large-file behavior - a parallel (or tree) mode could improve
> on that noticeably. BLAKE2 does have special support for that, for
> example. And SHA-256 does have known attacks compared to SHA-3-256 or
> BLAKE2 - whether that is due to age or due to more effort, I can't
> really judge. But if we're switching away from SHA1 due to known
> attacks, it does feel like we should be careful.

I agree with Linus on this.  SHA-256 is the slowest option, and it's the
one with the most advanced cryptanalysis.  SHA-3-256 is faster on 64-bit
machines (which, as we've seen on the list, is the overwhelming majority
of machines using Git), and even BLAKE2b-256 is stronger.

Doing this all over again in another couple years should also be a
non-goal.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] pull: do not segfault when HEAD refers to missing object file

2017-03-05 Thread brian m. carlson
On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:
> +test_expect_success 'git pull --rebase with corrupt HEAD does not segfault' '
> + mkdir corrupted &&
> + (cd corrupted &&
> + git init &&
> + echo one >file && git add file &&
> + git commit -m one &&
> + REV=$(git rev-parse HEAD) &&
> + rm -f .git/objects/${REV:0:2}/${REV:2} &&

I think this is a bashism.  On dash, I get the following:

  genre ok % dash -c 'foo=abcdefg; echo ${foo:0:2}; echo ${foo:2}'
  dash: 1: Bad substitution

> + test_expect_code 128 git pull --rebase > /dev/null
> + )
> +'
> +
>  test_done
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204




Re: Transition plan for git to move to a new hash function

2017-03-05 Thread brian m. carlson
On Sun, Mar 05, 2017 at 01:45:46PM +, Ian Jackson wrote:
> brian m. carlson writes ("Re: Transition plan for git to move to a new hash 
> function"):
> > Instead, I was referring to areas like the notes code.  It has extensive
> > use of the last byte as a type of lookup table key.  It's very dependent
> > on having exactly one hash, since it will always want to use the last
> > byte.
> 
> You mean note_tree_search ?  (My tree here may be a bit out of date.)
> This doesn't seem difficult to fix.  The nontrivial changes would be
> mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.
> 
> It's true that like most of git there's a lot of hardcoded `sha1'.

I'm talking about the entire notes.c file.  There are several different
uses of "19" in there, and they represent at least two separate concepts.
My object-id-part9 series tries to split those out into logical
constants.

This code is not going to handle repositories with different-length
objects well, which I believe was your initial proposal.  I originally
thought that mixed-hash repositories would be viable as well, but I no
longer do.

> Are you arguing in favour of "replace git with git2 by simply
> s/20/64/g; s/sha1/blake/g" ?  This seems to me to be a poor idea.
> Takeup of the new `git2' would be very slow because of the pain
> involved.

I'm arguing that the same binary ought to be able to handle both SHA-1
and the new hash.  I'm also arguing that a given object have exactly one
hash and that we not mix hashes in the same object.  A repository will
be composed of one type of object, and if that's the new hash, a lookup
table will be used to translate SHA-1.  We can synthesize the old
objects, should we need them.

That allows people to use the SHA-1 hashes (in my view, with a prefix,
such as "sha1:") in repositories using the new hash.  It also allows
verifying old tags and commits if need be.

What I *would* like to see is an extension to the tag and commit objects
which names the hash that was used to make them.  That makes it easy to
determine which object the signature should be verified over, as it will
verify over only one of them.

> [1] I've heard suggestions here that instead we should expect users to
> "git1 fast-export", which you would presumably feed into "git2
> fast-import".  But what is `git1' here ?  Is it the current git
> codebase frozen in time ?  I don't think it can be.  With this
> conversion strategy, we will need to maintain git1 for decades.  It
> will need portability fixes, security fixes, fixes for new hostile
> compiler optimisations, and so on.  The difficulty of conversion means
> there will be pressure to backport new features from `git2' to `git1'.
> (Also this approach means that all signatures are definitively lost
> during the conversion process.)

I'm proposing we have a git hash-convert (the name doesn't matter that
much) that converts in place.  It rebuilds the objects and builds a
lookup table.  Since the contents of git objects are deterministic, this
makes it possible for each individual user to make the transition in
place.
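The determinism relied on here is easy to see from how object names are
computed: a name is just a hash over the type- and size-prefixed
contents, so rebuilding with a different hash function renames every
object in a reproducible way. For the empty blob (this is the standard
object-id computation, not code from the thread):

```shell
# Git names an object by hashing "<type> <size>\0<contents>"; the empty
# blob is therefore the hash of the 7 bytes "blob 0\0".
printf 'blob 0\0' | sha1sum | cut -d' ' -f1
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 -- the familiar empty-blob name

# The same contents under SHA-256 get a different, 64-hex-digit name,
# which is why an in-place conversion needs the lookup table described
# above to translate between old and new names.
printf 'blob 0\0' | sha256sum | cut -d' ' -f1
```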
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204




[PATCH] pull: do not segfault when HEAD refers to missing object file

2017-03-05 Thread André Laszlo
git pull --rebase on a corrupted HEAD used to segfault; it has been
corrected to error out with a message. A test has also been added to
verify this fix.

Signed-off-by: André Laszlo 
---

Notes:
When add_head_to_pending fails to add a pending object, git pull
--rebase segfaults. This can happen if HEAD is referring to a corrupt
or missing object.

I discovered this segfault on my machine after pulling a repo from
GitHub, but I have been unable to reproduce the sequence of events
that lead to the corrupted HEAD (I think it may have been caused by a
lost network connection in my case).

The following commands use add_head_to_pending:

format-patch  setup_revisions before add_head_to_pending
diff          checks rev.pending.nr
shortlog      checks rev.pending.nr
log           uses resolve_ref_unsafe

All of the above return an error code of 128 and print "fatal: bad
object HEAD" instead of segfaulting, which I think is correct
behavior. The check and error message have been added to
has_uncommitted_changes, where they were missing, as well as to
diff-lib.c (without the error message).

 diff-lib.c  |  2 +-
 t/t5520-pull.sh | 12 
 wt-status.c |  5 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/diff-lib.c b/diff-lib.c
index 52447466b..9d26b18c3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -512,7 +512,7 @@ int run_diff_index(struct rev_info *revs, int cached)
struct object_array_entry *ent;
 
ent = revs->pending.objects;
-   if (diff_cache(revs, ent->item->oid.hash, ent->name, cached))
+   if (!ent || diff_cache(revs, ent->item->oid.hash, ent->name, cached))
exit(128);
 
diff_set_mnemonic_prefix(&revs->diffopt, "c/", cached ? "i/" : "w/");
diff --git a/t/t5520-pull.sh b/t/t5520-pull.sh
index 17f4d0fe4..1edb6a97a 100755
--- a/t/t5520-pull.sh
+++ b/t/t5520-pull.sh
@@ -664,4 +664,16 @@ test_expect_success 'git pull --rebase against local branch' '
test file = "$(cat file2)"
 '
 
+test_expect_success 'git pull --rebase with corrupt HEAD does not segfault' '
+   mkdir corrupted &&
+   (cd corrupted &&
+   git init &&
+   echo one >file && git add file &&
+   git commit -m one &&
+   REV=$(git rev-parse HEAD) &&
+   rm -f .git/objects/${REV:0:2}/${REV:2} &&
+   test_expect_code 128 git pull --rebase > /dev/null
+   )
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index d47012048..3d60eaff5 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -2252,6 +2252,11 @@ int has_uncommitted_changes(int ignore_submodules)
DIFF_OPT_SET(&rev_info.diffopt, IGNORE_SUBMODULES);
DIFF_OPT_SET(&rev_info.diffopt, QUICK);
add_head_to_pending(&rev_info);
+
+   /* The add_head_to_pending call might not have added anything. */
+   if (!rev_info.pending.nr)
+   die("bad object %s", "HEAD");
+
diff_setup_done(&rev_info.diffopt);
result = run_diff_index(&rev_info, 1);
return diff_result_code(&rev_info.diffopt, result);
-- 
2.12.0



Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-05 Thread Ramsay Jones


On 05/03/17 17:38, Lars Schneider wrote:
>> On 02 Mar 2017, at 16:17, Ramsay Jones  wrote:
>> On 02/03/17 11:24, Johannes Schindelin wrote:
>>> On Thu, 2 Mar 2017, Lars Schneider wrote:
>>>
>> [snip]
>>>> One thing that still bugs me: In the Linux32 environment prove adds the
>>>> CPU times to every test run: ( 0.02 usr  0.00 sys +  0.00 cusr  0.00
>>>> csys ...  Has anyone an idea why that happens and how we can disable it?
>>>
>>> I have no idea.
>>
>> I have no idea either, but it is not unique to this 32bit Linux, but
>> rather the version of prove. For example, I am seeing this on Linux
>> Mint 18.1 (64bit _and_ 32bit), whereas Linux Mint 17.x did not do
>> this. (They used different Ubuntu LTS releases).
>>
>> [Mint 18.1 'prove --version' says: TAP::Harness v3.35 and Perl v5.22.1]
> 
> I think I found it. It was introduced in TAP::Harness v3.34:
> https://github.com/Perl-Toolchain-Gang/Test-Harness/commit/66cbf6355928b4828db517a99f1099b7fed35e90
> 
> ... and it is enabled with the "--timer" switch.

Yep, that looks like it.

When I updated to Mint 18, this broke a perl script of mine, so I had
a quick look to see what I could do to suppress it. The man page seemed
to imply that you could replace the output formatter, but that didn't
take me too far (search CPAN for TAP::Parser::Formatter: ;-) ). I suppose
you could replace Tap::Formatter::Base, or some such, but I didn't need
to go that far - I simply changed a couple of regex-es to ignore the
excess output! :-P

Do you really need to suppress that timing information or, like me, can
you simply ignore it?
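For what it's worth, if ignoring it is not an option, filtering it out
is a one-liner; the regex below is a guess based on the output quoted
earlier in the thread, so adjust it to what your prove actually prints:

```shell
# Sample prove output line with the TAP::Harness >= 3.34 timing suffix:
line='t0000-basic.sh .. ok ( 0.02 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.02 CPU)'

# Strip everything from the "( ... usr" group to the closing "CPU)":
echo "$line" | sed 's/ *( *[0-9.]* usr.*CPU)$//'
```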

ATB,
Ramsay Jones




Git download

2017-03-05 Thread Cory Kilpatrick
I have downloaded Git and cannot find the application on my Mac. Should I try 
to download it again? 

[PATCH v3] Travis: also test on 32-bit Linux

2017-03-05 Thread Lars Schneider
From: Johannes Schindelin 

When Git v2.9.1 was released, it had a bug that showed only on Windows
and on 32-bit systems: our assumption that `unsigned long` can hold
64-bit values turned out to be wrong.

This could have been caught earlier if we had a Continuous Testing
set up that includes a build and test run on 32-bit Linux.

Let's do this (and take care of the Windows build later). This patch
asks Travis CI to install a Docker image with 32-bit libraries and then
goes on to build and test Git using this 32-bit setup.

Signed-off-by: Johannes Schindelin 
Signed-off-by: Lars Schneider 
---

Hi,

when I looked into the 32-bit/line-log issue [1], I realized that my
proposed docker setup is not ideal for local debugging. Here is an
approach that I think is better. I changed the following:
- disable sudo as it is not required for the Travis job
- keep all docker commands in the .travis.yml
- add option to run commands inside docker with the same UID as the
  host user to make output files accessible
- pass environment variables directly to the docker container

Sorry for the back and forth.

Cheers,
Lars


[1] http://public-inbox.org/git/2205f1a7-a694-4f40-b994-d68c3947f...@gmail.com/


Notes:
Base Ref: master
Web-Diff: https://github.com/larsxschneider/git/commit/a6fe1def16
Checkout: git fetch https://github.com/larsxschneider/git travisci/linux32-v3 && git checkout a6fe1def16

Interdiff (v2..v3):

diff --git a/.travis.yml b/.travis.yml
index fd60fd8328..591cc57b80 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -41,13 +41,25 @@ matrix:
   include:
 - env: Linux32
   os: linux
-  sudo: required
   services:
 - docker
   before_install:
 - docker pull daald/ubuntu32:xenial
   before_script:
-  script: ci/run-linux32-build.sh daald/ubuntu32:xenial
+  script:
+- >
+  docker run
+  --interactive
+  --env DEFAULT_TEST_TARGET
+  --env GIT_PROVE_OPTS
+  --env GIT_TEST_OPTS
+  --env GIT_TEST_CLONE_2GB
+  --volume "${PWD}:/usr/src/git"
+  daald/ubuntu32:xenial
+  /usr/src/git/ci/run-linux32-build.sh $(id -u $USER)
+# Use the following command to debug the docker build locally:
+# $ docker run -itv "${PWD}:/usr/src/git" --entrypoint /bin/bash daald/ubuntu32:xenial
+# root@container:/# /usr/src/git/ci/run-linux32-build.sh
 - env: Documentation
   os: linux
   compiler: clang
diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index 13c184d41c..f7a3434985 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -1,31 +1,30 @@
 #!/bin/sh
 #
-# Build and test Git in a docker container running a 32-bit Ubuntu Linux
+# Build and test Git in a 32-bit environment
 #
 # Usage:
-#   run-linux32-build.sh [container-image]
+#   run-linux32-build.sh [host-user-id]
 #

-CONTAINER="${1:-daald/ubuntu32:xenial}"
-
-sudo docker run --interactive --volume "${PWD}:/usr/src/git" "$CONTAINER" \
-/bin/bash -c 'linux32 --32bit i386 sh -c "
-: update packages &&
+# Update packages to the latest available versions
+linux32 --32bit i386 sh -c '
 apt update >/dev/null &&
 apt install -y build-essential libcurl4-openssl-dev libssl-dev \
-libexpat-dev gettext python >/dev/null &&
+libexpat-dev gettext python >/dev/null
+' &&
+
+# If this script runs inside a docker container, then all commands are
+# usually executed as root. Consequently, the host user might not be
+# able to access the test output files.
+# If a host user id is given, then create a user "ci" with the host user
+# id to make everything accessible to the host user.
+HOST_UID=$1 &&
+CI_USER=$USER &&
+test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER) &&

-: build and test &&
+# Build and test
+linux32 --32bit i386 su -m -l $CI_USER -c '
 cd /usr/src/git &&
-export DEFAULT_TEST_TARGET='$DEFAULT_TEST_TARGET' &&
-export GIT_PROVE_OPTS=\"'"$GIT_PROVE_OPTS"'\" &&
-export GIT_TEST_OPTS=\"'"$GIT_TEST_OPTS"'\" &&
-export GIT_TEST_CLONE_2GB='$GIT_TEST_CLONE_2GB' &&
 make --jobs=2 &&
-make --quiet test || (
-
-: make test-results readable to non-root user on TravisCI &&
-test '$TRAVIS' &&
-find t/test-results/ -type f -exec chmod o+r {} \; &&
-false )
-"'
+make --quiet test
+'


 .travis.yml | 21 +
 ci/run-linux32-build.sh | 30 ++
 2 files changed, 51 insertions(+)
 create mode 100755 ci/run-linux32-build.sh

diff --git a/.travis.yml b/.travis.yml
index 9c63c8c3f6..591c

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-05 Thread Lars Schneider

> On 02 Mar 2017, at 16:17, Ramsay Jones  wrote:
> 
> 
> 
> On 02/03/17 11:24, Johannes Schindelin wrote:
>> Hi Lars,
>> 
>> On Thu, 2 Mar 2017, Lars Schneider wrote:
>> 
> [snip]
>>> One thing that still bugs me: In the Linux32 environment prove adds the
>>> CPU times to every test run: ( 0.02 usr  0.00 sys +  0.00 cusr  0.00
>>> csys ...  Has anyone an idea why that happens and how we can disable it?
>> 
>> I have no idea.
> 
> I have no idea either, but it is not unique to this 32bit Linux, but
> rather the version of prove. For example, I am seeing this on Linux
> Mint 18.1 (64bit _and_ 32bit), whereas Linux Mint 17.x did not do
> this. (They used different Ubuntu LTS releases).
> 
> [Mint 18.1 'prove --version' says: TAP::Harness v3.35 and Perl v5.22.1]

I think I found it. It was introduced in TAP::Harness v3.34:
https://github.com/Perl-Toolchain-Gang/Test-Harness/commit/66cbf6355928b4828db517a99f1099b7fed35e90

... and it is enabled with the "--timer" switch.

- Lars

difflame improvements

2017-03-05 Thread Edmundo Carmona Antoranz
Hi!

Since my last post the biggest improvement is the ability to detect
that the user has requested a "reverse" analysis.

Under "normal" circumstances a user would ask difflame to get the diff
from an ancestor (call "difflame treeish1 treeish2" so that merge-base
of treeish1 treeish2 equals treeish1). In this case the blame result
is done using straight blame output for added lines and additional
analysis to detect where a line was deleted (the analysis has improved a
lot in this regard; I haven't heard anything from Peff, though).
But if the user requests the opposite (call "difflame treeish1
treeish2" so that merge-base of treeish1 treeish2 is treeish2) then
the analysis has to be driven "in reverse".
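The detection itself boils down to two plumbing calls; here is a shell
sketch of the logic (difflame itself is Python, and the messages printed
below are mine, not difflame's output):

```shell
# Decide analysis direction from the merge base of the two endpoints.
t1=HEAD~3
t2=HEAD~2

base=$(git merge-base "$t1" "$t2")
if test "$base" = "$(git rev-parse "$t1^{commit}")"
then
        echo "forward: blame the added lines in $t1..$t2"
elif test "$base" = "$(git rev-parse "$t2^{commit}")"
then
        echo "reverse: run the analysis over $t2..$t1 instead"
else
        echo "neither endpoint is the merge base (criss-cross history)"
fi
```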

Here's one example taken from difflame itself:

normal "forward" call (hope output doesn't get butchered):

$ ./difflame.py HEAD~3 HEAD~2
diff --git a/difflame.py b/difflame.py
index e70154a..04c7577 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
 e5b218e4 (Edmundo 2017-02-01 365) # we already had the revision
 50528377 (Edmundo 2017-03-04 366) return REVISIONS_ID_CACHE[revision]
 d1d11d8a (Edmundo 2017-02-02 367) # fallback to get it from git
   b1a6693 use rev-list to get revision IDs
-b1a6693 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["show", "--pretty=%H", revision]).split("\n")[0]
+b1a66932 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["rev-list", "--max-count=1",
revision]).split("\n")[0]
 50528377 (Edmundo 2017-03-04 369) REVISIONS_ID_CACHE[revision] =
full_revision
 e5b218e4 (Edmundo 2017-02-01 370) return full_revision
 91b7d3f5 (Edmundo 2017-01-31 371)

"reverse" call:
$ ./difflame.py HEAD~2 HEAD~3
diff --git a/difflame.py b/difflame.py
index 04c7577..e70154a 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
 e5b218e4 (Edmundo 2017-02-01 365) # we already had the revision
 50528377 (Edmundo 2017-03-04 366) return REVISIONS_ID_CACHE[revision]
 d1d11d8a (Edmundo 2017-02-02 367) # fallback to get it from git
   b1a6693 use rev-list to get revision IDs
-b1a66932 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["rev-list", "--max-count=1",
revision]).split("\n")[0]
+b1a6693 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["show", "--pretty=%H", revision]).split("\n")[0]
 50528377 (Edmundo 2017-03-04 369) REVISIONS_ID_CACHE[revision] =
full_revision
 e5b218e4 (Edmundo 2017-02-01 370) return full_revision
 91b7d3f5 (Edmundo 2017-01-31 371)

Notice how the revision reported in both difflame calls is the same:

$ git show b1a66932
commit b1a66932704245fd653f8d48c0a718f168f334a7
Author: Edmundo Carmona Antoranz 
Date:   Sat Mar 4 13:59:50 2017 -0600

   use rev-list to get revision IDs

diff --git a/difflame.py b/difflame.py
index e70154a..04c7577 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
# we already had the revision
return REVISIONS_ID_CACHE[revision]
# fallback to get it from git
-full_revision = run_git_command(["show", "--pretty=%H", revision]).split("\n")[0]
+full_revision = run_git_command(["rev-list", "--max-count=1", revision]).split("\n")[0]
REVISIONS_ID_CACHE[revision] = full_revision
return full_revision


If this "detection" to perform the reverse analysis hadn't been done,
there wouldn't be much useful information, because there are no
revisions in HEAD~2..HEAD~3, and so the output would have been
something like:

diff --git a/difflame.py b/difflame.py
index 04c7577..e70154a 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
 e5b218e4 (Edmundo 2017-02-01 365) # we already had the revision
 50528377 (Edmundo 2017-03-04 366) return REVISIONS_ID_CACHE[revision]
 d1d11d8a (Edmundo 2017-02-02 367) # fallback to get it from git
   b1a6693 use rev-list to get revision IDs
%b1a6693 (Edmundo 2017-03-04 368) full_revision = run_git_command(["rev-list", "--max-count=1", revision]).split("\n")[0]
   e5b218e printing hints for deleted lines
+e5b218e4 (Edmundo 2017-02-01 368) full_revision = run_git_command(["show", "--pretty=%H", revision]).split("\n")[0]
 50528377 (Edmundo 2017-03-04 369) REVISIONS_ID_CACHE[revision] = full_revision
 e5b218e4 (Edmundo 2017-02-01 370) return full_revision
 91b7d3f5 (Edmundo 2017-01-31 371)

Notice how both the added line and the deleted line are reporting the
_wrong_ revision. It should be b1a66932 in all cases.


One question that has been bugging me for a while is what to do in
cases where treeish1 and treeish2 are not "direct" descendants (as in
merge-base treeish1 treeish2 is something other than treeish1 or
treeish2). Suppose a line was added on an ancestor of treeish1 but it
hasn't been merged into treeish2. In this case, if we diff
treeish1..treeish2 we will get

Re: What's cooking in git.git (Mar 2017, #02; Fri, 3)

2017-03-05 Thread Pranit Bauva
Hey Stephan

On Sat, Mar 4, 2017 at 8:05 PM, Stephan Beyer  wrote:
> Hi Pranit,
>
> On 03/04/2017 12:26 AM, Junio C Hamano wrote:
>> --
>> [Stalled]
> [...]
>>
>> * pb/bisect (2017-02-18) 28 commits
>>  - fixup! bisect--helper: `bisect_next_check` & bisect_voc shell function in C
>>  - bisect--helper: remove the dequote in bisect_start()
>>  - bisect--helper: retire `--bisect-auto-next` subcommand
>>  - bisect--helper: retire `--bisect-autostart` subcommand
>>  - bisect--helper: retire `--bisect-write` subcommand
>>  - bisect--helper: `bisect_replay` shell function in C
>>  - bisect--helper: `bisect_log` shell function in C
>>  - bisect--helper: retire `--write-terms` subcommand
>>  - bisect--helper: retire `--check-expected-revs` subcommand
>>  - bisect--helper: `bisect_state` & `bisect_head` shell function in C
>>  - bisect--helper: `bisect_autostart` shell function in C
>>  - bisect--helper: retire `--next-all` subcommand
>>  - bisect--helper: retire `--bisect-clean-state` subcommand
>>  - bisect--helper: `bisect_next` and `bisect_auto_next` shell function in C
>>  - t6030: no cleanup with bad merge base
>>  - bisect--helper: `bisect_start` shell function partially in C
>>  - bisect--helper: `get_terms` & `bisect_terms` shell function in C
>>  - bisect--helper: `bisect_next_check` & bisect_voc shell function in C
>>  - bisect--helper: `check_and_set_terms` shell function in C
>>  - bisect--helper: `bisect_write` shell function in C
>>  - bisect--helper: `is_expected_rev` & `check_expected_revs` shell function in C
>>  - bisect--helper: `bisect_reset` shell function in C
>>  - wrapper: move is_empty_file() and rename it as is_empty_or_missing_file()
>>  - t6030: explicitly test for bisection cleanup
>>  - bisect--helper: `bisect_clean_state` shell function in C
>>  - bisect--helper: `write_terms` shell function in C
>>  - bisect: rewrite `check_term_format` shell function in C
>>  - bisect--helper: use OPT_CMDMODE instead of OPT_BOOL
>>
>>  Move more parts of "git bisect" to C.
>>
>>  Expecting a reroll.
>
> I guess you are short on time but I am hoping that you are still going
> to send a reroll to the list (probably on top of [1]?). This is because
> I would like to start to work on rerolling my bisect patches from last
> year [2] ... but to avoid a mess of merge conflicts, I am waiting until
> pb/bisect found its way into "next". (There were also recent discussions
> on other bisect strategies [3] and it's probably only a matter of time
> until a new big patchset on bisect--helper comes up...)

I am sorry I haven't found much time for this. I came across a bug
that I haven't been able to fix, so I had stopped working on it for a
while. I almost forgot that you also have a patch series and that this
series is important for you. I will start working on this and send a
re-roll soon.

Regards,
Pranit Bauva


Re: bisect-helper: we do not bisect --objects

2017-03-05 Thread Philip Oakley

From: "Junio C Hamano" 

"Philip Oakley"  writes:


Bikeshedding: If the given boundary is a tag, it could be tagging a
blob or tree rather than a commit. Would that be a scenario that
reaches this part of the code?


I do not think it is relevant.

Bisection is an operation over a "bisectable" commit DAG, where
commits can be partitioned into "new" (aka "bad") and "old" (aka
"good") camp, all descendants of a "new" commit are all "new", all
ancestors of an "old" commit are all "old".  Think of the "new"-ness
as a 100% inheritable disease that happens spontaneously and without
cure.  Once you are infected with "new"-ness, all your descendants
are forever "new".  If you know you are free of the "new"-ness, you
know that all your ancestors are not with the "new"-ness, either.

The goal of the operation is to find a "new" commit whose parents
are all "old".

The bisectability of the commit DAG is what allows you to say "this
commit is new" to a commit somewhere in the middle of the history
and avoid having to test any descendants, as they all inherit the
"new"-ness from it (similarly when you have one commit that is
"old", you do not have to test any ancestor), thereby reducing the
number of tests from N (all commits in the good..bad range) to log(N).

There is no room for a tree or a blob to participate in this graph
partitioning problem.  A "bad" tree that is "new" cannot infect its
children with the "new"-ness and a "good" tree cannot guarantee the
lack of "new"-ness of its parents, because a tree or a blob does not
have parent or child commits.


Thanks.

I was happy with how the bisect actually works. I was more responding to
the open question about how that piece of code may have come into existence,
and the potential thought processes that can lead to such 'mistakes'.


My line of reasoning was that it is reasonable to pass both commits and
tags, as revisions(7), to the cli as the bad and good points in the
graph. This could then lead to an expectation that the objects they point
to should be suitably marked, which is quite reasonable for commits, and
for most tags.


However, there are tags that point to the commit's tree, rather than the
commit itself, so if that initial rule were followed in a simplistic
manner, then (falsely) the peeled tree of the tag would be marked as
good/bad, which, as your patch identifies, would be the wrong thing to do.


The study of human error is quite interesting.

regards

Philip




Re: Transition plan for git to move to a new hash function

2017-03-05 Thread Ian Jackson
brian m. carlson writes ("Re: Transition plan for git to move to a new hash function"):
> Instead, I was referring to areas like the notes code.  It has extensive
> use of the last byte as a type of lookup table key.  It's very dependent
> on having exactly one hash, since it will always want to use the last
> byte.

You mean note_tree_search ?  (My tree here may be a bit out of date.)
This doesn't seem difficult to fix.  The nontrivial changes would be
mostly confined to SUBTREE_SHA1_PREFIXCMP and GET_NIBBLE.

It's true that like most of git there's a lot of hardcoded `sha1'.


Are you arguing in favour of "replace git with git2 by simply
s/20/64/g; s/sha1/blake/g" ?  This seems to me to be a poor idea.
Takeup of the new `git2' would be very slow because of the pain
involved.

Any sensible method of moving to a new hash that isn't "make a
completely incompatible new version of git" is going to involve
teaching the code we have in git right now to handle new hashes as
well as sha1 hashes.

Even if the plan is to try to convert old data, rather than keep it
and be able to refer to it from new data, something will have to be
able to parse old packfiles, old commits, old tags, old notes,
etc. etc. etc.  Either that's going to be some separate conversion
utility, or it has to be the same code in git that's there already.[1]

The ability to handle both old-format and new-format data can be
achieved in the code by doing away with the hardcoded sha1s, so that
instead the hash is an abstract data type with operations like
"initialise", "compare", "get a nybble", etc.  We've already seen
patches going in this direction.

[1] I've heard suggestions here that instead we should expect users to
"git1 fast-export", which you would presumably feed into "git2
fast-import".  But what is `git1' here ?  Is it the current git
codebase frozen in time ?  I don't think it can be.  With this
conversion strategy, we will need to maintain git1 for decades.  It
will need portability fixes, security fixes, fixes for new hostile
compiler optimisations, and so on.  The difficulty of conversion means
there will be pressure to backport new features from `git2' to `git1'.
(Also this approach means that all signatures are definitively lost
during the conversion process.)

So if we want to provide both `git1' and `git2', it's still better to
compile `git' and `git2' from the same codebase.  But if we do that,
the resulting ifdeffery and/or other hash abstractions are most of the
work to be hash-agile.  It's just the difference between a
compile-time and runtime switch.

I think the incompatible approach is much more work in the medium and
long term - and it leads to a longer transition period.


Bear in mind that our objective is not to minimise the time until the
new version of git is available.  Our objective is to minimise the
time until (most) people are using it.  An approach which takes longer
for the git community to develop, but which is easier to deploy, can
easily be better.

Or maybe the objective is to minimise overall effort.  In which case
more work on git, for an easier transition for all the users, seems
like a no-brainer.  I think this is arguably true even from the point
of view of effort amongst the community of git contributors.  git
contributors start out as git users - and if git's users are all busy
struggling with a difficult transition, they will have less time to
improve other stuff and will tend less to get involved upstream.  (And
they may be less inclined to feel that the git upstream developers
understand their needs well.)

The better alternative is to adopt a plan that has a clear and
straightforward transition for users, and ask git users to help with
implementation.

I think many git users, including sophisticated users and competent
organisations, are concerned about sha1.  Currently most of those
users will find it difficult to help, because it's not clear to them
what needs to be done.

Thanks,
Ian.


Re: [PATCH] line-log: use COPY_ARRAY to fix mis-sized memcpy

2017-03-05 Thread Vegard Nossum

On 05/03/2017 12:44, Jeff King wrote:

On Sun, Mar 05, 2017 at 06:36:19AM -0500, Jeff King wrote:


range_set_init(dst, src->nr);
-   memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range_set));
+   memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range));


I think "sizeof(*dst->ranges)" is probably an even better fix, as it
infers the type of "dst". But these days we have COPY_ARRAY() to make it
even harder to get this kind of thing wrong.


So here's your fix wrapped up with a commit message, mostly for Junio's
convenience. I listed you as the author, since you did the hard part. If
you're OK with it, please indicate that it's OK to add your
signed-off-by. If you prefer to do it differently, feel free to post
your own patch.


Thanks.



-- >8 --
From: Vegard Nossum 
Subject: [PATCH] line-log: use COPY_ARRAY to fix mis-sized memcpy

This memcpy meant to get the sizeof a "struct range", not a
"range_set", as the former is what our array holds. Rather
than swap out the types, let's convert this site to
COPY_ARRAY, which avoids the problem entirely (and confirms
that the src and dst types match).

Note for curiosity's sake that this bug doesn't trigger on
I32LP64 systems, but does on ILP32 systems. The mistaken
"struct range_set" has two ints and a pointer. That's 16
bytes on LP64, or 12 on ILP32. The correct "struct range"
type has two longs, which is also 16 on LP64, but only 8 on
ILP32.

Likewise an IL32P64 system would experience the bug.

Signed-off-by: Jeff King 


Signed-off-by: Vegard Nossum 


---
 line-log.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/line-log.c b/line-log.c
index 65f3558b3..a2477f601 100644
--- a/line-log.c
+++ b/line-log.c
@@ -43,9 +43,10 @@ void range_set_release(struct range_set *rs)
 static void range_set_copy(struct range_set *dst, struct range_set *src)
 {
range_set_init(dst, src->nr);
-   memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range_set));
+   COPY_ARRAY(dst->ranges, src->ranges, src->nr);
dst->nr = src->nr;
 }
+
 static void range_set_move(struct range_set *dst, struct range_set *src)
 {
range_set_release(dst);




Vegard


Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-05 Thread Jeff King
On Sun, Mar 05, 2017 at 06:36:19AM -0500, Jeff King wrote:

> I grepped for 'memcpy.*sizeof' and found one other case that's not a
> bug, but is questionable.

And here's the fix for that case. It can be applied separately from the
other patch if need be.

-- >8 --
Subject: [PATCH] ewah: fix eword_t/uint64_t confusion

The ewah subsystem typedefs eword_t to be uint64_t, but some
code uses a bare uint64_t. This isn't a bug now, but it's a
potential maintenance problem if the definition of eword_t
ever changes. Let's use the correct type.

Note that we can't use COPY_ARRAY() here because the source
and destination point to objects of different sizes. For
that reason we'll also skip the usual "sizeof(*dst)" and use
the real type, which should make it more clear that there's
something tricky going on.

Signed-off-by: Jeff King 
---
 ewah/ewah_io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index 61f6a4357..f73210973 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -142,8 +142,8 @@ int ewah_read_mmap(struct ewah_bitmap *self, const void *map, size_t len)
 * the endianness conversion in a separate pass to ensure
 * we're loading 8-byte aligned words.
 */
-   memcpy(self->buffer, ptr, self->buffer_size * sizeof(uint64_t));
-   ptr += self->buffer_size * sizeof(uint64_t);
+   memcpy(self->buffer, ptr, self->buffer_size * sizeof(eword_t));
+   ptr += self->buffer_size * sizeof(eword_t);
 
for (i = 0; i < self->buffer_size; ++i)
self->buffer[i] = ntohll(self->buffer[i]);
-- 
2.12.0.426.g9d5d0eeae



[PATCH] line-log: use COPY_ARRAY to fix mis-sized memcpy

2017-03-05 Thread Jeff King
On Sun, Mar 05, 2017 at 06:36:19AM -0500, Jeff King wrote:

> > range_set_init(dst, src->nr);
> > -   memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range_set));
> > +   memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range));
> 
> I think "sizeof(*dst->ranges)" is probably an even better fix, as it
> infers the type of "dst". But these days we have COPY_ARRAY() to make it
> even harder to get this kind of thing wrong.

So here's your fix wrapped up with a commit message, mostly for Junio's
convenience. I listed you as the author, since you did the hard part. If
you're OK with it, please indicate that it's OK to add your
signed-off-by. If you prefer to do it differently, feel free to post
your own patch.

-- >8 --
From: Vegard Nossum 
Subject: [PATCH] line-log: use COPY_ARRAY to fix mis-sized memcpy

This memcpy meant to get the sizeof a "struct range", not a
"range_set", as the former is what our array holds. Rather
than swap out the types, let's convert this site to
COPY_ARRAY, which avoids the problem entirely (and confirms
that the src and dst types match).

Note for curiosity's sake that this bug doesn't trigger on
I32LP64 systems, but does on ILP32 systems. The mistaken
"struct range_set" has two ints and a pointer. That's 16
bytes on LP64, or 12 on ILP32. The correct "struct range"
type has two longs, which is also 16 on LP64, but only 8 on
ILP32.

Likewise an IL32P64 system would experience the bug.

Signed-off-by: Jeff King 
---
 line-log.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/line-log.c b/line-log.c
index 65f3558b3..a2477f601 100644
--- a/line-log.c
+++ b/line-log.c
@@ -43,9 +43,10 @@ void range_set_release(struct range_set *rs)
 static void range_set_copy(struct range_set *dst, struct range_set *src)
 {
range_set_init(dst, src->nr);
-   memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range_set));
+   COPY_ARRAY(dst->ranges, src->ranges, src->nr);
dst->nr = src->nr;
 }
+
 static void range_set_move(struct range_set *dst, struct range_set *src)
 {
range_set_release(dst);
-- 
2.12.0.426.g9d5d0eeae




Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-05 Thread Jeff King
On Sat, Mar 04, 2017 at 09:08:40PM +0100, Vegard Nossum wrote:

> > At a glance, looks like range_set_copy() is using
> > sizeof(struct range_set) == 12, but
> > range_set_init/range_set_grow/ALLOC_GROW/REALLOC_ARRAY is using
> > sizeof(rs->range) == 8.
> 
> Attached patch seems to fix it -- basically, range_set_copy() is trying
> to copy more than it should. It was uncovered with the test case from
> Allan's commit because it's creating enough ranges to overflow the
> initial allocation on 32-bit.

Ugh, yeah, that is definitely a bug.

> diff --git a/line-log.c b/line-log.c
> index 951029665..cb0dc1110 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -43,7 +43,7 @@ void range_set_release(struct range_set *rs)
>  static void range_set_copy(struct range_set *dst, struct range_set *src)
>  {
>   range_set_init(dst, src->nr);
> - memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range_set));
> + memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range));

I think "sizeof(*dst->ranges)" is probably an even better fix, as it
infers the type of "dst". But these days we have COPY_ARRAY() to make it
even harder to get this kind of thing wrong.

I grepped for 'memcpy.*sizeof' and found one other case that's not a
bug, but is questionable.

Of the "good" cases, I think most of them could be converted into
something more obviously-correct, which would make auditing easier. The
three main cases I saw were:

  1. Ones which can probably be converted to COPY_ARRAY().

  2. Ones which just copy a single object, like:

   memcpy(&dst, &src, sizeof(dst));

 Perhaps we should be using struct assignment like:

   dst = src;

 here. It's safer and it should give the compiler more room to
 optimize. The only downside is that if you have pointers, it is
 easy to write "dst = src" when you meant "*dst = *src".

  3. There were a number of alloc-and-copy instances. The copy part is
 the same as (2) above, but you have to repeat the size, which is
 potentially error-prone. I wonder if we would want something like:

   #define ALLOC_COPY(dst, src) do { \
 (dst) = xmalloc(sizeof(*(dst))); \
 COPY_ARRAY(dst, src, 1); \
   } while(0)

 That avoids having to specify the size at all, and triggers a
 compile-time error if "src" and "dst" point to objects of different
 sizes.

 I suspect our friendly neighborhood coccinelle wizards could cook
 up a conversion.

-Peff


Re: RFC: Another proposed hash function transition plan

2017-03-05 Thread David Lang

Translation table
~~~~~~~~~~~~~~~~~
A fast bidirectional mapping between sha1-names and sha256-names of
all local objects in the repository is kept on disk. The exact format
of that mapping is to be determined.

All operations that make new objects (e.g., "git commit") add the new
objects to the translation table.


This seems like a rather nontrivial thing to design. It will need to hold 
millions of mappings, and be quickly searchable from either direction (sha1->new 
and new->sha1) while still being fairly fast to insert new records into.


For Linux, just the list of hashes recording the commits is going to be in the 
millions, while the list of hashes of individual files for all those commits is 
going to be substantially larger.


David Lang