Re: suboptimal behavior of fast-import in some cases with "from"

2015-07-08 Thread Mike Hommey
On Thu, Jul 09, 2015 at 02:03:15PM +0900, Mike Hommey wrote:
> On Mon, Jul 06, 2015 at 03:54:35PM -0700, Junio C Hamano wrote:
> > Mike Hommey  writes:
> > 
> > > One of the first things parse_from does is unconditionally throw away
> > > the tree for the given branch, and then the "from" tree is loaded. So
> > > when the "from" commit is the current head of the branch, that make
> > > fast-import do more work than necessary.
> > 
> > If it is very common that the next commit the input stream wants to
> > create is often on top of the commit that was just created, and if
> > it is very common that the input stream producer knows what the
> > commit object name of the commit that was just created, then
> > optimising for that case does not sound too bad.  It really depends
> > on two factors:
> > 
> >  - How likely is it that other people make the same mistake?
> 
> Looks like the answer is: very. Assuming my quick glance at the code is
> not mistaken, the same mistake is made in at least git-p4.py (yes, the
> one that comes with git), felipec's git-remote-hg, and hg-fast-export,
> and that's 100% of the sample I looked at.
> 
> I won't claim to know what fast-import is doing, not having looked at
> more than the parse_from* functions and the commit message for 4cabf858,
> but it seems plausible this also skips making tree deltas for those
> trees.

It doesn't /seem/ to be the case.

> >  - How bad the damage to parse_from() would be if we wanted to
> >optimize for this case?
> 
> I /think/ it would look like this (untested), which doesn't seem too
> damaging:

It's actually not enough. It does avoid to reread trees, but it doesn't
avoid the pack to be closed/reopened.

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suboptimal behavior of fast-import in some cases with "from"

2015-07-08 Thread Mike Hommey
On Mon, Jul 06, 2015 at 03:54:35PM -0700, Junio C Hamano wrote:
> Mike Hommey  writes:
> 
> > One of the first things parse_from does is unconditionally throw away
> > the tree for the given branch, and then the "from" tree is loaded. So
> > when the "from" commit is the current head of the branch, that make
> > fast-import do more work than necessary.
> 
> If it is very common that the next commit the input stream wants to
> create is often on top of the commit that was just created, and if
> it is very common that the input stream producer knows what the
> commit object name of the commit that was just created, then
> optimising for that case does not sound too bad.  It really depends
> on two factors:
> 
>  - How likely is it that other people make the same mistake?

Looks like the answer is: very. Assuming my quick glance at the code is
not mistaken, the same mistake is made in at least git-p4.py (yes, the
one that comes with git), felipec's git-remote-hg, and hg-fast-export,
and that's 100% of the sample I looked at.

I won't claim to know what fast-import is doing, not having looked at
more than the parse_from* functions and the commit message for 4cabf858,
but it seems plausible this also skips making tree deltas for those
trees.

>  - How bad the damage to parse_from() would be if we wanted to
>optimize for this case?

I /think/ it would look like this (untested), which doesn't seem too
damaging:

diff --git a/fast-import.c b/fast-import.c
index e78ca10..cb232e0 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2588,14 +2588,12 @@ static int parse_from(struct branch *b)
 {
const char *from;
struct branch *s;
+   unsigned char sha1[20];
 
if (!skip_prefix(command_buf.buf, "from ", &from))
return 0;
 
-   if (b->branch_tree.tree) {
-   release_tree_content_recursive(b->branch_tree.tree);
-   b->branch_tree.tree = NULL;
-   }
+   hashcpy(sha1, b->branch_tree.versions[1].sha1);
 
s = lookup_branch(from);
if (b == s)
@@ -2626,6 +2624,11 @@ static int parse_from(struct branch *b)
else
die("Invalid ref name or SHA1 expression: %s", from);
 
+   if (b->branch_tree.tree && hashcmp(sha1, 
b->branch_tree.versions[1].sha1)) {
+   release_tree_content_recursive(b->branch_tree.tree);
+   b->branch_tree.tree = NULL;
+   }
+
read_next_command();
return 1;
 }

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suboptimal behavior of fast-import in some cases with "from"

2015-07-06 Thread Junio C Hamano
Mike Hommey  writes:

> One of the first things parse_from does is unconditionally throw away
> the tree for the given branch, and then the "from" tree is loaded. So
> when the "from" commit is the current head of the branch, that make
> fast-import do more work than necessary.

If it is very common that the next commit the input stream wants to
create is often on top of the commit that was just created, and if
it is very common that the input stream producer knows what the
commit object name of the commit that was just created, then
optimising for that case does not sound too bad.  It really depends
on two factors:

 - How likely is it that other people make the same mistake?

 - How bad the damage to parse_from() would be if we wanted to
   optimize for this case?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


suboptimal behavior of fast-import in some cases with "from"

2015-07-06 Thread Mike Hommey
Hi,

I did something "stupid" with a script using fast-import, and that made
the whole process ~20% slower on Linux and 400~500% slower on Mac. The
reason this happened is that the script was essentially adding a "from"
to every "commit" command, even when the "from" commit is the current
head of the branch.

One of the first things parse_from does is unconditionally throw away
the tree for the given branch, and then the "from" tree is loaded. So
when the "from" commit is the current head of the branch, that make
fast-import do more work than necessary.

Even more so when the pack flush code in gfi_unpack_entry is triggered,
which, on mac, is extra slow (and explains the huge slow down there).

Now, I do understand that my script is doing something stupid. So the
question is whether it's worth fixing in fast-import or not. I already
fixed my script anyways.

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html