Re: git-svn performance

2014-10-20 Thread Jakob Stoklund Olesen

 On Oct 19, 2014, at 18:16, Eric Wong normalper...@yhbt.net wrote:
 
 Jakob Stoklund Olesen stokl...@2pi.dk wrote:
 If cached_mergeinfo is using too much memory, you can probably drop
 that cache entirely. IIRC, it didn't give that much of a speed up.
 
 I am surprised that it is using a lot of memory, though. There is only
 one entry per SVN branch.
 
 Something like the below?  (on top of your original two patches)
 Pushed to my master @ git://bogomips.org/git-svn.git

Yes, but I think you can remove cached_mergeinfo_rev too. 

Thanks
/Jakob


Eric Wong (2):
  git-svn: reduce check_cherry_pick cache overhead
  git-svn: cache only mergeinfo revisions
 
Jakob Stoklund Olesen (2):
  git-svn: only look at the new parts of svn:mergeinfo
  git-svn: only look at the root path for svn:mergeinfo
 
 git-svn still seems to have some excessive memory usage problems,
 even independenty of mergeinfo stuff.
 --8
 From: Eric Wong normalper...@yhbt.net
 Date: Mon, 20 Oct 2014 01:02:53 +
 Subject: [PATCH] git-svn: cache only mergeinfo revisions
 
 This should reduce excessive memory usage from the new mergeinfo
 caches without hurting performance too much, assuming reasonable
 latency to the SVN server.
 
 Cc: Hin-Tak Leung ht...@users.sourceforge.net
 Suggested-by: Jakob Stoklund Olesen stokl...@2pi.dk
 Signed-off-by: Eric Wong normalper...@yhbt.net
 ---
 perl/Git/SVN.pm | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)
 
 diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
 index 171af37..f8a75b1 100644
 --- a/perl/Git/SVN.pm
 +++ b/perl/Git/SVN.pm
 @@ -1713,13 +1713,10 @@ sub mergeinfo_changes {
# Initialize cache on the first call.
unless (defined $self-{cached_mergeinfo_rev}) {
$self-{cached_mergeinfo_rev} = {};
 -$self-{cached_mergeinfo} = {};
}
 
my $cached_rev = $self-{cached_mergeinfo_rev}{$old_path};
 -if (defined $cached_rev  $cached_rev == $old_rev) {
 -$old_minfo = $self-{cached_mergeinfo}{$old_path};
 -} else {
 +unless (defined $cached_rev  $cached_rev == $old_rev) {
my $ra = $self-ra;
# Give up if $old_path isn't in the repo.
# This is probably a merge on a subtree.
 @@ -1728,19 +1725,16 @@ sub mergeinfo_changes {
directory didn't exist in r$old_rev\n;
return {};
}
 -my (undef, undef, $props) =
 -$self-ra-get_dir($old_path, $old_rev);
 -if (defined $props-{svn:mergeinfo}) {
 -my %omi = map {split :, $_ } split \n,
 -$props-{svn:mergeinfo};
 -$old_minfo = \%omi;
 -}
 -$self-{cached_mergeinfo}{$old_path} = $old_minfo;
 -$self-{cached_mergeinfo_rev}{$old_path} = $old_rev;
}
 +my (undef, undef, $props) = $self-ra-get_dir($old_path, $old_rev);
 +if (defined $props-{svn:mergeinfo}) {
 +my %omi = map {split :, $_ } split \n,
 +$props-{svn:mergeinfo};
 +$old_minfo = \%omi;
 +}
 +$self-{cached_mergeinfo_rev}{$old_path} = $old_rev;
 
# Cache the new mergeinfo.
 -$self-{cached_mergeinfo}{$path} = \%minfo;
$self-{cached_mergeinfo_rev}{$path} = $rev;
 
my %changes = ();
 -- 
 EW
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git svn's performance issue and strange pauses, and other thing

2014-10-19 Thread Jakob Stoklund Olesen
On Oct 18, 2014, at 21:12, Eric Wong normalper...@yhbt.net wrote:

 I am somwhat worry about the dramatic difference between the two 
 .svn/.caches -
 check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
 _rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?
 
 Calling patterns changed, and it looks like Jakob's changes avoided some
 calls.

It is possible that those functions don't need to be memoized any more. My 
patch is trying to avoid calling them with the same arguments over and over, 
and memoizing doesn't help when arguments are changing.

Thanks,
/jakob--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-svn performance

2014-10-19 Thread Jakob Stoklund Olesen

 On Oct 18, 2014, at 19:33, Eric Wong normalper...@yhbt.net wrote:
 
 Eric Wong normalper...@yhbt.net wrote:
 This reduces hash lookups for looking up cache data and will
 simplify tying data to disk in the next commit.
 
 I considered the following, but GDBM might not be readily available on
 non-POSIX platforms.  I think the other problem is the existing caches
 are still in memory (whether YAML or Storable) even if disk-backed,
 causing a large amount of memory usage anyways.

If cached_mergeinfo is using too much memory, you can probably drop that cache 
entirely. IIRC, it didn't give that much of a speed up.

I am surprised that it is using a lot of memory, though. There is only one 
entry per SVN branch.

 (Both patches on top of Jakob's)
 -
 Subject: [RFC] git-svn: tie cached_mergeinfo to a GDBM_File store
 
 This should reduce per-instance memory usage by allowing
 serialization to disk.  Using the existing Memoize::Storable
 or YAML backends does not allow fast lookups.
 
 GDBM_File should be available in most Perl installations
 and should not pose unnecessary burden
 ---
 perl/Git/SVN.pm | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)
 
 diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
 index 25dbcd5..3e477c7 100644
 --- a/perl/Git/SVN.pm
 +++ b/perl/Git/SVN.pm
 @@ -14,6 +14,7 @@ use IPC::Open3;
 use Memoize;  # core since 5.8.0, Jul 2002
 use Memoize::Storable;
 use POSIX qw(:signal_h);
 +use Storable qw(freeze thaw);
 
 use Git qw(
 command
 @@ -1713,10 +1714,21 @@ sub mergeinfo_changes {
 
# Initialize cache on the first call.
unless (defined $cached_mergeinfo) {
 -$cached_mergeinfo = $self-{cached_mergeinfo} = {};
 +my %hash;
 +eval '
 +require File::Temp;
 +use GDBM_File;
 +my $fh = File::Temp-new(TEMPLATE = mergeinfo.);
 +$self-{cached_mergeinfo_fh} = $fh;
 +$fh-unlink_on_destroy(1);
 +tie %hash = GDBM_File, $fh-filename, GDBM_WRCREAT, 0600;
 +';
 +$cached_mergeinfo = $self-{cached_mergeinfo} = \%hash;
}
 
my $cached = $cached_mergeinfo-{$old_path};
 +$cached = thaw($cached) if defined $cached;
 +
if (defined $cached  $cached-[0] == $old_rev) {
$old_minfo = $cached-[1];
} else {
 @@ -1735,11 +1747,12 @@ sub mergeinfo_changes {
$props-{svn:mergeinfo};
$old_minfo = \%omi;
}
 -$cached_mergeinfo-{$old_path} = [ $old_rev, $old_minfo ];
 +$cached_mergeinfo-{$old_path} =
 +freeze([ $old_rev, $old_minfo ]);
}
 
# Cache the new mergeinfo.
 -$cached_mergeinfo-{$path} = [ $rev, \%minfo ];
 +$cached_mergeinfo-{$path} = freeze([ $rev, \%minfo ]);
 
my %changes = ();
foreach my $p (keys %minfo) {
 -- 
 EW
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git svn's performance issue and strange pauses, and other thing

2014-09-19 Thread Jakob Stoklund Olesen


 On Sep 19, 2014, at 1:25, Eric Wong normalper...@yhbt.net wrote:
 
 Hin-Tak Leung ht...@users.sourceforge.net wrote:
 
 -  I know I can probably just read the source, but I'd like to know
 why .git/svn/.caches is even larger than .git/objects (which supposedly
 contains everything that's of interest)? I hope this can be documented
 towards the end of the man-page, for example, of important parts
 of .git/svn (and what not to do with them...), without needing to
 'read the source'. Here is part of du from a couple of days ago:
 
 254816.git/objects
 307056.git/svn/.caches
 332452.git/svn
 588064.git
 
 The actual .git/config is here - this should be sufficient info for
 somebody looking into experiencing the issues I mentioned above.
 
 IIRC, the caching is unique to mergeinfo, so perhaps Jakob's patches
 help, there, too.

IIRC the caches are used for memoization, and with my two patches applied it 
doesn't improve performance much.

You could try removing the memoization after applying my patches.

Thanks,
/Jakob--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo

2014-04-27 Thread Jakob Stoklund Olesen

On Apr 22, 2014, at 11:54 AM, Eric Wong normalper...@yhbt.net wrote:

 Jakob Stoklund Olesen stokl...@2pi.dk wrote:
 Subversion can put mergeinfo on any sub-directory to track cherry-picks.
 Since cherry-picks are not represented explicitly in git, git-svn should
 just ignore it.
 
 Hi, was git-svn trying to track cherry-picks as merge before?

It would try and fail. I didn't explain that properly in the commit message.

Suppose I have a standard svn layout with $url/trunk and $url/branches/topic1. 
My topic1 branch has a change in subdir1 that I want to cherry-pick into trunk:

% svn switch $url/trunk
% cd subdir1
% svn merge $url/branches/topic1/subdir1
% cd ..
% svn commit

This operation will set svn:mergeinfo on $url/trunk/subdir1 where a normal full 
merge would set it on $url/trunk:

% svn pg svn:mergeinfo subdir1 
/branches/topic1/subdir1:3-4

When git-svn fetches these changes, it currently does examine the svn:mergeinfo 
change on the subdirectory as if it were a full merge. It then fails to find a 
revmap for /branches/topic1/subdir1:

Couldn't find revmap for file:///tmp/sdb/branches/topic1/subdir1
r5 = 5ce1f687c30495deca40730fb7be3baa0e145479 (refs/remotes/trunk)

It is looking for refs/remotes/topic1/subdir1, but we only have the 
refs/remotes/topic1 branch in git.

This patch makes git-svn stop trying to reconstruct those subdirectory merges 
that we know will fail anyway.

 This changes behavior a bit, so two independent users of git-svn
 may not have identical histories as a result, correct?

For normal subdirectory cherry-picks as described above, the behavior doesn't 
change. This is just a performance optimization.

For weirder cases where a whole branch has been merged onto a subdirectory of 
trunk, behavior does change. Currently, git-svn will mark that as a full merge 
in git. With this change it won't.

 Can you add a test to ensure this behavior is preserved?
 Thanks.

I'll add a test for the subdirectory merge described above.

 Sorry, I've never looked at mergeinfo myself, mainly relying on
 Sam + tests for this.
 
 [1] - Historically, git-svn (using defaults) has always tried to
  preserve identical histories for independent users across
  different git-svn versions.  However, mergeinfo may be
  enough of a corner-case where we can make an exception.


I agree. It doesn't seem worthwhile to try to preserve git-svn's historical 
behavior in weird corner cases.

BTW, this performance optimization matters not because of sporadic manual 
cherry-picks, but because certain older svn releases would replicate 
svn:mergeinfo on every subdirectory in a standard merge. With hundreds of 
subdirectories and thousands of merged branches, git-svn gets completely stuck 
processing all those mergeinfo lines.

Thanks,
/jakob

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo

2014-04-17 Thread Jakob Stoklund Olesen
In a Subversion repository where many feature branches are merged into a
trunk, the svn:mergeinfo property can grow very large. This severely
slows down git-svn's make_log_entry() because it is checking all
mergeinfo entries every time the property changes.

In most cases, the additions to svn:mergeinfo since the last commit are
pretty small, and there is nothing to gain by checking merges that were
already checked for the last commit in the branch.

Add a mergeinfo_changes() function which computes the set of interesting
changes to svn:mergeinfo since the last commit. Filter out merged
branches whose ranges haven't changed, and remove a common prefix of
ranges from other merged branches.

This speeds up git svn fetch by several orders of magnitude on a large
repository where thousands of feature branches have been merged.

Signed-off-by: Jakob Stoklund Olesen stokl...@2pi.dk
---
 perl/Git/SVN.pm | 84 -
 1 file changed, 72 insertions(+), 12 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index a59564f..d3785ab 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1178,7 +1178,7 @@ sub find_parent_branch {
  or die SVN connection failed somewhere...\n;
}
print STDERR Successfully followed parent\n unless $::_q  1;
-   return $self-make_log_entry($rev, [$parent], $ed);
+   return $self-make_log_entry($rev, [$parent], $ed, $r0, 
$branch_from);
}
return undef;
 }
@@ -1210,7 +1210,7 @@ sub do_fetch {
unless ($self-ra-gs_do_update($last_rev, $rev, $self, $ed)) {
die SVN connection failed somewhere...\n;
}
-   $self-make_log_entry($rev, \@parents, $ed);
+   $self-make_log_entry($rev, \@parents, $ed, $last_rev);
 }
 
 sub mkemptydirs {
@@ -1478,9 +1478,9 @@ sub find_extra_svk_parents {
 sub lookup_svn_merge {
my $uuid = shift;
my $url = shift;
-   my $merge = shift;
+   my $source = shift;
+   my $revs = shift;
 
-   my ($source, $revs) = split :, $merge;
my $path = $source;
$path =~ s{^/}{};
my $gs = Git::SVN-find_by_url($url.$source, $url, $path);
@@ -1702,6 +1702,62 @@ sub parents_exclude {
return @excluded;
 }
 
+# Compute what's new in svn:mergeinfo.
+sub mergeinfo_changes {
+   my ($self, $old_path, $old_rev, $path, $rev, $mergeinfo_prop) = @_;
+   my %minfo = map {split :, $_ } split \n, $mergeinfo_prop;
+   my $old_minfo = {};
+
+   # Initialize cache on the first call.
+   unless (defined $self-{cached_mergeinfo_rev}) {
+   $self-{cached_mergeinfo_rev} = {};
+   $self-{cached_mergeinfo} = {};
+   }
+
+   my $cached_rev = $self-{cached_mergeinfo_rev}{$old_path};
+   if (defined $cached_rev  $cached_rev == $old_rev) {
+   $old_minfo = $self-{cached_mergeinfo}{$old_path};
+   } else {
+   my $ra = $self-ra;
+   # Give up if $old_path isn't in the repo.
+   # This is probably a merge on a subtree.
+   if ($ra-check_path($old_path, $old_rev) != $SVN::Node::dir) {
+   warn W: ignoring svn:mergeinfo on $old_path, ,
+   directory didn't exist in r$old_rev\n;
+   return {};
+   }
+   my (undef, undef, $props) =
+   $self-ra-get_dir($old_path, $old_rev);
+   if (defined $props-{svn:mergeinfo}) {
+   my %omi = map {split :, $_ } split \n,
+   $props-{svn:mergeinfo};
+   $old_minfo = \%omi;
+   }
+   $self-{cached_mergeinfo}{$old_path} = $old_minfo;
+   $self-{cached_mergeinfo_rev}{$old_path} = $old_rev;
+   }
+
+   # Cache the new mergeinfo.
+   $self-{cached_mergeinfo}{$path} = \%minfo;
+   $self-{cached_mergeinfo_rev}{$path} = $rev;
+
+   my %changes = ();
+   foreach my $p (keys %minfo) {
+   my $a = $old_minfo-{$p} || ;
+   my $b = $minfo{$p};
+   # Omit merged branches whose ranges lists are unchanged.
+   next if $a eq $b;
+   # Remove any common range list prefix.
+   ($a ^ $b) =~ /^[\0]*/;
+   my $common_prefix = rindex $b, ,, $+[0] - 1;
+   $changes{$p} = substr $b, $common_prefix + 1;
+   }
+   print STDERR Checking svn:mergeinfo changes since r$old_rev: ,
+   scalar(keys %minfo),  sources, ,
+   scalar(keys %changes),  changed\n;
+
+   return \%changes;
+}
 
 # note: this function should only be called if the various dirprops
 # have actually changed
@@ -1715,14 +1771,15 @@ sub find_extra_svn_parents {
# history.  Then, we figure out which git revisions are in
# that tip, but not this revision.  If all of those revisions

[PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo

2014-04-17 Thread Jakob Stoklund Olesen
Subversion can put mergeinfo on any sub-directory to track cherry-picks.
Since cherry-picks are not represented explicitly in git, git-svn should
just ignore it.

Signed-off-by: Jakob Stoklund Olesen stokl...@2pi.dk
---
 perl/Git/SVN.pm | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index d3785ab..0aa4dd3 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1210,7 +1210,7 @@ sub do_fetch {
unless ($self-ra-gs_do_update($last_rev, $rev, $self, $ed)) {
die SVN connection failed somewhere...\n;
}
-   $self-make_log_entry($rev, \@parents, $ed, $last_rev);
+   $self-make_log_entry($rev, \@parents, $ed, $last_rev, $self-path);
 }
 
 sub mkemptydirs {
@@ -1859,21 +1859,18 @@ sub make_log_entry {
my $untracked = $self-get_untracked($ed);
 
my @parents = @$parents;
-   my $ps = $ed-{path_strip} || ;
-   for my $path ( grep { m/$ps/ } %{$ed-{dir_prop}} ) {
-   my $props = $ed-{dir_prop}{$path};
-   if ( $props-{svk:merge} ) {
-   $self-find_extra_svk_parents
-   ($ed, $props-{svk:merge}, \@parents);
-   }
-   if ( $props-{svn:mergeinfo} ) {
-   my $mi_changes = $self-mergeinfo_changes
-   ($parent_path || $path, $parent_rev,
-$path, $rev,
-$props-{svn:mergeinfo});
-   $self-find_extra_svn_parents
-   ($ed, $mi_changes, \@parents);
-   }
+   my $props = $ed-{dir_prop}{$self-path};
+   if ( $props-{svk:merge} ) {
+   $self-find_extra_svk_parents
+   ($ed, $props-{svk:merge}, \@parents);
+   }
+   if ( $props-{svn:mergeinfo} ) {
+   my $mi_changes = $self-mergeinfo_changes
+   ($parent_path, $parent_rev,
+$self-path, $rev,
+$props-{svn:mergeinfo});
+   $self-find_extra_svn_parents
+   ($ed, $mi_changes, \@parents);
}
 
open my $un, '', $self-{dir}/unhandled.log or croak $!;
-- 
1.8.5.2 (Apple Git-48)

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html