[Monotone-devel] Re: [PATCH] git_export: improve mark import when file is empty

2009-12-18 Thread Felipe Contreras
On Sun, Nov 29, 2009 at 8:40 PM, Felipe Contreras
 wrote:
> If the marks file is empty, git_export will complain about the format.
> Instead, we should ignore the marks, just like git fast-import does.
>
> Signed-off-by: Felipe Contreras 
> ---
>  git_export.cc |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/git_export.cc b/git_export.cc
> index 6b4fcdc..ccee591 100644
> --- a/git_export.cc
> +++ b/git_export.cc
> @@ -89,6 +89,7 @@ import_marks(system_path const & marks_file,
>   data mark_data;
>   read_data(marks_file, mark_data);
>   istringstream marks(mark_data());
> +  marks.peek();
>   while (!marks.eof())
>     {
>       char c;
> --

Anything wrong with this patch? Derek, can you please apply it?

-- 
Felipe Contreras


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


[Monotone-devel] [PATCH] git_export: improve mark import when file is empty

2009-11-29 Thread Felipe Contreras
If the marks file is empty, git_export will complain about the format.
Instead, we should ignore the marks, just like git fast-import does.

Signed-off-by: Felipe Contreras 
---
 git_export.cc |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/git_export.cc b/git_export.cc
index 6b4fcdc..ccee591 100644
--- a/git_export.cc
+++ b/git_export.cc
@@ -89,6 +89,7 @@ import_marks(system_path const & marks_file,
   data mark_data;
   read_data(marks_file, mark_data);
   istringstream marks(mark_data());
+  marks.peek();
   while (!marks.eof())
     {
       char c;
-- 
1.6.6.rc0.63.g0471c
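
A note on why the one-line peek() helps: a freshly constructed stream does not report eof() until a read has been attempted, even when it holds no data at all. Below is a minimal standalone sketch of that behaviour. It is not the monotone code; loop_iterations is a hypothetical helper that only counts how many times a `while (!in.eof())` loop body runs.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: counts how often the body of a
// `while (!in.eof())` loop runs for the given contents.
int loop_iterations(const std::string & contents, bool prime_with_peek)
{
  std::istringstream in(contents);

  if (prime_with_peek)
    in.peek();        // sets eofbit immediately if there is nothing to read

  int iterations = 0;
  while (!in.eof())
    {
      char c;
      in.get(c);      // on an empty stream this read fails and sets eofbit
      ++iterations;
    }
  return iterations;
}
```

With empty contents the unprimed loop body runs once on a failed read, which is presumably where the format complaint comes from. With peek() first, the body is not entered at all. Non-empty input behaves the same either way.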





[Monotone-devel] Wiki update: projects using monotone

2009-10-20 Thread Felipe Contreras
Hi,

It seems the page is totally outdated:
http://monotone.ca/wiki/ProjectsUsingMonotone/

I know at least OpenEmbedded stopped using mtn a long time ago. Can
somebody please update that?

Also, it doesn't make sense to list monotone-related projects; of
course they would be using monotone, what else? At the very least they
should be listed separately.

Cheers.

-- 
Felipe Contreras




[Monotone-devel] Re: git_export improvement

2009-06-27 Thread Felipe Contreras
On Wed, Jun 3, 2009 at 6:51 AM, Derek Scherger wrote:
>
>
> On Sun, May 24, 2009 at 2:37 AM, Felipe Contreras
>  wrote:
>>
>> However, there's one missing feature: now that I'm not testing any
>> more I would like git_export to abort when the author is not listed.
>> Can you do that?
>
> By "not listed" do you mean not present in the authors-file?
>
> Note that this doesn't have much, if anything, to do with the "<unknown>"
> string used as the default author when there are *no* author certs. In that
> case, there's not much else to do, some string needs to be used for the
> author, and this string is looked up in the authors-file allowing you to
> change it if you like.
>
> At the moment, if the authors-file lookup fails the author string is checked
> to ensure that it is wrapped in '<' and '>' characters which seems to work
> ok for many historical monotone names and avoids the need to create a full
> authors-file. It sounds like you want to require a successful authors-file
> lookup and disable this name fixup.
>
> I think it might be better to use a lua hook for doing whatever name fixups
> are required if the authors-file lookup fails and if this hook fails we'll
> cancel the export. This should allow for using either the authors-file, or
> the hook or both.
>
>> IMHO putting some fake id such as 'unknown' is good for testing
>> purposes so perhaps it should be turned on with --enable-author-guess
>> or something.
>
> As mentioned above, the "<unknown>" string is used when there are no author
> certs available. We could provide an option to set the value of this string
> to something else, but you can already change this string using the
> authors-file so another option would be redundant.

Any update on this?

-- 
Felipe Contreras




[Monotone-devel] Re: git_export improvement

2009-06-06 Thread Felipe Contreras
On Wed, Jun 3, 2009 at 6:51 AM, Derek Scherger wrote:
>
>
> On Sun, May 24, 2009 at 2:37 AM, Felipe Contreras
>  wrote:
>>
>> However, there's one missing feature: now that I'm not testing any
>> more I would like git_export to abort when the author is not listed.
>> Can you do that?
>
> By "not listed" do you mean not present in the authors-file?
>
> Note that this doesn't have much, if anything, to do with the "<unknown>"
> string used as the default author when there are *no* author certs. In that
> case, there's not much else to do, some string needs to be used for the
> author, and this string is looked up in the authors-file allowing you to
> change it if you like.
>
> At the moment, if the authors-file lookup fails the author string is checked
> to ensure that it is wrapped in '<' and '>' characters which seems to work
> ok for many historical monotone names and avoids the need to create a full
> authors-file. It sounds like you want to require a successful authors-file
> lookup and disable this name fixup.
>
> I think it might be better to use a lua hook for doing whatever name fixups
> are required if the authors-file lookup fails and if this hook fails we'll
> cancel the export. This should allow for using either the authors-file, or
> the hook or both.
>
>> IMHO putting some fake id such as 'unknown' is good for testing
>> purposes so perhaps it should be turned on with --enable-author-guess
>> or something.
>
> As mentioned above, the "<unknown>" string is used when there are no author
> certs available. We could provide an option to set the value of this string
> to something else, but you can already change this string using the
> authors-file so another option would be redundant.

My purpose is to notice whenever an author is missing from the author
mapping, so that I can investigate the real name and email, and add it
there. I'm not sure a hook would fit... up to you.
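
To make the request concrete, here is a rough sketch of the two behaviours being discussed: the lenient fixup Derek describes (wrap the name in '<' and '>' when the authors-file lookup fails) and a strict mode that aborts instead. This is only an illustration under assumptions. resolve_author and its strict flag are hypothetical names, not part of git_export, and the authors-file is modelled as a plain map.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper, not monotone's actual code.
// authors_file maps a monotone author string to a git "Name <email>" string.
std::string resolve_author(const std::map<std::string, std::string> & authors_file,
                           const std::string & mtn_author,
                           bool strict)
{
  std::map<std::string, std::string>::const_iterator found =
    authors_file.find(mtn_author);
  if (found != authors_file.end())
    return found->second;

  // Strict mode: a missing mapping cancels the export so it gets noticed.
  if (strict)
    throw std::runtime_error("author not in authors-file: " + mtn_author);

  // Lenient mode: make sure the name is wrapped in '<' and '>'.
  if (mtn_author.find('<') == std::string::npos &&
      mtn_author.find('>') == std::string::npos)
    return "<" + mtn_author + ">";

  return mtn_author;
}
```

In strict mode the first unmapped author stops the run, so the real name and email can be looked up and added to the mapping before trying again.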

-- 
Felipe Contreras




[Monotone-devel] git_export improvement

2009-05-24 Thread Felipe Contreras
Hi Derek,

I've been using git_export for a while and it works pretty well, I'm
getting rid of my custom mtn2git scripts :)

However, there's one missing feature: now that I'm not testing any
more I would like git_export to abort when the author is not listed.
Can you do that?

IMHO putting some fake id such as 'unknown' is good for testing
purposes so perhaps it should be turned on with --enable-author-guess
or something.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] Google Code DVCS Analysis

2009-05-01 Thread Felipe Contreras
On Tue, Apr 28, 2009 at 9:41 AM, Lapo Luchini  wrote:
> http://code.google.com/p/support/wiki/DVCSAnalysis
>
> Summary:
>
> In terms of implementation effort, Mercurial has a clear advantage due
> to its efficient HTTP transport protocol.
>
> In terms of features, Git is more powerful, but this tends to be offset
> by it being more complicated to use.

Just for the record, that was done almost one year ago and some points
don't apply any more:

 * Git has a strong Linux heritage, and the official way to run it
under Windows is to use cygwin <- not any more
 * Maintenance. Git requires periodic maintenance of repositories
(i.e. git-gc) <- not any more

Other points are not valid:

 * History is Sacred.

They themselves say "a custom Git server could be written to disallow
the loss of data", although that's not quite true. Git has hooks,
simply modifying the hooks can disallow the behavior they want to
disallow. And on the other hand mercurial has rebase support too.

-- 
Felipe Contreras




Re: [Monotone-devel] Re: [PATCH 1/2] git_export: avoid multiple sql queries

2009-03-10 Thread Felipe Contreras
On Tue, Mar 10, 2009 at 2:37 AM, Jack Lloyd  wrote:
> On Sun, Mar 08, 2009 at 09:41:40PM -0600, Derek Scherger wrote:
>>
>> Committed in rev 0d53349ddb2728ddf1342cdfe69810840bef5252. I've also done
>> something similar to the log command and this made a *huge* difference. On
>> my laptop logging net.venge.monotone back to the root took around 9 minutes.
>> With this change it takes around 4 seconds.
>
> From 5 minutes to 3 seconds on my desktop. Nice!
>
> However, a minor problem: in cases where two keys signed a revision,
> log will now show the changelog and branch certs twice. This is mostly
> visible on merge certs.
>
> Also (very minor, AFAICT): datestamps and author ids of certs with
> multiple signers are printed in a different order than prior. I'm not
> sure if anything cares about that, though.

This could be considered breaking backwards compatibility. But come
on, who could care that the ordering is different?

Still, just in case it should be mentioned in the NEWS I guess.

-- 
Felipe Contreras




[Monotone-devel] Re: [PATCH 1/2] git_export: avoid multiple sql queries

2009-03-10 Thread Felipe Contreras
On Mon, Mar 9, 2009 at 5:41 AM, Derek Scherger  wrote:
> On Sat, Mar 7, 2009 at 11:31 PM, Felipe Contreras
>  wrote:
>>
>> This improves performance while exporting. In my system I see an
>> improvement from 52 minutes to 6 seconds.
>
> Committed in rev 0d53349ddb2728ddf1342cdfe69810840bef5252. I've also done
> something similar to the log command and this made a *huge* difference. On
> my laptop logging net.venge.monotone back to the root took around 9 minutes.
> With this change it takes around 4 seconds.
>
> The problem that this has uncovered seems to be that the query for certs by
> revision id and name uses the unique index on revision_certs which isn't the
> one we want. Querying without the name uses the revision id index which
> seems to be much better.
>
> Thanks Felipe!

Awesome! I always thought mtn log was *sloow*, not any more thanks to you :)

-- 
Felipe Contreras




[Monotone-devel] Re: [PATCH 1/2] git_export: avoid multiple sql queries

2009-03-10 Thread Felipe Contreras
On Sun, Mar 8, 2009 at 10:40 PM, Derek Scherger  wrote:
>
> On Sun, Mar 8, 2009 at 1:01 PM, Felipe Contreras
>  wrote:
>>
>> Yeah, I think there must be something fishy in my system, that's why
>> you were getting better performance all the time, but hopefully with
>> this patch my system (and other ones) will perform better.
>>
>> I'm interested in the Pidgin repo, I've been using my mtn2git script
>
> For reference, exporting a pidgin repo I downloaded a while ago takes about
> 70 minutes now and I think your patch should cut that down to about 40
> minutes but I haven't tested it yet. I'll let you know when I do.

It was taking two hours here, now it takes 60 min. There was quite a
big difference between your system and mine but my patch seems to
decrease the difference significantly.

So whatever it was, it's less relevant now.

>> that is also doing only a few sql queries but unfortunately that
>> generates different results than git_export. So I made the changes to
>> do the queries exactly as you do, and *bang*... slow as hell. My bet
>> is my sqlite (3.5.9).
>
> I seem to have sqlite 3.6.6.2 and the same problem. My guess is that it's
> not using an index that it should be but so far that's just a guess and I
> need to dig into it further. Not sure when I'll get to that. I did try a
> similar change to mtn log which should be suffering from the same problem
> but it didn't make much difference there.
>
>> I still need to push some stuff, like an option to find out missing
>> authors from the map.
>>
>> After finding this I decided to profile the 'loading' step, since IMO
>> it's taking too much time. I used gprof and gprof2dot and it looks
>>
>> like the biggest offender is get_change->roster_t::get_name taking 60%
>> of the time. That is after my modifications, which I guess can't be
>> applied upstream but maybe you would like to take a look?
>
> Sure, it can't hurt to look. If getting names from rosters is slow and we
> can speed it up then great.

Ok, attached. These are just for testing purposes; I'm not even sure
how much each patch improves performance, if at all.

All I know is last time I measured the time it takes to import the
Pidgin repo it was 44min (from 60 in current mainline). It was a
different baseline and different CFLAGS... it might not be due to the
patches.

>> Cool, it would be interesting to find out what caused this.
>
> If I find anything I'll let you know.
>
>>
>> I don't know... it's a normal patch with some extra info. It looks
>> like the patches came from mtn revision
>> '44683b999fa8092a1e7111728cf72e429b0abd0d'.
>
> It wasn't that it failed to apply cleanly, it was that patch didn't like it
> at all:
>
> $ patch -p0 < export1.patch
> patching file git_export.cc
> Hunk #1 FAILED at 215.
> patch:  malformed patch at line 13:   cert_vector tags;

It could be the way you saved it from your mailer.

I just tried
patch -p1 -i 0001-Add-get_roster_version_fast.patch

No problem at all.

Now I'm sending the patches as attachments, maybe that helps.

Cheers.

-- 
Felipe Contreras


0001-Add-get_roster_version_fast.patch
Description: Binary data


0002-Add-get_name_fast.patch
Description: Binary data


0003-Random-cleanups.patch
Description: Binary data


[Monotone-devel] Re: [PATCH 1/2] git_export: avoid multiple sql queries

2009-03-08 Thread Felipe Contreras
On Sun, Mar 8, 2009 at 8:39 PM, Derek Scherger  wrote:
>
>
> On Sat, Mar 7, 2009 at 11:31 PM, Felipe Contreras
>  wrote:
>>
>> This improves performance while exporting. In my system I see an
>> improvement from 52 minutes to 6 seconds.
>
> I do see quite a big performance improvement with this patch as well,
> although not nearly as dramatic as your 52 minutes down to 6 seconds.
> Exporting my monotone database takes ~26 minutes without this patch and ~17
> minutes with it, so there is about 9 minutes of cert loading time which
> certainly needs fixing.

Yeah, I think there must be something fishy in my system, that's why
you were getting better performance all the time, but hopefully with
this patch my system (and other ones) will perform better.

> I've also tested against pidgin, xaraya and OE databases and it shaves about
> 30 minutes off of each of those as well. For pidgin and xaraya this cuts the
> export time almost in half. OE takes *forever* to export though (12 hours or
> so) so cutting 30 minutes off doesn't make such a big difference. I think OE
> has lots more active files and much larger rosters but I haven't really
> looked into why it's so slow.
>
> So, I'm *really* curious as to exactly what you're testing to get this
> speedup. Can you provide some more details?

I'm interested in the Pidgin repo, I've been using my mtn2git script
that is also doing only a few sql queries but unfortunately that
generates different results than git_export. So I made the changes to
do the queries exactly as you do, and *bang*... slow as hell. My bet
is my sqlite (3.5.9).

BTW, my script is in Ruby, so sqlite is being used through the
bindings. You can find it here:
http://github.com/felipec/mtn2git

I still need to push some stuff, like an option to find out missing
authors from the map.

After finding this I decided to profile the 'loading' step, since IMO
it's taking too much time. I used gprof and gprof2dot and it looks
like the biggest offender is get_change->roster_t::get_name taking 60%
of the time. That is after my modifications, which I guess can't be
applied upstream but maybe you would like to take a look?

> I don't have any objections to this patch and I'll probably commit it later
> today, after poking around a bit more to see why in the heck loading certs
> by name is as slow as it is.

Cool, it would be interesting to find out what caused this.

> I did have to make a few adjustments to the patch as I've merged the
> fast-export branch back to net.venge.monotone and a few minor things have
> changed there. patch(1) also didn't like this patch and I had to apply it by
> hand. Is there some known problem with git-format-patch (assuming that's what
> you're using) that produces patches that patch(1) doesn't like?

I don't know... it's a normal patch with some extra info. It looks
like the patches came from mtn revision
'44683b999fa8092a1e7111728cf72e429b0abd0d'.

-- 
Felipe Contreras




[Monotone-devel] [PATCH 2/2] git_export: cleanup

2009-03-07 Thread Felipe Contreras
These are possible after the previous commit.

Signed-off-by: Felipe Contreras 
---
 git_export.cc |   93 -
 1 files changed, 46 insertions(+), 47 deletions(-)

diff --git a/git_export.cc b/git_export.cc
index 803caa4..9583d8a 100644
--- a/git_export.cc
+++ b/git_export.cc
@@ -216,43 +216,46 @@ export_changes(database & db,
   typedef map::const_iterator lookup_iterator;
 
   cert_vector certs;
-  cert_vector authors;
-  cert_vector branches;
-  cert_vector changelogs;
-  cert_vector comments;
-  cert_vector dates;
-  cert_vector tags;
+  vector<string> authors;
+  vector<string> branches;
+  vector<string> changelogs;
+  vector<string> comments;
+  vector<string> dates;
+  vector<string> tags;
 
   db.get_revision_certs(*r, certs);
 
+  // default to <unknown> committer and author if no author certs exist
+  // this may be mapped to a different value with the authors-file option
+  string author_name = "<unknown>"; // used as the git author
+  string author_key  = "<unknown>"; // used as the git committer
+  date_t author_date = date_t::now();
+
   for (cert_iterator i = certs.begin(); i != certs.end(); i++)
 {
   if (i->inner().name == author_cert_name)
-authors.push_back(*i);
+{
+  if (authors.empty())
+{
+  author_name = trim(i->inner().value());
+  author_key  = trim(i->inner().key());
+}
+  authors.push_back(i->inner().value());
+}
   else if (i->inner().name == date_cert_name)
-dates.push_back(*i);
+{
+  if (dates.empty())
+author_date = date_t(i->inner().value());
+  dates.push_back(i->inner().value());
+}
   else if (i->inner().name == changelog_cert_name)
-changelogs.push_back(*i);
+changelogs.push_back(i->inner().value());
   else if (i->inner().name == branch_cert_name)
-branches.push_back(*i);
+branches.push_back(i->inner().value());
   else if (i->inner().name == tag_cert_name)
-tags.push_back(*i);
+tags.push_back(i->inner().value());
   else if (i->inner().name == comment_cert_name)
-comments.push_back(*i);
-}
-
-  // default to <unknown> committer and author if no author certs exist
-  // this may be mapped to a different value with the authors-file option
-  string author_name = "<unknown>"; // used as the git author
-  string author_key  = "<unknown>"; // used as the git committer
-  date_t author_date = date_t::now();
-
-  cert_iterator author = authors.begin();
-
-  if (author != authors.end())
-{
-  author_name = trim(author->inner().value());
-  author_key  = trim(author->inner().key());
+comments.push_back(i->inner().value());
 }
 
   // all monotone keys and authors that don't follow the "Name "
@@ -280,17 +283,12 @@ export_changes(database & db,
author_name.find('>') == string::npos)
 author_name = "<" + author_name + ">";
 
-  cert_iterator date = dates.begin();
-
-  if (date != dates.end())
-author_date = date_t(date->inner().value());
-
   // default to unknown branch if no branch certs exist
   // this may be mapped to a different value with the branches-file option
   string branch_name = "unknown";
 
   if (!branches.empty())
-branch_name = branches.begin()->inner().value();
+branch_name = *(branches.begin());
 
   branch_name = trim(branch_name);
 
@@ -307,10 +305,10 @@ export_changes(database & db,
   changelogs.insert(changelogs.end(),
 comments.begin(), comments.end());
 
-  for (cert_iterator changelog = changelogs.begin();
+  for (vector<string>::iterator changelog = changelogs.begin();
changelog != changelogs.end(); ++changelog)
 {
-  string value = changelog->inner().value();
+  string value = *changelog;
   if (messages.find(value) == messages.end())
 {
   messages.insert(value);
@@ -380,19 +378,20 @@ export_changes(database & db,
 
   if (log_certs)
 {
+  vector<string>::iterator i;
+
   message << "\n";
-  for ( ; author != authors.end(); ++author)
-message << "Monotone-Author: " << author->inner().value() << "\n";
+  for (i = authors.begin(); i != authors.end(); ++i)
+message << "Monotone-Author: " << *i << "\n";
 
-  for ( ; date != dates.end(); ++date)
-message << "Monotone-Date: 

[Monotone-devel] [PATCH 1/2] git_export: avoid multiple sql queries

2009-03-07 Thread Felipe Contreras
This improves performance while exporting. In my system I see an
improvement from 52 minutes to 6 seconds.

Signed-off-by: Felipe Contreras 
---

Obviously this applies on top of the fast-export branch.

 git_export.cc |   24 ++--
 1 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/git_export.cc b/git_export.cc
index fc92b3b..803caa4 100644
--- a/git_export.cc
+++ b/git_export.cc
@@ -215,6 +215,7 @@ export_changes(database & db,
   typedef cert_vector::const_iterator cert_iterator;
   typedef map::const_iterator lookup_iterator;
 
+  cert_vector certs;
   cert_vector authors;
   cert_vector branches;
   cert_vector changelogs;
@@ -222,12 +223,23 @@ export_changes(database & db,
   cert_vector dates;
   cert_vector tags;
 
-  db.get_revision_certs(*r, author_cert_name, authors);
-  db.get_revision_certs(*r, branch_cert_name, branches);
-  db.get_revision_certs(*r, changelog_cert_name, changelogs);
-  db.get_revision_certs(*r, comment_cert_name, comments);
-  db.get_revision_certs(*r, date_cert_name, dates);
-  db.get_revision_certs(*r, tag_cert_name, tags);
+  db.get_revision_certs(*r, certs);
+
+  for (cert_iterator i = certs.begin(); i != certs.end(); i++)
+{
+  if (i->inner().name == author_cert_name)
+authors.push_back(*i);
+  else if (i->inner().name == date_cert_name)
+dates.push_back(*i);
+  else if (i->inner().name == changelog_cert_name)
+changelogs.push_back(*i);
+  else if (i->inner().name == branch_cert_name)
+branches.push_back(*i);
+  else if (i->inner().name == tag_cert_name)
+tags.push_back(*i);
+  else if (i->inner().name == comment_cert_name)
+comments.push_back(*i);
+}
 
   // default to <unknown> committer and author if no author certs exist
   // this may be mapped to a different value with the authors-file option
-- 
1.6.2
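
The idea of the patch, stripped of monotone specifics, is to fetch all certs for a revision once and sort them into buckets by name, instead of issuing one query per cert name. A minimal sketch follows. simple_cert and group_by_name are hypothetical stand-ins for illustration and do not match monotone's real cert type or database API.

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for a cert: just a name ("author", "date", ...)
// and a value.
struct simple_cert
{
  std::string name;
  std::string value;
};

// Group the result of a single "all certs for this revision" query by
// cert name, preserving the order in which values were returned.
std::map<std::string, std::vector<std::string> >
group_by_name(const std::vector<simple_cert> & certs)
{
  std::map<std::string, std::vector<std::string> > grouped;
  for (std::vector<simple_cert>::const_iterator i = certs.begin();
       i != certs.end(); ++i)
    grouped[i->name].push_back(i->value);
  return grouped;
}
```

The patch itself uses an if/else chain over the six known cert names rather than a map. The map version here is only a compact way to show the single-pass grouping.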





Re: [Monotone-devel] nvm.fast-export

2009-03-06 Thread Felipe Contreras
On Thu, Mar 5, 2009 at 6:44 PM, Derek Scherger  wrote:
>
> On Thu, Mar 5, 2009 at 2:32 AM, Felipe Contreras
>  wrote:
>>
>> I'm about at 90% of verifying the exactness compared to my method. I
>> think I'll be able to complete that before the weekend, but I don't
>> think I'll find any issue.
>>
>> Great job!
>
> Ok good. I'm not in any real rush so maybe I'll wait until you finish.

Verified. Both methods generate exactly the same sha1's :)

-- 
Felipe Contreras




Re: [Monotone-devel] nvm.fast-export

2009-03-05 Thread Felipe Contreras
On Thu, Mar 5, 2009 at 4:56 AM, Derek Scherger  wrote:
> If there aren't any objections I think I'm ready to land the fast-export
> branch (the new git_export command) but I'll wait until this weekend before
> I do. This branch doesn't change any existing functionality so there should
> be no risk of breaking anything. I've exported and successfully imported a
> monotone database, a pidgin database, an older open embedded database and a
> xaraya database so at the very least it does produce things that git
> fast-import can handle. I've also done some verifications on each of these
> by checking out each branch and tag that can be checked out (by a dumb
> script) and diffing the results and I haven't found any problems so far.

I'm about at 90% of verifying the exactness compared to my method. I
think I'll be able to complete that before the weekend, but I don't
think I'll find any issue.

Great job!

-- 
Felipe Contreras




[Monotone-devel] Re: checkout with no branch cert [was re: git fast-export]

2009-03-01 Thread Felipe Contreras
On Mon, Mar 2, 2009 at 7:53 AM, Derek Scherger  wrote:
> On Fri, Feb 27, 2009 at 10:41 PM, Derek Scherger 
> wrote:
>>
>> My impression at the moment is that the exported history does have correct
>> permissions because it agrees with a monotone checkout (which requires
>> addition of a branch cert) of the same revision. It seems that there are two
>> different problems with monotone here (1) checkout is not possible for
>> revisions that have no branch certs and (2) update doesn't always produce
>> correct execute permissions.
>
> As  mentioned in my previous email, I think (2) is  now fixed and I'm
> wondering how we want to approach (1).
>
> It seems like we should probably just remove the checks for a branch option
> from the places that don't actually need it, in particular setup and
> checkout come to mind but there may be others. This would essentially delay
> aborting on a pending problem until it actually becomes a real problem. It
> would also prevent the possibility of a problem from being a real problem
> itself, as in the case of checkout.
>
> Another way of thinking about this is that, at the moment, a workspace is
> almost required to have a branch option and maybe it doesn't really need
> one. Various operations in that workspace will need a branch option, but
> that is their problem, not the workspace's.

Perhaps it helps to think about it in a different way; it is possible
to have commits with no branch, currently it's not possible to
checkout those commits, therefore the current behavior is broken.

I think your proposed solution makes sense (delay the blockage until
it's a real problem). Many people checkout the source code and don't
really commit anything... Why would they care if the checkout is in
two branches? Why are they forced to choose one?

From any point of view I think your idea is good :)
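
As a rough illustration of "delay the blockage until it's a real problem", the sketch below lets a checkout succeed without a branch and only fails at the point where a branch is actually needed. The workspace struct and the checkout and commit functions are hypothetical and far simpler than monotone's real workspace code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified workspace: an empty branch string means
// "no branch option set".
struct workspace
{
  std::string revision;
  std::string branch;
};

// Checkout does not need a branch, so it never fails for lack of one.
workspace checkout(const std::string & revision,
                   const std::string & branch = "")
{
  workspace w;
  w.revision = revision;
  w.branch = branch;
  return w;
}

// Commit is one of the operations that really needs a branch,
// so the check lives here rather than in checkout.
std::string commit(const workspace & w, const std::string & message)
{
  if (w.branch.empty())
    throw std::runtime_error("commit needs a branch, none set for this workspace");
  return w.branch + ": " + message;
}
```

Someone who only checks out and builds never hits the error. Someone who commits gets it at commit time, when choosing a branch is actually their problem.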

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-28 Thread Felipe Contreras
On Sun, Mar 1, 2009 at 12:31 AM,   wrote:
> On Sat, Feb 28, 2009 at 08:59:04PM +0200, Felipe Contreras wrote:
>> Appending two changelogs will never look 'natural'. Besides, some
>> people might not like the way two changelogs are appended. I think
>> that's the kind of decision that a team should make when converting a
>> repository. I'm just mirroring; I don't want to think about that, just
>> produce something that looks good and is functional.
>
> It's important that information be round-trip-stable.  That is, if info
> goes from git to monotone, back to git, back to monotone, at some point
> it should stop changing.

That will never be possible, there are many incompatibilities between
git and mtn:

a) git branches are just pointers, so a tree where there's a commit in
the middle that is in no branch (b->b->X->b->b), can't be stored in
git.

b) multiple and special certs can't be stored reliably. For example a
'comment' cert, you can append it to the log, but it won't have the
key information. Or the manual merge flag, etc.

c) certificate stuff can't be stored

There are many other things... in short it's just not possible to
convert from mtn to git and back again without losing information.
IMO the amount of information is a threshold that would vary.
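
Point (a) can be sketched in a few lines. In monotone, branch membership is a cert on each revision. In git, a branch is a pointer and its contents are whatever is reachable from it through parent links. The code below is only a toy model with hypothetical names (parent_map, reachable_from), but it shows why a revision X with no branch cert ends up "in" the branch after export.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// child revision -> its parent revisions
typedef std::map<std::string, std::vector<std::string> > parent_map;

// Everything reachable from head by following parent links, which is
// what a git branch pointer implicitly contains.
std::set<std::string> reachable_from(const parent_map & parents,
                                     const std::string & head)
{
  std::set<std::string> seen;
  std::vector<std::string> todo(1, head);

  while (!todo.empty())
    {
      std::string rev = todo.back();
      todo.pop_back();
      if (!seen.insert(rev).second)
        continue;                       // already visited

      parent_map::const_iterator p = parents.find(rev);
      if (p != parents.end())
        todo.insert(todo.end(), p->second.begin(), p->second.end());
    }
  return seen;
}
```

For a chain b1 <- b2 <- X <- b4 <- b5 with the branch pointer at b5, the reachable set contains X whether or not X carried a branch cert in monotone, so the distinction cannot survive the round trip.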

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-28 Thread Felipe Contreras
On Sat, Feb 28, 2009 at 8:38 PM, Derek Scherger  wrote:
>
> On Sat, Feb 28, 2009 at 2:49 AM, Felipe Contreras
>  wrote:
>>
>> >> As I said, my objective is to generate a git clone for people to
>> >> develop/follow/maintain instead of the mtn repo; in this case there is no
>> >> need to have every single bit of information since the mtn repo would
>> >> still be available.
>> >
>> > Does a bit of "extra" information hurt this use-case somehow?
>>
>> Yes, because you see two changelogs appended instead of one, possibly
>> with the comments too. It doesn't look like a native git repo.
>>
>> >> On the other hand, when a project moves away from mtn to git, then
>> >> your method makes more sense.
>
> It seems to me that this directly contradicts your previous statement, that
> "looking like a native git repo" is somehow important for a mirrored
> repository and yet unimportant for a converted repository. Nonetheless, I'm
> tired of arguing about this and I've added a --use-one-changelog option that
> picks one and uses it. I will be very surprised if anyone else ever uses
> this option but it's harmless.

Appending two changelogs will never look 'natural'. Besides, some
people might not like the way two changelogs are appended. I think
that's the kind of decision that a team should make when converting a
repository. I'm just mirroring; I don't want to think about that, just
produce something that looks good and is functional.

Please don't think I'm saying that option is *a must*, I'm just saying
that if it's not there I would have to modify the code, which is not a
big deal for me.
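
For reference, the difference between the two behaviours can be sketched as follows. This is a simplified guess at the logic, not the git_export code. build_message is a hypothetical function, and choosing the first changelog when the option is set is an assumption, since the thread only says the option "picks one".

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch of building a git commit message from monotone
// changelog and comment cert values.
std::string build_message(const std::vector<std::string> & changelogs,
                          const std::vector<std::string> & comments,
                          bool use_one_changelog)
{
  // Assumption: "use one" means take the first changelog and nothing else.
  if (use_one_changelog)
    return changelogs.empty() ? std::string() : changelogs.front();

  // Default: append all changelogs, then all comments, skipping exact
  // duplicates so a message signed twice is not repeated.
  std::vector<std::string> all(changelogs);
  all.insert(all.end(), comments.begin(), comments.end());

  std::set<std::string> seen;
  std::string message;
  for (std::vector<std::string>::const_iterator i = all.begin();
       i != all.end(); ++i)
    if (seen.insert(*i).second)
      message += *i + "\n";
  return message;
}
```

With two distinct changelogs the default produces both, one after the other, which is the appended look I would rather avoid in a mirror.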

>> Could you make a patch that gets rid of the 'no branch' error?
>
> With any luck at all someone else will beat me to it. I've got too many
> other things on the go at the moment to get to this now but I will
> eventually if no one else does.

It looks like I've got it (attached).

-- 
Felipe Contreras


mtn.diff
Description: Binary data


Re: [Monotone-devel] git fast-export

2009-02-28 Thread Felipe Contreras
On Sat, Feb 28, 2009 at 8:26 AM, Paul Aurich  wrote:
> On Feb 27, 2009, at 21:41, Derek Scherger wrote:
>>
>> My impression at the moment is that the exported history does have correct
>> permissions because it agrees with a monotone checkout (which requires
>> addition of a branch cert) of the same revision. It seems that there are two
>> different problems with monotone here (1) checkout is not possible for
>> revisions that have no branch certs and (2) update doesn't always produce
>> correct execute permissions.
>
> Felipe discovered what I believe to be the cause of this a few months ago
> [1]. As I understand the issue, there is no `mtn update` hook for unsetting
> execute bits, so unsetting that attribute doesn't have any effect. However,
> when doing an update that would involve moving very far through history
> (say, from the revision Felipe mentions in that email to
> h:im.pidgin.pidgin), I believe Monotone optimizes that operation to 'check
> out the new manifest [and apply working changes]', and as the mtn:exec
> property isn't set on the files in the target revision, the file's exec bit
> is unset.

So there's no fix and no clear path on how this will get fixed, right?

> I may have some of the details of how Monotone handles these cases wrong,
> but hopefully my description is clear enough to be sensible. And of course,
> credit for discovering and figuring out why it sometimes does work go to
> Felipe and some people in #pidgin (sorry, I don't remember who,
> specifically).

I found the issue and they found the underlying problem.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-28 Thread Felipe Contreras
On Sat, Feb 28, 2009 at 7:41 AM, Derek Scherger  wrote:
>
> On Fri, Feb 27, 2009 at 1:21 AM, Felipe Contreras
>  wrote:
>>
>> As I said, my objective is to generate a git clone for people to
>> develop/follow/maintain instead of the mtn repo; in this case there's no
>> need to have every single bit of information since the mtn repo would
>> still be available.
>
> Does a bit of "extra" information hurt this use-case somehow?

Yes, because you see two changelogs appended instead of one, possibly
with the comments too. It doesn't look like a native git repo.

>> On the other hand, when a project moves away from mtn to git, then
>> your method makes more sense.
>>
>> That's why I think it should be an option.
>
> So something like --use-one-changelog that grabs one of the changelog certs
> at random and spits that out? Sorry, I'm really having a hard time seeing
> how this could actually be useful. Are you just trying to get this export to
> exactly match what your script produces so that they can compare
> identically? If so, would it be possible instead to change your script so
> that it appends all changelogs into one complete message?

I've changed my script to simulate yours when I think it's sensible,
otherwise I've modified your code to do what my script is doing. So
far this has been the only change that I'm still making: once it finds
a changelog, I break the loop.

>> >> Anyway, I've been able to reach a little further and now I've finally
>> >> found a difference in the trees between your and my method. In
>> >> Pidgin's repo there's a commit
>> >> '3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
>> >> files have exec flag off, and with your method it has the exec flag
>> >> on. Can you take a look?
>> >
>> > Good catch. The monotone checkout of this revision has execute bits on
>> > some
>> > files that the git checkout does not. I'll have to do some digging to
>> > see
>> > what's going wrong here.
>>
>> I'm not exactly sure what you mean with this, but there's a bug in
>> 'mtn update' that sometimes doesn't pick the correct exec flag. That's
>> why I'm doing a full 'mtn checkout'.
>
> Here's what I see:
>
> A git checkout of refs/mtn/revs/3f1b3854a77850131531d1d6f19c44a0b9174107
> from the exported git repo does not have execute permissions on
> ./po/{id,ne,ps}.po or on a few files in ./doc/oscar/. If I update a monotone
> workspace to this revision it does have execute permissions on these files
> and disagrees with the git workspace exactly on these permission bits.
>
> A monotone checkout of the same revision does NOT have execute permissions
> on these files and all permission bits are in agreement with the git
> checkout. Note that this revision has no branch cert which apparently
> prevents it from being checked out from monotone so I've added a bogus
> branch cert  to my local database to make a checkout possible.

That's really annoying! I've had to do many hacks in my script to make
the checkout possible... go parent by parent until there's one that
has a branch cert, check that out, then update to the original commit.
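
The workaround described above (walk up the ancestry until a revision with a branch cert is found) can be sketched roughly as follows. This is only an illustration, not the actual script: the `parents` dictionary and the `has_branch_cert` callback are hypothetical stand-ins for whatever queries the real script makes against mtn.

```python
from collections import deque


def nearest_ancestor_with_branch(rev, parents, has_branch_cert):
    """Breadth-first walk up the ancestry graph starting at rev.

    parents: dict mapping a revision id to a list of its parent ids
             (hypothetical in-memory form of the revision graph)
    has_branch_cert: callable returning True if a revision has a branch cert

    Returns the closest revision (rev itself included) that has a branch
    cert, or None if neither rev nor any ancestor has one.
    """
    seen = {rev}
    queue = deque([rev])
    while queue:
        current = queue.popleft()
        if has_branch_cert(current):
            return current
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None
```

The revision returned is the one that can be checked out; the script would then update from there to the original revision, which is exactly where the 'mtn update' exec-bit problem can come back in.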

> My impression at the moment is that the exported history does have correct
> permissions because it agrees with a monotone checkout (which requires
> addition of a branch cert) of the same revision. It seems that there are two
> different problems with monotone here (1) checkout is not possible for
> revisions that have no branch certs and (2) update doesn't always produce
> correct execute permissions.

Agreed.

>> The problem is not your method, the problem is mine, which is
>> painfully slow, but it's needed for a bit-exact comparison. It's
>> tedious but hopefully the comparisons will soon be done.
>
> It's great to have another method to compare the output against and make
> sure both produce equivalent results so I do appreciate the effort. Have you
> previously done lots of verification of the output of your script, to the
> point where you trust it to a reasonable degree?

Yes I have. That's why I found issues in mtn in the first place. I've
been trying for something foolproof: first I was doing 'mtn update' and
importing the exact workspace. Then I found issues and I tried with
'mtn checkout' and still I found issues (with no branches).

I've found that in the case of
3f1b3854a77850131531d1d6f19c44a0b9174107 my script is unreliable
because the result depends on which parent I'm basing my update on. It
looks like my script cannot be reliable unless I avoid 'mtn update' or
it is fixed in mtn.

Could you make a patch that gets rid of the 'no branch' error?

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-27 Thread Felipe Contreras
On Fri, Feb 27, 2009 at 7:58 AM, Derek Scherger  wrote:
>
> On Wed, Feb 25, 2009 at 12:37 PM, Felipe Contreras
>  wrote:
>>
>> I think it should be an option. Otherwise the people that want a
>> single message would have trouble running a git filter-branch command
>> to strip the message out. It would be much easier to do that in the
>> mtn export.
>
> Looking through the pidgin repo that I have here, there are several commits
> with multiple changelogs, some of which consist of a single 'a' character.
> Selecting one of these arbitrarily is going to select the 'a' changelogs
> sometimes, which I suspect is also not what you want, unless you're thinking
> of a different option than I am. As I recall what you wanted was an option
> to just grab one changelog and use that, right? Maybe the longest changelog
> would be the best one to use? I see several other revisions with multiple
> distinct changelogs that seem like they would be good to preserve as well.
>
> I'm somewhat reluctant to add an option that does this because it does not
> seem like a general thing that anyone else will want and it seems like it
> will just move your problem around a bit. Instead of getting changelogs you
> don't want you'll be missing changelogs you do want.
>
> The other options here are to (1) filter the exported data and remove the
> messages you don't want or (2) delete the unwanted changelog certs from a
> copy of your monotone database and export from that. Both of these should be
> scriptable without too much trouble, although identifying the specific
> changelogs to drop will probably be rather tedious.

As I said, my objective is to generate a git clone for people to
develop/follow/maintain instead of the mtn repo; in this case there's no
need to have every single bit of information since the mtn repo would
still be available.

On the other hand, when a project moves away from mtn to git, then
your method makes more sense.

That's why I think it should be an option.

>> I don't know the exact commit id in the Pidgin repo, but I can assure
>> you, it's there.
>
> Oh I believe you, but it still might be useful to see the actual real data
> and do something based on that. So, if you do come across the revision id,
> I'd still like to see it.

Sure, if I find it I'll let you know.

>> no author cert: ''
>> user_id not mapped: ''
>> user_id mapped: obvious
>
> The current code works mostly like this. In the unmapped case it only adds
> '<' and '>' when neither is present. There are monotone users who have their
> keys named like "User Name " and adding another set of '<'
> and '>' around these wouldn't make much sense.

Ahh, right. In Pidgin all the userids are 'u...@foobar' or just 'user'
so far but I think maybe in the latest commits they are doing as you
say. So yeah, your code makes sense.
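
A minimal sketch of the rule Derek describes for unmapped authors (add '<' and '>' only when neither is present). The function name and the sample addresses are hypothetical, and the real git_export code may well differ in detail:

```python
def wrap_unmapped_author(value):
    """Wrap an unmapped author value in '<' and '>' only when it contains
    neither character; values that already carry them are left alone."""
    if "<" not in value and ">" not in value:
        return "<" + value + ">"
    return value


# A bare user id or address gets wrapped ...
print(wrap_unmapped_author("user@example.org"))
# ... while a key name that already looks like "Name <address>" is kept as-is.
print(wrap_unmapped_author("User Name <user@example.org>"))
```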

>> Anyway, I've been able to reach a little further and now I've finally
>> found a difference in the trees between your and my method. In
>> Pidgin's repo there's a commit
>> '3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
>> files have exec flag off, and with your method it has the exec flag
>> on. Can you take a look?
>
> Good catch. The monotone checkout of this revision has execute bits on some
> files that the git checkout does not. I'll have to do some digging to see
> what's going wrong here.

I'm not exactly sure what you mean with this, but there's a bug in
'mtn update' that sometimes doesn't pick the correct exec flag. That's
why I'm doing a full 'mtn checkout'.

>> Now I'm using a bit different method so I'll be able to test faster.
>
> The latest monotone git_export code runs quite a bit faster as well. I can
> export the pidgin repo I have here in a little over an hour instead of the 5
> hours it was taking previously.

The problem is not your method, the problem is mine, which is
painfully slow, but it's needed for a bit-exact comparison. It's
tedious but hopefully the comparisons will soon be done.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-25 Thread Felipe Contreras
On Tue, Feb 10, 2009 at 8:27 AM, Derek Scherger  wrote:
>
> On Mon, Feb 9, 2009 at 11:14 PM, Felipe Contreras
>  wrote:
>>
>> It's just that I don't like that behavior. It doesn't matter how smart
>> the algorithm is, it will always look like two messages instead of
>> one, which might be ok for some people, but not for others.
>
> What I was trying to do was to only use *one* of the two messages if they
> were identical so it shouldn't look like two messages at all.

But they are not identical.

>> In any case, the first error I got was with a revision that had a
>> change like this [merge 0123...] and another one like [merge
>> '0123...].
>
> In the case of merges it should be catching the fact that they have the same
> message and only including one of them. I guess if one of them has a quote
> and one doesn't it will fail though. It seems odd that you would be getting
> this and makes me wonder whether different monotone versions have had
> different automated messages in these cases. Can you post the *exact*
> contents of the message you're getting please?
>
> If it is the case that you have two slightly different automatically
> generated merge messages then this isn't going to handle that and I think
> the best thing to do in that case is keep all of the messages in the
> exported data, rather than losing information. If you don't like specific
> messages there's always the option of removing some things from the exported
> data before importing it which seems like it would generally be easier than
> adding things to the exported data that it doesn't contain.

I think it should be an option. Otherwise the people that want a
single message would have trouble running a git filter-branch command
to strip the message out. It would be much easier to do that in the
mtn export.

I don't know the exact commit id in the Pidgin repo, but I can assure
you, it's there.

>> > Yes. Git doesn't like authors without a email address wrapped in < and >
>> > so
>> > you need to put these in the --authors-file mappings.
>>
>> Why not? I thought '' was ok.
>
> '' is only used when there are no author certs, not when some
> author cert is not found in the author map. If there is an author cert with
> no  then git won't like it. Another option would be to require these
> values to exist in the author map or replace them with  as you seem
> to be suggesting.

I'm sorry, I can't recall what specifically we were discussing but I
think it should work this way:

no author cert: ''
user_id not mapped: ''
user_id mapped: obvious

Anyway, I've been able to reach a little further and now I've finally
found a difference in the trees between your and my method. In
Pidgin's repo there's a commit
'3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
files have exec flag off, and with your method it has the exec flag
on. Can you take a look?

Now I'm using a bit different method so I'll be able to test faster.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-09 Thread Felipe Contreras
On Tue, Feb 10, 2009 at 6:58 AM, Derek Scherger  wrote:
>
> On Mon, Feb 9, 2009 at 4:40 PM, Felipe Contreras
>  wrote:
>>
>> I'm getting a bit further now, but I'm still having some
>> incompatibilities. I've tried to emulate most of them, but it takes
>> time between each try.
>
> I've checked in a change that improves performance quite a lot, which I'll
> describe in another email.
>>
>>
>> The first issue I stumbled upon is that you do the following:
>>
>>  db.get_revision_certs(*r, author_cert_name, authors);
>>  db.get_revision_certs(*r, branch_cert_name, branches);
>>  db.get_revision_certs(*r, changelog_cert_name, changelogs);
>>  db.get_revision_certs(*r, comment_cert_name, comments);
>>  db.get_revision_certs(*r, date_cert_name, dates);
>>
>> While I get all the certs and handle them in one single go. I think
>> it's more efficient to do a single sql query. This generates a
>
> sqlite handles rapid-fire queries *much* faster than your average oracle or
> postgresql database does. I would be very surprised if issuing these 5
> queries per rev made any measurable difference to performance, but I have
> been wrong before. ;)
>
>> different order when there are multiple committers. For now I'm doing it
>> separately as your code does.
>>
>> The next one is regarding the changelogs. I want to have only one
>> changelog, not concatenate them. So I patched your code. I'm attaching
>> the patch.
>
> I'm curious as to what the exact problem you're having here is. Can you give
> an example of the messages you're getting and what you would like to have?
>
> The export code should not be repeating changelogs that are due to multiple
> people arriving at the same merge or propagate. The intent of the code
> that's there is that if multiple people did have unique things to say it
> will preserve all of their messages which seems better than randomly
> throwing away someone's comments. It should be retaining only one
> automatically generated merge or propagate message though.

It's just that I don't like that behavior. It doesn't matter how smart
the algorithm is, it will always look like two messages instead of
one, which might be ok for some people, but not for others.

In any case, the first error I got was with a revision that had a
change like this [merge 0123...] and another one like [merge
'0123...].
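
To make the two behaviours under discussion concrete, here is a rough sketch, not the git_export code itself. `combine_changelogs` keeps one copy of each identical message and concatenates the distinct ones, while `first_changelog` keeps only the first one found. Both function names and the sample messages are hypothetical.

```python
def combine_changelogs(changelogs):
    """Keep the first copy of each distinct changelog, in order, and join
    them into a single commit message."""
    unique = []
    for message in changelogs:
        if message not in unique:
            unique.append(message)
    return "\n".join(unique)


def first_changelog(changelogs):
    """Use only the first changelog found; empty string if there is none."""
    return changelogs[0] if changelogs else ""


# Identical automatic messages collapse into one ...
same = ["merge of 'aaa' and 'bbb'", "merge of 'aaa' and 'bbb'"]
print(combine_changelogs(same))

# ... but messages that differ by a single character are both kept,
# which is the case that produces a two-message commit log.
nearly_same = ["merge of aaa and bbb", "merge of 'aaa and bbb"]
print(combine_changelogs(nearly_same))
print(first_changelog(nearly_same))
```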

>> Then I tried to update to your latest revision and I'm getting a
>> failure from fast import. I'm attaching the report.
>
> Yes. Git doesn't like authors without a email address wrapped in < and > so
> you need to put these in the --authors-file mappings.

Why not? I thought '' was ok.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-09 Thread Felipe Contreras
On Tue, Feb 10, 2009 at 1:40 AM, Felipe Contreras
 wrote:
> On Sun, Jan 25, 2009 at 6:40 AM, Derek Scherger  wrote:
>> On Thu, Jan 22, 2009 at 1:33 PM, Derek Scherger  wrote:
>>>
>>>> I think "" is a better start point.
>>
>> Both committer and author are defaulted to "" in rev
>> 2c207d528e37e59d8d8d14e24edd14fb34a10a21.
>>
>> I'm using the value from the author cert as the git author and the key from
>> the same cert as the git committer. That should generally be the same key
>> that would be on a changelog cert but it's not guaranteed to be.
>>
>> I've also removed the code that was trying to clean up various things, like
>> adding < and > around email addresses. These should all be fixed by mappings
>> using the --authors-file now.
>>
>> The list of committers that might need mapping comes from:
>>
>> $ mtn db execute 'select distinct keypair from revision_certs where name =
>> "author"'
>>
>> The list of authors that might need mapping comes from:
>>
>> $ mtn db execute 'select distinct value from revision_certs where name =
>> "author"'
>
> Great!
>
> I'm getting a bit further now, but I'm still having some
> incompatibilities. I've tried to emulate most of them, but it takes
> time between each try.
>
> The first issue I stumbled upon is that you do the following:
>
>  db.get_revision_certs(*r, author_cert_name, authors);
>  db.get_revision_certs(*r, branch_cert_name, branches);
>  db.get_revision_certs(*r, changelog_cert_name, changelogs);
>  db.get_revision_certs(*r, comment_cert_name, comments);
>  db.get_revision_certs(*r, date_cert_name, dates);
>
> While I get all the certs and handle them in one single go. I think
> it's more efficient to do a single sql query. This generates a
> different order when there are multiple committers. For now I'm doing it
> separately as your code does.
>
> The next one is regarding the changelogs. I want to have only one
> changelog, not concatenate them. So I patched your code. I'm attaching
> the patch.
>
> Then I tried to update to your latest revision and I'm getting a
> failure from fast import. I'm attaching the report.

Ah, I read your comments in the code, so it fails when the author is
not defined... I think that should be optional.

How about this patch?

-- 
Felipe Contreras


mtn-fast-export-authors.diff
Description: Binary data


Re: [Monotone-devel] git fast-export

2009-01-22 Thread Felipe Contreras
On Thu, Jan 22, 2009 at 6:05 PM, Derek Scherger  wrote:
> On Wed, Jan 21, 2009 at 3:17 PM, Felipe Contreras
>  wrote:
>>
>> However, I found some issues:
>>
>> 1) no author
>>
>> Where there is no author it appears as "unknown"; it's missing a
>> space and I think first-letter capitalization looks better for names
>> ("Unknown").
>
> Good point. I'll fix that.
>>
>> 2) no name
>>
>> As discussed before I prefer "Unknown " but your approach
>> ("") is not bad.
>>
>> 3) no email
>>
>> When there's no email I get "Name"; it's missing a space.
>
> What I'm planning to do is this:
>
> 1. start with "unknown " (all lowercase, with a space) and only
> use this when there are no author certs
> 2. grab the value from the first (see below for definition of "first")
> author cert if there is one
> 3. look up the value from above in the --author-file to see if there is a
> mapping to something else and use the result if there is one
> 4. use the result from above for both committer and author
>
> There's a bit of extra complexity in this at the moment (adding < and >
> around unadorned email addresses, etc.) that's left over from before I added
> the --author-file option and it doesn't really make a lot of sense any more.
> If you want something other than "unknown " then mapping that to
> whatever you would like in the --author-file should suffice. For other
> authors that either don't have a name or don't have an email address you'll
> need to add mappings (IIRC git will not accept committers that lack an email
> address).

Or I can modify your code.

I think "" is a better start point.
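
For reference, the four-step plan quoted above could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, `author_map` stands in for the parsed --author-file, and the default identity is passed in as a parameter because its exact form is what is being debated here.

```python
def choose_git_ident(author_certs, author_map, default_ident):
    """Pick the identity used for both git author and committer.

    author_certs: list of author cert values for the revision (may be empty)
    author_map: dict built from the author mapping file
    default_ident: value used only when the revision has no author certs
    """
    # Steps 1 and 2: start from the default, prefer the first author cert.
    value = author_certs[0] if author_certs else default_ident
    # Step 3: apply the mapping if there is one, otherwise use the value as-is.
    ident = author_map.get(value, value)
    # Step 4: the same result is used for committer and author.
    return {"author": ident, "committer": ident}
```

Because the default goes through the same lookup, mapping it in the author file is enough to get a different placeholder without touching the code.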

>> I really don't like 1), there is *always* a committer in mtn. I
>
> There ought to be but there's no real requirement by the data model and if a
> pull operation was interrupted at the wrong moment it is likely possible to
> miss some certs. Also, when pulling you always get revs, but you might not
> get certs (at least branch certs) if they don't match the pattern you're
> pulling with. I can't recall if no certs are pulled for revs that don't
> match the branch pattern.

There must be at least one cert per revision, mustn't there? Date? Changelog?

>> propose to use the first committer of the changelog cert as the git
>
> There is not really any inherent order in the author certs so by first I
> mean "which ever one I get first."
>
> The changelog does not have a "first committer", all it has is a signature
> from some key, whose name might match the value of the author cert, or it
> might not. If your database is old enough to have been through a rebuild
> (and an epoch change) then all of the certs from prior to the rebuild will
> be signed by the person who did the rebuild, not their original signer.
> Using these as committers wouldn't be very good.

Why not? It's a better approximation than "author" which is already
available in the git commits. What would be the advantage of having
both git author and git committer set to mtn author?

>> committer, and then, if there's no mtn author, use the same committer
>> as author.
>>
>> Anyway, I'll try to simulate that behaviour so I can make an exact
>> comparison of the repos.
>
> Sounds good. I'm also thinking of adding --export-marks and --import-marks
> options as documented in git-fast-import and git-fast-export, which should
> allow for incremental exporting. It will probably be a few days before I get
> to any of this though.

Cool. It's not a feature that I need right now, but would be nice :)

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-21 Thread Felipe Contreras
On Thu, Jan 22, 2009 at 4:17 AM,   wrote:
> On Thu, Jan 22, 2009 at 12:17:14AM +0200, Felipe Contreras wrote:
>>
>> However, I found some issues:
>>
>> 1) no author
>>
>> Where there is no author it appears as "unknown"; it's missing a
>> space and I think first-letter capitalization looks better for names
>> ("Unknown").
>
> The point of lower-case here is presumably that "unknown" isn't
> somebody's name, thereby obviating confusion with Mr. Simon J. Unknown.

But it's part of the "name" field, so perhaps "" would be better?

-- 
Felipe Contreras


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] git fast-export

2009-01-21 Thread Felipe Contreras
On Sun, Jan 18, 2009 at 9:40 PM, Derek Scherger  wrote:
>
> On Sun, Jan 18, 2009 at 12:26 AM, Felipe Contreras
>  wrote:
>>
>> True. Then the issue is with my script (ruby date parsing).
>>
>> It looks like I would have to re-generate my repo clone (yay for one
>> whole day of conversion) for the comparison.
>>
>> Before I do that, can we agree on a format for unknown committers?
>
> Is there anything wrong with what I have now, which I think should be
> "unknown "? If you want something else use the --authors-file and
> set 'unknown = Unknown '

Since I'll be re-generating the repo with my script once more, I'll
follow your convention.

However, I found some issues:

1) no author

Where there is no author it appears as "unknown"; it's missing a
space and I think first-letter capitalization looks better for names
("Unknown").

2) no name

As discussed before I prefer "Unknown " but your approach
("") is not bad.

3) no email

When there's no email I get "Name"; it's missing a space.

I really don't like 1), there is *always* a committer in mtn. I
propose to use the first committer of the changelog cert as the git
committer, and then, if there's no mtn author, use the same committer
as author.

Anyway, I'll try to simulate that behaviour so I can make an exact
comparison of the repos.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-17 Thread Felipe Contreras
On Sun, Jan 18, 2009 at 1:18 AM, Paul Aurich  wrote:
> And Felipe Contreras spake on 01/17/2009 02:52 PM, saying:
>> I ran my comparison script, but unfortunately the first revision has a
>> mismatch:
>>
>> yours:
>> author Tailor Script  953780991 +
>>
>> mine:
>> author Tailor Script  953773791 +0200
>>
>> Which suggests that your script is not handling the timezone correctly
>> (not sure about that).
>
> No, his script is handling it properly (based solely on this example).
>
> Monotone uses UTC internally and git does some crazy wacky things with
> timezones (in short, the timezone shown is /just/ for prettifying the date;
> the timestamp by itself *must* be the time of the commit in UTC). See
> https://kerneltrap.org/mailarchive/git/2007/2/6/237902
>
>  $ mtn automate certs d137c7046bae7e4a0144fee82bfce8061f61e3b3 | grep date
> -A 1
>   name "date"
>  value "2000-03-23T03:09:51"
>
>  $ date -d "2000-03-23 03:09:51" -u
>  Thu Mar 23 03:09:51 UTC 2000
>  $ date -d "2000-03-23 03:09:51" -u  +"%s"
>  953780991

True. Then the issue is with my script (ruby date parsing).
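
As an illustration of Paul's point, here is a small Python sketch that treats the date cert value as UTC when converting it to the epoch seconds git stores. The function names are hypothetical; the input is the cert value quoted above.

```python
import calendar
import time


def mtn_date_to_epoch(value):
    """Interpret a monotone date cert value (no zone suffix) as UTC and
    return seconds since the Unix epoch."""
    parsed = time.strptime(value, "%Y-%m-%dT%H:%M:%S")
    # calendar.timegm treats the struct_time as UTC, unlike time.mktime,
    # which would apply the local timezone.
    return calendar.timegm(parsed)


def git_raw_date(value):
    """Format the cert value as a git raw date: epoch seconds plus offset."""
    return "%d +0000" % mtn_date_to_epoch(value)


print(mtn_date_to_epoch("2000-03-23T03:09:51"))  # 953780991
print(git_raw_date("2000-03-23T03:09:51"))       # 953780991 +0000
```

The other timestamp in the comparison, 953773791, is exactly 7200 seconds smaller, which is what comes out when the same string is parsed as local time in a +0200 zone.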

It looks like I would have to re-generate my repo clone (yay for one
whole day of conversion) for the comparison.

Before I do that, can we agree on a format for unknown committers?

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-17 Thread Felipe Contreras
On Sat, Jan 17, 2009 at 9:04 AM, Derek Scherger  wrote:
> On Wed, Jan 14, 2009 at 12:50 AM, Felipe Contreras
>  wrote:
>>
>> >> 1) Your tool adds a bunch of "Monotone-" fields, can those be disabled?
>> >
>> > There's no option at the moment but it would be easy to add.
>>
>> It would be really useful.
>
> I've added --log-revids and --log-certs to enable including revision ids and
> cert values in the commit logs. These are off by default.

Great!

>> >> 2) There's no author mapping, can this option be added?
>> >
>> > I'm not exactly sure what you mean by author mapping but I assume
>> > translating between things like "f...@bedrock.com" and "Fred Flintstone
>> > "? Is there a generally accepted format that other
>> > tools
>> > use for this?
>>
>> Yes, that's what I meant.
>>
>> The only format I know is the one from git-svn:
>> felipec = Felipe Contreras 
>
> I've added --authors-file and --branches-file options that work like this
> for mapping author names and branch names respectively. Names not found in
> these maps are used as-is. I've also changed the default branch to "unknown"
> from "master" but this can be changed with the branches-file mapping to
> whatever you want with a line like "unknown = whatever-you-want."

Very nice. Now the only difference is that for unknown users my script
maps them to "Unknown ".
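
A rough sketch of how a git-svn style mapping file ("key = value" per line) can be read and applied, with names not found in the map used as-is, as described above. The function names and the sample address are hypothetical, and skipping blank or malformed lines is an assumption rather than documented behaviour of --authors-file.

```python
def parse_mapping_file(text):
    """Parse 'key = value' lines into a dict.

    Blank lines and lines without '=' are skipped (an assumption; the real
    option may be stricter)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        mapping[key.strip()] = value.strip()
    return mapping


def map_name(name, mapping):
    """Return the mapped name, or the name unchanged when it is not mapped."""
    return mapping.get(name, name)


authors = parse_mapping_file("felipec = Felipe Contreras <felipec@example.org>\n")
print(map_name("felipec", authors))   # mapped
print(map_name("someone", authors))   # not in the map, used as-is
```

The same helper would serve for a branches file, for example a line such as "unknown = whatever-you-want".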

>> >> 3) I add the mtn sha1 in refs/mtn/
>> >
>> > This is easy to add too. I have added refs/mtn/roots/ and
>> > refs/mtn/leaves/ and was wondering about all of the monotone
>> > revision
>> > ids. I assume the leaf refs would prevent git from wanting to garbage
>> > collect otherwise unreferenced revs if there were any?
>
> I've added --refs=roots, --refs=leaves and --refs=revs to include
> refs/mtn/roots, refs/mtn/leaves and refs/mtn/revs respectively.

Great :)

>> If there's a ref pointing to it, then it's not pruned.
>
> Good. Including --refs=leaves should make sure that nothing is subject to
> garbage collection then.
>
>> Branches and tags can be manually fixed a posteriori, no big issue.
>> The important things are the commits themselves.
>
> Not always. Monotone allows things in branch names that git does not. If
> these aren't changed git will fail to import them.
> Use --branches-file to map offending names to something git can handle.

True.

>> It probably depends on the intent of the clone:
>> a) migrate the repo forever
>> b) mirror a mtn repo
>>
>> Right now I'm interested in b), so I find the refs/mtn approach very
>> useful since I can quickly look for the mtn or git sha1.
>
> The --refs=revs option does clutter up the gitk display somewhat but
> otherwise seems fine.

I tested your changes on the pidgin repo; on the second try I was
finally able to convert it, which took 8 hours (1 rev/s).

I ran my comparison script, but unfortunately the first revision has a
mismatch:

yours:
author Tailor Script  953780991 +

mine:
author Tailor Script  953773791 +0200

Which suggests that your script is not handling the timezone correctly
(not sure about that).

Would it be possible to pause after a certain amount of commits, or at
least issue a checkpoint? (maybe git fast-import has this option)

Very good job! I'll try to look further into this later.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] Re: Mini Summit 2009

2009-01-17 Thread Felipe Contreras
On Sat, Jan 17, 2009 at 6:36 PM, Emile Snyder  wrote:
> There's a link off of the monotone home page.  It's channel #monotone on
> irc.oftc.net I think.

Why oftc? Why not freenode?

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-13 Thread Felipe Contreras
On Wed, Jan 14, 2009 at 6:41 AM, Derek Scherger  wrote:
> On Tue, Jan 13, 2009 at 8:30 PM, Felipe Contreras
>  wrote:
>>
>> It seems to speed up at some points, I have tried again two times but
>> I had issues, I still don't have the numbers but it's probably faster
>> than what I thought.
>
> Yeah, it depends a lot on the length of the delta chains required to
> reconstruct rosters. Newer rosters reconstruct faster.
>>
>> > I have not verified this imported repo in any way yet though so who
>> > knows
>> > whether its accurate or not.
>>
>> I can verify comparing to the output of my tool, but there are some
>> differences:
>
> Good and bad I guess. ;)
>
>> 1) Your tool adds a bunch of "Monotone-" fields, can those be disabled?
>
> There's no option at the moment but it would be easy to add.

It would be really useful.

> Note that monotone revisions can have multiple authors, dates, changelogs,
> etc. if several people merge two revisions to the same result. The git
> fast-import format doesn't seem to allow more than committer and author and
> the monotone side doesn't indicate which would be which. So, at the moment I
> just grab one author, date and branch cert and use that. I do concatenate
> all the changelogs and comment certs together for the git commit message and
> add the "Monotone-" values on to the end of that in case they are
> interesting.

I ignore the comments and use only the first value of changelog, date
and author. I set the committer to the first person that added a
changelog.

The git format only allows one author and one committer, but the
convention is to add multiple 'signed-off-by' lines for the people
that reviewed and accepted the patch. I have an option to add the
s-o-b lines, but I have it off.
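
The s-o-b convention mentioned above amounts to appending trailer lines to the commit message. A minimal sketch follows; the function name and the sample identities are hypothetical and this is not taken from the actual script.

```python
def add_signoffs(message, signers):
    """Append one 'Signed-off-by:' trailer per signer, separated from the
    message body by a blank line. With no signers the body is returned
    unchanged apart from a single trailing newline."""
    body = message.rstrip("\n")
    if not signers:
        return body + "\n"
    trailers = "\n".join("Signed-off-by: " + signer for signer in signers)
    return body + "\n\n" + trailers + "\n"


print(add_signoffs("Fix the build\n",
                   ["Fred Flintstone <fred@example.org>",
                    "Barney Rubble <barney@example.org>"]))
```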

I'm interested in a git repo that looks natural, not to have every
single bit of information from the mtn repo.

>> 2) There's no author mapping, can this option be added?
>
> I'm not exactly sure what you mean by author mapping but I assume
> translating between things like "f...@bedrock.com" and "Fred Flintstone
> "? Is there a generally accepted format that other tools
> use for this?

Yes, that's what I meant.

The only format I know is the one from git-svn:
felipec = Felipe Contreras 

> This would be easy enough to add but with the caveat above about picking one
> author from several. We will very likely not agree on author or date on some
> revisions where multiple certs exist.

Right, but I don't think there's any point in trying to mirror exactly
the original repo; it's not possible. Let's just settle for a
reasonably good approximation.

>> 3) I add the mtn sha1 in refs/mtn/
>
> This is easy to add too. I have added refs/mtn/roots/ and
> refs/mtn/leaves/ and was wondering about all of the monotone revision
> ids. I assume the leaf refs would prevent git from wanting to garbage
> collect otherwise unreferenced revs if there were any?

If there's a ref pointing to it, then it's not pruned.

>> Only 1) would be required to do a comparison, 2) would be great to
>> avoid converting the huge repo again without author mappings.
>
> Another option here is to process the exported output through
> sed/awk/perl/python during the fast-import phase. I suspect this may be
> needed in some cases anyway to fix branch names and things although I guess
> a branch mapping file would also be a possibility.

I don't like the sed approach.

Branches and tags can be manually fixed a posteriori, no big issue.
The important things are the commits themselves.

>> In order to do future updates I think 3) would be really great, that
>> way it's possible to know if a revision has been imported or not, and
>> makes it possible to do quick lookups like: git show mtn/.
>
> Yeah, this is probably worth having, at least for checking things over after
> an import. I'm not sure if you would want to keep these refs around long
> term or not. I was wondering about exporting the marks file as well, but
> this would probably be better.

It probably depends on the intent of the clone:
a) migrate the repo forever
b) mirror a mtn repo

Right now I'm interested in b), so I find the refs/mtn approach very
useful since I can quickly look for the mtn or git sha1.
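
As an illustration of the lookup, here is a small sketch that turns the
output of
  git for-each-ref --format='%(objectname) %(refname)' refs/mtn/
into a table from mtn revision id to git commit id. It assumes one ref per
mtn revision directly under refs/mtn/; the function names are hypothetical.

```python
def parse_mtn_refs(for_each_ref_output, prefix="refs/mtn/"):
    """Map mtn revision id -> git commit id from 'objectname refname' lines."""
    imported = {}
    for line in for_each_ref_output.splitlines():
        if not line.strip():
            continue
        git_sha, refname = line.split(None, 1)
        if refname.startswith(prefix):
            imported[refname[len(prefix):]] = git_sha
    return imported


def is_imported(imported, mtn_rev):
    """True if the mtn revision already has a ref, i.e. was imported before."""
    return mtn_rev in imported


if __name__ == "__main__":
    sample = "1111111111111111111111111111111111111111 refs/mtn/aaaa\n"
    refs = parse_mtn_refs(sample)
    print(is_imported(refs, "aaaa"), is_imported(refs, "bbbb"))
```

Inverting the dictionary gives the git-to-mtn direction for the same price.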

> All of these things had crossed my mind previously and I'll probably get to
> them at some point.

Cool, then we are mostly in sync ;)

-- 
Felipe Contreras


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] git fast-export

2009-01-13 Thread Felipe Contreras
On Wed, Jan 7, 2009 at 8:50 AM, Derek Scherger  wrote:
> On Mon, Jan 5, 2009 at 7:27 PM, Felipe Contreras
>  wrote:
>>
>> On Tue, Jan 6, 2009 at 1:42 AM, Derek Scherger 
>> wrote:
>> > On Mon, Jan 5, 2009 at 2:22 PM, Felipe Contreras
>> >  wrote:
>> >>
>> >> Why an extra "master" branch? There's no need for that branch.
>> >
>> > It's used for revs that have no other branch certs to use.
>>
>> Would it make sense to use another name? "nil", "unknown" or something?
>
> Possibly. It's probably not a big deal to rename it after importing either.
>
>> >> Working with the roster is extremely slow. Right now your tool is
>> >> taking about 6 seconds per commit, that's too slow.
>
> progress revision a19da46cd3d31611d768b67a772c2861aded46c5 (27501/27501)
>
> real    277m23.646s
> user    251m33.919s
> sys     24m42.357s
>
> For the record I can apparently convert the pidgin database in about 4.5
> hours on a 2.4GHz core 2 which works out to about 1.65 revs per second (0.61
> seconds per rev) on average. The export file is 3.4GB in size.

It seems to speed up at some points. I have tried again twice but ran
into issues; I still don't have the numbers, but it's probably faster
than I thought.
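
For reference, the averages quoted above can be rechecked from the timing
output (27501 revisions, 277m23.646s wall-clock). This is just the
arithmetic, nothing specific to either tool:

```python
def to_seconds(minutes, secs):
    """Convert a 'time'-style XmY.Zs reading into seconds."""
    return minutes * 60 + secs


revisions = 27501
elapsed = to_seconds(277, 23.646)      # wall-clock ("real") time

revs_per_second = revisions / elapsed  # average export rate
seconds_per_rev = elapsed / revisions  # inverse of the rate
hours = elapsed / 3600

if __name__ == "__main__":
    print(f"{revs_per_second:.2f} revs/s, {seconds_per_rev:.2f} s/rev, {hours:.1f} h")
```

That agrees with the 1.65 revs per second and 0.61 seconds per rev quoted,
and puts the total a little over 4.6 hours.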

> I have not verified this imported repo in any way yet though so who knows
> whether it's accurate or not.

I can verify comparing to the output of my tool, but there are some differences:

1) Your tool adds a bunch of "Monotone-" fields, can those be disabled?
2) There's no author mapping, can this option be added?
3) I add the mtn sha1 in refs/mtn/

Only 1) would be required to do a comparison, 2) would be great to
avoid converting the huge repo again without author mappings.

In order to do future updates I think 3) would be really great, that
way it's possible to know if a revision has been imported or not, and
makes it possible to do quick lookups like: git show mtn/.

-- 
Felipe Contreras


Re: [Monotone-devel] git fast-export

2009-01-05 Thread Felipe Contreras
On Tue, Jan 6, 2009 at 1:42 AM, Derek Scherger  wrote:
> On Mon, Jan 5, 2009 at 2:22 PM, Felipe Contreras
>  wrote:
>>
>> Why an extra "master" branch? There's no need for that branch.
>
> It's used for revs that have no other branch certs to use.

Would it make sense to use another name? "nil", "unknown" or something?

>> Working with the roster is extremely slow. Right now your tool is
>> taking about 6 seconds per commit, that's too slow.
>
> How many files are in a pidgin checkout? I was getting about 3 revs per
> second on the monotone database on my laptop.
> How fast is your machine, what does hdparm report, how much ram do you have,
> etc.?

It depends on the point in the history, but at that point I'd guess
about 1700 files.

I have a Centrino Duo 1.83 GHz, 2 GB of RAM.

hdparm -t shows:

 Timing buffered disk reads:  116 MB in  3.08 seconds =  37.63 MB/sec

I don't know if that's what you are looking for.

Using revisions I've been able to convert the 27000 commits to git in
about 2 hours, but it doesn't work properly; I need to rewrite my
script.

-- 
Felipe Contreras


Re: [Monotone-devel] git fast-export

2009-01-05 Thread Felipe Contreras
On Mon, Jan 5, 2009 at 8:09 AM, Derek Scherger  wrote:
> I've spent a bit of holiday hacking time working on a git_export command for
> monotone, more as an experiment than anything else. I've committed the
> result to net.venge.monotone.fast-export for people to have a look at.
> There's probably not much preventing this from landing on mainline, other
> than some documentation and possibly tests. Although I'm not really sure how
> we would want to go about testing it beyond what I've already done. The fun
> part about a command like this is that I expect most users of it would have
> some expectation of being their own testers in terms of verifying their
> conversions and such.

Great! I'm already trying it with Pidgin.

> This successfully (I think) converts the entire monotone database with 276
> branches (more or less what you get when you pull '*' from monotone.ca) to a
> git repository. Here's some details on the conversion:
>
> exported monotone database
> - 174MB in size
> - 276 branches
> - 127 tags (with one duplicate name monotone-viz-1.0.1-1)
> - export time 83m42.134s (on a 2.0GHz pentium-m laptop)
> - export file size 2.9GB
> - 15245 revisions exported
>
> imported git repository
> - 719MB in size (before being repacked)
> - import time 23m15.463s
> - repack -adf time 3m14.385s
> - packed repository size 60MB
> - 277 branches (the extra one is "master")

Why an extra "master" branch? There's no need for that branch.

> - 126 tags (missing the duplicate above)
>
> Three exported branch names "net.prjek:tester",
> "net.prjet:tester/drop-for-propagate" and "prjek.net:tester" were changed
> (with sed) during the import process because git does not allow colons (and
> various other characters) in branch/ref names. I simply changed ":" and "/"
> in these names to "."; although the "/" should have worked, it did cause an
> error of some sort.
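
The same substitution could be done inside a conversion script rather than
with sed. A minimal sketch, assuming only ":" and "/" need replacing as
described above (git has further ref-name restrictions that this does not
cover); the function names are hypothetical:

```python
def sanitize_branch_name(name):
    """Replace ':' and '/' with '.' as in the sed step described above."""
    return name.replace(":", ".").replace("/", ".")


def sanitize_all(names):
    """Sanitize every name and refuse to continue if two names collide."""
    result = {}
    seen = {}
    for name in names:
        clean = sanitize_branch_name(name)
        if clean in seen:
            raise ValueError(f"{name!r} and {seen[clean]!r} both map to {clean!r}")
        seen[clean] = name
        result[name] = clean
    return result


if __name__ == "__main__":
    print(sanitize_all(["net.prjek:tester", "prjek.net:tester"]))
```

The collision check matters because two distinct monotone branch names can
end up identical once the separators are flattened.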
>
> The conversion was verified by checking out each of the 276 branches and 126
> tags from both git and mtn and comparing the resulting workspaces. The
> script I used to do this verification was a bit dumb and failed to checkout
> a few revisions so these weren't compared. Using only the branch name failed
> in some cases because there were multiple heads and using only a tag name
> failed in some cases because the tagged revisions had no branch certs. All
> of the branches and tags that did checkout were identical according to diff
> -qr so I'm reasonably confident that the new exporter basically works.

I have a ruby script (mtn2git) that I'm pretty confident generates an
exact clone, the problem is that it's *very* slow.

I could probably compare the output of mtn2git with your tool but it
would probably take more than one entire day to generate the repo.

> I suspect that the various other git fast-import conversion scripts that
> exist for monotone are probably slower and less robust than this
> implementation (unless they work similarly from rosters) which uses the
> monotone internals to do the work. I spent a bit of time initially trying to
> export revisions using the revision data structures but this didn't work
> very well. Git only deals with files and trying to order a mix of renames of
> directories and files from monotone correctly from revisions was difficult.
> Ultimately I didn't use the revision data structures at all but built up a
> similar files-only based revision representation by comparing rosters. Much
> like what is done for make_cset, but ignoring directories and producing only
> file deletions, renames and additions. This works much better, correctly
> handles pivot_root and a few other odd things that proved difficult when
> working with revisions.

Working with the roster is extremely slow. Right now your tool is
taking about 6 seconds per commit, that's too slow.

I agree that working with revisions is very error-prone, but it's the
only decent approach if you want something fast.

I think the best way to do this would be with revisions, and careful
comparisons with other more robust approaches, until all the issues
are tracked down.

Cheers.

-- 
Felipe Contreras


Re: [Monotone-devel] interface versions, again

2008-12-22 Thread Felipe Contreras
On Mon, Dec 22, 2008 at 3:36 AM, Ethan Blanton  wrote:
> Stephen Leake spake unto us the following wisdom:
>> While we are on this topic, can I request that the monotone version be
>> bumped to 0.43 immediately after the release, rather than immediately
>> before the next release? That makes it easier for development versions
>> of Emacs DVC (and other similar tools) to check for the presence of
>> brand new stuff in monotone.
>
> For Pidgin, we have found it helpful to, immediately after release,
> increment the version number and append 'devel'.  This helps us
> identify those users who are using a development version which may
> *not* have a particular feature, even though its version number would
> otherwise suggest it.  Monotone does, of course, provide the revision
> identifier to help with this, but I for one don't know much about a
> revision after a quick glance at the rev ID.  ;-)
>
> For example, in this case, after cutting the 0.42 release, nvm would
> immediately be changed to 0.43devel.

Or you can do it the git way: 0.43-6-digit-sha1.

-- 
Felipe Contreras


Re: exec flag not always set (Was: Re: [Monotone-devel] checkout automate?)

2008-12-20 Thread Felipe Contreras
On Sat, Dec 20, 2008 at 10:14 AM, Hugo Cornelis  wrote:
> On Fri, Dec 19, 2008 at 5:39 PM, Felipe Contreras
>  wrote:
>> I have some scripts that convert a mtn repo to git repo (as you might
>> know) but I stumbled upon some issues while doing 'mtn update' (the
>> exec flag is not properly set sometimes), so now I'm using 'mtn
>
> I have seen this too a couple of times, but I don't know how to
> reproduce it.  Especially annoying when this breaks automated tests.
>
> Does anyone have an idea what the problem is, and is there a solution
> or workaround?

I can reproduce it reliably with a Pidgin repo:
mtn update --revision 00673aa20e125b6b2d904b2e1f31c6db51e63e3a --reallyquiet
mtn update --revision 570066b17b01473ab44d5ef95e4acd0ac6a8deb3 --reallyquiet
ls -l po/id.po
mtn update --revision b7add4f0aed4d53f564e4fde2227f70a42923e31 --reallyquiet
mtn update --revision 570066b17b01473ab44d5ef95e4acd0ac6a8deb3 --reallyquiet
ls -l po/id.po

AFAIK the Pidgin guys were able to reproduce this issue and even
found a workaround.

-- 
Felipe Contreras


[Monotone-devel] checkout automate?

2008-12-19 Thread Felipe Contreras
Hi,

I have some scripts that convert a mtn repo to git repo (as you might
know) but I stumbled upon some issues while doing 'mtn update' (the
exec flag is not properly set sometimes), so now I'm using 'mtn
checkout'. The problem is that some revisions don't have the branch
properly set so they are impossible to checkout; I need to go back in
history to find a parent that has a branch, check that out, and
then update, but still, there could be errors in the process.

What would be the recommended way to checkout all of the revisions in
the repo as fast as possible and without any errors?

-- 
Felipe Contreras


[Monotone-devel] Re: [ANN] mtn2git v0.1

2008-09-13 Thread Felipe Contreras
On Sat, Sep 13, 2008 at 3:02 PM, Jakub Narebski <[EMAIL PROTECTED]> wrote:
> On Sat, 13 Sep 2008, Felipe Contreras wrote:
>> On Sat, Sep 13, 2008 at 12:45 PM, Jakub Narebski <[EMAIL PROTECTED]> wrote:
>>> "Felipe Contreras" <[EMAIL PROTECTED]> writes:



>>>> After some RFCs on git and monotone mailing lists it seems now that
>>>> the script is going in the right direction.
>>>
>>> When you feel this script to be ready, could you add it to the
>>> "Interaction with other Revision Control Systems" section on
>>>  http://git.or.cz/gitwiki/InterfacesFrontendsAndTools
>>> As far as I can see there ain't any Monotone to Git converter on this
>>> list.
>>
>> Ok, done. I think it's ready if you can bear the slowness of the
>> 'checkout' method. The only missing feature is tags, but should be
>> easy to implement.
>
> Thank you.
>
> BTW. did you have any problems with (from what I understand) slightly
> different concept of branches between Monotone and Git?

Monotone can have multiple heads in one single branch, but from what I
understand that mostly happens locally (not on the published repo).
Anyway, if that happens the commits are still there, just dangling
temporarily in no branch.

There isn't much we can do for that situation, except maybe create
branch_n or something. I don't think it's a big problem.

-- 
Felipe Contreras


[Monotone-devel] Re: [ANN] mtn2git v0.1

2008-09-13 Thread Felipe Contreras
On Sat, Sep 13, 2008 at 12:45 PM, Jakub Narebski <[EMAIL PROTECTED]> wrote:
> "Felipe Contreras" <[EMAIL PROTECTED]> writes:
>
>> This is the result of various experiments I've been doing while trying
>> to import mtn repositories into git. I've looked into other mtn2git
>> scripts but none fitted my needs.
>
> mtn or mnt?

monotone = mtn

>> After some RFCs on git and monotone mailing lists it seems now that
>> the script is going in the right direction.
>
> When you feel this script to be ready, could you add it to the
> "Interaction with other Revision Control Systems" section on
>  http://git.or.cz/gitwiki/InterfacesFrontendsAndTools
> As far as I can see there ain't any Monotone to Git converter on this
> list.

Ok, done. I think it's ready if you can bear the slowness of the
'checkout' method. The only missing feature is tags, but should be
easy to implement.

-- 
Felipe Contreras


[Monotone-devel] [ANN] mtn2git v0.1

2008-09-12 Thread Felipe Contreras
Hi,

This is the result of various experiments I've been doing while trying
to import mtn repositories into git. I've looked into other mtn2git
scripts but none fitted my needs.

After some RFCs on git and monotone mailing lists it seems now the
script is going in the right direction.

There are two modes:

== checkout ==

In this mode each revision is checked out and imported directly into
git. This means it's 100% sure that the result would be an exact
clone.

The disadvantage is that it's extremely slow (1 day for 25,000 commits).

== fast-import ==

This mode requires a few patches for git fast-import; it's very fast
(40 min for 25,000 commits), but not 100% reliable yet.

There are also some missing features like branch creation and updates.


My plan is to keep these two modes in the code until fast-import
method is reliable enough.

I've tried this with Pidgin's repository. The result is here:
http://github.com/felipec/pidgin-clone

It would be interesting to do something similar with OE's repo. Any takers?

Comments and patches are welcome :)

-- 
Felipe Contreras


Re: [Monotone-devel] Re: [RFC] mtn to git conversion script

2008-09-04 Thread Felipe Contreras
On Thu, Sep 4, 2008 at 1:50 PM, Thomas Moschny <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 4, 2008, Felipe Contreras wrote:
>> Ok, now the basics seem to be working. So I'm uploading some code if
>> anyone wants to take a look.
>>
>> The C code is generating a topologically sorted list of revisions, and
>> storing the relevant information (certs and parents) separately. This
>> code is very fast. It's using GLib and sqlite3, so probably the GLib
>> stuff should be converted to use libgit.
>> http://gist.github.com/8742
>
> You shouldn't access Monotone's sqlite database directly, for various reasons.
> Use the Automation Interface instead, see
> http://www.monotone.ca/docs/Automation.html#Automation. Using 'mtn automate
> stdio', you can feed an arbitrary amount of commands to one single running mtn
> process.

I use mtn stdio when needed, that is, when doing it manually would be
too complicated (get_file). Doing it directly with sqlite3 is *very*
fast; I don't see any reason not to do it.

Feel free to modify the code for stdio.
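
For anyone who wants to try that, here is a rough sketch of how a command
could be framed for 'mtn automate stdio'. It assumes the input framing from
the monotone automation documentation as I understand it: an "l", then each
word as <length>:<bytes>, then an "e". It covers plain commands without
options only and does not parse replies; the function name is made up.

```python
def encode_stdio_command(*words):
    """Frame one automate command for 'mtn automate stdio' input.

    Assumed framing: 'l' + (<byte length> ':' <bytes>) per word + 'e'.
    """
    parts = [b"l"]
    for word in words:
        data = word.encode("utf-8") if isinstance(word, str) else word
        parts.append(str(len(data)).encode("ascii") + b":" + data)
    parts.append(b"e")
    return b"".join(parts)


if __name__ == "__main__":
    # Several commands can be written back to back to one running mtn process.
    placeholder_id = "0" * 40  # not a real revision id
    print(encode_stdio_command("leaves"))
    print(encode_stdio_command("get_revision", placeholder_id))
```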

-- 
Felipe Contreras


Re: [Monotone-devel] Re: [RFC] mtn to git conversion script

2008-09-04 Thread Felipe Contreras
On Thu, Sep 4, 2008 at 1:31 PM, Jakub Narebski <[EMAIL PROTECTED]> wrote:
> "Felipe Contreras" <[EMAIL PROTECTED]> writes:
>
>> Ok, now the basics seem to be working. So I'm uploading some code if
>> anyone wants to take a look.
>>
>> The C code is generating a topologically sorted list of revisions, and
>> storing the relevant information (certs and parents) separately. This
>> code is very fast. It's using GLib and sqlite3, so probably the GLib
>> stuff should be converted to use libgit.
>> http://gist.github.com/8742
>>
>> The Ruby code takes that input and generates an output suitable for
>> fast-import. It would be tedious to port the parsing stuff to C, but
>> straightforward.
>> http://gist.github.com/8740
>>
>> A patch for fast-import is required, I already submitted it.
>>
>> Comments?
>
> If you feel like it is good enough, could you add information about
> this tool to Git Wiki:
>  http://git.or.cz/gitwiki/InterfacesFrontendsAndTools
> in the "Interaction with other Revision Control Systems" section?

Not yet.

It still needs to add the branches, tags, and HEAD.

-- 
Felipe Contreras


Re: [Monotone-devel] Re: [RFC] mtn to git conversion script

2008-09-04 Thread Felipe Contreras
On Thu, Aug 28, 2008 at 12:03 PM, Felipe Contreras
<[EMAIL PROTECTED]> wrote:
> On Thu, Aug 28, 2008 at 8:57 AM, Anand Kumria <[EMAIL PROTECTED]> wrote:
>>
>> Hi Felipe,
>>
>> On Mon, 25 Aug 2008 03:45:11 +0300, Felipe Contreras wrote:
>>
>>>
>>> Anyway, very nice tool. It's going much faster (1h) compared to before
>>> (1 day).
>>
>> Will you be submitting this as something for/to contrib?
>
> Yes, that's the plan.
>
> However, I still don't have something that creates an exact clone with
> fast-import.
>
> Also, I'm trying different ways to see what would be most efficient.
> Right now it's a combination of Ruby + C, but once I get it working
> I'll post it to the OE mailing lists to see if it works fine for them
> too.
>
> Once the design is good enough I might move everything to C.
>
> Best regards.

Ok, now the basics seem to be working. So I'm uploading some code if
anyone wants to take a look.

The C code is generating a topologically sorted list of revisions, and
storing the relevant information (certs and parents) separately. This
code is very fast. It's using GLib and sqlite3, so probably the GLib
stuff should be converted to use libgit.
http://gist.github.com/8742
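
To make the first step concrete, here is a small sketch of a topological
sort over the revision graph, so that every parent is emitted before its
children (which is what fast-import needs). It is written in Python for
brevity and uses Kahn's algorithm; it is not a port of the C code above,
and the input shape (revision id mapped to a list of parent ids) is an
assumption.

```python
from collections import deque


def topo_sort(parents):
    """Order revisions so that parents always come before children.

    `parents` maps each revision id to a list of its parent ids.
    Parents that are not keys of the mapping are ignored.
    """
    children = {rev: [] for rev in parents}
    pending = {}
    for rev, ps in parents.items():
        known = [p for p in ps if p in parents]
        pending[rev] = len(known)
        for p in known:
            children[p].append(rev)

    # Start from the roots; sorting only makes the output deterministic.
    ready = deque(sorted(rev for rev, n in pending.items() if n == 0))
    order = []
    while ready:
        rev = ready.popleft()
        order.append(rev)
        for child in children[rev]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("cycle in revision graph")
    return order


if __name__ == "__main__":
    print(topo_sort({"a": [], "b": ["a"], "c": ["a"], "m": ["b", "c"]}))
```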

The Ruby code takes that input and generates an output suitable for
fast-import. It would be tedious to port the parsing stuff to C, but
straightforward.
http://gist.github.com/8740

A patch for fast-import is required, I already submitted it.

Comments?

-- 
Felipe Contreras


Re: [Monotone-devel] mtn2git issues

2008-09-01 Thread Felipe Contreras
On Sun, Aug 31, 2008 at 4:19 PM, Felipe Contreras
<[EMAIL PROTECTED]> wrote:
> On Sun, Aug 31, 2008 at 3:55 PM, Stephen Leake
> <[EMAIL PROTECTED]> wrote:
>> "Felipe Contreras" <[EMAIL PROTECTED]> writes:
>>
>>> On Sun, Aug 31, 2008 at 9:24 AM, Patrick Georgi <[EMAIL PROTECTED]> wrote:
>>>
>>> The only difference is that in git the changes in both branches of a
>>> merge are 'already done' so you can't do them again. So I guess what
>>> fast-import is doing is taking the changes strictly of the merge, and
>>> then the rest of the files are taken from the parents.
>>
>> The problem is defining what files were changed "strictly by the
>> merge". I suspect this means files that were modified from the common
>> ancestor in both parents, and thus needed "file merging" during the
>> revision merge.
>>
>> You can identify such files in the output of get_revision; they are
>> the ones that appear in both changesets.
>
> Yes, I'm trying your suggestion right now.
>
> Good idea.
>
>> However, if the result of the file merge is identical to one of the
>> parents (due to user choices during the file merge), then maybe it's
>> not considered modified during the merge? So you have to look at the
>> file ids, and compare them to the file ids in the parent revisions.
>>
>> I don't see an operation for that in mtn automate; it would be
>> something like:
>>
>> mtn automate get_file_id  
>>
>> or maybe (slightly faster):
>>
>> mtn automate get_corresponding_file_id   
>> 
>
> Right now I'm fetching the whole contents of the file anyway, git
> would check if the file has changed or not.
>
> In order to compare the file ids of a revision to the parents I would
> have to either a) keep the whole tree of every revision, or b) use
> get_file_id for the revision and both parents. I assume it would be
> more efficient to do b), but would 3 calls to get_file_id be more
> efficient than just grabbing the whole file contents?
>
> Maybe a get_revision_changes command would make sense; it does the 3
> get_file_id gets, makes the comparison and generates a stripped down
> version.

Ok, instead of generating the "right output" I modified git
fast-import to drop bad changes. It seems to work fine.

Best regards.

-- 
Felipe Contreras


Re: [Monotone-devel] Re: mtn2git issues

2008-08-31 Thread Felipe Contreras
On Sun, Aug 31, 2008 at 6:01 PM, Lapo Luchini <[EMAIL PROTECTED]> wrote:
> Felipe Contreras wrote:
>>
>> Right now I'm fetching the whole contents of the file anyway, git
>> would check if the file has changed or not.
>
> Then you can easily have the file id, that is (currently) the SHA-1 of the
> file...

I would have to get the file contents of the revision and the parents.
But let's not forget why I want the file id: to see if I should
retrieve the file and send it to git, or not. I would rather just send
it to git and let it find out if it changed or not.
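
For completeness, the comparison Lapo suggests would look roughly like the
sketch below once the contents are in hand, assuming (as stated above) that
the file id is the SHA-1 of the file contents and that the ids of the same
path in the parent revisions are known. The function names are hypothetical.

```python
import hashlib


def file_id(contents):
    """File id as the SHA-1 hex digest of the raw file contents."""
    return hashlib.sha1(contents).hexdigest()


def differs_from_all_parents(contents, parent_ids):
    """True if the file matches none of the parents' versions of the path."""
    fid = file_id(contents)
    return all(fid != pid for pid in parent_ids)


if __name__ == "__main__":
    left = file_id(b"hello\n")
    right = file_id(b"world\n")
    print(differs_from_all_parents(b"hello world\n", [left, right]))
    print(differs_from_all_parents(b"hello\n", [left, right]))
```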

-- 
Felipe Contreras


Re: [Monotone-devel] mtn2git issues

2008-08-31 Thread Felipe Contreras
On Sun, Aug 31, 2008 at 3:55 PM, Stephen Leake
<[EMAIL PROTECTED]> wrote:
> "Felipe Contreras" <[EMAIL PROTECTED]> writes:
>
>> On Sun, Aug 31, 2008 at 9:24 AM, Patrick Georgi <[EMAIL PROTECTED]> wrote:
>>
>> The only difference is that in git the changes in both branches of a
>> merge are 'already done' so you can't do them again. So I guess what
>> fast-import is doing is taking the changes strictly of the merge, and
>> then the rest of the files are taken from the parents.
>
> The problem is defining what files were changed "strictly by the
> merge". I suspect this means files that were modified from the common
> ancestor in both parents, and thus needed "file merging" during the
> revision merge.
>
> You can identify such files in the output of get_revision; they are
> the ones that appear in both changesets.

Yes, I'm trying your suggestion right now.

Good idea.
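
In code, the idea amounts to an intersection. A minimal sketch, assuming the
paths touched in each of the two changesets of a merge revision (one per
parent) have already been pulled out of the get_revision output; the parsing
itself is not shown and the function name is made up:

```python
def paths_changed_by_merge(changes_vs_left, changes_vs_right):
    """Paths that appear in both changesets of a merge revision.

    A path listed against only one parent was simply inherited from the
    other side; a path listed against both differs from both parents.
    """
    return sorted(set(changes_vs_left) & set(changes_vs_right))


if __name__ == "__main__":
    left = ["src/main.c", "po/id.po", "README"]
    right = ["src/main.c", "doc/manual.txt"]
    print(paths_changed_by_merge(left, right))
```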

> However, if the result of the file merge is identical to one of the
> parents (due to user choices during the file merge), then maybe it's
> not considered modified during the merge? So you have to look at the
> file ids, and compare them to the file ids in the parent revisions.
>
> I don't see an operation for that in mtn automate; it would be
> something like:
>
> mtn automate get_file_id  
>
> or maybe (slightly faster):
>
> mtn automate get_corresponding_file_id   

Right now I'm fetching the whole contents of the file anyway, git
would check if the file has changed or not.

In order to compare the file ids of a revision to the parents I would
have to either a) keep the whole tree of every revision, or b) use
get_file_id for the revision and both parents. I assume it would be
more efficient to do b), but would 3 calls to get_file_id be more
efficient than just grabbing the whole file contents?

Maybe a get_revision_changes command would make sense; it does the 3
get_file_id gets, makes the comparison and generates a stripped down
version.

-- 
Felipe Contreras


Re: [Monotone-devel] mtn2git issues

2008-08-31 Thread Felipe Contreras
On Sun, Aug 31, 2008 at 12:01 PM, Felipe Contreras
<[EMAIL PROTECTED]> wrote:
> On Sun, Aug 31, 2008 at 9:24 AM, Patrick Georgi <[EMAIL PROTECTED]> wrote:
>> Felipe Contreras schrieb:
>>> Then I tried get_manifest_of, however, that is very slow. And even if
>>> it wasn't, it would require a considerable amount of processing and
>>> store every tree of every revision.
>>>
>>> Do you have any recommendation on how to get what changed in each
>>> revision? Surely it must not be that difficult.
>>>
>> Did you try rosters? They're an internal representation of manifests
>> (with some differences, they contain some data specific to the local
>> repository, which you might not care about), and they're also stored in
>> delta format, like manifests. Maybe access is still faster (because they
>> don't need as much validation, as far as I know)
>>
>> For git, you want to know how the tree looked at specific times
>> (snapshots), but the native data exchange format in monotone is
>> changesets (what you get with get_revision - it tells you what changed
>> from rev A to rev B, and in merge scenarios, to each side "the whole
>> other side" happened, to get to the merged result).
>
> Well, this is for git fast-import. So I want the changes of each revision.
>
> The only difference is that in git the changes in both branches of a
> merge are 'already done' so you can't do them again. So I guess what
> fast-import is doing is taking the changes strictly of the merge, and
> then the rest of the files are taken from the parents.

I forgot to mention that I didn't try rosters. I assumed
get_manifest_of would use them.

-- 
Felipe Contreras


Re: [Monotone-devel] mtn2git issues

2008-08-31 Thread Felipe Contreras
On Sun, Aug 31, 2008 at 9:24 AM, Patrick Georgi <[EMAIL PROTECTED]> wrote:
> Felipe Contreras schrieb:
>> Then I tried get_manifest_of, however, that is very slow. And even if
>> it wasn't, it would require a considerable amount of processing and
>> store every tree of every revision.
>>
>> Do you have any recommendation on how to get what changed in each
>> revision? Surely it must not be that difficult.
>>
> Did you try rosters? They're an internal representation of manifests
> (with some differences, they contain some data specific to the local
> repository, which you might not care about), and they're also stored in
> delta format, like manifests. Maybe access is still faster (because they
> don't need as much validation, as far as I know)
>
> For git, you want to know how the tree looked at specific times
> (snapshots), but the native data exchange format in monotone is
> changesets (what you get with get_revision - it tells you what changed
> from rev A to rev B, and in merge scenarios, to each side "the whole
> other side" happened, to get to the merged result).

Well, this is for git fast-import. So I want the changes of each revision.

The only difference is that in git the changes in both branches of a
merge are 'already done' so you can't do them again. So I guess what
fast-import is doing is taking the changes strictly of the merge, and
then the rest of the files are taken from the parents.

-- 
Felipe Contreras


[Monotone-devel] mtn2git issues

2008-08-30 Thread Felipe Contreras
Hi,

I've been working on a mtn2git script and I've been having many issues.

What git requires is very simple; a list of the files that changed
(added, modified, renamed, removed) in each revision.

First I tried get_revision which is very fast, however, the output is
not exactly what I want.

When a merge happens, get_revision is listing all the changes (the
changes from both branches), not the changes that happened strictly in
that revision.

Then I tried get_manifest_of, however, that is very slow. And even if
it wasn't, it would require a considerable amount of processing and
store every tree of every revision.

Do you have any recommendation on how to get what changed in each
revision? Surely it must not be that difficult.

Best regards.

-- 
Felipe Contreras


Re: [Monotone-devel] Re: [RFC] mtn to git conversion script

2008-08-28 Thread Felipe Contreras
On Thu, Aug 28, 2008 at 8:57 AM, Anand Kumria <[EMAIL PROTECTED]> wrote:
>
> Hi Felipe,
>
> On Mon, 25 Aug 2008 03:45:11 +0300, Felipe Contreras wrote:
>
>>
>> Anyway, very nice tool. It's going much faster (1h) compared to before
>> (1 day).
>
> Will you be submitting this as something for/to contrib?

Yes, that's the plan.

However, I still don't have something that creates an exact clone with
fast-import.

Also, I'm trying different ways to see what would be most efficient.
Right now it's a combination of Ruby + C, but once I get it working
I'll post it to the OE mailing lists to see if it works fine for them
too.

Once the design is good enough I might move everything to C.

Best regards.

-- 
Felipe Contreras


[Monotone-devel] Re: [RFC] mtn to git conversion script

2008-08-25 Thread Felipe Contreras
On Mon, Aug 25, 2008 at 7:35 PM, Brian Downing <[EMAIL PROTECTED]> wrote:
> On Sun, Aug 24, 2008 at 12:18:50PM +0300, Felipe Contreras wrote:
>> I developed a script that converts a monotone repository into a git
>> one (exact clone), and I want to contribute it so everybody can use it.
>>
>> This is the gist of the script:
>>
>> mtn update --revision [EMAIL PROTECTED] --reallyquiet
>> git ls-files --modified --others --exclude-standard -z | git
>> update-index --add --remove -z --stdin
>> git write-tree
>> git write-raw < /tmp/commit.txt
>> git update-ref refs/mtn/[EMAIL PROTECTED] [EMAIL PROTECTED]
>>
>> branches.each do |e|
>> git update-ref refs/heads/#{e} [EMAIL PROTECTED]
>> end
>
> You definitely want to use fast-import, but you probably want to do
> something a lot closer to fast-export for monotone (read: use its
> automate stdio interface and avoid expensive calls).
>
> Here's a simple monotone to git converter I wrote.  You'll need the
> Monotone::AutomateStdio perl module to use it (which I think I got
> from monotone's net.venge.monotone.contrib.lib.automate-stdio branch).
> It is very fast; it can convert the OpenEmbedded repo in something like
> 5-10 minutes on my machine.

Interesting, how many commits?

> Note that for monotone export to go fast you absolutely /must/ avoid the
> get_manifest operation.  In my converter I use the revision information
> directly.  Getting the renames right with this is a little tricky; IIRC,
> the ordering that works is:
>
> * Rename all renamed files, innermost files first, to temporary names.
> * Delete all deleted files, innermost first.
> * Rename all temporary names to permanent names, outermost first.
> * Add all new/modified files.
>
> Conveniently, all of the above can be done by directly emitting
> fast-import commands, so you don't have to keep track of trees directly.
> (With one exception, which I'll elaborate on in a different email.)

I guess I haven't stumbled upon that problem yet =/
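
The ordering Brian describes can be sketched as a function that emits fast-import file commands ("R" rename, "D" delete, "M" modify). This is only a sketch under assumptions: the temporary directory name ".mtn2git-tmp" and the input shapes are hypothetical, and paths are assumed to contain no spaces or special characters (fast-import would need them quoted).

```ruby
# Depth of a path, used to order "innermost" (deepest) vs "outermost".
def depth(path)
  path.count("/")
end

# renames:  { old_path => new_path }
# deletes:  [path, ...]
# modifies: { path => [mode, dataref] }  (dataref is a mark or blob id)
def file_commands(renames, deletes, modifies)
  cmds = []
  temps = {}

  # 1. Move every renamed path to a temporary name, innermost first.
  renames.keys.sort_by { |src| [-depth(src), src] }.each_with_index do |src, i|
    temps[src] = ".mtn2git-tmp/#{i}"
    cmds << "R #{src} #{temps[src]}"
  end

  # 2. Delete removed paths, innermost first.
  deletes.sort_by { |p| [-depth(p), p] }.each { |p| cmds << "D #{p}" }

  # 3. Move temporary names to their final names, outermost first.
  renames.sort_by { |_src, dst| [depth(dst), dst] }.each do |src, dst|
    cmds << "R #{temps[src]} #{dst}"
  end

  # 4. Add new and modified files last.
  modifies.each { |path, (mode, ref)| cmds << "M #{mode} #{ref} #{path}" }
  cmds
end

puts file_commands(
  { "a/b/x.c" => "a/y.c", "a/b" => "c" },
  ["old/z.c", "old"],
  { "c/new.c" => ["100644", ":5"] }
)
```

Going through temporary names keeps a rename from clobbering a path that another rename in the same revision still needs to read.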

Best regards.

-- 
Felipe Contreras




[Monotone-devel] Re: [RFC] mtn to git conversion script

2008-08-24 Thread Felipe Contreras
On Mon, Aug 25, 2008 at 1:46 AM, Shawn O. Pearce <[EMAIL PROTECTED]> wrote:
> Felipe Contreras <[EMAIL PROTECTED]> wrote:
>> On Sun, Aug 24, 2008 at 4:14 PM, Miklos Vajna <[EMAIL PROTECTED]> wrote:
>> > On Sun, Aug 24, 2008 at 12:18:50PM +0300, Felipe Contreras <[EMAIL PROTECTED]> wrote:
>> >> What do you think? Does it make sense to have a 'write-raw' command?
>> >> Or should I somehow use 'fast-import'?
>> >
>> > Yes, you should. ;-)
>> >
>> > The syntax of it is not so hard, see for example 'git fast-export
>> > HEAD~2..' on a git repo and you'll see.
>> >
>> > This should help a lot if you are like me, who likes to learn from
>> > examples.
>>
>> Is it possible to create a fast-import from the index? I realize this
>> is not the best thing to do, but for now I would like to do that.
>
> No, fast-import uses its own internal structure and avoids the
> index file.

Yeah, I knew that, but I just wanted to replace the 'write-raw'
command, to avoid unnecessary changes.

> Also, look at `git hash-object -w` as a replacement for your
> git-write-raw tool if you aren't going to use git-fast-import.

Awesome, but I just did it properly :)

A few comments regarding fast-import:

Why the distinction between 'from' and 'merge'? Doesn't it make more
sense to use 'parent' for both?

I'm doing: commit refs/mtn/d137c7046bae7e4a0144fee82bfce8061f61e3b3

So I was expecting this to work:
from mtn/d137c7046bae7e4a0144fee82bfce8061f61e3b3

But it didn't, probably because the commit hasn't actually been
committed yet. Wouldn't it make sense to store it as a temporary
commit so my script doesn't have to deal with that?
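
One way to refer to commits created earlier in the same stream is fast-import's marks: each commit gets "mark :N", and later commits name it with "from :N" or "merge :N". The sketch below only builds such a stream as text; the revision ids, committer line, and helper name are made up for illustration, and file changes are left out.

```ruby
# Build one fast-import "commit" block. The first parent becomes "from",
# any further parents become "merge". Parents are mark numbers.
def commit_block(ref:, mark:, who:, message:, parents: [])
  lines = [
    "commit #{ref}",
    "mark :#{mark}",
    "committer #{who}",
    "data #{message.bytesize}",
    message
  ]
  parents.each_with_index do |parent, i|
    lines << "#{i.zero? ? 'from' : 'merge'} :#{parent}"
  end
  lines.join("\n") + "\n\n"
end

who = "Example Author <author@example.com> 1219570000 +0000"

stream = ""
stream << commit_block(ref: "refs/mtn/aaaa", mark: 1, who: who, message: "first")
stream << commit_block(ref: "refs/mtn/bbbb", mark: 2, who: who, message: "second",
                       parents: [1])
stream << commit_block(ref: "refs/mtn/cccc", mark: 3, who: who, message: "merge",
                       parents: [2, 1])
puts stream
```

The resulting text would be piped into "git fast-import"; the converter then only has to remember which mark number it assigned to each monotone revision.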

Anyway, very nice tool. It's going much faster (1h) compared to before (1 day).

Best regards.

-- 
Felipe Contreras




[Monotone-devel] Re: [RFC] mtn to git conversion script

2008-08-24 Thread Felipe Contreras
On Sun, Aug 24, 2008 at 4:14 PM, Miklos Vajna <[EMAIL PROTECTED]> wrote:
> On Sun, Aug 24, 2008 at 12:18:50PM +0300, Felipe Contreras <[EMAIL PROTECTED]> wrote:
>> What do you think? Does it make sense to have a 'write-raw' command?
>> Or should I somehow use 'fast-import'?
>
> Yes, you should. ;-)
>
> The syntax of it is not so hard, see for example 'git fast-export
> HEAD~2..' on a git repo and you'll see.
>
> This should help a lot if you are like me, who likes to learn from
> examples.

Is it possible to create a fast-import from the index? I realize this
is not the best thing to do, but for now I would like to do that.

Best regards.

-- 
Felipe Contreras




[Monotone-devel] [RFC] mtn to git conversion script

2008-08-24 Thread Felipe Contreras
Hi,

I developed a script that converts a monotone repository into a git
one (exact clone), and I want to contribute it so everybody can use it.

However, I might not have done it correctly.

This is the gist of the script:

mtn update --revision [EMAIL PROTECTED] --reallyquiet
git ls-files --modified --others --exclude-standard -z | git
update-index --add --remove -z --stdin
git write-tree
git write-raw < /tmp/commit.txt
git update-ref refs/mtn/[EMAIL PROTECTED] [EMAIL PROTECTED]

branches.each do |e|
git update-ref refs/heads/#{e} [EMAIL PROTECTED]
end

I wrote "git write-raw" which takes the commit text as is, and puts it
into the repository.
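
To illustrate what "commit text as is" means: a git object id is the SHA-1 of a small header ("<type> <size>" plus a NUL byte) followed by the raw body. The sketch below only computes that id and does not write anything to a repository. The helper name and the author line are made up; the tree id is git's well-known empty tree.

```ruby
require "digest/sha1"

# Compute the id git assigns to an object of the given type and raw body.
def git_object_id(type, body)
  header = "#{type} #{body.bytesize}\0".b
  Digest::SHA1.hexdigest(header + body.b)
end

# Raw text of a root commit pointing at the empty tree.
who = "Example Author <author@example.com> 1219570000 +0000"
commit_text = [
  "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904",
  "author #{who}",
  "committer #{who}",
  "",
  "Imported from monotone",
  ""
].join("\n")

puts git_object_id("tree", "")            # id of the empty tree
puts git_object_id("commit", commit_text) # id this commit text would get
```

Because the id depends on every byte of the text, the author, committer, dates and message all have to be carried over exactly for two conversions to produce the same commits.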

I've read about 'fast-import' but I'm not sure if it would be more
efficient, because you would have to parse the output of different mtn
tools.

What do you think? Does it make sense to have a 'write-raw' command?
Or should I somehow use 'fast-import'?

Best regards.

-- 
Felipe Contreras

