Re: [Monotone-devel] git fast-export

2009-02-28 Thread Felipe Contreras
On Sat, Feb 28, 2009 at 7:41 AM, Derek Scherger de...@echologic.com wrote:

 On Fri, Feb 27, 2009 at 1:21 AM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 As I said, my objective is to generate a git clone for people to
 develop/follow/maintain instead of the mtn repo. In this case there is no
 need to have every single bit of information since the mtn repo would
 still be available.

 Does a bit of extra information hurt this use-case somehow?

Yes, because you see two changelogs appended instead of one, possibly
with the comments too. It doesn't look like a native git repo.

 On the other hand, when a project moves away from mtn to git, then
 your method makes more sense.

 That's why I think it should be an option.

 So something like --use-one-changelog that grabs one of the changelog certs
 at random and spits that out? Sorry, I'm really having a hard time seeing
 how this could actually be useful. Are you just trying to get this export to
 exactly match what your script produces so that they can compare
 identically? If so, would it be possible instead to change your script so
 that it appends all changelogs into one complete message?

I've changed my script to mimic yours where I think it's sensible;
otherwise I've modified your code to do what my script is doing. So
far this is the only change I'm still carrying: once it finds
a changelog, I break out of the loop.

  Anyway, I've been able to reach a little further and now I've finally
  found a difference in the trees between your and my method. In
  Pidgin's repo there's a commit
  '3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
  files have exec flag off, and with your method it has the exec flag
  on. Can you take a look?
 
  Good catch. The monotone checkout of this revision has execute bits on
  some files that the git checkout does not. I'll have to do some digging to
  see what's going wrong here.

 I'm not exactly sure what you mean with this, but there's a bug in
 'mtn update' that sometimes doesn't pick the correct exec flag. That's
 why I'm doing a full 'mtn checkout'.

 Here's what I see:

 A git checkout of refs/mtn/revs/3f1b3854a77850131531d1d6f19c44a0b9174107
 from the exported git repo does not have execute permissions on
 ./po/{id,ne,ps}.po or on a few files in ./doc/oscar/. If I update a monotone
 workspace to this revision it does have execute permissions on these files
 and disagrees with the git workspace exactly on these permission bits.

 A monotone checkout of the same revision does NOT have execute permissions
 on these files and all permission bits are in agreement with the git
 checkout. Note that this revision has no branch cert which apparently
 prevents it from being checked out from monotone so I've added a bogus
 branch cert to my local database to make a checkout possible.

That's really annoying! I've had to do many hacks in my script to make
the checkout possible... go parent by parent until there's one that
has a branch cert, check that out, then update to the original commit.
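The workaround described above (walk back through the parents until a revision with a branch cert turns up) can be sketched roughly as follows. This is only an illustration, not the actual script: the `parents` and `branches` dictionaries are hypothetical stand-ins for data that would really come from the monotone database, and the breadth-first order is an assumption.

```python
from collections import deque


def nearest_ancestor_with_branch(rev, parents, branches):
    """Return the closest revision (rev itself or an ancestor) that has a branch cert.

    parents  -- hypothetical dict: revision id -> list of parent revision ids
    branches -- hypothetical dict: revision id -> list of branch names
    """
    queue = deque([rev])
    seen = {rev}
    while queue:
        current = queue.popleft()
        if branches.get(current):
            return current
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None  # no revision in this history carries a branch cert


# Toy history: "c" and its parent "b" have no branch cert, "a" does.
parents = {"c": ["b"], "b": ["a"], "a": []}
branches = {"a": ["im.pidgin.pidgin"]}
start = nearest_ancestor_with_branch("c", parents, branches)
```

The revision found this way would be the one to check out before updating to the target, which is exactly the step that exposes the 'mtn update' exec-bit problem discussed in this thread.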

 My impression at the moment is that the exported history does have correct
 permissions because it agrees with a monotone checkout (which requires
 addition of a branch cert) of the same revision. It seems that there are two
 different problems with monotone here (1) checkout is not possible for
 revisions that have no branch certs and (2) update doesn't always produce
 correct execute permissions.

Agreed.

 The problem is not your method, the problem is mine, which is
 painfully slow, but it's needed for a bit-exact comparison. It's
 tedious but hopefully the comparisons will soon be done.

 It's great to have another method to compare the output against and make
 sure both produce equivalent results so I do appreciate the effort. Have you
 previously done lots of verification of the output of your script, to the
 point where you trust it to a reasonable degree?

Yes I have. That's why I found issues in mtn in the first place. I've
been trying for something foolproof: first I was doing 'mtn update' and
importing the exact workspace. Then I found issues and I tried with
'mtn checkout' and still I found issues (with no branches).

I've found that in the case of
3f1b3854a77850131531d1d6f19c44a0b9174107 my script is unreliable
because the result depends on which parent I base my update on. It
looks like my script cannot be reliable unless I avoid 'mtn update' or
it is fixed in mtn.

Could you make a patch that gets rid of the 'no branch' error?

-- 
Felipe Contreras


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] git fast-export

2009-02-28 Thread Felipe Contreras
On Sat, Feb 28, 2009 at 8:26 AM, Paul Aurich p...@aurich.com wrote:
 On Feb 27, 2009, at 21:41, Derek Scherger wrote:

 My impression at the moment is that the exported history does have correct
 permissions because it agrees with a monotone checkout (which requires
 addition of a branch cert) of the same revision. It seems that there are two
 different problems with monotone here (1) checkout is not possible for
 revisions that have no branch certs and (2) update doesn't always produce
 correct execute permissions.

 Felipe discovered what I believe to be the cause of this a few months ago
 [1]. As I understand the issue, there is no `mtn update` hook for unsetting
 execute bits, so unsetting that attribute doesn't have any effect. However,
 when doing an update that would involve moving very far through history
 (say, from the revision Felipe mentions in that email to
 h:im.pidgin.pidgin), I believe Monotone optimizes that operation to 'check
 out the new manifest [and apply working changes]', and as the mtn:exec
 property isn't set on the files in the target revision, the file's exec bit
 is unset.

So there's no fix and no clear path on how this will get fixed, right?

 I may have some of the details of how Monotone handles these cases wrong,
 but hopefully my description is clear enough to be sensible. And of course,
 credit for discovering and figuring out why it sometimes does work goes to
 Felipe and some people in #pidgin (sorry, I don't remember who,
 specifically).

I found the issue and they found the underlying problem.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-28 Thread Felipe Contreras
On Sat, Feb 28, 2009 at 8:38 PM, Derek Scherger de...@echologic.com wrote:

 On Sat, Feb 28, 2009 at 2:49 AM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

  As I said, my objective is to generate a git clone for people to
  develop/follow/maintain instead of the mtn repo. In this case there is no
  need to have every single bit of information since the mtn repo would
  still be available.
 
  Does a bit of extra information hurt this use-case somehow?

 Yes, because you see two changelogs appended instead of one, possibly
 with the comments too. It doesn't look like a native git repo.

  On the other hand, when a project moves away from mtn to git, then
  your method makes more sense.

 It seems to me that this directly contradicts your previous statement, that
 looking like a native git repo is somehow important for a mirrored
 repository and yet unimportant for a converted repository. Nonetheless, I'm
 tired of arguing about this and I've added a --use-one-changelog option that
 picks one and uses it. I will be very surprised if anyone else ever uses
 this option but it's harmless.

Appending two changelogs will never look 'natural'; besides, some
people might not like the way two changelogs are appended. I think
that's the kind of decision that a team should make when converting a
repository. I'm just mirroring, I don't want to think about that, just
produce something that looks good and is functional.

Please don't think I'm saying that option is *a must*, I'm just saying
that if it's not there I would have to modify the code, which is not a
big deal for me.

 Could you make a patch that gets rid of the 'no branch' error?

 With any luck at all someone else will beat me to it. I've got too many
 other things on the go at the moment to get to this now but I will
 eventually if no one else does.

It looks like I've got it (attached).

-- 
Felipe Contreras


mtn.diff
Description: Binary data


Re: [Monotone-devel] git fast-export

2009-02-28 Thread hendrik
On Sat, Feb 28, 2009 at 08:59:04PM +0200, Felipe Contreras wrote:
 On Sat, Feb 28, 2009 at 8:38 PM, Derek Scherger de...@echologic.com wrote:
 
  On Sat, Feb 28, 2009 at 2:49 AM, Felipe Contreras
  felipe.contre...@gmail.com wrote:
 
   As I said, my objective is to generate a git clone for people to
   develop/follow/maintain instead of the mtn repo. In this case there is no
   need to have every single bit of information since the mtn repo would
   still be available.
  
   Does a bit of extra information hurt this use-case somehow?
 
  Yes, because you see two changelogs appended instead of one, possibly
  with the comments too. It doesn't look like a native git repo.
 
   On the other hand, when a project moves away from mtn to git, then
   your method makes more sense.
 
  It seems to me that this directly contradicts your previous statement, that
  looking like a native git repo is somehow important for a mirrored
  repository and yet unimportant for a converted repository. Nonetheless, I'm
  tired of arguing about this and I've added a --use-one-changelog option that
  picks one and uses it. I will be very surprised if anyone else ever uses
  this option but it's harmless.
 
 Appending two changelogs will never look 'natural'; besides, some
 people might not like the way two changelogs are appended. I think
 that's the kind of decision that a team should make when converting a
 repository. I'm just mirroring, I don't want to think about that, just
 produce something that looks good and is functional.

It's important that information be round-trip-stable.  That is, if info 
goes from git to monotone, back to git, back to monotone, at some point 
it should stop changing.

-- hendrik




Re: [Monotone-devel] git fast-export

2009-02-27 Thread Felipe Contreras
On Fri, Feb 27, 2009 at 7:58 AM, Derek Scherger de...@echologic.com wrote:

 On Wed, Feb 25, 2009 at 12:37 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 I think it should be an option. Otherwise the people that want a
 single message would have trouble running a git filter-branch command
 to strip the message out. It would be much easier to do that in the
 mtn export.

 Looking through the pidgin repo that I have here, there are several commits
 with multiple changelogs, some of which consist of a single 'a' character.
 Selecting one of these arbitrarily is going to select the 'a' changelogs
 sometimes, which I suspect is also not what you want, unless you're thinking
 of a different option than I am. As I recall, what you wanted was an option
 to just grab one changelog and use that, right? Maybe the longest changelog
 would be the best one to use? I see several other revisions with multiple
 distinct changelogs that seem like they would be good to preserve as well.

 I'm somewhat reluctant to add an option that does this because it does not
 seem like a general thing that anyone else will want and it seems like it
 will just move your problem around a bit. Instead of getting changelogs you
 don't want you'll be missing changelogs you do want.

 The other options here are to (1) filter the exported data and remove the
 messages you don't want or (2) delete the unwanted changelog certs from a
 copy of your monotone database and export from that. Both of these should be
 scriptable without too much trouble although identifying the specific
 changelogs to drop will probably be rather tedious.

As I said, my objective is to generate a git clone for people to
develop/follow/maintain instead of the mtn repo. In this case there is no
need to have every single bit of information since the mtn repo would
still be available.

On the other hand, when a project moves away from mtn to git, then
your method makes more sense.

That's why I think it should be an option.

 I don't know the exact commit id in the Pidgin repo, but I can assure
 you, it's there.

 Oh I believe you, but it still might be useful to see the actual real data
 and do something based on that. So, if you do come across the revision id,
 I'd still like to see it.

Sure, if I find it I'll let you know.

 no author cert: 'unknown'
 user_id not mapped: 'user_id'
 user_id mapped: obvious

 The current code works mostly like this. In the unmapped case it only adds
 '<' and '>' when neither is present. There are monotone users who have their
 keys named like User Name <u...@foobar.com> and adding another set of '<'
 and '>' around these wouldn't make much sense.

Ahh, right. In Pidgin all the userids are 'u...@foobar' or just 'user'
so far but I think maybe in the latest commits they are doing as you
say. So yeah, your code makes sense.

 Anyway, I've been able to reach a little further and now I've finally
 found a difference in the trees between your and my method. In
 Pidgin's repo there's a commit
 '3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
 files have exec flag off, and with your method it has the exec flag
 on. Can you take a look?

 Good catch. The monotone checkout of this revision has execute bits on some
 files that the git checkout does not. I'll have to do some digging to see
 what's going wrong here.

I'm not exactly sure what you mean with this, but there's a bug in
'mtn update' that sometimes doesn't pick the correct exec flag. That's
why I'm doing a full 'mtn checkout'.

 Now I'm using a bit different method so I'll be able to test faster.

 The latest monotone git_export code runs quite a bit faster as well. I can
 export the pidgin repo I have here in a little over an hour instead of the 5
 hours it was taking previously.

The problem is not your method, the problem is mine, which is
painfully slow, but it's needed for a bit-exact comparison. It's
tedious but hopefully the comparisons will soon be done.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-27 Thread Derek Scherger
On Fri, Feb 27, 2009 at 1:21 AM, Felipe Contreras 
felipe.contre...@gmail.com wrote:


 As I said, my objective is to generate a git clone for people to
 develop/follow/maintain instead of the mtn repo. In this case there is no
 need to have every single bit of information since the mtn repo would
 still be available.


Does a bit of extra information hurt this use-case somehow?


 On the other hand, when a project moves away from mtn to git, then
 your method makes more sense.

 That's why I think it should be an option.


So something like --use-one-changelog that grabs one of the changelog certs
at random and spits that out? Sorry, I'm really having a hard time seeing
how this could actually be useful. Are you just trying to get this export to
exactly match what your script produces so that they can compare
identically? If so, would it be possible instead to change your script so
that it appends all changelogs into one complete message?


  Anyway, I've been able to reach a little further and now I've finally
  found a difference in the trees between your and my method. In
  Pidgin's repo there's a commit
  '3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
  files have exec flag off, and with your method it has the exec flag
  on. Can you take a look?
 
  Good catch. The monotone checkout of this revision has execute bits on some
  files that the git checkout does not. I'll have to do some digging to see
  what's going wrong here.

 I'm not exactly sure what you mean with this, but there's a bug in
 'mtn update' that sometimes doesn't pick the correct exec flag. That's
 why I'm doing a full 'mtn checkout'.


Here's what I see:

A git checkout of refs/mtn/revs/3f1b3854a77850131531d1d6f19c44a0b9174107
from the exported git repo does not have execute permissions on
./po/{id,ne,ps}.po or on a few files in ./doc/oscar/. If I update a monotone
workspace to this revision it does have execute permissions on these files
and disagrees with the git workspace exactly on these permission bits.

A monotone checkout of the same revision does NOT have execute permissions
on these files and all permission bits are in agreement with the git
checkout. Note that this revision has no branch cert which apparently
prevents it from being checked out from monotone so I've added a bogus
branch cert to my local database to make a checkout possible.

My impression at the moment is that the exported history does have correct
permissions because it agrees with a monotone checkout (which requires
addition of a branch cert) of the same revision. It seems that there are two
different problems with monotone here (1) checkout is not possible for
revisions that have no branch certs and (2) update doesn't always produce
correct execute permissions.
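
A comparison like the one described above, between a git checkout and a monotone checkout of the same revision, could be scripted along these lines. This is a minimal sketch and not what either of us actually ran: the function name is made up, only the owner execute bit is compared, and `.git` and `_MTN` are skipped on the assumption that they are the bookkeeping directories of the two workspaces.

```python
import os
import stat


def exec_bit_differences(tree_a, tree_b, skip_dirs=(".git", "_MTN")):
    """Return sorted relative paths whose owner-execute bit differs between two trees."""
    differing = []
    for root, dirs, files in os.walk(tree_a):
        # Prune bookkeeping directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            path_a = os.path.join(root, name)
            rel = os.path.relpath(path_a, tree_a)
            path_b = os.path.join(tree_b, rel)
            if not (os.path.isfile(path_a) and os.path.isfile(path_b)):
                continue  # files missing on one side are a different kind of mismatch
            exec_a = bool(os.stat(path_a).st_mode & stat.S_IXUSR)
            exec_b = bool(os.stat(path_b).st_mode & stat.S_IXUSR)
            if exec_a != exec_b:
                differing.append(rel)
    return sorted(differing)
```

Running it once against an updated monotone workspace and once against a fresh checkout of the same revision would show whether the two disagree only on these permission bits.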

The problem is not your method, the problem is mine, which is
 painfully slow, but it's needed for a bit-exact comparison. It's
 tedious but hopefully the comparisons will soon be done.


It's great to have another method to compare the output against and make
sure both produce equivalent results so I do appreciate the effort. Have you
previously done lots of verification of the output of your script, to the
point where you trust it to a reasonable degree?

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-02-27 Thread Paul Aurich

On Feb 27, 2009, at 21:41, Derek Scherger wrote:
My impression at the moment is that the exported history does have  
correct permissions because it agrees with a monotone checkout  
(which requires addition of a branch cert) of the same revision. It  
seems that there are two different problems with monotone here (1)  
checkout is not possible for revisions that have no branch certs and  
(2) update doesn't always produce correct execute permissions.


Felipe discovered what I believe to be the cause of this a few months  
ago [1]. As I understand the issue, there is no `mtn update` hook for  
unsetting execute bits, so unsetting that attribute doesn't have any  
effect. However, when doing an update that would involve moving very  
far through history (say, from the revision Felipe mentions in that  
email to h:im.pidgin.pidgin), I believe Monotone optimizes that  
operation to 'check out the new manifest [and apply working changes]',  
and as the mtn:exec property isn't set on the files in the target  
revision, the file's exec bit is unset.


I may have some of the details of how Monotone handles these cases  
wrong, but hopefully my description is clear enough to be sensible.  
And of course, credit for discovering and figuring out why it  
sometimes does work goes to Felipe and some people in #pidgin (sorry, I  
don't remember who, specifically).


~Paul

[1] http://lists.nongnu.org/archive/html/monotone-devel/2008-12/msg00082.html




Re: [Monotone-devel] git fast-export

2009-02-26 Thread Derek Scherger
On Wed, Feb 25, 2009 at 12:37 PM, Felipe Contreras 
felipe.contre...@gmail.com wrote:

 I think it should be an option. Otherwise the people that want a
 single message would have trouble running a git filter-branch command
 to strip the message out. It would be much easier to do that in the
 mtn export.


Looking through the pidgin repo that I have here, there are several commits
with multiple changelogs, some of which consist of a single 'a' character.
Selecting one of these arbitrarily is going to select the 'a' changelogs
sometimes, which I suspect is also not what you want, unless you're thinking
of a different option than I am. As I recall, what you wanted was an option
to just grab one changelog and use that, right? Maybe the longest changelog
would be the best one to use? I see several other revisions with multiple
distinct changelogs that seem like they would be good to preserve as well.

I'm somewhat reluctant to add an option that does this because it does not
seem like a general thing that anyone else will want and it seems like it
will just move your problem around a bit. Instead of getting changelogs you
don't want you'll be missing changelogs you do want.

The other options here are to (1) filter the exported data and remove the
messages you don't want or (2) delete the unwanted changelog certs from a
copy of your monotone database and export from that. Both of these should be
scriptable without too much trouble although identifying the specific
changelogs to drop will probably be rather tedious.

I don't know the exact commit id in the Pidgin repo, but I can assure
 you, it's there.


Oh I believe you, but it still might be useful to see the actual real data
and do something based on that. So, if you do come across the revision id,
I'd still like to see it.


 no author cert: 'unknown'
 user_id not mapped: 'user_id'
 user_id mapped: obvious


The current code works mostly like this. In the unmapped case it only adds
'<' and '>' when neither is present. There are monotone users who have their
keys named like User Name <u...@foobar.com> and adding another set of '<'
and '>' around these wouldn't make much sense.
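
The three cases listed above, plus the "only add the brackets when neither is present" rule, can be sketched as below. This is an illustration rather than the export code itself: `git_ident` and `author_map` are hypothetical names, the dictionary stands in for the --authors-file mappings, and wrapping 'unknown' in brackets is an assumption about what the missing-cert case should produce.

```python
def git_ident(author, author_map=None):
    """Build the 'Name <email>' part of a fast-import author/committer line.

    author     -- value of the author cert, or None when the revision has none
    author_map -- hypothetical dict standing in for the --authors-file mappings
    """
    author_map = author_map or {}
    if author is None:
        author = "unknown"  # no author cert at all
    if author in author_map:
        return author_map[author]  # mapped: use the mapping verbatim
    if "<" in author or ">" in author:
        # Already looks like 'User Name <user@example.com>'; leave it alone.
        return author
    return "<" + author + ">"  # unmapped bare id: wrap it once
```

With this rule a key named 'User Name <user@example.com>' passes through unchanged, while a bare 'user@foobar' becomes '<user@foobar>'.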


 Anyway, I've been able to reach a little further and now I've finally
 found a difference in the trees between your and my method. In
 Pidgin's repo there's a commit
 '3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
 files have exec flag off, and with your method it has the exec flag
 on. Can you take a look?


Good catch. The monotone checkout of this revision has execute bits on some
files that the git checkout does not. I'll have to do some digging to see
what's going wrong here.

Now I'm using a bit different method so I'll be able to test faster.


The latest monotone git_export code runs quite a bit faster as well. I can
export the pidgin repo I have here in a little over an hour instead of the 5
hours it was taking previously.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-02-25 Thread Felipe Contreras
On Tue, Feb 10, 2009 at 8:27 AM, Derek Scherger de...@echologic.com wrote:

 On Mon, Feb 9, 2009 at 11:14 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 It's just that I don't like that behavior. It doesn't matter how smart
 the algorithm is, it will always look like two messages instead of
 one, which might be ok for some people, but not for others.

 What I was trying to do was to only use *one* of the two messages if they
 were identical so it shouldn't look like two messages at all.

But they are not identical.

 In any case, the first error I got was with a revision that had a
 change like this [merge 0123...] and another one like [merge
 '0123...].

 In the case of merges it should be catching the fact that they have the same
 message and only including one of them. I guess if one of them has a quote
 and one doesn't it will fail though. It seems odd that you would be getting
 this and makes me wonder whether different monotone versions have had
 different automated messages in these cases. Can you post the *exact*
 contents of the message you're getting please?

 If it is the case that you have two slightly different automatically
 generated merge messages then this isn't going to handle that and I think
 the best thing to do in that case is keep all of the messages in the
 exported data, rather than losing information. If you don't like specific
 messages there's always the option of removing some things from the exported
 data before importing it which seems like it would generally be easier than
 adding things to the exported data that it doesn't contain.

I think it should be an option. Otherwise the people that want a
single message would have trouble running a git filter-branch command
to strip the message out. It would be much easier to do that in the
mtn export.

I don't know the exact commit id in the Pidgin repo, but I can assure
you, it's there.

  Yes. Git doesn't like authors without an email address wrapped in < and >, so
  you need to put these in the --authors-file mappings.

 Why not? I thought 'unknown' was ok.

 'unknown' is only used when there are no author certs, not when some
 author cert is not found in the author map. If there is an author cert with
 no email then git won't like it. Another option would be to require these
 values to exist in the author map or replace them with unknown as you seem
 to be suggesting.

I'm sorry, I can't recall what specifically we were discussing but I
think it should work this way:

no author cert: 'unknown'
user_id not mapped: 'user_id'
user_id mapped: obvious

Anyway, I've been able to reach a little further and now I've finally
found a difference in the trees between your and my method. In
Pidgin's repo there's a commit
'3f1b3854a77850131531d1d6f19c44a0b9174107' that in my method some
files have exec flag off, and with your method it has the exec flag
on. Can you take a look?

Now I'm using a bit different method so I'll be able to test faster.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-09 Thread Felipe Contreras
On Tue, Feb 10, 2009 at 1:40 AM, Felipe Contreras
felipe.contre...@gmail.com wrote:
 On Sun, Jan 25, 2009 at 6:40 AM, Derek Scherger de...@echologic.com wrote:
 On Thu, Jan 22, 2009 at 1:33 PM, Derek Scherger de...@echologic.com wrote:

 I think unknown is a better start point.

 Both committer and author are defaulted to unknown in rev
 2c207d528e37e59d8d8d14e24edd14fb34a10a21.

 I'm using the value from the author cert as the git author and the key from
 the same cert as the git committer. That should generally be the same key
 that would be on a changelog cert but it's not guaranteed to be.

 I've also removed the code that was trying to clean up various things, like
 adding < and > around email addresses. These should all be fixed by mappings
 using the --authors-file now.

 The list of committers that might need mapping comes from:

 $ mtn db execute 'select distinct keypair from revision_certs where name = "author"'

 The list of authors that might need mapping comes from:

 $ mtn db execute 'select distinct value from revision_certs where name = "author"'

 Great!

 I'm getting a bit further now, but I'm still having some
 incompatibilities. I've tried to emulate most of them, but it takes
 time between each try.

 The first issue I stumbled upon is that you do the following:

  db.get_revision_certs(*r, author_cert_name, authors);
  db.get_revision_certs(*r, branch_cert_name, branches);
  db.get_revision_certs(*r, changelog_cert_name, changelogs);
  db.get_revision_certs(*r, comment_cert_name, comments);
  db.get_revision_certs(*r, date_cert_name, dates);

 While I get all the certs and handle them in one single go. I think
 it's more efficient to do a single sql query. This generates a
 different order when there's multiple committers. For now I'm doing it
 separately as your code does.

 The next one is regarding the changelogs. I want to have only one
 changelog, not concatenate them. So I patched your code. I'm attaching
 the patch.

 Then I tried to update to your latest revision and I'm getting a
 failure from fast import. I'm attaching the report.

Ah, I read your comments in the code, so it fails when the author is
not defined... I think that should be optional.

How about this patch?

-- 
Felipe Contreras


mtn-fast-export-authors.diff
Description: Binary data


Re: [Monotone-devel] git fast-export

2009-02-09 Thread Derek Scherger
On Mon, Feb 9, 2009 at 4:40 PM, Felipe Contreras felipe.contre...@gmail.com
 wrote:


 I'm getting a bit further now, but I'm still having some
 incompatibilities. I've tried to emulate most of them, but it takes
 time between each try.


I've checked in a change that improves performance quite a lot, which I'll
describe in another email.



 The first issue I stumbled upon is that you do the following:

  db.get_revision_certs(*r, author_cert_name, authors);
  db.get_revision_certs(*r, branch_cert_name, branches);
  db.get_revision_certs(*r, changelog_cert_name, changelogs);
  db.get_revision_certs(*r, comment_cert_name, comments);
  db.get_revision_certs(*r, date_cert_name, dates);

 While I get all the certs and handle them in one single go. I think
 it's more efficient to do a single sql query. This generates a


sqlite handles rapid-fire queries *much* faster than your average oracle or
postgresql database does. I would be very surprised if issuing these 5
queries per rev made any measurable difference to performance, but I have
been wrong before. ;)

different order when there's multiple committers. For now I'm doing it
 separately as your code does.

 The next one is regarding the changelogs. I want to have only one
 changelog, not concatenate them. So I patched your code. I'm attaching
 the patch.


I'm curious as to what the exact problem you're having here is. Can you give
an example of the messages you're getting and what you would like to have?

The export code should not be repeating changelogs that are due to multiple
people arriving at the same merge or propagate. The intent of the code
that's there is that if multiple people did have unique things to say it
will preserve all of their messages which seems better than randomly
throwing away someone's comments. It should be retaining only one
automatically generated merge or propagate message though.
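
The intent described above (keep every distinct message, but only one copy of identical ones) can be sketched as follows. It is a simplified illustration and not the export code: the function name, the blank-line separator, and the `use_one` switch, which mimics the idea of the --use-one-changelog option by taking the first message, are all assumptions.

```python
def combine_changelogs(changelogs, use_one=False):
    """Merge the changelog certs of one revision into a single commit message.

    Identical messages (after stripping surrounding whitespace) are kept once,
    in first-seen order. With use_one=True only the first message is returned.
    """
    unique = []
    for message in changelogs:
        text = message.strip()
        if text and text not in unique:
            unique.append(text)
    if not unique:
        return ""
    if use_one:
        return unique[0]
    return "\n\n".join(unique)
```

Note that two automatically generated merge messages that differ by a single quote character would still count as distinct here, which is the case Felipe ran into.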

Then I tried to update to your latest revision and I'm getting a
 failure from fast import. I'm attaching the report.


Yes. Git doesn't like authors without an email address wrapped in < and >, so
you need to put these in the --authors-file mappings.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-02-09 Thread Derek Scherger
On Mon, Feb 9, 2009 at 4:53 PM, Felipe Contreras felipe.contre...@gmail.com
 wrote:


  Then I tried to update to your latest revision and I'm getting a
  failure from fast import. I'm attaching the report.

 Ah, I read your comments in the code, so it fails when the author is
 not defined... I think that should be optional.

 How about this patch?


I think it will fail for any author not listed in the --authors-file that
*does* have an email address with < and > around it because it will get
wrapped in an extra set of < and >.

I know that creating an author map to handle all of the various authors is a
pretty major PITA. I'm not sure what the best thing to do here is though. I
didn't think the previous code that was trying to detect various types of
author names and fix them was particularly robust. I'm open to suggestions
but at the moment I think getting a list of authors and committers from these
queries and scripting up a proper author map is probably the best approach
to the problem.

  // 'select distinct keypair from revision_certs'
  // 'select distinct value from revision_certs where name = author'
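
For illustration, here is a minimal Python sketch of "scripting up" a first-draft author map from the values such queries return. It assumes the map uses the `key = Name <email>` line format discussed in this thread; the function name and the guessing rules are hypothetical, and the output is meant to be reviewed by hand, not used blindly.

```python
def draft_author_map(raw_authors):
    """Return draft authors-file lines ('key = Name <email>') for raw cert values."""
    lines = []
    for raw in sorted(set(raw_authors)):
        if "<" in raw and ">" in raw:
            ident = raw                    # already looks like 'Name <email>'
        elif "@" in raw and " " not in raw:
            name = raw.split("@", 1)[0]    # guess a name from the local part
            ident = f"{name} <{raw}>"
        else:
            ident = f"{raw} <unknown>"     # no email available (assumed placeholder)
        lines.append(f"{raw} = {ident}")
    return lines
```

Each resulting line can then be edited to carry the person's real name before the file is passed to --authors-file.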

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-02-09 Thread Felipe Contreras
On Tue, Feb 10, 2009 at 6:58 AM, Derek Scherger de...@echologic.com wrote:

 On Mon, Feb 9, 2009 at 4:40 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 I'm getting a bit further now, but I'm still having some
 incompatibilities. I've tried to emulate most of them, but it takes
 time between each try.

 I've checked in a change that improves performance quite a lot, which I'll
 describe in another email.


 The first issue I stumbled upon is that you do the following:

  db.get_revision_certs(*r, author_cert_name, authors);
  db.get_revision_certs(*r, branch_cert_name, branches);
  db.get_revision_certs(*r, changelog_cert_name, changelogs);
  db.get_revision_certs(*r, comment_cert_name, comments);
  db.get_revision_certs(*r, date_cert_name, dates);

 While I get all the certs and handle them in one single go. I think
 it's more efficient to do a single sql query. This generates a

 sqlite handles rapid-fire queries *much* faster than your average oracle or
 postgresql database does. I would be very surprised if issuing these 5
 queries per rev made any measurable difference to performance, but I have
 been wrong before. ;)

 different order when there's multiple committers. For now I'm doing it
 separately as your code does.
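
 One way to settle the performance question is to try both approaches on a
 throwaway SQLite table. The Python sketch below assumes a simplified,
 hypothetical revision_certs(id, name, value) table, not monotone's real
 schema, and only shows that the two approaches can return the same certs in
 the same order when both sort the same way.

```python
import sqlite3

CERT_NAMES = ["author", "branch", "changelog", "comment", "date"]

def make_db():
    # Hypothetical, simplified stand-in for monotone's cert table.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE revision_certs (id TEXT, name TEXT, value TEXT)")
    db.executemany(
        "INSERT INTO revision_certs VALUES (?, ?, ?)",
        [
            ("rev1", "author", "alice@example.com"),
            ("rev1", "author", "bob@example.com"),
            ("rev1", "branch", "com.example.project"),
            ("rev1", "changelog", "merge of 'a' and 'b'"),
            ("rev1", "date", "2009-02-09T16:40:00"),
        ],
    )
    return db

def certs_one_query_per_name(db, rev):
    # Five small queries, one per cert name.
    certs = {}
    for name in CERT_NAMES:
        rows = db.execute(
            "SELECT value FROM revision_certs"
            " WHERE id = ? AND name = ? ORDER BY rowid",
            (rev, name),
        )
        certs[name] = [value for (value,) in rows]
    return certs

def certs_single_query(db, rev):
    # One query for every cert on the revision, bucketed by name afterwards.
    certs = {name: [] for name in CERT_NAMES}
    rows = db.execute(
        "SELECT name, value FROM revision_certs WHERE id = ? ORDER BY rowid",
        (rev,),
    )
    for name, value in rows:
        if name in certs:
            certs[name].append(value)
    return certs
```

 Without an explicit ORDER BY the two variants are free to return certs in
 different orders, which matters as soon as "the first author" is picked.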

 The next one is regarding the changelogs. I want to have only one
 changelog, not concatenate them. So I patched your code. I'm attaching
 the patch.

 I'm curious as to what the exact problem you're having here is. Can you give
 an example of the messages you're getting and what you would like to have?

 The export code should not be repeating changelogs that are due to multiple
 people arriving at the same merge or propagate. The intent of the code
 that's there is that if multiple people did have unique things to say it
 will preserve all of their messages which seems better than randomly
 throwing away someone's comments. It should be retaining only one
 automatically generated merge or propagate message though.

It's just that I don't like that behavior. It doesn't matter how smart
the algorithm is, it will always look like two messages instead of
one, which might be ok for some people, but not for others.

In any case, the first error I got was with a revision that had a
change like this [merge 0123...] and another one like [merge
'0123...].

 Then I tried to update to your latest revision and I'm getting a
 failure from fast import. I'm attaching the report.

 Yes. Git doesn't like authors without an email address wrapped in < and >, so
 you need to put these in the --authors-file mappings.

Why not? I thought 'unknown' was ok.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-02-09 Thread Derek Scherger
On Mon, Feb 9, 2009 at 10:07 PM, Derek Scherger de...@echologic.com wrote:


 I know that creating an author map to handle all of the various authors is
 a pretty major PITA. I'm not sure what the best thing to do here is though.
 I didn't think the previous code that was trying to detect various types of
 author names and fix them was particularly robust. I'm open to suggestions
 but at the moment I think getting a list of authors and commiters from these
 queries and scripting up a proper author map is probably the best approach
 to the problem.


What about this:

--- cmd_othervcs.cc	894030d1e0d2749eb7dbf31f2b64761e6a85308a
+++ cmd_othervcs.cc	b14baf18550496ddb8ca705a283186f568343cce
@@ -510,9 +510,19 @@ CMD(git_export, git_export, , CMD_RE

   if (author_map.find(author_key) != author_map.end())
     author_key = author_map[author_key];
+  else if (author_key.find('<') == string::npos &&
+           author_key.find('>') == string::npos &&
+           author_key.find(' ') == string::npos &&
+           author_key.find('@') != string::npos)
+    author_key = "<" + author_key + ">";

   if (author_map.find(author_name) != author_map.end())
     author_name = author_map[author_name];
+  else if (author_name.find('<') == string::npos &&
+           author_name.find('>') == string::npos &&
+           author_name.find(' ') == string::npos &&
+           author_name.find('@') != string::npos)
+    author_name = "<" + author_name + ">";

   cert_iterator date = dates.begin();

   cert_iterator date = dates.begin();

Having to create a proper author map makes testing the darn thing hard too!
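
To make the heuristic easier to try out without a full export, here is a small Python sketch of the same rule as I read it from the patch: a value with no angle brackets, no spaces, and an @ sign is treated as a bare email address and wrapped in < and >. The function name is hypothetical.

```python
def wrap_bare_email(author):
    """Wrap a bare email address in angle brackets; leave anything else alone."""
    looks_bare = (
        "<" not in author
        and ">" not in author
        and " " not in author
        and "@" in author
    )
    return f"<{author}>" if looks_bare else author
```

Values that already carry brackets pass through unchanged, so the double-wrapping problem mentioned earlier in the thread should not occur with this rule.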

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-02-09 Thread Derek Scherger
On Mon, Feb 9, 2009 at 11:14 PM, Felipe Contreras 
felipe.contre...@gmail.com wrote:


 It's just that I don't like that behavior. It doesn't matter how smart
 the algorithm is, it will always look like two messages instead of
 one, which might be ok for some people, but not for others.


What I was trying to do was to only use *one* of the two messages if they
were identical so it shouldn't look like two messages at all.

In any case, the first error I got was with a revision that had a
 change like this [merge 0123...] and another one like [merge
 '0123...].


In the case of merges it should be catching the fact that they have the same
message and only including one of them. I guess if one of them has a quote
and one doesn't it will fail though. It seems odd that you would be getting
this and makes me wonder whether different monotone versions have had
different automated messages in these cases. Can you post the *exact*
contents of the message you're getting please?

If it is the case that you have two slightly different automatically
generated merge messages then this isn't going to handle that and I think
the best thing to do in that case is keep all of the messages in the
exported data, rather than losing information. If you don't like specific
messages there's always the option of removing some things from the exported
data before importing it which seems like it would generally be easier than
adding things to the exported data that it doesn't contain.
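
As a rough illustration of the intended behaviour (not the actual export code), the Python sketch below keeps one copy of each distinct changelog and joins the rest. The blank-line separator and the whitespace trimming are assumptions.

```python
def combine_changelogs(changelogs):
    """Join changelog messages, keeping only the first copy of identical ones."""
    seen = set()
    unique = []
    for message in changelogs:
        key = message.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return "\n\n".join(unique)
```

With exact matching like this, two automatically generated messages that differ by a single quote character are both kept, which is the case described above.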

  Yes. Git doesn't like authors without an email address wrapped in < and >, so
  you need to put these in the --authors-file mappings.

 Why not? I thought 'unknown' was ok.


'unknown' is only used when there are no author certs, not when some
author cert is not found in the author map. If there is an author cert with
no email then git won't like it. Another option would be to require these
values to exist in the author map or replace them with unknown as you seem
to be suggesting.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-22 Thread hendrik
On Thu, Jan 22, 2009 at 09:37:22AM +0200, Felipe Contreras wrote:
 On Thu, Jan 22, 2009 at 4:17 AM,  hend...@topoi.pooq.com wrote:
  On Thu, Jan 22, 2009 at 12:17:14AM +0200, Felipe Contreras wrote:
 
  However, I found some issues:
 
  1) no author
 
  Where there is no author it appears as unknown<unknown>; it's missing a
  space and I think first-letter capitalization looks better for names
  (Unknown).
 
  The point of lower-case here is presumably that unknown isn't
  somebody's name, thereby obviating confusion with Mr. Simon J. Unknown.

In the Japanese translation of gnucash, this distinction wouldn't work.

 
 But it's part of the name field, so perhaps <unknown> would be better?

And < looks like the letter く.

:-)

-- hendrik




Re: [Monotone-devel] git fast-export

2009-01-22 Thread Zack Weinberg
On Thu, Jan 22, 2009 at 9:24 AM, Felipe Contreras
felipe.contre...@gmail.com wrote:
 On Thu, Jan 22, 2009 at 6:05 PM, Derek Scherger de...@echologic.com wrote:
 On Wed, Jan 21, 2009 at 3:17 PM, Felipe Contreras 
 felipe.contre...@gmail.com wrote:
 I really don't like 1), there is *always* a committer in mtn. I

 There ought to be but there's no real requirement by the data model and if a
 pull operation was interrupted at the wrong moment it is likely possible to
 miss some certs. Also, when pulling you always get revs, but you might not
 get certs (at least branch certs) if they don't match the pattern you're
 pulling with. I can't recall if no certs are pulled for revs that don't
 match the branch pattern.

 There must be at least one cert per revision, right? Date? Changelog?

No, there is no requirement to have any certs on a revision.  A
revision without a branch cert is inconvenient to work with -- you
have to check it out by ID, I think -- but that is the *only* cert
that the system actually pays attention to, unless you've done
something clever in the hooks.  Author, date, changelog are all for
the user's information only.

You have to go to some trouble to get a revision with no certs, but it
is allowed.

zw




Re: [Monotone-devel] git fast-export

2009-01-22 Thread Derek Scherger
On Thu, Jan 22, 2009 at 10:24 AM, Felipe Contreras 
felipe.contre...@gmail.com wrote:


 Or I can modify your code.



Sure, if you like, it's not particularly complicated. I am hoping to arrive
at something where this isn't generally required though. ;)

I think unknown is a better start point.


Agreed, now that I look at it.


  Why not? It's a better approximation than author which is already
 available in the git commits. What would be the advantage of having
 both git author and git committer set to mtn author?


I'll try this and see what it looks like.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-21 Thread Felipe Contreras
On Sun, Jan 18, 2009 at 9:40 PM, Derek Scherger de...@echologic.com wrote:

 On Sun, Jan 18, 2009 at 12:26 AM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 True. Then the issue is with my script (ruby date parsing).

 It looks like I would have to re-generate my repo clone (yay for one
 whole day of conversion) for the comparison.

 Before I do that, can we agree on a format for unknown committers?

 Is there anything wrong with what I have now, which I think should be
 unknown <unknown>? If you want something else use the --authors-file and
 set 'unknown = Unknown foobar'

Since I'll be re-generating the repo with my script once more, I'll
follow your convention.

However, I found some issues:

1) no author

Where there is no author it appears as unknown<unknown>; it's missing a
space and I think first-letter capitalization looks better for names
(Unknown).

2) no name

As discussed before I prefer Unknown <f...@bar.com> but your approach
(<f...@bar.com>) is not bad.

3) no email

When there's no email I get Name<unknown>; it's missing a space.

I really don't like 1), there is *always* a committer in mtn. I
propose to use the first committer of the changelog cert as the git
committer, and then, if there's no mtn author, use the same committer
as author.

Anyway, I'll try to simulate that behaviour so I can make an exact
comparison of the repos.
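
To pin down the three cases above, here is a minimal Python sketch of one possible formatting rule, assuming the conventions mentioned in this thread ("unknown <unknown>" when nothing is known, a bare "<email>" when there is no name, and a space before the bracket in every other case). The function name is hypothetical.

```python
def git_ident(name=None, email=None):
    """Format a 'Name <email>' identity for a fast-import author/committer line."""
    if not name and not email:
        return "unknown <unknown>"            # 1) no author at all
    if not name:
        return f"<{email}>"                   # 2) no name, email only
    return f"{name} <{email or 'unknown'}>"   # 3) no email, or both present
```

Whatever placeholder is agreed on, keeping it in one function like this makes it easy for both tools to produce byte-identical identities.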

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-18 Thread Derek Scherger
On Sat, Jan 17, 2009 at 4:18 PM, Paul Aurich p...@aurich.com wrote:



 No, his script is handling it properly (based solely on this example).

 Monotone uses UTC internally and git does some crazy wacky things with
 timezones (in short, the timezone shown is /just/ for prettifying the date;
 the timestamp by itself *must* be the time of the commit in UTC). See
 https://kerneltrap.org/mailarchive/git/2007/2/6/237902

  $ mtn automate certs d137c7046bae7e4a0144fee82bfce8061f61e3b3 | grep date -A 1
   name date
  value 2000-03-23T03:09:51

  $ date -d "2000-03-23 03:09:51" -u
  Thu Mar 23 03:09:51 UTC 2000
  $ date -d "2000-03-23 03:09:51" -u +%s
  953780991
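
The same check can be done in a script. This is a minimal Python sketch, assuming the date cert value has the form shown above and is in UTC; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def mtn_date_to_epoch(value):
    """Interpret a monotone date cert value as UTC and return Unix seconds."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

def epoch_if_read_as_offset(value, hours):
    """Unix seconds if the same string is (wrongly) read as local time at UTC+hours."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    zone = timezone(timedelta(hours=hours))
    return int(parsed.replace(tzinfo=zone).timestamp())

def git_when(value):
    """Timestamp field for a fast-import author/committer line, in UTC."""
    return f"{mtn_date_to_epoch(value)} +0000"
```

For the cert above, mtn_date_to_epoch("2000-03-23T03:09:51") gives 953780991, while reading the same string as +0200 local time gives 953773791, which is exactly the two-hour difference seen between the two exports.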


Thanks for checking this.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-17 Thread Felipe Contreras
On Sat, Jan 17, 2009 at 9:04 AM, Derek Scherger de...@echologic.com wrote:
 On Wed, Jan 14, 2009 at 12:50 AM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

  1) Your tool adds a bunch of Monotone- fields, can those be disabled?
 
  There's no option at the moment but it would be easy to add.

 It would be really useful.

 I've added --log-revids and --log-certs to enable including revision ids and
 cert values in the commit logs. These are off by default.

Great!

  2) There's no author mapping, can this option be added?
 
  I'm not exactly sure what you mean by author mapping but I assume
  translating between things like f...@bedrock.com and Fred Flintstone
  <f...@bedrock.com>? Is there a generally accepted format that other tools
  use for this?

 Yes, that's what I meant.

 The only format I know is the one from git-svn:
 felipec = Felipe Contreras <felipe.contre...@gmail.com>

 I've added --authors-file and --branches-file options that work like this
 for mapping author names and branch names respectively. Names not found in
 these maps are used as-is. I've also changed the default branch to unknown
 from master but this can be changed with the branches-file mapping to
 whatever you want with a line like unknown = whatever-you-want.

Very nice. Now the only difference is that for unknown users my script
maps them to Unknown <user_id>.

  3) I add the mtn sha1 in refs/mtn/id
 
  This is easy to add too. I have added refs/mtn/roots/id and
  refs/mtn/leaves/id and was wondering about all of the monotone revision
  ids. I assume the leaf refs would prevent git from wanting to garbage
  collect otherwise unreferenced revs if there were any?

 I've added --refs=roots, --refs=leaves and --refs=revs to include
 refs/mtn/roots, refs/mtn/leaves and refs/mtn/revs respectively.

Great :)

 If there's a ref pointing to it, then it's not pruned.

 Good. Including --refs=leaves should make sure that nothing is subject to
 garbage collection then.

 Branches and tags can be manually fixed a posteriori, no big issue.

 The important things are the commits themselves.

 Not always. Monotone allows things in branch names that git does not. If
 these aren't changed git will fail to import them.
 Use --branches-file to map offending names to something git can handle.

True.

 It probably depends on the intent of the clone:
 a) migrate the repo forever
 b) mirror a mtn repo

 Right now I'm interested in b), so I find the ref/mtn approach very
 useful since I can quickly look for the mtn or git sha1.

 The --refs=revs option does clutter up the gitk display somewhat but
 otherwise seems fine.

I tested your changes and converted the pidgin repo, on the second try
I was finally able to convert it, took me 8 hours (1 revs/s).

I ran my comparison script, but unfortunately the first revision has a
mismatch:

yours:
author Tailor Script <tai...@homing.pidgin.im> 953780991 +0000

mine:
author Tailor Script <tai...@homing.pidgin.im> 953773791 +0200

Which suggests that your script is not handling the timezone correctly
(not sure about that).

Would it be possible to pause after a certain amount of commits, or at
least issue a checkpoint? (maybe git fast-import has this option)
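
git fast-import does accept a `checkpoint` command in its input stream. As a sketch of how a filter could add one every so many commits, the Python generator below assumes the export has already been split into complete commit commands; that splitting step and the names used here are hypothetical.

```python
def with_checkpoints(commit_commands, every=1000):
    """Yield each complete commit command, adding a checkpoint every `every` commits."""
    for count, command in enumerate(commit_commands, start=1):
        yield command
        if count % every == 0:
            # fast-import flushes its packfile, refs and marks at a checkpoint
            yield "checkpoint\n\n"
```

A checkpoint does not pause the import, but it does mean that an interrupted run leaves the work done up to the last checkpoint in a usable state.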

Very good job! I'll try to look further into this later.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-17 Thread Felipe Contreras
On Sun, Jan 18, 2009 at 1:18 AM, Paul Aurich p...@aurich.com wrote:
 And Felipe Contreras spake on 01/17/2009 02:52 PM, saying:
 I ran my comparison script, but unfortunately the first revision has a
 mismatch:

 yours:
 author Tailor Script <tai...@homing.pidgin.im> 953780991 +0000

 mine:
 author Tailor Script <tai...@homing.pidgin.im> 953773791 +0200

 Which suggests that your script is not handling the timezone correctly
 (not sure about that).

 No, his script is handling it properly (based solely on this example).

 Monotone uses UTC internally and git does some crazy wacky things with
 timezones (in short, the timezone shown is /just/ for prettifying the date;
 the timestamp by itself *must* be the time of the commit in UTC). See
 https://kerneltrap.org/mailarchive/git/2007/2/6/237902

  $ mtn automate certs d137c7046bae7e4a0144fee82bfce8061f61e3b3 | grep date -A 1
   name date
  value 2000-03-23T03:09:51

  $ date -d "2000-03-23 03:09:51" -u
  Thu Mar 23 03:09:51 UTC 2000
  $ date -d "2000-03-23 03:09:51" -u +%s
  953780991

True. Then the issue is with my script (ruby date parsing).

It looks like I would have to re-generate my repo clone (yay for one
whole day of conversion) for the comparison.

Before I do that, can we agree on a format for unknown committers?

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-16 Thread Derek Scherger
On Wed, Jan 14, 2009 at 12:50 AM, Felipe Contreras 
felipe.contre...@gmail.com wrote:

  1) Your tool adds a bunch of Monotone- fields, can those be disabled?
 
  There's no option at the moment but it would be easy to add.

 It would be really useful.


I've added --log-revids and --log-certs to enable including revision ids and
cert values in the commit logs. These are off by default.


  2) There's no author mapping, can this option be added?
 
  I'm not exactly sure what you mean by author mapping but I assume
  translating between things like f...@bedrock.com and Fred Flintstone
  <f...@bedrock.com>? Is there a generally accepted format that other tools
  use for this?

 Yes, that's what I meant.

 The only format I know is the one from git-svn:
 felipec = Felipe Contreras <felipe.contre...@gmail.com>


I've added --authors-file and --branches-file options that work like this
for mapping author names and branch names respectively. Names not found in
these maps are used as-is. I've also changed the default branch to unknown
from master but this can be changed with the branches-file mapping to
whatever you want with a line like unknown = whatever-you-want.
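
For readers who want to see the mapping behaviour spelled out, here is a minimal Python sketch of a `key = value` file reader with the "used as-is when not found" fallback described above. It is not the parser in the export code; skipping blank lines and trimming whitespace around the `=` are assumptions.

```python
def parse_mapping(text):
    """Parse 'key = value' lines into a dict, ignoring lines without '='."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)   # split on the first '=' only
        mapping[key.strip()] = value.strip()
    return mapping

def map_name(mapping, name):
    """Return the mapped name, or the original name when there is no entry."""
    return mapping.get(name, name)
```

The same two functions would serve for both the authors file and the branches file, since both use the same line format.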


 3) I add the mtn sha1 in refs/mtn/id
 
  This is easy to add too. I have added refs/mtn/roots/id and
  refs/mtn/leaves/id and was wondering about all of the monotone revision
  ids. I assume the leaf refs would prevent git from wanting to garbage
  collect otherwise unreferenced revs if there were any?


I've added --refs=roots, --refs=leaves and --refs=revs to include
refs/mtn/roots, refs/mtn/leaves and refs/mtn/revs respectively.


 If there's a ref pointing to it, then it's not pruned.


Good. Including --refs=leaves should make sure that nothing is subject to
garbage collection then.

Branches and tags can be manually fixed a posteriori, no big issue.

 The important things are the commits themselves.


Not always. Monotone allows things in branch names that git does not. If
these aren't changed git will fail to import them.
Use --branches-file to map offending names to something git can handle.
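
As an illustration of how such a branches file could be drafted, the Python sketch below flags names that break some of the rules enforced by `git check-ref-format` and proposes a replacement. It covers only a subset of those rules, the replacement scheme is an assumption, and the example branch names are made up, so the output should be reviewed by hand.

```python
import re

# Characters git refuses in ref names: control chars, space, ~ ^ : ? * [ and backslash.
_BAD_CHARS = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")

def needs_mapping(branch):
    """True if the name violates one of the (partial) ref-name rules checked here."""
    return bool(
        _BAD_CHARS.search(branch)
        or ".." in branch
        or "@{" in branch
        or "//" in branch
        or "/." in branch
        or branch.startswith(("/", "."))
        or branch.endswith(("/", ".", ".lock"))
    )

def suggest_git_name(branch):
    """Propose a replacement name; a suggestion only, not guaranteed to be valid."""
    name = _BAD_CHARS.sub("-", branch)
    name = name.replace("@{", "-")
    name = re.sub(r"\.{2,}", ".", name)   # collapse runs of dots
    return name.strip("/.")

def draft_branch_map(branches):
    """Return 'old = new' lines for the branches that need a mapping."""
    return [
        f"{branch} = {suggest_git_name(branch)}"
        for branch in branches
        if needs_mapping(branch)
    ]
```

Branch names that already satisfy these checks produce no line, so they continue to be used as-is.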


 It probably depends on the intent of the clone:
 a) migrate the repo forever
 b) mirror a mtn repo

 Right now I'm interested in b), so I find the ref/mtn approach very
 useful since I can quickly look for the mtn or git sha1.


The --refs=revs option does clutter up the gitk display somewhat but
otherwise seems fine.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-13 Thread Felipe Contreras
On Wed, Jan 7, 2009 at 8:50 AM, Derek Scherger de...@echologic.com wrote:
 On Mon, Jan 5, 2009 at 7:27 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 On Tue, Jan 6, 2009 at 1:42 AM, Derek Scherger de...@echologic.com
 wrote:
  On Mon, Jan 5, 2009 at 2:22 PM, Felipe Contreras
  felipe.contre...@gmail.com wrote:
 
  Why an extra master branch? There's no need for that branch.
 
  It's used for revs that have no other branch certs to use.

 Would it make sense to use another name? nil, unknown or something?

 Possibly. It's probably not a big deal to rename it after importing either.

  Working with the roster is extremely slow. Right now your tool is
  taking about 6 seconds per commit, that's too slow.

 progress revision a19da46cd3d31611d768b67a772c2861aded46c5 (27501/27501)

 real    277m23.646s
 user    251m33.919s
 sys     24m42.357s

 For the record I can apparently convert the pidgin database in about 4.5
 hours on a 2.4GHz core 2 which works out to about 1.65 revs per second (0.61
 seconds per rev) on average. The export file is 3.4GB in size.
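
 The arithmetic behind those figures, as a short Python check. It uses only
 the revision count and the "real" time from the output above; the helper
 name is hypothetical.

```python
def parse_real_time(text):
    """Convert a `time`-style value such as '277m23.646s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

elapsed = parse_real_time("277m23.646s")   # wall-clock seconds for the export
revisions = 27501

revs_per_second = revisions / elapsed
seconds_per_rev = elapsed / revisions
hours = elapsed / 3600
```

 This gives roughly 1.65 revisions per second, 0.61 seconds per revision, and
 a little over 4.6 hours in total.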

It seems to speed up at some points, I have tried again two times but
I had issues, I still don't have the numbers but it's probably faster
than what I thought.

 I have not verified this imported repo in any way yet though so who knows
 whether it's accurate or not.

I can verify comparing to the output of my tool, but there are some differences:

1) Your tool adds a bunch of Monotone- fields, can those be disabled?
2) There's no author mapping, can this option be added?
3) I add the mtn sha1 in refs/mtn/id

Only 1) would be required to do a comparison, 2) would be great to
avoid converting the huge repo again without author mappings.

In order to do future updates I think 3) would be really great, that
way it's possible to know if a revision has been imported or not, and
it makes it possible to do quick lookups like: git show mtn/sha1.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-13 Thread Derek Scherger
On Tue, Jan 13, 2009 at 8:30 PM, Felipe Contreras 
felipe.contre...@gmail.com wrote:


 It seems to speed up at some points, I have tried again two times but
 I had issues, I still don't have the numbers but it's probably faster
 than what I thought.


Yeah, it depends a lot on the length of the delta chains required to
reconstruct rosters. Newer rosters reconstruct faster.


  I have not verified this imported repo in any way yet though so who knows
  whether it's accurate or not.

 I can verify comparing to the output of my tool, but there are some
 differences:


Good and bad I guess. ;)

1) Your tool adds a bunch of Monotone- fields, can those be disabled?


There's no option at the moment but it would be easy to add.

Note that monotone revisions can have multiple authors, dates, changelogs,
etc. if several people merge two revisions to the same result. The git
fast-import format doesn't seem to allow more than committer and author and
the monotone side doesn't indicate which would be which. So, at the moment I
just grab one author, date and branch cert and use that. I do concatenate
all the changelogs and comment certs together for the git commit message and
add the Monotone- values on to the end of that in case they are
interesting.


 2) There's no author mapping, can this option be added?


I'm not exactly sure what you mean by author mapping but I assume
translating between things like f...@bedrock.com and Fred Flintstone
<f...@bedrock.com>? Is there a generally accepted format that other tools
use for this?

This would be easy enough to add but with the caveat above about picking one
author from several. We will very likely not agree on author or date on some
revisions where multiple certs exist.


 3) I add the mtn sha1 in refs/mtn/id


This is easy to add too. I have added refs/mtn/roots/id and
refs/mtn/leaves/id and was wondering about all of the monotone revision
ids. I assume the leaf refs would prevent git from wanting to garbage
collect otherwise unreferenced revs if there were any?


 Only 1) would be required to do a comparison, 2) would be great to
 avoid converting the huge repo again without author mappings.


Another option here is to process the exported output through
sed/awk/perl/python during the fast-import phase. I suspect this may be
needed in some cases anyway to fix branch names and things although I guess
a branch mapping file would also be a possibility.

In order to do future updates I think 3) would be really great, that
 way it's possible to know if a revision has been imported or not, and
 makes possible to do quick lookups like: git show mtn/sha1.


Yeah, this is probably worth having, at least for checking things over after
an import. I'm not sure if you would want to keep these refs around long
term or not. I was wondering about exporting the marks file as well, but
this would probably be better.

All of these things had crossed my mind previously and I'll probably get to
them at some point.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-13 Thread Felipe Contreras
On Wed, Jan 14, 2009 at 6:41 AM, Derek Scherger de...@echologic.com wrote:
 On Tue, Jan 13, 2009 at 8:30 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 It seems to speed up at some points, I have tried again two times but
 I had issues, I still don't have the numbers but it's probably faster
 than what I thought.

 Yeah, it depends a lot on the length of the delta chains required to
 reconstruct rosters. Newer rosters reconstruct faster.

  I have not verified this imported repo in any way yet though so who knows
  whether it's accurate or not.

 I can verify comparing to the output of my tool, but there are some
 differences:

 Good and bad I guess. ;)

 1) Your tool adds a bunch of Monotone- fields, can those be disabled?

 There's no option at the moment but it would be easy to add.

It would be really useful.

 Note that monotone revisions can have multiple authors, dates, changelogs,
 etc. if several people merge two revisions to the same result. The git
 fast-import format doesn't seem to allow more than committer and author and
 the monotone side doesn't indicate which would be which. So, at the moment I
 just grab one author, date and branch cert and use that. I do concatenate
 all the changelogs and comment certs together for the git commit message and
 add the Monotone- values on to the end of that in case they are
 interesting.

I ignore the comments and use only the first value of changelog, date
and author. I set the committer to the first person that added a
changelog.

The git format only allows one author and one committer, but the
convention is to add multiple 'signed-off-by' lines for the people
that reviewed and accepted the patch. I have an option to add the
s-o-b lines, but I have it off.
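
As an illustration of that option, here is a minimal Python sketch that appends one Signed-off-by line per additional person to a commit message. The function name and the choice of who gets a line are hypothetical; only the trailer format follows the git convention.

```python
def add_signoffs(message, idents):
    """Append 'Signed-off-by:' trailers for each identity, after a blank line."""
    trailers = [f"Signed-off-by: {ident}" for ident in idents]
    if not trailers:
        return message
    return message.rstrip("\n") + "\n\n" + "\n".join(trailers) + "\n"
```

With the option off, the message is passed through unchanged, which keeps the result looking like a commit made natively in git.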

I'm interested in a git repo that looks natural, not to have every
single bit of information from the mtn repo.

 2) There's no author mapping, can this option be added?

 I'm not exactly sure what you mean by author mapping but I assume
 translating between things like f...@bedrock.com and Fred Flintstone
 <f...@bedrock.com>? Is there a generally accepted format that other tools
 use for this?

Yes, that's what I meant.

The only format I know is the one from git-svn:
felipec = Felipe Contreras <felipe.contre...@gmail.com>

 This would be easy enough to add but with the caveat above about picking one
 author from several. We will very likely not agree on author or date on some
 revisions where multiple certs exist.

Right, but I don't think there's any point in trying to mirror exactly
the original repo; it's not possible. Let's just settle for a
reasonably good approximation.

 3) I add the mtn sha1 in refs/mtn/id

 This is easy to add too. I have added refs/mtn/roots/id and
 refs/mtn/leaves/id and was wondering about all of the monotone revision
 ids. I assume the leaf refs would prevent git from wanting to garbage
 collect otherwise unreferenced revs if there were any?

If there's a ref pointing to it, then it's not pruned.

 Only 1) would be required to do a comparison, 2) would be great to
 avoid converting the huge repo again without author mappings.

 Another option here is to process the exported output through
 sed/awk/perl/python during the fast-import phase. I suspect this may be
 needed in some cases anyway to fix branch names and things although I guess
 a branch mapping file would also be a possibility.

I don't like the sed approach.

Branches and tags can be manually fixed a posteriori, no big issue.
The important things are the commits themselves.

 In order to do future updates I think 3) would be really great, that
 way it's possible to know if a revision has been imported or not, and
 makes possible to do quick lookups like: git show mtn/sha1.

 Yeah, this is probably worth having, at least for checking things over after
 an import. I'm not sure if you would want to keep these refs around long
 term or not. I was wondering about exporting the marks file as well, but
 this would probably be better.

It probably depends on the intent of the clone:
a) migrate the repo forever
b) mirror a mtn repo

Right now I'm interested in b), so I find the ref/mtn approach very
useful since I can quickly look for the mtn or git sha1.

 All of these things had crossed my mind previously and I'll probably get to
 them at some point.

Cool, then we are mostly in sync ;)

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-12 Thread Markus Wanner
Hi,

Derek Scherger wrote:
 On Mon, Jan 5, 2009 at 6:52 AM, Markus Wanner mar...@bluegap.ch wrote:
 
 (I'd also love to see the underscore in cvs_import and git_export
 vanish. But that's another issue..)
 
 Just following up on this. What would you prefer instead (replace the
 underscore with dashes) or subcommands (like ls foo) of cvs and git (mtn cvs
 import, mtn git export, etc.) ?

Please note that this is not exclusive to your patch, but involves more
general redesign and cleanup of the mtn commands.

In this case, yeah, IMO it would be nicer to have a sub-group of mtn
cvs and mtn git commands.

 mtn git export
 mtn cvs import
 mtn hg pull (one fine day)
 mtn git pull (another fine day)

Or, we decide that single commands are better and turn everything into
an underscore writing. I personally don't think these look very compelling:

 mtn ls_branches
 mtn db_init

The point here is UI consistency. But that certainly needs more thought
and discussion. I don't really want to get into it now. Maybe on the
Mini-Summit?

Regards

Markus Wanner




Re: [Monotone-devel] git fast-export

2009-01-06 Thread Markus Wanner
Hi,

Derek Scherger wrote:
 It's used for revs that have no other branch certs to use.

This is a bit off-topic here, but: is there a use case for revisions
without *any* branch cert?

It's somewhat similar to the invalid date certs: it's certainly easier
to just store the data and not care.

However, most other VCSen I've seen so far do not allow such a thing.

Regards

Markus Wanner





Re: [Monotone-devel] git fast-export

2009-01-06 Thread Patrick Georgi
On Tuesday, 06.01.2009 at 09:47 +0100, Markus Wanner wrote:
 This is a bit off-topic here, but: is there a use case for revisions
 without *any* branch cert?
I think you get these if you pull a branch that descends into other
branches that you don't pull.


Regards,
Patrick Georgi






Re: [Monotone-devel] git fast-export

2009-01-06 Thread Thomas Moschny

Patrick Georgi wrote:

On Tuesday, 06.01.2009 at 09:47 +0100, Markus Wanner wrote:

This is a bit off-topic here, but: is there a use case for revisions
without *any* branch cert?

I think you get these if you pull a branch, which descends into other
branches that you don't pull


Yes, or later, when you don't trust a branch cert for some reason. This
is a general topic that has to be discussed in the scope of the
so-called policy branches: is it really meaningful to use a revision's
data (in the sense that it is an ancestor of a revision I actively use
or even commit myself), but not trust its metadata?


But back to the topic: I wonder how missing or untrusted branch certs 
affect the git-export, unless these revisions are leaves?


- Thomas




Re: [Monotone-devel] git fast-export

2009-01-06 Thread Derek Scherger
On Mon, Jan 5, 2009 at 7:27 PM, Felipe Contreras felipe.contre...@gmail.com
 wrote:

 On Tue, Jan 6, 2009 at 1:42 AM, Derek Scherger de...@echologic.com
 wrote:
  On Mon, Jan 5, 2009 at 2:22 PM, Felipe Contreras
  felipe.contre...@gmail.com wrote:
 
  Why an extra master branch? There's no need for that branch.
 
  It's used for revs that have no other branch certs to use.

 Would it make sense to use another name? nil, unknown or something?


Possibly. It's probably not a big deal to rename it after importing either.

 Working with the roster is extremely slow. Right now your tool is
 taking about 6 seconds per commit, that's too slow.


progress revision a19da46cd3d31611d768b67a772c2861aded46c5 (27501/27501)

real    277m23.646s
user    251m33.919s
sys     24m42.357s

For the record I can apparently convert the pidgin database in about 4.5
hours on a 2.4GHz core 2 which works out to about 1.65 revs per second (0.61
seconds per rev) on average. The export file is 3.4GB in size.

On the import side...

progress revision a19da46cd3d31611d768b67a772c2861aded46c5 (27501/27501)

real    5m10.373s
user    4m34.469s
sys     0m6.984s

git imports this at a rate of 89 revs per second (0.011 seconds per rev)...
pretty impressive. The initial git repo is 762MB vs the 252MB monotone repo.
Repacking with git repack -adf...

real    3m43.852s
user    3m41.978s
sys     0m1.272s

after which the git repo is 99MB. Again pretty impressive.
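
For anyone who wants to redo the arithmetic above, here is a minimal sketch. The helper names parse_elapsed and throughput are made up for illustration; the only inputs are the revision count and the "real" times quoted in this message.

```python
import re


def parse_elapsed(value):
    """Convert a `time`-style value such as '277m23.646s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised elapsed time: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def throughput(revisions, elapsed):
    """Return (revisions per second, seconds per revision)."""
    seconds = parse_elapsed(elapsed)
    return revisions / seconds, seconds / revisions


# Figures quoted above: 27501 revisions, export and import wall-clock times.
for label, elapsed in (("export", "277m23.646s"), ("import", "5m10.373s")):
    rate, per_rev = throughput(27501, elapsed)
    print(f"{label}: {rate:.2f} revs/s, {per_rev:.3f} s/rev")
```

Only the wall-clock ("real") figure is used here; the user and sys times would give slightly different rates.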

I have not verified this imported repo in any way yet though, so who knows
whether it's accurate or not.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-05 Thread Derek Scherger
On Mon, Jan 5, 2009 at 7:24 AM, Markus Wanner mar...@bluegap.ch wrote:


 Sorry if that didn't get clear, but I'd vote for the smallest change for
 now. (IMO, that means leaving cvs_import and git_export, but put them
 into a single command group).


+1

I'll have a look at doing this and dealing with rename ordering over the
next few days unless someone else gets around to it.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-05 Thread Markus Wanner
Hi,

Derek Scherger wrote:
 I've spent a bit of holiday hacking time working on a git_export command for
 monotone, more as an experiment than anything else. I've committed the
 result to net.venge.monotone.fast-export for people to have a look at.

Awesome!

 Three exported branch names net.prjek:tester,
 net.prjet:tester/drop-for-propagate and prjek.net:tester where changed
 (with sed) during the import process because git does not allow colon's (and
 various other characters) in branch/ref names. I simply changed : and /
 in these names to . although the / should have worked it did cause an
 error of some sort.

Maybe respect those limitations within mtn git_export?

 The conversion was verified by checking out each of the 276 branches and 126
 tags from both git and mtn and comparing the resulting workspaces. The
 script I used to do this verification was a bit dumb and failed to checkout
 a few revisions so these weren't compared. Using only the branch name failed
 in some cases because there were multiple heads and using only a tag name
 failed in some cases because the tagged revisions had no branch certs. All
 of the branches and tags that did checkout were identical according to diff
 -qr so I'm reasonably confident that the new exporter basically works.

In my experience with cvs_import, it's more often the history than
the end result that looks strange. However, that's certainly harder to
check, and git seems a tiny bit closer to the spirit of mtn than CVS.

 I suspect that the various other git fast-import conversion scripts that
 exist for monotone are probably slower and less robust than this
 implementation (unless they work similarly from rosters) which uses the
 monotone internals to do the work. I spent a bit of time initially trying to
 export revisions using the revision data structures but this didn't work
 very well. Git only deals with files and trying to order a mix of renames of
 directories and files from monotone correctly from revisions was difficult.
 Ultimately I didn't use the revision data structures at all but built up a
 similar files-only based revision representation by comparing rosters. Much
 like what is done for make_cset, but ignoring directories and producing only
 file deletions, renames and additions. This works much better, correctly
 handles pivot_root and a few other odd things that working with revisions
 proved difficult.

Sounds good.

 This exporter does not (yet) handle all rename ordering issues that are
 possible. For example rename a b followed by rename b c will probably
 fail on import unless it is executed as rename b c followed by rename a
 b. Similarly rename a b followed by rename b a which is indeed
 possible, will probably fail on import and requires the introduction of a
 third temporary file. These problems can be fixed in the exporter and can
 also be fixed in the exported data by re-ordering renames as required.

Hm.. maybe I just don't understand git fast-import format, but isn't the
ordering derived from the revision graph?

 This feels a bit like throwing in the proverbial towel and I hope this
 doesn't elicit any ill-will from the current monotone crowd. I'm not really
 planning on converting my personal stuff from monotone any time soon but
 knowing it can be done without losing information is nice.

You are definitely underestimating the mtn world domination plan (tm).

 I'm still happy
 to contribute to monotone but with 2 small kids my free/hacking time is
 pretty limited.

Very well understood.


Some questions aside: is there a git fast-export? Can you comment on
writing mtn git_import? Problems we are facing there?

Regards

Markus Wanner





monotone world domination. Was: [Monotone-devel] git fast-export

2009-01-05 Thread hendrik
On Mon, Jan 05, 2009 at 09:22:03AM -0700, Derek Scherger wrote:
 
 You are definitely underestimating the mtn world domination plan (tm).

Part of it is, presumably, convincing people that monotone is not a
trap, so that they feel safe.  This monotone -> git transformer does this.

We also need to ensure popular systems aren't a trap either, so those
stuck with them can escape to monotone if they want.  We'll need a
git->monotone, preferably compatible with the monotone->git for
round-trip revisions.  And an anything-popular->monotone conversion.

Ideal would be for monotone to have a git front end, so that it can
transparently serve as a git server if necessary.

This is probably a lot of work.

-- hendrik





Re: [Monotone-devel] git fast-export

2009-01-05 Thread Derek Scherger
On Mon, Jan 5, 2009 at 6:43 AM, Markus Wanner mar...@bluegap.ch wrote:

  Three exported branch names net.prjek:tester,
  net.prjet:tester/drop-for-propagate and prjek.net:tester where
 changed
  (with sed) during the import process because git does not allow colon's
 (and
  various other characters) in branch/ref names. I simply changed : and
 /
  in these names to . although the / should have worked it did cause an
  error of some sort.

 Maybe respect those limitations within mtn git_export?


Maybe... I'm not sure what we'd replace all the bad characters with though.
Apparently multiple consecutive dots (i.e. ..) are also disallowed so
leaving this up to an intermediate script doesn't seem such a bad idea
either.
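
Such an intermediate script could look roughly like the sketch below. It is an illustration, not what the exporter or the sed command mentioned above actually does: the function name sanitize_branch_name is invented, and the choice of "_" as the replacement (rather than ".", which can itself produce a forbidden "..") is an assumption. The rules it applies are a subset of those documented for git check-ref-format.

```python
import re

# Characters git refuses in ref names: control characters, space, DEL,
# and ~ ^ : ? * [ \
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def sanitize_branch_name(name, replacement="_"):
    """Map a monotone branch name to something git accepts as a ref name."""
    cleaned = _FORBIDDEN.sub(replacement, name)
    cleaned = cleaned.replace("@{", replacement)   # "@{" is reserved by git
    cleaned = re.sub(r"\.{2,}", ".", cleaned)      # no consecutive dots
    cleaned = re.sub(r"/{2,}", "/", cleaned)       # no empty path components
    parts = []
    for part in cleaned.split("/"):
        part = part.lstrip(".")                    # components may not start with "."
        if part.endswith(".lock"):                 # components may not end in ".lock"
            part = part[: -len(".lock")] + replacement + "lock"
        if part:
            parts.append(part)
    cleaned = "/".join(parts).rstrip(".")          # a ref may not end with "."
    if cleaned in ("", "@"):
        cleaned = "unnamed"                        # hypothetical fallback name
    return cleaned


print(sanitize_branch_name("net.prjek:tester"))
```

Two different monotone branch names can collapse to the same git name under any scheme like this, so a real script would also want to detect collisions.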


 In my experience with cvs_import, it's rather often the history, than
 the end result, which looks strange. However, that's certainly harder to
 check and git seems a tiny bit closer to the spirit of mtn, than CVS.


Agreed.


 Hm.. maybe I just don't understand git fast-import format, but isn't the
 ordering derived from the revision graph?


The ordering of the revisions is, but the ordering of the commands (add,
delete, rename) within a single revision is not and monotone allows chains
of renames that, for git, would have to happen in lifo order.

i.e. if a rev has the following: rename a b, rename b c, rename c d
it would have to be fed to git as: rename c d, rename b c, rename a b

... I think. I haven't actually tested this yet though so I could be wrong.

 You are definitely underestimating the mtn world domination plan (tm).


Heh... I'm all for it, but the bar has been raised considerably since
monotone started and we'll have a lot of work to do!

 Some questions aside: is there a git fast-export? Can you comment on
 writing mtn git_import? Problems we are facing there?


There is a git fast-export. Writing a mtn git_import should be reasonably
straightforward I think. I'm sure there will be some quirks though.

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-05 Thread Markus Wanner
Hi,

Thomas Keller wrote:
 Of course this is bikeshed discussion and still doesn't solve the issue
 where to put these - having two groups already with only one command
 (database, automation) is already stupid enough - and a command group
 cannot be specified in the commandline either, so an import and
 export group is equally stupid.

Sorry if that didn't get clear, but I'd vote for the smallest change for
now. (IMO, that means leaving cvs_import and git_export, but put them
into a single command group).

Regards

Markus Wanner





Re: [Monotone-devel] git fast-export

2009-01-05 Thread Derek Scherger
On Mon, Jan 5, 2009 at 9:32 AM, Thomas Moschny thomas.mosc...@gmx.de wrote:

 Derek Scherger:

 I've spent a bit of holiday hacking time working on a git_export command
 for monotone, more as an experiment than anything else.


 Nice!

 This successfully (I think) converts the entire monotone database with 276
 branches (more or less what you get when you pull '*' from monotone.ca)
 to a git repository. Here's some details on the conversion:


 This doesn't honor suspend certs, does it?


At the moment no it doesn't and I'm not sure you would want to. Currently it
will create branch refs (refs/heads/branchname) for all branches. Not doing
so would probably leave suspended branches in an unreachable state and
thus prone to garbage collection, as I understand it.

 And one more interesting question, what do you do with branches that have
 multiple heads?


At the moment nothing in particular. I suspect this will leave all but one
of the heads in an unreachable state again open for garbage collection. I'm
not sure how we would want to handle this in general. Possibly just by
saying make sure all of your branches are merged before exporting.

I considered whether we should also write refs/mtn/revid references so that
it is possible to refer to revisions by their original monotone sha1 to
avoid leaving any unreachable revs. You probably wouldn't want to keep this
list around indefinitely but it's easy to remove these once they are no
longer useful, assuming they don't cause any problems I guess.
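
As a rough illustration of the refs/mtn idea, the sketch below emits fast-import "reset" commands that point a ref named after each monotone revision id at the commit mark it was exported under. This is only a sketch: the function name and the revid-to-mark mapping are hypothetical, and nothing here says the exporter tracks marks this way.

```python
import re


def mtn_ref_commands(marks):
    """Build fast-import 'reset' commands for refs/mtn/<revid>.

    `marks` maps a 40-character monotone revision id to the integer mark
    of the corresponding commit in the same fast-import stream
    (a hypothetical structure used only for this sketch).
    """
    chunks = []
    for revid, mark in sorted(marks.items()):
        if re.fullmatch(r"[0-9a-f]{40}", revid) is None:
            raise ValueError(f"not a monotone revision id: {revid!r}")
        # "reset <ref>" followed by "from :<mark>" creates or moves the ref.
        chunks.append(f"reset refs/mtn/{revid}\nfrom :{mark}\n\n")
    return "".join(chunks)


print(mtn_ref_commands({"a19da46cd3d31611d768b67a772c2861aded46c5": 27501}), end="")
```

Because every exported revision would then be reachable from some ref, none of them would be candidates for garbage collection until the refs/mtn refs are deleted again.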

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-05 Thread Thomas Keller
Markus Wanner wrote:
 Thomas Keller wrote:
 I scanned over your patch and noticed only a little quirk: you've
 introduced yet another command group (git); wouldn't it be better to
 collect all the different import/export functions under one common
 group, like the existing rcs (revision control systems) or, to avoid
 the RCS analogy, vcs?
 
 Maybe ovcs, for Other, Overaged, Obscure or Outrageous VCS?
 
 Other than that, I agree with Thomas that a single command group for all
 import/export/push/pull/sync mechanisms to/from other VCSen is plenty.
 It's not like we have such an overwhelming amount of conversion commands.
 
 (I'd also love to see the underscore in cvs_import and git_export
 vanish. But that's another issue..)

What about two commands with subcommands, export and import, where
we could have

mtn export git
mtn import cvs

for now?

Of course this is bikeshed discussion and still doesn't solve the issue
where to put these - having two groups already with only one command
(database, automation) is already stupid enough - and a command group
cannot be specified in the commandline either, so an import and
export group is equally stupid.

Thomas.

-- 
GPG-Key 0x160D1092 | tommyd3...@jabber.ccc.de | http://thomaskeller.biz
Please note that according to the EU law on data retention, information
on every electronic information exchange might be retained for a period
of six months or longer: http://www.vorratsdatenspeicherung.de/?lang=en






Re: [Monotone-devel] git fast-export

2009-01-05 Thread Markus Wanner
Hi,

Thomas Keller wrote:
 I scanned over your patch and noticed only a little quirk: you've
 introduced yet another command group (git); wouldn't it be better to
 collect all the different import/export functions under one common
 group, like the existing rcs (revision control systems) or, to avoid
 the RCS analogy, vcs?

Maybe ovcs, for Other, Overaged, Obscure or Outrageous VCS?

Other than that, I agree with Thomas that a single command group for all
import/export/push/pull/sync mechanisms to/from other VCSen is plenty.
It's not like we have such an overwhelming amount of conversion commands.

(I'd also love to see the underscore in cvs_import and git_export
vanish. But that's another issue..)

 Other than that I think you're right that people should have the choice
 what to use, and at least some need for a mtn - git conversion tool
 popped up a few times on the list. Since git seems to become the new
 big thing its formats will probably be more wildly used than ours, so
 even people who stick to monotone will have something from this. (F.e.
 code_swarm [0] has a git-mode to render the pretty videos, though the
 XML format [1] it needs as input is easy enough to create even with a
 custom lua command...)

I absolutely agree.

Regards

Markus Wanner





Re: [Monotone-devel] git fast-export

2009-01-05 Thread Thomas Moschny

Derek Scherger:
I've spent a bit of holiday hacking time working on a git_export command 
for monotone, more as an experiment than anything else.


Nice!

This successfully (I think) converts the entire monotone database with
276 branches (more or less what you get when you pull '*' from
monotone.ca) to a git repository. Here's some details on the
conversion:


This doesn't honor suspend certs, does it?

And one more interesting question, what do you do with branches that 
have multiple heads?


- Thomas





Re: [Monotone-devel] git fast-export

2009-01-05 Thread Thomas Keller
Derek Scherger wrote:
 I've spent a bit of holiday hacking time working on a git_export command
 for monotone, more as an experiment than anything else. I've committed
 the result to net.venge.monotone.fast-export for people to have a look
 at. [...]

Hi Derek!

I scanned over your patch and noticed only a little quirk: you've
introduced yet another command group (git); wouldn't it be better to
collect all the different import/export functions under one common
group, like the existing rcs (revision control systems) or, to avoid
the RCS analogy, vcs?

Other than that I think you're right that people should have the choice
of what to use, and at least some need for a mtn -> git conversion tool
has popped up a few times on the list. Since git seems to be becoming the
new big thing, its formats will probably be more widely used than ours, so
even people who stick to monotone will get something from this. (For
example, code_swarm [0] has a git mode to render the pretty videos, though
the XML format [1] it needs as input is easy enough to create even with a
custom lua command...)

Thomas.

[0] http://vis.cs.ucdavis.edu/~ogawa/codeswarm/
[1]
http://code.google.com/p/codeswarm/source/browse/trunk/data/sample-repevents.xml

-- 
GPG-Key 0x160D1092 | tommyd3...@jabber.ccc.de | http://thomaskeller.biz
Please note that according to the EU law on data retention, information
on every electronic information exchange might be retained for a period
of six months or longer: http://www.vorratsdatenspeicherung.de/?lang=en






Re: [Monotone-devel] git fast-export

2009-01-05 Thread Felipe Contreras
On Mon, Jan 5, 2009 at 8:09 AM, Derek Scherger de...@echologic.com wrote:
 I've spent a bit of holiday hacking time working on a git_export command for
 monotone, more as an experiment than anything else. I've committed the
 result to net.venge.monotone.fast-export for people to have a look at.
 There's probably not much preventing this from landing on mainline, other
 than some documentation and possibly tests. Although I'm not really sure how
 we would want to go about testing it beyond what I've already done. The fun
 part about a command like this is that I expect most users of it would have
 some expectation of being their own testers in terms of verifying their
 conversions and such.

Great! I'm already trying it with Pidgin.

 This successfully (I think) converts the entire monotone database with 276
 branches (more or less what you get when you pull '*' from monotone.ca) to a
 git repository. Here's some details on the conversion:

 exported monotone database
 - 174MB in size
 - 276 branches
 - 127 tags (with one duplicate name monotone-viz-1.0.1-1)
 - export time 83m42.134s (on a 2.0GHz pentium-m laptop)
 - export file size 2.9GB
 - 15245 revisions exported

 imported git repository
 - 719MB in size (before being repacked)
 - import time 23m15.463s
 - repack -adf time 3m14.385s
 - packed repository size 60MB
 - 277 branches (the extra one is master)

Why an extra master branch? There's no need for that branch.

 - 126 tags (missing the duplicate above)

 Three exported branch names net.prjek:tester,
 net.prjet:tester/drop-for-propagate and prjek.net:tester where changed
 (with sed) during the import process because git does not allow colon's (and
 various other characters) in branch/ref names. I simply changed : and /
 in these names to . although the / should have worked it did cause an
 error of some sort.

 The conversion was verified by checking out each of the 276 branches and 126
 tags from both git and mtn and comparing the resulting workspaces. The
 script I used to do this verification was a bit dumb and failed to checkout
 a few revisions so these weren't compared. Using only the branch name failed
 in some cases because there were multiple heads and using only a tag name
 failed in some cases because the tagged revisions had no branch certs. All
 of the branches and tags that did checkout were identical according to diff
 -qr so I'm reasonably confident that the new exporter basically works.

I have a ruby script (mtn2git) that I'm pretty confident generates an
exact clone, the problem is that it's *very* slow.

I could probably compare the output of mtn2git with your tool but it
would probably take more than one entire day to generate the repo.

 I suspect that the various other git fast-import conversion scripts that
 exist for monotone are probably slower and less robust than this
 implementation (unless they work similarly from rosters) which uses the
 monotone internals to do the work. I spent a bit of time initially trying to
 export revisions using the revision data structures but this didn't work
 very well. Git only deals with files and trying to order a mix of renames of
 directories and files from monotone correctly from revisions was difficult.
 Ultimately I didn't use the revision data structures at all but built up a
 similar files-only based revision representation by comparing rosters. Much
 like what is done for make_cset, but ignoring directories and producing only
 file deletions, renames and additions. This works much better, correctly
 handles pivot_root and a few other odd things that working with revisions
 proved difficult.

Working with the roster is extremely slow. Right now your tool is
taking about 6 seconds per commit, that's too slow.

I agree that working with revisions is very error-prone, but it's the
only decent approach if you want something fast.

I think the best way to do this would be with revisions, and careful
comparisons with other more robust approaches, until all the issues
are tracked down.

Cheers.

-- 
Felipe Contreras




Re: [Monotone-devel] git fast-export

2009-01-05 Thread Derek Scherger
On Mon, Jan 5, 2009 at 2:22 PM, Felipe Contreras felipe.contre...@gmail.com
 wrote:


 Why an extra master branch? There's no need for that branch.


It's used for revs that have no other branch certs to use.

 Working with the roster is extremely slow. Right now your tool is
 taking about 6 seconds per commit, that's too slow.


How many files are in a pidgin checkout? I was getting about 3 revs per
second on the monotone database on my laptop.
How fast is your machine, what does hdparm report, how much ram do you have,
etc.?

Cheers,
Derek


Re: [Monotone-devel] git fast-export

2009-01-05 Thread Felipe Contreras
On Tue, Jan 6, 2009 at 1:42 AM, Derek Scherger de...@echologic.com wrote:
 On Mon, Jan 5, 2009 at 2:22 PM, Felipe Contreras
 felipe.contre...@gmail.com wrote:

 Why an extra master branch? There's no need for that branch.

 It's used for revs that have no other branch certs to use.

Would it make sense to use another name? nil, unknown or something?

 Working with the roster is extremely slow. Right now your tool is
 taking about 6 seconds per commit, that's too slow.

 How many files are in a pidgin checkout? I was getting about 3 revs per
 second on the monotone database on my laptop.
 How fast is your machine, what does hdparm report, how much ram do you have,
 etc.?

It depends on which point in the history, but at that point I guess about
1700 files.

I have a centrino duo 1.83ghz, 2 GB of ram.

hdparm -t shows:

 Timing buffered disk reads:  116 MB in  3.08 seconds =  37.63 MB/sec

I don't know if that's what you are looking for.

Using revisions I've been able to convert the 27000 commits to git in
about 2 hours, but it doesn't work properly, I need to rewrite my
script.

-- 
Felipe Contreras




[Monotone-devel] git fast-export

2009-01-04 Thread Derek Scherger
I've spent a bit of holiday hacking time working on a git_export command for
monotone, more as an experiment than anything else. I've committed the
result to net.venge.monotone.fast-export for people to have a look at.
There's probably not much preventing this from landing on mainline, other
than some documentation and possibly tests. Although I'm not really sure how
we would want to go about testing it beyond what I've already done. The fun
part about a command like this is that I expect most users of it would have
some expectation of being their own testers in terms of verifying their
conversions and such.

This successfully (I think) converts the entire monotone database with 276
branches (more or less what you get when you pull '*' from monotone.ca) to a
git repository. Here's some details on the conversion:

exported monotone database
- 174MB in size
- 276 branches
- 127 tags (with one duplicate name monotone-viz-1.0.1-1)
- export time 83m42.134s (on a 2.0GHz pentium-m laptop)
- export file size 2.9GB
- 15245 revisions exported

imported git repository
- 719MB in size (before being repacked)
- import time 23m15.463s
- repack -adf time 3m14.385s
- packed repository size 60MB
- 277 branches (the extra one is master)
- 126 tags (missing the duplicate above)

Three exported branch names net.prjek:tester,
net.prjet:tester/drop-for-propagate and prjek.net:tester were changed
(with sed) during the import process because git does not allow colons (and
various other characters) in branch/ref names. I simply changed ':' and '/'
in these names to '.'; although the '/' should have worked, it did cause an
error of some sort.

The conversion was verified by checking out each of the 276 branches and 126
tags from both git and mtn and comparing the resulting workspaces. The
script I used to do this verification was a bit dumb and failed to check out
a few revisions, so these weren't compared. Using only the branch name failed
in some cases because there were multiple heads and using only a tag name
failed in some cases because the tagged revisions had no branch certs. All
of the branches and tags that did checkout were identical according to diff
-qr so I'm reasonably confident that the new exporter basically works.
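
A check along these lines can be sketched as follows. This is not the verification script mentioned above, just an assumed equivalent of a content-only diff -qr between two checkouts; the function names are invented, and skipping the .git and _MTN bookkeeping directories is an assumption about what one would want to ignore.

```python
import filecmp
import os

# Bookkeeping directories of each tool, skipped so only versioned files count.
IGNORED_DIRS = {".git", "_MTN"}


def list_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for filename in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, filename), root))
    return found


def compare_trees(left, right):
    """Report files present on one side only and files whose bytes differ."""
    left_files = list_files(left)
    right_files = list_files(right)
    differing = sorted(
        rel
        for rel in left_files & right_files
        if not filecmp.cmp(
            os.path.join(left, rel), os.path.join(right, rel), shallow=False
        )
    )
    return {
        "only_left": sorted(left_files - right_files),
        "only_right": sorted(right_files - left_files),
        "differing": differing,
    }
```

Like diff -qr, this compares file contents only, so differences in permission bits between the two checkouts would not show up.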

I suspect that the various other git fast-import conversion scripts that
exist for monotone are probably slower and less robust than this
implementation (unless they work similarly from rosters) which uses the
monotone internals to do the work. I spent a bit of time initially trying to
export revisions using the revision data structures but this didn't work
very well. Git only deals with files and trying to order a mix of renames of
directories and files from monotone correctly from revisions was difficult.
Ultimately I didn't use the revision data structures at all but built up a
similar files-only based revision representation by comparing rosters. Much
like what is done for make_cset, but ignoring directories and producing only
file deletions, renames and additions. This works much better and correctly
handles pivot_root and a few other odd things that proved difficult when
working with revisions.
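
To make the roster comparison concrete, here is a deliberately simplified sketch. It models a roster as a mapping from node id to (path, content id) for files only, which is an assumption made for illustration; real monotone rosters carry directories, attributes and more, and the names FileNode and files_only_changes are invented. Content changes are reported alongside additions as "writes", which is also a choice of this sketch.

```python
from typing import Dict, List, NamedTuple, Tuple


class FileNode(NamedTuple):
    path: str         # full path of the file in this revision
    content_id: str   # hash of the file contents


def files_only_changes(
    old: Dict[int, FileNode], new: Dict[int, FileNode]
) -> Tuple[List[str], List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Compare two file-only rosters and return (deletes, renames, writes)."""
    # A node that disappeared is a deletion.
    deletes = sorted(old[n].path for n in old.keys() - new.keys())
    # A node whose path changed is a rename, whether the file itself moved
    # or one of its parent directories did.
    renames = sorted(
        (old[n].path, new[n].path)
        for n in old.keys() & new.keys()
        if old[n].path != new[n].path
    )
    # New nodes and nodes with changed content must be (re)written.
    writes = sorted(
        (new[n].path, new[n].content_id)
        for n in new
        if n not in old or old[n].content_id != new[n].content_id
    )
    return deletes, renames, writes
```

Because directories never appear in this model, a directory rename simply shows up as one file rename per file beneath it, which is the files-only view that git expects.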

This exporter does not (yet) handle all rename ordering issues that are
possible. For example, "rename a b" followed by "rename b c" will probably
fail on import unless it is executed as "rename b c" followed by "rename a
b". Similarly, "rename a b" followed by "rename b a", which is indeed
possible, will probably fail on import and requires the introduction of a
third temporary file. These problems can be fixed in the exporter and can
also be fixed in the exported data by re-ordering renames as required.
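
One way such re-ordering could work is sketched below. It treats the renames of a revision as a simultaneous set (source -> destination) and turns them into a sequence that is safe to apply one at a time, parking a file under a temporary name when only a cycle is left. This is an illustration of the idea, not code from the exporter; the function name and the temporary-name prefix are invented, and the prefix is assumed not to clash with a real path.

```python
def sequence_renames(renames, temp_prefix="__rename_tmp__"):
    """Order simultaneous renames so they can be applied one by one.

    `renames` maps source path -> destination path. Returns a list of
    (source, destination) pairs.
    """
    pending = dict(renames)
    ordered = []
    temp_count = 0
    while pending:
        # A rename is safe once nothing still has to move away from its
        # destination.
        ready = sorted(src for src, dst in pending.items() if dst not in pending)
        if ready:
            for src in ready:
                ordered.append((src, pending.pop(src)))
            continue
        # Only cycles remain (for example a -> b and b -> a): break one by
        # moving a file to a temporary name and finishing that move later.
        src = min(pending)
        dst = pending.pop(src)
        tmp = f"{temp_prefix}{temp_count}"
        temp_count += 1
        ordered.append((src, tmp))
        pending[tmp] = dst
    return ordered


print(sequence_renames({"a": "b", "b": "c"}))
print(sequence_renames({"a": "b", "b": "a"}))
```

For a plain chain this yields the reversed order described above ("rename b c" before "rename a b"); for a swap it inserts the third, temporary name.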

WARNING: Please don't bet your life on this implementation! If you do use it
to convert a repository you must do careful verification of the converted
results. WORKSFORME is the only assurance I can make.

This feels a bit like throwing in the proverbial towel and I hope this
doesn't elicit any ill-will from the current monotone crowd. I'm not really
planning on converting my personal stuff from monotone any time soon but
knowing it can be done without losing information is nice. I'm still happy
to contribute to monotone but with 2 small kids my free/hacking time is
pretty limited.

Cheers,
Derek