On Thu, Aug 24, 2006 at 03:24:18PM +0200, Markus Schiltknecht wrote:
> Nathaniel Smith wrote:
> >My memory of the discussion before is not that it was rejected for not
> >being like cvs2svn. Just, if you're making up your own algorithm,
> >we'd like to see a description and justification of it so we have a
> >chance to apply some of the collective brain power here to making sure
> >it makes sense. Because, well, I doubt _anyone_ is smart enough to
> >invent a complete and correct CVS reconstruction algorithm without
> >some help noticing where they forgot nasty edge cases :-). (Certainly
> >I'm not.)
>
> Maybe. However, I don't feel like making up my own algorithm for that. I
> just thought maybe this change I did could already be sufficient. But I
> know now it's not. So I will try to do something closer to what cvs2svn
> does.

Okay.

> >And, to make the process a little easier, cvs2svn is a very good place
> >to look, because they've done a lot of that work to find all the
> >approaches that _don't_ work already, so hopefully we could piggyback
> >on that.
>
> ..yeah, I have already included their design-notes.txt into the
> repository (uh... is that license compatible at all?) and added my own
> comments about how I did it for mtn cvs_import.

Hmm... it looks like SVN has an advertising-like clause in its license
(WTF?), plus a "you may not use the word 'tigris' in your project
name" requirement that goes way beyond trademark law, so no, it's
probably not GPL-compatible. This isn't a huge problem, because it's
not like we're going to compile design-notes.txt and link it with the
rest of our code anyway :-). (Though don't copy/paste from it into
source code comments.)

It would probably still be better not to redistribute it in the
monotone distribution -- perhaps by the time this lands on mainline,
we could just provide a link, along with our own description of what
we do? (We probably want one of those anyway, that isn't spending all
its time talking about what formats intermediate data files are
written out in, and how the 'sort' program is called...)

Of course, if it's useful to mark up a copy on the branch, feel free
-- there's definitely no need to write up a whole document on how
things work, before we have even figured out how things work :-).

> >I am a bit curious about this sql table for tracking them, though;
> >it doesn't make a whole lot of sense to me at first glance. There's
> >some question about storing it on disk in the first place --
>
> We need to store some information on disk to help speed up later
> resyncs.

Ah, I see, this is for incremental re-imports -- I missed that part.
You may want to decide to either get incremental imports working
first, or get branch reconstruction working first, and start out
concentrating on just one of them.

cvs2svn doesn't write things to disk so it can support incremental
re-imports. (IIUC, it doesn't support them at all.) It writes them
to disk so, if an import is interrupted (like by a power failure or
something), you can restart it. That's what I assumed you were
trying to achieve by writing this thing to disk. (And this is the
goal that doesn't seem very important to me, at least at this stage.)

> I'm not sure if it's this RCS version <-> file_id mapping which
> helps most. Of course as it's a separate table (as is) a resync could
> only happen on the database which also did the very first import.

I'm not sure either. Again, Christof is probably the one to talk to;
my understanding is that he has a scheme for storing this information
in monotone certs, and though this is really ugly, no-one's managed to
come up with a better scheme after many months of trying.

> >everything else cvs_import does is in-memory, which might not be
> >ideal, but it hasn't seemed to cause any problems yet, and fixing it
> >will take more than moving one single data structure onto disk.
>
> Why not? What more does it take? Do you want to have such information
> netsynced to other repositories?

If the goal is to be able to incrementally re-import, netsyncing with
other repositories would definitely be handy :-). This is one of the
reasons that Christof's cvssync works the way it does.

However, I just meant that _if_ you wanted to be able to resume a
failed cvs_import (and apparently you don't), you would have to move
the existing data structures we use to disk, not just this one new
one.

> >More
> >than that, though, it seems unlikely that a file_id<->rcs number
> >mapping is what you're actually looking for?
>
> Like I said, I don't know.
>
> >Recall that a file_id simply identifies a bitstring -- it does not
> >correspond uniquely to any particular "file" in any particular
> >revision.
> >In fact, a given revision may contain many files, that all
> >have the same file_id (because they happen to have the same content).
>
> Aha. And from a file_id you cannot get the filename, then? So this
> should better be called 'stream_id'?

Yeah. The name is definitely a bit confusing. (It's _slightly_ more
meaningful than something like 'stream_id', because it does
specifically state that the bitstring in question is being used as
file data, rather than, say, manifest data -- maybe the really correct
name would be id_for_a_stream_that_is_used_as_a_file, or something
like that.)

> >Similarly, a rcs number is not useful on its own; every rcs file has
> >some revision numbered 1.1, for instance... unless we're somehow
> >mashing the rcs filename and the rcs version number together into a
> >single string in this table, I don't see how it can be useful?
>
> Yeah, we probably need the filename, too.
>
> Like I said, it's just my 'scratch pad' thing. And it 'works' - at least
> so far as it does write out the (to some extent useless) RCS -> file_id
> mapping.

Sure. I don't mean to stop your playing around -- I generally go
through all sorts of horrible, stupid designs in order to discover one
that's actually usable :-). I just realized that there are some traps
here (like what a 'file_id' actually means), so I figured I might be
able to save you a bit of time by pointing them out now, instead of
letting you discover them on your own :-).

-- Nathaniel

-- 
In mathematics, it's not enough to read the words
you have to hear the music
_______________________________________________
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel
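
To make the file_id trap discussed above a bit more concrete, here is a
rough Python sketch. It assumes a file_id is nothing more than the SHA-1
of the file's bytes; the names `file_id` and `RcsKey`, the paths, and
the file contents are all made up for illustration and are not
monotone's actual code or table layout. It only shows why a bare RCS
revision number cannot serve as a key, and why one file_id can belong
to several files.

```python
import hashlib
from typing import Dict, List, NamedTuple


def file_id(content: bytes) -> str:
    """Stand-in for a file_id (assumption: SHA-1 hex digest of the bytes)."""
    return hashlib.sha1(content).hexdigest()


class RcsKey(NamedTuple):
    """Hypothetical key: a revision number only has meaning per RCS file."""
    rcs_path: str
    rcs_version: str


# Two different RCS files that both have a revision 1.1 with identical content.
imported: Dict[RcsKey, str] = {
    RcsKey("src/a.c,v", "1.1"): file_id(b"int x;\n"),
    RcsKey("src/b.c,v", "1.1"): file_id(b"int x;\n"),
    RcsKey("src/a.c,v", "1.2"): file_id(b"int x, y;\n"),
}

# Keyed by revision number alone, the two 1.1 entries collapse into one.
by_version_only = {key.rcs_version: fid for key, fid in imported.items()}

# In the other direction, one file_id maps back to several (file, revision) pairs.
owners: Dict[str, List[RcsKey]] = {}
for key, fid in imported.items():
    owners.setdefault(fid, []).append(key)
```

With the (path, version) pair as the key, all three entries survive;
`by_version_only` keeps only two, and `owners` shows that looking up a
file_id cannot tell you which filename it came from.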