Hi,

First things first, happy new year to all!
Having recently felt the pain of using subversion merge, I was wondering how people feel about moving away from subversion to a better system, a la mercurial or bzr (I will talk about bzr because that's the one I know best, but this discussion is really about using something better than subversion, not so much about bzr). I think this could be an important step forward, and it is somewhat related to the discussions on scikits and co.

As some of you are certainly aware, there has been a recent trend towards so-called Distributed Version Control Systems (DVCS). I won't go into the details, because they vary from system to system, and I am in no position to explain the technical details. But for people who are wondering, here is a small description of DVCS, and why I think this can be a significant step forward for numpy/scipy. You can skip it if you already know about them.

What is a DVCS
==============

A DVCS, contrary to centralized systems like CVS or SVN, has no technical concept of a central repository which everybody pulls changes from and pushes changes to. Instead, a DVCS is centered around the concept of a branch, which contains a local copy of the history. As a consequence:

  1 you can do most traditional svn/cvs operations locally, disconnected from the network (getting the log, getting annotations, committing, branching, merging from other branches).
  2 because the branch is local, no rights are needed: anybody can jump in and commit to a local branch. Of course, integration into an official numpy branch would need some special approval.

This also has the following consequence: since branching/merging is such a key point of DVCS, merging actually works with DVCS. In particular, merging the same changes several times works, and you certainly do not have to do the whole svn madness of tracking versions.

For more information, here are some links which go much deeper:

  - some discussion from Keith Packard, the maintainer of X.org: http://keithp.com/blogs/Tyrannical_SCM_selection/, http://keithp.com/blogs/Repository_Formats_Matter/
  - Linus Torvalds on the advantages of git, the DVCS he wrote for linux development, versus svn for kde (long, but it makes all the points really clearly): http://lists.kde.org/?l=kde-core-devel&m=118764715705846&w=2

Why use a DVCS?
===============

Some people argue that DVCS are intrinsically more complicated, which is something I really don't understand. I've been programming 'seriously' for only about 2-3 years, and I find bzr much easier to use and set up than subversion; the key point, I think, is that I started using DVCS before centralized ones. Some things which are utterly complicated with subversion are trivial with bzr: merging, and going back in history (say you are at rev 150, you realize that everything since rev 140 is rubbish, and you want to go back: this is extremely tedious to do with subversion). Basically, most of the things for which we use a VCS in the first place are easier with a DVCS than with a centralized VCS (at least as far as svn is concerned). Also:

  - For a casual user who wants to use the latest development version instead of a release, getting it from a bzr, git, mercurial or svn repository is extremely similar. It is one step in all cases.
  - For casual developers: being able to use branches means that they can implement their new features in a change-set oriented way instead of as one big patch. Also, bzr enables things like uncommit if you made a mistake and want to go back. More generally, going back in history is much easier.
  - For core developers: I personally find the ability to use a branch for each new feature extremely useful. It makes me feel much safer when I do something. I am not afraid of doing something totally stupid which may end up screwing other people.
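To make the earlier point about merging the same changes several times a bit more concrete, here is a toy Python sketch. It is only an illustration under my own assumptions, not how bzr or any real DVCS is implemented: the Branch class, its methods and the revision ids are all made up. The idea it shows is simply that a branch which carries its own history can tell which revisions it already has, so a repeated merge only brings in what is new.

```python
class Branch:
    """Toy model of a DVCS branch (hypothetical, for illustration only)."""

    def __init__(self, name, revisions=()):
        self.name = name
        # Each branch keeps its own full, ordered copy of the history.
        self.revisions = list(revisions)

    def commit(self, rev_id):
        # Committing is purely local: no server and no access rights involved.
        self.revisions.append(rev_id)

    def branch(self, name):
        # A new branch starts out as a copy of the whole history.
        return Branch(name, self.revisions)

    def merge(self, other):
        # Only bring in revisions this branch does not already know about,
        # so merging the same branch a second time is harmless.
        known = set(self.revisions)
        new = [r for r in other.revisions if r not in known]
        self.revisions.extend(new)
        return new


trunk = Branch("trunk", ["r1", "r2"])
feature = trunk.branch("feature")
feature.commit("f1")
feature.commit("f2")

first = trunk.merge(feature)   # brings in f1 and f2
second = trunk.merge(feature)  # nothing left to merge
feature.commit("f3")
third = trunk.merge(feature)   # only f3 is new

print(first, second, third)
```

A real system tracks a graph of revisions and has to deal with conflicting content, which this sketch ignores entirely; it only shows why no manual bookkeeping of "which revisions did I already merge" is needed.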
And finally, I find the ability to do things locally really pleasant, and it enables workflows that are not really possible with systems such as SVN. In particular, I work at three distant places every week, and the ability to work while in transit, along with the trivial synchronization between computers, is definitely helpful. Instantaneous log and annotations are also really useful IMHO.

Which DVCS?
===========

The three which keep coming up are:

  - git (the one used for linux kernel development). That's the one I know the least (only from a user point of view, never used it for development). It is supposed to be more powerful and more complicated than the others. It is also known to be really fast (the kernel is not a small codebase for sure).
  - mercurial: started at the same time as git. It is written in python except for a few things written in C. It is reasonably fast, and has recently been selected for some big projects, in particular by Sun (openJDK, openSolaris, open Netbeans).
  - bzr: also written in python. Sponsored by Canonical, the company behind Ubuntu. It has just reached version 1.0. The focus is on the UI; it handles renaming really well. It has a vibrant community, with dedicated developers working on it; it has the reputation of being slow, which was somewhat true previously, but in my experience it is on par with mercurial, at least for local operations. Anyway, speed is not a problem for numpy or scipy, which are small codebases (a few thousand files, a few thousand revisions).

Problems:
=========

Assuming people think it is worth trying out, I mainly see two problems:

  - importing the current history
  - integration with trac

For bzr, I can say that the bzr-svn plugin works really well; in particular, it can import the numpy and scipy repositories with the whole history. I am using it regularly as a proxy between local bzr and the scipy and scikits trunk.
Incidentally, this makes it possible for me to give numbers, if numbers are needed, wrt bzr's speed, repository size, etc. For mercurial, I tried one method once which did not get very far, but I did not try really hard; anyway, I think people at enthought use mercurial a lot, so they would know better.

Integration with trac is the real problem, I think. According to one bzr developer, the trac model (0.10, the last released version) is really based around the subversion notion of a repository, which does not fit well with mercurial and bzr. I don't know if this is true for the not-yet-released 0.11. If bzr is considered a possible candidate, I can get more information from the bzr developers. What is the experience of the enthought developers wrt trac?

This email is already getting pretty long, so to conclude: I think a DVCS would be helpful for the future development of numpy/scipy. I believe it would both make participation easier for different people and enable safer development schedules, etc. What do other people think? Would it be worthwhile to discuss the issues further, and how to resolve them?

cheers,

David

P.S: I would be willing to take care of the bzr side of things: trying the conversion, setting up experimental repositories for trial, and asking the bzr community for advice.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion