Re: [HACKERS] PostgreSQL Developer meeting minutes up
2009/6/7 Tom Lane t...@sss.pgh.pa.us:
> So there are a lot of good reasons to work backwards in patching. I don't believe that these would be outweighed by some advantage in the mechanics of applying an unchanging patch to multiple branches (especially since AFAICT the mechanical advantage would be pretty darn minimal anyhow).

As another data point, the stable branches of the Linux kernel are actually maintained this way. There is a policy that any patch for the stable branches must already have been included (in some form) in HEAD. There is no merging going on. They aren't even using git cherry-pick, but that's because all backpatching goes into a review list rather than happening immediately. The multiple branches and the merging going on in the Linux kernel are all about development of new features, not fixing of bugs.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Mark Mielke m...@mark.mielke.cc:
> I am a theory person - I run things in my head. To me, the concept of having more context to make the right decision, and an algorithm that takes advantage of this context to make the right decision, is simple and compelling on its own. Knowing the algorithms that are in use, including how it selects the most recent common ancestor gives me confidence.

That makes me wonder why you are speaking against merges, where there are common ancestors. I'd argue that in theory (and generally) a merge yields better results than cherry-picking (where there is no common ancestor, thus less information). Especially for back-branches, where there obviously is a common ancestor.

> No amount of discussions where others say "it works great" and you say "I don't believe you until you provide me with output" is going to get anywhere.

Well, I guess it can be frustrating for both sides. However, I think these discussions are worthwhile (and necessary) nonetheless. As not even those who highly appreciate merge algorithms (you and me, for example) are in agreement on how to use them (cherry-picking vs. merging), it doesn't surprise me that others are generally skeptical.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Robert Haas wrote:
> On Fri, Jun 5, 2009 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> ... but I'm not at all excited about cluttering the long-term project history with a zillion micro-commits.
> One of the things I find most annoying about reviewing the current commit history is that Bruce has taken a micro-commit approach to managing the TODO list --- I was seldom so happy as the day that disappeared from CVS, because of the ensuing reduction in noise level.

For better or worse, git also includes a command git-rebase that can collapse such micro-commits into a larger one. Quoting the git-rebase man page:

    A range of commits could also be removed with rebase. If we have the following situation:

        E---F---G---H---I---J  topicA

    then the command

        git-rebase --onto topicA~5 topicA~3 topicA

    would result in the removal of commits F and G:

        E---H'---I'---J'  topicA

While I wouldn't recommend using this for historical revisionism, I imagine it could be useful during code-review time when the micro-commits (from both the patch submitter and patch reviewer) are interesting. After the review, the commits could be collapsed into meaningful-sized chunks just before they're merged into the official branches.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Nicolas Barbier nicolas.barb...@gmail.com:
> If I understand correctly, nearby variable renaming refers to changes to the few lines surrounding the changes-to-be-merged.

Hm.. I took that to mean changes on the same line. I now realize that this interpretation was overly strict.

> There is certainly supposed to be an advantage relative to diff/patch here: as all changes leading to both versions are known (up to some common ancestor), git doesn't need context lines to recognize the position in the file that is supposed to receive the updates.

Yes, that's how I understand it as well. Your example seems fine (except that it does not make much sense to merge with an ancestor). I'm not sure if git also works line by line (as does monotone). However, IIRC kdiff3 uses some finer grained comparison, so it can even merge unrelated changes on the same line, e.g.:

    ancestor: aaa bbb
    left:     axa bbb   (modified a -> x)
    right:    aaa byb   (modified b -> y)
    merge:    axa byb   (contains both modifications)

Regards

Markus Wanner
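The same-line example can be reproduced with a token-level three-way merge. The following is a minimal sketch, assuming whitespace-separated tokens that are only edited in place (no insertions or deletions); it is not a description of how kdiff3 actually works, and `merge_tokens` is a hypothetical name.

```python
def merge_tokens(ancestor, left, right):
    """Three-way merge of one line, token by token.

    Assumes both sides only edit tokens in place, so all three versions
    have the same number of whitespace-separated tokens.
    """
    a, l, r = ancestor.split(), left.split(), right.split()
    if not (len(a) == len(l) == len(r)):
        raise ValueError("sketch only handles in-place token edits")
    merged = []
    for base, mine, yours in zip(a, l, r):
        if mine == base:                      # only the right side changed (or nobody)
            merged.append(yours)
        elif yours == base or mine == yours:  # only the left side changed, or both agree
            merged.append(mine)
        else:                                 # both sides changed the token differently
            raise ValueError(f"conflict on token {base!r}")
    return " ".join(merged)


print(merge_tokens("aaa bbb", "axa bbb", "aaa byb"))  # axa byb
```

A purely line-based tool sees the single line modified on both sides and has to report a conflict; comparing at a finer granularity is what lets the two edits combine.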
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Markus Wanner wrote:
> Quoting Mark Mielke m...@mark.mielke.cc:
>> I am a theory person - I run things in my head. To me, the concept of having more context to make the right decision, and an algorithm that takes advantage of this context to make the right decision, is simple and compelling on its own. Knowing the algorithms that are in use, including how it selects the most recent common ancestor gives me confidence.
> That makes me wonder why you are speaking against merges, where there are common ancestors. I'd argue that in theory (and generally) a merge yields better results than cherry-picking (where there is no common ancestor, thus less information). Especially for back-branches, where there obviously is a common ancestor.

Nope - definitely not speaking against merges. Automatic merges = best. Automatic cherry picking = second best if the work flow doesn't allow for merges. Doing things by hand = bad but sometimes necessary. Automatic merges or automatic cherry picking with some manual tweaking (hopefully possible from kdiff3) = necessary at times but still better than doing things by hand completely. I think you and I are in agreement. (Even Tom and I are in agreement on many things - I just didn't respond to his well thought out great posts, like the one that describes why back patching is often better than forward patching when having multiple parallel releases open at the same time.)

>> No amount of discussions where others say "it works great" and you say "I don't believe you until you provide me with output" is going to get anywhere.
> Well, I guess it can be frustrating for both sides. However, I think these discussions are worthwhile (and necessary) nonetheless. As not even those who highly appreciate merge algorithms (you and me, for example) are in agreement on how to use them (cherry-picking vs. merging), it doesn't surprise me that others are generally skeptical.

We're in agreement on the merge algorithms I think. :-) That said, it is a large domain, and there is room for disagreement even between those with experience, and you are right that it shouldn't be surprising that others are generally sceptical.

Cheers,
mark

-- Mark Mielke m...@mielke.cc
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Nicolas Barbier nicolas.barb...@gmail.com:
> ISTM that back-patching

I take this to mean back-patching by cherry-picking.

> a change to a file that wasn't modified on the back-branch leads exactly to merging a change to a (file-wise) ancestor?

Regarding the file's contents - and therefore the immediately visible result - that's correct. However, for a merge, the two ancestor revisions are stored, whereas with cherry-picking this information is lost (at least for git). So, trying to merge on top of a cherry-pick, git must merge these changes again (which might or might not work). Merging on top of merging works just fine.

Regards

Markus Wanner
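The difference in recorded ancestry can be illustrated with a toy commit graph. This is a sketch under assumptions: commits are plain strings, `parents` maps each commit to its recorded parents, and the commit names (`root`, `fix`, `back`, `merged`, `picked`) are invented for the example. A merge commit records two parents; a cherry-picked commit records only its parent on the back branch.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit` via parent links, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


parents = {
    "root": [],
    "fix": ["root"],            # fix committed on the development branch
    "back": ["root"],           # unrelated commit on the back branch
    "merged": ["back", "fix"],  # back branch after merging the fix
    "picked": ["back"],         # back branch after cherry-picking the fix
}

print(merge_bases(parents, "merged", "fix"))  # {'fix'}
print(merge_bases(parents, "picked", "fix"))  # {'root'}
```

After a merge, the fix itself becomes the common ancestor, so a later merge has nothing left to reconcile for it. After a cherry-pick the common ancestor stays at `root`, and a later merge has to work out again that both sides contain the same change.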
Re: [HACKERS] PostgreSQL Developer meeting minutes up
2009/6/7 Markus Wanner mar...@bluegap.ch:
> However, there's no special whitespace treatment. Nor anything remotely as clever as nearby variable renaming. There's no such magic, the developer still needs to tell the tool what he wants.

If I understand correctly, nearby variable renaming refers to changes to the few lines surrounding the changes-to-be-merged. There is certainly supposed to be an advantage relative to diff/patch here: as all changes leading to both versions are known (up to some common ancestor), git doesn't need context lines to recognize the position in the file that is supposed to receive the updates.

Example:

Original file:

    a
    b
    c

Random other changes later (a and c are updated to incorporate nearby variable renaming or somesuch):

    extra line
    a'
    b
    c'

(Note that the extra line is important, because if the line numbers stay the same and the lines-to-update are exactly the same, patch could just ignore the context lines.)

An update to line b yields:

    extra line
    a'
    b'
    c'

This change would not be diff/patch-mergeable to the original file, because the context lines a' and c' wouldn't be found. Git is smarter than this and doesn't need the context lines; rather it uses the full history to determine that the change to line 3 becomes a change to line 2 in the original file. It therefore merges this change to yield:

    a
    b'
    c

Disclaimer: I don't use git, but I assume that this is how all systems that are smarter than diff/patch work.

Nicolas
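The diff/patch half of this example can be sketched in a few lines. The helper below is hypothetical and deliberately strict: it applies a hunk only where the context lines match exactly. Real patch tools can relax this (GNU patch has a fuzz option), which the sketch ignores.

```python
def apply_hunk(lines, before, old, new, after):
    """Replace `old` by `new` where it occurs with exactly this context.

    Returns the patched list, or None when the context is not found.
    """
    pattern = before + old + after
    for i in range(len(lines) - len(pattern) + 1):
        if lines[i:i + len(pattern)] == pattern:
            start = i + len(before)
            return lines[:start] + new + lines[start + len(old):]
    return None


later = ["extra line", "a'", "b", "c'"]
original = ["a", "b", "c"]

# The hunk as a context diff would record it: b -> b' between a' and c'.
print(apply_hunk(later, ["a'"], ["b"], ["b'"], ["c'"]))     # applies to the later file
print(apply_hunk(original, ["a'"], ["b"], ["b'"], ["c'"]))  # None: context not found
# Without context lines, the same edit lands on line 2 of the original file.
print(apply_hunk(original, [], ["b"], ["b'"], []))          # ['a', "b'", 'c']
```

The last call shows the result the example asks for. Whether a given version control system reaches it automatically depends on its merge algorithm; the sketch only shows why exact context matching fails here.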
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Tom Lane wrote: I think it's already been made crystal clear that the people who actually do this work don't do it that way, and are uninterested in allowing their tools to force them to do it that way. That's well understood. Patching from HEAD back works better for us for a number of reasons, the main one being that HEAD is the version of the code that's most swapped into our awareness. Committing on the oldest back-branch first doesn't necessarily mean having to develop the patch there. However, so long as we can have a separate working copy per branch, I see no problem with preparing all the versions of a patch and then committing them back-to-front. That's what I think as well. However, I bet git could help a lot with creating all the versions of a patch in the first place. You don't *need* to use that feature, but preserving the option could help. What I'm not clear about is the mechanics for doing that. If you create each of the patches individually, there's not much magic required from git. It should be trivial to commit those as merges. Would someone explain exactly what the steps should be to produce the nicest-looking git history? I fear the cherry-picking approach creates the nicest-looking history (especially to the CVS trained eye). Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Andrew Dunstan wrote: Yeah, a requirement to work from the back branch forward is quite unacceptable IMNSHO. It's also quite unreasonable. The monotone page about daggy fixes does quite a good job in explaining why it is helpful. I think it's how to make best use of these tools. And it's obviously not the same as what worked well in practice with CVS. Out of interest, and not necessarily related to Postgres: why do you think it's unreasonable? Fixing the problem where it was introduced sounds like the most reasonable place to fix it, IMO. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Markus Wanner mar...@bluegap.ch writes:
> Out of interest, and not necessarily related to Postgres: why do you think it's unreasonable? Fixing the problem where it was introduced sounds like the most reasonable place to fix it, IMO.

There are a number of possible reasons, but here are a few that hold for me:

* I always prefer to isolate a bug in HEAD if possible. It's the version of the code that's most familiar at the moment, and there are often new features available that make it easier to test a problem. So that generally leads to formulating the fix in terms of the HEAD code first. After that you start to think about whether (some form of) the bug exists in back branches and how to fix those branches.

* Experience has shown that later branches tend to have more places affected by an issue than older ones; eg you might need to touch four places to fix a bug now, but only three of those places exist in the older branches. ISTM you'd be far more likely to miss fixing the fourth place if you do your initial investigation and fixing/testing in the oldest affected branch.

* We want HEAD to have the cleanest, most maintainable version of the fix. It's not infrequently the case that the most natural way of fixing a problem varies across branches --- for instance, there might be a helpful subroutine available in later branches. If you design the fix in terms of what works in the oldest branch that has the problem, you're more likely to come up with something that's suboptimal for later branches. For instance in the helpful-subroutine case, I'd be more likely to decide to back-port the subroutine along with the fix if I work from HEAD back than if I try to work the other way.

* We are often willing to adopt a fairly invasive fix for HEAD, if that's what's needed to have a clean maintainable solution, and then look for a less invasive but klugy solution for the back branches. Approaching it the other way around would strongly encourage use of the kluge solution as a permanent fix.

So there are a lot of good reasons to work backwards in patching. I don't believe that these would be outweighed by some advantage in the mechanics of applying an unchanging patch to multiple branches (especially since AFAICT the mechanical advantage would be pretty darn minimal anyhow).

regards, tom lane
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Markus Wanner wrote: Hi, Andrew Dunstan wrote: Yeah, a requirement to work from the back branch forward is quite unacceptable IMNSHO. It's also quite unreasonable. The monotone page about daggy fixes does quite a good job in explaining why it is helpful. I think it's how to make best use of these tools. And it's obviously not the same as what worked well in practice with CVS. Out of interest, and not necessarily related to Postgres: why do you think it's unreasonable? Fixing the problem where it was introduced sounds like the most reasonable place to fix it, IMO. Half the trouble with this discussion is that it has not been related enough to how the Postgres project actually works IMNSHO. One fact to keep in mind is that, unlike most other FOSS projects, we keep quite a large number of branches live. If we don't remove one (and so far there is no great reason to that I know of) that number will be seven when we release 8.4. There is a huge benefit from this to the user community. It means that they can deploy Postgres with confidence that they will not have to upgrade for quite a few years. In the corporate world, especially, that is a major issue. I occasionally have clients running 7.4 or even older versions. Anyway, the large number of branches alone means that our patterns are unlikely to match those of other projects. The question we often face in backpatching is not where did it first occur? but how far back should we patch it?. Problems are almost always discovered near the top of the version list, overwhelmingly on the HEAD or most recent stable branches. So the way we work is not to try to develop a fix where the problem first occurred (which might not even be on a supported branch at all) but as high up the list as the problem goes (usually HEAD) and then work out how far down the list to apply the fix. And the notion that a fix of any complexity at all is going to be simply applicable across six or seven branches simply defies our experience. 
It almost never does. Frequently it won't apply cleanly from *any* one branch to another. Even fairly trivial patches can suffer from this: the pretty small plperl fixes I applied yesterday and the day before, required adjustment going from one branch to the previous one in about three out of five back branch cases. Sometimes these adjustments are small, sometimes they are quite large. So the idea that we can just create a fix on say, the 7.4 branch, and then just merge it forward nicely, is just fanciful in most cases, as well as being contrary to our methods of work. Most of this stuff is almost invisible to most of the community. But people like Tom work with it every day. And we want to keep Tom productive, right? ;-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Andrew Dunstan and...@dunslane.net writes: [ most of a good summary omitted ] ... Even fairly trivial patches can suffer from this: the pretty small plperl fixes I applied yesterday and the day before, required adjustment going from one branch to the previous one in about three out of five back branch cases. Sometimes these adjustments are small, sometimes they are quite large. So the idea that we can just create a fix on say, the 7.4 branch, and then just merge it forward nicely, is just fanciful in most cases, as well as being contrary to our methods of work. I have heard it claimed that git is more intelligent than plain diff/patch and could successfully merge patches in cases that currently require manual adjustment of the sort Andrew describes. If that's really true to any significant extent, then it could represent a benefit large enough to persuade us to alter work flows (at least for simple patches that don't require significant rethinking across branches). However, I have yet to see any actual *evidence* in support of this claim. How robust is git about dealing with whitespace changes, nearby variable renamings, and such? Andrew's plperl patches would be an excellent small test case. Anybody want to try them against the experimental git repository and see if git does any better than plain patch? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote:
> I have heard it claimed that git is more intelligent than plain diff/patch and could successfully merge patches in cases that currently require manual adjustment of the sort Andrew describes. If that's really true to any significant extent, then it could represent a benefit large enough to persuade us to alter work flows (at least for simple patches that don't require significant rethinking across branches). However, I have yet to see any actual *evidence* in support of this claim. How robust is git about dealing with whitespace changes, nearby variable renamings, and such? Andrew's plperl patches would be an excellent small test case. Anybody want to try them against the experimental git repository and see if git does any better than plain patch?

Any revision control system should be able to do better than diff/patch as these systems have more information available to them. Normal GIT uses the relatively common 3-way merge based upon the most recent common ancestor algorithm. Assuming there is a most recent common ancestor that isn't file creation, it will have a better chance of doing the right thing.

Systems such as ClearCase have had these capabilities for a long time. The difference with distributed version control systems is that they absolutely must work well, as every user has their own repository, and every repository represents a branch, therefore each user of the system is working on a different branch. The need for reliable merges goes up under a distributed version control system. Not to say GIT is truly best-in-class here, but it definitely has the motivation to be, and the benefit of being, better than diff/patch.

These sorts of tools usually work with another tool such as kdiff3 so that only the conflicts need to be resolved by hand. If you set it up properly, the automatic merges complete on their own, and kdiff3 or similar presents a graphical interface that allows you to identify and resolve the conflicts that require help.
I've used these sorts of tools long enough to completely take them for granted now, and it feels painful to go back to anything more primitive. Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
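The 3-way merge described above can be sketched with Python's standard `difflib`. This is a simplified, line-based illustration under assumptions, not git's or ClearCase's actual algorithm: `base` stands for the most recent common ancestor, and regions between lines that both sides left untouched are merged as whole chunks, so edits that touch neighbouring lines are reported as one conflict. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def _unchanged(base, other):
    """Map base line index -> index in `other` for lines difflib pairs up."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge3(base, ours, theirs):
    """Line-based three-way merge; returns (merged_lines, conflict_count)."""
    in_ours, in_theirs = _unchanged(base, ours), _unchanged(base, theirs)
    # Sync points: base lines untouched on both sides, plus an end sentinel.
    sync = [(i, in_ours[i], in_theirs[i])
            for i in range(len(base)) if i in in_ours and i in in_theirs]
    sync.append((len(base), len(ours), len(theirs)))
    merged, conflicts = [], 0
    b0 = o0 = t0 = 0
    for bi, oi, ti in sync:
        b, o, t = base[b0:bi], ours[o0:oi], theirs[t0:ti]
        if o == b:                  # only theirs changed this region (or nobody)
            merged += t
        elif t == b or o == t:      # only ours changed it, or both made the same edit
            merged += o
        else:                       # both changed it differently
            conflicts += 1
            merged += ["<<<<<<< ours", *o, "=======", *t, ">>>>>>> theirs"]
        if bi < len(base):
            merged.append(base[bi])  # the shared, unchanged line itself
        b0, o0, t0 = bi + 1, oi + 1, ti + 1
    return merged, conflicts


base = ["a", "b", "c", "d", "e"]
ours = ["a", "B", "c", "d", "e"]    # one side edits line 2
theirs = ["a", "b", "c", "d", "E"]  # the other side edits line 5
print(merge3(base, ours, theirs))   # (['a', 'B', 'c', 'd', 'E'], 0)
```

The common ancestor is what lets the merge tell "unchanged on this side" from "changed on this side" for each region. Plain diff/patch, which only sees one pair of versions, does not have that information.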
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Mark Mielke m...@mark.mielke.cc writes: Tom Lane wrote: I have heard it claimed that git is more intelligent than plain diff/patch and could successfully merge patches in cases that currently require manual adjustment of the sort Andrew describes. ... However, I have yet to see any actual *evidence* in support of this claim. Any revision control system should be able to do better than diff/patch as these systems have more information available to them. Normal GIT uses the relatively common 3-way merge based upon the most recent common ancestor algorithm. Assuming there is a most recent common ancestor that isn't file creation, it will have a better chance of doing the right thing. And I still haven't seen any actual evidence. Could we have fewer undocumented assertions and more experimental evidence? Take Andrew's plperl patches and see if git does any better with them than plain patch does. (If it's not successful with that patch, it's pointless to try it on any bigger cases, I fear.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Fri, Jun 5, 2009 at 4:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> However, given that we don't do any real development on the back branches, it might be that trying to be smart about this is a waste of time anyway. Surely only the HEAD version of the patch is going to be something that other developers care about merging with.

For what it's worth that's certainly not true. Any user maintaining a patched version of the source tree for production use will want to merge in any patches for older releases. For example anyone using the CONNECT BY patch with 8.3 will surely want to take any 8.3 patch releases. Of course EDB in particular has to maintain sources based on old patch releases as well as the current branch.

That said, I don't see that this really affects the decision here. These developers will just merge in the patch as it was applied to the back branch anyway.

-- greg
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote: Any revision control system should be able to do better than diff/patch as these systems have more information available to them. Normal GIT uses the relatively common 3-way merge based upon the most recent common ancestor algorithm. Assuming there is a most recent common ancestor that isn't file creation, it will have a better chance of doing the right thing. And I still haven't seen any actual evidence. Could we have fewer undocumented assertions and more experimental evidence? Take Andrew's plperl patches and see if git does any better with them than plain patch does. (If it's not successful with that patch, it's pointless to try it on any bigger cases, I fear.) The plperl stuff is actually a tough case. In 7.4 we didn't have provision for two interpreters, so PERL_SYS_INIT3 is called unconditionally, and we didn't have a Windows port either, so the comment is also different. I guess that in itself illustrates the problems. I also entirely agree with your point about us being more kludgey and less invasive on back branches. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Tom Lane wrote: There are a number of possible reasons, but here are a few that hold for me: Thank you for this very good collection. I'm still wondering about what's the best way to represent this in git (or others). Cherry-picking is arguably the simplest variant. Maybe that can be combined with merging to preserve merge capability. I'll try that... So there are a lot of good reasons to work backwards in patching. Agreed and understood. However, there are good reasons for keeping merge capability between branches intact as well. I still hope we can get both somehow, if not, I'm certainly accepting that backward patching is more important. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Andrew Dunstan wrote: One fact to keep in mind is that, unlike most other FOSS projects, we keep quite a large number of branches live. So far I thought exactly that would be a good reason for migrating to something like git. Those claim to ease working on multiple branches in parallel, and in my experience that works pretty well. I'd like to find a good way to allow the Postgres project to make use of these features to ease development. It means that they can deploy Postgres with confidence that they will not have to upgrade for quite a few years. In the corporate world, especially, that is a major issue. I occasionally have clients running 7.4 or even older versions. I agree and appreciate that very much as well. The question we often face in backpatching is not where did it first occur? but how far back should we patch it?. Uh.. the difference here mostly being *when* the question comes up, right? Because the possible answers in 8.1 or back to 8.1 are pretty close. From what I understand now, you are saying here that you work on the patch and only after that question how far back to apply it. Note that working on the patch doesn't necessarily mean having to commit it on HEAD first. I seem to recall a script which has so far been used for CVS to do the multi-branch commits pretty much at the same time. Is that correct? the pretty small plperl fixes I applied yesterday and the day before, required adjustment going from one branch to the previous one in about three out of five back branch cases. I'll give these a try with one of the touted merge algorithms. I'm curious myself. Sometimes these adjustments are small, sometimes they are quite large. So the idea that we can just create a fix on say, the 7.4 branch, and then just merge it forward nicely, is just fanciful in most cases, as well as being contrary to our methods of work. Well, my experience with the Postgres-R patch has been different. However, that patch is probably not overly invasive. 
Most of this stuff is almost invisible to most of the community. The daily work maybe, yes. But not the end result, which is known as rock-solid. I certainly don't want to change that. ;-) Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Tom Lane wrote: How robust is git about dealing with whitespace changes, nearby variable renamings, and such? Monotone tracks changes line by line. I'm not sure about git. Kdiff3, which is used to do the manual merge, if necessary, uses some finer grained method, AFAIK. However, there's no special whitespace treatment. Nor anything remotely as clever as nearby variable renaming. There's no such magic, the developer still needs to tell the tool what he wants. However, I'd argue that monotone (as well as git) do an incredible job at remembering these decisions and merges, so you never need to do a manual merge twice. (Which I remember doing a lot with diff/patch, quilt or subversion). Andrew's plperl patches would be an excellent small test case. Anybody want to try them against the experimental git repository and see if git does any better than plain patch? I've given that patch a try under monotone (just because I happen to know that a lot better). The results should be the same as with git. I've started with the patch against 7.4 (which I know doesn't resemble the current workflow, but is sufficient for testing merging capabilities). Merging that to 8.0 worked without any conflicts. Although the result then differed from Andrew's work in that the variable dummy_perl_env is declared after the #ifdef WIN32 block as opposed to before in 7.4. The addition in the comment (notably on Windows) of course also didn't appear automatically. It merged from 8.0 to 8.1 without any conflicts, results were equal. Merging from 8.1 to 8.2 resulted in one merge conflict, because of the additional condition ('if (interp_state == INTERP_NONE)') that got added between 8.1 and 8.2. Merging from 8.2 to 8.3 and then to HEAD as well was conflict free again. The results differ in whitespace changes exclusively. 
So, three out of the five merges would have been equally perfect with automatic merging, while requiring only one single command, which could even be scripted, because it remains the same over time, i.e. for monotone it was something similar to: mtn propagate REL8_0_STABLE REL8_1_STABLE Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
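Scripting the chain of propagations could look roughly like this. It is only a sketch that builds the command lines without running them. The `mtn propagate SRC DST` form is the one quoted above; the branch names other than REL8_0_STABLE and REL8_1_STABLE are assumptions following the same naming pattern, and the final step to HEAD is left out because its branch name is not given.

```python
def propagate_commands(branches):
    """One `mtn propagate SRC DST` command per adjacent pair, oldest branch first."""
    return [f"mtn propagate {src} {dst}"
            for src, dst in zip(branches, branches[1:])]


# Branch names beyond REL8_0_STABLE and REL8_1_STABLE are assumed.
branches = ["REL7_4_STABLE", "REL8_0_STABLE", "REL8_1_STABLE",
            "REL8_2_STABLE", "REL8_3_STABLE"]

for command in propagate_commands(branches):
    print(command)
```

Each command stays the same over time, so the list only changes when a branch is added or retired. A step that hits a conflict (as 8.1 to 8.2 did in the test above) would still need manual resolution before the next one runs.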
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote:
>> Any revision control system should be able to do better than diff/patch as these systems have more information available to them. Normal GIT uses the relatively common 3-way merge based upon the most recent common ancestor algorithm. Assuming there is a most recent common ancestor that isn't file creation, it will have a better chance of doing the right thing.
>
> And I still haven't seen any actual evidence. Could we have fewer undocumented assertions and more experimental evidence? Take Andrew's plperl patches and see if git does any better with them than plain patch does. (If it's not successful with that patch, it's pointless to try it on any bigger cases, I fear.)

This comes down to theory vs profiling, I suppose. I am a theory person - I run things in my head. To me, the concept of having more context to make the right decision, and an algorithm that takes advantage of this context to make the right decision, is simple and compelling on its own. Knowing the algorithms that are in use, including how the most recent common ancestor is selected, gives me confidence.

You have the capabilities to test things for yourself. If you have any questions, try it out. No amount of discussion where others say "it works great" and you say "I don't believe you until you provide me with output" is going to get anywhere. I could set up a few scenarios or grab actual patches and show you particular success cases and particular failure cases, but will you really believe it? Because you shouldn't. For all you know, I picked the cases I knew would work and put them up against the cases I knew would fail.

I've used ClearCase for around 10 years now, and with the exception of cherry picking, it has very strong and mature merge support. We rely on merges being safe while managing many projects much larger than PostgreSQL. Many of the projects have hundreds of users working on them at the same time. CVS is *unusable* in these environments.

Recently, however, in spite of investments into ClearCase, we are looking at GIT as providing *stronger* merge capabilities than ClearCase, specifically with regard to propagating changes from one release to another.

I'm not going to pull up the last ten years of history and make it available to you. Nothing is going to prove this to you other than trying it out for yourself. People need to be burned by unreliable merge algorithms before they respect the value of a reliable merge algorithm. People need to experience reliable merging before they buy the product. If the theory doesn't work for you, you really are going to have to try it out for yourself. Or not. It doesn't matter to me. :-)

In any case - you raised the question - I explained how it works - and you shot me down without any evidence of your own. It's up to you to try it out for yourself and decide if you are a believer.

Cheers,
mark

P.S. I'm only a bit insulted by these threads. There are a lot of sceptical people in the crowd who until now have raised questions which only make it clear that these people have never worked with a capable SCM system on a major project before. I really shouldn't hold this against you, which is why I continue to try and provide the theory and background, so that when you do give it a chance, it will all start to make sense. You'll try it out - find it works great - and wonder "how does it do that?" Then, hopefully, you can go back to my post (or the many others who have tried to help out), read how it works, and say "ah hah! excellent!"

-- 
Mark Mielke m...@mielke.cc
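For readers who want to see the mechanics Mark describes, here is a minimal sketch of a 3-way merge driven by the most recent common ancestor. It is only an illustration of how git picks and uses the ancestor, not the experimental evidence Tom asks for; the repository, the file `numbers.txt`, and the branch `side` are hypothetical names made up for the example.

```shell
# Sketch: two branches edit different lines of the same file; git merges
# them relative to their most recent common ancestor.
# All names (demo, numbers.txt, side) are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults

# The common ancestor: a ten-line file.
printf '%s\n' one two three four five six seven eight nine ten > numbers.txt
git add numbers.txt
git commit -q -m "Add numbers.txt"
base=$(git rev-parse HEAD)

# The side branch changes the first line.
git checkout -q -b side
sed 's/^one$/ONE/' numbers.txt > tmp && mv tmp numbers.txt
git commit -q -a -m "Change first line"

# master changes the last line.
git checkout -q master
sed 's/^ten$/TEN/' numbers.txt > tmp && mv tmp numbers.txt
git commit -q -a -m "Change last line"

# This is the ancestor the merge will be computed against.
ancestor=$(git merge-base master side)

# Both edits are expressed relative to that ancestor, so they combine cleanly.
git merge -q --no-edit side
first=$(head -n 1 numbers.txt)
last=$(tail -n 1 numbers.txt)
echo "ancestor=$ancestor first=$first last=$last"
```

In a case this simple plain patch would succeed as well; the sketch only shows where the extra context comes from.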
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Ron Mayer rm...@cheapcomplexdevices.com:
> Seems you'd want to do is create a new branch as close to the point where the bug was introduced - and then merge that forward into each of the branches.

Thank you for pointing this out. As a fan of monotone I certainly know and like that way. However, for people who are used to CVS, lots of branching and merging quickly sounds dangerous and messy. So I'd like to keep things as simple as possible while still keeping possibilities open for the future.

Note that a requirement for daggy fixes is that the bug is fixed close to the point where it was introduced. So fixing it on the oldest stable branch that introduced a bug instead of fixing it on HEAD and then back-porting would certainly be a step in the right direction. And I think it would be sufficient in most cases. If not, we can still enhance that and use daggy fixes later on (as long as we have a conversion that allows merging, that is).

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Markus Wanner mar...@bluegap.ch writes:
> Note that a requirement for daggy fixes is that the bug is fixed close to the point where it was introduced. So fixing it on the oldest stable branch that introduced a bug instead of fixing it on HEAD and then back-porting would certainly be a step into the right direction.

I think it's already been made crystal clear that the people who actually do this work don't do it that way, and are uninterested in allowing their tools to force them to do it that way. Patching from HEAD back works better for us for a number of reasons, the main one being that HEAD is the version of the code that's most swapped into our awareness.

However, so long as we can have a separate working copy per branch, I see no problem with preparing all the versions of a patch and then committing them back-to-front. What I'm not clear about is the mechanics for doing that. Would someone explain exactly what the steps should be to produce the nicest-looking git history?

regards, tom lane
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote:
> Markus Wanner mar...@bluegap.ch writes:
>> Note that a requirement for daggy fixes is that the bug is fixed close to the point where it was introduced. So fixing it on the oldest stable branch that introduced a bug instead of fixing it on HEAD and then back-porting would certainly be a step into the right direction.
>
> I think it's already been made crystal clear that the people who actually do this work don't do it that way, and are uninterested in allowing their tools to force them to do it that way. Patching from HEAD back works better for us for a number of reasons, the main one being that HEAD is the version of the code that's most swapped into our awareness.

Yeah, a requirement to work from the back branch forward is quite unacceptable IMNSHO. It's also quite unreasonable. The tool is there to help, not to force an unnatural work pattern on us.

cheers

andrew
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Fri, Jun 5, 2009 at 9:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Markus Wanner mar...@bluegap.ch writes:
>> Note that a requirement for daggy fixes is that the bug is fixed close to the point where it was introduced. So fixing it on the oldest stable branch that introduced a bug instead of fixing it on HEAD and then back-porting would certainly be a step into the right direction.
>
> I think it's already been made crystal clear that the people who actually do this work don't do it that way, and are uninterested in allowing their tools to force them to do it that way. Patching from HEAD back works better for us for a number of reasons, the main one being that HEAD is the version of the code that's most swapped into our awareness.
>
> However, so long as we can have a separate working copy per branch, I see no problem with preparing all the versions of a patch and then committing them back-to-front. What I'm not clear about is the mechanics for doing that. Would someone explain exactly what the steps should be to produce the nicest-looking git history?

I'm sure someone is going to come in here and again recommend merging, but I'm going to again recommend not merging. Cherry-picking is the way to go here. Or just commit to each branch completely separately with the same commit message; cherry-pick, at least IMO, is just a convenience to help you attempt to apply the patch to a different branch.

The way you're using commit messages to construct the release notes really puts limits on what the history has to look like. I think it would be good to find a better way to generate release notes that isn't quite so dependent on having a very tight history, but even if we do that, I think in this particular situation cherry-picking is going to be less work for the committers than any of the other options that have been proposed.

...Robert
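As a rough sketch of the cherry-picking approach Robert recommends, the script below fixes a bug on HEAD first and then back-patches it to one stable branch. The `-x` option of `git cherry-pick` appends a "(cherry picked from commit ...)" line to the back-patched commit message, which records the relationship between the two commits as text only; git's merge machinery does not use it. The repository, the file names, and the single back branch are assumptions made for the illustration.

```shell
# Sketch: fix on HEAD (master), then back-patch with "git cherry-pick -x".
# Repository and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example Committer" GIT_AUTHOR_EMAIL="committer@example.invalid"
export GIT_COMMITTER_NAME="Example Committer" GIT_COMMITTER_EMAIL="committer@example.invalid"

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master

echo "version 1" > file.c
git add file.c
git commit -q -m "Initial import"

# Fork a back branch, then let HEAD move on with unrelated work.
git branch REL8_3_STABLE
echo "new feature" > feature.c
git add feature.c
git commit -q -m "Add new feature on HEAD"

# Fix the bug on HEAD first, as the committers prefer.
echo "version 1, bug fixed" > file.c
git commit -q -a -m "Fix bug in file.c"
fix=$(git rev-parse HEAD)

# Back-patch only that one commit; -x records where it came from.
git checkout -q REL8_3_STABLE
git cherry-pick -x "$fix" > /dev/null
backpatch_msg=$(git log -1 --format=%B)
printf '%s\n' "$backpatch_msg"
```

With more back branches the last three commands would be repeated per branch, adjusting the patch by hand wherever the cherry-pick stops with a conflict.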
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Robert Haas robertmh...@gmail.com writes:
> I'm sure someone is going to come in here and again recommend merging, but I'm going to again recommend not merging. Cherry-picking is the way to go here. Or just commit to each branch completely separately with the same commit message; cherry-pick at least IMO is just a convenience to help you attempt to apply the patch to a different branch.

Commit to each branch separately is surely the closest analog to what we have done historically. What I'm trying to understand is whether there's an easy variant on that that'd expose the related-ness of the patch versions in a way git understands, hopefully giving us more ability to leverage git's capabilities in future.

However, given that we don't do any real development on the back branches, it might be that trying to be smart about this is a waste of time anyway. Surely only the HEAD version of the patch is going to be something that other developers care about merging with.

regards, tom lane
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Fri, Jun 5, 2009 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> However, given that we don't do any real development on the back branches, it might be that trying to be smart about this is a waste of time anyway. Surely only the HEAD version of the patch is going to be something that other developers care about merging with.

I think that's about right. I think there would be some benefit in developing better tools - release notes seem to be the main issue - so that, for example, if I develop a complex feature and you think my code is great (ok, now I'm dreaming), you could actually merge my commits rather than flattening them.

The EXPLAIN stuff I'm working on right now is a good example where it's a lot easier to review the changes piece by piece rather than as a big unit, but I know you won't want to commit it that way because (1) with CVS, it would be a lot more work to do that, and (2) it would suck a lot of extra commits into the data you use to generate release notes, thereby making that process more complex.

I'm actually going to the trouble of trying to make sure that each of my commits does one and only one thing that can be separately checked, tested, and either accepted (hopefully) or rejected (hopefully not). Hopefully, that will still help with reviewing, but then if you commit it, it'll probably go in as one stomping commit that changes the world, or at most as two or three commits that are all still pretty big.

There are certainly cases where big stomping commits are good (I have them in my own projects, too, and branches with long histories of little dumb commits regularly get squashed and rebased before merging) but I think it would be nice to have other options.

(As a side benefit, if one of my little micro-commits turns out to have a bug, you can easily revert *just that commit*, without having to manually sort out exactly which pieces related to that change.)

...Robert
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Robert Haas robertmh...@gmail.com writes:
> [ about micro commits ]
> (As a side benefit, if one of my little micro-commits turns out to have a bug, you can easily revert *just that commit*, without having to manually sort out exactly which pieces related to that change.)

I don't actually have a lot of faith in such an approach. My experience is that bugs arise from unforeseen interactions of changes, and that backing out just one isn't a useful thing to do, even if none of the later parts of the patch directly depend on it.

So, yeah, presenting a patch as a series of edits can be useful for review purposes, but I'm not at all excited about cluttering the long-term project history with a zillion micro-commits. One of the things I find most annoying about reviewing the current commit history is that Bruce has taken a micro-commit approach to managing the TODO list --- I was seldom so happy as the day that disappeared from CVS, because of the ensuing reduction in noise level.

regards, tom lane
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Fri, Jun 5, 2009 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> [ about micro commits ]
>> (As a side benefit, if one of my little micro-commits turns out to have a bug, you can easily revert *just that commit*, without having to manually sort out exactly which pieces related to that change.)
>
> I don't actually have a lot of faith in such an approach. My experience is that bugs arise from unforeseen interactions of changes, and that backing out just one isn't a useful thing to do, even if none of the later parts of the patch directly depend on it.
>
> So, yeah, presenting a patch as a series of edits can be useful for review purposes, but I'm not at all excited about cluttering the long-term project history with a zillion micro-commits. One of the things I find most annoying about reviewing the current commit history is that Bruce has taken a micro-commit approach to managing the TODO list --- I was seldom so happy as the day that disappeared from CVS, because of the ensuing reduction in noise level.

I've never even noticed that noise, even when reviewing older history. The power of git log to get you exactly the commits you care about is not to be underestimated.

With regard to micro-commits, I don't have hugely strong feelings on the issue. I like them in certain situations, and I think that git makes it feasible to use them that way if you want to; but if you don't want to, I don't think that's a disaster either.

...Robert
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> [ about micro commits ]
>> (As a side benefit, if one of my little micro-commits turns out to have a bug, you can easily revert *just that commit*, without having to manually sort out exactly which pieces related to that change.)
>
> I don't actually have a lot of faith in such an approach. My experience is that bugs arise from unforeseen interactions of changes, and that backing out just one isn't a useful thing to do, even if none of the later parts of the patch directly depend on it.
>
> So, yeah, presenting a patch as a series of edits can be useful for review purposes, but I'm not at all excited about cluttering the long-term project history with a zillion micro-commits. One of the things I find most annoying about reviewing the current commit history is that Bruce has taken a micro-commit approach to managing the TODO list --- I was seldom so happy as the day that disappeared from CVS, because of the ensuing reduction in noise level.

Yea, that was a problem that is now fixed.

-- 
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Andrew Dunstan and...@dunslane.net [090605 13:55]:
> Yeah, a requirement to work from the back branch forward is quite unacceptable IMNSHO. It's also quite unreasonable. The tool is there to help, not to force an unnatural work pattern on us.

Again, just to make it clear, git isn't going to *force* anyone to drastically change their workflow. For people who want to keep a separate working directory per branch, and just work on them as independently as they do with CVS, *nothing* is going to have to change, except the possible git push step required to actually publish your committed changes... But, if you want, you could also have a post-commit hook that will do that push for you, and you just don't commit until you're sure (a-la-cvs-style):

    cvs update  ===  git stash save
                     git pull
                     git stash apply
    cvs commit  ===  git commit -a
                     git push

The git stash is there because git won't pull/merge remote work into a dirty workdir... This is the classic CVS conflict mess that git avoids; it then allows you to use all its powerful merge machinery to merge any of your stashed local changes back into what you've just pulled.

But I have a feeling that as people (specifically the committers) get slowly introduced and exposed to some of the more advanced things git lets you do, and as they get comfortable with using it, people will *want* to start altering how they do things, simply because they start to find out that git really allows them to do what they really want, rather than what they have thought they want because they've been so brainwashed by CVS... ;-)

-- 
Aidan Van Dyk                Create like a god,
ai...@highrise.ca            command like a king,
http://www.highrise.ca/      work like a slave.
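The CVS-style cycle Aidan outlines can be tried end to end with two throwaway clones of a local bare repository. This is a sketch under assumptions: the names `upstream.git`, `alice`, and `bob` are invented, plain `git stash` stands in for `git stash save`, and the pull is restricted to fast-forwards to keep the example deterministic.

```shell
# Sketch: "cvs update" and "cvs commit" expressed as git commands,
# using a local bare repository as the shared server.
# All repository and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "line 1" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# The shared repository and two developers' working copies.
git clone -q --bare "$work/seed" "$work/upstream.git"
git clone -q "$work/upstream.git" "$work/alice"
git clone -q "$work/upstream.git" "$work/bob"

# Alice publishes a change first.
cd "$work/alice"
echo "alice" > b.txt
git add b.txt
git commit -q -m "Alice adds b.txt"
git push -q origin master

# Bob has uncommitted local work in his tree.
cd "$work/bob"
echo "bob's local edit" >> a.txt

# "cvs update": set local changes aside, fetch upstream work, reapply.
git stash > /dev/null
git pull -q --ff-only origin master
git stash apply > /dev/null

# "cvs commit": commit locally, then publish.
git commit -q -a -m "Bob edits a.txt"
git push -q origin master
```

Note that `git stash apply` leaves the stash entry in place; `git stash pop` would drop it after reapplying.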
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Aidan Van Dyk wrote:
> * Andrew Dunstan and...@dunslane.net [090605 13:55]:
>> Yeah, a requirement to work from the back branch forward is quite unacceptable IMNSHO. It's also quite unreasonable. The tool is there to help, not to force an unnatural work pattern on us.
>
> Again, just to make it clear, git isn't going to *force* anyone to drastically change their workflow.

My reaction was against someone saying, in effect, "don't work that way, work this way". So make your argument to that person ;-)

> [...] I have a feeling that as people (specifically the committers) get slowly introduced and exposed to some of the more advanced things git lets you do, and as you get comfortable with using it, people will *want* to start altering how they do things, simply because they start to find out that git really allows them to do what they really want, rather than what they have thought they want because they've been so brainwashed by CVS...

The whole point is that we want something better *that suits our work patterns*. Almost all the backpatching that gets done is by the committers. So we have a bunch of concerns that are not relevant to the vast majority of developers. In particular, it would be nice to be able to make a bunch of changes on different branches and then commit it all in one hit. If that's possible, then well and good. If it's not, that's a pity.

cheers

andrew
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Andrew Dunstan and...@dunslane.net [090605 14:41]:
> The whole point is that we want something better *that suits our work patterns*. Almost all the backpatching that gets done is by the committers. So we have a bunch of concerns that are not relevant to that vast majority of developers. In particular, it would be nice to be able to make a bunch of changes on different branches and then commit it all in one hit. If that's possible, then well and good. If it's not, that's a pity.

My only concern is that I am seeing 2 requirements emerge:

1) Everything has to work as it currently does with CVS
2) We want better information about how patches relate for possible future stuff

Unfortunately, those 2 requirements are conflicting...

If you (not anyone personally, but the more general PostgreSQL committer) want the repository to properly track the fixes and show their relationship through all the branches, then you really do want to branch-to-fix and merge the fix forward into all your STABLE/master branches, like the daggy type thing mentioned elsewhere... But notice, that is *very* different from the current work patterns based on the CVS model, where everything is completely independent (save the commit message), and it's a huge change to the way developers work.

If you want to stay with the current CVS style, then you aren't going to get any closer than the matching commit messages (or possibly a reference to another commit as an extra line) that we currently have with CVS.

My suggestion is to keep it simple. Just work independently, like you currently do. You don't want every committer to have to completely learn the advanced features of a new tool just to use it... You can use it the way you use the less feature-full tool while you learn all the features... But as people start to use the new tool, and start to use its more advanced features, then it's natural that their results will start to be reflected in the main repository.

But insisting that people currently comfortable and proficient in the current work patterns *have* to learn completely new ones for a flag-day type switch and start using them immediately is going to:

* Piss them off
* Create great ill-will against the tool

And neither of those will be the fault of the tool itself, but of the way a new process was forced in conjunction with a new tool...

I don't want to see the PG project trying to *force* a radical change in the way the development/branches currently work at the same time as a switch to git. Replace the tool, and allow the current processes and work-flows to gradually improve. The process and work-flow improvements will be an iterative and collaborative process, just like the actual code improvements, where huge radical patches are generally frowned upon.

I've used git for a long time, on many different projects. I do know how radically it *can* change the process, and how much more efficient and natural the improved processes can be. But the change is not an overnight change. And it's not going to happen unless the people needing to change *see* its benefits. And that's going to take time and experience with the new tool...

Anyways, I said previously that I was over with this thread, but now I mean it ;-) If someone wants specific git information or help, I'm available.

a.

-- 
Aidan Van Dyk                Create like a god,
ai...@highrise.ca            command like a king,
http://www.highrise.ca/      work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Greg Stark st...@enterprisedb.com:
> This is all completely irrelevant to the CVS import.

To the CVS import it is, yes. After all, CVS has no notion of renaming files. But my example is about renaming with git *after* the conversion. Git *does* support renaming (to some extent). However, it fails as explained if you feed it with corrupt data (the corruption being the missing link between the two added files - after a rename, git simply has no chance of knowing it should be the same file).

> I don't think we've ever renamed files because CVS can't handle it cleanly.

Yes, that applies to the past. But I think we *are* going to rename files *after* the switch, because git *can* handle it cleanly - given a correct import.

If that defect would only affect historic information, I'd not be half as pestering as I am. But it's such delayed effects which might surprise you years after the cause, which make me nervous.

> It does sound to me like we really ought to have merge commits marking the bug fixes in old releases as merged in the equivalent commits to later branches based on Tom's commit messages.

Now, I don't know how you got to that conclusion, but I absolutely agree ;-)

Regards

Markus Wanner
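The clean case Markus refers to, a rename on one branch meeting an edit on another when the two branches share the file's history, can be sketched as follows. The file and branch names are hypothetical, and the sketch deliberately covers only the well-formed history; it does not reproduce the failure he warns about, where the same file was added independently on both branches.

```shell
# Sketch: a rename on master and an edit on a back branch merge cleanly
# because both sides descend from the commit that added the file.
# Repository, file, and branch names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master

printf '%s\n' a b c d e f g h i j > old_name.pl
git add old_name.pl
git commit -q -m "Add old_name.pl"
git branch stable

# master renames the file without changing its content.
git mv old_name.pl new_name.pl
git commit -q -m "Rename old_name.pl to new_name.pl"

# The back branch edits the file under its old name.
git checkout -q stable
sed 's/^e$/E/' old_name.pl > tmp && mv tmp old_name.pl
git commit -q -a -m "Fix line e"

# Merging forward: git detects the rename and applies the edit to new_name.pl.
git checkout -q master
git merge -q --no-edit stable
ls
```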
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 4 Jun 2009, at 09:11, Markus Wanner mar...@bluegap.ch wrote:
> Hi,
>
> Quoting Greg Stark st...@enterprisedb.com:
>> This is all completely irrelevant to the CVS import.
>
> To the CVS import it is, yes. After all, CVS has no notion of renaming files. But my example is about renaming with git *after* the conversion. Git *does* support renaming (to some extent). However, it fails as explained if you feed it with corrupt data (the corruption being the missing link between the two added files - after a rename, git simply has no chance of knowing it should be the same file).

Hmm. I see. I'm not sure we've ever added files to back branches either. I'm less sure of that though.

>> I don't think we've ever renamed files because CVS can't handle it cleanly.
>
> Yes, that applies to the past. But I think we *are* going to rename files *after* the switch, because git *can* handle it cleanly - given a correct import.
>
> If that defect would only affect historic information, I'd not be half as pestering as I am. But it's such delayed effects which might surprise you years after the cause, which make me nervous.
>
>> It does sound to me like we really ought to have merge commits marking the bug fixes in old releases as merged in the equivalent commits to later branches based on Tom's commit messages.
>
> Now, I don't know how you got to that conclusion, but I absolutely agree ;-)
>
> Regards
>
> Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Greg Stark greg.st...@enterprisedb.com:
> Hmm. I see. I'm not sure we've ever added files to back branches either. I'm less sure of that though.

We did from time to time. Every merge commit in my current conversion contains at least one such file that got added as part of a back patch. The perl file mentioned in the example upstream is one of them.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Tom Lane t...@sss.pgh.pa.us:
> BTW, Markus: you do realize thomas is not me but Tom Lockhart?

Uh.. thanks, that name had fallen through the cracks before. I've added it now; it will be included in the next sample conversion.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Marko Kreen mark...@gmail.com:
> I'm not sure whether we should mark the old branches getting merges down or the new branches getting merged up. I suspect I'm missing something but I don't see any reason one is better than the other.

As pointed out by others, it doesn't make sense to merge (all commits since the last merge) from HEAD to the back branches. You'd have to cherry-pick only the commits which actually have to get back patched.

The new branches getting merged up could work. That is, applying the fix to the oldest back-branch which requires the fix first and then merging it to all newer ones, including HEAD.

However, that would require some rethinking: instead of creating bugfix-patches for HEAD, then manually adjusting patches for back-branches and then group committing, you'd have to create a bugfix-patch for the oldest branch first, commit that and then merge that to the newer branches.

I consider merging a cleaner and simpler operation than cherry-picking, because merging allows the VCS to keep track of what needs to be propagated, while with cherry-picking, you'd have to keep track of that manually (or with the help of other tools). An example for that is the very same inability to properly track renames when cherry-picking, just like what I explained for the CVS conversion.

> It seems to require noticeable development effort to get an importer to a level it can do it. Will this be a requirement for import? Or just a good thing to have? Also how to check if all such merges are sensible?

If that's how you'd like to have the CVS repository represented in git (which I'd support as well), I'd give it a try. With all of the work I've done for mtn cvs_import I certainly have the necessary experience in CVS conversion and with the cvs2svn algorithm itself.

> And note that such effort will affect only old imported history, it will not make easier to handle back-branch fixes in the future...

Hm.. depends: if you want to merge from older branches to newer ones, instead of cherry-picking, it would certainly help to get the history clean.

> Various scenarios with git cherry-pick and similar tools would still result in duplicate commits, so we would need a git log post-processor anyway if we want to somehow group them together for eg. weekly commit summary. And such post-processor would work on old history too.

I think we should decide on either using merges or using duplicate commits we try to link somehow. But then, we should IMO use that scheme for the conversion as well as later on, so as not to get a messy history, as you put it.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Markus Wanner wrote:
> The new branches getting merged up could work. That is, applying the fix to the oldest back-branch which requires the fix first and then merge it to all newer ones, including HEAD.
>
> However, that would require some rethinking: instead of creating bugfix-patches for HEAD, then manually adjust patches for back-branches and then group committing, you'd have to create a bugfix-patch for the oldest branch first, commit that and then merge that to the newer branches.

That sounds a bit dangerous too, since I imagine there are some changes in the old release branches you wouldn't want merged into the newest releases (say, code affecting sections that got redesigned).

Seems what you'd want to do is create a new branch as close as possible to the point where the bug was introduced - and then merge that forward into each of the branches.

This concept was mentioned in a page linked earlier in the thread[1] and seems to be the way monotone recommends people use their system[2]. See that page for more reasons why they think it's good.

[1] http://archives.postgresql.org/pgsql-hackers/2009-06/msg00153.php
[2] http://www.monotone.ca/wiki/DaggyFixes/
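The "daggy fix" idea Ron describes can be sketched in git terms as below: the fix is committed on a small branch rooted at the commit that introduced the bug and is then merged forward into every affected branch. This is only an illustration under assumptions; the branch name `fix-module` and the file names are hypothetical, and nothing here claims to be how monotone or the PostgreSQL committers would actually do it.

```shell
# Sketch of a daggy fix: branch from the commit that introduced the bug,
# fix it there, then merge the fix forward into the back branch and master.
# Repository, file, and branch names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master

echo "readme" > README
git add README
git commit -q -m "Initial import"

echo "buggy" > module.c
git add module.c
git commit -q -m "Add module.c (introduces the bug)"
bug_commit=$(git rev-parse HEAD)

echo "other" > other.c
git add other.c
git commit -q -m "More work"
git branch REL8_3_STABLE

echo "feature" > feature.c
git add feature.c
git commit -q -m "New feature on master only"

# Root the fix at the commit that introduced the bug.
git checkout -q -b fix-module "$bug_commit"
echo "fixed" > module.c
git commit -q -a -m "Fix bug in module.c"
fix=$(git rev-parse HEAD)

# Merge the fix forward into each branch that contains the bug.
git checkout -q REL8_3_STABLE
git merge -q --no-edit fix-module
git checkout -q master
git merge -q --no-edit fix-module
```

Because the fix branch contains nothing but the fix, merging it does not drag unrelated back-branch changes into master, which is the concern Ron raises about merging whole release branches forward.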
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Marko Kreen mark...@gmail.com:
> The example was not actual case from Postgres CVS history, but hypothetical situation without checking if it already works with GIT.

Of course it is a simplified example, but it resembles what could happen e.g. to the file doc/src/sgml/generate_history.pl, which got added from a backported patch after forking off REL8_3_STABLE.

If you create separate commits during the conversion, rename that file on the master branch and then - for whatever reason - try to merge the two branches, you will end up having that file twice. That's what I'm warning about. Changes on either or both sides of the merge make the situation worse.

> Merging between branches with GIT is fine workflow in the future.

Do you consider the above scenario a fine merge?

> My point is that we should avoid fake merges, to avoid obfuscating history.

Understood. It looks like I'm pretty much the only one who cares more about merge capability than nice looking history :-(

Attached is my current options file for cvs2git; it includes the changes requested by Alvaro and additional names and emails as given by Tom (thanks again).

A current conversion with cvs2git (and with the merges) results in a repository with exactly 0 differences against any branch or tag symbol compared to cvs checkout -kk.

Regards

Markus Wanner

# (Be in -*- mode: python; coding: utf-8 -*- mode.)

import re

from cvs2svn_lib import config
from cvs2svn_lib import changeset_database
from cvs2svn_lib.common import CVSTextDecoder
from cvs2svn_lib.log import Log
from cvs2svn_lib.project import Project
from cvs2svn_lib.git_revision_recorder import GitRevisionRecorder
from cvs2svn_lib.git_output_option import GitRevisionMarkWriter
from cvs2svn_lib.git_output_option import GitOutputOption
from cvs2svn_lib.revision_manager import NullRevisionRecorder
from cvs2svn_lib.revision_manager import NullRevisionExcluder
from cvs2svn_lib.fulltext_revision_recorder \
     import SimpleFulltextRevisionRecorderAdapter
from cvs2svn_lib.rcs_revision_manager import RCSRevisionReader
from cvs2svn_lib.cvs_revision_manager import CVSRevisionReader
from cvs2svn_lib.checkout_internal import InternalRevisionRecorder
from cvs2svn_lib.checkout_internal import InternalRevisionExcluder
from cvs2svn_lib.checkout_internal import InternalRevisionReader
from cvs2svn_lib.symbol_strategy import AllBranchRule
from cvs2svn_lib.symbol_strategy import AllTagRule
from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule
from cvs2svn_lib.symbol_strategy import ExcludeRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceBranchRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceTagRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ExcludeTrivialImportBranchRule
from cvs2svn_lib.symbol_strategy import ExcludeVendorBranchRule
from cvs2svn_lib.symbol_strategy import HeuristicStrategyRule
from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule
from cvs2svn_lib.symbol_strategy import HeuristicPreferredParentRule
from cvs2svn_lib.symbol_strategy import SymbolHintsFileRule
from cvs2svn_lib.symbol_transform import ReplaceSubstringsSymbolTransform
from cvs2svn_lib.symbol_transform import RegexpSymbolTransform
from cvs2svn_lib.symbol_transform import IgnoreSymbolTransform
from cvs2svn_lib.symbol_transform import NormalizePathsSymbolTransform
from cvs2svn_lib.property_setters import AutoPropsPropertySetter
from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter
from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter
from cvs2svn_lib.property_setters import CVSRevisionNumberSetter
from cvs2svn_lib.property_setters import DefaultEOLStyleSetter
from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter
from cvs2svn_lib.property_setters import ExecutablePropertySetter
from cvs2svn_lib.property_setters import KeywordsPropertySetter
from cvs2svn_lib.property_setters import MimeMapper
from cvs2svn_lib.property_setters import SVNBinaryFileKeywordsPropertySetter

Log().log_level = Log.NORMAL

ctx.revision_recorder = SimpleFulltextRevisionRecorderAdapter(
    CVSRevisionReader(cvs_executable=r'cvs'),
    GitRevisionRecorder('cvs2git-tmp/git-blob.dat'),
    )

ctx.revision_excluder = NullRevisionExcluder()

ctx.revision_reader = None

ctx.sort_executable = r'sort'

ctx.trunk_only = False

ctx.cvs_author_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_log_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_filename_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )

ctx.initial_project_commit_message = (
    'Standard project directories initialized by cvs2git.'
    )

ctx.post_commit_message = (
    'This commit was generated by cvs2git to track changes on a CVS '
    'vendor branch.'
    )

ctx.symbol_commit_message = (
    "This commit was manufactured by cvs2git to create %(symbol_type)s "
    "'%(symbol_name)s'."
    )

ctx.decode_apple_single = False
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Wed, Jun 3, 2009 at 12:10 PM, Markus Wanner mar...@bluegap.ch wrote:
> If you create separate commits during the conversion, rename that file on the master branch

This is all completely irrelevant to the CVS import. I don't think we've ever renamed files because CVS can't handle it cleanly.

It does sound to me like we really ought to have merge commits marking the bug fixes in old releases as merged into the equivalent commits on later branches, based on Tom's commit messages. That would make the git history match the implicit "same commit message" CVS history that cvs2pcl was giving Tom.

I find git-log's output including merge commits kind of strange and annoying myself, but having them at least gives us a chance to have a tool that understands them output something like cvs2pcl. Throwing away that information because we don't like the clutter in the tool output seems like a short-sighted plan.

That said, the commit log message isn't being lost. We could always import the history linearly and add the merge commits later if we decide having them would help some tool implement cvs2pcl summaries.

-- 
greg
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

On 06/03/2009 02:08 PM, Greg Stark wrote:
> On Wed, Jun 3, 2009 at 12:10 PM, Markus Wanner mar...@bluegap.ch wrote:
> That would make the git history match Tom's same commit message implicit CVS history that cvs2pcl was giving him.
>
> I find git-log's output including merge commits kind of strange and annoying myself but having them at least gives us a chance to have a tool that understands them output something like cvs2pcl.

git log --no-merges hides the actual merge commits if that is what you want.

Andres
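A minimal sketch of what the --no-merges filter mentioned above amounts to: git documents it as omitting commits with more than one parent. The toy history below is entirely hypothetical (ids, subjects and shape are made up for illustration) and the filter is plain Python, not a call into git.

```python
# Toy history, newest first: (commit id, parent ids, subject).
# All ids and subjects are hypothetical.
history = [
    ("m1", ["c3", "b2"], "Merge REL8_3_STABLE into master"),
    ("c3", ["c2"], "Add new feature"),
    ("b2", ["c1"], "Fix bug in back branch"),
    ("c2", ["c1"], "Refactor"),
    ("c1", [], "Initial import"),
]


def no_merges(commits):
    """Keep only commits with at most one parent, as --no-merges does."""
    return [c for c in commits if len(c[1]) <= 1]


for commit_id, _parents, subject in no_merges(history):
    print(commit_id, subject)
```

The merge commit m1 drops out of the listing while the commits it ties together stay visible, which is why the merges can carry metadata for tools without cluttering the everyday log.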
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
> git log --no-merges hides the actual merge commits if that is what you want.

Ooh! Life seems so much sweeter now!

Given that we don't have to see them then I'm all for marking bug fix patches which were applied to multiple branches as merges. That seems like it would make it easier for tools like gitk or to show useful information analogous to the cvs2pcl info.

Given that Tom's been intentionally marking the commits with identical commit messages we ought to be able to find *all* of them and mark them properly. That would be way better than only finding patches that are absolutely identical.

I'm not sure whether we should mark the old branches getting merges down or the new branches getting merged up. I suspect I'm missing something but I don't see any reason one is better than the other.

-- 
greg
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Greg Stark wrote:
> On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
>> git log --no-merges hides the actual merge commits if that is what you want.
>
> Ooh! Life seems so much sweeter now!
>
> Given that we don't have to see them then I'm all for marking bug fix patches which were applied to multiple branches as merges. That seems like it would make it easier for tools like gitk or to show useful information analogous to the cvs2pcl info.

Right, if it adds additional metadata that lets the tools do their magic better, and it's still easy to filter out, I don't see a downside.

> Given that Tom's been intentionally marking the commits with identical commit messages we ought to be able to find *all* of them and mark them properly. That would be way better than only finding patches that are absolutely identical.

Just to be clear, not just Tom. All committers. I was told to do that right after my first backpatch which *didn't* do it :-) So it's an established project practice. That has other advantages as well, of course.

> I'm not sure whether we should mark the old branches getting merges down or the new branches getting merged up. I suspect I'm missing something but I don't see any reason one is better than the other.

If you go from older to newer, the automatic merge algorithms have a better chance of doing something smart since they can track previous changes. At least I think that's how it works. But I think for most of the changes it wouldn't make a huge difference, though - manual merging would be needed anyway.

//Magnus
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/3/09, Greg Stark st...@enterprisedb.com wrote:
> On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
>> git log --no-merges hides the actual merge commits if that is what you want.
>
> Ooh! Life seems so much sweeter now!
>
> Given that we don't have to see them then I'm all for marking bug fix patches which were applied to multiple branches as merges. That seems like it would make it easier for tools like gitk or to show useful information analogous to the cvs2pcl info.
>
> Given that Tom's been intentionally marking the commits with identical commit messages we ought to be able to find *all* of them and mark them properly. That would be way better than only finding patches that are absolutely identical.
>
> I'm not sure whether we should mark the old branches getting merges down or the new branches getting merged up. I suspect I'm missing something but I don't see any reason one is better than the other.

Although "mark Tom's back-branch fixes as merges" makes much more sense than "mark new files as merges", it is quite a step up from "do tags match official releases".

It seems to require noticeable development effort to get an importer to a level where it can do it. Will this be a requirement for import? Or just a good thing to have?

Also, how do we check whether all such merges are sensible?

And note that such effort will affect only old imported history; it will not make it easier to handle back-branch fixes in the future...

Various scenarios with git cherry-pick and similar tools would still result in duplicate commits, so we would need a git log post-processor anyway if we want to somehow group them together, e.g. for a weekly commit summary. And such a post-processor would work on old history too.

Maybe that's a better direction to work on than to risk a messy history in GIT?

-- 
marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander mag...@hagander.net wrote:
> Greg Stark wrote:
>> On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
>>> git log --no-merges hides the actual merge commits if that is what you want.
>>
>> Ooh! Life seems so much sweeter now!
>>
>> Given that we don't have to see them then I'm all for marking bug fix patches which were applied to multiple branches as merges. That seems like it would make it easier for tools like gitk or to show useful information analogous to the cvs2pcl info.
>
> Right, if it adds additional metadata that lets the tools do their magic better, and it's still easy to filter out, I don't see a downside.
>
>> I'm not sure whether we should mark the old branches getting merges down or the new branches getting merged up. I suspect I'm missing something but I don't see any reason one is better than the other.
>
> If you go from older to newer, the automatic merge algorithms have a better chance of doing something smart since they can track previous changes. At least I think that's how it works. But I think for most of the changes it wouldn't make a huge difference, though - manual merging would be needed anyway.

In practice, isn't it more likely that you would develop the change on the newest branch and then try to back-port it? However you do the import, you're going to want to do subsequent things the same way.

...Robert
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/3/09, Magnus Hagander mag...@hagander.net wrote:
> Robert Haas wrote:
>> On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander mag...@hagander.net wrote:
>>>> I'm not sure whether we should mark the old branches getting merges down or the new branches getting merged up. I suspect I'm missing something but I don't see any reason one is better than the other.
>>>
>>> If you go from older to newer, the automatic merge algorithms have a better chance of doing something smart since they can track previous changes. At least I think that's how it works. But I think for most of the changes it wouldn't make a huge difference, though - manual merging would be needed anyway.
>>
>> In practice, isn't it more likely that you would develop the change on the newest branch and then try to back-port it? However you do the import, you're going to want to do subsequent things the same way.
>
> That's definitely the order in which *I* work, and I think that's how most others do it as well.

That's true, but it's not representable in a VCS, unless you use cherry-pick, which is just UI around patch transport.

But considering separate local trees (which can optionally contain local per-fix branches), it is possible to separate the fix development from the final representation.

-- 
marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Magnus Hagander mag...@hagander.net [090603 10:13]:
> Right, if it adds additional metadata that lets the tools do their magic better, and it's still easy to filter out, I don't see a downside.

Note that it could (and likely will) have a downside when you get to doing real merge-based development... A merge means that *all* changes in *both* parents have been combined in *this* commit. And all merge tools depend on this. That's the directed part of the DAG in git.

So if you want to be working in a way that the merge tools work, you *don't* have master/HEAD merged into REL8_2_STABLE. You can have REL8_2_STABLE merged into master/HEAD.

I'll concede that in GIT, it's flexible (some say arbitrary) enough that you can *construct* the DAG otherwise, but then you've done something in such a fashion that the DAG has no bearing on real merging, and thus you lose all the power of the DAG's merge tracking when working on new real merging.

a.

-- 
Aidan Van Dyk                              Create like a god,
ai...@highrise.ca                        command like a king,
http://www.highrise.ca/                    work like a slave.
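The direction argument above can be sketched with a toy commit DAG. This is only an illustration under stated assumptions: the commit names are hypothetical, and the reachability walk stands in for what a merge commit asserts, namely that everything reachable from both parents is contained in it.

```python
# Toy commit DAG: each commit maps to the list of its parents.
# Commit names are hypothetical, not real PostgreSQL history.
parents = {
    "base": [],
    "fix_8_2": ["base"],               # bug fix committed on REL8_2_STABLE
    "feature": ["base"],               # new development on master
    "merge": ["feature", "fix_8_2"],   # REL8_2_STABLE merged into master
}


def ancestors(commit, dag):
    """Return every commit reachable from `commit`, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag[current])
    return seen


master = ancestors("merge", parents)
stable = ancestors("fix_8_2", parents)

print(stable <= master)      # True: master contains everything on the back branch
print("feature" in stable)   # False: the back branch did not absorb master
```

Merging in the other direction would put "feature" into the history of the stable branch, which is exactly the claim a back branch must not make.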
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Marko Kreen mark...@gmail.com [090603 10:26]:
> Thats true, but it's not representable in VCS, unless you use cherry-pick, which is just UI around patch transport.
>
> But considering separate local trees (with can optionally contain local per-fix branches), it is possible to separate the fix-developement from final representation.

I'll note that in git, cherry-pick is *more* than just patch transport. I would rather call it "patch commute". It does actually look at the history between the picked patch and the current tree, any merge/fork points, and the differences on each path that lead to the changes in the current tree and the picked patch.

a.

-- 
Aidan Van Dyk                              Create like a god,
ai...@highrise.ca                        command like a king,
http://www.highrise.ca/                    work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Robert Haas wrote:
> On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander mag...@hagander.net wrote:
>>> I'm not sure whether we should mark the old branches getting merges down or the new branches getting merged up. I suspect I'm missing something but I don't see any reason one is better than the other.
>>
>> If you go from older to newer, the automatic merge algorithms have a better chance of doing something smart since they can track previous changes. At least I think that's how it works. But I think for most of the changes it wouldn't make a huge difference, though - manual merging would be needed anyway.
>
> In practice, isn't it more likely that you would develop the change on the newest branch and then try to back-port it? However you do the import, you're going to want to do subsequent things the same way.

That's definitely the order in which *I* work, and I think that's how most others do it as well.

-- 
Magnus Hagander
Self: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Wed, Jun 3, 2009 at 10:20 AM, Marko Kreen mark...@gmail.com wrote:
> Various scenarios with git cherry-pick and similar tools would still result in duplicate commits, so we would need a git log post-processor anyway if we want to somehow group them together for eg. weekly commit summary. And such post-processor would work on old history too.
>
> Maybe that's better direction to work on, than to potentially risk in messy history in GIT?

I think it is. Cherry-picking seems like a much better way of back-patching than merging, so putting a lot of effort into making merges work doesn't seem like a good expenditure of effort.

It seems pretty clear that searching through the histories of each branch for duplicate commit messages and producing a unified report is pretty straightforward if we assume that the commit messages are byte-for-byte identical (or even identical modulo whitespace changes). But I wonder if it would make more sense to include some kind of metadata in the commit message (or some other property of the commit? does git support that?) to make it not depend on that. I suppose Tom et al. like the way they do it now, so maybe we should just stick with text comparison, but it seems a bit awkward to me.

...Robert
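The text-comparison report described above could look roughly like the following. This is a minimal sketch, assuming the per-branch log has already been extracted into (branch, commit id, message) tuples; the function names, commit ids and messages are hypothetical, and messages are compared modulo whitespace.

```python
from collections import defaultdict


def normalize(message):
    """Collapse runs of whitespace so messages compare modulo whitespace."""
    return " ".join(message.split())


def group_backpatches(commits):
    """Group commits whose normalized messages appear on more than one branch.

    commits: iterable of (branch, commit_id, message) tuples.
    Returns {normalized message: [(branch, commit_id), ...]}.
    """
    groups = defaultdict(list)
    for branch, commit_id, message in commits:
        groups[normalize(message)].append((branch, commit_id))
    return {
        message: hits
        for message, hits in groups.items()
        if len({branch for branch, _ in hits}) > 1
    }


# Hypothetical sample input.
log = [
    ("master", "a1", "Fix crash in planner.\n"),
    ("REL8_3_STABLE", "b7", "Fix crash  in planner."),
    ("master", "a2", "Add new feature."),
]

for message, hits in group_backpatches(log).items():
    print(message, hits)
```

Because it keys only on the message text, the same grouping works for history imported from CVS and for future cherry-picked commits alike.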
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/3/09, Aidan Van Dyk ai...@highrise.ca wrote:
> * Marko Kreen mark...@gmail.com [090603 10:26]:
>> Thats true, but it's not representable in VCS, unless you use cherry-pick, which is just UI around patch transport.
>>
>> But considering separate local trees (with can optionally contain local per-fix branches), it is possible to separate the fix-developement from final representation.
>
> I'll note that in git, cherry-pick is *more* than just patch transport. I would more call it patch commute. It does actually look at the history between the picked patch, and the current tree, any merge/fork points, and the differences on each path that lead to the changes in the current tree and the picked patch.

Well, that's good to know, but this also seems to mean it's a rather bad tool for back-patching, as you risk including random unwanted commits that happened in HEAD in the meantime. On the other hand, it's a very good tool for forward-patching.

But my point was not about that - rather I was pointing out that this patch-commute will result in duplicate commits that have no ties in the DAG.

-- 
marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Marko Kreen mark...@gmail.com [090603 11:12]:
> Well, thats good to know, but this also seems to mean it's rather bad tool for back-patching, as you risk including random unwanted commits too that happened in the HEAD meantime. But also, it's very good tool for forward-patching.

It doesn't pull in commits in the sense that darcs does... Rather, it's more like "the patch changes $XXX in $file, but that $file was really $old_file at the common point between the 2 commits, and $old_file is still $old_file in the commit I'm trying to apply the patch to".

It looks at the history of the changes to figure out why (or why not) they apply, and to see if they should still be applied to the same file, or to another file (in case of a renamed/moved file in one branch), or if the changed area has been moved drastically within the file in one branch and the change should be applied there instead.

> But my point was not about that - rather I was pointing out that this patch-commute will result in duplicate commits, that have no ties in DAG.

Yes. That's a cherry-pick; if you want a merge, you merge ;-) But merge carries the baggage of the expectation that *all* changes in both parents have been combined.

-- 
Aidan Van Dyk                              Create like a god,
ai...@highrise.ca                        command like a king,
http://www.highrise.ca/                    work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/3/09, Aidan Van Dyk ai...@highrise.ca wrote:
> * Marko Kreen mark...@gmail.com [090603 11:12]:
>> Well, thats good to know, but this also seems to mean it's rather bad tool for back-patching, as you risk including random unwanted commits too that happened in the HEAD meantime. But also, it's very good tool for forward-patching.
>
> It doesn't pull in commits in the sense that darcs does... But rather, its more like the patch changes $XXX in $file, but that $file was really $old_file at the common point between the 2 commits, and $old_file is still $old file in the commit I'm trying to apply the patch to.
>
> It looks at the history of the changes to figure out why (or why not) they apply, and see if they should still be applied to the same file, or another file (in case of a rename/moved file in 1 branch), or if the changed area has been moved drastically in the file in one branch, and the change should be applied there instead.

I'm not certain, but I remember using cherry pick and seeing several commits in result. This seems to be a point that needs to be checked.

>> But my point was not about that - rather I was pointing out that this patch-commute will result in duplicate commits, that have no ties in DAG.
>
> Yes. That's a cherry-pick, if you want a merge, you merge ;-) But merge carries the baggage of expectation that *all* changes in both parents have been combined.

But in forward-merge case it's true.

-- 
marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Marko Kreen mark...@gmail.com [090603 11:28]:
> I'm not certain, but I remember using cherry pick and seeing several commits in result. This seems to be a point that needs to be checked.

I'm not sure what you're recalling, but git cherry-pick takes a single commit, and applies it as a single commit (or, with -n, doesn't actually commit it).

That's what it does... There are various *other* tools (like rebase, am, cherry, etc) which operate on sets of commits.

-- 
Aidan Van Dyk                              Create like a god,
ai...@highrise.ca                        command like a king,
http://www.highrise.ca/                    work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Robert Haas wrote:
> But I wonder if it would make more sense to include some kind of metadata in the commit message (or some other property of the commit? does git support that?) to make it not depend on that.

From elsewhere in this thread[1], 'The git cherry-pick ... -x flag adds a note to the commit comment describing the relationship between the commits.'

If the commit on the main branch had this message

=====
added a line on the main branch
=====

the commit on the cherry-picked branch will have this comment

=====
added a line on the main branch
(cherry picked from commit 189ef03b4f4ed5078328f7965c7bfecce318490d)
=====

where the big hex string identifies the commit on the other branch.

[1] http://archives.postgresql.org/pgsql-hackers/2009-06/msg00191.php
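A post-processor could pick that note up mechanically. The following is a minimal sketch, assuming commit messages are available as plain strings; the function name is hypothetical, and the pattern only matches the "(cherry picked from commit ...)" line shown above.

```python
import re

# Matches the note that `git cherry-pick -x` appends to the commit message.
CHERRY_PICK_NOTE = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")


def cherry_pick_sources(message):
    """Return the commit ids a message says it was cherry-picked from."""
    return CHERRY_PICK_NOTE.findall(message)


message = (
    "added a line on the main branch\n"
    "(cherry picked from commit 189ef03b4f4ed5078328f7965c7bfecce318490d)\n"
)
print(cherry_pick_sources(message))
```

Unlike comparing whole messages, this ties the back-patched commit to one specific original commit, so it keeps working even if the message text is edited on the back branch.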
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/1/09, Markus Wanner mar...@bluegap.ch wrote:
> a newish conversion with cvs2git is available to check here:
>
> git://www.bluegap.ch/
>
> (it's not incremental and will only stay for a few days)

+1 for the idea of replacing CVS usernames with full names. The knowledge about CVS usernames will become increasingly obscure.

Also worth mentioning: there is no need to assign absolutely up-to-date email addresses; it's enough if they uniquely identify a person.

> Aidan Van Dyk wrote:
>> Yes, but the point is you want an exact replica of CVS right? Your git repo should have $PostgreSQL$ and the cvs export/checkout (you do use -kk right) should also have $PostgreSQL$.
>
> No, I'm testing against cvs checkout, as that's what everybody is used to.
>
>> But it's important, because on *some* files you *do* want expanded keywords (like the $OpenBSD ... Exp $. One of the reasons pg CVS went to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly de-couple them from other keywords that they didn't want munging on.
>
> I don't care half as much about the keyword expansion stuff - that's doomed to disappear anyway.

But this is one aspect we need to get right for the conversion.

So preferably we test it sooner, not later.

I think Aidan got it right - expand $PostgreSQL$ and others that are actually expanded on the current repo, but not $OpenBSD$ and others coming from external sources.

> What I'm much more interested in is correctness WRT historic contents, i.e. that git log, git blame, etc.. deliver correct results. That's certainly harder to check.
>
> In my experience, cvs2svn (or cvs2git) does a pretty decent job at that, even in case of some corruptions. Plus it offers lots of options to fine tune the conversion, see the attached configuration I've used.
>
>> So, I wouldn't consider any conversion good unless it had all these:
>>
>> As well as stuff like:
>> parsecvs-master:src/backend/access/index/genam.c: * $PostgreSQL$
>
> I disagree here and find it more convenient for the git repository to keep the old RCS versions - as in the source tarballs that got (and still get) shipped. Just before switching over to git one can (and should, IMO) remove these tags to avoid confusion.

I'd prefer we immediately test the full conversion and not leave some steps to the last moment.

-- 
marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/2/09, Marko Kreen mark...@gmail.com wrote:
> On 6/1/09, Markus Wanner mar...@bluegap.ch wrote:
>> a newish conversion with cvs2git is available to check here:
>>
>> git://www.bluegap.ch/
>>
>> (it's not incremental and will only stay for a few days)

Btw this conversion seems broken as it contains random merge commits.

parsecvs managed to do it without them.

-- 
marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Marko Kreen mark...@gmail.com:
>> I don't care half as much about the keyword expansion stuff - that's doomed to disappear anyway.
>
> But this is one aspect we need to get right for the conversion.

What's your definition of right? I personally prefer the keyword expansion to match a cvs checkout as closely as possible.

> So preferably we test it sooner not later.

I actually *am* testing against that. As mentioned, the only differences are insignificant, IMO. For example having 1.1.1.1 instead of 1.1 (or vice versa, I don't remember).

> I think Aidan got it right - expand $PostgreSQL$ and others that are actually expanded on current repo, but not $OpenBSD$ and others coming from external sources.

AFAIU Aidan proposed the exact opposite. I'm proposing to leave both expanded, as in a CVS checkout and as shipped in the source release tarballs.

> I'd prefer we immediately test full conversion and not leave some steps to last moment.

IMO that would amount to changing history, so that a checkout from git no longer matches a released tarball as closely as possible.

What you call "leave(ing) some steps to last moment" is IMO not part of the conversion. It's rather a conscious decision to drop these keywords as soon as we switch to git. This step should be represented in history as a separate commit, IMO.

What do others think?

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
> Quoting Marko Kreen mark...@gmail.com:
>>> I don't care half as much about the keyword expansion stuff - that's doomed to disappear anyway.
>>
>> But this is one aspect we need to get right for the conversion.
>
> What's your definition of right? I personally prefer the keyword expansion to match a cvs checkout as closely as possible.

This is Definitely Wrong (tm). You seem to be thinking that comparing a GIT checkout to a random parallel CVS checkout (e.g. from a .tgz) is the main use-case. It is not. Browsing history and looking at diffs between versions is. And expanded CVS keywords would be a total PITA for that.

>> So preferably we test it sooner not later.
>
> I actually *am* testing against that. As mentioned, the only differences are insignificant, IMO. For example having 1.1.1.1 instead of 1.1 (or vice versa, I don't remember).

Why have those at all...

>> I think Aidan got it right - expand $PostgreSQL$ and others that are actually expanded on current repo, but not $OpenBSD$ and others coming from external sources.
>
> AFAIU Aidan proposed the exact opposite.

Ah, sorry, my thinko. s/expanded/stripped/. Take Aidan's description as authoritative.. :)

> I'm proposing to leave both expanded, as in a CVS checkout and as shipped in the source release tarballs.

No, the noise they add to history would seriously hurt usability.

>> I'd prefer we immediately test full conversion and not leave some steps to last moment.
>
> IMO that would equal to changing history, so that a checkout from git doesn't match a released tarball as good as possible.

We need to compare against tarballs only when checking the conversion. And only then. Writing a few scripts for that should not be a problem.

> What you call leave(ing) some steps to last moment is IMO not part of the conversion. It's rather a conscious decision to drop these keywords as soon as we switch to git. This step should be represented in history as a separate commit, IMO.

The question is how they should appear in historical commits.

I have no strong opinion on whether to edit them out in the future. Doing it during the periodic reindent would be a good moment, though.

-- 
marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Marko Kreen mark...@gmail.com:
> Btw this conversion seems broken as it contains random merge commits.

Well, that's a feature, not a bug ;-)

When a commit adds a file to the master *and* then to the branch as well, cvs2git prefers to represent this as a merge from the master branch, instead of adding the file twice, once on the master and once on the branch.

This way the target VCS knows it's the *same* file, originating from one single commit. This may be important for later merges - otherwise you may suddenly end up with duplicated files after a merge, because the VCS doesn't know they are in fact the same. (Okay, git assumes two files to have the same origin/history as long as they have the same filename. But just rename one of the two, and you have the same troubles again.)

Also note that these situations occur rather frequently in the Postgres CVS repository. Every back-patch which adds files ends up as a merge. (One could even argue that in the perfect conversion *all* back-patches should be represented as merges, rather than as separate commits.)

> parsecvs managed to do it without them.

Now, I'm not calling it broken, but cvs2git's output is arguably better in that regard.

As you certainly see by now, conversion from CVS is neither simple nor unambiguous.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Marko Kreen mark...@gmail.com:
> This is Definitely Wrong (tm). You seem to be thinking that comparing GIT checkout to random parallel CVS checkout (eg. from .tgz.) is the main use-case. It is not. Browsing history and looking and diffs between versions is. And expanded CVS keywords would be total PITA for that.

That's an argument. Point taken. I'll check if cvs2git supports that as well.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote: Quoting Marko Kreen mark...@gmail.com: Btw this conversion seems broken as it contains random merge commits. Well, that's a feature, not a bug ;-) When a commit adds a file to the master *and* then to the branch as well, cvs2git prefers to represent this as a merge from the master branch, instead of adding the file twice, once on the master and once on the branch. This way the target VCS knows it's the *same* file, originating from one single commit. This may be important for later merges - otherwise you may suddenly end up with duplicated files after a merge, because the VCS doesn't know they are in fact the same. (Okay, git assumes two files to have the same origin/history as long as they have the same filename. But just rename one of the two, and you are have the same troubles, again). Not a problem for git I think - it assumes they are same if they have same contents... Also note that these situations occur rather frequently in the Postgres CVS repository. Every back-patch which adds files ends up as a merge. (One could even argue that in the perfect conversion *all* back-patches should be represented as merges, rather than as separate commits). Well, such behaviour may be a feature for some repo with complex CVS usage, but currently we should aim for simple and clear conversion. The question is - do such merges make any sense to human looking at history - and the answer is no, as no VCS level merge was happening, just some copying around (if your description is correct). And we don't need to add noise for the benefit of GIT as it works fine without any fake merges. Our target should be each branch having simple linear history, without any fake merges. This will result in minimal confusion to both humans looking history and also GIT itself. So please turn the merge logic off. If this cannot be turned off, cvs2git is not usable for conversion. parsecvs managed to do it without them. 
Now, I'm not calling it broken, but cvs2git's output is arguably better in that regard. Seems it contains more complex logic to handle more complex CVS usage cases, but seems like overkill for us if it creates a mess of history. As you certainly see by now, conversion from CVS is neither simple nor unambiguous. I know, that's why I'm discussing the tradeoffs. Simple+clear vs. complex+messy. :) -- marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Markus Wanner mar...@bluegap.ch [090602 07:08]: Hi, Quoting Marko Kreen mark...@gmail.com: I don't care half as much about the keyword expansion stuff - that's doomed to disappear anyway. But this is one aspect we need to get right for the conversion. What's your definition of right? I personally prefer the keyword expansion to match a cvs checkout as closely as possible. AFAIU Aidan proposed the exact opposite. I'm proposing to leave both expanded, as in a CVS checkout and as shipped in the source release tarballs. Well, since I have -kk set in my .cvsrc, mine matches exactly the CVS checkout l-) Basically, I want the git to be identical to the cvs checkout. If you use -kk, that means the PostgreSQL CVS repository keywords *aren't* expanded. If you like -kv, that means they are. Pick your poison (after all, it's CVS), either way, I think the 2 of *us* are going to disagree which is best here ;-) But, whichever way (exact to -kk or exact to -kv), the conversion should be exact, and there should be no reason to filter out keyword-like stuff in the diffs. What you call leave(ing) some steps to last moment is IMO not part of the conversion. It's rather a conscious decision to drop these keywords as soon as we switch to git. This step should be represented in history as a separate commit, IMO. What do others think? I'm assuming they will get removed from the source eventually too - but that step is *outside* the conversion. Somebody could do it now in CVS before the conversion, or afterwards, but it's still outside the conversion. -- Aidan Van Dyk ai...@highrise.ca http://www.highrise.ca/ Create like a god, command like a king, work like a slave.
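For readers comparing a -kv checkout against a -kk one, a rough sketch of the difference may help. With -kv a keyword appears as `$Keyword: value $`, with -kk only as `$Keyword$`. The helper below collapses the former into the latter so two trees can be compared modulo keyword expansion. The keyword list is an assumption (a subset of the standard RCS keywords plus the project's $PostgreSQL$; the multi-line $Log$ keyword is left out on purpose), and the function names are made up for this illustration.

```python
import re

# Assumed keyword set: common RCS keywords plus the custom $PostgreSQL$ one.
KEYWORDS = ("Author", "Date", "Header", "Id", "Name", "Locker",
            "RCSfile", "Revision", "Source", "State", "PostgreSQL")

# Matches "$Keyword$" as well as "$Keyword: some value $" on a single line.
KEYWORD_RE = re.compile(r"\$(%s)(?::[^$\n]*)?\$" % "|".join(KEYWORDS))


def collapse_keywords(text):
    """Return text with every expanded keyword reduced to its -kk form."""
    return KEYWORD_RE.sub(lambda m: "$%s$" % m.group(1), text)


def differs_only_in_keywords(a, b):
    """True if two file contents are equal once keywords are collapsed."""
    return collapse_keywords(a) == collapse_keywords(b)


# Hypothetical header line, not taken from the real repository.
expanded = "/* $PostgreSQL: some/file.c,v 1.2 2009/06/02 12:00:00 someone Exp $ */"
print(collapse_keywords(expanded))
```

Used this way, a conversion that is "exact to -kk" and one that is "exact to -kv" should compare equal file by file, which is one way to check that keyword handling is the only difference between them.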
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Quoting Aidan Van Dyk ai...@highrise.ca: Pick your poison (after all, it's CVS), either way, I think the 2 of *us* are going to disagree which is best here ;-) Marko already convinced me of -kk, I'm trying that with cvs2git. But, which ever way (exact to -kk or exact to -kv), the conversion should be exact, and there should be no reason to filter out keyword-like stuff in the diffs. I just really didn't want to care about keyword expansion. Besides lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO. ;-) I'll let you know how cvs2git behaves WRT -kk. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Markus Wanner mar...@bluegap.ch [090602 09:37]: Marko already convinced me of -kk, I'm trying that with cvs2git. Good ;-) I just really didn't want to care about keyword expansion. Besides lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO. ;-) Absolutely... And one of the reasons I've had -kk in my .cvsrc for years, even before I started with git. I'll let you know how cvs2git behaves WRT -kk. Cool.. a. -- Aidan Van Dyk ai...@highrise.ca http://www.highrise.ca/ Create like a god, command like a king, work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Quoting Marko Kreen mark...@gmail.com: Not a problem for git I think Knowing that git doesn't track files as hard as monotone, I certainly doubt that. - it assumes they are same if they have same contents... Why do you assume they have the same contents? Obviously these are different branches, where files can (and will!) have different contents. Well, such behaviour may be a feature for some repo with complex CVS usage, but currently we should aim for simple and clear conversion. First of all, we should aim for a correct one. The question is - do such merges make any sense to a human looking at history - and the answer is no, as no VCS level merge was happening, just some copying around (if your description is correct). And we don't need to add noise for the benefit of GIT as it works fine without any fake merges. For low expectations of "it works", maybe yes. However if you don't tell git, it has no chance of knowing that two (different) files should actually be the same. Try the following:

    git init
    echo "base" > basefile
    git add basefile
    git commit -m "base commit"
    git checkout -b branch
    echo "hello, world" > testfile
    git add testfile
    git commit testfile -m "addition on branch"
    git checkout master
    echo "hello world" > testfile
    git add testfile
    git commit testfile -m "addition on master"
    # here we are at a similar point as after a lacking conversion, having two
    # distinct, i.e. historically independent files called testfile
    git mv testfile movedfile
    git commit -m "file moved"
    git checkout branch
    git merge master
    ls
    # Bang, you suddenly have 'testfile' and 'movedfile', go figure!

I leave it as an exercise for the reader to try the same with a single historic origin of the file, as cvs2git does the conversion. Our target should be each branch having simple linear history, without any fake merges. This will result in minimal confusion to both humans looking at history and also GIT itself. I don't consider the above a minimal confusion. And concerning humans...
you get used to merge commits pretty quickly. I for one am more confused by a linear history which in fact is not. As mentioned before, I'd personally favor *all* of the back-ports to actually be merges of some sort, because that's what they effectively are. However, that also bring up the question of how we are going to do back-patches in the future with git. So please turn the merge logic off. If this cannot be turned off, cvs2git is not usable for conversion. As far as I know, it cannot be turned off. Use parsecvs if you want to get silly side effects later on in history. ;-) Seems it contains more complex logic to handle more complex CVS usage cases, but seems like overkill for us if it creates a mess of history. You consider it a mess, I consider it a better and more valid representation of the mess that CVS is. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Markus Wanner mar...@bluegap.ch [090602 10:23]: # Bang, you suddenly have 'testfile' and 'movedfile', go figure! I leave it as an exercise for the reader to try the same with a single historic origin of the file, as cvs2git does the conversion. Sure, and we can all construct examples where that move is both right and wrong... But the point is that in PostgreSQL, (and that may be mainly because we're using CVS), merges *aren't* something that happens. Patches are written against HEAD (master) and then back-patched... If you want to turn PostgreSQL development on its head, then we can switch this around, so that patches are always done on the oldest branch, and fixes always merged forward... I'm not going to be the one that pushes that though ;-) I don't consider the above a minimal confusion. And concerning humans... you get used to merge commits pretty quickly. I for one am more confused by a linear history which in fact is not. But the fact is, everyone using CVS wants a linear history. All they care about is cvs update...wait...cvs update ... time ... cvs update .. Everything *was* linear to them. Any merge type things certainly weren't intentional in CVS... As mentioned before, I'd personally favor *all* of the back-ports to actually be merges of some sort, because that's what they effectively are. However, that also brings up the question of how we are going to do back-patches in the future with git. Well, if people get comfortable with it, I expect that backports don't happen.. Bugs are fixed where they happen, and merged forward into all affected later development based on the bugged area. As far as I know, it cannot be turned off. Use parsecvs if you want to get silly side effects later on in history. ;-) Ya, that's one of the reasons I considered parsecvs the leading candidate...
And why I went through, and showed that with the exception of the one REL_8_0_0 tip, it *was* an exact copy of the current CVS repository (minus the 1 messed up tag in the repository). You consider it a mess, I consider it a better and more valid representation of the mess that CVS is. So much better that it makes the history as useless as CVS... I think one of the reasons people are wanting to move from CVS to git is that it makes things *better*... The exact history will *always* be available, right in CVS if people need it. I think the goal is to make the git history as close to CVS as possible, such that it's useful. I mean, if we want it to be a more valid representation, then really, we should be doing every file change in a single commit, and merging that file commit into the branch *every* *single* *time*... I don't think anybody wants our conversion to be that much better and more valid representation of the mess that CVS is... It's a balance... We're moving because we want *better* tools and access, not the same mess that CVS is. -- Aidan Van Dyk ai...@highrise.ca http://www.highrise.ca/ Create like a god, command like a king, work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Aidan Van Dyk wrote: * Markus Wanner mar...@bluegap.ch [090602 10:23]: # Bang, you suddenly have 'testfile' and 'movedfile', go figure! I leave it as an exercise for the reader to try the same with a single historic origin of the file, as cvs2git does the conversion. Sure, and we can all construct examples where that move is both right and wrong... But the point is that in PostgreSQL, (and that may be mainly because we're using CVS), merges *aren't* something that happens. Patches are written against HEAD (master) and then back-patched... If you want to turn PostgreSQL development on its head, then we can switch this around, so that patches are always done on the oldest branch, and fixes always merged forward... The Monotone folk call this daggy fixes and it seems a clean way to handle things. http://www.monotone.ca/wiki/DaggyFixes/ However, I'm not going to be the one that pushes that though ;-) I'm not either. Maybe someday we'll be familiar enough with the tools to make things this way, but I think just after the migration we'll mainly want to be able to press on with development and not waste too much time learning the new toys. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Aidan Van Dyk ai...@highrise.ca writes: * Markus Wanner mar...@bluegap.ch [090602 10:23]: You consider it a mess, I consider it a better and more valid representation of the mess that CVS is. So much better that it makes the history as useless as CVS... I think one of the reasons people are wanting tomove from CVS to git is that it makes things *better*... FWIW, the tool that I customarily use (cvs2cl) considers commits on different branches to be the same if they have the same commit message and occur sufficiently close together (within a few minutes). My committing habits have been designed around that behavior for years, and I believe other PG committers have been doing likewise. I would consider a git conversion to be less useful to me, not more, if it insists on showing me such cases as separate commits --- and if it then adds useless merge messages on top of that, I'd start to get seriously annoyed. What we want here is a readable equivalent of the CVS history, not necessarily something that is theoretically an exact equivalent. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
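The grouping rule described above (same commit message, close together in time, regardless of branch) can be sketched roughly as follows. This is not cvs2cl's actual code, only an illustration of the heuristic; the five-minute window, the tuple layout and the sample data are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed window; the description above only says "within a few minutes".
WINDOW = timedelta(minutes=5)


def group_commits(commits, window=WINDOW):
    """Group (branch, message, timestamp) tuples into logical commits.

    An entry joins an existing group when its message matches and it is no
    more than `window` later than the latest entry already in that group.
    """
    groups = []
    latest = {}  # message -> index of the most recent group with that message
    for branch, message, when in sorted(commits, key=lambda c: c[2]):
        idx = latest.get(message)
        if idx is not None and when - groups[idx]["last"] <= window:
            groups[idx]["branches"].append(branch)
            groups[idx]["last"] = when
        else:
            latest[message] = len(groups)
            groups.append({"message": message, "branches": [branch],
                           "first": when, "last": when})
    return groups


# Hypothetical back-patched fix, committed to three branches in a row.
t0 = datetime(2009, 6, 2, 11, 0, 0)
sample = [
    ("HEAD", "Fix foo", t0),
    ("REL8_3_STABLE", "Fix foo", t0 + timedelta(minutes=1)),
    ("REL8_2_STABLE", "Fix foo", t0 + timedelta(minutes=2)),
    ("HEAD", "Other change", t0 + timedelta(minutes=3)),
    ("HEAD", "Fix foo", t0 + timedelta(hours=2)),
]
for group in group_commits(sample):
    print(group["message"], group["branches"])
```

A git history cannot store such a group as one commit, since each branch gets its own tree, but a log viewer could apply the same rule on top of separate per-branch commits.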
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Quoting Aidan Van Dyk ai...@highrise.ca: Sure, and we can all construct example where that move is both right and wrong... Huh? The problem is the file duplication. The move is an action of a committer - it's neither right nor wrong in this example. I cannot see any use case for seemingly random files poping up out of nowhere, just because git doesn't know how to merge two files after a mv and a merge. But the point is that in PostgreSQL, (and that may be mainly because we're using CVS), merges *aren't* something that happens. Patches are written against HEAD (master) and then back-patched... ..which can (and better is) represented as a merge in git (for the sake of comfortable automated merging). If you want to turn PostgreSQL devellopment on it's head, then we can switch this around, so that patches are always done on the oldest branch, and fixes always merged forward... I'd consider that good use of tools, yes. However, I realize that this probably is pipe-dreaming... But the fact is, everyone using CVS wants a linear history. All they care about is cvs update...wait...cvs update ... time ... cvs update .. Everything *was* linear to them. Any merge type things certaily wasn't intentional in CVS... ..no, it just wasn't possible in CVS. Switching to git, people soon want merge type things. Heck, it's probably *the* reason for switching to git. So much better that it makes the history as useless as CVS... I think one of the reasons people are wanting tomove from CVS to git is that it makes things *better*... Yes, especially merging. Please don't cripple that ability just because CVS once upon a time enforced a linear history. The exact history will *always* be available, right in CVS if people need it. Agreed. Please note that I mostly talk about a more correct representation *of history*, as it happened. This has nothing to do with single commits per file. It's a balance... We're moving because we want *better* tools and access, not the same mess that CVS is. 
Agreed. And please cut as many of its burdens of the past, like linearity. History is not linear and has never been. But I'm stopping now before getting overly philosophic... Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Tue, Jun 2, 2009 at 4:02 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: The Monotone folk call this daggy fixes and it seems a clean way to handle things. http://www.monotone.ca/wiki/DaggyFixes/ Is this like what git calls an octopus? I've been wondering what the point of such things were. Or maybe not. I thought an octopus was two patches with the same parent -- ie, two patches that could independently be applied in any order. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote: Aidan Van Dyk ai...@highrise.ca writes: * Markus Wanner mar...@bluegap.ch [090602 10:23]: You consider it a mess, I consider it a better and more valid representation of the mess that CVS is. So much better that it makes the history as useless as CVS... I think one of the reasons people are wanting tomove from CVS to git is that it makes things *better*... FWIW, the tool that I customarily use (cvs2cl) considers commits on different branches to be the same if they have the same commit message and occur sufficiently close together (within a few minutes). My committing habits have been designed around that behavior for years, and I believe other PG committers have been doing likewise. I would consider a git conversion to be less useful to me, not more, if it insists on showing me such cases as separate commits --- and if it then adds useless merge messages on top of that, I'd start to get seriously annoyed. They cannot be same commits in GIT as the resulting tree is different. You could tie them with some sort of merge commits, but doubt the result would be worth the noise. Also I doubt there is tool grokking such commits anyway, the merge discussion above was for full files with exact contents appearing in several branches. What we want here is a readable equivalent of the CVS history, not necessarily something that is theoretically an exact equivalent. I suggest setting the goal to be simple and clear representation of CVS history that we can make sense later, instead of revising CVS history to look like we used some better VCS system... -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote: [academic nitpicking] Sorry, not going there. Just look at the state of VCS systems that have prioritized academic issues instead of practicality... (arch/darcs/monotone/etc..) So please turn the merge logic off. If this cannot be turned off, cvs2git is not usable for conversion. As far as I know, it cannot be turned off. Use parsecvs if you want to get silly side effects later on in history. ;-) --no-cross-branch-commits seems to go sort of in that direction? And what silly side effects are you talking about? I see only cvs2git doing silly things... (I'm talking only in the context of the Postgres CVS repo, not in general.) Seems it contains more complex logic to handle more complex CVS usage cases, but seems like overkill for us if it creates a mess of history. You consider it a mess, I consider it a better and more valid representation of the mess that CVS is. Note that a merge is not file-level but tree-level. Also note we don't use branches for feature development but for major version maintenance. So how can a single file appearing in 2 branches mean a merge of 2 trees? How can that be valid? -- marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Tue, Jun 2, 2009 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: Aidan Van Dyk ai...@highrise.ca writes: * Markus Wanner mar...@bluegap.ch [090602 10:23]: You consider it a mess, I consider it a better and more valid representation of the mess that CVS is. So much better that it makes the history as useless as CVS... I think one of the reasons people are wanting tomove from CVS to git is that it makes things *better*... FWIW, the tool that I customarily use (cvs2cl) considers commits on different branches to be the same if they have the same commit message and occur sufficiently close together (within a few minutes). My committing habits have been designed around that behavior for years, and I believe other PG committers have been doing likewise. Interesting. I was wondering why all your commit messages always show up simultaneously for all the back branches. I would consider a git conversion to be less useful to me, not more, if it insists on showing me such cases as separate commits --- and if it then adds useless merge messages on top of that, I'd start to get seriously annoyed. There's no help for them being separate commits, but I agree that useless merge commits are a bad thing. There are plenty of ways to avoid that, though; I've been using git cherry-pick a lot recently, and I think git rebase --onto also has some potential. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Quoting Tom Lane t...@sss.pgh.pa.us: FWIW, the tool that I customarily use (cvs2cl) considers commits on different branches to be the same if they have the same commit message and occur sufficiently close together (within a few minutes). My committing habits have been designed around that behavior for years, and I believe other PG committers have been doing likewise. Yeah, that's how I see things as well. I would consider a git conversion to be less useful to me, not more, if it insists on showing me such cases as separate commits --- and if it then adds useless merge messages on top of that, I'd start to get seriously annoyed. Hm.. well, in git, there's no such thing as a commit that spans multiple branches. So it's impossible to fulfill both of your wishes here. parsecvs creates multiple independent commits in such a case. cvs2git creates a single commit and propagates this to the back branches with merge commits (however, only if new files are added, otherwise it does the same as parsecvs). What we want here is a readable equivalent of the CVS history, not necessarily something that is theoretically an exact equivalent. Understood. However, readability depends on the user's habits. But failing to merge due to a lacking conversion potentially hurts everybody who wants to merge. Having used merging (in combination with renaming) often enough, I'd certainly be pretty annoyed if merges suddenly begin to bring up spurious file duplicates. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, Quoting Marko Kreen mark...@gmail.com: Sorry, not going there. Just look at the state of VCS systems that have prioritized academic issues instead of practicality... (arch/darcs/monotone/etc..) I already am there. And I don't want to go back, thanks. But my bias for monotone certainly shines through, yes ;-) --no-cross-branch-commits seems sort of that direction? Yes, that could lead to the same defect. Uhm.. thank you for pointing that out, I'm not gonna try it, sorry. And what silly side effects are you talking about? I'm talking about spurious file duplicates popping up after a rename and a merge, see my example in this thread. You consider it a mess, I consider it a better and more valid representation of the mess that CVS is. Note that merge is not file-level but tree-level. Depends on your point of view. Each file gets merged pretty individually, but the result ends up in a single commit, yes. Also note we don't use branches for feature development but for major version maintenance. So? You think you are never going to merge? So how can a single file appearing in 2 branches mean a merge of 2 trees? How can that be valid? I'm not sure what you are questioning here. I find it perfectly reasonable to build something on top of REL8_3_STABLE and later on want to merge to REL8_4_STABLE. And I don't want to manually merge my changes, just because of a rename in 8.4 and a bad decision during the migration to git. (And no, I don't think any of the other git tools will help with this, due to the academic-nitpick-reasons above). Regards Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote: Quoting Marko Kreen mark...@gmail.com: And what silly side effects are you talking about? I'm talking about spurious file duplicates popping up after a rename and a merge, see my example in this thread. The example was not an actual case from Postgres CVS history, but a hypothetical situation, without checking if it already works with GIT. Also note we don't use branches for feature development but for major version maintenance. So? You think you are never going to merge? So how can a single file appearing in 2 branches mean a merge of 2 trees? How can that be valid? I'm not sure what you are questioning here. I find it perfectly reasonable to build something on top of REL8_3_STABLE and later on wanting to merge to REL8_4_STABLE. And I don't want to manually merge my changes, just because of a rename in 8.4 and a bad decision during the migration to git. (And no, I don't think any of the other git tools will help with this, due to the academic-nitpick-reasons above). Merging between branches with GIT is a fine workflow in the future. But we are currently discussing how to convert CVS history to GIT. My point is that we should avoid fake merges, to avoid obfuscating history. -- marko
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Aidan Van Dyk wrote: * Markus Wanner mar...@bluegap.ch [090602 10:23]: As mentioned before, I'd personally favor *all* of the back-ports to actually be merges of some sort, because that's what they effectively are. However, that also brings up the question of how we are going to do back-patches in the future with git. Well, if people get comfortable with it, I expect that backports don't happen.. Bugs are fixed where they happen, and merged forward into all affected later development based on the bugged area. I imagine the closest thing to existing practices would be that people would use git-cherry-pick -x -n to backport only the commits they wanted from the current branch into the back branches. AFAICT, this doesn't record a merge in the GIT history, but looks a lot like the linear history from CVS - with the exception that the comment added by -x explicitly refers to the exact commit from the main branch.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi, a newish conversion with cvs2git is available to check here: git://www.bluegap.ch/ (it's not incremental and will only stay for a few days) For everybody interested, please check the committer names and emails. I'm missing the names and email addresses for these committers: 'barry' : ('barry??', ''), 'dennis' : ('Dennis??', ''), 'inoue' : ('inoue??', ''), 'jurka' : ('jurka??', ''), 'pjw' : ('pjw??', ''), And I'm guessing that 'peter' is the same as 'petere': 'peter' : ('Peter Eisentraut (?)', 'pete...@gmx.net'), I've compared all branch heads and all tags with a cvs checkout. The only differences are keyword expansion errors. Most commonly the RCS version 1.1 is used in the resulting git repository, instead of version 1.1.1.1. This also leads to getting dates wrong ($Date keyword). I'm unsure on how to test Tom's requirement that every commit and its log message is included in the resulting git repository. Feel free to clone and inspect the mentioned git repository and propose improvements on the cvs2git options used. Aidan Van Dyk wrote: Yes, but the point is you want an exact replica of CVS right? You're git repo should have $PostgreSQL$ and the cvs export/checkout (you do use -kk right) should also have $PostgreSQL$. No, I'm testing against cvs checkout, as that's what everybody is used to. But it's important, because on *some* files you *do* want expanded keywords (like the $OpenBSD ... Exp $. One of the reasons pg CVS went to the $PostgreSQL$ keyword (I'm guessing) was so they could explictly de-couple them from other keywords that they didn't want munging on. I don't care half as much about the keyword expansion stuff - that's doomed to disappear anyway. What I'm much more interested in is correctness WRT historic contents, i.e. that git log, git blame, etc.. deliver correct results. That's certainly harder to check. In my experience, cvs2svn (or cvs2git) does a pretty decent job at that, even in case of some corruptions. 
Plus it offers lots of options to fine tune the conversion, see the attached configuration I've used. So, I wouldn't consider any conversion good unless it had all these: As well as stuff like: parsecvs-master:src/backend/access/index/genam.c: * $PostgreSQL$ I disagree here and find it more convenient for the git repository to keep the old RCS versions - as in the source tarballs that got (and still get) shipped. Just before switching over to git one can (and should, IMO) remove these tags to avoid confusion. Regards Markus Wanner

# (Be in -*- mode: python; coding: utf-8 -*- mode.)
import re
from cvs2svn_lib import config
from cvs2svn_lib import changeset_database
from cvs2svn_lib.common import CVSTextDecoder
from cvs2svn_lib.log import Log
from cvs2svn_lib.project import Project
from cvs2svn_lib.git_revision_recorder import GitRevisionRecorder
from cvs2svn_lib.git_output_option import GitRevisionMarkWriter
from cvs2svn_lib.git_output_option import GitOutputOption
from cvs2svn_lib.revision_manager import NullRevisionRecorder
from cvs2svn_lib.revision_manager import NullRevisionExcluder
from cvs2svn_lib.fulltext_revision_recorder \
     import SimpleFulltextRevisionRecorderAdapter
from cvs2svn_lib.rcs_revision_manager import RCSRevisionReader
from cvs2svn_lib.cvs_revision_manager import CVSRevisionReader
from cvs2svn_lib.checkout_internal import InternalRevisionRecorder
from cvs2svn_lib.checkout_internal import InternalRevisionExcluder
from cvs2svn_lib.checkout_internal import InternalRevisionReader
from cvs2svn_lib.symbol_strategy import AllBranchRule
from cvs2svn_lib.symbol_strategy import AllTagRule
from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule
from cvs2svn_lib.symbol_strategy import ExcludeRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceBranchRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceTagRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ExcludeTrivialImportBranchRule
from cvs2svn_lib.symbol_strategy import ExcludeVendorBranchRule
from cvs2svn_lib.symbol_strategy import HeuristicStrategyRule
from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule
from cvs2svn_lib.symbol_strategy import HeuristicPreferredParentRule
from cvs2svn_lib.symbol_strategy import SymbolHintsFileRule
from cvs2svn_lib.symbol_transform import ReplaceSubstringsSymbolTransform
from cvs2svn_lib.symbol_transform import RegexpSymbolTransform
from cvs2svn_lib.symbol_transform import IgnoreSymbolTransform
from cvs2svn_lib.symbol_transform import NormalizePathsSymbolTransform
from cvs2svn_lib.property_setters import AutoPropsPropertySetter
from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter
from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter
from cvs2svn_lib.property_setters import CVSRevisionNumberSetter
from cvs2svn_lib.property_setters import DefaultEOLStyleSetter
from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Markus Wanner mar...@bluegap.ch writes: I'm missing the names and email addresses for these committers: 'barry' : ('barry??', ''), Barry Lind, formerly one of the JDBC bunch, been inactive for awhile 'dennis' : ('Dennis??', ''), I suppose this must be Dennis Björklund, but I didn't realize he used to be a committer. 'inoue' : ('inoue??', ''), Hiroshi Inoue, still active, but ODBC is not part of core anymore 'jurka' : ('jurka??', ''), Kris Jurka, still active, but JDBC is not part of core anymore 'pjw' : ('pjw??', ''), Philip Warner, inactive (still reads the lists though) And I'm guessing that 'peter' is the same as 'petere': 'peter' : ('Peter Eisentraut (?)', 'pete...@gmx.net'), No, that would be Peter Mount, also a retired JDBC hacker. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote:
> Markus Wanner mar...@bluegap.ch writes:
>> 'dennis' : ('Dennis??', ''),
> I suppose this must be Dennis Björklund, but I didn't realize he used to be a committer.

IIRC he was given commit privs for translation files.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Alvaro Herrera alvhe...@commandprompt.com writes:
> Tom Lane wrote:
>> I suppose this must be Dennis Björklund, but I didn't realize he used to be a committer.
> IIRC he was given commit privs for translation files.

Ah, right, that does ring a bell now.

BTW, Markus: you do realize "thomas" is not me but Tom Lockhart?

regards, tom lane
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Robert Haas robertmh...@gmail.com:
> That's not the best news I've had today...

Sorry :-(

> To me they sound complex and inconvenient. I guess I'm kind of mystified by why we can't make this work reliably. Other than the broken tags issue we've discussed, it seems like the only real issue should be how to group changes to different files into a single commit. Once you do that, you should be able to construct a well-defined, total function f : (cvs-file, cvs-revision) -> git commit which is surjective on the space of git commits. In fact it might be a good idea to explicitly construct this mapping and drop it into a database table somewhere so that people can sanity check it as much as they wish. Why is this harder than I think it is?

Well, as CVS doesn't guarantee any consistency between files, you end up with silly situations more often than you think. One of the simplest possible examples is something like:

  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2: fileA @ 1.2, fileB @ 1.1

Seen from fileA, it's obvious that commit 1 (@1.1) comes before commit 2 (@1.2), but seen from fileB it's the exact opposite.

The most promising approach to solve these problems seems to be based on Graph Theory, where you work with a graph of dependencies from fileA @ 1.1 to fileA @ 1.2. To resolve the above situation, you'd have to split a blob of single-file commits into two end-result commits (for monotone / git). In the above example, you'd have two options to resolve the conflict:

  commit 1a: fileA @ 1.1
  commit 2:  fileA @ 1.2, fileB @ 1.1
  commit 1b: fileB @ 1.2

Or:

  commit 2a: fileB @ 1.1
  commit 1:  fileA @ 1.1, fileB @ 1.2
  commit 2b: fileA @ 1.2

(Note that often enough, these have actually been separate commits in CVS as well, there's just no way to represent that. And no, timestamps are simply not reliable enough).

Now add tags, branches and cyclic dependencies involving many files and many hundreds of commits to the example above and you start to get an idea of the complexity of the problem in general. See my description and diagrams of the steps used for cvs_import in monotone at [1] or follow descriptions of how cvs2svn works internally.

A few numbers about a conversion I'm trying for testing my algorithm and heuristics. It's converting a pretty recent snapshot of the Postgres repository:

 * running at 100% CPU time since: April, 17
 * total number of files involved: 6'847
 * total number of blobs (before splitting): 28'010
 * blobs split due to cyclic dependencies: 12'801

Admittedly, my algorithm isn't optimized at all. However, I'm focusing on good results rather than speed of conversion.

Also note that monotone uses SQLite, so it actually stores the results of this conversion in an SQL database, as you proposed. Recently, a git_export command has been added, so that's definitely worth a try for converting CVS to git. However, I fear cvs2git is more mature.

Regards

Markus Wanner

[1]: a description of the various steps in conversion from CVS to monotone: http://www.monotone.ca/wiki/CvsImport/
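The fileA/fileB conflict described above can be checked mechanically. What follows is only a minimal sketch in Python, not how cvs2svn or monotone actually do it: it assumes each blob is a plain dict mapping file name to a trunk revision string, and the helper names (rev_key, dependency_edges, has_cycle) are invented for this illustration. Branch revisions such as 1.2.2.1 are deliberately not handled.

```python
from itertools import combinations


def rev_key(rev):
    """Turn a trunk revision like '1.10' into (1, 10) so it compares numerically."""
    return tuple(int(part) for part in rev.split("."))


def dependency_edges(blobs):
    """Yield (earlier, later) blob names implied by the per-file revision order."""
    for (name_a, files_a), (name_b, files_b) in combinations(blobs.items(), 2):
        for path in files_a.keys() & files_b.keys():
            key_a, key_b = rev_key(files_a[path]), rev_key(files_b[path])
            if key_a < key_b:
                yield (name_a, name_b)
            elif key_b < key_a:
                yield (name_b, name_a)


def has_cycle(blobs):
    """Return True if the blobs cannot be put into one consistent order."""
    graph = {name: set() for name in blobs}
    for earlier, later in dependency_edges(blobs):
        graph[earlier].add(later)
    # Kahn's algorithm: repeatedly remove blobs nothing else must precede.
    indegree = {name: 0 for name in graph}
    for name in graph:
        for successor in graph[name]:
            indegree[successor] += 1
    ready = [name for name, degree in indegree.items() if degree == 0]
    ordered = 0
    while ready:
        name = ready.pop()
        ordered += 1
        for successor in graph[name]:
            indegree[successor] -= 1
            if indegree[successor] == 0:
                ready.append(successor)
    return ordered != len(graph)


# The two blobs from the example: each file demands the opposite order.
conflicting = {
    "commit 1": {"fileA": "1.1", "fileB": "1.2"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
}
# Splitting commit 1 into 1a/1b removes the cycle.
split = {
    "commit 1a": {"fileA": "1.1"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
    "commit 1b": {"fileB": "1.2"},
}
print(has_cycle(conflicting), has_cycle(split))
```

A real converter additionally has to decide where to split, which is the hard part; the sketch only detects that a split is needed.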
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Thursday 28 May 2009 20:03:38 Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Right. Shall we try to spec out exactly what our conversion requirements are? Here's a shot:
>> [...]
>> Comments? Other considerations?
> Certainly sounds reasonable to me. I'd be really surprised if that's really all that hard to accomplish. I'd be happy to help with some testing too if we feel that the current git repo is in reasonable shape to do that testing against (or someone has another).

Sounds like writing a comprehensive test suite against Tom's spec would be the first step. And then this test suite can be run against various conversion tools and configurations thereof.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Fri, May 29, 2009 at 2:41 AM, Markus Wanner mar...@bluegap.ch wrote:
> Hi,
> Quoting Robert Haas robertmh...@gmail.com:
>> Why is this harder than I think it is?
> One of the simplest possible examples is something like:

Thanks for the explanation, I understand it better now. I'm still dismayed, but at least I know why I'm dismayed.

...Robert
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:
> Ok, so seeing the interest in having a good conversion, I took a stab at parsecvs this afternoon, probably what I consider the leading static conversion tool.

Here are some results from a conversion with cvs2git.

> It takes about 10 minutes to run on my old xeon.

The conversion with cvs2git certainly took a bit longer, however, I don't think that matters at all. Everything below a day or two is good enough, IMO. What counts is the result.

The first step is running cvs2git itself:

cvs2svn Statistics:
------------------
Total CVS Files:          6873
Total CVS Revisions:    140191
Total CVS Branches:      36057
Total CVS Tags:         457515
Total Unique Tags:         171
Total Unique Branches:      21
CVS Repos Size in KB:   377337
Total SVN Commits:       32889
First Revision Date:    Tue Jul 9 08:21:07 1996
Last Revision Date:     Thu May 28 22:02:10 2009

(number of files matches pretty well with my own algorithm, however, total svn commits is a bit lower, compared to the ~ 40'000 blobs I got).

The output of cvs2git can then be imported with git fast-import:

git-fast-import statistics:
---------------------------
Alloc'd objects: 35
Total objects:   349405 ( 19563 duplicates )
      blobs  :   132672 (  3255 duplicates 119032 deltas)
      trees  :   183967 ( 16308 duplicates 165582 deltas)
      commits:    32766 (     0 duplicates      0 deltas)
      tags   :        0 (     0 duplicates      0 deltas)
Total branches:     194 (   664 loads )
      marks:     1073741824 (168693 unique)
      atoms:     5280
Memory total:    16532 KiB
       pools:     2860 KiB
     objects:    13671 KiB
---------------------------
pack_report: getpagesize()            = 4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            = 124414
pack_report: pack_mmap_calls          = 3674
pack_report: pack_open_windows        = 1 / 1
pack_report: pack_mapped              = 199500913 / 199500913
---------------------------

The resulting repository contains the following branches. The unlabeled ones contain only 1-2 files and seem rather irrelevant. In a next try, I'd disable their creation completely, just wanted to check.

  REL2_0B
  REL6_4
  REL6_5_PATCHES
  REL7_0_PATCHES
  REL7_1_STABLE
  REL7_2_STABLE
  REL7_3_STABLE
  REL7_4_STABLE
  REL8_0_0
  REL8_0_STABLE
  REL8_1_STABLE
  REL8_2_STABLE
  REL8_3_STABLE
  Release_1_0_3
  WIN32_DEV
  ecpg_big_bison
* master
  unlabeled-1.44.2 - from src/backend/commands/tablecmds.c
  unlabeled-1.51.2 - from src/test/regress/expected/alter_table.out
  unlabeled-1.59.2 - from src/backend/executor/execTuples.c
  unlabeled-1.87.2 - from src/backend/executor/nodeAgg.c
  unlabeled-1.90.2 - from src/backend/parser/parse_target.c and src/backend/access/common/tupdesc.c

Comparison of the head of each branch between git and CVS (modulo CVS keyword expansion, which I've filtered out):

ecpg_big_bison.diff:  0 files changed
master.diff:          0 files changed
REL2_0B.diff:         0 files changed
REL6_4.diff:          0 files changed
REL6_5_PATCHES.diff:  0 files changed
REL7_0_PATCHES.diff:  0 files changed
REL7_1_STABLE.diff:   0 files changed
REL7_2_STABLE.diff:   0 files changed
REL7_3_STABLE.diff:   0 files changed
REL7_4_STABLE.diff:   0 files changed
REL8_0_0.diff:        0 files changed
REL8_0_STABLE.diff:   0 files changed
REL8_1_STABLE.diff:   0 files changed
REL8_2_STABLE.diff:   0 files changed
REL8_3_STABLE.diff:   0 files changed
Release_1_0_3.diff:   0 files changed
WIN32_DEV.diff:       0 files changed

I plan to compare the tags as well and test what branch they are in, but so far cvs2git seems to hold its promises. I'll report back again within the next few days.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Markus Wanner mar...@bluegap.ch [090529 11:06]:
> Hi,
> Comparison of the head of each branch between git and CVS (modulo CVS keyword expansion, which I've filtered out):

How did you filter it out, and without the filtering out, how does it do?

> I plan to compare the tags as well and test what branch they are in, but so far cvs2git seems to hold its promises. I'll report back again within the next few days.

It definitely seems to have figured out the REL8_0_0 confusion that tripped up parsecvs. If I'm stuck on another windows project some time in the near future, I'll try and look into why parsecvs trips up on those 3 files from the REL8_0_0 branch ;-)

a.

--
Aidan Van Dyk                 Create like a god,
ai...@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:
> * Markus Wanner mar...@bluegap.ch [090529 11:06]:
>> Comparison of the head of each branch between git and CVS (modulo CVS keyword expansion, which I've filtered out):
> How did you filter it out

With perl and some regexes.

> and without the filtering out, how does it do?

Uh.. why is that of interest? With content hashing, these keywords do more harm than good. I'd have to check again, but there certainly are differences here and there.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Markus Wanner mar...@bluegap.ch [090529 11:18]:
> Hi,
> Quoting Aidan Van Dyk ai...@highrise.ca:
>> * Markus Wanner mar...@bluegap.ch [090529 11:06]:
>>> Comparison of the head of each branch between git and CVS (modulo CVS keyword expansion, which I've filtered out):
>> How did you filter it out
> With perl and some regexes.
>> and without the filtering out, how does it do?
> Uh.. why is that of interest? With content hashing, these keywords do more harm than good.

Yes, but the point is you want an exact replica of CVS, right? Your git repo should have $PostgreSQL$ and the cvs export/checkout (you do use -kk, right?) should also have $PostgreSQL$. The 3 parsecvs errors were that it *didn't* recognize the strange $PostgreSQL ... Exp $ expansion that cvs did.

But it's important, because on *some* files you *do* want expanded keywords (like the $OpenBSD ... Exp $). One of the reasons pg CVS went to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly de-couple them from other keywords that they didn't want munging on.

So, I wouldn't consider any conversion good unless it had all these:

parsecvs-master:contrib/pgcrypto/crypt-des.c: * $FreeBSD: src/secure/lib/libcrypt/crypt-des.c,v 1.12 1999/09/20 12:39:20 markm Exp $
parsecvs-master:contrib/pgcrypto/crypt-md5.c: * $FreeBSD: src/lib/libcrypt/crypt-md5.c,v 1.5 1999/12/17 20:21:45 peter Exp $
parsecvs-master:contrib/pgcrypto/md5.c:/* $KAME: md5.c,v 1.3 2000/02/22 14:01:17 itojun Exp $ */
parsecvs-master:contrib/pgcrypto/md5.h:/* $KAME: md5.h,v 1.3 2000/02/22 14:01:18 itojun Exp $ */
parsecvs-master:contrib/pgcrypto/rijndael.c:/* $OpenBSD: rijndael.c,v 1.6 2000/12/09 18:51:34 markus Exp $ */
parsecvs-master:contrib/pgcrypto/rijndael.h: * $OpenBSD: rijndael.h,v 1.3 2001/05/09 23:01:32 markus Exp $ */
parsecvs-master:contrib/pgcrypto/sha1.c:/* $KAME: sha1.c,v 1.3 2000/02/22 14:01:18 itojun Exp $*/
parsecvs-master:contrib/pgcrypto/sha1.h:/* $KAME: sha1.h,v 1.4 2000/02/22 14:01:18 itojun Exp $*/
parsecvs-master:contrib/pgcrypto/sha2.c:/* $OpenBSD: sha2.c,v 1.6 2004/05/03 02:57:36 millert Exp $*/
parsecvs-master:contrib/pgcrypto/sha2.h:/* $OpenBSD: sha2.h,v 1.2 2004/04/28 23:11:57 millert Exp $*/
parsecvs-master:src/backend/port/darwin/system.c: * $FreeBSD: src/lib/libc/stdlib/system.c,v 1.6 2000/03/16 02:14:41 jasone Exp $
parsecvs-master:src/port/crypt.c:/* $NetBSD: crypt.c,v 1.18 2001/03/01 14:37:35 wiz Exp $ */
parsecvs-master:src/port/crypt.c:__RCSID($NetBSD: crypt.c,v 1.18 2001/03/01 14:37:35 wiz Exp $);
parsecvs-master:src/port/qsort.c:/* $NetBSD: qsort.c,v 1.13 2003/08/07 16:43:42 agc Exp $ */
parsecvs-master:src/port/qsort_arg.c:/* $NetBSD: qsort.c,v 1.13 2003/08/07 16:43:42 agc Exp $ */
parsecvs-master:src/port/strlcat.c: * $OpenBSD: strlcat.c,v 1.13 2005/08/08 08:05:37 espie Exp $ */
parsecvs-master:src/port/strlcpy.c:/* $OpenBSD: strlcpy.c,v 1.11 2006/05/05 15:27:38 millert Exp $*/

As well as stuff like:

parsecvs-master:src/backend/access/index/genam.c: * $PostgreSQL$
parsecvs-master:src/backend/access/index/indexam.c: * $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/Makefile:#$PostgreSQL$
parsecvs-master:src/backend/access/nbtree/README:$PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtcompare.c: * $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtinsert.c: * $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtpage.c: *$PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtree.c: * $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtsearch.c: * $PostgreSQL$

Basically, identical to what a cvs export/checkout/update gives you with a -kk.

But I'm picky ;-)

a.

--
Aidan Van Dyk                 Create like a god,
ai...@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Aidan Van Dyk wrote:
> Yes, but the point is you want an exact replica of CVS, right? Your git repo should have $PostgreSQL$ and the cvs export/checkout (you do use -kk, right?) should also have $PostgreSQL$. The 3 parsecvs errors were that it *didn't* recognize the strange $PostgreSQL ... Exp $ expansion that cvs did.

Huh, no -- I agree that $OpenBSD$ etc should remain (we don't munge them anyway), but $PostgreSQL$, $Id$, $Revision$ etc tags are best gone because, as Markus says, their expansion interferes with content hashing.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane escribió:
> Alvaro Herrera alvhe...@commandprompt.com writes:
>> Tom Lane escribió:
>>> What was in the back of my mind was that we'd go around and mass-remove $PostgreSQL$ (and any other lurking tags), but only from HEAD and only after the repo conversion. Although just before it would be okay too.
>> You mean we would remove them from CVS? I don't think that's necessarily a good idea; it'd be massive changes for no good reason.
> Uh, how is it different from any other mass edit, such as our annual copyright-year updates, or pgindent runs?

Well, the other mass edits have a purpose. This one would be only to help the migration.

>> My idea was to remove them from the repository that would be used for the conversion (I think that means editing the ,v files),
> Ick ... I'm willing to tolerate a few small manual ,v edits if we have to do it to make tags consistent or something like that. I don't think we should be doing massive edits of that kind.

Yeah, that idea wasn't all that great after all.

> But anyway, that's not the interesting point. The interesting point is what about the historical aspect of it, not whether we want to dispense with the tags going forward. Should our repo conversion try to represent the historical states of the files including the tag strings?

Since we're going to lose them functionally after the conversion, it doesn't seem that they serve any purpose. After all, they will not represent anything on the new repository.

The problem is that they are a problem for the conversion. Are they expanded before or after the commit? Because the very expansion causes the file to change identity, files being identified by the SHA1 sum of their contents.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
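To make the "change of identity" concrete: git names a file's content by the SHA-1 of a short "blob <size>\0" header followed by the bytes. A small sketch in Python; the helper name git_blob_id is made up here, and the expanded keyword string is an invented example, not a real revision from the PostgreSQL repository.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a file with this exact content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same source line, once with the keyword collapsed and once expanded.
# The expanded form is a hypothetical example for illustration only.
unexpanded = b"/* $PostgreSQL$ */\n"
expanded = b"/* $PostgreSQL: some/file.c,v 1.1 2009/01/01 00:00:00 someone Exp $ */\n"

print(git_blob_id(unexpanded))
print(git_blob_id(expanded))
print(git_blob_id(unexpanded) == git_blob_id(expanded))
```

Since the two byte strings differ, the ids differ, so a conversion has to pick one form and apply it consistently for every revision.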
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Alvaro Herrera alvhe...@commandprompt.com [090529 11:45]:
> Aidan Van Dyk wrote:
>> Yes, but the point is you want an exact replica of CVS, right? Your git repo should have $PostgreSQL$ and the cvs export/checkout (you do use -kk, right?) should also have $PostgreSQL$. The 3 parsecvs errors were that it *didn't* recognize the strange $PostgreSQL ... Exp $ expansion that cvs did.
> Huh, no -- I agree that $OpenBSD$ etc should remain (we don't munge them anyway), but $PostgreSQL$, $Id$, $Revision$ etc tags are best gone because, as Markus says, their expansion interferes with content hashing.

I *think* you're actually agreeing with me. *Hiding* the diffs that include munging of keywords is not what we want. We want the conversion to *not* munge keyword-like things. (No, $OpenBSD$ is *not* a keyword in the PostgreSQL CVS repository. But $PostgreSQL$ *is*.)

So we want the conversion to be identical to:

  cvs export -kk -r $tag

That will have *keywords* be unexpanded; namely these specific ones:

  Author Date Header Id Locker Log Name RCSfile Revision Source State PostgreSQL

but *not* keyword-like entries, like:

  $ NetBSD ... Exp $
  $ FreeBSD ... Exp $
  $ OpenBSD ... Exp $
  $ KAME ... Exp $

which are *not* CVS keywords in the PostgreSQL repository. i.e. Just like I said, identical to cvs checkout/export -kk.

Now, an interesting question: do you want the perfect conversion to contain *other* keyword un-expansion possibilities that would have happened on any commits on Nov 29/30 2003, when CVSROOT/options contained:

  +tagexpand=iPostgreSQL

If you had checked out something on that day, even with a -kk, $Log$ would have been expanded, because for that day, $Log$ was *not* an eligible keyword on the PostgreSQL CVS repository.

Whooee... Fun with CVS history

a.

--
Aidan Van Dyk                 Create like a god,
ai...@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
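For comparing a converted tree against a checkout, the keyword rule above could be approximated with a filter like the following. This is a rough sketch in Python, not the perl script used earlier in the thread and not what cvs -kk does internally: the keyword list is the one given above, the function name collapse_keywords is made up, and the multi-line text that $Log$ inserts is not undone.

```python
import re

# Keywords that are live in the PostgreSQL CVS repository, per the list above.
CVS_KEYWORDS = (
    "Author", "Date", "Header", "Id", "Locker", "Log", "Name",
    "RCSfile", "Revision", "Source", "State", "PostgreSQL",
)

# Matches "$Keyword: anything $" on a single line, for the listed keywords only.
_KEYWORD_RE = re.compile(r"\$(" + "|".join(CVS_KEYWORDS) + r"):[^$\n]*\$")


def collapse_keywords(text: str) -> str:
    """Collapse expanded keywords to "$Keyword$", leaving other $...$ strings alone."""
    return _KEYWORD_RE.sub(r"$\1$", text)


# Hypothetical expanded keyword, invented for the example.
ours = "/* $PostgreSQL: some/file.c,v 1.2 2009/05/29 15:00:00 someone Exp $ */"
# Keyword-like string from the thread that must stay untouched.
theirs = "/* $OpenBSD: rijndael.c,v 1.6 2000/12/09 18:51:34 markus Exp $ */"

print(collapse_keywords(ours))
print(collapse_keywords(theirs))
```

Restricting the pattern to an explicit keyword list is the point: a generic "$Word: ... $" pattern would also flatten the BSD and KAME identifiers that are supposed to survive.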
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Marc G. Fournier scra...@hub.org:
> Please repost ...

Peter referred to this message here:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01879.php

However, please be cautious before applying such a patch.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Marc G. Fournier scra...@hub.org:
> Actually, I have done that on at least one of the 8.x tags too, so if that is it, more than those two tags should be causing issues ...

Not *every* such issue causes problems. An example that's perfectly fine:

  cvs commit -m "first commit" fileA
  cvs tag TEST fileA
  cvs commit -m "second commit" fileB
  cvs tag TEST fileB

In such a situation, a converter can easily push down the tag TEST to the second commit, because fileA is the same (in that revision) as after the first commit. After all, the results in the RCS files are exactly the same as if you did the following:

  cvs commit -m "first commit" fileA
  cvs commit -m "second commit" fileB
  cvs tag TEST fileA fileB

A converter can't possibly distinguish these two. However, if both files get committed the second time, but only one gets tagged, it gets problematic (always assuming the commit actually changes the file):

  cvs commit -m "first commit" fileA
  cvs tag TEST fileA
  cvs commit -m "second commit" fileA fileB
  cvs tag TEST fileB

That's perfectly valid from CVS's point of view, unwanted for the Postgres repository and hard to handle for a converter to git (or mercurial, monotone, etc..), because the tag TEST is on the first commit for fileA but on the second for fileB, while both fileA and fileB differ between the commits.

Regards

Markus Wanner
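The difference between the harmless and the problematic tagging sequence can be expressed as a small check: does any single commit's tree agree with the tag on every tagged file? A minimal sketch in Python, assuming a linear history given as a list of dicts (file name to new revision) and a tag given as a dict of file name to tagged revision; the revision numbers and the helper names tree_states and commit_for_tag are assumptions for this illustration, not part of any converter.

```python
def tree_states(commits):
    """Yield (index, tree) with the file -> revision map after each commit."""
    tree = {}
    for index, changed in enumerate(commits):
        tree.update(changed)
        yield index, dict(tree)


def commit_for_tag(commits, tag):
    """Return the index of the last commit whose tree matches the tag, else None."""
    match = None
    for index, tree in tree_states(commits):
        if all(tree.get(path) == rev for path, rev in tag.items()):
            match = index
    return match


# Harmless case: fileA tagged early, but unchanged by the second commit.
fine_history = [{"fileA": "1.1"}, {"fileB": "1.1"}]
# Problematic case: the second commit changes fileA after it was tagged.
bad_history = [{"fileA": "1.1"}, {"fileA": "1.2", "fileB": "1.1"}]
# In both cases TEST ends up on fileA 1.1 and fileB 1.1 in the RCS files.
tag_test = {"fileA": "1.1", "fileB": "1.1"}

print(commit_for_tag(fine_history, tag_test))
print(commit_for_tag(bad_history, tag_test))
```

When no commit matches, the tag cannot be placed on the mainline as-is, which is the situation where a converter has to synthesize an extra commit or branch to hold the tagged state.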
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Hi,

Quoting Robert Haas robertmh...@gmail.com:
> I think this is a semantic argument. The problem isn't that we don't understand how CVS behaves; it's that we find that behavior undesirable

I fully agree with that and find it undesirable as well.

> aka broken.

Well, for some it's a feature, for others a bug ;-)

My point was that other converters have better support for such (undesirable, but still existent) tags that span multiple commits. If that's unwanted anyway, it seems cleaner to fix the CVS repository, yes. Has that been done now? Or is somebody going to do it? (See Peter's patch he just linked again upthread).

> If we really care about having a tag that contains the exact files that are tagged in CVS, we can create a branch from one of the commits involved, and then apply a commit to that branch that places it in the state that matches the contents of the CVS tag.

Exactly (with the difference that with the branch you preserve the history of changes, while the variant with the tag does not).

> AIUI, this is not very different from what you'd have to do in Subversion, where a tag is a branch is a copy.

I think so, too. I'd even state that subversion doesn't really support tagging, instead it simulates tags with branches.

Regards

Markus Wanner
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Robert Haas robertmh...@gmail.com [090527 22:43]:
> On Wed, May 27, 2009 at 10:09 PM, Aidan Van Dyk ai...@highrise.ca wrote:
>> * Robert Haas robertmh...@gmail.com [090527 21:30]:
>>>> And actually looking at the history of the gpo repo, the branches are all messed up with merges and stuff that I'm not sure where they are coming from... 8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but the back branches are very bad...
>>> This is really quite horrible. What is the best way forward here?
>> That depends entirely on what the project wants.
> I can't speak for anyone else, but what I want is for the git tree on git.postgresql.org to match CVS.

Well, sure, but I think the "way forward" part implied recognition that the current tree at git.postgresql.org *doesn't* match CVS very closely (for back branches), and that people currently rely on it and use it.

So, again, the answer to the question really does depend on what the canonical VCS of the project is. As of now, it's *still* CVS, and those using either git repo can still develop and submit patches to CVS easily.

When the project switches, there will probably need to be a more canonical conversion, with one of the tools that doesn't support incremental imports, and then people will have to adjust their current repo with any of rebase/graft/filter-branch to adjust their work history onto the official tree...

All that is based on the assumption that when the project switches to git, they actually want all the CVS history in their official tree. It's certainly not necessary, and possibly not even desirable... PostgreSQL could just as easily do a Linus-style switch when they switch to git, and just import the latest release in each branch as the starting point for each branch. The git repository will have no history, and people can choose which history they want to graft in... CVSROOT can be made available as a historical download.

a.

--
Aidan Van Dyk                 Create like a god,
ai...@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk ai...@highrise.ca wrote:
> All that is based on the assumption that when the project switches to git, they actually want all the CVS history in their official tree. It's certainly not necessary, and possibly not even desirable... PostgreSQL could just as easily do a Linus-style switch when they switch to git, and just import the latest release in each branch as the starting point for each branch. The git repository will have no history, and people can choose which history they want to graft in... CVSROOT can be made available as a historical download.

That would suck for me. I use git log a lot to see how things have changed over time.

...Robert
Re: [HACKERS] PostgreSQL Developer meeting minutes up
* Robert Haas robertmh...@gmail.com [090528 09:49]:
> On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk ai...@highrise.ca wrote:
>> All that is based on the assumption that when the project switches to git, they actually want all the CVS history in their official tree. It's certainly not necessary, and possibly not even desirable... PostgreSQL could just as easily do a Linus-style switch when they switch to git, and just import the latest release in each branch as the starting point for each branch. The git repository will have no history, and people can choose which history they want to graft in... CVSROOT can be made available as a historical download.
> That would suck for me. I use git log a lot to see how things have changed over time.

No, the whole point is that you graft whatever history *you* want in... So if the official PostgreSQL git only starts when the official VCS is git, you graft on gpo, or git, or some personal one-time cvs2git or parsecvs history you want in...

It would be the project's way of saying basically "None of the current cvs imports are perfect and we recognize that. So we're starting fresh, use whatever historical cvs import *you* find best for your history and graft it in." Just as the Linux kernel has a few historical repos available for people to graft into Linus's tree, which only started in 2.6.12.

If you have work that requires the history of the current gpo repo, you keep using it. If you have work requiring the current git repo, you keep using it. If you have no work, but you're a stickler for perfect imports, you start working on parsecvs and cvs2git, and make a new history every time you find another quirk...

a.

--
Aidan Van Dyk                 Create like a god,
ai...@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Robert Haas wrote:
> On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk ai...@highrise.ca wrote:
>> All that is based on the assumption that when the project switches to git, they actually want all the CVS history in their official tree. It's certainly not necessary, and possibly not even desirable... PostgreSQL could just as easily do a Linus-style switch when they switch to git, and just import the latest release in each branch as the starting point for each branch. The git repository will have no history, and people can choose which history they want to graft in... CVSROOT can be made available as a historical download.
> That would suck for me. I use git log a lot to see how things have changed over time.

Indeed. Losing the history is not an acceptable option.

cheers

andrew
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Andrew Dunstan and...@dunslane.net writes:
> Robert Haas wrote:
>> That would suck for me. I use git log a lot to see how things have changed over time.
> Indeed. Losing the history is not an acceptable option.

I think the same. If git is not able to maintain our project history then it is not mature enough to be considered as our official VCS. This is not a negotiable requirement.

regards, tom lane