Re: Libraries in the repo
On Fri, Aug 28, 2009 at 10:06:36AM +0100, Simon Marlow wrote:
> On 27/08/2009 11:25, José Pedro Magalhães wrote:
>> Hello,
>>
>> On Wed, Aug 26, 2009 at 18:15, Simon Marlow <marlo...@gmail.com> wrote:
>>> * Boot libraries are of several kinds:
>>>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>>>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>>>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>>
>> Does syb fall under INDEPENDENT or COUPLED? In any case, as the syb
>> maintainer, I'd favor (1) too.
>
> I'd say at this stage it's INDEPENDENT.

I think that once we move rebase3-compat (in the next few days, in the
HEAD), the only thing that needs syb is extcore. Is that sufficient that
it is worth keeping it as a core lib?

Thanks
Ian

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Libraries in the repo
I don't think the current situation has worked well, due to people
forgetting to push/send to the upstream repos, but if we use a prehook
script to stop people accidentally breaking the rules then (1) is
probably the best solution for the HEAD.

For stable branches, in order to avoid releasing with random darcs
versions, I think that it would be best to use released tarballs (2).

In order to migrate from GHC's bytestring fork, back to a repo
compatible with the upstream repo, we can't just switch the repos, as
you can't pull the new repo into an old checkout. There are some
additional complications, such as people who pull from local repos
rather than the darcs.haskell.org repos, which make this a little
fiddly. I think we should:

* switch HEAD to use a released tarball of bytestring
* make darcs-all complain if libraries/bytestring is a checked out copy
  of the old repo (i.e. check to see if a particular patch ID is in it)
* wait a couple of months
* switch to using a darcs repo containing a subset of the upstream repo
  again

Hopefully during the couple of months everyone will update and use all
the repos they have lying around at least once, and thus will remove
the old checkout.

Thanks
Ian
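The "check to see if a particular patch ID is in it" step could be sketched roughly as below. This is only an illustration under stated assumptions: the marker hash is a made-up placeholder, the function names are hypothetical, and it assumes the XML printed by darcs changes --xml-output carries a single-quoted hash='...' attribute on each patch (the same layout the prehook script elsewhere in this thread relies on).

```shell
#!/bin/sh
# Hypothetical marker: the hash of a patch that exists only in GHC's old
# bytestring fork. The real value would have to be looked up; this one is
# a placeholder for illustration.
OLD_FORK_PATCH="20090101000000-ffff-0000.gz"

# Succeeds if the darcs changes XML read from stdin mentions the given hash.
xml_has_patch() {
    grep -F -q "hash='$1'"
}

# Succeeds if the repo in directory $1 still contains the marker patch,
# i.e. it looks like a checkout of the old fork.
is_old_checkout() {
    darcs changes --repodir "$1" --xml-output | xml_has_patch "$OLD_FORK_PATCH"
}

# Sketch of how a darcs-all style wrapper might use it (left commented out,
# since it needs darcs and a real checkout):
# if is_old_checkout libraries/bytestring; then
#     echo "libraries/bytestring is a checkout of the old fork; remove it" >&2
#     exit 1
# fi
```

Keeping the XML test separate from the darcs call means the matching logic can be tried on canned input without a repository at hand.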
Re: Libraries in the repo
Duncan Coutts wrote:
> On Fri, 2009-08-28 at 11:42 +0100, Simon Marlow wrote:
>> Can anyone think of a good reason not to upgrade darcs to 2.3.0 on
>> darcs.haskell.org? I can think of 3 reasons to do so:
>>
>>  - this script, for preventing accidental divergence from upstream
>>  - faster pushes, due to transfer-mode
>>  - hide those annoying Ignore-this: x messages
>
> By the way, people who regularly work with the ghc repos (at least on
> Linux) and who are thinking of upgrading to darcs-2.3.0 should heed
> this advice:
>
> Use darcs get to get your repos again. Not remotely, just locally.
> This switches them from darcs1 traditional format to darcs1 hashed
> format.
>
> If you do this, then darcs whatsnew gets ~4 times quicker.
> If you do not do this, then darcs whatsnew gets ~100 times slower.
>
> All times measured on Linux, local ext3 filesystem, ghc testsuite
> repo. All times are the second of two runs to allow for OS caching.
> The results may well be quite different on different file systems,
> like Windows NTFS.

Yes - on Windows things got slower with 2.3.0, even with hashed
repositories:

http://bugs.darcs.net/issue1585

Another thing to watch out for is that hashed repositories will
automatically cache patches and pristine files in ~/.darcs/cache by
default. If your home directory is NFS-mounted, this can be a bad idea.

Even if you're not using NFS, the fact that pristine files are shared
between all repositories means that darcs sometimes is a lot slower
than it needs to be, because the timestamps on the pristine files are
out of sync with the local repository (you'll see long "Reading
pristine..." messages from darcs). I raised this on the darcs-users
list before the 2.3.0 release, but as far as I know it isn't planned to
be fixed until 2.4.

Cheers,
Simon
Re: Libraries in the repo
On 27/08/2009 11:37, Sittampalam, Ganesh wrote:
> Simon Marlow wrote:
>> Simon Marlow wrote:
>>> I suggest if we stick with the independent repo approach that we
>>> have some automation to check that changes are indeed getting
>>> pushed upstream.
>
> [snip unhelpful suggestion from me]
>
>> Yes, it tells you that you've screwed up, rather than telling you
>> that you're about to screw up, which would be much more convenient.
>> After you've screwed up it might be too late to fix it, due to
>> conflicts with upstream.
>
> Can you arrange that the only way that patches can get into the
> branch is via darcs pull --intersection <upstream repo> ?

That's an interesting idea, I'd forgotten about --intersection.

Simon
Re: Libraries in the repo
On 27/08/2009 11:25, José Pedro Magalhães wrote:
> Hello,
>
> On Wed, Aug 26, 2009 at 18:15, Simon Marlow <marlo...@gmail.com> wrote:
>> * Boot libraries are of several kinds:
>>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>
> Does syb fall under INDEPENDENT or COUPLED? In any case, as the syb
> maintainer, I'd favor (1) too.

I'd say at this stage it's INDEPENDENT. Thanks for the feedback!

Cheers,
Simon
Re: Libraries in the repo
On 28/08/2009 10:05, Simon Marlow wrote:
> On 27/08/2009 11:37, Sittampalam, Ganesh wrote:
>> Simon Marlow wrote:
>>> Simon Marlow wrote:
>>>> I suggest if we stick with the independent repo approach that we
>>>> have some automation to check that changes are indeed getting
>>>> pushed upstream.
>>
>> [snip unhelpful suggestion from me]
>>
>>> Yes, it tells you that you've screwed up, rather than telling you
>>> that you're about to screw up, which would be much more
>>> convenient. After you've screwed up it might be too late to fix
>>> it, due to conflicts with upstream.
>>
>> Can you arrange that the only way that patches can get into the
>> branch is via darcs pull --intersection <upstream repo> ?
>
> That's an interesting idea, I'd forgotten about --intersection.

I have a script that works as a prehook (below). Unfortunately it
doesn't work on darcs.haskell.org, I think because we only have darcs
1.0.9 there, and it is ignoring my prehook.

Can anyone think of a good reason not to upgrade darcs to 2.3.0 on
darcs.haskell.org? I can think of 3 reasons to do so:

 - this script, for preventing accidental divergence from upstream
 - faster pushes, due to transfer-mode
 - hide those annoying Ignore-this: x messages

Cheers,
Simon

#!/bin/sh -e

# checkupstream.sh

# Only allow applying of patches that are also in this upstream repository:
UPSTREAM=$1

# echo DARCS_PATCHES_XML = $DARCS_PATCHES_XML

# Take $DARCS_PATCHES_XML and turn it into a list of patch hashes
# suitable for looping over.
hashes=`echo $DARCS_PATCHES_XML | sed 's|</patch>|</patch>\n|g' | sed -n '/hash/p' | sed "s|^.*hash='\([^']*\)'.*$|\1|"`

# echo hashes: $hashes

# For each patch, try pulling the patch from the upstream repo. If
# the patch is not upstream, then fail.
for p in $hashes; do
  if darcs pull --match="hash $p" $UPSTREAM --xml --dry-run | grep $p >/dev/null; then
    echo "Patch $p is upstream; ok"
  else
    echo "Patch $p is not upstream!"
    exit 1
  fi
done

exit 0
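The hash-extraction step of the script can be tried on its own, without darcs or a hook. The sketch below is a variant rather than the script's exact pipeline: it splits on '>' with tr instead of using \n in a sed replacement, since the latter may not behave the same outside GNU sed. The function name and the sample XML are hypothetical, and the single-quoted hash='...' attribute is assumed from the regex in the original script.

```shell
#!/bin/sh
# extract_hashes: print one patch hash per line from darcs patch XML.
# Assumes each <patch ...> opening tag carries a single-quoted
# hash='...' attribute, as the regex in the original script implies.
# tr puts every tag (or text run) on its own line; sed then prints only
# the captured hash from lines that have a hash attribute.
extract_hashes() {
    printf '%s\n' "$1" |
        tr '>' '\n' |
        sed -n "s/.*hash='\([^']*\)'.*/\1/p"
}

# Made-up sample input, for illustration only.
sample="<patches><patch author='a@example.org' date='20090828101500' hash='20090828101500-aaaa-1111.gz'><name>first</name></patch><patch author='b@example.org' date='20090828111500' hash='20090828111500-bbbb-2222.gz'><name>second</name></patch></patches>"

extract_hashes "$sample"
```

In a prehook the argument would be "$DARCS_PATCHES_XML" rather than the sample string, and each printed hash would then be checked against the upstream repo as in the loop above.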
Re: Libraries in the repo
On Fri, 2009-08-28 at 11:42 +0100, Simon Marlow wrote:
> Can anyone think of a good reason not to upgrade darcs to 2.3.0 on
> darcs.haskell.org? I can think of 3 reasons to do so:
>
>  - this script, for preventing accidental divergence from upstream
>  - faster pushes, due to transfer-mode
>  - hide those annoying Ignore-this: x messages

By the way, people who regularly work with the ghc repos (at least on
Linux) and who are thinking of upgrading to darcs-2.3.0 should heed
this advice:

Use darcs get to get your repos again. Not remotely, just locally. This
switches them from darcs1 traditional format to darcs1 hashed format.

If you do this, then darcs whatsnew gets ~4 times quicker.
If you do not do this, then darcs whatsnew gets ~100 times slower.

All times measured on Linux, local ext3 filesystem, ghc testsuite repo.
All times are the second of two runs to allow for OS caching. The
results may well be quite different on different file systems, like
Windows NTFS.

Perhaps someone can suggest a way of doing this using the ./darcs-all
script, that would not mess up what the default push/pull address is.
Of course doing a get means the copy doesn't have the changes from the
working directory. As far as I know darcs currently does not provide a
way to do an in-place upgrade to the faster format.

I've emailed the darcs list to raise this issue, that:

 1. we get no warning or advice from darcs that we should switch format
 2. that there is not a really convenient way of doing the switch

Duncan
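One possible way to do the local re-get without losing the default push/pull address is sketched below. It is an untested idea rather than an established recipe: the function names are hypothetical, and it assumes darcs remembers the default address in _darcs/prefs/defaultrepo, so that copying that file across restores it. As noted above, unrecorded changes in the working directory are not carried over by a get.

```shell
#!/bin/sh
# Copy the remembered default push/pull address from checkout $1 to
# checkout $2. Assumes darcs keeps it in _darcs/prefs/defaultrepo.
copy_defaultrepo() {
    old=$1; new=$2
    if [ -f "$old/_darcs/prefs/defaultrepo" ]; then
        cp "$old/_darcs/prefs/defaultrepo" "$new/_darcs/prefs/defaultrepo"
    fi
}

# Re-get a repository locally, then restore its default address.
# Unrecorded working-directory changes are NOT carried over.
reget_local() {
    old=$1; new=$2
    darcs get "$old" "$new" && copy_defaultrepo "$old" "$new"
}

# Example (commented out, since it needs darcs and a real checkout):
# reget_local libraries/bytestring libraries/bytestring.hashed
```

A wrapper like darcs-all could run this per library and swap the directories afterwards, though that part is left out here.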
Re: Libraries in the repo
On 26/08/2009 22:32, Duncan Coutts wrote:
> On Wed, 2009-08-26 at 17:15 +0100, Simon Marlow wrote:
>> * Sometimes we want to make local modifications to INDEPENDENT
>>   libraries:
>>   - when GHC adds a new warning, we need to fix instances of the
>>     warning in the library to keep the GHC build warning-free.
>
> I have to say I think this one is rather dubious. What is wrong with
> just allowing warnings in these independent libs until they get
> fixed upstream? I know ghc's build system sets -Werror on them, but I
> don't see that as essential, especially for new warnings added in ghc
> head.

True, we don't have to do that.

>> Experience with Cabal and bytestring has shown that (1) can work
>> for INDEPENDENT libraries, but only if we're careful not to get too
>> out-of-sync (as we did with bytestring). In the case of Cabal, we
>> never have local changes in our branch that aren't in Cabal HEAD,
>> and that works well.
>
> It requires an attentive maintainer to notice when people forget to
> push upstream (as they inevitably do on occasion). If it goes
> unnoticed for too long then ghc ends up with a forked repo that
> cannot sanely be synced from the upstream repo (like bytestring).
>
> I suggest if we stick with the independent repo approach that we have
> some automation to check that changes are indeed getting pushed
> upstream.

Agreed. Can you think of an easy way to automate it?

Cheers,
Simon
RE: Libraries in the repo
Simon Marlow wrote:
>> I suggest if we stick with the independent repo approach that we
>> have some automation to check that changes are indeed getting
>> pushed upstream.
>
> Agreed. Can you think of an easy way to automate it?

How about a cronjob that runs
darcs send <upstream-repo> --to=<some-list> ?

Ganesh

===
Please access the attached hyperlink for an important electronic
communications disclaimer:
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
===
Re: Libraries in the repo
On 27/08/2009 11:18, Sittampalam, Ganesh wrote:
> Simon Marlow wrote:
>>> I suggest if we stick with the independent repo approach that we
>>> have some automation to check that changes are indeed getting
>>> pushed upstream.
>>
>> Agreed. Can you think of an easy way to automate it?
>
> How about a cronjob that runs
> darcs send <upstream-repo> --to=<some-list> ?

But the requirement we want is that patches are only pushed upstream,
and never pushed to the branch first.

Cheers,
Simon
RE: Libraries in the repo
Simon Marlow wrote:
> On 27/08/2009 11:18, Sittampalam, Ganesh wrote:
>> Simon Marlow wrote:
>>>> I suggest if we stick with the independent repo approach that we
>>>> have some automation to check that changes are indeed getting
>>>> pushed upstream.
>>>
>>> Agreed. Can you think of an easy way to automate it?
>>
>> How about a cronjob that runs
>> darcs send <upstream-repo> --to=<some-list> ?
>
> But the requirement we want is that patches are only pushed upstream,
> and never pushed to the branch first.

I might be getting confused about something, but I'd expect this
command to send an email with any changes in the branch repo that
aren't in upstream. In other words if you have some patches in the
branch that aren't upstream, you'll find out and can remedy the
situation.

Ganesh
Re: Libraries in the repo
Hello,

On Wed, Aug 26, 2009 at 18:15, Simon Marlow <marlo...@gmail.com> wrote:
> * Boot libraries are of several kinds:
>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)

Does syb fall under INDEPENDENT or COUPLED? In any case, as the syb
maintainer, I'd favor (1) too.

Cheers,
Pedro
Re: Libraries in the repo
On 27/08/2009 00:55, Don Stewart wrote:
> marlowsd:
>> Simon and I have been chatting about how we accommodate libraries in
>> the GHC repository. After previous discussion on this list, GHC has
>> been gradually migrating towards having snapshots of libraries kept
>> as tarballs in the repo (currently only time falls into this
>> category), but I don't think we really evaluated the alternatives
>> properly. Here's an attempt to do that, and to my mind the outcome
>> is different: we really want to stick to having all libraries as
>> separate repositories.
>>
>> Background:
>>
>> * Scope: libraries that are needed to build GHC itself (aka boot
>>   libraries)
>>
>> * Boot libraries are of several kinds:
>>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>>
>> * Most boot libraries are INDEPENDENT. INDEPENDENT libraries have a
>>   master repository somewhere separate from the GHC repositories.
>>
>> * We need a branch of INDEPENDENT libraries, so that GHC builds
>>   don't break when the upstream package is modified.
>>
>> * Sometimes we want to make local modifications to INDEPENDENT
>>   libraries:
>>   - when GHC adds a new warning, we need to fix instances of the
>>     warning in the library to keep the GHC build warning-free.
>>   - to check that the changes work, before pushing upstream
>>
>> Choices for how we deal with libraries in the GHC repository:
>> (+) is a pro, (-) is a con.
>>
>> (1) Check out the library from a separate repo, using the darcs-all
>>     script. The repo may either be a GHC-specific branch
>>     [INDEPENDENT], or the master copy of the package
>>     [SPECIFIC/COUPLED].
>>
>>     (+) we can treat every library this way, which gives a
>>         consistent story. Consistency is good for developers.
>>     (+) [INDEPENDENT] makes it easy to push changes upstream and
>>         sync with the upstream repo (unless upstream is using a
>>         different VCS).
>>     (-) [INDEPENDENT] we have to be careful not to let our branches
>>         get too far out of sync with upstream, and we must sync
>>         before releasing GHC.
>>
>> (2) Put a snapshot tarball of the library in libraries/tarballs, but
>>     allow you to checkout the darcs repo instead.
>>
>>     (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
>>         because we expect to be modifying the library often.
>>     (-) updating the snapshot is awkward
>>     (-) workflow for making a change to the library is awkward:
>>         - checkout the darcs repo
>>         - make the change, validate it
>>         - push the change upstream (bump version?)
>>         - make a new snapshot tarball
>>         - commit the new snapshot to the GHC repo.
>>     (-) having tarballs in the repository is ugly
>>     (-) we have no revision history of the library
>>
>> (3) The GHC repo *itself* contains every library unpacked in the
>>     tree. You are allowed to check out the darcs repo instead.
>>
>>     (+) atomic commits to both the library and GHC.
>>     (+) doing this consistently would allow us to remove darcs-all,
>>         giving a nice easy development workflow
>>     (-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
>>     (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
>>     (-) [INDEPENDENT/COUPLED] manual syncing with upstream
>>     (-) [COUPLED] (particularly base) syncing with upstream would be
>>         painful.
>>
>> (3) works best for SPECIFIC libraries, whereas (1) works best for
>> INDEPENDENT/COUPLED libraries. If we want to treat all libraries the
>> same, then the only real option is (1).
>>
>> Experience with Cabal and bytestring has shown that (1) can work for
>> INDEPENDENT libraries, but only if we're careful not to get too
>> out-of-sync (as we did with bytestring). In the case of Cabal, we
>> never have local changes in our branch that aren't in Cabal HEAD,
>> and that works well.
>>
>> Comments/thoughts?
>
> As author of bytestring, I'd prefer it if GHC used a released version
> direct from Hackage. I.e. GHC could snapshot a Hackage release, and
> get out of the business of cloning repos. Same for other
> INDEPENDENTs.

Are you saying you don't want us to have a GHC branch? Even if the
branch just pulls from upstream and never has local changes?

We can still use released versions only, the main point about having
separate repos is that we have a consistent picture of libraries from
GHC's side.

For bytestring I imagine we can get away without making changes between
releases, or at least ensuring our changes are sent upstream and we
wait for a release before pulling. For other libraries, such as Cabal,
this would be too onerous I think (Cabal is really COUPLED at the
moment, much as we'd like it to be INDEPENDENT).

Cheers,
Simon
Re: Libraries in the repo
On 27/08/2009 11:24, Sittampalam, Ganesh wrote:
> Simon Marlow wrote:
>> On 27/08/2009 11:18, Sittampalam, Ganesh wrote:
>>> Simon Marlow wrote:
>>>>> I suggest if we stick with the independent repo approach that we
>>>>> have some automation to check that changes are indeed getting
>>>>> pushed upstream.
>>>>
>>>> Agreed. Can you think of an easy way to automate it?
>>>
>>> How about a cronjob that runs
>>> darcs send <upstream-repo> --to=<some-list> ?
>>
>> But the requirement we want is that patches are only pushed
>> upstream, and never pushed to the branch first.
>
> I might be getting confused about something, but I'd expect this
> command to send an email with any changes in the branch repo that
> aren't in upstream. In other words if you have some patches in the
> branch that aren't upstream, you'll find out and can remedy the
> situation.

Yes, it tells you that you've screwed up, rather than telling you that
you're about to screw up, which would be much more convenient. After
you've screwed up it might be too late to fix it, due to conflicts with
upstream.

Cheers,
Simon
RE: Libraries in the repo
Simon Marlow wrote:
> Simon Marlow wrote:
>> I suggest if we stick with the independent repo approach that we
>> have some automation to check that changes are indeed getting
>> pushed upstream.

[snip unhelpful suggestion from me]

> Yes, it tells you that you've screwed up, rather than telling you
> that you're about to screw up, which would be much more convenient.
> After you've screwed up it might be too late to fix it, due to
> conflicts with upstream.

Can you arrange that the only way that patches can get into the branch
is via darcs pull --intersection <upstream repo> ?

Ganesh
Re: Libraries in the repo
Incidentally, the reason I'd like us to make a decision on this now is
because I'm about to add two new boot libraries:

 - binary, to support a binary cache of GHC's package database
   (INDEPENDENT)
 - bin-package-db, the code to read and write the binary package
   database (SPECIFIC, shared by ghc and ghc-pkg).

I don't much like bin-package-db being a separate package, given that
it's only 100 lines or so in one module, but I don't see a good
alternative.

Cheers,
Simon

On 26/08/2009 17:15, Simon Marlow wrote:
> Simon and I have been chatting about how we accommodate libraries in
> the GHC repository. After previous discussion on this list, GHC has
> been gradually migrating towards having snapshots of libraries kept
> as tarballs in the repo (currently only time falls into this
> category), but I don't think we really evaluated the alternatives
> properly. Here's an attempt to do that, and to my mind the outcome is
> different: we really want to stick to having all libraries as
> separate repositories.
>
> Background:
>
> * Scope: libraries that are needed to build GHC itself (aka boot
>   libraries)
>
> * Boot libraries are of several kinds:
>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>
> * Most boot libraries are INDEPENDENT. INDEPENDENT libraries have a
>   master repository somewhere separate from the GHC repositories.
>
> * We need a branch of INDEPENDENT libraries, so that GHC builds don't
>   break when the upstream package is modified.
>
> * Sometimes we want to make local modifications to INDEPENDENT
>   libraries:
>   - when GHC adds a new warning, we need to fix instances of the
>     warning in the library to keep the GHC build warning-free.
>   - to check that the changes work, before pushing upstream
>
> Choices for how we deal with libraries in the GHC repository:
> (+) is a pro, (-) is a con.
>
> (1) Check out the library from a separate repo, using the darcs-all
>     script. The repo may either be a GHC-specific branch
>     [INDEPENDENT], or the master copy of the package
>     [SPECIFIC/COUPLED].
>
>     (+) we can treat every library this way, which gives a consistent
>         story. Consistency is good for developers.
>     (+) [INDEPENDENT] makes it easy to push changes upstream and sync
>         with the upstream repo (unless upstream is using a different
>         VCS).
>     (-) [INDEPENDENT] we have to be careful not to let our branches
>         get too far out of sync with upstream, and we must sync
>         before releasing GHC.
>
> (2) Put a snapshot tarball of the library in libraries/tarballs, but
>     allow you to checkout the darcs repo instead.
>
>     (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
>         because we expect to be modifying the library often.
>     (-) updating the snapshot is awkward
>     (-) workflow for making a change to the library is awkward:
>         - checkout the darcs repo
>         - make the change, validate it
>         - push the change upstream (bump version?)
>         - make a new snapshot tarball
>         - commit the new snapshot to the GHC repo.
>     (-) having tarballs in the repository is ugly
>     (-) we have no revision history of the library
>
> (3) The GHC repo *itself* contains every library unpacked in the
>     tree. You are allowed to check out the darcs repo instead.
>
>     (+) atomic commits to both the library and GHC.
>     (+) doing this consistently would allow us to remove darcs-all,
>         giving a nice easy development workflow
>     (-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
>     (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
>     (-) [INDEPENDENT/COUPLED] manual syncing with upstream
>     (-) [COUPLED] (particularly base) syncing with upstream would be
>         painful.
>
> (3) works best for SPECIFIC libraries, whereas (1) works best for
> INDEPENDENT/COUPLED libraries. If we want to treat all libraries the
> same, then the only real option is (1).
>
> Experience with Cabal and bytestring has shown that (1) can work for
> INDEPENDENT libraries, but only if we're careful not to get too
> out-of-sync (as we did with bytestring). In the case of Cabal, we
> never have local changes in our branch that aren't in Cabal HEAD, and
> that works well.
>
> Comments/thoughts?
>
> Cheers,
> Simon
Re: Libraries in the repo
On Wed, Aug 26, 2009 at 9:15 AM, Simon Marlow <marlo...@gmail.com> wrote:
> Simon and I have been chatting about how we accommodate libraries in
> the GHC repository. After previous discussion on this list, GHC has
> been gradually migrating towards having snapshots of libraries kept
> as tarballs in the repo (currently only time falls into this
> category), but I don't think we really evaluated the alternatives
> properly. Here's an attempt to do that, and to my mind the outcome is
> different: we really want to stick to having all libraries as
> separate repositories.
>
> Background:
>
> * Scope: libraries that are needed to build GHC itself (aka boot
>   libraries)
>
> * Boot libraries are of several kinds:
>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>
> Choices for how we deal with libraries in the GHC repository:
> (+) is a pro, (-) is a con.
>
> (1) Check out the library from a separate repo, using the darcs-all
>     script. The repo may either be a GHC-specific branch
>     [INDEPENDENT], or the master copy of the package
>     [SPECIFIC/COUPLED].
>
> (2) Put a snapshot tarball of the library in libraries/tarballs, but
>     allow you to checkout the darcs repo instead.
>
> (3) The GHC repo *itself* contains every library unpacked in the
>     tree. You are allowed to check out the darcs repo instead.
>
> (3) works best for SPECIFIC libraries, whereas (1) works best for
> INDEPENDENT/COUPLED libraries. If we want to treat all libraries the
> same, then the only real option is (1).

Agreed. Also, it seems odd to have template-haskell be built-in yet
something so fundamental as base be a separate repo.

> Experience with Cabal and bytestring has shown that (1) can work for
> INDEPENDENT libraries, but only if we're careful not to get too
> out-of-sync (as we did with bytestring). In the case of Cabal, we
> never have local changes in our branch that aren't in Cabal HEAD, and
> that works well.
>
> Comments/thoughts?

I also would rather stay with (1). Although using a DVCS allows greater
freedom for developers, it also creates the need for more explicit
rules of process. So I propose codifying on the wiki that for certain
libraries, the local ghc repo

 - Never has patches which are not in the library's HEAD
 - Pulls patches sparingly, and usually only after a tagged release of
   the library.

(the darcs-all script could help double-check that the former is being
obeyed.)

We package admins would need to agree to be responsive to patch
submissions from GHC devels (or grant push access to them).

Thanks for your very helpful analysis,
-Judah
Re: Libraries in the repo
On Wed, 2009-08-26 at 17:15 +0100, Simon Marlow wrote:
> * Sometimes we want to make local modifications to INDEPENDENT
>   libraries:
>   - when GHC adds a new warning, we need to fix instances of the
>     warning in the library to keep the GHC build warning-free.

I have to say I think this one is rather dubious. What is wrong with
just allowing warnings in these independent libs until they get fixed
upstream? I know ghc's build system sets -Werror on them, but I don't
see that as essential, especially for new warnings added in ghc head.

> Experience with Cabal and bytestring has shown that (1) can work for
> INDEPENDENT libraries, but only if we're careful not to get too
> out-of-sync (as we did with bytestring). In the case of Cabal, we
> never have local changes in our branch that aren't in Cabal HEAD, and
> that works well.

It requires an attentive maintainer to notice when people forget to
push upstream (as they inevitably do on occasion). If it goes unnoticed
for too long then ghc ends up with a forked repo that cannot sanely be
synced from the upstream repo (like bytestring).

I suggest if we stick with the independent repo approach that we have
some automation to check that changes are indeed getting pushed
upstream.

Duncan
Re: Libraries in the repo
marlowsd:
> Simon and I have been chatting about how we accommodate libraries in
> the GHC repository. After previous discussion on this list, GHC has
> been gradually migrating towards having snapshots of libraries kept
> as tarballs in the repo (currently only time falls into this
> category), but I don't think we really evaluated the alternatives
> properly. Here's an attempt to do that, and to my mind the outcome is
> different: we really want to stick to having all libraries as
> separate repositories.
>
> Background:
>
> * Scope: libraries that are needed to build GHC itself (aka boot
>   libraries)
>
> * Boot libraries are of several kinds:
>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>
> * Most boot libraries are INDEPENDENT. INDEPENDENT libraries have a
>   master repository somewhere separate from the GHC repositories.
>
> * We need a branch of INDEPENDENT libraries, so that GHC builds don't
>   break when the upstream package is modified.
>
> * Sometimes we want to make local modifications to INDEPENDENT
>   libraries:
>   - when GHC adds a new warning, we need to fix instances of the
>     warning in the library to keep the GHC build warning-free.
>   - to check that the changes work, before pushing upstream
>
> Choices for how we deal with libraries in the GHC repository:
> (+) is a pro, (-) is a con.
>
> (1) Check out the library from a separate repo, using the darcs-all
>     script. The repo may either be a GHC-specific branch
>     [INDEPENDENT], or the master copy of the package
>     [SPECIFIC/COUPLED].
>
>     (+) we can treat every library this way, which gives a consistent
>         story. Consistency is good for developers.
>     (+) [INDEPENDENT] makes it easy to push changes upstream and sync
>         with the upstream repo (unless upstream is using a different
>         VCS).
>     (-) [INDEPENDENT] we have to be careful not to let our branches
>         get too far out of sync with upstream, and we must sync
>         before releasing GHC.
>
> (2) Put a snapshot tarball of the library in libraries/tarballs, but
>     allow you to checkout the darcs repo instead.
>
>     (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
>         because we expect to be modifying the library often.
>     (-) updating the snapshot is awkward
>     (-) workflow for making a change to the library is awkward:
>         - checkout the darcs repo
>         - make the change, validate it
>         - push the change upstream (bump version?)
>         - make a new snapshot tarball
>         - commit the new snapshot to the GHC repo.
>     (-) having tarballs in the repository is ugly
>     (-) we have no revision history of the library
>
> (3) The GHC repo *itself* contains every library unpacked in the
>     tree. You are allowed to check out the darcs repo instead.
>
>     (+) atomic commits to both the library and GHC.
>     (+) doing this consistently would allow us to remove darcs-all,
>         giving a nice easy development workflow
>     (-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
>     (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
>     (-) [INDEPENDENT/COUPLED] manual syncing with upstream
>     (-) [COUPLED] (particularly base) syncing with upstream would be
>         painful.
>
> (3) works best for SPECIFIC libraries, whereas (1) works best for
> INDEPENDENT/COUPLED libraries. If we want to treat all libraries the
> same, then the only real option is (1).
>
> Experience with Cabal and bytestring has shown that (1) can work for
> INDEPENDENT libraries, but only if we're careful not to get too
> out-of-sync (as we did with bytestring). In the case of Cabal, we
> never have local changes in our branch that aren't in Cabal HEAD, and
> that works well.
>
> Comments/thoughts?

As author of bytestring, I'd prefer it if GHC used a released version
direct from Hackage. I.e. GHC could snapshot a Hackage release, and get
out of the business of cloning repos. Same for other INDEPENDENTs.

-- Don