Re: Libraries in the repo

2009-08-27 Thread Simon Marlow

On 26/08/2009 22:32, Duncan Coutts wrote:

On Wed, 2009-08-26 at 17:15 +0100, Simon Marlow wrote:


   * Sometimes we want to make local modifications to INDEPENDENT
 libraries:
   - when GHC adds a new warning, we need to fix instances of the
 warning in the library to keep the GHC build warning-free.


I have to say I think this one is rather dubious. What is wrong with
just allowing warnings in these independent libs until they get fixed
upstream? I know ghc's build system sets -Werror on them, but I don't
see that as essential, especially for new warnings added in ghc head.


True, we don't have to do that.


Experience with Cabal and bytestring has shown that (1) can work for
INDEPENDENT libraries, but only if we're careful not to get too
out-of-sync (as we did with bytestring).  In the case of Cabal, we never
have local changes in our branch that aren't in Cabal HEAD, and that
works well.


It requires an attentive maintainer to notice when people forget to push
upstream (as they inevitably do on occasion). If it goes unnoticed for
too long then ghc ends up with a forked repo that cannot sanely be
synced from the upstream repo (like bytestring).

I suggest if we stick with the independent repo approach that we have
some automation to check that changes are indeed getting pushed
upstream.


Agreed.  Can you think of an easy way to automate it?

Cheers,
Simon



___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: Libraries in the repo

2009-08-27 Thread Sittampalam, Ganesh
Simon Marlow wrote:
 
 I suggest if we stick with the independent repo approach that we have
 some automation to check that changes are indeed getting pushed
 upstream.
 
 Agreed.  Can you think of an easy way to automate it?

How about a cronjob that runs

darcs send upstream-repo --to=some-list

?

Ganesh

=== 
 Please access the attached hyperlink for an important electronic 
communications disclaimer: 
 http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html 
 
=== 
 


Re: Libraries in the repo

2009-08-27 Thread Simon Marlow

On 27/08/2009 11:18, Sittampalam, Ganesh wrote:

Simon Marlow wrote:


I suggest if we stick with the independent repo approach that we have
some automation to check that changes are indeed getting pushed
upstream.


Agreed.  Can you think of an easy way to automate it?


How about a cronjob that runs

darcs send upstream-repo --to=some-list

?


But the requirement we want is that patches are only pushed upstream, 
and never pushed to the branch first.


Cheers,
Simon


RE: Libraries in the repo

2009-08-27 Thread Sittampalam, Ganesh
Simon Marlow wrote:
 On 27/08/2009 11:18, Sittampalam, Ganesh wrote:
 Simon Marlow wrote:
 
 I suggest if we stick with the independent repo approach that we
 have some automation to check that changes are indeed getting
 pushed upstream.
 
 Agreed.  Can you think of an easy way to automate it?
 
 How about a cronjob that runs
 
 darcs send upstream-repo --to=some-list
 
 ?
 
 But the requirement we want is that patches are only pushed upstream,
 and never pushed to the branch first. 

I might be getting confused about something, but I'd expect this command
to send an email with any changes in the branch repo that aren't in
upstream. In other words if you have some patches in the branch that
aren't upstream, you'll find out and can remedy the situation.
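
The after-the-fact check described here amounts to a set difference: patches that are in the branch but not in upstream. A minimal sketch in Python, modelling each repository as a set of patch names. The function name and the sample patch names are hypothetical, and a real cron job would get the patch lists from darcs rather than from literals.

```python
def unpushed_patches(branch, upstream):
    """Return the patches present in the branch but missing upstream, sorted."""
    return sorted(set(branch) - set(upstream))


if __name__ == "__main__":
    # Hypothetical patch names, for illustration only.
    branch = {"fix-warning", "bump-version", "add-test"}
    upstream = {"bump-version", "add-test"}
    missing = unpushed_patches(branch, upstream)
    if missing:
        # A cron job would mail this report instead of printing it.
        print("Not yet upstream:", ", ".join(missing))
```

An empty result means the branch holds nothing that upstream lacks, so no mail would need to be sent.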

Ganesh



Re: Libraries in the repo

2009-08-27 Thread José Pedro Magalhães
Hello,

On Wed, Aug 26, 2009 at 18:15, Simon Marlow marlo...@gmail.com wrote:


  * Boot libraries are of several kinds:
   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
   - COUPLED: Tightly coupled to GHC, but used by others (base)
   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)


Does syb fall under INDEPENDENT or COUPLED?

In any case, as the syb maintainer, I'd favor (1) too.


Cheers,
Pedro


Re: Libraries in the repo

2009-08-27 Thread Simon Marlow

On 27/08/2009 00:55, Don Stewart wrote:

marlowsd:

Simon and I have been chatting about how we accommodate libraries in the
GHC repository.  After previous discussion on this list, GHC has been
gradually migrating towards having snapshots of libraries kept as
tarballs in the repo (currently only time falls into this category),
but I don't think we really evaluated the alternatives properly.  Here's
an attempt to do that, and to my mind the outcome is different: we
really want to stick to having all libraries as separate repositories.

Background:
  * Scope: libraries that are needed to build GHC itself (aka boot
libraries)

  * Boot libraries are of several kinds:
- INDEPENDENT: Independently maintained (e.g. time, haskeline)
- COUPLED: Tightly coupled to GHC, but used by others (base)
- SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)

  * Most boot libraries are INDEPENDENT.  INDEPENDENT libraries have a
master repository somewhere separate from the GHC repositories.

  * We need a branch of INDEPENDENT libraries, so that GHC builds don't
break when the upstream package is modified.

  * Sometimes we want to make local modifications to INDEPENDENT
libraries:
  - when GHC adds a new warning, we need to fix instances of the
warning in the library to keep the GHC build warning-free.
  - to check that the changes work, before pushing upstream


Choices for how we deal with libraries in the GHC repository: (+) is a
pro, (-) is a con.

   (1) Check out the library from a separate repo, using the darcs-all
   script.  The repo may either be a GHC-specific branch
   [INDEPENDENT], or the master copy of the package
   [SPECIFIC/COUPLED].

   (+) we can treat every library this way, which gives a
   consistent story.  Consistency is good for developers.
   (+) [INDEPENDENT] makes it easy to push changes upstream and sync
   with the upstream repo (unless upstream is using a different
   VCS).

   (-) [INDEPENDENT] we have to be careful not to let our branches
   get too far out of sync with upstream, and we must
   sync before releasing GHC.

   (2) Put a snapshot tarball of the library in libraries/tarballs,
   but allow you to checkout the darcs repo instead.

   (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
   because we expect to be modifying the library often.
   (-) updating the snapshot is awkward
   (-) workflow for making a change to the library is awkward:
   - checkout the darcs repo
   - make the change, validate it
   - push the change upstream (bump version?)
   - make a new snapshot tarball
   - commit the new snapshot to the GHC repo.
   (-) having tarballs in the repository is ugly
   (-) we have no revision history of the library

   (3) The GHC repo *itself* contains every library unpacked in the
   tree.  You are allowed to check out the darcs repo instead.

   (+) atomic commits to both the library and GHC.
   (+) doing this consistently would allow us to remove darcs-all,
   giving a nice easy development workflow

   (-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
   (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
   (-) [INDEPENDENT/COUPLED] manual syncing with upstream
   (-) [COUPLED] (particularly base) syncing with
   upstream would be painful.


(3) works best for SPECIFIC libraries, whereas (1) works best for
INDEPENDENT/COUPLED libraries.  If we want to treat all libraries the
same, then the only real option is (1).
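
The conclusion above can be checked mechanically. A minimal sketch in Python, assuming one particular reading of the pros and cons: option (1) suits every kind, (2) suits only INDEPENDENT libraries, and (3) suits only SPECIFIC ones. The table and the function name are hypothetical encodings of that reading, not something stated in this form in the list.

```python
# Which library kinds each option suits, as one reading of the pros and cons.
# This table is an assumption made for illustration, not an official summary.
SUITS = {
    1: {"INDEPENDENT", "COUPLED", "SPECIFIC"},
    2: {"INDEPENDENT"},
    3: {"SPECIFIC"},
}


def uniform_options(kinds):
    """Return the options that suit every library kind in `kinds`."""
    kinds = set(kinds)
    return sorted(opt for opt, suited in SUITS.items() if kinds <= suited)


if __name__ == "__main__":
    # Treating all three kinds the same way leaves only option (1).
    print(uniform_options({"INDEPENDENT", "COUPLED", "SPECIFIC"}))
```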

Experience with Cabal and bytestring has shown that (1) can work for
INDEPENDENT libraries, but only if we're careful not to get too
out-of-sync (as we did with bytestring).  In the case of Cabal, we never
have local changes in our branch that aren't in Cabal HEAD, and that
works well.

Comments/thoughts?



As author of bytestring, I'd prefer it if GHC used a released version
direct from Hackage. I.e. GHC could snapshot a Hackage release, and get
out of the business of cloning repos. Same for other INDEPENDENTs.


Are you saying you don't want us to have a GHC branch?  Even if the 
branch just pulls from upstream and never has local changes?  We can 
still use released versions only, the main point about having separate 
repos is that we have a consistent picture of libraries from GHC's side.


For bytestring I imagine we can get away without making changes between 
releases, or at least ensuring our changes are sent upstream and we wait 
for a release before pulling.  For other libraries, such as Cabal, this 
would be too onerous I think (Cabal is really COUPLED at the moment, 
much as we'd like it to be INDEPENDENT).


Cheers,
Simon

Re: Libraries in the repo

2009-08-27 Thread Simon Marlow

On 27/08/2009 11:24, Sittampalam, Ganesh wrote:

Simon Marlow wrote:

On 27/08/2009 11:18, Sittampalam, Ganesh wrote:

Simon Marlow wrote:


I suggest if we stick with the independent repo approach that we
have some automation to check that changes are indeed getting
pushed upstream.


Agreed.  Can you think of an easy way to automate it?


How about a cronjob that runs

darcs send upstream-repo --to=some-list

?


But the requirement we want is that patches are only pushed upstream,
and never pushed to the branch first.


I might be getting confused about something, but I'd expect this command
to send an email with any changes in the branch repo that aren't in
upstream. In other words if you have some patches in the branch that
aren't upstream, you'll find out and can remedy the situation.


Yes, it tells you that you've screwed up, rather than telling you that 
you're about to screw up, which would be much more convenient.  After 
you've screwed up it might be too late to fix it, due to conflicts with 
upstream.


Cheers,
Simon


RE: Libraries in the repo

2009-08-27 Thread Sittampalam, Ganesh
Simon Marlow wrote:
 Simon Marlow wrote:
 
 I suggest if we stick with the independent repo approach that we
 have some automation to check that changes are indeed getting
 pushed upstream.
[snip unhelpful suggestion from me]
 
 Yes, it tells you that you've screwed up, rather than telling you
 that you're about to screw up, which would be much more convenient. 
 After you've screwed up it might be too late to fix it, due to
 conflicts with upstream.   

Can you arrange that the only way that patches can get into the branch
is via darcs pull --intersection upstream repo ?
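
One way to read this suggestion: the branch may only ever gain patches that upstream already has, so an unpushed patch is refused before it lands rather than reported afterwards. A minimal sketch in Python, modelling repositories as sets of patch names. `admit_to_branch` and the sample names are hypothetical, and the code models the policy only, not the behaviour of darcs itself.

```python
def admit_to_branch(branch, incoming, upstream):
    """Apply the 'upstream first' policy to a set of incoming patches.

    Returns (new_branch, rejected): patches already in upstream are added
    to the branch, and everything else is rejected and listed in sorted order.
    """
    incoming = set(incoming)
    upstream = set(upstream)
    accepted = incoming & upstream          # only patches upstream already has
    rejected = sorted(incoming - upstream)  # would have forked the branch
    return set(branch) | accepted, rejected


if __name__ == "__main__":
    # Hypothetical patch names, for illustration only.
    new_branch, rejected = admit_to_branch(
        branch={"bump-version"},
        incoming={"add-test", "local-hack"},
        upstream={"bump-version", "add-test"},
    )
    print(sorted(new_branch), rejected)
```

Under this policy the branch stays a subset of upstream as long as it started as one, which is the property asked for earlier in the thread.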

Ganesh



Re: Libraries in the repo

2009-08-27 Thread Simon Marlow
Incidentally, the reason I'd like us to make a decision on this now is
because I'm about to add two new boot libraries:


  - binary, to support a binary cache of GHC's package database
(INDEPENDENT)

  - bin-package-db, the code to read and write the binary package
database (SPECIFIC, shared by ghc and ghc-pkg).

I don't much like bin-package-db being a separate package, given that 
it's only 100 lines or so in one module, but I don't see a good alternative.


Cheers,
Simon

On 26/08/2009 17:15, Simon Marlow wrote:

Simon and I have been chatting about how we accommodate libraries in the
GHC repository. After previous discussion on this list, GHC has been
gradually migrating towards having snapshots of libraries kept as
tarballs in the repo (currently only time falls into this category),
but I don't think we really evaluated the alternatives properly. Here's
an attempt to do that, and to my mind the outcome is different: we
really want to stick to having all libraries as separate repositories.

Background:
* Scope: libraries that are needed to build GHC itself (aka boot
libraries)

* Boot libraries are of several kinds:
- INDEPENDENT: Independently maintained (e.g. time, haskeline)
- COUPLED: Tightly coupled to GHC, but used by others (base)
- SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)

* Most boot libraries are INDEPENDENT. INDEPENDENT libraries have a
master repository somewhere separate from the GHC repositories.

* We need a branch of INDEPENDENT libraries, so that GHC builds don't
break when the upstream package is modified.

* Sometimes we want to make local modifications to INDEPENDENT
libraries:
- when GHC adds a new warning, we need to fix instances of the
warning in the library to keep the GHC build warning-free.
- to check that the changes work, before pushing upstream


Choices for how we deal with libraries in the GHC repository: (+) is a
pro, (-) is a con.

(1) Check out the library from a separate repo, using the darcs-all
script. The repo may either be a GHC-specific branch
[INDEPENDENT], or the master copy of the package
[SPECIFIC/COUPLED].

(+) we can treat every library this way, which gives a
consistent story. Consistency is good for developers.
(+) [INDEPENDENT] makes it easy to push changes upstream and sync
with the upstream repo (unless upstream is using a different
VCS).

(-) [INDEPENDENT] we have to be careful not to let our branches
get too far out of sync with upstream, and we must
sync before releasing GHC.

(2) Put a snapshot tarball of the library in libraries/tarballs,
but allow you to checkout the darcs repo instead.

(-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
because we expect to be modifying the library often.
(-) updating the snapshot is awkward
(-) workflow for making a change to the library is awkward:
- checkout the darcs repo
- make the change, validate it
- push the change upstream (bump version?)
- make a new snapshot tarball
- commit the new snapshot to the GHC repo.
(-) having tarballs in the repository is ugly
(-) we have no revision history of the library

(3) The GHC repo *itself* contains every library unpacked in the
tree. You are allowed to check out the darcs repo instead.

(+) atomic commits to both the library and GHC.
(+) doing this consistently would allow us to remove darcs-all,
giving a nice easy development workflow

(-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
(-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
(-) [INDEPENDENT/COUPLED] manual syncing with upstream
(-) [COUPLED] (particularly base) syncing with
upstream would be painful.


(3) works best for SPECIFIC libraries, whereas (1) works best for
INDEPENDENT/COUPLED libraries. If we want to treat all libraries the
same, then the only real option is (1).

Experience with Cabal and bytestring has shown that (1) can work for
INDEPENDENT libraries, but only if we're careful not to get too
out-of-sync (as we did with bytestring). In the case of Cabal, we never
have local changes in our branch that aren't in Cabal HEAD, and that
works well.

Comments/thoughts?

Cheers,
Simon

