On Aug 26, 2013, at 11:59 AM, James Taylor wrote:

> On Mon, Aug 26, 2013 at 11:48 AM, John Chilton <chil...@msi.umn.edu> wrote:
> 
>> I think it is interesting that there was pushback on providing
>> infrastructure (tool actions) for obtaining CBL from GitHub and
>> performing installs based on it, because it was not in the Tool Shed
>> and therefore less reproducible, but the team believes infrastructure
>> should be put in place to support PyPI.
> 
> Well, first, I'm not sure what "the team" believes; I'm stating what I
> believe and engaging in a discussion with "the community". At some
> point this should evolve into what we are actually going to do and be
> codified in a spec as a Trello card, which even then is not set in
> stone.
> 
> Second, I'm not suggesting we depend on PyPI. The nice thing about the
> second format I proposed on galaxy-dev is that we can easily parse out
> the URL and archive that file. Then someday we could provide a
> fallback repository so that if the PyPI URL no longer works we still
> have the file stored.

I concur here; the experience and lessons learned by long-established package 
and dependency managers can provide some useful guidance for us going forward.  
APT has long relied on a model of archiving upstream source (as well as 
distro-generated binary (dpkg) packages), cataloging changes as a set of 
patches, and maintaining an understanding of installed files, even those meant 
to be user-edited.  I think there is a strong advantage in us doing the same.
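
To make the "understanding of installed files" idea concrete, here is a rough
sketch (purely illustrative; none of these names exist in Galaxy today) of
recording a dpkg-style checksum manifest when a dependency is installed, so
that locally edited files can be detected later:

    import hashlib
    import json
    import os

    def record_manifest(install_dir, manifest_path):
        """Record a checksum for every installed file, roughly what dpkg
        does with its *.md5sums lists."""
        manifest = {}
        for root, _dirs, files in os.walk(install_dir):
            for name in files:
                path = os.path.join(root, name)
                with open(path, 'rb') as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
                manifest[os.path.relpath(path, install_dir)] = digest
        with open(manifest_path, 'w') as fh:
            json.dump(manifest, fh, indent=2, sort_keys=True)

    def modified_files(install_dir, manifest_path):
        """Return files whose checksum no longer matches the one recorded
        at install time (i.e. files someone has edited by hand)."""
        with open(manifest_path) as fh:
            manifest = json.load(fh)
        changed = []
        for relpath, digest in manifest.items():
            path = os.path.join(install_dir, relpath)
            with open(path, 'rb') as fh:
                if hashlib.sha256(fh.read()).hexdigest() != digest:
                    changed.append(relpath)
        return changed

With something like this in place, the dependency manager can treat edited
files the way dpkg treats conffiles, rather than silently overwriting or
ignoring them.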

> 
>> I think we all value reproducibility here, but we make different
>> calculations about what is reproducible. In terms of implementing
>> the ideas James has laid out, or similar things I have proposed, it
>> might be beneficial to have some final answers on which external
>> resources are allowed - both for obtaining a Galaxy IUC gold star and
>> for the Tool Shed providing infrastructure to support their usage.
> 
> My focus is ensuring that we can archive things that pass through the
> toolshed. Tarballs from *anywhere* are easy enough to deal with.
> External version control repositories are a bit more challenging,
> especially when you are pulling just a particular file out, so that's
> where things got a little hinky for me.
> 
> Since we don't have the archival mechanism in place yet anyway, this
> is more a philosophical discussion and setting the right precedent.
> 
> And yes, keeping an archive of all the software in the world is a
> scary prospect, though compared to the amount of data we currently
> keep for people it is a blip. And I'm not sure how else we can really
> achieve the level of reproducibility we desire.

One additional step that will assist with long-term archival is generating 
static metadata and allowing the packaging and dependency systems to work 
outside of the Galaxy and Tool Shed applications.  A package metadata catalog 
and package format that can describe packages on a generic webserver and be 
installed without a running Galaxy instance are components that I believe are 
fairly important.
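
As a very rough illustration (the catalog format, URLs, and function below are
hypothetical, not anything that exists today), a flat JSON catalog served from
a plain webserver would be enough for a standalone client to resolve and fetch
a package without any running Galaxy or Tool Shed:

    import json
    import urllib2  # urllib.request on Python 3

    # Hypothetical static catalog, keyed by "name/version":
    # {
    #   "samtools/0.1.19": {
    #     "platforms": {
    #       "linux-x86_64": {
    #         "url": "http://depot.example.org/samtools-0.1.19-linux-x86_64.tar.gz",
    #         "sha256": "<checksum of the tarball>"
    #       }
    #     },
    #     "dependencies": ["zlib/1.2.8"]
    #   }
    # }

    def fetch_package(catalog_url, name, version, platform):
        """Resolve a package in a static catalog and return its archive bytes."""
        catalog = json.load(urllib2.urlopen(catalog_url))
        entry = catalog["%s/%s" % (name, version)]
        build = entry["platforms"][platform]
        return urllib2.urlopen(build["url"]).read()

Because the catalog is static, mirroring it (and the tarballs it points to) for
long-term archival is just a recursive copy.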

As for user-edited files, the env.sh files, which are generated at install 
time and then essentially untracked afterward, scare me a bit.  I think it'd 
be useful for the packaging system to have a tighter concept of environment 
management.
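
A minimal sketch of what I mean (again, hypothetical; this is not how the Tool
Shed generates env.sh today): build env.sh from a structured description and
keep a checksum alongside it, so the system can tell whether a user has edited
it before regenerating or upgrading:

    import hashlib
    import os

    def write_env_sh(install_dir, env_vars):
        """Generate env.sh from structured (name, value, action) tuples and
        record its checksum, conffile-style, so later edits are detectable."""
        lines = []
        for name, value, action in env_vars:  # action is "set" or "prepend"
            if action == "prepend":
                lines.append('%s="%s:$%s"; export %s' % (name, value, name, name))
            else:
                lines.append('%s="%s"; export %s' % (name, value, name))
        content = "\n".join(lines) + "\n"
        path = os.path.join(install_dir, "env.sh")
        with open(path, "w") as fh:
            fh.write(content)
        with open(path + ".sha256", "w") as fh:
            fh.write(hashlib.sha256(content.encode("utf-8")).hexdigest())

    # e.g. write_env_sh("/deps/samtools/0.1.19",
    #                   [("PATH", "/deps/samtools/0.1.19/bin", "prepend")])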

These are just my opinions, of course, and are going to be very APT/dpkg-biased 
simply due to my experience with and preference for Debian-based distros and 
dependency/package management, but I think there are useful concepts in this 
system (and others) that we can draw from.

Along those lines, one more idea I threw out a while ago was coming up with a 
way to incorporate (or at least automatically process, so that we can convert 
to our format) the build definitions from other systems like MacPorts, BSD 
ports/pkgsrc, dpkg, rpm, etc.  This would let us leverage the build rules for 
our target platforms that other package maintainers, with more time than us, 
have already worked out.  I think this aligns pretty well with Brad's thinking 
with CloudBioLinux; the difference in implementation is that we require 
multiple installable versions and platform independence.
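
As a toy example of what "automatically process" could look like (a sketch
against the stock debian/control format, not a converter we actually have),
pulling the Build-Depends list out of a Debian source package is mostly
mechanical, and the result could seed one of our package definitions:

    import re

    def parse_build_depends(control_text):
        """Extract (package, version_constraint) pairs from the
        Build-Depends field of a debian/control file."""
        match = re.search(r'^Build-Depends:\s*(.+?)(?=^\S|\Z)', control_text,
                          re.MULTILINE | re.DOTALL)
        if not match:
            return []
        deps = []
        for item in match.group(1).replace("\n", " ").split(","):
            item = item.strip()
            if not item:
                continue
            m = re.match(r'([a-z0-9][a-z0-9+.\-]*)\s*(?:\(([^)]*)\))?', item)
            if m:
                deps.append((m.group(1), m.group(2)))
        return deps

    # parse_build_depends("Build-Depends: debhelper (>= 9), zlib1g-dev\n")
    # -> [('debhelper', '>= 9'), ('zlib1g-dev', None)]

Real control files also have alternatives (a | b) and architecture qualifiers
that this ignores, but it shows how much of the mapping is mechanical.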

I am a bit worried that as we go down the "repackage (almost) all dependencies" 
path (which I do think is the right path), we also run the risk of most of our 
packages being out of date.  That's almost a guaranteed outcome when even the 
huge packaging projects (Debian, Ubuntu, etc.) are rife with out-of-date 
packages.  So being able to incorporate upstream build definitions may help us 
package dependencies quickly.

--nate

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
