On May 14, 2013, at 10:58 AM, John Chilton wrote:

> Hey Nate,
> 
> On Tue, May 14, 2013 at 8:40 AM, Nate Coraor <n...@bx.psu.edu> wrote:
>> Hi John,
>> 
>> A few of us in the lab here at Penn State actually discussed automatic 
>> creation of virtualenvs for dependency installations a couple weeks ago.  
>> This was in the context of Bjoern's request for supporting compile-time 
>> dependencies.  I think it's a great idea, but there's a limitation that we'd 
>> need to account for.
>> 
>> If you're going to have frequently used, expensive-to-build libraries 
>> (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
>> tool(s) depend on those repositories, the activate method won't work.  
>> virtualenvs cannot depend on other virtualenvs or be active at the same time 
>> as other virtualenvs.  We could work around it by setting PYTHONPATH in the 
>> dependencies' env.sh like we do now.  But then, other than making 
>> installation a bit easier (e.g. by allowing the use of pip), we have not 
>> gained much.
> 
> I don't know what to make of your response. It seems like a no, but
> the word no doesn't appear anywhere.

Sorry about being wishy-washy.  Unless anyone has any objections or can foresee 
other problems, I would say yes to this.  But I believe it should not break the 
concept of common-dependency-only repositories.

I'm pretty sure that as long as the process of creating a venv also adds the 
venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem 
should be automatically dealt with.
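
A minimal sketch of that idea, assuming a POSIX virtualenv layout 
(lib/pythonX.Y/site-packages) and an env.sh that is simply sourced by a 
shell.  The helper names and paths here are hypothetical and are not taken 
from Galaxy or from John's patch:

```python
import posixpath
import sys


def site_packages_dir(venv_dir, version=None):
    """Return the site-packages path inside a virtualenv (POSIX layout assumed)."""
    if version is None:
        # Default to the major.minor version of the running interpreter.
        version = "%d.%d" % sys.version_info[:2]
    return posixpath.join(venv_dir, "lib", "python" + version, "site-packages")


def env_sh_line(venv_dir, version=None):
    """Build a shell line that prepends the venv's site-packages to PYTHONPATH."""
    path = site_packages_dir(venv_dir, version)
    # ${PYTHONPATH:+:$PYTHONPATH} avoids a trailing ':' when PYTHONPATH is unset.
    return 'PYTHONPATH="%s${PYTHONPATH:+:$PYTHONPATH}"; export PYTHONPATH' % path


def append_to_env_sh(env_sh_path, venv_dir, version=None):
    """Append the PYTHONPATH line to a dependency's env.sh (hypothetical location)."""
    with open(env_sh_path, "a") as handle:
        handle.write(env_sh_line(venv_dir, version) + "\n")
```

Because a tool that depends on several dependency-only repositories would 
source each env.sh in turn, every venv's site-packages ends up on PYTHONPATH 
without any venv having to be "activated".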

> I don't know the particulars of rpy, but numpy installs fine via this
> method and I see no problem with each application having its own copy
> of numpy. I think relying on OS managed python packages for instance
> is something of a bad practice, when developing and distributing
> software I use virtualenvs for everything. I think that stand-alone
> python defined packages in the tool shed are directly analogous to OS
> managed packages.

Completely agree that we want to avoid OS-managed python packages.  I had, in 
the past, considered that for something like numpy, we ought to make it easy 
for an administrator to allow their own version of numpy to be used, since 
numpy can be linked against a number of optimized libraries for significant 
performance gains, and this generally won't happen for versions installed from 
the toolshed unless the system already has stuff like atlas-dev installed.  But 
I think we still allow admins that possibility with reasonable ease since 
dependency management in Galaxy is not a requirement.

What we do want to avoid is the situation where someone clones a new copy of 
Galaxy, wants to install 10 different tools that all depend on numpy, and has 
to wait an hour while 10 copies of numpy compile.  Add to that other tools with 
a similar process (installing R + packages + rpy), plus the hope that down the 
line you'll be able to automatically maintain separate builds for remote 
resources that are not the same (e.g. multiple clusters with differing 
operating systems), and this hopefully highlights why I think reducing 
duplication where possible will be important.

> I also disagree we have not gained much. Setting up these repositories
> is an onerous, brittle process. This patch provides some high-level
> functionality for creating virtualenvs which negates the need for
> creating separate repositories per package.

This is a good point.  I probably also sold short the benefit of being able to 
install with pip, since this does indeed remove a similarly brittle and tedious 
step of downloading and installing modules.

--nate

> 
> -John
> 
>> 
>> --nate
>> 
>> On May 13, 2013, at 6:49 PM, John Chilton wrote:
>> 
>>> The proliferation of individual python package install definitions has
>>> continued and it has spread to some MSI managed tools. I worry about
>>> the tedium I will have to endure in the future if that becomes an
>>> established best practice :) so I have implemented the python version
>>> of what I had described in this thread:
>>> 
>>> As patch:
>>> https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b.patch
>>> Pretty version:
>>> https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b
>>> 
>>> I understand that there are going to be differing opinions as to
>>> whether this is the best way forward but I thought I would give my
>>> position a better chance of succeeding by providing an implementation.
>>> 
>>> Thanks for your consideration,
>>> -John
>>> 
>>> 
>>> On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock <p.j.a.c...@googlemail.com> wrote:
>>>> On Tue, Apr 16, 2013 at 2:46 PM, John Chilton <chil...@msi.umn.edu> wrote:
>>>>> Stepping back a little, is this the right way to address Python
>>>>> dependencies?
>>>> 
>>>> Looks like I missed this thread, hence:
>>>> http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
>>>> 
>>>>> I was a big advocate for inter-repository dependencies,
>>>>> but I think taking it to the level of individual python packages might
>>>>> be going too far - my thought was they were needed for big 100 MB
>>>>> programs and stuff like that.
>>>> 
>>>> It should work but it is a lot of boilerplate for something which
>>>> should be more automated.
>>>> 
>>>>> At the Java jar/Python library/Ruby gem
>>>>> level I think using some of the platform-specific packaging stuff to
>>>>> create isolated environments for each program might be a better way
>>>>> to go.
>>>> 
>>>> I agree, the best way forward isn't obvious here, and it may make
>>>> sense to have tailored solutions for Python, Perl, Java, R, Ruby,
>>>> etc packages rather than the current Tool Shed package solution.
>>>> 
>>>> I'd like to be able to just continue to write this kind of thing in my
>>>> tool XML files and have it actually taken care of (rather than ignored):
>>>> 
>>>> <requirements>
>>>>    <requirement type="python-module">numpy</requirement>
>>>>    <requirement type="python-module">Bio</requirement>
>>>> </requirements>
>>>> 
>>>> Adding a version key would be sensible, handling min/max etc
>>>> as per Python packaging norms.
>>>> 
>>>> Peter
>>> ___________________________________________________________
>>> Please keep all replies on the list by using "reply all"
>>> in your mail client.  To manage your subscriptions to this
>>> and other Galaxy lists, please use the interface at:
>>> http://lists.bx.psu.edu/
>>> 
>>> To search Galaxy mailing lists use the unified search at:
>>> http://galaxyproject.org/search/mailinglists/
>> 
> 

