Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas

2013-11-06 Thread Björn Grüning
Hi John,

 Perl complicates things, TPP complicates things greatly.

So true, so true ...

 Bjoern, can I ask you if this hypothetical exhibits the same problem
 and can be used to reason about these things more easily and drive a
 test implementation.

Yes to both questions :)

 So right now, Galaxy has setup_virtualenv which will build and install
 Python packages in a virtual environment. However, some Python
 packages have C library dependencies that could prevent them from
 being installed.
 
 As a specific example - take PyTables (install via pip install
 tables) - which is a package for managing hierarchical datasets. If
 you try to install this with pip the way Galaxy will - it will fail if
 you do not have libhdf5 installed. So at a high-level, it would be
 nice if the tool shed had a libhdf5 definition and the dependencies
 file had some mechanism for declaring libhdf5 should be installed
 before a setup_virtualenv containing tables and its environment
 configured properly so the pip install succeeds (maybe just
 LD_LIBRARY_PATH needs to be set).

Indeed, same problem. I think we have this problem in every high-level
install method, because set_environment_for_install is not allowed as the
first action tag.

Can you think of any case where ENV vars can conflict with each other,
besides set_to, assuming that we source every env.sh file by default
for every specified package?
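A minimal sketch (not Galaxy code; names are illustrative) of why append_to and
prepend_to compose safely while two set_to actions on the same variable conflict
when several env.sh files are sourced in sequence:

# Illustrative only: a dict standing in for the shell environment.
import os

def apply_env_action( env, name, action, value ):
    if action == "set_to":
        env[ name ] = value  # last writer wins: two packages setting the same var conflict
    elif action == "prepend_to":
        env[ name ] = ( value + os.pathsep + env[ name ] ) if name in env else value
    elif action == "append_to":
        env[ name ] = ( env[ name ] + os.pathsep + value ) if name in env else value
    return env

env = {}
apply_env_action( env, "PATH", "prepend_to", "/deps/expat/bin" )
apply_env_action( env, "PATH", "prepend_to", "/deps/perl/bin" )        # both survive
apply_env_action( env, "PERL5LIB", "set_to", "/deps/libA/lib/perl5" )
apply_env_action( env, "PERL5LIB", "set_to", "/deps/libB/lib/perl5" )  # libA is silently lost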

Cheers,
Bjoern  

 -John
 
 
 On Tue, Nov 5, 2013 at 3:35 PM, Björn Grüning
 bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
  Hi Greg,
 
 
  Hello Bjoern,
 
 
  On Nov 5, 2013, at 12:13 PM, Bjoern Gruening bjoern.gruen...@gmail.com
  wrote:
 
  Hi Greg,
 
  I'm currently implementing a setup_perl_environment and stumbled upon
  a tricky problem (that is not only related to Perl but also to Ruby, Python
  and R).
 
  The Problem:
  Let's assume a Perl package (A) requires an XML parser written in C/C++ (Z).
  (Z) is a dependency that I can import, but as far as I can see there is no
  way to call set_environment_for_install before setup_perl_environment,
  because setup_perl_environment defines an installation type.
 
 
  The above is fairly difficult to understand - can you provide an actual xml
  recipe that provides some context?
 
  Attached, please see a detailed explanation at the bottom.
 
 
 
 
  I would like to discuss that issue to get a few ideas. I can think about
  these solutions:
 
  - hackish solution:
  I can call install_environment.add_env_shell_file_paths( action_dict[
  'env_shell_file_paths' ] ) inside of the setup_*_environment path and 
  remove
  it from action type afterwards
 
  Again, it's difficult to provide good feedback on the above approach without
  an example recipe.  However, your hackish solution term probably means it
  is not ideal.  ;)
 
  :)
 
 
  - import all env.sh variables from every (package) definition. Regardless
  if set_environment_for_install is set or not.
 
 
  I don't think the above approach would be ideal.  It seems that it could
  fairly easily create conflicting environment variables within a single
  installation,
  so the latest value for an environment variable may not be what is expected.
 
  What do you mean by conflicting ENV vars? I can only imagine multiple set_to that
  overwrite each other. append_to and prepend_to should be safe, or?
 
 
 
  I must admit, I do not understand why set_environment_for_install is
  actually needed. I think we can assume that if I specify a
 
  <package name="R_3_0_1" version="3.0.1">
      <repository name="package_r_3_0_1" owner="iuc" prior_installation_required="True" />
  </package>
 
  I want the ENV vars sourced.
 
  Hmmm…so you are saying that you want to be able to define the above
  package tag set inside of an actions tag set and have everything work?
 
 
  Oh no, I mean just having it as a package tag like it is, but sourcing the env.sh file
  for every other package set automatically. Then you do not need
  set_environment_for_install.
 
  In the attached example:
  <package name="expat" version="2.1.0">
      <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc"
                  prior_installation_required="True" toolshed="http://testtoolshed.g2.bx.psu.edu" />
  </package>
 
  It is not imported with set_environment_for_install, so it is actually useless
  (for now). But the env.sh needs to be sourced during the
  setup_perl_environment part.
 
  I think this may cause problems because I believe the
  set_environment_for_install tag set restricts activity to only the time
  when a dependent
  repository will be using the defined environment from the required
  repository in order to compile one or more of its dependencies.
  Eliminating this restriction may cause problems after compilation.  Although
  I cannot state this as a definite fact.
 
  Furthermore, that can solve another issue: namely, the need for ENV vars
  from a package definition in the same file. Let's imagine package P has
  dependency D and you want to download 

Re: [galaxy-dev] Supporting file sets for running a tool with multiple input files

2013-11-06 Thread Lukasse, Pieter
Hi Dannon,

Thanks for your reply. I've found a workaround by using the method 
Binary.register_sniffable_binary_format(). I discovered this workaround in a 
previous thread by John Chilton.

Attached the complete solution, just for the record.

Regards,

Pieter.

From: Dannon Baker [mailto:dannon.ba...@gmail.com]
Sent: maandag 4 november 2013 13:42
To: Lukasse, Pieter
Cc: galaxy-...@bx.psu.edu
Subject: Re: [galaxy-dev] Supporting file sets for running a tool with multiple 
input files

Hi Pieter,

We've worked out what we think is the right way to solve this for Galaxy and 
expect work to start soon.  See the trello card 
(https://trello.com/c/325AXIEr/613-tools-dataset-collections) for more details.

For your particular tool, the first workaround that comes to mind would be 
adding a new datatype, say, ZippedInputFiles in your toolshed repository that 
gets included and used by users, though I haven't actually tried that.  That 
said, I'd probably wait, this feature is high on our list of things to do next.

-Dannon

On Mon, Nov 4, 2013 at 5:44 AM, Lukasse, Pieter 
pieter.luka...@wur.nl wrote:
Hi,

Is there any news regarding support for the following scenario in Galaxy:

-  User has N files which he would like to process with a Galaxy tool using 
the same parameters

-  User uploads a (.tar or .zip ?) file to Galaxy and selects this as the 
input file for the tool

-  Tool produces an output .zip file with the N result files

I know Galaxy-P had a workaround for this some time ago. But has this been 
solved in the main Galaxy code base?
Or are there any feasible workarounds that I can add to my Toolshed package to 
ensure my .zip file does not get unzipped at upload (default Galaxy behaviour)?

Thanks and regards,

Pieter Lukasse
Wageningen UR, Plant Research International
Departments of Bioscience and Bioinformatics
Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB,
Wageningen, the Netherlands
+31-317480891; skype: pieter.lukasse.wur
http://www.pri.wur.nl/





prims_masscomb_datatypes.py
Description: prims_masscomb_datatypes.py
<?xml version="1.0"?>
<datatypes>
  <datatype_files>
    <datatype_file name="prims_masscomb_datatypes.py"/>
  </datatype_files>
  <registration display_path="display_applications">
    <datatype extension="prims.fileset.zip" type="galaxy.datatypes.prims_masscomb_datatypes:FileSet" display_in_upload="true"/>
  </registration>
</datatypes>
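
A minimal sketch (hypothetical, not the actual attached module) of what a datatype
file wired up like the datatypes_conf.xml above might look like, using the
Binary.register_sniffable_binary_format() workaround named earlier; class and
module names follow the registration entry:

# Hypothetical sketch of galaxy/datatypes/prims_masscomb_datatypes.py (not the
# actual attachment): register a zip-based "file set" as a binary datatype so
# the upload tool keeps the archive intact instead of unpacking it.
import zipfile

from galaxy.datatypes.binary import Binary


class FileSet( Binary ):
    """A zip archive holding a set of input files processed together by one tool."""
    file_ext = "prims.fileset.zip"

    def sniff( self, filename ):
        # For this illustration, any valid zip archive counts as a file set.
        return zipfile.is_zipfile( filename )


# Registration keeps uploads of this extension from being unzipped by Galaxy.
Binary.register_sniffable_binary_format( "prims.fileset.zip", "prims.fileset.zip", FileSet )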

Re: [galaxy-dev] Galaxy dropping jobs?

2013-11-06 Thread Nikolai Vazov

Hi again,

The loop (as explained below), did the job :)

Nikolay




Thank you very much, Nate,

1.
I have put in a fix: a loop running the JobStatus check 5 times every 60 
secs and only then throwing an exception like the one below.
It turned out that all the connect failures happen at the same time - at 
slurm log rotation time at 3 am. Hopefully it helps :)
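
A rough sketch of that retry loop (illustrative only; the session and job id
names are placeholders, not the actual DRMAA runner code):

# Illustrative sketch: poll the DRMAA job status up to 5 times, 60 seconds
# apart, and only raise after the final failed attempt.
import time

MAX_ATTEMPTS = 5
RETRY_DELAY = 60  # seconds

def check_job_status( session, external_job_id ):
    last_error = None
    for attempt in range( MAX_ATTEMPTS ):
        try:
            return session.jobStatus( external_job_id )
        except Exception as e:  # e.g. slurm controller unreachable during log rotation
            last_error = e
            if attempt + 1 < MAX_ATTEMPTS:
                time.sleep( RETRY_DELAY )
    raise last_error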


2.
Our slurm conf keeps the info about each job for 5 min. But looking at 
the code, it seems that in the case you describe below, there will be 
an Invalid job exception leading to a Job finished state. Am I wrong?


Anyway, I'll let you know if the loop does the job.

Thanks again,

Nikolay


On 2013-11-04 15:57, Nate Coraor wrote:

Hi Nikolay,
With slurm, the following change that I backed out should fix the 
problem:


https://bitbucket.org/galaxy/galaxy-central/diff/lib/galaxy/jobs/runners/drmaa.py?diff2=d46b64f12c52&at=default
Although I do believe that if Galaxy doesn't read the completion
state before slurm forgets about the job (MinJobAge in slurm.conf),
this change could result in the job becoming permanently stuck in the
running state.
I should have some enhancements to the DRMAA runner for slurm coming
soon that would prevent this.
--nate
On Oct 31, 2013, at 5:27 AM, Nikolai Vazov wrote:


Hi,

I discovered a weird issue in the job behaviour : Galaxy is running a 
long job on a cluster (more than 24h), about 15 hours later it misses 
the connection with SLURM on the cluster and throws the following 
message :

[root@galaxy-prod01 galaxy-dist]# grep 3715200 paster.log
galaxy.jobs.runners.drmaa INFO 2013-10-30 10:51:54,149 (555) queued 
as 3715200
galaxy.jobs.runners.drmaa DEBUG 2013-10-30 10:51:55,149 (555/3715200) 
state change: job is queued and active
galaxy.jobs.runners.drmaa DEBUG 2013-10-30 10:52:13,516 (555/3715200) 
state change: job is running
galaxy.jobs.runners.drmaa INFO 2013-10-31 03:29:33,090 (555/3715200) 
job left DRM queue with following message: code 1: slurm_load_jobs 
error: Unable to contact slurm controller (connect failure),job_id: 
3715200
Is there a timeout in Galaxy for contacting slurm? Yet, the job is 
still running properly on the cluster ...

Thanks for help, it's really urgent :)
Nikolay

--
Nikolay Vazov, PhD
Research Computing Centre - http://hpc.uio.no
USIT, University of Oslo


--
Nikolay Vazov, PhD
Research Computing Centre - http://hpc.uio.no
USIT, University of Oslo


[galaxy-dev] Dynamic tool configuration

2013-11-06 Thread Biobix Galaxy
Hi all,

We are working on a galaxy tool suite for data analysis.
We use a sqlite db to keep result data centralised between the different tools. 

At one point the tool configuration options of a tool should be dependent on 
the rows within a table of the sqlite db that is the output of the previous 
step. In other words, we would like to be able to set selectable parameters 
based on an underlying SQL statement. If SQL is not possible, an alternative 
would be to output the table content into a txt file and subsequently parse the 
txt file instead of the sqlite db within the xml configuration file. 
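
A minimal sketch of that txt-file alternative (table, column and path names are
made up): dump the relevant table from the sqlite result db to a tab-separated
file that the tool wrapper or a select parameter can then read:

# Minimal sketch: export one table of the sqlite result db to a TSV file.
# Table and path names here are placeholders.
import csv
import sqlite3

def export_table_to_tsv( sqlite_path, table, tsv_path ):
    conn = sqlite3.connect( sqlite_path )
    try:
        cursor = conn.execute( "SELECT * FROM %s" % table )
        with open( tsv_path, "w" ) as out:
            writer = csv.writer( out, delimiter="\t" )
            writer.writerow( [ col[0] for col in cursor.description ] )  # header row
            writer.writerows( cursor )
    finally:
        conn.close()

export_table_to_tsv( "results.sqlite", "analysis_options", "analysis_options.tsv" )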

When looking through the galaxy wiki and mailing lists I came across the code 
tag, which would be ideal: we could run a python script in the background to 
fetch data from the sqlite table. However, that feature is deprecated. 

Does anybody know of other ways to achieve this?  

Thanks!

Jeroen

Ir. Jeroen Crappé, PhD Student
Lab of Bioinformatics and Computational Genomics (Biobix)
FBW - Ghent University



Re: [galaxy-dev] latest galaxy-central version

2013-11-06 Thread Robert Baertsch
Carl,
Thanks for building a great tool. 

Can you fix the library performance problem?

Steps to reproduce:
1. http://su2c-dev.ucsc.edu/library
2. open library TCGA Pancan. 

It takes more than 20 seconds!
-Robert

On Nov 5, 2013, at 1:29 PM, Carl Eberhard carlfeberh...@gmail.com wrote:

 Thanks for the thorough report, Robert.
 
 I've added some sanity checking to the panel in default:11240:c271cdb443c6
 
 My only guess as to why the panel javascript is trying to load at all in the 
 library section is a link to the controller/datasets methods (unhide, delete, 
 etc.) that still assume histories are in their own frames. Still 
 investigating.
 
 
 On Tue, Nov 5, 2013 at 12:06 PM, Robert Baertsch baert...@soe.ucsc.edu 
 wrote:
 Carl,
 Just to make sure, I just deleted my tree, did a fresh checkout and tried a 
 third time. It happened all three times. I'm using a postgres database and 
 this time I started with fresh config files and just changed the settings 
 below.
 
 I didn't see the javascript error in Chrome, perhaps it is some strangeness 
 in Firebug. I've seen strange errors before in Firebug.
 
 Attached are two screen grabs of the latest try, before and after browser 
 refresh.
 
 changeset:   11227:151b7d3b2f1b
 branch:  stable
 tag: tip
 parent:  11219:5c789ab4144a
 user:Carl Eberhard carlfeberh...@gmail.com
 date:Tue Nov 05 11:33:06 2013 -0500
 summary: History panel: fix naming bug in purge async error handling
 
 -Robert
 
 29c29
  port = 8585
 ---
  #port = 8080
 34c34
  host = 0.0.0.0
 ---
  #host = 127.0.0.1
 96d95
  database_connection = postgresql://localhost/medbookgalaxycentral3
 542c541
  allow_library_path_paste = True
 ---
  #allow_library_path_paste = False
 599c598
  admin_users = baert...@soe.ucsc.edu
 ---
  #admin_users = None
 
 
 Screen Shot 2013-11-05 at 8.35.09 AM.png
 Screen Shot 2013-11-05 at 8.37.04 AM.png
 
 On Nov 5, 2013, at 8:21 AM, Carl Eberhard carlfeberh...@gmail.com wrote:
 
 Hello, Robert
 
 I'm having difficulty reproducing this with a fresh install of 
 galaxy_central:default:11226:c67b9518c1e0.
 
 Is this an intermittent error or does it happen reliably each time with the 
 steps above? Is it still the same javascript error you mentioned above?
 
 I'll investigate further.
 
 
 On Mon, Nov 4, 2013 at 7:29 PM, Robert Baertsch baert...@soe.ucsc.edu 
 wrote:
 I updated to the stable release and reproduced the issue.
 
 Step to reproduce
 1. go to admin
 2. Manage data libraries
 3. add dataset
 4. select Upload files from filesystem paths
 5. paste full path to any bam file.
 6. leave defaults: auto-detect and copy files into galaxy
 7. select role to restrict access
 8. click upload to start
 
 Screen shows strange formatting and Job is running for Bam file.
 
 python /data/galaxy-central/tools/data_source/upload.py /data/galaxy-central 
 /data/galaxy-central/database/tmp/tmpJoasJl 
 /data/galaxy-central/database/tmp/tmpzZO8t1 
 8877:/data/galaxy-central/database/job_working_directory/004/4548/dataset_8877_files:/data/galaxy-central/database/files/008/dataset_8877.dat
 
 If I do a firefox refresh and go back to the library, the formatting is 
 normal.
 
 I'm assuming the fix is to just render the page before waiting for the job 
 to complete.
 -Robert
 
 On Nov 4, 2013, at 12:45 PM, Martin Čech mar...@bx.psu.edu wrote:
 
 Hello,
 
  I have also seen some of these errors while developing libraries. The 
  library code is not in central; however, it might be related to recent 
  changes to the history panel. Carl Eberhard might know more, adding him to 
  the conversation.
 
 --Marten
 
 
 On Mon, Nov 4, 2013 at 2:45 PM, Robert Baertsch baert...@soe.ucsc.edu 
 wrote:
 It keeps doing posts, and I'm not seeing any new errors. 
 
 POST http://su2c-dev.ucsc.edu:8383/library_common/library_item_updates
 200 OK   121ms
 
 When I did a browser refresh, I got the following javascript error: (I 
 am logged in)
 
 Galaxy.currUser is undefined on Line 631 in history-panel.js
 
 
 When I opened the data library where the bam file was copying, everything 
 is rendered ok.  It seems the browser refresh fixed things.
 
 -Robert
 
 
 On Nov 4, 2013, at 11:14 AM, James Taylor ja...@jamestaylor.org wrote:
 
 Robert, I'm not sure what is going on here, other than that the javascript 
 that converts buttons into dropdown menus has not fired. 
 
 Are there any javascript errors? 
 
 Marten is working on rewriting libraries, and we will be eliminating the 
 progressive loading popupmenus for something much more efficient, but this 
 also might indicate a bug so let us know if there is anything odd in the 
 console. 
 
 
 --
 James Taylor, Associate Professor, Biology/CS, Emory University
 
 
 On Mon, Nov 4, 2013 at 1:58 PM, Robert Baertsch baert...@soe.ucsc.edu 
 wrote:
  Hi James,
 I just pulled in the latest code to see how you changed from iframe to 
 divs. Very exciting update.
 
 
 I tried importing a bam file into the library using 

Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas

2013-11-06 Thread Dave Bouvier

Björn,

We're thinking that the following approach makes the most sense:

<action type="setup_perl_environment"> OR <action type="setup_r_environment"> OR 
<action type="setup_ruby_environment"> OR <action type="setup_virtualenv">
    <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" 
                toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="perl" version="5.18.1" />
    </repository>
    <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" 
                toolshed="http://testtoolshed.g2.bx.psu.edu">
        <package name="expat" version="2.1" />
    </repository>
</action>

For all repository tag sets contained within these setup_* tags, the 
repository's env.sh would be pulled in for the setup of the specified 
environment without requiring a set_environment_for_install action type.


Would this work for your use cases?

If so, can you confirm that this should be done for all four currently 
supported setup_* action types?


Based on your response, Greg or I will implement this as soon as possible.

   --Dave B.

On 11/06/2013 03:05 AM, Björn Grüning wrote:

Hi John,


Perl complicates things, TPP complicates things greatly.


So true, so true ...


Bjoern, can I ask you if this hypothetical exhibits the same problem
and can be used to reason about these things more easily and drive a
test implementation.


Yes to both questions :)


So right now, Galaxy has setup_virtualenv which will build and install
Python packages in a virtual environment. However, some Python
packages have C library dependencies that could prevent them from
being installed.

As a specific example - take PyTables (install via pip install
tables) - which is a package for managing hierarchical datasets. If
you try to install this with pip the way Galaxy will - it will fail if
you do not have libhdf5 installed. So at a high-level, it would be
nice if the tool shed had a libhdf5 definition and the dependencies
file had some mechanism for declaring libhdf5 should be installed
before a setup_virtualenv containing tables and its environment
configured properly so the pip install succeeds (maybe just
LD_LIBRARY_PATH needs to be set).


Indeed, same problem. I think we have this problem in every high-level
install method, because set_environment_for_install is not allowed as the
first action tag.

Can you think of any case where ENV vars can conflict with each other,
besides set_to, assuming that we source every env.sh file by default
for every specified package?

Cheers,
Bjoern


-John


On Tue, Nov 5, 2013 at 3:35 PM, Björn Grüning
bjoern.gruen...@pharmazie.uni-freiburg.de wrote:

Hi Greg,


Hello Bjoern,


On Nov 5, 2013, at 12:13 PM, Bjoern Gruening bjoern.gruen...@gmail.com
wrote:


Hi Greg,

I'm currently implementing a setup_perl_environment and stumbled upon
a tricky problem (that is not only related to Perl but also to Ruby, Python
and R).

The Problem:
Let's assume a Perl package (A) requires an XML parser written in C/C++ (Z).
(Z) is a dependency that I can import, but as far as I can see there is no
way to call set_environment_for_install before setup_perl_environment,
because setup_perl_environment defines an installation type.



The above is fairly difficult to understand - can you provide an actual xml
recipe that provides some context?

Attached, please see a detailed explanation at the bottom.





I would like to discuss that issue to get a few ideas. I can think about
these solutions:

- hackish solution:
I can call install_environment.add_env_shell_file_paths( action_dict[
'env_shell_file_paths' ] ) inside of the setup_*_environment path and remove
it from action type afterwards


Again, it's difficult to provide good feedback on the above approach without
an example recipe.  However, your hackish solution term probably means it
is not ideal.  ;)

:)



- import all env.sh variables from every (package) definition. Regardless
if set_environment_for_install is set or not.



I don't think the above approach would be ideal.  It seems that it could
fairly easily create conflicting environment variables within a single
installation,
so the latest value for an environment variable may not be what is expected.

What do you mean by conflicting ENV vars? I can only imagine multiple set_to that
overwrite each other. append_to and prepend_to should be safe, or?




I must admit, I do not understand why set_environment_for_install is
actually needed. I think we can assume that if I specify a

 <package name="R_3_0_1" version="3.0.1">
     <repository name="package_r_3_0_1" owner="iuc" prior_installation_required="True" />
 </package>

I want the ENV vars sourced.


Hmmm…so you are saying that you want to be able to define the above
package tag set inside of an actions tag set and have everything work?


Oh no, I mean just having it as a package tag like it is, but sourcing the env.sh file
for every other package set automatically. Then you do not need
set_environment_for_install.

In the attached example:
 package name=expat 

[galaxy-dev] Tool Access Control

2013-11-06 Thread Eric Rasche
Howdy devs,

I've implemented some rather basic tool access control and am looking
for feedback on my implementation.

# Why

Our organisation wanted the ability to restrict tools to different
users/roles. As such I've implemented an execute tag which can be
applied to either sections or tools in the tool configuration file.

# Example galaxy-admin changes

For example:

  <section execute="a...@b.co,b...@b.co" id="EncodeTools" name="ENCODE Tools">
    <tool file="encode/gencode_partition.xml" />
    <tool execute="b...@b.co" file="encode/random_intervals.xml" />
  </section>

which would allow A and B to access gencode_partition, but only B would
be able to access random_intervals. To put it explicitly:

- by default, everyone can access all tools
- if section level permissions are set, then those are set as defaults
for all tools in that section
- if tool permissions are set, they will override the defaults (see the short
sketch below).
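
A short illustrative sketch of those defaulting rules (not the attached diff;
function and variable names are made up):

# Illustrative only: tool-level execute lists override section-level ones;
# None means "everyone may run the tool".
def effective_permissions( section_execute=None, tool_execute=None ):
    if tool_execute is not None:
        return tool_execute       # tool-level setting wins
    return section_execute        # otherwise fall back to the section default

# Matching the XML example above: gencode_partition inherits the section
# default, random_intervals is restricted to B only.
section = [ "a...@b.co", "b...@b.co" ]
print( effective_permissions( section, None ) )             # ['a...@b.co', 'b...@b.co']
print( effective_permissions( section, [ "b...@b.co" ] ) )  # ['b...@b.co']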

# Pros and Cons

There are some good features

- non-accessible tools won't show up in the left hand panel, based on user
- non-accessible tools cannot be run or accessed.

There are some caveats however.

- existence of tools is not completely hidden.
- Labels are not hidden at all.
- workflows break completely if a tool is unavailable to a shared user
and the user copies+edits. They can be copied, and viewed (says tool not
found), but cannot be edited.

Tool names/id/version info can be found in the javascript object due to
the call to app.toolbox.tool_panel.items() in
templates/webapps/galaxy/workflow/editor.mako, as that returns the raw
tool list, rather than one that's filtered on whether or not the user
has access. I'm yet to figure out a clean fix for this. Additionally,
empty sections are still shown even if there aren't tools listed in them.

For a brief overview of my changes, please see the attached diff. (It's
missing one change because I wasn't being careful and started work on
multiple different features)

# Changeset overview

In brief, most of the changes consist of
- new method in model.User to check if an array of roles overlaps at all
with a user's roles
- modifications to appropriate files for reading in the new
tool_config.xml's options
- modification to get_tool to pass user information, as whether or not a
tool exists is now dependent on who is asking.

Please let me know if you have input on this before I create a pull
request on this feature.

# Fixes

I believe this will fix a number of previously brought up issues (at
least to my understanding of the issues listed)

+ https://trello.com/c/Zo7FAXlM/286-24-add-ability-to-password-secure-tools
+ (I saw some solution where they were adding _beta to tool names
which gave permissions to developers somewhere, but cannot find that now)



Cheers,
Eric Rasche

-- 
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
e...@tamu.edu
rasche.e...@yandex.ru
diff -r c458a0fe1ba8 lib/galaxy/model/__init__.py
--- a/lib/galaxy/model/__init__.py	Mon Nov 04 14:56:57 2013 -0500
+++ b/lib/galaxy/model/__init__.py	Wed Nov 06 11:18:10 2013 -0600
@@ -114,6 +114,19 @@
             roles.append( role )
         return roles
 
+    def can_execute( self, permissions=None ):
+        """
+        Check if any of a user's roles overlap with set permissions
+        """
+        # If permissions variable is NOT set, then allow access (be friendly mode)
+        if permissions is None:
+            return True
+        # Otherwise, we want to check and deny if they're not in the set
+        for role in self.all_roles():
+            if role.name in permissions:
+                return True
+        return False
+
     def get_disk_usage( self, nice_size=False ):
         """
         Return byte count of disk space used by user or a human-readable
diff -r c458a0fe1ba8 lib/galaxy/tools/__init__.py
--- a/lib/galaxy/tools/__init__.py	Mon Nov 04 14:56:57 2013 -0500
+++ b/lib/galaxy/tools/__init__.py	Wed Nov 06 11:18:10 2013 -0600
@@ -195,12 +195,19 @@
             self.index += 1
             if parsing_shed_tool_conf:
                 config_elems.append( elem )
+            permissions = None
+            try:
+                permissions = elem.get('execute').split(',')
+                log.debug('Execute section found: %s' % (':'.join(permissions)))
+            except:
+                log.debug("No execute section found")
+                pass
             if elem.tag == 'tool':
-                self.load_tool_tag_set( elem, self.tool_panel, self.integrated_tool_panel, tool_path, load_panel_dict, guid=elem.get( 'guid' ), index=index )
+                self.load_tool_tag_set( elem, self.tool_panel, self.integrated_tool_panel, tool_path, load_panel_dict, guid=elem.get( 'guid' ), index=index, permissions=permissions )
             elif elem.tag == 'workflow':
                 self.load_workflow_tag_set( elem, self.tool_panel, self.integrated_tool_panel, load_panel_dict, index=index )
             elif elem.tag == 

Re: [galaxy-dev] Tool Access Control

2013-11-06 Thread Nicola Soranzo
Hi Eric,
please also take a look at this mailing list thread:

http://dev.list.galaxyproject.org/pass-user-groups-to-dynamic-job-runner-td4661753.html

If you are interested in the is_user_in_group solution, I have a
slightly updated version which also uses roles instead of groups.

Nicola

On Wed, 06/11/2013 at 11:38 -0600, Eric Rasche wrote: 
 Howdy devs,
 
 I've implemented some rather basic tool access control and am looking
 for feedback on my implementation.
 
 # Why
 
 Our organisation wanted the ability to restrict tools to different
 users/roles. As such I've implemented an execute tag which can be
 applied to either sections or tools in the tool configuration file.
 
 # Example galaxy-admin changes
 
 For example:
 
   <section execute="a...@b.co,b...@b.co" id="EncodeTools" name="ENCODE Tools">
     <tool file="encode/gencode_partition.xml" />
     <tool execute="b...@b.co" file="encode/random_intervals.xml" />
   </section>
 
 which would allow A and B to access gencode_partition, but only B would
 be able to access random_intervals. To put it explicitly:
 
 - by default, everyone can access all tools
 - if section level permissions are set, then those are set as defaults
 for all tools in that section
 - if tool permissions are set, they will override the defaults.
 
 # Pros and Cons
 
 There are some good features
 
 - non-accessible tools won't show up in the left hand panel, based on user
 - non-accessible tools cannot be run or accessed.
 
 There are some caveats however.
 
 - existence of tools is not completely hidden.
 - Labels are not hidden at all.
 - workflows break completely if a tool is unavailable to a shared user
 and the user copies+edits. They can be copied, and viewed (says tool not
 found), but cannot be edited.
 
 Tool names/id/version info can be found in the javascript object due to
 the call to app.toolbox.tool_panel.items() in
 templates/webapps/galaxy/workflow/editor.mako, as that returns the raw
 tool list, rather than one that's filtered on whether or not the user
 has access. I'm yet to figure out a clean fix for this. Additionally,
 empty sections are still shown even if there aren't tools listed in them.
 
 For a brief overview of my changes, please see the attached diff. (It's
 missing one change because I wasn't being careful and started work on
 multiple different features)
 
 # Changeset overview
 
 In brief, most of the changes consist of
 - new method in model.User to check if an array of roles overlaps at all
 with a user's roles
 - modifications to appropriate files for reading in the new
 tool_config.xml's options
 - modification to get_tool to pass user information, as whether or not a
 tool exists is now dependent on who is asking.
 
 Please let me know if you have input on this before I create a pull
 request on this feature.
 
 # Fixes
 
 I believe this will fix a number of previously brought up issues (at
 least to my understanding of the issues listed)
 
 + https://trello.com/c/Zo7FAXlM/286-24-add-ability-to-password-secure-tools
 + (I saw some solution where they were adding _beta to tool names
 which gave permissions to developers somewhere, but cannot find that now)
 
 
 
 Cheers,
 Eric Rasche
 




Re: [galaxy-dev] Tool Access Control

2013-11-06 Thread Eric Rasche

Hi Nicola,

Oh, excellent. I must've skipped over that, given the strange title of
the thread.

Your solution at the end of that thread is very promising, and certainly
handles failure MUCH better than mine does (i.e. raising exceptions and
not breaking a workflow if the user isn't permitted access.)

(Did you put it on the galaxy wiki anywhere? If it weren't for you
linking that, I never would've known about it and that's very useful info!)

In my organisation's case: if a user isn't allowed access to a given
tool, we believe that

- galaxy has no reason to admit it exists
- galaxy should not share default information about a tool

Which is a bit different from the case of having a license to use a
tool. For licensing issues, naturally it would be fine to say yes this
exists and if you can't run it, obtain a license.

For my org's case, we might want to store administrative tools (for
other services) in galaxy. It's a very convenient platform for more than
just bioinformatics and we have some non-technical people on staff who
occasionally need to pull various data sets from various services/make
database backups/etc. Students and clients who use our galaxy instance
don't need to know that these tools are available.

Thoughts?




On 11/06/2013 12:12 PM, Nicola Soranzo wrote:
 Hi Eric,
 please also take a look at this mailing list thread:
 
 http://dev.list.galaxyproject.org/pass-user-groups-to-dynamic-job-runner-td4661753.html
 
 If you are interested in the is_user_in_group solution, I have a
 slightly updated version which also uses roles instead of groups.
 
 Nicola
 
 Il giorno mer, 06/11/2013 alle 11.38 -0600, Eric Rasche ha scritto: 
 Howdy devs,

 I've implemented some rather basic tool access control and am looking
 for feedback on my implementation.

 # Why

 Our organisation wanted the ability to restrict tools to different
 users/roles. As such I've implemented as an execute tag which can be
 applied to either section or tools in the tool configuration file.

 # Example galaxy-admin changes

 For example:

    <section execute="a...@b.co,b...@b.co" id="EncodeTools" name="ENCODE Tools">
      <tool file="encode/gencode_partition.xml" />
      <tool execute="b...@b.co" file="encode/random_intervals.xml" />
    </section>

 which would allow A and B to access gencode_partition, but only B would
 be able to access random_intervals. To put it explicitly:

 - by default, everyone can access all tools
 - if section level permissions are set, then those are set as defaults
 for all tools in that section
 - if tool permissions are set, they will override the defaults.

 # Pros and Cons

 There are some good features

 - non-accessible tools won't show up in the left hand panel, based on user
 - non-accessible tools cannot be run or accessed.

 There are some caveats however.

 - existence of tools is not completely hidden.
 - Labels are not hidden at all.
 - workflows break completely if a tool is unavailable to a shared user
 and the user copies+edits. They can be copied, and viewed (says tool not
 found), but cannot be edited.

 Tool names/id/version info can be found in the javascript object due to
 the call to app.toolbox.tool_panel.items() in
 templates/webapps/galaxy/workflow/editor.mako, as that returns the raw
 tool list, rather than one that's filtered on whether or not the user
 has access. I'm yet to figure out a clean fix for this. Additionally,
 empty sections are still shown even if there aren't tools listed in them.

 For a brief overview of my changes, please see the attached diff. (It's
 missing one change because I wasn't being careful and started work on
 multiple different features)

 # Changeset overview

 In brief, most of the changes consist of
 - new method in model.User to check if an array of roles overlaps at all
 with a user's roles
 - modifications to appropriate files for reading in the new
 tool_config.xml's options
 - modification to get_tool to pass user information, as whether or not a
 tool exists is now dependent on who is asking.

 Please let me know if you have input on this before I create a pull
 request on this feature.

 # Fixes

 I believe this will fix a number of previously brought up issues (at
 least to my understanding of the issues listed)

 + https://trello.com/c/Zo7FAXlM/286-24-add-ability-to-password-secure-tools
 + (I saw some solution where they were adding _beta to tool names
 which gave permissions to developers somewhere, but cannot find that now)



 Cheers,
 Eric Rasche

 
 

-- 
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University

Re: [galaxy-dev] Question regarding walltime exceeded not being correctly reported via the WebUI

2013-11-06 Thread Daniel Patrick Sullivan
Hi, John,

Based on my initial testing the application of your patch 2 successfully
conveys the job walltime exceeded error to the web UI.  As far as I am
concerned you resolved this issue perfectly.  Thank you so much for your
help with this.  I will let you know if I experience any additional issues.

Respectfully yours,

Dan Sullivan


On Tue, Nov 5, 2013 at 8:52 PM, John Chilton chil...@msi.umn.edu wrote:

 On Tue, Nov 5, 2013 at 11:53 AM, Daniel Patrick Sullivan
 dansulli...@gmail.com wrote:
  Hi, John,
 
  Thank you for taking the time to help me look into this issue.  I have
  applied the patch you provided and confirmed that it appears to help
  remediate the problem (when a walltime is exceeded feedback is in fact
  provided via the Galaxy web UI; it no longer appears that jobs are
 running
  indefinitely).One thing I would like to note is that the error that
 is
  provided to the user is generic, i.e. the web UI reports An error
 occurred
  with this dataset: Job cannot be completed due to a cluster error, please
  retry it later.  So, the fact that a Walltime exceeded error actually
  occurred is not presented to the user (I am not sure if this is
 intentional
  or not).  Again, I appreciate you taking the time to verify and patch
 this
  issue.  I have attached a screenshot of the output for your review.

 Glad we are making progress - I have committed that previous patch to
 galaxy-central. Let's see if we cannot improve the user feedback so
 they know they hit the maximum walltime. Can you try this new patch?
 The message about the timeout was being built but it was not being
 logged nor set as the error message on the dataset - this should
 resolve that.

 
  I am probably going to be testing Galaxy with Torque 4.2.5 in the coming
  weeks, I will let you know if I identify any additional problems.  Thank
 you
  so much have a wonderful day.

 You too, thanks for working with me on fixing this!

 -John

 
  Dan Sullivan
 
 
  On Tue, Nov 5, 2013 at 8:48 AM, John Chilton chil...@msi.umn.edu
 wrote:
 
  Hey Daniel,
 
  Thanks so much for the detailed problem report, it was very helpful.
  Reviewing the code there appears to be a bug in the PBS job runner -
  in some cases pbs_job_state.stop_job is never set but is attempted to
  be read. I don't have torque so I don't have a great test setup for
  this problem, any chance you can make the following changes for me and
  let me know if they work?
 
  Between the following two lines:
 
   log.error( '(%s/%s) PBS job failed: %s' % ( galaxy_job_id, job_id, JOB_EXIT_STATUS.get( int( status.exit_status ), 'Unknown error: %s' % status.exit_status ) ) )
   self.work_queue.put( ( self.fail_job, pbs_job_state ) )
  
   log.error( '(%s/%s) PBS job failed: %s' % ( galaxy_job_id, job_id, JOB_EXIT_STATUS.get( int( status.exit_status ), 'Unknown error: %s' % status.exit_status ) ) )
   pbs_job_state.stop_job = False
   self.work_queue.put( ( self.fail_job, pbs_job_state ) )
 
  And at the top of the file can you add a -11 option to the
  JOB_EXIT_STATUS to indicate a job timeout.
 
  I have attached a patch that would apply against the latest stable -
  it will probably will work against your branch as well.
 
  If you would rather not act as my QC layer, I can try to come up with
  a way to do some testing on my end :).
 
  Thanks again,
  -John
 
 
  On Mon, Nov 4, 2013 at 10:10 AM, Daniel Patrick Sullivan
  dansulli...@gmail.com wrote:
   Hi, Galaxy Developers,
  
    I have what I hope is somewhat of a basic question regarding Galaxy's
   interaction with a pbs job cluster and information reported via the
   webUI.
   Basically, in certain situations, the walltime of a specific job is
   exceeded.  This is of course to be expected and all fine and
   understandeable.
  
   My problem is that the information is not being relayed back to the
 end
   user
   via the Galaxy web UI, which causes confusion in our Galaxy user
   community.
   Basically the Torque scheduler generates the following message when a
   walltime is exceeded:
  
   11/04/2013
   08:39:45;000d;PBS_Server.30621;Job;163.sctest.cri.uchicago.edu
 ;preparing
   to
   send 'a' mail for job 163.sctest.cri.uchicago.edu to
   s.cri.gal...@crigalaxy-test.uchicago.edu (Job exceeded its walltime
   limit.
   Job was aborted
   11/04/2013
   08:39:45;0009;PBS_Server.30621;Job;163.sctest.cri.uchicago.edu;job
 exit
   status -11 handled
  
   Now, my problem is that this status -11 return code is not being
   correctly
   handled by Galaxy.  What happens is that Galaxy throws an exception,
    specifically:
  
   10.135.217.178 - - [04/Nov/2013:08:39:42 -0500] GET
   /api/histories/90240358ebde1489 HTTP/1.1 200 -
   https://crigalaxy-test.uchicago.edu/history; Mozilla/5.0 (X11;
 Linux
   x86_64; rv:23.0) Gecko/20100101 Firefox/23.0
   galaxy.jobs.runners.pbs DEBUG 2013-11-04 08:39:46,137
   

Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas

2013-11-06 Thread Björn Grüning
Hi Dave,

 We're thinking that the following approach makes the most sense:
 
 <action type="setup_perl_environment"> OR <action type="setup_r_environment"> OR 
 <action type="setup_ruby_environment"> OR <action type="setup_virtualenv">
     <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc" 
                 toolshed="http://testtoolshed.g2.bx.psu.edu">
         <package name="perl" version="5.18.1" />
     </repository>
     <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc" 
                 toolshed="http://testtoolshed.g2.bx.psu.edu">
         <package name="expat" version="2.1" />
     </repository>
 </action>
 
 For all repository tag sets contained within these setup_* tags, the 
 repository's env.sh would be pulled in for the setup of the specified 
 environment without requiring a set_environment_for_install action type.
 
 Would this work for your use cases?

Yes, the first one. But it's a little bit too verbose, or? Including the perl
repository in a setup_perl_environment should be implicit, or? We can
assume that this needs to be present.
Do you have an example of why sourcing every repository by default can be
harmful? It would make such an installation so much easier and less
complex.

Also, that did not solve the second use case: if I have two packages, one
that is installing perl libraries and the second a binary that
checks for or needs these perl libs.

 If so, can you confirm that this should be done for all four currently 
 supported setup_* action types?

I think it will solve my current issues.

 Based on your response, Greg or I will implement this as soon as possible.

Thanks!
Bjoern

 --Dave B.
 
 On 11/06/2013 03:05 AM, Björn Grüning wrote:
  Hi John,
 
  Perl complicates things, TPP complicates things greatly.
 
  So true, so true ...
 
  Bjoern, can I ask you if this hypothetical exhibits the same problem
  and can be used to reason about these things more easily and drive a
  test implementation.
 
  Yes to both questions :)
 
  So right now, Galaxy has setup_virtualenv which will build and install
  Python packages in a virtual environment. However, some Python
  packages have C library dependencies that could prevent them from
  being installed.
 
  As a specific example - take PyTables (install via pip install
  tables) - which is a package for managing hierarchical datasets. If
  you try to install this with pip the way Galaxy will - it will fail if
  you do not have libhdf5 installed. So at a high-level, it would be
  nice if the tool shed had a libhdf5 definition and the dependencies
  file had some mechanism for declaring libhdf5 should be installed
  before a setup_virtualenv containing tables and its environment
  configured properly so the pip install succeeds (maybe just
  LD_LIBRARY_PATH needs to be set).
 
  Indeed, same problem. I think we have this problem in every high-level
  install method, because set_environment_for_install is not allowed as the
  first action tag.
 
  Can you think of any case where ENV vars can conflict with each other,
  besides set_to, assuming that we source every env.sh file by default
  for every specified package?
 
  Cheers,
  Bjoern
 
  -John
 
 
  On Tue, Nov 5, 2013 at 3:35 PM, Björn Grüning
  bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
  Hi Greg,
 
 
  Hello Bjoern,
 
 
  On Nov 5, 2013, at 12:13 PM, Bjoern Gruening bjoern.gruen...@gmail.com
  wrote:
 
  Hi Greg,
 
  I'm currently implementing a setup_perl_environment and stumbled upon
  a tricky problem (that is not only related to Perl but also to Ruby, 
  Python
  and R).
 
  The Problem:
  Let's assume a Perl package (A) requires an XML parser written in C/C++ 
  (Z).
  (Z) is a dependency that I can import, but as far as I can see there is no
  way to call set_environment_for_install before setup_perl_environment,
  because setup_perl_environment defines an installation type.
 
 
  The above is fairly difficult to understand - can you provide an actual 
  xml
  recipe that provides some context?
 
  Attached, please see a detailed explanation at the bottom.
 
 
 
 
  I would like to discuss that issue to get a few ideas. I can think about
  these solutions:
 
  - hackish solution:
  I can call install_environment.add_env_shell_file_paths( action_dict[
  'env_shell_file_paths' ] ) inside of the setup_*_environment path and 
  remove
  it from action type afterwards
 
  Again, it's difficult to provide good feedback on the above approach 
  without
  an example recipe.  However, your hackish solution term probably means 
  it
  is not ideal.  ;)
 
  :)
 
 
  - import all env.sh variables from every (package) definition. Regardless
  if set_environment_for_install is set or not.
 
 
  I don't think the above approach would be ideal.  It seems that it could
  fairly easily create conflicting environment variables within a single
  installation,
  so the latest value for an environment variable may not be what is 
  expected.
 
  What means conflicting ENV vars, I only can imagine 

[galaxy-dev] Trouble setting up a local instance of Galaxy

2013-11-06 Thread Hazard, E. Starr
Hello,
I am a new user of Galaxy.
I have a Galaxy instance running (sort of) on a local research cluster. I 
issued the command “hg update stable” today and it retrieved no files. So I 
presume I am up-to-date on the stable release. I start the instance as a user 
named “galaxy”.
Right now I am still running in “local” mode. I hope to migrate to DRMAA LSF 
eventually.
I have tried to set up ProFTP to upload files but have not succeeded so I use 
Galaxy Web-upload.
The upload was working nicely and I had  added  a couple of  new tools and they 
were working with the uploaded files.
Getting LSF/DRMAA to work was giving me fits and ultimately I deleted all my 
history files in an effort to start over.
Presently, files being uploaded appear in history as, say, job 1 (in a new 
history). The job status in the history panel of the web GUI
changes from purple to yellow and then to red, indicating some sort of error. 
There is no viewable error text captured, but I can click on the “eye” icon and 
see the
first megabyte of the data (for tiny files I can see the entire content and 
it’s intact). In the Galaxy file system, however, these files appear but have a 
different number, say, dataset_399.dat

On my system the uploaded files appear in /PATH/galaxy-dist/database/files/000

My first question is why is the data going into the “000” subdirectory and not 
one “owned” by the user who is uploading?

My second question is why is the dataset being labeled as dataset_399.dat and 
not dataset_001.dat?

My third question is why do the uploaded files not appear as selectable options 
(say I have paired-end fastq files and a tool wants to have choices about 
filenames)? This problem is present for programs that seek one input file as 
well.

 I presume that Galaxy is confused because the numbering in history is not the 
same as the numbering in the file upload archive (e.g. 
/PATH/galaxy-dist/database/files/000 in my case) so my last question is how do 
I “reset” my system to get the dataset and history numbers to be the same?

Here’s how I launch the Galaxy instance

 sh /shared/app/Galaxy/galaxy-dist/run.sh -v --daemon 
--pid-file=Nov6Localdaemon.pid.txt  --log-file=Nov6Local1639daemon.log.txt

Entering daemon mode

Here are the last lines of the log


Starting server in PID 26236.

serving on 0.0.0.0:8089 view at http://127.0.0.1:8089


galaxy.tools.actions.upload_common DEBUG 2013-11-06 16:48:49,624 Changing 
ownership of 
/shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4 with: 
/usr/bin/sudo -E scripts/external_chown_script.py 
/shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4 hazards 502

galaxy.tools.actions.upload_common WARNING 2013-11-06 16:48:49,750 Changing 
ownership of uploaded file 
/shared/app/Galaxy/galaxy-dist/database/tmp/upload_file_data_QZGHm4 failed: 
sudo: no tty present and no askpass program specified


galaxy.tools.actions.upload_common DEBUG 2013-11-06 16:48:49,751 Changing 
ownership of /shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO with: 
/usr/bin/sudo -E scripts/external_chown_script.py 
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO hazards 502

galaxy.tools.actions.upload_common WARNING 2013-11-06 16:48:49,775 Changing 
ownership of uploaded file 
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO failed: sudo: no tty 
present and no askpass program specified


galaxy.tools.actions.upload_common INFO 2013-11-06 16:48:49,805 tool upload1 
created job id 170


galaxy.jobs DEBUG 2013-11-06 16:48:50,678 (170) Persisting job destination 
(destination id: local)

galaxy.jobs.handler INFO 2013-11-06 16:48:50,698 (170) Job dispatched

galaxy.jobs.runners.local DEBUG 2013-11-06 16:48:50,994 (170) executing: python 
/shared/app/Galaxy/galaxy-dist/tools/data_source/upload.py 
/depot/shared/app/Galaxy/galaxy-dist 
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpTq22ot 
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO 
406:/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/170/dataset_406_files:/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/170/galaxy_dataset_406.dat

galaxy.jobs DEBUG 2013-11-06 16:48:51,030 (170) Persisting job destination 
(destination id: local)

galaxy.jobs.runners.local DEBUG 2013-11-06 16:48:53,335 execution finished: 
python /shared/app/Galaxy/galaxy-dist/tools/data_source/upload.py 
/depot/shared/app/Galaxy/galaxy-dist 
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpTq22ot 
/shared/app/Galaxy/galaxy-dist/database/tmp/tmpEsyGfO 
406:/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/170/dataset_406_files:/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/170/galaxy_dataset_406.dat

galaxy.jobs.runners.local DEBUG 2013-11-06 16:48:53,463 executing external 
set_meta script for job 170: 
/depot/shared/app/Galaxy/galaxy-dist/set_metadata.sh 
/shared/app/Galaxy/galaxy-dist/database/files 

Re: [galaxy-dev] set_environment_for_install problem, seeking for ideas

2013-11-06 Thread John Chilton
My two cents below.

On Wed, Nov 6, 2013 at 4:20 PM, Björn Grüning
bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
 Hi Dave,

 We're thinking that the following approach makes the most sense:

 <action type="setup_perl_environment"> OR <action type="setup_r_environment"> OR
 <action type="setup_ruby_environment"> OR <action type="setup_virtualenv">
     <repository changeset_revision="978287122b91" name="package_perl_5_18" owner="iuc"
                 toolshed="http://testtoolshed.g2.bx.psu.edu">
         <package name="perl" version="5.18.1" />
     </repository>
     <repository changeset_revision="8fc96166cddd" name="package_expat_2_1" owner="iuc"
                 toolshed="http://testtoolshed.g2.bx.psu.edu">
         <package name="expat" version="2.1" />
     </repository>
 </action>

 For all repository tag sets contained within these setup_* tags, the
 repository's env.sh would be pulled in for the setup of the specified
 environment without requiring a set_environment_for_install action type.

 Would this work for your use cases?

 Yes, the first one. But it's a little bit too verbose, or? Including the perl
 repository in a setup_perl_environment should be implicit, or? We can
 assume that this needs to be present.
 Do you have an example of why sourcing every repository by default can be
 harmful? It would make such an installation so much easier and less
 complex.

I am not sure I understand this paragraph - I have a vague sense I
agree but is there any chance you could rephrase this or elaborate?


 Also, that did not solve the second use case: if I have two packages, one
 that is installing perl libraries and the second a binary that
 checks for or needs these perl libs.

We have discussed off list in another thread. Just to summarize my
thoughts there - I think we should delay this or not make it a
priority if there are marginally acceptable workarounds that can be
found for the time being. Getting these four actions to work well as
sort of terminal endpoints and allow specification as tersely as
possible should be the primary goal for the time being. You will see
Perl or Python packages depend on C libraries 10 times more frequently
than you will find makefiles and C programs depend on complex perl or
python environments (correct me if I am wrong). Given that there is
already years worth of tool shed development outlined in existing
Trello cards - this is just how I would prioritize things (happy to be
overruled).


 If so, can you confirm that this should be done for all four currently
 supported setup_* action types?

I think it would be best to tackle setup_r_environment and
setup_ruby_environment first. setup_virtualenv cannot have nested
elements at this time - it is just assumed to be a bunch of text
(either a file containing the dependencies or a list of the
dependencies).

So setup_r_environment and setup_ruby_environment have the same structure:

<setup_ruby_environment>
  <repository .. />
  <package .. />
  <package .. />
</setup_ruby_environment>

... but setup_virtualenv is just

<setup_virtualenv>requests=1.20
pycurl==1.3</setup_virtualenv>

I have created a Trello card for this: https://trello.com/c/NsLJv9la
(and some other related stuff).

Once that is tackled though, it will make sense to allow
setup_virtualenv to utilize the same functionality.

Thanks all,
-John


 I think it will solve my current issues.

 Based on your response, Greg or I will implement this as soon as possible.

 Thanks!
 Bjoern

 --Dave B.

 On 11/06/2013 03:05 AM, Björn Grüning wrote:
  Hi John,
 
  Perl complicates things, TPP complicates things greatly.
 
  So true, so true ...
 
  Bjoern, can I ask you if this hypothetical exhibits the same problem
  and can be used to reason about these things more easily and drive a
  test implementation.
 
  Yes to both questions :)
 
  So right now, Galaxy has setup_virtualenv which will build and install
  Python packages in a virtual environment. However, some Python
  packages have C library dependencies that could prevent them from
  being installed.
 
  As a specific example - take PyTables (install via pip install
  tables) - which is a package for managing hierarchical datasets. If
  you try to install this with pip the way Galaxy will - it will fail if
  you do not have libhdf5 installed. So at a high-level, it would be
  nice if the tool shed had a libhdf5 definition and the dependencies
  file had some mechanism for declaring libhdf5 should be installed
  before a setup_virtualenv containing tables and its environment
  configured properly so the pip install succeeds (maybe just
  LD_LIBRARY_PATH needs to be set).
 
  Indeed, same problem. I think we have this problem in every high-level
  install method, because set_environment_for_install is not allowed as the
  first action tag.
 
  Can you think of any case where ENV vars can conflict with each other,
  besides set_to, assuming that we source every env.sh file by default
  for every specified package?
 
  Cheers,
  Bjoern
 
  -John
 
 
  On Tue, Nov 5, 2013 at 3:35 PM, Björn Grüning
  

Re: [galaxy-dev] Question regarding walltime exceeded not being correctly reported via the WebUI

2013-11-06 Thread John Chilton
Thanks for the feedback, I have incorporated that previous patch into
galaxy-central.

As for the new warning message - I think this is fine. It is not
surprising the process doesn't have an exit code - the job script
itself never got to the point where that would have been written. If
there are no observable problems, I wouldn't worry.

http://stackoverflow.com/questions/234075/what-is-your-best-programmer-joke/235307#235307

Thanks again,
-John

On Wed, Nov 6, 2013 at 4:32 PM, Daniel Patrick Sullivan
dansulli...@gmail.com wrote:
 Hi, John,

 Actually, now that I am taking a look at this, I wanted to report something.
 I am actually not sure if this is a problem or not (based on what I can tell
 this is not causing any negative impact).  The Galaxy log data is actually
 reporting that the cleanup failed (for my testing I am using the upload1
 tool).

 galaxy.jobs.runners.pbs DEBUG 2013-11-06 16:04:05,150
 (2156/169.sctest.cri.uchicago.edu) PBS job state changed from R to C
 galaxy.jobs.runners.pbs ERROR 2013-11-06 16:04:05,152
 (2156/169.sctest.cri.uchicago.edu) PBS job failed: job maximum walltime
 exceeded
 galaxy.datatypes.metadata DEBUG 2013-11-06 16:04:05,389 Cleaning up external
 metadata files
 galaxy.datatypes.metadata DEBUG 2013-11-06 16:04:05,421 Failed to cleanup
 MetadataTempFile temp files from
 /group/galaxy_test/galaxy-dist/database/job_working_directory/002/2156/metadata_out_HistoryDatasetAssociation_381_8fH0ZU:
 No JSON object could be decoded: line 1 column 0 (char 0)
 galaxy.jobs.runners.pbs WARNING 2013-11-06 16:04:05,498 Unable to cleanup:
 [Errno 2] No such file or directory:
 '/group/galaxy_test/galaxy-dist/database/pbs/2156.ec'

 Like I said, as far as I can tell this isn't causing a problem (everything
 is being reported correctly via the web UI; this was my original problem and
 you definitely solved it).  I figured it wouldn't hurt to report the message
 above regardless.  Thank you again for all of your help.

 Respectfully yours,

 Dan Sullivan


 On Wed, Nov 6, 2013 at 4:10 PM, Daniel Patrick Sullivan
 dansulli...@gmail.com wrote:

 Hi, John,

 Based on my initial testing the application of your patch 2 successfully
 conveys the job walltime exceeded error to the web UI.  As far as I am
 concerned you resolved this issue perfectly.  Thank you so much for your
 help with this.  I will let you know if I experience any additional issues.

 Respectfully yours,

 Dan Sullivan


 On Tue, Nov 5, 2013 at 8:52 PM, John Chilton chil...@msi.umn.edu wrote:

 On Tue, Nov 5, 2013 at 11:53 AM, Daniel Patrick Sullivan
 dansulli...@gmail.com wrote:
  Hi, John,
 
  Thank you for taking the time to help me look into this issue.  I have
  applied the patch you provided and confirmed that it appears to help
  remediate the problem (when a walltime is exceeded feedback is in fact
  provided via the Galaxy web UI; it no longer appears that jobs are
  running
  indefinitely).One thing I would like to note is that the error that
  is
  provided to the user is generic, i.e. the web UI reports An error
  occurred
  with this dataset: Job cannot be completed due to a cluster error,
  please
  retry it later.  So, the fact that a Walltime exceeded error actually
  occurred is not presented to the user (I am not sure if this is
  intentional
  or not).  Again, I appreciate you taking the time to verify and patch
  this
  issue.  I have attached a screenshot of the output for your review.

  Glad we are making progress - I have committed that previous patch to
  galaxy-central. Let's see if we cannot improve the user feedback so
  they know they hit the maximum walltime. Can you try this new patch?
  The message about the timeout was being built but it was not being
  logged nor set as the error message on the dataset - this should
  resolve that.

 
  I am probably going to be testing Galaxy with Torque 4.2.5 in the
  coming
  weeks, I will let you know if I identify any additional problems.
  Thank you
  so much have a wonderful day.

 You too, thanks for working with me on fixing this!

 -John

 
  Dan Sullivan
 
 
  On Tue, Nov 5, 2013 at 8:48 AM, John Chilton chil...@msi.umn.edu
  wrote:
 
  Hey Daniel,
 
   Thanks so much for the detailed problem report, it was very helpful.
  Reviewing the code there appears to be a bug in the PBS job runner -
  in some cases pbs_job_state.stop_job is never set but is attempted to
  be read. I don't have torque so I don't have a great test setup for
  this problem, any chance you can make the following changes for me and
  let me know if they work?
 
  Between the following two lines:
 
  log.error( '(%s/%s) PBS job failed: %s' % (
  galaxy_job_id, job_id, JOB_EXIT_STATUS.get( int( status.exit_status ),
  'Unknown error: %s' % status.exit_status ) ) )
  self.work_queue.put( ( self.fail_job,
  pbs_job_state )
  )
 
  log.error( '(%s/%s) PBS job failed: %s' % (
  galaxy_job_id, job_id, JOB_EXIT_STATUS.get( int(