Re: [galaxy-dev] Galaxy for Natural Language Processing

2015-04-23 Thread Keith Suderman
Hi Björn,

On Apr 22, 2015, at 8:00 AM, Björn Grüning  wrote:

>> Do you have a beer preference?  
> 
> Outing: I'm one of the rare Germans that do not drink alcohol ;)

That must be awkward ;)


> This can be done via the ToolShed. I assume your custom command
> interpreter is no different from python or perl as an interpreter?

One difference is that my interpreter is a Java program. I likely should have 
mentioned that little detail... anyone wanting to install our tools would need 
my interpreter AND Java 1.7+ on their server.  Hopefully that is not an 
insurmountable problem. 

However, does the bioinformatics community really want a bunch of NLP tools in 
their tool shed?


>> The editor also allows me to select output formats that have no converters 
>> defined, 
>> so either I am still missing something or the workflow editor does not do 
>> what I want.  I can convert formats through the "Edit attributes" menu, 
>> so Galaxy knows about my converters and how to invoke them, just not in the 
>> workflow editor.
> 
> Ok, I think I understood. Not sure if this is the best way but put your
> converters into the toolbox.

By the "toolbox" do you mean adding my converters to the tool_conf.xml file so 
they are available on the Tools menu?  I have done that and I can add the 
converters to a workflow manually. I was just hoping the workflow editor could 
detect when it could perform the conversion and insert the converters as 
needed; it seems this is not possible.


>> Do you have more pointers to tools that use the attached metadata?  In 
>> particular tools that set metadata that is consumed by subsequent tools.
> 
> The sqlite datatype should be a good example. Keep in mind, we cannot
> set metadata from inside a tool.
> Imho this is not possible yet, but it is a
> commonly requested feature. But you can "calculate" such metadata inside
> your datatype definition and set it implicitly after your tool is finished.

Setting the metadata in the tool wrapper is fine, and after grepping through 
some of the other wrappers I think I need something like the snippet below.

  [tool XML not preserved by the list archive; only two fragments survive:
   a value of "True" and a validator expression "metadata.tokens is not None"]

That is, the input validator simply checks whether some value has been set in the 
metadata, and the output sets a value in the metadata.  The above does not 
work, but at least Galaxy stopped complaining about the tool XML with this.  
However, the documentation for the validator and output-metadata tags does not 
match up with what existing wrappers (in the dev branch) are doing, so I am 
having trouble with the exact syntax.
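
For completeness, my current reading of the datatype-level approach Björn 
describes (compute the metadata in the datatype and let Galaxy set it once the 
tool finishes) is roughly the sketch below. The Lif class and the "tokens" 
element are stand-ins for whatever I actually end up defining, so treat it as 
an untested guess rather than working code:

    from galaxy.datatypes.data import Text
    from galaxy.datatypes.metadata import MetadataElement


    class Lif( Text ):
        """Hypothetical NLP datatype recording whether token annotations exist."""
        file_ext = "lif"

        MetadataElement( name="tokens", default=None, desc="Token annotations present",
                         readonly=True, visible=True, optional=True )

        def set_meta( self, dataset, overwrite=True, **kwd ):
            # Galaxy calls this after the tool finishes; peek at the file and
            # record whether any token annotations are present.
            Text.set_meta( self, dataset, overwrite=overwrite, **kwd )
            with open( dataset.file_name ) as handle:
                head = handle.read( 10000 )
            dataset.metadata.tokens = True if '"Token"' in head else None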


>> Do you have pointers to any documentation on data collections?  My searches 
>> haven't turned up much but tantalizing references [1], 
>> and my experiments trying to return a data collection from a tool have been 
>> unsuccessful.
> 
> https://wiki.galaxyproject.org/Histories?highlight=%28collection%29#Dataset_Collections
> 
> https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax ->
> data_collection
> 
> And have a look at:
> https://github.com/galaxyproject/galaxy/tree/dev/test/functional/tools

Success!  I was running the code from master, so I suspect that was part of my 
problem.  

However, my browser is still complaining about long-running scripts.

> A script on this page may be busy, or it may have stopped responding. You can 
> stop the script now, open the script in the debugger, or let the script 
> continue.
> 
> Script: http://localhost:8000/static/s…/jquery/jquery.js?v=1429811186:2


I accidentally left visible="true" when creating the dataset collection and 
ended up with 1500+ items in my history; the above message kept popping up 
while the workflow was running (at least until I selected "Don't show this 
again").  Deleting 1500+ datasets from the history is also very slow, but that 
is a different issue. On the bright side, at least I had 1500+ items in the 
history to delete.


>> I have also been trying John Chiltons blend4j and managed to populate a data 
>> library, and this is almost what I want, 
>> but I would like a tool that can be included in a workflow as the data from 
>> the library may not necessarily be the first step.   
>> I have no problem calling the Galaxy API from my tools, except that between 
>> the bioinformatics lingo and Python (I'm a Java programmer) it's slow going.
> 
> If possible at all you should avoid this, but as last resort probably an
> option.

Out of curiosity, what exactly should I avoid: making calls to the Galaxy 
REST API from inside a tool, using blend4j, or populating a data library from 
inside a tool?  I can see myself doing all three in the near future.
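
For reference, the kind of call I have in mind is nothing more exotic than the 
sketch below (written with Python's requests purely for illustration; the URL, 
API key, and library name are placeholders, and in practice I would do the 
equivalent through blend4j):

    import requests

    GALAXY_URL = "http://localhost:8080"   # placeholder
    API_KEY = "my-api-key"                 # placeholder; creating libraries needs an admin key

    # List the data libraries visible to this key.
    libraries = requests.get(GALAXY_URL + "/api/libraries",
                             params={"key": API_KEY}).json()

    # Create a new data library to populate from a tool.
    created = requests.post(GALAXY_URL + "/api/libraries",
                            params={"key": API_KEY},
                            json={"name": "NLP results"}).json()
    print(created)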

Cheers,
Keith

> 
> Ciao,
> Bjoern
> 
>> Cheers,
>> Keith
>> 
>> REFERENCES
>> 
>> 1. https://wiki.galaxyproject.org/Learn/API#Collections
>> 2. 
>> https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run
>> 
>>> 
> Oh yes this is supported out of the box!
>

Re: [galaxy-dev] Shared Data Library issues with permissions

2015-04-23 Thread Martin Čech
Hi Carrie,

At what step did you actually add the datasets to the folder? Also, how did
you add them?

Thanks

Martin

On Thu, Apr 23, 2015 at 6:27 PM Ganote, Carrie L  wrote:

>  Hi Devs,
>
> Maybe I'm missing something, but I'm having trouble getting permissions
> right on the Data Libraries.
> I can reproduce this:
> As an admin, I go to Manage data libraries.
> On top right, I click Create new data library. Let's call it test.
> I Add a folder in test, called "more testing".
> I check the permissions of the library in Library Actions->Edit
> permissions and make sure that all roles associated are blank.
> I check permissions in more testing to make sure they are all blank.
> I click on Shared Data -> Data Libraries to see if I can view the files
> with the user perspective.
> I click on the Test library I just created:
> Data Library “Test” The data library 'Test' does not contain any datasets
> that you can access.
>
> I have a fairly recent (April 2015) update to Galaxy.
>
> Any advice? I first started having trouble getting access for a user to
> their own library; when I associated their role with their own folder, they
> still couldn't see anything they could access. I should at least get an
> empty folder.
>
> Sincerely,
>
> Carrie Ganote

Re: [galaxy-dev] Serving a galaxy instance over Apache

2015-04-23 Thread Asma Riyaz
Hi,

I am setting up Apache on a remote server on which a virtual Apache was
previously installed under /seq/../iwww. The layout of this directory looks
something like this:

/seq/../iwww
-->apache/
-->sites-available/
 -->defaults
-->sites-enabled/
-->myPublic.conf (this is equivalent to httpd.conf)

As John suggested starting with an easier Apache config, I have done the
following:
1) In defaults I have this:



.

.

RewriteEngine on

RewriteRule ^(.*) http://[IP address]:8080/$1 [P]
2) In myPublic.conf I have these two directory blocks (the enclosing
<Directory> container tags did not survive in the archive; only the
directives remain):

Options +ExecCGI
AllowOverride All
DirectoryIndex index.html index.cgi
AddHandler cgi-script .cgi
Order Allow,Deny
Allow from All

Deny from all

AllowOverride All
DirectoryIndex static/welcome.html
Order Allow,Deny
Allow from All

Deny from all

At this point I am not sure in which direction I should go to get this
running off of Apache.

Kindly note: I am able to start the Galaxy instance by executing sh run.sh, and
it is up and running directly through the IP address.

Please advise.

Thank you,
Asma






On Tue, Apr 21, 2015 at 9:57 AM, John Chilton  wrote:

> This configuration looks correct - are you sure the correct properties
> are set in config/galaxy.ini:
>
> You need this section:
>
> [filter:proxy-prefix]
> use = egg:PasteDeploy#prefix
> prefix = /galaxy
>
> And [app:main] needs the following properties:
>
> filter-with = proxy-prefix
> cookie_path = /galaxy
>
> I would also make sure your Galaxy is up and running on port 8080 -
> maybe by running a command such as
>
> wget http://127.0.0.1:8080
>
> from the server to make sure you get a response from Galaxy.
>
> If none of that leads to clues - it might be worth starting with the
> easier apache proxy configuration (serving on / instead of /galaxy)
> and getting that working first to rule out Galaxy and very basic
> Apache configuration problems.
>
> -John
>
>
>
>
>
>
>
> On Fri, Apr 17, 2015 at 11:50 AM, Asma Riyaz 
> wrote:
> > Hi,
> >
> > I have been through previous threads regarding this issue. So far, I have
> > included the rewrite Engine logic written in
> > /etc/apache2/sites-available/defaults - looks like so:
> >
> > 
> >
> > .
> >
> > .
> >
> > RewriteEngine on
> >
> > RewriteRule ^/galaxy$ /galaxy/ [R]
> >
> > RewriteRule ^/galaxy/static/style/(.*)
> > /seq/SOFTWARE/galaxy/static/june_2007_style/blue/$1 [L]
> >
> > RewriteRule ^/galaxy/static/scripts/(.*)
> > /seq/SOFTWARE/galaxy/static/scripts/packed/$1 [L]
> >
> > RewriteRule ^/galaxy/static/(.*) /seq/SOFTWARE/galaxy/static/$1 [L]
> >
> > RewriteRule ^/galaxy/favicon.ico /seq/SOFTWARE/galaxy/static/favicon.ico
> [L]
> >
> > RewriteRule ^/galaxy/robots.txt /seq/SOFTWARE/galaxy/static/robots.txt
> [L]
> >
> > RewriteRule ^/galaxy(.*) http://[my IP address]/$1 [P]
> >
> > I restarted apache after this and accessed : http://[my IP
> address]/galaxy
> > but no webpage is found. Any advice on how I can proceed to configure the
> > apache correctly?
> >
> > Thank you,
> >
> > Asma
> >
> >
> >
> >
> >

[galaxy-dev] Shared Data Library issues with permissions

2015-04-23 Thread Ganote, Carrie L
Hi Devs,

Maybe I'm missing something, but I'm having trouble getting permissions right 
on the Data Libraries.
I can reproduce this:
As an admin, I go to Manage data libraries.
On top right, I click Create new data library. Let's call it test.
I Add a folder in test, called "more testing".
I check the permissions of the library in Library Actions->Edit permissions and 
make sure that all roles associated are blank.
I check permissions in more testing to make sure they are all blank.
I click on Shared Data -> Data Libraries to see if I can view the files with 
the user perspective.
I click on the Test library I just created:
Data Library “Test”
The data library 'Test' does not contain any datasets that you can access.

I have a fairly recent (April 2015) update to Galaxy.

Any advice? I first started having trouble getting access for a user to their 
own library; when I associated their role with their own folder, they still 
couldn't see anything they could access. I should at least get an empty folder.

Sincerely,

Carrie Ganote

Re: [galaxy-dev] Data Libraries

2015-04-23 Thread Martin Čech
Hello Ryan,

it is currently not possible to give users the right to create data libraries.
However, you can create the libraries for them and give them rights to
create and manage subfolders (doing the same things, just one level below).
Would that address your goal?

Martin

On Thu, Apr 23, 2015 at 4:33 PM Ryan G  wrote:

> Hi all - We are trying to use Galaxy as a mechanism for our sequencing lab
> to create data libraries for data they generate.  I noticed in the docs,
> only Admins are able to create data libraries.  Is there a way to change
> this?  I'd like to give specific users in our group this ability without
> giving them admin rights.
>
> Ryan
>

[galaxy-dev] Data Libraries

2015-04-23 Thread Ryan G
Hi all - We are trying to use Galaxy as a mechanism for our sequencing lab
to create data libraries for data they generate.  I noticed in the docs,
only Admins are able to create data libraries.  Is there a way to change
this?  I'd like to give specific users in our group this ability without
giving them admin rights.

Ryan

[galaxy-dev] InvalidRequestError: kombu.transport.sqlalchemy.Queue

2015-04-23 Thread Asma Riyaz
Hi,

I am receiving this traceback when I restart Apache, and I am not sure what to
make of it:

Exception in thread WorkflowRequestMonitor.monitor_thread:
Traceback (most recent call last):
  File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib/python2.7/threading.py", line 483, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 158, in __monitor
    self.__schedule( workflow_scheduler_id, workflow_scheduler )
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 162, in __schedule
    invocation_ids = self.__active_invocation_ids( workflow_scheduler_id )
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 191, in __active_invocation_ids
    handler=handler,
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/model/__init__.py", line 3254, in poll_active_workflow_ids
    WorkflowInvocation
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/scoping.py", line 150, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/session.py", line 1165, in query
    return self._query_cls(entities, self, **kwargs)
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 108, in __init__
    self._set_entities(entities)
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 118, in _set_entities
    self._set_entity_selectables(self._entities)
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 151, in _set_entity_selectables
    ent.setup_entity(*d[entity])
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 3036, in setup_entity
    self._with_polymorphic = ext_info.with_polymorphic_mappers
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/util/langhelpers.py", line 725, in __get__
    obj.__dict__[self.__name__] = result = self.fget(obj)
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/mapper.py", line 1877, in _with_polymorphic_mappers
    configure_mappers()
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/mapper.py", line 2589, in configure_mappers
    mapper._post_configure_properties()
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/mapper.py", line 1694, in _post_configure_properties
    prop.init()
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/interfaces.py", line 144, in init
    self.do_init()
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/relationships.py", line 1549, in do_init
    self._process_dependent_arguments()
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/relationships.py", line 1605, in _process_dependent_arguments
    self.target = self.mapper.mapped_table
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/util/langhelpers.py", line 725, in __get__
    obj.__dict__[self.__name__] = result = self.fget(obj)
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/relationships.py", line 1522, in mapper
    argument = self.argument()
  File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/ext/declarative/clsregistry.py", line 283, in __call__
    (self.prop.parent, self.arg, n.args[0], self.cls)
InvalidRequestError: When initializing mapper Mapper|Queue|kombu_queue,
expression 'Message' failed to locate a name ("name 'Message' is not
defined"). If this is a class name, consider adding this relationship() to
the <class 'kombu.transport.sqlalchemy.Queue'> class after both dependent
classes have been defined.

This happens sometimes, but when I shut down Galaxy and restart it I don't get
it anymore.

Can someone guide me to a more permanent solution for this?

Asma

Re: [galaxy-dev] question about splitting bams

2015-04-23 Thread John Chilton
I am a pragmatist - I have no problem just splitting the inputs and
skipping the metadata files. I would just convert the error into a
log.info() and warn that the tool cannot use metadata files. If the
underlying tool needs an index it can recreate it instead, I think. One
can imagine a more intricate solution that would recreate metadata
files as needed - but that would be a lot of work, I think.
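
Concretely, I am imagining something along these lines in
lib/galaxy/jobs/splitters/multi.py (untested, just to illustrate the idea):

    for input in parent_job.input_datasets:
        if input.name in split_inputs:
            this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
            if len(this_input_files) > 1:
                # Extra files are associated metadata (e.g. a .bai index); only
                # the primary file gets split, so warn instead of aborting and
                # let the tool regenerate any index it needs.
                log.info("The input '%s' has associated metadata files - these "
                         "will not be split" % str(input.name))
            input_datasets.append(input.dataset)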

Does that make sense?

Regarding Bitbucket PR 175, there were some recent discussions about that
approach - I would check out
http://dev.list.galaxyproject.org/Parallelism-using-metadata-td4666763.html.

-John

On Thu, Apr 23, 2015 at 11:55 AM, Roberto Alonso CIPF  wrote:
> Hello,
> I am trying to write some code in order to make it possible to
> parallelize some tasks. Right now I am stuck on the problem of splitting a bam
> into several parts; for this I created this simple tool:
>
>   [tool XML not preserved by the list archive; the wrapper declared a
>   <parallelism> element with split_inputs="input" and merge_outputs="output",
>   and its command was:]
>
> java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
> -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa
> -I $input -o $output 2> /dev/null;
>
> But I have one problem, when I execute the tool it goes through this part of
> code (I am working in dev branch):
>
> $galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:
>
> for input in parent_job.input_datasets:
> if input.name in split_inputs:
> this_input_files =
> job_wrapper.get_input_dataset_fnames(input.dataset)
> if len(this_input_files) > 1:
> log_error = "The input '%s' is composed of multiple files -
> splitting is not allowed" % str(input.name)
> log.error(log_error)
> raise Exception(log_error)
> input_datasets.append(input.dataset)
>
> So, it is raising the exception because this_input_files=2, concretely:
> ['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
> '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat'],
> I guess that:
> dataset_171.dat: It is the bam file.
> metadata_13.dat: It is the bai file.
>
> So, Galaxy can't move on and I don't know which would be the best solution.
> Maybe change the if to check only non-metadata files? I think I should use
> both files in order to create the bam sub-files, but this would be inside
> the Bam class, under binary.py file.
> Could you please guide me before I mess things up?
>
> Thanks so much
> --
> Roberto Alonso
> Functional Genomics Unit
> Bioinformatics and Genomics Department
> Prince Felipe Research Center (CIPF)
> C./Eduardo Primo Yúfera (Científic), nº 3
> (junto Oceanografico)
> 46012 Valencia, Spain
> Tel: +34 963289680 Ext. 1021
> Fax: +34 963289574
> E-Mail: ralo...@cipf.es
>

Re: [galaxy-dev] Galaxy on HPC and Bright Cluster Manager?

2015-04-23 Thread David Trudgian
Hi Carlos, sorry for the slow reply. 

With a small 10 user setup (I assume just one group) your installation is 
probably going to be a lot less complex than ours. We are running services for 
users from multiple labs and departments, so a chief concern was making sure 
private datasets always stay private - which is relatively involved when using 
Galaxy to run jobs on a general-purpose shared cluster.

I think if I had to recommend picking a job scheduler I would suggest SLURM 
since it's probably the most 'fashionable' choice at present. You also have the 
advantage that SLURM is used with Galaxy in the Galaxy docker image etc. which 
ensures people notice if the Galaxy->DRMAA->SLURM setup isn't working. I've 
also used Galaxy with GridEngine in the past, and that was fine - but is 
becoming a less common choice as a scheduler.

Having said that, I don't think that the job scheduler needs to be your biggest 
concern. I would focus most on the file system and user account setup you are 
going to need.

* How are you going to migrate from standalone Galaxy to a situation where your 
new cluster can see the Galaxy data files, tools, etc.? Are you purchasing 
storage with the cluster? If so, do you move Galaxy onto that storage, or can 
you mount the existing Galaxy data onto the cluster nodes? If you can do that, is 
your networking such that performance is sufficient for the type of analysis 
you are going to run?

* Do you need to, or will you need to, keep track of per-user usage of the 
cluster for things that Galaxy will be running? If not then you can just have a 
galaxy user on your cluster and things are pretty easy for file permissions 
etc. If you need to track jobs per-user then it becomes more complex, and the 
solution depends on how much privacy you need for datasets, how your cluster 
will authenticate users etc.

The filesystem and user accounts issues are, in my mind, the ones to focus on. 
You can always modify Galaxy's config to switch to a different job scheduler 
fairly easily. You cannot as easily move around large amounts of data, and 
reconcile local vs cluster user accounts, should that be necessary.

Cheers,

Dave Trudgian

-Original Message-
From: Carlos Lijeron [mailto:clije...@hunter.cuny.edu] 
Sent: Wednesday, April 22, 2015 10:54 AM
To: David Trudgian; John Chilton
Cc: RODRIGO GONZALEZ SERRANO
Subject: Re: [galaxy-dev] Galaxy on HPC and Bright Cluster Manager?

Hello David,

Thank you for the great feedback.  We are at Hunter College in NYC, part of the 
City University of New York.  We recently ordered the cluster which comes with 
Bright Cluster Management, and our PI wants to implement Galaxy for all the 
users (about 10) on the cluster and manage all job submissions through a job 
scheduler.

So, to answer  your question, we are not really using any scheduler at this 
point, but only a stand alone server with a local installation of Galaxy.  Our 
Cluster should be assembled and installed by the end of May, so I'm trying to 
gather as much information as possible in preparation for the deployment.

Based on your experience, what do you think I should focus on to ensure we 
maximize outcome and reduce the possibility of mistakes?  In other words, any 
lessons learned that you would like to share will be greatly appreciated.



Thanks again,


Carlos Lijeron.


On 4/22/15, 10:34 AM, "David Trudgian" 
wrote:

>Carlos,
>
>We have Bright Cluster Manager in use on our cluster for node 
>provisioning etc. but the actual job scheduler in use in our case is 
>SLURM, which we use directly.
>
>Are you using one of the integrated workload managers such as SLURM / 
>SGE / TORQUE directly, or indirectly via cmsub?
>
>I guess the easiest way to come up with some kind of advice is if you 
>can provide an example of generic job script you  are using on your system.
>If you're using cmsub is it specifying a --wlmanager etc.
>
>DT
>
>-Original Message-
>From: galaxy-dev [mailto:galaxy-dev-boun...@lists.galaxyproject.org] On 
>Behalf Of John Chilton
>Sent: Wednesday, April 22, 2015 8:26 AM
>To: Carlos Lijeron
>Cc: galaxy-dev@lists.galaxyproject.org
>Subject: Re: [galaxy-dev] Galaxy on HPC and Bright Cluster Manager?
>
>Hello Carlos,
>
>  I have never heard of anyone running Galaxy with Bright Cluster 
>Manager (though hopefully someone will chime in if they have). If you 
>are interested in adding support it should be possible. One 
>complication is that Bright Cluster Manager doesn't appear to have a 
>DRMAA interface
>(http://www.drmaa.org/) which is the most direct way to utilize new DRMs.
>Without that my approach would be to build a new CLI runner:
>
>There are a few examples here that one can use as template:
>
>https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/jobs/runner
>s/u
>til/cli/job
>
>I guess you would have to write a new one targeting cmsub - you also 
>need to be able to parse a job status somehow - I haven't figured 
>out how to do that from the d

[galaxy-dev] Writing Auth Layer as WSGI Middleware

2015-04-23 Thread Stephen Rosen
Hi All,

I have a need for authentication as a layer in front of Galaxy which is
more specialized than the available options -- specifically 3-legged OAuth
against a site of my choice.

After looking into writing this in PHP and having the webserver (nginx) set
remote_user, I decided to nix that approach for a couple of reasons -- one
of which is that I don't have PHP experience.

After a few discussions with other devs, I've decided that there are two
easy options available to me:
- Write a WSGI app which does authentication, and *proxies* authenticated
requests to Galaxy with remote_user set. Since that's a WSGI app doing
proxying, obvious code smell there
- Write a WSGI middleware that wraps the existing Galaxy WSGI app, and
passes authenticated requests directly to the Galaxy app

That second solution seems much better, but I'm now faced with the question
of "How do I do it?"

Looking over the sample config, I see these lines:

# The factory for the WSGI application.  This should not be changed.
paste.app_factory = galaxy.web.buildapp:app_factory

I'm thinking that I could change that to my middleware, which will turn to
`galaxy.web.buildapp` when the time comes.
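
To make that concrete, here is the kind of shim I am picturing (a rough,
untested sketch; authenticate() is a stand-in for the 3-legged OAuth dance,
and the factory name is mine):

    from galaxy.web.buildapp import app_factory as galaxy_app_factory


    def authenticate(environ):
        # Placeholder: the real version would run the 3-legged OAuth flow and
        # return the authenticated user's email, or None.
        return None


    def auth_app_factory(global_conf, **kwargs):
        # This is what I would point paste.app_factory at in the ini file.
        galaxy_app = galaxy_app_factory(global_conf, **kwargs)

        def middleware(environ, start_response):
            user = authenticate(environ)
            if user is None:
                start_response('401 Unauthorized', [('Content-Type', 'text/plain')])
                return ['Authentication required\n']
            # The default remote_user_header Galaxy looks for when
            # use_remote_user is enabled.
            environ['HTTP_REMOTE_USER'] = user
            return galaxy_app(environ, start_response)

        return middleware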

One problem I'm seeing is that my middleware and galaxy both have to run in
the same virtualenv, so there's potential for dependency conflicts.
The lib I want to use for this does rely on PyYAML and a few other things
which Galaxy also needs, so that possibility is very real.

Other than that hurdle, are there any gotchas I should be aware of with
this approach?
Are there similarly simple alternatives to this which I am not seeing?

Ultimately, if I have to write an app that does proxying, I'd prefer that
to the wide variety of highly effortful solutions I have envisioned.
Those include, but are not limited to, a PAM module that does the OAuth, with
Basic Authentication done against it, just to give a flavor.

Thanks very much for your help,
-Stephen

Re: [galaxy-dev] question about splitting bams

2015-04-23 Thread Roberto Alonso CIPF
Regarding my previous mail, I found this pull request:
http://www.bytebucket.org/galaxy/galaxy-central/pull-request/175/parameter-based-bam-file-parallelization/diff

Is it still alive? Is it maybe the best choice for doing the bam
parallelization?

Thanks!
Best regards

On 23 April 2015 at 17:55, Roberto Alonso CIPF  wrote:

> Hello,
> I am trying to write some code in order to make it possible to
> parallelize some tasks. Right now I am stuck on the problem of splitting a bam
> into several parts; for this I created this simple tool:
>
>   [tool XML not preserved by the list archive; the wrapper declared a
>   <parallelism> element with split_inputs="input" and merge_outputs="output",
>   and its command was:]
>
> java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
> -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa
> -I $input -o $output 2> /dev/null;
>
> But I have one problem, when I execute the tool it goes through this part
> of code (I am working in dev branch):
>
> *$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:*
>
> for input in parent_job.input_datasets:
> if input.name in split_inputs:
> this_input_files =
> job_wrapper.get_input_dataset_fnames(input.dataset)
> if len(this_input_files) > 1:
> log_error = "The input '%s' is composed of multiple files
> - splitting is not allowed" % str(input.name)
> log.error(log_error)
> raise Exception(log_error)
> input_datasets.append(input.dataset)
>
> So, it is raising the exception because this_input_files=2, concretely:
> ['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
> '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat'],
> I guess that:
> *dataset_171.dat*: It is the bam file.
> *metadata_13.dat*: It is the bai file.
>
> So, Galaxy can't move on and I don't know which would be the best
> solution. Maybe change the *if* to check only non-metadata files? I think
> I should use both files in order to create the bam sub-files, but this
> would be inside the Bam class, under *binary.py* file.
> Could you please guide me before I mess things up?
>
> Thanks so much
> --
> Roberto Alonso
> Functional Genomics Unit
> Bioinformatics and Genomics Department
> Prince Felipe Research Center (CIPF)
> C./Eduardo Primo Yúfera (Científic), nº 3
> (junto Oceanografico)
> 46012 Valencia, Spain
> Tel: +34 963289680 Ext. 1021
> Fax: +34 963289574
> E-Mail: ralo...@cipf.es
>



-- 
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es

[galaxy-dev] use of copied history while original user account deleted (and purged)

2015-04-23 Thread Olivia Doppelt-Azeroual


Dear Developers,

We manage a Galaxy instance at Institut Pasteur with more than a 
hundred users. Among them are postdocs who are expected to
leave the institute.

Our Galaxy instance is configured with LDAP authentication. The LDAP entry is 
removed shortly after the end of a contract.

We are facing the following problem:

A user left a few months ago:
1/ Before leaving, she shared interesting histories with another colleague 
of the lab.

2/ Before leaving, the colleague created a copy of every shared history in 
her own Galaxy account.

3/ The colleague ran a few tests to check that the data were really 
transferred, by displaying or downloading them, and everything was OK.

4/ Time passed, and the LDAP account of the user who left was 
removed. We then deleted and purged her Galaxy account.

5/ The colleague has tried to relaunch some analyses using the copied 
histories. The data are there, as she is able to download the files or 
display them, but the jobs are never launched; they remain grey in 
the copied history.

The problem is that there are no logs at all on the Galaxy side. No command 
line is generated either.

On the reporting side of Galaxy, we see that the jobs have been created, but 
their status remains "NEW".

We know that this will be a recurrent problem if we can't resolve it.

Has anyone already complained about something like this?

Best regards,

--
Olivia Doppelt-Azeroual, PhD
Fabien Mareuil, PhD

Bioinformatics Engineer
Galaxy Team - CIB/C3BI
Institut Pasteur, Paris





[galaxy-dev] question about splitting bams

2015-04-23 Thread Roberto Alonso CIPF
Hello,
I am trying to write some code in order to make it possible to
parallelize some tasks. Right now I am stuck on the problem of splitting a bam
into several parts; for this I created this simple tool:

  [tool XML not preserved by the list archive; the wrapper declared a
   <parallelism> element with split_inputs="input" and merge_outputs="output",
   and its command was:]

java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa -I $input -o $output 2> /dev/null;

But I have one problem: when I execute the tool it goes through this part
of the code (I am working on the dev branch):

*$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:*

for input in parent_job.input_datasets:
    if input.name in split_inputs:
        this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
        if len(this_input_files) > 1:
            log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
            log.error(log_error)
            raise Exception(log_error)
        input_datasets.append(input.dataset)

So, it is raising the exception because this_input_files has two entries, concretely:
['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
'/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat'],
I guess that:
*dataset_171.dat*: It is the bam file.
*metadata_13.dat*: It is the bai file.

So, Galaxy can't move on and I don't know which would be the best solution.
Maybe change the *if* to check only non-metadata files? I think I should
use both files in order to create the bam sub-files, but this would be
inside the Bam class, in the *binary.py* file.
Could you please guide me before I mess things up?

Thanks so much
-- 
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es

[galaxy-dev] Galaxy Tool Shed security vulnerability

2015-04-23 Thread Nate Coraor
*Please note: This notice affects Galaxy Tool Shed servers only. Galaxy
servers are unaffected.*

A security vulnerability was recently discovered by Daniel Blankenberg of
the Galaxy Team that would allow a malicious person to execute arbitrary
code on a Galaxy Tool Shed server. The vulnerability is due to reuse of
tool loading code from Galaxy, which executes "code files" defined by
Galaxy tool config files. Because the Tool Shed allows any user to create
and "load" tools, any user could cause arbitrary code to be executed by the
Tool Shed server. In Galaxy, administrators control which tools are loaded,
which is why this vulnerability does not affect Galaxy itself.

Although we recommend upgrading to the latest stable version (15.03.2), a
fix for this issue has been committed to Galaxy versions 14.08 and
newer. If you are using Mercurial, you can update with the following (where
YY.MM corresponds to the Galaxy release you are currently running):

  % hg pull
  % hg update release_YY.MM

If you are using git, you can update with (assuming your remote upstream is
set to https://github.com/galaxyproject/galaxy/):

If you have not yet set up a remote tracking branch for the release you are
using:

  % git fetch upstream
  % git checkout -b release_YY.MM upstream/release_YY.MM

Otherwise:

  % git pull upstream release_YY.MM

For the changes to take effect, *you must restart all Tool Shed server
processes*.

Credit for the arbitrary code execution fix also goes to my fellow Galaxy
Team member Daniel Blankenberg.

On behalf of the Galaxy Team,
--nate
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/