Re: [galaxy-dev] Galaxy for Natural Language Processing
Hi Björn,

On Apr 22, 2015, at 8:00 AM, Björn Grüning wrote:

>> Do you have a beer preference?
>
> Outing: I'm one of the rare Germans that do not drink alcohol ;)

That must be awkward ;)

> This can be done via the ToolShed. I assume your custom command
> interpreter is not different than python or perl as interpreter?

One difference is that my interpreter is a Java program. I probably should have mentioned that little detail... anyone wanting to install our tools would need my interpreter AND Java 1.7+ on their server. Hopefully that is not an insurmountable problem. However, does the bioinformatics community really want a bunch of NLP tools in their tool shed?

>> The editor also allows me to select output formats that have no
>> converters defined, so either I am still missing something or the
>> workflow editor does not do what I want. I can convert formats through
>> the "Edit attributes" menu, so Galaxy knows about my converters and how
>> to invoke them, just not in the workflow editor.
>
> Ok, I think I understood. Not sure if this is the best way but put your
> converters into the toolbox.

By the "toolbox" do you mean adding my converters to the tool_conf.xml file so they are available on the Tools menu? I have done that, and I can add the converters to a workflow manually. I was just hoping the workflow editor could detect when a conversion is possible and insert the converters as needed; it seems this is not possible.

>> Do you have more pointers to tools that use the attached metadata? In
>> particular tools that set metadata that is consumed by subsequent tools.
>
> The sqlite datatype should be a good example. Keep in mind, we can not
> set metadata from inside a tool. Imho this is not possible, yet, but a
> commonly requested feature. But you can "calculate" such metadata inside
> your datatype definition and set it implicitly after your tool is
> finished.
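Björn's suggestion - computing the value inside the datatype definition so Galaxy sets it implicitly once the producing job finishes - might look roughly like the sketch below. The token-counting helper and the `tokens` metadata name are my illustration, not code from Galaxy or from this thread.

```python
import io

def count_tokens(handle):
    """Count whitespace-separated tokens in a text stream."""
    return sum(len(line.split()) for line in handle)

# Inside a custom datatype class the hook would be set_meta(); Galaxy
# calls it implicitly after the tool run, so downstream tools can read
# the value (sketch only, names are illustrative):
#
#     def set_meta(self, dataset, **kwd):
#         with open(dataset.file_name) as handle:
#             dataset.metadata.tokens = count_tokens(handle)
```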
Setting the metadata in the tool wrapper is fine, and after grepping through some of the other wrappers I think I need something like:

    True metadata.tokens is not None

That is, the input validator simply checks whether some value has been set in the metadata, and the output sets a value in the metadata. The above does not work, but at least Galaxy stopped complaining about the tool XML with this. However, the documentation for and does not match up with what existing wrappers (in the dev branch) are doing, so I am having problems with the exact syntax.

>> Do you have pointers to any documentation on data collections? My
>> searches haven't turned up much but tantalizing references [1], and my
>> experiments trying to return a data collection from a tool have been
>> unsuccessful.
>
> https://wiki.galaxyproject.org/Histories?highlight=%28collection%29#Dataset_Collections
> https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax -> data_collection
>
> And have a look at:
> https://github.com/galaxyproject/galaxy/tree/dev/test/functional/tools

Success! I was running the code from master, so I suspect that was part of my problem. However, my browser is still complaining about long-running scripts:

> A script on this page may be busy, or it may have stopped responding.
> You can stop the script now, open the script in the debugger, or let
> the script continue.
>
> Script: http://localhost:8000/static/s…/jquery/jquery.js?v=1429811186:2

I accidentally left visible="true" when creating the dataset collection and ended up with +1500 items in my history; the above message kept popping up while the workflow was running (at least until I selected "Don't show this again"). Deleting +1500 datasets from the history is also very slow, but that is a different issue. On the bright side, at least I had +1500 items in the history to delete.
>> I have also been trying John Chilton's blend4j and managed to populate
>> a data library, and this is almost what I want, but I would like a tool
>> that can be included in a workflow, as the data from the library may
>> not necessarily be the first step. I have no problem calling the Galaxy
>> API from my tools, except that between the bioinformatics lingo and
>> Python (I'm a Java programmer) it's slow going.
>
> If possible at all you should avoid this, but as a last resort it is
> probably an option.

Out of curiosity, what exactly should I avoid: making calls to the Galaxy REST API from inside a tool, using blend4j, or populating a data library from inside a tool? I can see myself doing all three in the near future.

Cheers,
Keith

> Ciao,
> Bjoern
>
>> Cheers,
>> Keith
>>
>> REFERENCES
>>
>> 1. https://wiki.galaxyproject.org/Learn/API#Collections
>> 2. https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run

> Oh yes this is supported out of the box!
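Calling the Galaxy REST API from a tool does not need much machinery. A minimal sketch in Python 3 (the server URL and key below are placeholders, and passing the key as a query parameter is just one accepted style):

```python
import json
import urllib.request
from urllib.parse import urlencode

def api_url(base, path, key):
    """Build a Galaxy API URL, passing the API key as a query parameter."""
    return "%s/api/%s?%s" % (base.rstrip("/"), path.lstrip("/"),
                             urlencode({"key": key}))

def api_get(base, path, key):
    """GET a Galaxy API endpoint and decode the JSON response."""
    with urllib.request.urlopen(api_url(base, path, key)) as response:
        return json.load(response)

# Hypothetical usage against a local instance:
# libraries = api_get("http://localhost:8080", "libraries", "0123456789abcdef")
```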
Re: [galaxy-dev] Shared Data Library issues with permissions
Hi Carrie,

in what step did you actually add the datasets to the folder? Also, how did you add them?

Thanks,
Martin

On Thu, Apr 23, 2015 at 6:27 PM Ganote, Carrie L wrote:
> Hi Devs,
>
> Maybe I'm missing something, but I'm having trouble getting permissions
> right on the Data Libraries. I can reproduce this:
>
> As an admin, I go to Manage data libraries.
> On top right, I click Create new data library. Let's call it test.
> I add a folder in test, called "more testing".
> I check the permissions of the library in Library Actions -> Edit
> permissions and make sure that all roles associated are blank.
> I check permissions in more testing to make sure they are all blank.
> I click on Shared Data -> Data Libraries to see if I can view the files
> from the user perspective.
> I click on the Test library I just created:
> Data Library "Test": The data library 'Test' does not contain any
> datasets that you can access.
>
> I have a fairly recent (April 2015) update to Galaxy.
>
> Any advice? I first started having trouble getting access for a user to
> their own library; when I associated their role with their own folder,
> they still couldn't see anything they could access. I should at least
> get an empty folder.
>
> Sincerely,
>
> Carrie Ganote

___ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-dev] Serving a galaxy instance over Apache
Hi,

I am setting up Apache on a remote server on which a virtual Apache was previously installed under /seq/../iwww. The layout of this directory looks something like this:

    /seq/../iwww
      --> apache/
        --> sites-available/
          --> defaults
        --> sites-enabled/
          --> myPublic.conf (this is equivalent to httpd.conf)

As John suggested, to start with an easier Apache config I have done the following:

1) In defaults I have this:

    RewriteEngine on
    RewriteRule ^(.*) http://[IP address]:8080/$1 [P]

2) In myPublic.conf I have these two directories:

    Options +ExecCGI
    AllowOverride All
    DirectoryIndex index.html index.cgi
    AddHandler cgi-script .cgi
    Order Allow,Deny
    Allow from All
    Deny from all

    AllowOverride All
    DirectoryIndex static/welcome.html
    Order Allow,Deny
    Allow from All
    Deny from all

At this point I am not sure in which direction I should go to get this running off of Apache. Kindly note: I am able to start Galaxy by executing sh run.sh, and the Galaxy instance is up and reachable through the IP address.

Please advise. Thank you,

Asma

On Tue, Apr 21, 2015 at 9:57 AM, John Chilton wrote:
> This configuration looks correct - are you sure the correct properties
> are set in config/galaxy.ini?
>
> You need this section:
>
> [filter:proxy-prefix]
> use = egg:PasteDeploy#prefix
> prefix = /galaxy
>
> And [app:main] needs the following properties:
>
> filter-with = proxy-prefix
> cookie_path = /galaxy
>
> I would also make sure your Galaxy is up and running on port 8080 -
> maybe by running a command such as
>
> wget http://127.0.0.1:8080
>
> from the server to make sure you get a response from Galaxy.
>
> If none of that leads to clues - it might be worth starting with the
> easier Apache proxy configuration (serving on / instead of /galaxy)
> and getting that working first, to rule out Galaxy and very basic
> Apache configuration problems.
> -John
>
> On Fri, Apr 17, 2015 at 11:50 AM, Asma Riyaz wrote:
>> Hi,
>>
>> I have been through previous posts regarding this issue. So far, I have
>> included the rewrite engine logic in
>> /etc/apache2/sites-available/defaults - it looks like so:
>>
>> RewriteEngine on
>> RewriteRule ^/galaxy$ /galaxy/ [R]
>> RewriteRule ^/galaxy/static/style/(.*) /seq/SOFTWARE/galaxy/static/june_2007_style/blue/$1 [L]
>> RewriteRule ^/galaxy/static/scripts/(.*) /seq/SOFTWARE/galaxy/static/scripts/packed/$1 [L]
>> RewriteRule ^/galaxy/static/(.*) /seq/SOFTWARE/galaxy/static/$1 [L]
>> RewriteRule ^/galaxy/favicon.ico /seq/SOFTWARE/galaxy/static/favicon.ico [L]
>> RewriteRule ^/galaxy/robots.txt /seq/SOFTWARE/galaxy/static/robots.txt [L]
>> RewriteRule ^/galaxy(.*) http://[my IP address]/$1 [P]
>>
>> I restarted Apache after this and accessed http://[my IP address]/galaxy,
>> but no webpage is found. Any advice on how I can proceed to configure
>> Apache correctly?
>>
>> Thank you,
>>
>> Asma
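John's `wget http://127.0.0.1:8080` sanity check can also be scripted. A small Python 3 sketch (the function name is my own) that reports whether anything answers HTTP at the given address:

```python
import urllib.error
import urllib.request

def galaxy_responds(url="http://127.0.0.1:8080", timeout=5.0):
    """Return True if an HTTP server answers at `url` (even with an error
    status), False if the connection fails - the same check as running
    `wget http://127.0.0.1:8080` on the server."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just with a non-2xx status
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False on the Galaxy host itself, the proxy configuration is not the problem - Galaxy is simply not listening on that port.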
[galaxy-dev] Shared Data Library issues with permissions
Hi Devs,

Maybe I'm missing something, but I'm having trouble getting permissions right on the Data Libraries. I can reproduce this:

As an admin, I go to Manage data libraries.
On top right, I click Create new data library. Let's call it test.
I add a folder in test, called "more testing".
I check the permissions of the library in Library Actions -> Edit permissions and make sure that all roles associated are blank.
I check permissions in more testing to make sure they are all blank.
I click on Shared Data -> Data Libraries to see if I can view the files from the user perspective.
I click on the Test library I just created:
Data Library "Test": The data library 'Test' does not contain any datasets that you can access.

I have a fairly recent (April 2015) update to Galaxy.

Any advice? I first started having trouble getting access for a user to their own library; when I associated their role with their own folder, they still couldn't see anything they could access. I should at least get an empty folder.

Sincerely,

Carrie Ganote
Re: [galaxy-dev] Data Libraries
Hello Ryan,

it is currently not possible to give users the right to create data libraries. However, you can create the libraries for them and give them rights to create and manage subfolders (doing the same things, but one level below). Would that address your goal?

Martin

On Thu, Apr 23, 2015 at 4:33 PM Ryan G wrote:
> Hi all - We are trying to use Galaxy as a mechanism for our sequencing
> lab to create data libraries for data they generate. I noticed in the
> docs, only Admins are able to create data libraries. Is there a way to
> change this? I'd like to give specific users in our group this ability
> without giving them admin rights.
>
> Ryan
[galaxy-dev] Data Libraries
Hi all - We are trying to use Galaxy as a mechanism for our sequencing lab to create data libraries for data they generate. I noticed in the docs, only Admins are able to create data libraries. Is there a way to change this? I'd like to give specific users in our group this ability without giving them admin rights.

Ryan
[galaxy-dev] InvalidRequestError: kombu.transport.sqlalchemy.Queue
Hi,

I am receiving this traceback when I restart Apache and am not sure what to make of it:

    Exception in thread WorkflowRequestMonitor.monitor_thread:
    Traceback (most recent call last):
      File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib/python2.7/threading.py", line 530, in __bootstrap_inner
        self.run()
      File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib/python2.7/threading.py", line 483, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 158, in __monitor
        self.__schedule( workflow_scheduler_id, workflow_scheduler )
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 162, in __schedule
        invocation_ids = self.__active_invocation_ids( workflow_scheduler_id )
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/workflow/scheduling_manager.py", line 191, in __active_invocation_ids
        handler=handler,
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/lib/galaxy/model/__init__.py", line 3254, in poll_active_workflow_ids
        WorkflowInvocation
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/scoping.py", line 150, in do
        return getattr(self.registry(), name)(*args, **kwargs)
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/session.py", line 1165, in query
        return self._query_cls(entities, self, **kwargs)
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 108, in __init__
        self._set_entities(entities)
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 118, in _set_entities
        self._set_entity_selectables(self._entities)
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 151, in _set_entity_selectables
        ent.setup_entity(*d[entity])
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py", line 3036, in setup_entity
        self._with_polymorphic = ext_info.with_polymorphic_mappers
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/util/langhelpers.py", line 725, in __get__
        obj.__dict__[self.__name__] = result = self.fget(obj)
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/mapper.py", line 1877, in _with_polymorphic_mappers
        configure_mappers()
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/mapper.py", line 2589, in configure_mappers
        mapper._post_configure_properties()
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/mapper.py", line 1694, in _post_configure_properties
        prop.init()
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/interfaces.py", line 144, in init
        self.do_init()
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/relationships.py", line 1549, in do_init
        self._process_dependent_arguments()
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/relationships.py", line 1605, in _process_dependent_arguments
        self.target = self.mapper.mapped_table
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/util/langhelpers.py", line 725, in __get__
        obj.__dict__[self.__name__] = result = self.fget(obj)
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/relationships.py", line 1522, in mapper
        argument = self.argument()
      File "/seq/regev_genome_portal/SOFTWARE/galaxy/eggs/SQLAlchemy-0.9.8-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/ext/declarative/clsregistry.py", line 283, in __call__
        (self.prop.parent, self.arg, n.args[0], self.cls)
    InvalidRequestError: When initializing mapper Mapper|Queue|kombu_queue, expression 'Message' failed to locate a name ("name 'Message' is not defined"). If this is a class name, consider adding this relationship() to the class after both dependent classes have been defined.

This happens sometimes, but when I shut down Galaxy and restart it I don't get the error anymore. Can someone guide me to a more permanent solution?

Asma
Re: [galaxy-dev] question about splitting bams
I am a pragmatist - I have no problem just splitting the inputs and skipping the metadata files. I would just convert the error into a log.info() and warn that the tool cannot use metadata files. If the underlying tool needs an index it can recreate it instead, I think. One can imagine a more intricate solution that would recreate metadata files as needed - but that would be a lot of work, I think. Does that make sense?

About BB PR 175: there were some recent discussions about that approach - I would check out http://dev.list.galaxyproject.org/Parallelism-using-metadata-td4666763.html.

-John

On Thu, Apr 23, 2015 at 11:55 AM, Roberto Alonso CIPF wrote:
> Hello,
>
> I am trying to write some code in order to make it possible to
> parallelize some tasks. I ran into a problem splitting a BAM into
> several parts; for this I created this simple tool:
>
>   merge_outputs="output" split_inputs="input"
>
>   java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
>     -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa
>     -I $input -o $output 2> /dev/null;
>
> But I have one problem: when I execute the tool it goes through this
> part of the code (I am working on the dev branch):
>
> $galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:
>
>   for input in parent_job.input_datasets:
>       if input.name in split_inputs:
>           this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
>           if len(this_input_files) > 1:
>               log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
>               log.error(log_error)
>               raise Exception(log_error)
>           input_datasets.append(input.dataset)
>
> So, it is raising the exception because this_input_files has two
> entries, concretely:
> ['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
> '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat']
> I guess that:
> dataset_171.dat is the BAM file, and
> metadata_13.dat is the BAI file.
>
> So Galaxy can't move on, and I don't know what the best solution would
> be. Maybe change the if to check only non-metadata files? I think I
> should use both files in order to create the BAM sub-files, but this
> would be inside the Bam class, in the binary.py file. Could you please
> guide me before I mess things up?
>
> Thanks so much
> --
> Roberto Alonso
> Functional Genomics Unit
> Bioinformatics and Genomics Department
> Prince Felipe Research Center (CIPF)
> C./Eduardo Primo Yúfera (Científic), nº 3
> (junto Oceanografico)
> 46012 Valencia, Spain
> Tel: +34 963289680 Ext. 1021
> Fax: +34 963289574
> E-Mail: ralo...@cipf.es
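John's pragmatic fix - keep only the primary dataset file and downgrade the error to a log message - could look something like this sketch. The helper name is mine; in Galaxy itself the filtering would happen inside the loop in lib/galaxy/jobs/splitters/multi.py.

```python
import logging

log = logging.getLogger(__name__)

def primary_files(input_name, filenames):
    """Drop metadata companion files (e.g. a .bai index stored under
    _metadata_files/) from a dataset's file list before splitting, and
    note that the tool will have to regenerate any index it needs."""
    keep = [f for f in filenames if "_metadata_files" not in f]
    if len(keep) != len(filenames):
        log.info("Input '%s': metadata files are skipped by the splitter "
                 "and must be recreated per part if needed", input_name)
    return keep
```

Applied to Roberto's example, the BAM would be kept for splitting and the BAI dropped, leaving the split parts to be re-indexed by the tool.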
Re: [galaxy-dev] Galaxy on HPC and Bright Cluster Manager?
Hi Carlos,

sorry for the slow reply. With a small 10-user setup (I assume just one group) your installation is probably going to be a lot less complex than ours. We are running services for users from multiple labs and departments, so a chief concern was making sure private datasets always stay private - which is relatively involved when using Galaxy to run jobs on a general-purpose shared cluster.

I think if I had to recommend picking a job scheduler I would suggest SLURM, since it's probably the most 'fashionable' choice at present. You also have the advantage that SLURM is used with Galaxy in the Galaxy Docker image etc., which ensures people notice if the Galaxy->DRMAA->SLURM setup isn't working. I've also used Galaxy with GridEngine in the past, and that was fine - but it is becoming a less common choice as a scheduler.

Having said that, I don't think the job scheduler needs to be your biggest concern. I would focus most on the file system and user account setup you are going to need.

* How are you going to migrate from standalone Galaxy to a situation where your new cluster can see the Galaxy data files, tools etc.? Are you purchasing storage with the cluster? If so, do you move Galaxy onto that storage, or can you mount existing Galaxy data onto the cluster nodes? If you can do that, is your networking such that performance is sufficient for the type of analysis you are going to run?

* Do you need to, or will you need to, keep track of per-user usage of the cluster for things that Galaxy will be running? If not, then you can just have a galaxy user on your cluster and things are pretty easy for file permissions etc. If you need to track jobs per-user then it becomes more complex, and the solution depends on how much privacy you need for datasets, how your cluster will authenticate users, etc.

The filesystem and user account issues are, in my mind, the ones to focus on. You can always modify Galaxy's config to switch to a different job scheduler fairly easily.
You cannot as easily move around large amounts of data, or reconcile local vs cluster user accounts, should that be necessary.

Cheers,

Dave Trudgian

-----Original Message-----
From: Carlos Lijeron [mailto:clije...@hunter.cuny.edu]
Sent: Wednesday, April 22, 2015 10:54 AM
To: David Trudgian; John Chilton
Cc: RODRIGO GONZALEZ SERRANO
Subject: Re: [galaxy-dev] Galaxy on HPC and Bright Cluster Manager?

Hello David,

Thank you for the great feedback. We are at Hunter College in NYC, part of the City University of New York. We recently ordered the cluster, which comes with Bright Cluster Management, and our PI wants to implement Galaxy for all the users (about 10) on the cluster and manage all job submissions through a job scheduler. So, to answer your question, we are not really using any scheduler at this point, only a standalone server with a local installation of Galaxy. Our cluster should be assembled and installed by the end of May, so I'm trying to gather as much information as possible in preparation for the deployment.

Based on your experience, what do you think I should focus on to ensure we maximize outcome and reduce the possibility of mistakes? In other words, any lessons learned that you would like to share would be greatly appreciated.

Thanks again,

Carlos Lijeron.

On 4/22/15, 10:34 AM, "David Trudgian" wrote:

> Carlos,
>
> We have Bright Cluster Manager in use on our cluster for node
> provisioning etc., but the actual job scheduler in use in our case is
> SLURM, which we use directly.
>
> Are you using one of the integrated workload managers such as SLURM /
> SGE / TORQUE directly, or indirectly via cmsub?
>
> I guess the easiest way to come up with some kind of advice is if you
> can provide an example of a generic job script you are using on your
> system. If you're using cmsub, is it specifying a --wlmanager etc.?
> DT
>
> -----Original Message-----
> From: galaxy-dev [mailto:galaxy-dev-boun...@lists.galaxyproject.org] On Behalf Of John Chilton
> Sent: Wednesday, April 22, 2015 8:26 AM
> To: Carlos Lijeron
> Cc: galaxy-dev@lists.galaxyproject.org
> Subject: Re: [galaxy-dev] Galaxy on HPC and Bright Cluster Manager?
>
> Hello Carlos,
>
> I have never heard of anyone running Galaxy with Bright Cluster Manager
> (though hopefully someone will chime in if they have). If you are
> interested in adding support it should be possible. One complication is
> that Bright Cluster Manager doesn't appear to have a DRMAA interface
> (http://www.drmaa.org/), which is the most direct way to utilize new
> DRMs. Without that, my approach would be to build a new CLI runner.
>
> There are a few examples here that one can use as a template:
>
> https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/jobs/runners/util/cli/job
>
> I guess you would have to write a new one targeting cmsub - you also
> need to be able to parse a job status somehow - I haven't figured out
> how to do that from the d
[galaxy-dev] Writing Auth Layer as WSGI Middleware
Hi All,

I have a need for authentication as a layer in front of Galaxy which is more specialized than the available options -- specifically 3-legged OAuth against a site of my choice. After looking into writing this in PHP and having the webserver (nginx) set remote_user, I decided to nix that approach for a couple of reasons -- one of which is that I don't have PHP experience. After a few discussions with other devs, I've decided that there are two easy options available to me:

- Write a WSGI app which does authentication, and *proxies* authenticated requests to Galaxy with remote_user set. Since that's a WSGI app doing proxying, there's an obvious code smell there.

- Write a WSGI middleware that wraps the existing Galaxy WSGI app, and passes authenticated requests directly to the Galaxy app.

That second solution seems much better, but I'm now faced with the question of "How do I do it?" Looking over the sample config, I see these lines:

# The factory for the WSGI application. This should not be changed.
paste.app_factory = galaxy.web.buildapp:app_factory

I'm thinking that I could change that to my middleware, which will turn to `galaxy.web.buildapp` when the time comes. One problem I'm seeing is that my middleware and Galaxy both have to run in the same virtualenv, so there's potential for dependency conflicts. The lib I want to use for this relies on PyYAML and a few other things which Galaxy also needs, so that possibility is very real.

Other than that hurdle, are there any gotchas I should be aware of with this approach? Are there similarly simple alternatives that I am not seeing? Ultimately, if I have to write an app that does proxying, I'd prefer that to the wide variety of highly effortful solutions I have envisioned. Those include, but are not limited to, a PAM module which does the OAuth, with Basic Authentication against that, just to give a flavor.
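The middleware route might look like the sketch below: a plain WSGI wrapper that runs some authenticate() callable (standing in for the 3-legged OAuth exchange, which is elided here) and only hands authenticated requests to the wrapped app, with the user injected as the HTTP_REMOTE_USER environ key that remote-user setups conventionally read. Class and callable names are my own.

```python
class AuthMiddleware:
    """Wrap a WSGI app; pass through only authenticated requests."""

    def __init__(self, app, authenticate):
        self.app = app
        # authenticate: environ -> user identity (str) or None.
        # In the real thing this would drive the OAuth dance / session check.
        self.authenticate = authenticate

    def __call__(self, environ, start_response):
        user = self.authenticate(environ)
        if user is None:
            start_response("401 Unauthorized",
                           [("Content-Type", "text/plain")])
            return [b"Authentication required\n"]
        environ["HTTP_REMOTE_USER"] = user
        return self.app(environ, start_response)
```

The factory setting in the config would then point at a small function of your own that builds the Galaxy app (via galaxy.web.buildapp:app_factory) and returns it wrapped in AuthMiddleware.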
Thanks very much for your help,

-Stephen
Re: [galaxy-dev] question about splitting bams
Regarding my previous mail, I found this thread:
http://www.bytebucket.org/galaxy/galaxy-central/pull-request/175/parameter-based-bam-file-parallelization/diff
Is it still alive? Is it maybe the best choice for doing the BAM parallelization?

Thanks! Best regards

On 23 April 2015 at 17:55, Roberto Alonso CIPF wrote:
> Hello,
>
> I am trying to write some code in order to make it possible to
> parallelize some tasks. I ran into a problem splitting a BAM into
> several parts; for this I created this simple tool:
>
>   merge_outputs="output" split_inputs="input"
>
>   java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
>     -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa
>     -I $input -o $output 2> /dev/null;
>
> But I have one problem: when I execute the tool it goes through this
> part of the code (I am working on the dev branch):
>
> *$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:*
>
>   for input in parent_job.input_datasets:
>       if input.name in split_inputs:
>           this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
>           if len(this_input_files) > 1:
>               log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
>               log.error(log_error)
>               raise Exception(log_error)
>           input_datasets.append(input.dataset)
>
> So, it is raising the exception because this_input_files has two
> entries, concretely:
> ['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
> '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat']
> I guess that:
> *dataset_171.dat*: the BAM file.
> *metadata_13.dat*: the BAI file.
>
> So Galaxy can't move on, and I don't know what the best solution would
> be. Maybe change the *if* to check only non-metadata files? I think I
> should use both files in order to create the BAM sub-files, but this
> would be inside the Bam class, in the *binary.py* file. Could you
> please guide me before I mess things up?
> Thanks so much
> --
> Roberto Alonso
> Functional Genomics Unit
> Bioinformatics and Genomics Department
> Prince Felipe Research Center (CIPF)
> C./Eduardo Primo Yúfera (Científic), nº 3
> (junto Oceanografico)
> 46012 Valencia, Spain
> Tel: +34 963289680 Ext. 1021
> Fax: +34 963289574
> E-Mail: ralo...@cipf.es

--
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es
[galaxy-dev] use of copied history while original user account deleted (and purged)
Dear Developers,

We manage a Galaxy instance at Institut Pasteur with more than a hundred users. Among them are postdocs who eventually leave the institute. Our Galaxy instance is configured with LDAP authentication, and the LDAP entry is removed shortly after the end of a contract. We are facing the following problem with a user who left a few months ago:

1/ Before leaving, she shared interesting histories with another colleague of the lab.
2/ Before leaving, the colleague created a copy of every shared history in her own Galaxy account.
3/ The colleague ran a few tests to check that the data were really transferred, by displaying or downloading them, and everything was OK.
4/ Time passed, and the LDAP account of the user who left was removed. We then deleted and purged her Galaxy account.
5/ The colleague has tried to relaunch some analyses using the copied histories. The data are there, as she is able to download or display the files, but the jobs are never launched: they remain grey in the copied history.

The problem is that there are no logs at all on the Galaxy side, and no command line is generated either. On the reporting side of Galaxy, we see that the jobs have been created but their status remains "NEW".

We know this will be a recurrent problem if we can't resolve it. Has someone already complained about something like that?

Best regards,
--
Olivia Doppelt-Azeroual, PhD
Fabien Mareuil, PhD
Bioinformatics Engineer
Galaxy Team - CIB/C3BI
Institut Pasteur, Paris
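Since the report above says nothing appears in the logs, one way to dig further is to look at the stuck jobs directly in the database. The sketch below is only an illustration against an in-memory SQLite database; a real instance would connect to its configured database, and the table and column names here are assumptions loosely modeled on Galaxy's `job` table, not a verified schema.

```python
import sqlite3

# Toy stand-in for a Galaxy-like `job` table (assumed columns; check the
# actual schema on your instance before running any real query).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO job (state, user_id) VALUES (?, ?)",
    [("ok", 1), ("new", 2), ("new", 2)],
)

# Jobs that were created but never left the NEW state, grouped by user:
# on a real instance this can show whether they all belong to the account
# that copied the purged user's histories.
stuck = conn.execute(
    "SELECT id, user_id FROM job WHERE state = 'new' ORDER BY id"
).fetchall()
print(stuck)
# → [(2, 2), (3, 2)]
```

If the stuck jobs all reference input datasets copied from the purged account, that would point at the job runner refusing (or failing silently) to resolve those inputs.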
[galaxy-dev] question about splitting bams
Hello,

I am trying to write some code in order to make it possible to parallelize some tasks. Right now I am stuck on the problem of splitting a BAM into several parts; for this I created this simple tool:

    merge_outputs="output" split_inputs="input"

    java -jar /home/ralonso/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar \
        -T UnifiedGenotyper -R /home/ralonso/BiB/Galaxy/data/chr_19_hg19_ucsc.fa \
        -I $input -o $output 2> /dev/null;

But I have one problem: when I execute the tool it goes through this part of code (I am working in the dev branch):

*$galaxy/lib/galaxy/jobs/splitters/multi.py, line 75:*

    for input in parent_job.input_datasets:
        if input.name in split_inputs:
            this_input_files = job_wrapper.get_input_dataset_fnames(input.dataset)
            if len(this_input_files) > 1:
                log_error = "The input '%s' is composed of multiple files - splitting is not allowed" % str(input.name)
                log.error(log_error)
                raise Exception(log_error)
            input_datasets.append(input.dataset)

So, it is raising the exception because this_input_files has two entries, concretely:

    ['/home/ralonso/galaxy/database/files/000/dataset_171.dat',
     '/home/ralonso/galaxy/database/files/_metadata_files/000/metadata_13.dat']

I guess that:

*dataset_171.dat*: this is the BAM file.
*metadata_13.dat*: this is the BAI file.

So, Galaxy can't move on and I don't know what the best solution would be. Maybe change the *if* to check only non-metadata files? I think I should use both files in order to create the BAM sub-files, but that would be inside the Bam class, in the *binary.py* file.

Could you please guide me before I mess things up?

Thanks so much
--
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es
[galaxy-dev] Galaxy Tool Shed security vulnerability
*Please note: This notice affects Galaxy Tool Shed servers only. Galaxy servers are unaffected.*

A security vulnerability was recently discovered by Daniel Blankenberg of the Galaxy Team that would allow a malicious person to execute arbitrary code on a Galaxy Tool Shed server. The vulnerability is due to reuse of tool loading code from Galaxy, which executes "code files" defined by Galaxy tool config files. Because the Tool Shed allows any user to create and "load" tools, any user could cause arbitrary code to be executed by the Tool Shed server. In Galaxy, administrators control which tools are loaded, which is why this vulnerability does not affect Galaxy itself.

Although we recommend upgrading to the latest stable version (15.03.2), a fix for this issue has been committed to all Galaxy releases from 14.08 onward.

If you are using Mercurial, you can update with (where YY.MM corresponds to the Galaxy release you are currently running):

    % hg pull
    % hg update release_YY.MM

If you are using git, you can update with (assuming your remote upstream is set to https://github.com/galaxyproject/galaxy/):

If you have not yet set up a remote tracking branch for the release you are using:

    % git fetch upstream
    % git checkout -b release_YY.MM upstream/release_YY.MM

Otherwise:

    % git pull upstream release_YY.MM

For the changes to take effect, *you must restart all Tool Shed server processes*.

Credit for the arbitrary code execution fix also goes to my fellow Galaxy Team member Daniel Blankenberg.

On behalf of the Galaxy Team,
--nate