Re: [galaxy-dev] Tool code for symlinking a data collection from input to output?
Thanks for including me, Damion.

For symlinking, you're right, John, and I never thought about any of the issues with deleting datasets in Galaxy afterwards.

The ability to define a connection for passing/failing a subsequent tool that you mentioned sounds like exactly what we were trying to accomplish. Is there any link to documentation on how to do this? Passing a dummy input "passing_qc_text_file" could work too, but we were designing our QC tool to handle input from many types of tools (along with a "rules" file for how to evaluate the quality), so we would end up modifying a lot of existing tools just to add a "passing_qc_text_file" input. I'm not sure how much time we'd want to spend updating a bunch of tools if something new is on the way.

Aaron

"Dooley, Damion" wrote:
To: John Chilton
From: "Dooley, Damion"
Date: 11/17/2015 01:31PM
Cc: "galaxy-...@lists.bx.psu.edu", "aaron.pet...@phac-aspc.gc.ca"
Subject: Re: [galaxy-dev] Tool code for symlinking a data collection from input to output?

Ah, I can see how symlinking could lead to file management issues. Well, we were trying to avoid the situation where use of our QC tool would require customizing any subsequent tools in a workflow, and also to reduce the disk overhead of hundred-megabyte files being passed along in a workflow.

So wow on the second paragraph - enabling dependencies outside of tool file I/O. I agree with Eric, this will be great.

Now, in our current canned workflows we actually don't need this to be edited via the interface - so are there details on how to edit a workflow file directly to get this dependency of tool B on tool A in place?

Thanks,
Damion

On 2015-11-17, 11:18 AM, "John Chilton" wrote:
> Slowly trying to catch up on e-mail after a lot of travel in November
> and I answered a variant of this to Damion directly, the most relevant
> snippet was:
>
> "
> I would not symbolic link the
> files though. I would just take the original collection and pipe it
> into the next tool and add a dummy input to the next tool
> ("passing_qc_text_file") that would cause the workflow to fail if the
> qc fails. This is a bit hacky, but symbolic linking will break
> Galaxy's deletion, purging, etc. You can delete the original
> dataset collection and the result would affect the files on disk for
> the output collection without Galaxy having any way to know.
>
> The workflow subsystem has the ability to define a connection like
> this (just wait for one tool to pass before calling the next without an
> input/output relationship) but it hasn't been exposed in the workflow
> editor yet.
> "
>
> -John

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
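The file-level hazard John describes in this thread can be reproduced outside Galaxy. The following is only a minimal sketch using the Python standard library: the file names are hypothetical, and `os.remove` merely stands in for deleting or purging the original dataset. It does not model Galaxy's own bookkeeping.

```python
import os
import tempfile


def symlink_survives_delete():
    """Symlink an 'output' to an 'input' file, delete the input, report what is left."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "input_dataset.dat")  # hypothetical name
    linked = os.path.join(workdir, "output_dataset.dat")   # hypothetical name

    with open(original, "w") as handle:
        handle.write("reads\n")

    # The "output" is only a pointer to the input's file on disk.
    os.symlink(original, linked)

    # Stand-in for deleting/purging the original dataset.
    os.remove(original)

    return {
        # The link entry itself is still present in the directory...
        "link_entry_exists": os.path.islink(linked),
        # ...but following it fails, because the target is gone.
        "data_readable": os.path.exists(linked),
    }


if __name__ == "__main__":
    print(symlink_survives_delete())
```

The output entry still exists as a directory entry, but its data cannot be read once the original is removed. Nothing in the link itself records that this happened.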
[galaxy-dev] installable revisions for repository suite definitions
Hello,

I'm running into some issues with maintaining multiple installable revisions for a repository suite and I'm not sure if there's something I've missed. I have a local toolshed where I'm maintaining my tools, and I have a repository suite definition that defines a number of dependencies. So, I have:

suite_X (mercurial revision 0):
    Dependency A
    Dependency B

I updated "suite_X" and changed the dependencies:

suite_X (mercurial revision 1):
    Dependency A
    Dependency B
    Dependency C

I would like to have both mercurial revision 0 and 1 installable in Galaxy, but only revision 1 (the latest) is showing up as installable. I found documentation at https://wiki.galaxyproject.org/RepositoryRevisions but this seems to only cover a repository containing a single tool, not a suite of tools.

Is there something I'm doing wrong or some setting I have to change here? My toolshed is running with Galaxy tag "v15.07".

Thanks,
Aaron
Re: [galaxy-dev] Change in toolshed disallowing install of previous versions of tools?
Hello Martin,

Yep, this is fully resolved for me. Thanks for following up, and for all the hard work you and the rest of the team do.

Aaron

On Fri, Aug 21, 2015 at 9:55 AM, Martin Čech mar...@bx.psu.edu wrote:

Hi Aaron,

I am just following up with a note that this has been fully resolved. Please let us know if you find any more problems.

Thank you for using Galaxy.

Martin

On Wed, Aug 12, 2015 at 7:21 PM Aaron Petkau aaron.pet...@gmail.com wrote:

Heh, always something that's discovered after the fact. Happens to me all the time.

Thanks for getting back to me quickly. Looks like the fix you did works. The only other tool is sam_to_bam:
https://toolshed.g2.bx.psu.edu/repository?repository_id=01221a8c57f0fc1a&changeset_revision=c73bf16b45df
Hopefully not too much trouble.

Thanks again Martin,
Aaron

On Wed, Aug 12, 2015 at 5:10 PM, Martin Čech mar...@bx.psu.edu wrote:

Hi Aaron,

this is most probably connected with a bug that we discovered today in the 'reset metadata for all my repositories' feature. Unfortunately we discovered it by running it on the Main Tool Shed.

We have an alternate (time-consuming) way of fixing individual repos (like I just did for samtools_mpileup), so if you give me a list of tools you need to use with old revisions I will make sure they are in good shape. This only affects devteam repos.

We are working on a fix, please bear with us.

Thank you for using Galaxy!

Martin

On Wed, Aug 12, 2015 at 5:44 PM Aaron Petkau aaron.pet...@gmail.com wrote:

Hey everyone,

So I rely a lot on specific versions of tools in the toolshed, and am often testing out the install process to make sure everything's working. Today I noticed that my install process was failing, and it looks like it's due to not being able to install previous versions of tools through the API. It seems to default to installing the latest version of a tool.

For example, for the tool samtools_mpileup
https://toolshed.g2.bx.psu.edu/repository?repository_id=01d08a1b766b864e&changeset_revision=aa0ef6f0ee89
if I run the code:

    gi.toolShed.install_repository_revision('http://toolshed.g2.bx.psu.edu/', 'samtools_mpileup', 'devteam', '973fea5b4bdf', install_tool_dependencies=True, install_repository_dependencies=True)

in bioblend (to install revision 973fea5b4bdf) I instead get the latest (aa0ef6f0ee89) revision installed. I can see the previous revision still exists:
https://toolshed.g2.bx.psu.edu/repository?repository_id=01d08a1b766b864e&changeset_revision=973fea5b4bdf

I'm wondering if something has changed here? I remember being able to install old versions of tools through the API before, but maybe I'm getting mixed up?

Thanks,
Aaron
[galaxy-dev] Change in toolshed disallowing install of previous versions of tools?
Hey everyone,

So I rely a lot on specific versions of tools in the toolshed, and am often testing out the install process to make sure everything's working. Today I noticed that my install process was failing, and it looks like it's due to not being able to install previous versions of tools through the API. It seems to default to installing the latest version of a tool.

For example, for the tool samtools_mpileup
https://toolshed.g2.bx.psu.edu/repository?repository_id=01d08a1b766b864e&changeset_revision=aa0ef6f0ee89
if I run the code:

    gi.toolShed.install_repository_revision('http://toolshed.g2.bx.psu.edu/', 'samtools_mpileup', 'devteam', '973fea5b4bdf', install_tool_dependencies=True, install_repository_dependencies=True)

in bioblend (to install revision 973fea5b4bdf) I instead get the latest (aa0ef6f0ee89) revision installed. I can see the previous revision still exists:
https://toolshed.g2.bx.psu.edu/repository?repository_id=01d08a1b766b864e&changeset_revision=973fea5b4bdf

I'm wondering if something has changed here? I remember being able to install old versions of tools through the API before, but maybe I'm getting mixed up?

Thanks,
Aaron
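One way an install script could catch the behaviour described in this message is to compare the requested revision against what Galaxy reports as installed afterwards. The sketch below is a hedged illustration, not part of the original report. It assumes the installed repositories are available as a list of dictionaries (for instance from bioblend's `gi.toolShed.get_repositories()`), and that each entry carries `name`, `owner` and `changeset_revision` keys. Those key names are an assumption and should be checked against the actual response. The helper names are hypothetical.

```python
def find_installed_revisions(installed_repos, name, owner):
    """Return the changeset revisions recorded for one repository.

    installed_repos: list of dicts; the 'name', 'owner' and
    'changeset_revision' keys are assumed, not confirmed by the thread.
    """
    return [
        repo.get("changeset_revision")
        for repo in installed_repos
        if repo.get("name") == name and repo.get("owner") == owner
    ]


def revision_was_honoured(installed_repos, name, owner, wanted_revision):
    """True if the requested revision is among the installed ones."""
    return wanted_revision in find_installed_revisions(installed_repos, name, owner)


# Example data shaped like the situation reported above: revision
# 973fea5b4bdf was requested, but only aa0ef6f0ee89 ended up installed.
example_installed = [
    {"name": "samtools_mpileup", "owner": "devteam", "changeset_revision": "aa0ef6f0ee89"},
]

if __name__ == "__main__":
    ok = revision_was_honoured(
        example_installed, "samtools_mpileup", "devteam", "973fea5b4bdf"
    )
    print("requested revision installed:", ok)
```

Keeping the check as a pure function over plain dictionaries lets it be exercised without a running Galaxy or Tool Shed.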
[galaxy-dev] Creating new dataset collections in a workflow
Hey,

So, I've been working on a tool which will produce a new dataset collection as output. I was following some of the instructions from https://bitbucket.org/galaxy/galaxy-central/pull-requests/582/allow-tools-to-explicitly-produce-dataset/diff. I managed to get the tool itself working, but when I go to use it in a workflow I'm getting errors. Mainly:

    History does not include a dataset collection of the correct type or containing the correct types of datasets

I'm wondering if there's something I'm doing wrong, or if tools which produce dataset collections are not supported within workflows? I'm working with the second case in that pull request, using an input list as the structure for my output list.

Thanks,
Aaron
[galaxy-dev] Database deadlock with large workflows + dataset collections
Hey,

I wanted to know if anyone else has had experience with database deadlock when using dataset collections and running a large number of samples through a workflow.

Traceback (most recent call last):
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 565, in finish_job
    job_state.job_wrapper.finish( stdout, stderr, exit_code )
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 1250, in finish
    self.sa_session.flush()
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/scoping.py", line 114, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1718, in flush
    self._flush(objects)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1789, in _flush
    flush_context.execute()
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 331, in execute
    rec.execute(self)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 475, in execute
    uow
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 59, in save_obj
    mapper, table, update)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
    execute(statement, params)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1449, in execute
    params)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1698, in _execute_context
    context)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1691, in _execute_context
    context)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/default.py", line 331, in do_execute
    cursor.execute(statement, parameters)
DBAPIError: (TransactionRollbackError) deadlock detected
DETAIL: Process 25859 waits for ShareLock on transaction 144373; blocked by process 25858.
Process 25858 waits for ShareLock on transaction 144372; blocked by process 25859.
HINT: See server log for query details.
 'UPDATE workflow_invocation SET update_time=%(update_time)s WHERE workflow_invocation.id = %(workflow_invocation_id)s' {'update_time': datetime.datetime(2015, 2, 27, 3, 51, 57, 81403), 'workflow_invocation_id': 48}

I saw this post http://dev.list.galaxyproject.org/data-collections-workflow-bug-td4666496.html with a similar issue, and the solution there was to make sure not to use a SQLite database, but I'm using a Postgres database and still encountered this issue. This was after running a very large number of samples (~200) using dataset collections.

Just wondering if anyone else was running into this issue?

Thanks,
Aaron
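The DETAIL portion of the error in this message describes two processes each waiting on a transaction held by the other. As a reading aid only, the sketch below parses that text and follows the "waits for ... blocked by" links to show the cycle. The regular expression is an assumption fitted to the two sentences quoted in this message and is not a general parser for PostgreSQL messages. The function names are hypothetical.

```python
import re

# The DETAIL text exactly as it appears in the error reported above.
DETAIL = (
    "Process 25859 waits for ShareLock on transaction 144373; "
    "blocked by process 25858. "
    "Process 25858 waits for ShareLock on transaction 144372; "
    "blocked by process 25859."
)

# Assumed pattern, fitted to the sentences above.
WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on transaction (\d+); blocked by process (\d+)"
)


def wait_edges(detail):
    """Return {waiting_pid: blocking_pid} parsed from a DETAIL string."""
    return {int(m.group(1)): int(m.group(4)) for m in WAIT_RE.finditer(detail)}


def find_cycle(edges):
    """Follow waiter -> blocker links; return the first cycle found, else []."""
    for start in edges:
        path = [start]
        current = edges.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = edges.get(current)
        if current == start:
            return path
    return []


if __name__ == "__main__":
    edges = wait_edges(DETAIL)
    print("wait-for edges:", edges)
    print("cycle:", find_cycle(edges))
```

Each process appears both as a waiter and as a blocker, so neither can proceed. This is the condition the database reports as "deadlock detected" before rolling one transaction back.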
Re: [galaxy-dev] Database deadlock with large workflows + dataset collections
Hello John,

Awesome, thanks so much for looking into this and all the other work you've been doing on dataset collections.

Yes, this is the January stable release; the exact commit is https://bitbucket.org/galaxy/galaxy-dist/commits/097bbb3b7d3246faaa5188a1fc2a79b01630025c.

I currently have one web worker and one background job handler, so it would have been scheduled in the job handler. It is configured like:

    [server:handler0]
    use = egg:Paste#http
    port = 8079
    host = 127.0.0.1
    use_threadpool = true
    threadpool_workers = 5

I can paste the relevant section from the handler log, but I didn't see anything there other than the same exception and stack trace. Our job handler is submitting to our cluster using DRMAA, from job_conf.xml:

    <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="8"/>

The submission did take forever, over an hour. It was being submitted through the API so I didn't check the web response time, but from my experience before I imagine it would be very slow. I've noticed this slowdown before when I get into the 150-200 paired-end samples range.

I'll look forward to the next release though. It would be awesome to have the workflow submission sped up.

Let me know if there's any other info I can provide to help debug,

Aaron

On Fri, Feb 27, 2015 at 9:38 AM, John Chilton jmchil...@gmail.com wrote:

Hey Aaron,

Thanks for the bug report - I have added it to Trello here (https://trello.com/c/I0n23JEP). I assume this is the January stable release (15.01)? Any clue if this workflow was being scheduled in a web thread or a background job handler thread? (Did the submission take forever - or was the web page responsive and the server bogged down?)

So there is a race condition here it would seem - and I don't have a fix right away - but I do think (hope) the next release due out in a couple of weeks will result in a massive speed up in workflow scheduling - so hopefully we will be less likely to hit these conditions.
-John

On Fri, Feb 27, 2015 at 10:11 AM, Aaron Petkau aaron.pet...@gmail.com wrote:

Hey,

I wanted to know if anyone else has had experience with database deadlock when using dataset collections and running a large number of samples through a workflow.

Traceback (most recent call last):
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 565, in finish_job
    job_state.job_wrapper.finish( stdout, stderr, exit_code )
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 1250, in finish
    self.sa_session.flush()
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/scoping.py", line 114, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1718, in flush
    self._flush(objects)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1789, in _flush
    flush_context.execute()
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 331, in execute
    rec.execute(self)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 475, in execute
    uow
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 59, in save_obj
    mapper, table, update)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
    execute(statement, params)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1449, in execute
    params)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1698, in _execute_context
    context)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1691, in _execute_context
    context)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/default.py", line 331, in do_execute
    cursor.execute(statement, parameters)
DBAPIError