Re: [galaxy-dev] Tool code for symlinking a data collection from input to output?

2015-11-19 Thread Aaron Petkau
Thanks for including me, Damion.

For symlinking, you're right, John, and I never thought about any of the issues with deleting datasets in Galaxy afterwards.

The ability to define a connection for passing/failing a subsequent tool that you mentioned sounds like exactly what we were trying to accomplish. Is there any link to documentation on how to do this? Passing a dummy input "passing_qc_text_file" could work too, but we were designing our QC tool to handle input from many types of tools (along with a "rules" file for how to evaluate the quality), so we would end up modifying a lot of existing tools just to add a "passing_qc_text_file" input. I'm not sure how much time we'd want to spend updating a bunch of tools if something new is on the way.

Aaron

-----"Dooley, Damion" wrote: -----
To: John Chilton
From: "Dooley, Damion"
Date: 11/17/2015 01:31PM
Cc: "galaxy-...@lists.bx.psu.edu", "aaron.pet...@phac-aspc.gc.ca"
Subject: Re: [galaxy-dev] Tool code for symlinking a data collection from input to output?

Ah, I can see how symlinking could lead to file management issues. Well, we were trying to avoid the situation where use of our QC tool would require customizing any subsequent tools in a workflow, and also to reduce the disk overhead of hundred-megabyte files being passed along in a workflow.

So, wow on the second paragraph: enabling dependencies outside of tool file I/O. I agree with Eric, this will be great.

Now, in our current canned workflows we actually don't need this to be edited via the interface, so are there details on how to edit a workflow file directly to get this dependency of tool B on tool A in place?

Thanks,
Damion

On 2015-11-17, 11:18 AM, "John Chilton" wrote:

>Slowly trying to catch up on e-mail after a lot of travel in November
>and I answered a variant of this to Damion directly; the most relevant
>snippet was:
>
>"
>I would not symbolically link the files, though. I would just take the
>original collection and pipe it into the next tool and add a dummy input
>to the next tool ("passing_qc_text_file") that would cause the workflow
>to fail if the QC fails. This is a bit hacky, but symbolic linking will
>break Galaxy's deletion, purging, etc. You can delete the original
>dataset collection and the result would affect the files on disk for
>the output collection without Galaxy having any way to know.
>
>The workflow subsystem has the ability to define a connection like
>this (just wait for one tool to pass before calling the next without an
>input/output relationship) but it hasn't been exposed in the workflow
>editor yet."
>
>-John
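A minimal sketch of the dummy-input approach John describes: a QC step that writes a small pass file only when every metric satisfies its rule, and otherwise exits non-zero, so a downstream step that takes the pass file as an extra (otherwise unused) input cannot run. The JSON metrics and rules formats and the `evaluate_qc`/`main` names are hypothetical, and the sketch assumes the tool wrapper is configured to treat a non-zero exit code as a job failure.

```python
import json
import sys


def evaluate_qc(metrics, rules):
    """Return a list of failure messages; an empty list means QC passed.

    `rules` maps a metric name to {"min": x} and/or {"max": y}.
    This rules format is hypothetical, for illustration only.
    """
    failures = []
    for name, bounds in rules.items():
        if name not in metrics:
            failures.append(f"missing metric: {name}")
            continue
        value = metrics[name]
        if "min" in bounds and value < bounds["min"]:
            failures.append(f"{name}={value} < min {bounds['min']}")
        if "max" in bounds and value > bounds["max"]:
            failures.append(f"{name}={value} > max {bounds['max']}")
    return failures


def main(metrics_path, rules_path, pass_file_path):
    with open(metrics_path) as fh:
        metrics = json.load(fh)
    with open(rules_path) as fh:
        rules = json.load(fh)
    failures = evaluate_qc(metrics, rules)
    if failures:
        # Non-zero exit marks the job as failed (assuming the wrapper checks
        # exit codes), so no pass file is produced for the next step.
        sys.stderr.write("\n".join(failures) + "\n")
        return 1
    # The pass file is the dummy output wired into the next tool.
    with open(pass_file_path, "w") as fh:
        fh.write("QC passed\n")
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv[1], sys.argv[2], sys.argv[3]))
```

The design keeps the large collection flowing directly from its original step to the next tool; only the tiny pass file carries the QC decision, which avoids both copying and symlinking the data.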
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

[galaxy-dev] installable revisions for repository suite definitions

2015-10-29 Thread Aaron Petkau
Hello,

I'm running into some issues with maintaining multiple installable
revisions for a repository suite and I'm not sure if there's something I've
missed.

I have a local toolshed where I'm maintaining my tools, and I have a
repository suite definition, defining a number of dependencies.  So, I have:

suite_X (mercurial revision 0):
   Dependency A
   Dependency B

I updated "suite_X" and changed the dependencies:

suite_X (mercurial revision 1):
   Dependency A
   Dependency B
   Dependency C

I would like to have both mercurial revision 0 and 1 installable in Galaxy,
but only revision 1 (the latest) is showing up as installable.  I found
documentation at https://wiki.galaxyproject.org/RepositoryRevisions but
this seems to only cover a repository containing a single tool, not a suite
of tools.

Is there something I'm doing wrong or some setting I have to change here?
My toolshed is running with Galaxy tag "v15.07".
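One way to see which changesets a Tool Shed currently reports as installable is its `get_ordered_installable_revisions` API endpoint, which returns a JSON list of changeset hashes, oldest first. The sketch below builds that URL and compares the list against the changesets you expect; the local Tool Shed URL (`http://localhost:9009`), the owner name `aaron` and the changeset hashes in the usage note are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def installable_revisions_url(tool_shed_url, name, owner):
    """Build the Tool Shed API URL listing a repository's installable changesets."""
    query = urlencode({"name": name, "owner": owner})
    return (tool_shed_url.rstrip("/")
            + "/api/repositories/get_ordered_installable_revisions?" + query)


def fetch_installable_revisions(tool_shed_url, name, owner):
    """Query the Tool Shed (needs network access); returns a list of changeset hashes."""
    with urlopen(installable_revisions_url(tool_shed_url, name, owner)) as response:
        return json.loads(response.read().decode("utf-8"))


def missing_revisions(installable, wanted):
    """Return the changesets in `wanted` that the Tool Shed does not list as installable."""
    return [rev for rev in wanted if rev not in installable]
```

For example, `missing_revisions(fetch_installable_revisions("http://localhost:9009", "suite_X", "aaron"), [rev0_hash, rev1_hash])` would return the hash of mercurial revision 0 if the Tool Shed has not recorded it as an installable revision.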

Thanks,

Aaron

Re: [galaxy-dev] Change in toolshed disallowing install of previous versions of tools?

2015-08-21 Thread Aaron Petkau
Hello Martin,

Yep, this is fully resolved for me.  Thanks for following up, and for all
the hard work you and the rest of the team do.

Aaron

On Fri, Aug 21, 2015 at 9:55 AM, Martin Čech mar...@bx.psu.edu wrote:

 Hi Aaron,

 I am just following up with a note that this has been fully resolved.
 Please let us know if you find any more problems.

 Thank you for using Galaxy.

 Martin

 On Wed, Aug 12, 2015 at 7:21 PM Aaron Petkau aaron.pet...@gmail.com
 wrote:

 Heh, always something that's discovered after the fact.  Happens to me
 all the time.  Thanks for getting back to me quickly.  Looks like the fix
 you did works.  The only other tool is sam_to_bam
 https://toolshed.g2.bx.psu.edu/repository?repository_id=01221a8c57f0fc1a&changeset_revision=c73bf16b45df.
 Hopefully not too much trouble.

 Thanks again Martin,

 Aaron
 On Wed, Aug 12, 2015 at 5:10 PM, Martin Čech mar...@bx.psu.edu wrote:

 Hi Aaron,

 this is most probably connected with a bug that we discovered today in
 the 'reset metadata for all my repositories' feature. Unfortunately we
 discovered it by running it on the Main Tool Shed. We have an alternate
 (time-consuming) way of fixing individual repos (like I just did for the
 samtools_mpileup) so if you give me a list of tools you need to use
 with old revisions I will make sure they are in a good shape.

 This only affects devteam repos.
 We are working on a fix; please bear with us.

 Thank you for using Galaxy!

 Martin


[galaxy-dev] Change in toolshed disallowing install of previous versions of tools?

2015-08-12 Thread Aaron Petkau
Hey everyone,

So I rely a lot on specific versions of tools in the toolshed, and am often
testing out the install process to make sure everything's working.  Today I
noticed that my install process was failing, and it looks like it's due to
not being able to install previous versions of tools through the API.  It
seems to default to installing the latest version of a tool.

For example, for the tool samtools_mpileup
https://toolshed.g2.bx.psu.edu/repository?repository_id=01d08a1b766b864e&changeset_revision=aa0ef6f0ee89,
if I run the code:

gi.toolShed.install_repository_revision('http://toolshed.g2.bx.psu.edu/',
    'samtools_mpileup', 'devteam', '973fea5b4bdf',
    install_tool_dependencies=True, install_repository_dependencies=True)

in bioblend (to install revision 973fea5b4bdf) I instead get the latest
(aa0ef6f0ee89) revision installed.  I can see the previous revision still
exists
https://toolshed.g2.bx.psu.edu/repository?repository_id=01d08a1b766b864e&changeset_revision=973fea5b4bdf.

I'm wondering if something has changed here?  I remember being able to
install old versions of tools through the API before, but maybe I'm getting
mixed up?
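Because the request silently fell back to the latest revision, it can help to verify what was actually installed after each call. A minimal sketch, assuming `repositories` is the list of dicts Galaxy returns for its installed Tool Shed repositories (for example from bioblend's `ToolShedClient.get_repositories()`) and that each dict carries `name`, `owner`, `changeset_revision` and `deleted` keys; those key names are an assumption to check against your Galaxy version, and the helper names are hypothetical.

```python
def installed_revisions(repositories, name, owner):
    """Return the sorted changeset revisions installed for name/owner.

    Assumes each entry is a dict with "name", "owner",
    "changeset_revision" and (optionally) "deleted" keys.
    """
    return sorted(
        repo["changeset_revision"]
        for repo in repositories
        if repo.get("name") == name
        and repo.get("owner") == owner
        and not repo.get("deleted", False)
    )


def check_install(repositories, name, owner, requested):
    """Raise if the requested changeset is not among the installed ones."""
    found = installed_revisions(repositories, name, owner)
    if requested not in found:
        raise RuntimeError(
            f"{owner}/{name}: requested {requested}, installed {found}")
    return found
```

Run right after `install_repository_revision`, this turns the silent substitution of aa0ef6f0ee89 for 973fea5b4bdf into an explicit error in an automated install script.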

Thanks,

Aaron

[galaxy-dev] Creating new dataset collections in a workflow

2015-08-07 Thread Aaron Petkau
Hey,

So, I've been working on a tool which will produce a new dataset collection
as output.  I was following some of the instructions from
https://bitbucket.org/galaxy/galaxy-central/pull-requests/582/allow-tools-to-explicitly-produce-dataset/diff.
I managed to get the tool itself working, but when I go to use it in a
workflow I'm getting errors.  Mainly:

History does not include a dataset collection of the correct type or
containing the correct types of datasets

I'm wondering if there's something I'm doing wrong, or if tools which
produce dataset collections are not supported within workflows?  I'm
working with the second case in that pull request, using an input list as
the structure for my output list.

Thanks,

Aaron

[galaxy-dev] Database deadlock with large workflows + dataset collections

2015-02-27 Thread Aaron Petkau
Hey,

I wanted to know if anyone else has had experience with database deadlock
when using dataset collections and running a large number of samples
through a workflow.

Traceback (most recent call last):
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/lib/galaxy/jobs/runners/__init__.py", line 565, in finish_job
    job_state.job_wrapper.finish( stdout, stderr, exit_code )
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 1250, in finish
    self.sa_session.flush()
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/scoping.py", line 114, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1718, in flush
    self._flush(objects)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1789, in _flush
    flush_context.execute()
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 331, in execute
    rec.execute(self)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 475, in execute
    uow
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 59, in save_obj
    mapper, table, update)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
    execute(statement, params)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1449, in execute
    params)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1698, in _execute_context
    context)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1691, in _execute_context
    context)
  File "/Warehouse/Applications/irida/galaxy/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/default.py", line 331, in do_execute
    cursor.execute(statement, parameters)
DBAPIError: (TransactionRollbackError) deadlock detected
DETAIL:  Process 25859 waits for ShareLock on transaction 144373; blocked by process 25858.
Process 25858 waits for ShareLock on transaction 144372; blocked by process 25859.
HINT:  See server log for query details.
 'UPDATE workflow_invocation SET update_time=%(update_time)s WHERE workflow_invocation.id = %(workflow_invocation_id)s' {'update_time': datetime.datetime(2015, 2, 27, 3, 51, 57, 81403), 'workflow_invocation_id': 48}

I saw this post
http://dev.list.galaxyproject.org/data-collections-workflow-bug-td4666496.html
with a similar issue and the solution was to make sure not to use a
sqlite database, but I'm using a postgres database and still
encountered this issue.  This was after running a very large number of
samples (~200) using dataset collections.  Just wondering if anyone
else was running into this issue?

Thanks,

Aaron

Re: [galaxy-dev] Database deadlock with large workflows + dataset collections

2015-02-27 Thread Aaron Petkau
Hello John,

Awesome, thanks so much for looking into this and all the other work you've
been doing on dataset collections.  Yes, this is the January stable
release, exact commit is
https://bitbucket.org/galaxy/galaxy-dist/commits/097bbb3b7d3246faaa5188a1fc2a79b01630025c.
I currently have one web worker and one background job handler so it would
have been scheduled in the job handler.  It is configured like:

[server:handler0]
use = egg:Paste#http
port = 8079
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

I can paste the relevant section from the handler log, but I didn't see
anything there other than the same exception and stack trace.

Our job handler is submitting to our cluster using DRMAA, from job_conf.xml:

<plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="8"/>

The submission did take forever, over an hour.  It was being submitted
through the API so I didn't check the web response time but from my
experience before I imagine it would be very slow.  I've noticed this
slowdown before when I get into the 150-200 paired-end samples range.

I'll look forward to the next release though.  It would be awesome to have
the workflow submission sped up.

Let me know if there's any other info I can provide to help debug,

Aaron

On Fri, Feb 27, 2015 at 9:38 AM, John Chilton jmchil...@gmail.com wrote:

 Hey Aaron,

 Thanks for the bug report - I have added it to Trello here
 (https://trello.com/c/I0n23JEP). I assume this is the January stable
 release (15.01)? Any clue if this workflow was being scheduled in a
 web thread or a background job handler thread? (Did the submission
 take forever - or was the web page responsive and the server bogged
 down).

 So there is a race condition here it would seem - and I don't have a
 fix right away - but I do think (hope) the next release due out in a
 couple of weeks will result in a massive speed up in workflow
 scheduling - so hopefully we will be less likely to hit these
 conditions.

 -John
