[galaxy-dev] Dataset Collections status

2015-08-07 Thread Keith Suderman
Greetings,

I started pulling Galaxy code from the dev branch a few months ago to take 
advantage of the (then just emerging) dataset collections feature.  However, it 
is not clear to me from the latest release notes whether dataset collections are 
now fully merged into master, or whether I should continue to use the dev branch 
to get the bleeding-edge code.  I would like to move back to the master branch 
as soon as feasible.

When running workflows over dataset collections I will frequently see errors 
like:

/bin/sh: 1: /home/galaxy/galaxy_old/database/job_working_directory/001/1216/galaxy_1216.sh: Text file busy

Which, from what I can tell, occurs when one process tries to execute a file 
that another process still has open for writing.  While the error is easy to 
reproduce, it also seems random: it does not occur in the same place each time 
I run the workflow.
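
For context, on Linux the "Text file busy" error (ETXTBSY) is generally reported when a file is executed while some process still holds it open for writing. One possible stopgap is to retry the launch a few times before giving up. The sketch below is only an illustration under that assumption, not how Galaxy launches job scripts; `run_with_retry`, its parameters and the default retry counts are hypothetical names chosen for the example.

```python
import errno
import time


def run_with_retry(launch, attempts=5, delay=0.1):
    """Call launch(), retrying only when it fails with ETXTBSY.

    launch is any zero-argument callable that starts the script
    (hypothetical interface, chosen for this sketch).
    """
    for attempt in range(1, attempts + 1):
        try:
            return launch()
        except OSError as exc:
            # Re-raise anything that is not "Text file busy", and give up
            # after the final attempt.
            if exc.errno != errno.ETXTBSY or attempt == attempts:
                raise
            # Give the writer a moment to close the file, then try again.
            time.sleep(delay)
```

A caller might pass something like `lambda: subprocess.call([script_path])`. This only helps when the script is executed directly; when a shell runs it, as in the error above, the failure shows up as a message on stderr and a non-zero exit status rather than as an `OSError` in the parent.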

Given that I am working from the dev branch I don't want to open/raise issues 
on features still in development.  But if this is unexpected then I can do some 
more investigating and file a proper bug report.

Cheers,
Keith

--
Research Associate
Department of Computer Science
Vassar College
Poughkeepsie, NY


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Creating new dataset collections in a workflow

2015-08-07 Thread John Chilton
None of the fixes are newer than that - and I haven't checked exactly
but I guess any applied to 15.07 had already been propagated back to
dev by August 4th. I would try to build the minimal example and hand
it off to me then - this really should work.

The size list thing is a bit awkward - is this just so you can use the
structured_like="" attribute on the output collection? I would just
use a dynamic output collection, I guess, with a discover_datasets
tag.

Merging collections should also be a native operation that doesn't
require a formal tool - along with other stuff like filtering
empty/bad datasets, zipping and unzipping pairs, splitting/merging
using operations defined on datatypes, etc. I plan to throw
together a framework for collection operations (hopefully this next
release cycle).
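
To make a few of the operations named above concrete, here is a rough sketch in plain Python. It is not Galaxy code and says nothing about how such a framework would be implemented: a list collection is modelled as a dict from element identifier to content, and the function names and the "forward"/"reverse" keys are assumptions made for the example.

```python
def filter_empty(collection):
    """Drop elements whose content is empty."""
    return {name: data for name, data in collection.items() if data}


def zip_pairs(forward, reverse):
    """Combine two lists with identical identifiers into a list of pairs."""
    if list(forward) != list(reverse):
        raise ValueError("forward and reverse identifiers differ")
    return {name: {"forward": forward[name], "reverse": reverse[name]}
            for name in forward}


def unzip_pairs(pairs):
    """Split a list of pairs back into forward and reverse lists."""
    forward = {name: pair["forward"] for name, pair in pairs.items()}
    reverse = {name: pair["reverse"] for name, pair in pairs.items()}
    return forward, reverse
```

Zipping followed by unzipping returns the two original lists, which is the property one would expect such operations to preserve.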

-John

Re: [galaxy-dev] Adding categories to Main

2015-08-07 Thread Martin Čech
No sub-categories yet; we plan on tackling the browsing issue by having
good search. :)

M.

Re: [galaxy-dev] Adding categories to Main

2015-08-07 Thread Keith Suderman
Awesome! 

One concern we have is lumping all NLP tools into a single category; are 
sub-categories (and sub-sub-categories) possible?  However, we can start with 
this and see what we need later.

Cheers,
Keith

--
Research Associate
Department of Computer Science
Vassar College
Poughkeepsie, NY


Re: [galaxy-dev] Creating new dataset collections in a workflow

2015-08-07 Thread Aaron Petkau
Hello John,

I tested this out on the github Galaxy with a commit from August 4
(253ce7b8e3ddad7693da034374bed1a751173839) in dev.  Were any of the fixes
you added done since then?  I'll try it out on the 15.07 release though.

I would like to ask your opinion about something though.  So, the tool I'm
trying to write is one to merge 2 dataset lists together into a larger
list.  The problem I want to solve is that we often deal with paired-end
sequencing data (and collections work awesome for that), but not all
sequencing is paired-end.  Sometimes we also have single-end data that we
want to put through the same workflow.

So, I figured I can just have one "input" element in my workflow for each
of paired-end and single-end.  The first step is reference mapping which
produces BAM files.  I can do that separately (since the mapper needs
different parameters for single vs. paired end), but then I want to merge
my BAM file lists together.

That is, I have a workflow like:

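+--------------+      +---------------+
|              |      |               |
|  Paired-end  +----->+  Ref Mapping  +------+
|              |      |               |      |
+--------------+      +---------------+   +--v--------------+
                                          |                 |
                                          | Merge BAM Lists +-->
                                          |                 |
+--------------+      +---------------+   +--^--------------+
|              |      |               |      |
|  Single-end  +----->+  Ref Mapping  +------+
|              |      |               |
+--------------+      +---------------+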

So, I was trying to develop a tool to merge both lists together.  I was
doing this by writing a tool that takes as input both BAM lists, along with
another list defining the exact size of the merged lists but with empty
datasets, and copying any datasets over.  That is:

Input List1: [A: a.bam, B: b.bam]
Input List2: [C: c.bam]
Input Size List: [A: empty, B: empty, C: empty]

Output List: [A: a.bam, B: b.bam, C: c.bam]
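
The intended merge can be sketched in plain Python as follows. This only models the list bookkeeping, with each list represented as a dict from element identifier to file name; `merge_lists` is a hypothetical helper, not part of the tool or of Galaxy.

```python
def merge_lists(*collections):
    """Concatenate list collections, preserving element order."""
    merged = {}
    for collection in collections:
        for identifier, dataset in collection.items():
            # Identifiers must stay unique in the merged list.
            if identifier in merged:
                raise ValueError("duplicate element identifier: %s" % identifier)
            merged[identifier] = dataset
    return merged


paired = {"A": "a.bam", "B": "b.bam"}  # Input List1
single = {"C": "c.bam"}                # Input List2
print(merge_lists(paired, single))
# {'A': 'a.bam', 'B': 'b.bam', 'C': 'c.bam'}
```

In plain Python the size of the output follows from the inputs. Whether a Galaxy tool can do without the size list depends on the tool framework, which this sketch does not model.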

I know it looks a bit ugly to have that "size list" around, but I'm
automating execution of the workflow so it's not as big of a deal to me.
Not sure if you have any other solutions?

Thanks for taking the time to read over this.  I'll do a bit more testing
of my tool in other Galaxy versions.

Aaron

Re: [galaxy-dev] Adding categories to Main

2015-08-07 Thread Martin Čech
Dear Keith,

I have created an 'NLP' category at https://testtoolshed.g2.bx.psu.edu for
you to test out the Tool Shed environment.

Unless your tools are proprietary I strongly recommend using the Main Tool
Shed at https://toolshed.g2.bx.psu.edu for publishing and distributing
them. Let me know when you are ready with the tools and I will create the
category for you there.

We happily welcome every tool contribution to Galaxy! Thank you for it.

Martin, Galaxy Team


Re: [galaxy-dev] Creating new dataset collections in a workflow

2015-08-07 Thread John Chilton
Aaron,

  We fixed a few bugs related to this recently. Are you targeting
bitbucket or github - and which tag of Galaxy? I would probably target
the 15.07 release on github for the latest and greatest fixes.

  If that still doesn't work I would recommend trying to pare down the
tool and workflow to build a minimal example to post. This really
should work in the abstract.

-John

[galaxy-dev] Creating new dataset collections in a workflow

2015-08-07 Thread Aaron Petkau
Hey,

So, I've been working on a tool which will produce a new dataset collection
as output.  I was following some of the instructions from
https://bitbucket.org/galaxy/galaxy-central/pull-requests/582/allow-tools-to-explicitly-produce-dataset/diff.
I managed to get the tool itself working, but when I go to use it in a
workflow I'm getting errors.  Mainly:

History does not include a dataset collection of the correct type or
containing the correct types of datasets

I'm wondering if there's something I'm doing wrong, or if tools which
produce dataset collections are not supported within workflows?  I'm
working with the second case in that pull request, using an input list as
the structure for my output list.

Thanks,

Aaron

Re: [galaxy-dev] Adding categories to Main

2015-08-07 Thread Björn Grüning
Hi Keith,

the Galaxy team can add new categories to the Tool Shed very easily and
such a mail is exactly the way to go. As soon as the Galaxy team has
added the category you can use tools like planemo to upload all your
tools at once.

https://github.com/galaxyproject/planemo

Looking forward to seeing NLP tools in Galaxy and the Main TS! Awesome!
Bjoern


[galaxy-dev] Adding categories to Main

2015-08-07 Thread Keith Suderman
Dear Galaxy Team,

A colleague would like to upload some NLP (Natural Language Processing) tools 
to the Galaxy Test/Main tool sheds, but we are unclear what "categories" to 
use for the tools.  I see that the Main/Test tool sheds have a category for 
"Text Manipulation", but that does not seem appropriate for NLP tools.  Is it 
possible to have new categories added to the tool shed(s)?  If so, what is the 
process?

I am just starting to investigate setting up our own local tool shed and I am 
coming across mentions of repository capsules and exporting tool sheds.  Would 
it be preferable to install the NLP tools to a local tool shed and then export 
a repository/capsule to be imported to the Test/Main tool sheds?  What happens 
if our tool shed uses a set of categories disjoint from those of Test/Main?

Cheers,
Keith

--
Research Associate
Department of Computer Science
Vassar College
Poughkeepsie, NY

