Re: [galaxy-dev] Creating new dataset collections in a workflow
So, I tested against 15.07 and hit the same issue. I figured out what was giving me the "History does not include a dataset collection of the correct type or containing the correct types of datasets" error message: I had left out the *format* attribute in the tool XML. However, after setting the *format* attribute I get another error message, "State validation failed.", when trying to use the tool in a workflow. The same error message shows up for the *collection_creates_list* tool in Galaxy as well (https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/collection_creates_list.xml). I get the following message in the log file:

    galaxy.tools ERROR 2015-08-10 13:11:36,144 Checking parameter input1 failed. local variable 'rval' referenced before assignment

Yeah, I know the size list is pretty hacky. It is just so that I can use the structured_like="" attribute. I did think of using the dynamic output collection, but I noticed that it only works with the updated workflow scheduler, which schedules workflows in the background. My application currently uses the number of jobs completed out of the total number of jobs (for example, 10 completed/20 total) to create a progress bar for a workflow. I assumed that scheduling workflows in the background would break this, since there's no way for me to know when Galaxy has finished scheduling all jobs (or is there? I haven't looked too closely).

I did discuss this tool a bit more with some people over here and decided against pursuing it for now. I was trying to get a mixture of single-end and paired-end datasets working in Galaxy for an upcoming conference, but I can do without it and there's a lot of other work to do. It also doesn't make sense to hack something together if better support will be making it into the main Galaxy code base sometime soon.

Thanks again for your input. It helped me out a lot.

Aaron

On Fri, Aug 7, 2015 at 1:57 PM, John Chilton wrote:
> None of the fixes are newer than that - and I haven't checked exactly, but I guess any applied to 15.07 had already been propagated back to dev by August 4th. I would try to build the minimal example and hand it off to me then - this really should work.
>
> The size list thing is a bit awkward - is this just so you can use the structured_like="" attribute on the output collection? I would just use a dynamic output collection, I guess, with a discover_datasets tag.
>
> Merging collections should also be a native operation that doesn't require a formal tool - along with other stuff like filtering empty/bad datasets, zipping and unzipping pairs, splitting/merging using operations defined on datatypes, etc. I plan to throw together a framework for collection operations (hopefully this next release cycle).
>
> -John
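A minimal Python sketch of the progress calculation mentioned in the message above (jobs completed out of total jobs). The function name `workflow_progress` and the plain list of state strings are hypothetical, and treating "ok" as the only completed state is an assumption; this is not how any particular client queries Galaxy. If workflows are scheduled in the background, as the message assumes, the job list may still be growing when it is read, so the total would only reflect the jobs scheduled so far.

```python
from typing import Iterable, Tuple

# Assumption: "ok" marks a successfully finished job.
COMPLETED_STATES = frozenset({"ok"})


def workflow_progress(job_states: Iterable[str]) -> Tuple[int, int, float]:
    """Return (completed, total, fraction) for the jobs seen so far."""
    states = list(job_states)
    total = len(states)
    completed = sum(1 for state in states if state in COMPLETED_STATES)
    # Avoid dividing by zero when no jobs have been scheduled yet.
    fraction = completed / total if total else 0.0
    return completed, total, fraction


# Example matching the "10 completed/20 total" case from the message.
states = ["ok"] * 10 + ["running"] * 4 + ["queued"] * 6
completed, total, fraction = workflow_progress(states)
print(f"{completed} completed/{total} total ({fraction:.0%})")
```

An empty job list is reported as 0% rather than raising, since a freshly submitted workflow may have no jobs yet.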
Re: [galaxy-dev] Creating new dataset collections in a workflow
None of the fixes are newer than that - and I haven't checked exactly, but I guess any applied to 15.07 had already been propagated back to dev by August 4th. I would try to build the minimal example and hand it off to me then - this really should work.

The size list thing is a bit awkward - is this just so you can use the structured_like="" attribute on the output collection? I would just use a dynamic output collection, I guess, with a discover_datasets tag.

Merging collections should also be a native operation that doesn't require a formal tool - along with other stuff like filtering empty/bad datasets, zipping and unzipping pairs, splitting/merging using operations defined on datatypes, etc. I plan to throw together a framework for collection operations (hopefully this next release cycle).

-John

On Fri, Aug 7, 2015 at 6:58 PM, Aaron Petkau wrote:
> Hello John,
>
> I tested this out on the github Galaxy with a commit from August 4 (253ce7b8e3ddad7693da034374bed1a751173839) in dev. Were any of the fixes you added done since then? I'll try it out on the 15.07 release though.
>
> I would like to ask your opinion about something though. The tool I'm trying to write is one to merge 2 dataset lists together into a larger list. The problem I want to solve is that we often deal with paired-end sequencing data (and collections work awesome for that), but not all sequencing is paired-end. Sometimes we have single-end data as well that we want to put through the same workflow.
>
> So, I figured I can just have separate "input" elements in my workflow for paired-end and single-end. The first step is reference mapping, which produces BAM files. I can do that separately (since the mapper needs different parameters for single vs. paired end), but then I want to merge my BAM file lists together.
>
> That is, I have a workflow like:
>
> +------------+     +-------------+
> |            |     |             |
> | Paired-end +----^+ Ref Mapping +----+
> |            |     |             |    |
> +------------+     +-------------+  +-v---------------+
>                                     |                 |
>                                     | Merge BAM Lists +-->
>                                     |                 |
> +------------+     +-------------+  +-^---------------+
> |            |     |             |    |
> | Single-end +----^+ Ref Mapping +----+
> |            |     |             |
> +------------+     +-------------+
>
> So, I was trying to develop a tool to merge both lists together. I was doing this by writing a tool that takes as input both BAM lists, along with another list defining the exact size of the merged list but with empty datasets, and copying any datasets over. That is:
>
> Input List1: [A: a.bam, B: b.bam]
> Input List2: [C: c.bam]
> Input Size List: [A: empty, B: empty, C: empty]
>
> Output List: [A: a.bam, B: b.bam, C: c.bam]
>
> I know it looks a bit ugly to have that "size list" around, but I'm automating execution of the workflow so it's not as big of a deal to me. Not sure if you have any other solutions?
>
> Thanks for taking the time to read over this. I'll do a bit more testing of my tool in other Galaxy versions.
>
> Aaron
Re: [galaxy-dev] Creating new dataset collections in a workflow
Hello John,

I tested this out on the github Galaxy with a commit from August 4 (253ce7b8e3ddad7693da034374bed1a751173839) in dev. Were any of the fixes you added done since then? I'll try it out on the 15.07 release though.

I would like to ask your opinion about something though. The tool I'm trying to write is one to merge 2 dataset lists together into a larger list. The problem I want to solve is that we often deal with paired-end sequencing data (and collections work awesome for that), but not all sequencing is paired-end. Sometimes we have single-end data as well that we want to put through the same workflow.

So, I figured I can just have separate "input" elements in my workflow for paired-end and single-end. The first step is reference mapping, which produces BAM files. I can do that separately (since the mapper needs different parameters for single vs. paired end), but then I want to merge my BAM file lists together.

That is, I have a workflow like:

+------------+     +-------------+
|            |     |             |
| Paired-end +----^+ Ref Mapping +----+
|            |     |             |    |
+------------+     +-------------+  +-v---------------+
                                    |                 |
                                    | Merge BAM Lists +-->
                                    |                 |
+------------+     +-------------+  +-^---------------+
|            |     |             |    |
| Single-end +----^+ Ref Mapping +----+
|            |     |             |
+------------+     +-------------+

So, I was trying to develop a tool to merge both lists together. I was doing this by writing a tool that takes as input both BAM lists, along with another list defining the exact size of the merged list but with empty datasets, and copying any datasets over. That is:

Input List1: [A: a.bam, B: b.bam]
Input List2: [C: c.bam]
Input Size List: [A: empty, B: empty, C: empty]

Output List: [A: a.bam, B: b.bam, C: c.bam]

I know it looks a bit ugly to have that "size list" around, but I'm automating execution of the workflow so it's not as big of a deal to me. Not sure if you have any other solutions?

Thanks for taking the time to read over this. I'll do a bit more testing of my tool in other Galaxy versions.

Aaron

On Fri, Aug 7, 2015 at 12:15 PM, John Chilton wrote:
> Aaron,
>
> We fixed a few bugs related to this recently. Are you targeting bitbucket or github - and which tag of Galaxy? I would probably target the 15.07 release on github for the latest and greatest fixes.
>
> If that still doesn't work I would recommend trying to pare down the tool and workflow to build a minimal example to post. This really should work in the abstract.
>
> -John

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
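The merge described in the message above can be sketched in Python. This only illustrates the intended list semantics (element identifiers mapped to files, with order preserved); it is not Galaxy tool code, and the names `merge_lists` and `size_list` are hypothetical. Rejecting duplicate identifiers is an assumption, since the message does not say what should happen when both lists share one.

```python
from typing import Dict, Optional


def merge_lists(list1: Dict[str, str], list2: Dict[str, str]) -> Dict[str, str]:
    """Merge two identifier -> file mappings, keeping insertion order."""
    duplicates = list1.keys() & list2.keys()
    if duplicates:
        raise ValueError(f"duplicate element identifiers: {sorted(duplicates)}")
    merged = dict(list1)
    merged.update(list2)
    return merged


def size_list(merged: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Build the placeholder "size list": same identifiers, empty entries."""
    return {identifier: None for identifier in merged}


list1 = {"A": "a.bam", "B": "b.bam"}
list2 = {"C": "c.bam"}
merged = merge_lists(list1, list2)
print(merged)             # {'A': 'a.bam', 'B': 'b.bam', 'C': 'c.bam'}
print(size_list(merged))  # {'A': None, 'B': None, 'C': None}
```

The size list carries no data of its own; it only fixes the identifiers and length of the output, which is why it can be derived from the two inputs when the workflow is driven automatically.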
Re: [galaxy-dev] Creating new dataset collections in a workflow
Aaron,

We fixed a few bugs related to this recently. Are you targeting bitbucket or github - and which tag of Galaxy? I would probably target the 15.07 release on github for the latest and greatest fixes.

If that still doesn't work I would recommend trying to pare down the tool and workflow to build a minimal example to post. This really should work in the abstract.

-John

On Fri, Aug 7, 2015 at 6:11 PM, Aaron Petkau wrote:
> Hey,
>
> So, I've been working on a tool which will produce a new dataset collection as output. I was following some of the instructions from https://bitbucket.org/galaxy/galaxy-central/pull-requests/582/allow-tools-to-explicitly-produce-dataset/diff. I managed to get the tool itself working, but when I go to use it in a workflow I'm getting errors. Mainly:
>
> History does not include a dataset collection of the correct type or containing the correct types of datasets
>
> I'm wondering if there's something I'm doing wrong, or if tools which produce dataset collections are not supported within workflows? I'm working with the second case in that pull request, using an input list as the structure for my output list.
>
> Thanks,
>
> Aaron
[galaxy-dev] Creating new dataset collections in a workflow
Hey,

So, I've been working on a tool which will produce a new dataset collection as output. I was following some of the instructions from https://bitbucket.org/galaxy/galaxy-central/pull-requests/582/allow-tools-to-explicitly-produce-dataset/diff. I managed to get the tool itself working, but when I go to use it in a workflow I'm getting errors. Mainly:

History does not include a dataset collection of the correct type or containing the correct types of datasets

I'm wondering if there's something I'm doing wrong, or if tools which produce dataset collections are not supported within workflows? I'm working with the second case in that pull request, using an input list as the structure for my output list.

Thanks,

Aaron