Re: [galaxy-dev] tool for STAR RNA-seq aligner

2014-09-26 Thread Ross
Hi David.

I've not needed that workflow so haven't a solution for you and no, it
doesn't do anything with chimeric output - won't be hard to add I suspect.
There's no python wrapper - just shell script in the command segment.

It's not in an IUC main tool shed repository because it lacks a data
manager - manual star indexes are a bit of a pain but less pain than
writing a data manager :( so I haven't yet. Might be run best through the
API.

On shared memory: Pity, it works a treat for us. I didn't see anything on
the google group - do you recall where you learned about this deprecation?


On Thu, Sep 25, 2014 at 10:41 PM, David Hoover hoove...@helix.nih.gov
wrote:

  Ross,

 About the index files:  It is way easier to have pre-built index files.
 However, when running a 2-pass STAR run, a user will need to generate their
 own reference index files based on the output SJ.out.tab file created in
 the first pass.  Is this incorporated into your tool?
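For readers following along, the two-pass procedure described here can be sketched as three command lines: a first alignment pass, a re-index that feeds the first-pass junctions back in, and a second alignment pass. This is only a minimal sketch under stated assumptions, not what the wrapper in this thread does. The helper name `two_pass_commands`, the paths, and the thread count are hypothetical, and the STAR options used (STAR names the junction file `SJ.out.tab`) should be checked against the manual for the STAR version in use.

```python
def two_pass_commands(fasta, base_index, fastq, workdir, read_length, threads=4):
    """Build (but do not run) the three STAR command lines of a manual two-pass run.

    Every path is a caller-supplied placeholder; this helper is hypothetical
    and is not part of the wrapper discussed in this thread.
    """
    pass1_prefix = f"{workdir}/pass1/"
    pass2_prefix = f"{workdir}/pass2/"
    pass2_index = f"{workdir}/index_pass2"
    # An alignment run writes its splice junctions to <prefix>SJ.out.tab.
    junctions = pass1_prefix + "SJ.out.tab"

    # Pass 1: align against the pre-built index to discover junctions.
    first_pass = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", str(base_index),
        "--readFilesIn", str(fastq),
        "--outFileNamePrefix", pass1_prefix,
    ]
    # Re-index: build a new index that includes the pass-1 junctions.
    reindex = [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", pass2_index,
        "--genomeFastaFiles", str(fasta),
        "--sjdbFileChrStartEnd", junctions,
        "--sjdbOverhang", str(read_length - 1),
    ]
    # Pass 2: align the same reads against the junction-aware index.
    second_pass = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", pass2_index,
        "--readFilesIn", str(fastq),
        "--outFileNamePrefix", pass2_prefix,
    ]
    return [first_pass, reindex, second_pass]


for cmd in two_pass_commands("hg19.fa", "index_pass1", "reads.fastq", "work", 60):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run()` in order. STAR generally expects the output and index directories to exist before it runs, so a real wrapper would create them first.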

 About shared memory:  I am under the impression that the latest version of
 STAR has deprecated this feature.  I am unclear how this would help unless
 a single large-memory machine was dedicated to running all STAR jobs.  Is
 this the case?

 Also, does the tool merge the SAM/BAM file with the output chimeric SAM
 file?

 David Hoover


 On 9/24/2014 7:03 PM, Ross wrote:

 Hi All,

  That (fubar in testtoolshed) star wrapper was derived from one
 originally written by Jeremy Goecks. I modified it for multiple inputs and
 added a few tweaks and it has been in production use in our group for about
 6 months so I'm pretty sure it works reasonably well in our hands at least.

  I would really appreciate any available help getting it to a proven
 useful state - suggestions and code welcomed. I have not moved it to the
 main toolshed because aside from some encouragement, I've had no feedback
 to suggest it's working - or not. It is extremely fast - we regularly see
 200-300M reads per minute in the logs!

  We regularly run a whole experiment's worth (e.g. 12-24) of fastq files
 simultaneously with the shared memory option working on our cluster - see
 the readme.

  Star index files made with a gene model (requires valid gff3) are huge -
 20-30GB for hg19 - hence the need for shared memory if you run multiple
 jobs. That will eventually become a serious problem if you really want to
 allow users to make their own - we definitely do not. You need to be very
 careful about matching the gene model gff3 file to the reference and I had
 enough trouble getting it right for the few major genomes we use to make me
 think that I do not want users trying to do that and generating 25GB of
 rubbish every time they get it wrong.

  There are challenges to do with needing different indexes for different
 length reads but we are seeing fairly consistent 60bp single ended reads
 for most of the incoming RNA seq experiments.
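The read-length dependence comes from the splice-junction overhang chosen when an annotated index is built: the STAR manual suggests setting `--sjdbOverhang` to the read length minus one. A minimal sketch of that arithmetic follows; the function name is an illustrative assumption.

```python
def sjdb_overhang(read_length):
    """Overhang the STAR manual suggests for --sjdbOverhang: read length - 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1


# One annotated index would be needed per distinct read length.
for length in (60, 100):
    print(f"{length}bp reads -> --sjdbOverhang {sjdb_overhang(length)}")
```

For the 60bp reads mentioned here this gives an overhang of 59.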

  A data manager would be a boon if anyone cares to write one...


  On Thu, Sep 25, 2014 at 6:55 AM, Curtis Hendrickson (Campus) 
 curt...@uab.edu wrote:

 Bjorn

 We'd be interested in this tool, as well. Any idea how close to
 functional it is?
 I see it's only on TEST toolshed, and not on production, at this point.

 I don't see any related Trello card when searching on star

 Regards,
 Curtis
 Galaxy Admin @ University of Alabama at Birmingham

 -----Original Message-----
 From: galaxy-dev-boun...@lists.bx.psu.edu [mailto:
 galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Björn Grüning
 Sent: Wednesday, September 24, 2014 3:15 PM
 To: galaxy-dev@lists.bx.psu.edu; hoove...@helix.nih.gov  David Hoover
 Subject: Re: [galaxy-dev] tool for STAR RNA-seq aligner

 Hi David,

 yes there is initial code in the https://testtoolshed.g2.bx.psu.edu/. I
 think Ross has done some work on it.
 The main problem with Star is that it needs special indices (and a lot of
 them) and it would be great to offer data managers for it.

 Cheers,
 Bjoern

 Am 24.09.2014 um 22:05 schrieb David Hoover:
   Hi,
 
  I am developing a tool for STAR (https://code.google.com/p/rna-star/),
 and I realize I may be reinventing another wheel.  Has anyone else created
 a tool for STAR?  There's nothing else in the toolsheds for it yet.
 
  David
 
  
  David Hoover, PhD
  Helix Systems Staff
  SCB/DCSS/CIT/NIH
  301-435-2986
  http://helix.nih.gov
 
 
 
 
 
  ___
  Please keep all replies on the list by using reply all
  in your mail client.  To manage your subscriptions to this and other
  Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/
 
  To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/
 

Re: [galaxy-dev] tool for STAR RNA-seq aligner

2014-09-26 Thread David Hoover
A colleague of mine mentioned it.  I'll ask him where he got his info.  Just to 
clarify: do you always run STAR jobs on the same host?  We are running Galaxy 
in front of a batch system cluster, and so by default STAR jobs would run on 
different nodes.  It's not clear to me how long the memory allocated would last 
after the batch job finished.  How do you determine whether the memory remains 
allocated and whether the job has been accelerated due to pre-loaded data?  For 
example, if you create a genome reference, using --genomeLoad LoadAndKeep, then
run an alignment, are subsequent alignments using the same genome reference 
much faster?  If so, how much faster?  
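One way to answer the timing question empirically is to run the same alignment several times on one host and compare wall-clock times. The sketch below is an assumption-laden illustration, not a tested benchmark: `align_command` and `time_runs` are hypothetical helpers, the paths are placeholders, and the comparison is only meaningful when all runs land on the same node, because a shared-memory segment is local to one host.

```python
import subprocess
import time


def align_command(index_dir, fastq, prefix, genome_load="LoadAndKeep"):
    """Build a STAR alignment command; the value follows the option as a separate argument."""
    return [
        "STAR",
        "--genomeDir", index_dir,
        "--genomeLoad", genome_load,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", prefix,
    ]


def time_runs(commands, runner=subprocess.run):
    """Run each command in order and return the wall-clock seconds each one took."""
    timings = []
    for cmd in commands:
        start = time.perf_counter()
        runner(cmd)
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical usage: the first run loads the genome, later runs may reuse it.
commands = [align_command("index", "reads.fastq", f"run{i}_") for i in range(3)]
# timings = time_runs(commands)  # uncomment on a host where STAR is installed
```

If the later runs are clearly faster than the first, the genome was probably reused from shared memory. `STAR --genomeDir index --genomeLoad Remove` is the documented way to release the segment afterwards.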

I apologize, I am a jack of all trades, master of none.  I could test this 
myself, but everything I touch related to genomics takes 50GB of memory and 
18 hours of wall-clock time, and it gets painful to try testing everything.

David


David Hoover, PhD
Helix Systems Staff
SCB/DCSS/CIT/NIH
301-435-2986
http://helix.nih.gov





[galaxy-dev] Set a new metadata attribute

2014-09-26 Thread Nikos Sidiropoulos
Hi all,

In a tool that I am writing I want to pass an input parameter value
(string) into the output file's metadata. For example, one of the tool
parameters is a barcode signature such as 'NNWTGXN'. I want that
attribute to be stored somehow in the output file so that a subsequent
tool can read it without the user having to set that parameter again.

The files I'll be working with are in FASTQ, BAM and tabular format.

Is it possible?

Bests,
Nikos

[galaxy-dev] Cloudman installed tools missing references

2014-09-26 Thread Iry Witham
Hi Team,

I have a new instance of Galaxy CloudMan running on AWS. When I run some of
the tools I have installed, such as SAM-to-BAM (version 1.1.4), a reference
genome is required but none is available. This is the first tool where I have
found this to be an issue. Is there a .loc file that needs modification? I
will need to add several references.

Thanks,
Iry


Re: [galaxy-dev] Set a new metadata attribute

2014-09-26 Thread Peter Cock
On Fri, Sep 26, 2014 at 3:01 PM, Nikos Sidiropoulos
nikos.sid...@gmail.com wrote:
 Hi all,

 In a tool that I am writing I want to pass an input parameter value
 (string) into the output file's metadata. Meaning that one of the tool
 parameters is a barcode signature, 'NNWTGXN' for example. I want that
 attribute to be stored somehow in the output file in order to be read by a
 subsequent tool without the user having to set that parameter again.

 The files I'll be working with are in FASTQ, BAM and tabular format.

 Is it possible?

 Bests,
 Nikos

Your code can write the value directly into an output file
(e.g. one of the SAM/BAM headers might work), but I
don't think there is anything suitable within Galaxy for
re-exporting the parameter value as an input parameter
for a future tool.
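As an illustration of the header idea for SAM output, the value could travel in an `@CO` (comment) header line, which the SAM specification reserves for free-text comments. This is a minimal sketch working on SAM text only. The function names and the `key=value` convention are assumptions made for illustration, and BAM files would need a tool such as `samtools reheader` or a BAM library instead.

```python
def add_comment_to_sam_header(sam_text, key, value):
    """Insert an @CO line carrying key=value after the existing header lines."""
    lines = sam_text.splitlines()
    # Header lines start with '@'; alignment records follow them.
    n_header = 0
    for line in lines:
        if not line.startswith("@"):
            break
        n_header += 1
    comment = f"@CO\t{key}={value}"
    return "\n".join(lines[:n_header] + [comment] + lines[n_header:]) + "\n"


def read_comment_from_sam_header(sam_text, key):
    """Return the value stored under key in an @CO line, or None if absent."""
    prefix = f"@CO\t{key}="
    for line in sam_text.splitlines():
        if not line.startswith("@"):
            break  # past the header, stop scanning
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


sam = "@HD\tVN:1.4\n@SQ\tSN:chr1\tLN:100\nread1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
tagged = add_comment_to_sam_header(sam, "barcode_signature", "NNWTGXN")
print(read_comment_from_sam_header(tagged, "barcode_signature"))
```

A downstream tool would still have to parse the header itself; Galaxy would not see the value as a tool parameter.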

However, at the workflow level you can set variables -
might that be a way forward?

https://wiki.galaxyproject.org/Learn/AdvancedWorkflow/VariablesEdit

Peter