Re: [galaxy-dev] datacache & bowtie2 for mm9 ?

Jennifer Jackson Sat, 21 Sep 2013 00:15:16 -0700

Hello Curtis,

The datacache originally pointed to the data staging area and now points to the data published area. The difference is that the published area contains data and location (.loc) files that are in sync and have completed final testing. Whether to use the staged-only data is your choice; it depends on how risk-tolerant your project is and whether you plan on testing. That said, I think it is almost certainly fine, or our team wouldn't have staged it yet. A vanishingly small number of datasets are pulled back once they make it to staging, which is why we were comfortable pointing the datacache there in the first place (we were unable to point it to the published area at first, but wanted to make the data available ASAP).

Going forward, I can tell you that these indexes are very easy to create: one command-line execution, then add one line to the associated .loc file. Instructions are here; see "Bowtie and Tophat":

http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup
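The two steps described above (one build command, one .loc line) can be sketched in Python as follows. This is an illustrative sketch, not the Galaxy team's tooling. It assumes `bowtie2-build` is on the PATH, and it assumes the bowtie2 .loc file uses four tab-separated columns (build ID, dbkey, display name, index base path); check the wiki page above for the exact layout your Galaxy release expects. All paths and the helper names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess


def bowtie2_build_command(fasta: str, index_base: str) -> list[str]:
    """argv for bowtie2-build: reference FASTA in, index basename out."""
    return ["bowtie2-build", fasta, index_base]


def bowtie2_loc_line(build_id: str, dbkey: str, display_name: str, index_base: str) -> str:
    """One tab-separated .loc row (the 4-column layout is an assumption)."""
    fields = [build_id, dbkey, display_name, index_base]
    if any("\t" in f for f in fields):
        raise ValueError("fields must not contain tab characters")
    return "\t".join(fields)


def build_and_register(fasta: str, index_base: str, loc_path: str,
                       build_id: str, dbkey: str, display_name: str) -> None:
    """Run bowtie2-build, then append one line to the .loc file."""
    subprocess.run(bowtie2_build_command(fasta, index_base), check=True)
    with open(loc_path, "a", encoding="utf-8") as loc:
        loc.write(bowtie2_loc_line(build_id, dbkey, display_name, index_base) + "\n")


if __name__ == "__main__":
    # Hypothetical paths: print the plan instead of running the build.
    base = "/galaxy/data/mm9/bowtie2_index/mm9"
    print(shlex.join(bowtie2_build_command("/galaxy/data/mm9/seq/mm9.fa", base)))
    print(bowtie2_loc_line("mm9", "mm9", "Mouse (mm9)", base))
```

The build step is kept separate from the .loc append so a failed build (`check=True` raises) never leaves a .loc entry pointing at a missing index.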

For one or a few genomes, this is not a problem. For hundreds of genomes with variants, it can become tedious even with helper tools, and in our case the processing interacted with disks that were undergoing changes (we have been working on system configuration most of the summer). Also, now that the Data Manager is available, creating batch indexes for use via rsync has become a lower priority. Even so, I would expect more indexes to be fully published once the final configuration is in place, as many are already staged or close to being staged (watch the yellow banner on Main).

Hopefully this helps explain the data, guides you toward an informed decision, and aids with creating your own indexes as needed.


Thanks!
Jen
Galaxy team

On 9/18/13 1:04 PM, Curtis Hendrickson (Campus) wrote:

Folks,
First, I wanted to thank you for making the datacache available (http://wiki.galaxyproject.org/Admin/Data%20Integration; rsync://datacache.g2.bx.psu.edu). It's a great resource.
However, what is the best way to stay abreast of changes to what's in the datacache, and to understand how these indexes are computed?
We are currently upgrading to bowtie2, but I notice that the bowtie2 indices for mm9, which used to be in
rsync://datacache.g2.bx.psu.edu/indexes/mm9/mm9*/bowtie2_index
have been removed, and only the hg19 genome has bowtie2 indices. Why only that one, and not the others?
Where are the scripts you use to make these indices, in case I want to create bowtie2 indices for other
So, how do I find out **why** they were removed? (Can I safely use the copy I have, or was there a problem with them?)
More generally, how do I understand the policies and logic behind the datacache indices, and get notified of changes, short of running my own periodic rsync/diff?
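The "periodic rsync/diff" fallback mentioned above can be sketched as follows. This is a minimal sketch, not a Galaxy-provided tool: it assumes `rsync -r --list-only` against a datacache path prints one entry per line as five whitespace-separated fields (permissions, size, date, time, path), and the helper names are hypothetical. It only detects paths that appear or disappear between two saved listings, not files whose contents changed in place.

```python
from __future__ import annotations

import subprocess


def fetch_listing(url: str) -> str:
    """Recursive `rsync --list-only` listing of a remote path; nothing is copied."""
    result = subprocess.run(["rsync", "-r", "--list-only", url],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_listing(text: str) -> set[str]:
    """Collect the path (assumed to be the 5th field) from each listing line."""
    paths = set()
    for line in text.splitlines():
        fields = line.split(maxsplit=4)
        if len(fields) == 5:
            paths.add(fields[4])
    return paths


def diff_snapshots(old: str, new: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) paths between two saved listings."""
    before, after = parse_listing(old), parse_listing(new)
    return sorted(after - before), sorted(before - after)
```

In practice one would save `fetch_listing("rsync://datacache.g2.bx.psu.edu/indexes/mm9/")` to a dated file on each run and pass the previous and current files' text to `diff_snapshots`; a removed `bowtie2_index` directory would then show up in the second list.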
Finally, since I'm doing "reproducible research", is anything planned for systematically versioning genome indices, so I can easily tell what version of a system (i.e., what BWA version) was used to create the index, and be sure that an index will not suddenly disappear?
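Until something like that exists upstream, a local copy can be pinned by recording which tool and version built it, plus a checksum of every index file. The sketch below is one hypothetical way to do that; the manifest layout is an assumption, and the version string is supplied by the caller (for example, captured from `bowtie2-build --version` or the BWA banner at build time).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large index files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_manifest(index_dir: Path, tool: str, tool_version: str) -> dict:
    """Record the building tool, its version, and a checksum per index file."""
    files = sorted(p for p in index_dir.iterdir() if p.is_file())
    return {
        "tool": tool,
        "tool_version": tool_version,
        "files": {p.name: sha256_file(p) for p in files},
    }
```

Storing this manifest (e.g. with `json.dump`) next to the index lets a later run check that the same files, built by the same tool version, are still in place.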
Thanks,

Curtis

Research Associate/CTSA-Informatics Team

University of Alabama at Birmingham



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
   http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
http://galaxyproject.org
