Hi Jaaved,

We ran faCount, so no need to do that yourself:

http://www.broadinstitute.org/ftp/pub/assemblies/insects/droSec1/assembly.bases.gz
21,424 contigs (UCSC: 14,730 super contigs)

http://www.broadinstitute.org/ftp/pub/assemblies/insects/droPer1/assembly.bases.gz
26,812 contigs (UCSC: 12,838 super contigs)

As stated before, the assemblies hosted at UCSC have not been updated for
quite some time. Obviously a lot of work has been done on these organisms
since. You would have to track down which labs produced the newer data,
etc., in order to answer your questions. We just don't have that
information.

Please let us know if you have any additional questions: [email protected]

- Greg Roe
UCSC Genome Bioinformatics Group

On 9/19/11 6:53 AM, Jaaved Mohammed wrote:
> Hi Vanessa,
>
> Thanks for your response. Can you help point me to the download site
> with the latest assembly for any of the 3 fly species that the
> engineer speaks of?
>
> I can find multiple nucleotide sequences across several sites. For
> example, for D. willistoni, I can find an assembly at LBNL
> (http://rana.lbl.gov/drosophila/assemblies.html), and from NCBI, I can
> download all the raw sequences from
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=7260. I
> could not find any of the insects up on NCBI Ensembl genome browser
> either. Can you help point me in the right direction?
>
> Thanks,
> Jaaved
>
>
>
> On Fri, Sep 16, 2011 at 3:06 PM, Vanessa Kirkup Swing
> <[email protected]> wrote:
>> Hi Jaaved,
>>
>> To answer your first question:
>>
>> The genomes we have are old so it is possible that the differences may
>> be due to years of version updates.
>>
>> One of our engineers has this to say:
>> Go to the current download site for these genomes, fetch the sequence
>> file, and run an faCount on it to see what they name the bits. Compare
>> names and genome organization with what we display.
>> I would assume after 5 or 6 years, these genomes most likely have new
>> assemblies.
>> These genome project sites would most likely explain their update
>> history. You may also find assembly history in the browsers at Ensembl.
>> There may also be information on their trace archive pages if they have
>> them. For example:
>> http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AAMC01
>>
>> To answer your second question:
>>
>> Unfortunately, our funding covers primarily vertebrate genomes, though
>> we do host a few of the major model organisms.
>>
>> Hope this helps you. If you have further questions, please contact the
>> mailing list: [email protected].
>>
>> Vanessa Kirkup Swing
>> UCSC Genome Bioinformatics Group
>>
>>
>> ---------- Forwarded message ----------
>> From: Jaaved Mohammed <[email protected]>
>> Date: Thu, Sep 15, 2011 at 8:57 AM
>> Subject: [Genome] super vs scaffold coordinates & D. willistoni on the
>> browser.
>> To: [email protected]
>>
>>
>> Hello,
>>
>> I have two questions that I would really appreciate your help with
>> answering.
>>
>> ===========
>> Firstly,
>> ===========
>>
>> I am trying to understand the origin of the "super*" coordinates for the
>> droPer1 and droSec1 genomes available on the UCSC Genome Browser.
>>
>> For example, in the D. sechellia assembly, I see that all the chromosomes
>> are prefixed by "super" on the Genome Browser:
>> http://genome-mirror.bscb.cornell.edu/cgi-bin/hgTracks?hgsid=36382&chromInfoPage=.
>> However, on Flybase.org, the GFF files, or any coordinate for that
>> matter on Flybase, are always prefixed by "scaffold", as can be seen from
>> ftp://flybase.net/genomes/Drosophila_sechellia/current/gff/.
>> Why is this? How was the conversion done from "scaffold" into "super"
>> coordinates? I'm trying to convert the Flybase genes reported in the GFF
>> files into a file that I can upload to the browser to see the Flybase
>> annotated genes, non-coding RNAs, etc. However, this clash of coordinate
>> names is causing many problems.
>>
>> I should note that I looked in all the older revisions of the Flybase GFF
>> files and still I see no "super" prefixed coordinates. I hope I'm not
>> looking at the wrong Flybase GFF files.
>>
>> The same observation was made in the droPer1 reference assembly.
>>
>> =============
>> Secondly, I've noticed that the D. willistoni reference assembly is not
>> available on the UCSC Genome Browser. Why is this?
>>
>> I've added this genome to the Cornell mirror using the droWil1.fa file
>> downloaded/available from the UCSC browser. The added genome can be viewed
>> here:
>> http://genome-mirror.bscb.cornell.edu/cgi-bin/hgGateway?hgsid=36387&clade=insect&org=D.+willistoni&db=0
>>
>> On a similar note to the first point above, I've observed that the
>> coordinates are prefixed with "scaffold" on the browser, but Flybase
>> reports coordinates prefixed with "scf2_":
>> ftp://flybase.net/genomes/Drosophila_willistoni/current/gff/.
>>
>>
>> Thanks,
>> Jaaved
>>
>>
>> --
>> Jaaved Mohammed,
>> Ph.D. Student of Computational Biology
>> Tri-Institutional Training Program in Computational Biology and Medicine
>> (Cornell University - Ithaca, Weill Cornell Medical College, and Memorial
>> Sloan-Kettering Cancer Center)
>>

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
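
The engineer's suggestion in the thread (fetch the sequence file, see what
the sequences are named, and compare those names with what the browser
displays) could be sketched roughly as below. This is a plain Python
stand-in, not faCount itself and not UCSC's procedure. The function names
and the file names in the comments are hypothetical, chosen only for
illustration.

```python
import gzip


def open_fasta(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_names(lines):
    """Yield (name, length) per record from an iterable of FASTA lines.

    The name is the first whitespace-delimited token after '>'.
    """
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def compare_names(names_a, names_b):
    """Return (only in a, only in b, shared) as sorted lists of names."""
    a, b = set(names_a), set(names_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Hypothetical usage (file names are assumptions, not from the thread):
#
#   with open_fasta("assembly.bases.gz") as fh:
#       provider = [name for name, _ in fasta_names(fh)]
#   with open_fasta("droSec1.fa") as fh:
#       ucsc = [name for name, _ in fasta_names(fh)]
#   only_provider, only_ucsc, shared = compare_names(provider, ucsc)
#   print(len(provider), len(ucsc), len(shared))
```

If the two name sets share nothing (for example all "scaffold" names on
one side and all "super" names on the other), comparing record counts and
lengths is a reasonable next check before attempting any renaming.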
