Dear UCSC,

I'm trying to use Multiz to merge pairwise alignments (all with the human genome as the target) into multiple-species alignments. Before that, I need to convert the alignments from axt format to maf format, so I need the tSizes and qSizes files for all the species I am interested in.
In the latest reply, Dr. Angie Hinrichs told us to use fetchChromSizes to get the sizes for different species, and it works well for most of them. However, fetchChromSizes still cannot get the sizes for some species, such as alpaca_vicPac1 and baboon_papHam1. Since the pairwise alignments for these species (with the human genome as the target) can be downloaded from UCSC, the genomes must have been sequenced and sizes files should exist for them. So I wonder where and how I can get the sizes files for the species that fetchChromSizes cannot retrieve.

We are looking forward to your reply. Thank you!

PS: Here is the list of the species that lack sizes files:

alpaca_vicPac1
armadillo_dasNov2
baboon_papHam1
dolphin_turTru1
megabat_pteVam1
microbat_myoLuc1
mouseLemur_micMur1
pika_ochPri2
rockHyrax_proCap1
shrew_sorAra1
sloth_choHof1
tarsier_tarSyr1
tenrec_echTel1
wallaby_macEug1

Guangyi Dai
Laboratory of Evolutionary Genomics
CAS-MPG Partner Institute for Computational Biology
Chinese Academy of Sciences
Yue Yang Road 320
Shanghai, 200031
China
Tel: +(86)-21-54920487
Fax: +(86)-21-54920451
E-mail: [email protected]

-----Original Message-----
On 2011-11-24, at 4:05 AM, Angie Hinrichs wrote:

Hello Chen Ming,

I would like to add a bit about these parts of your question:

> So could you please tell me where I can find the correct information about the supercontigs, especially their length information?

We have a shell script, fetchChromSizes, that retrieves the sizes. You can download the script here:
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes
Run "fetchChromSizes" with no arguments to see usage instructions.

> Could you please explain the function of tSizes and qSizes in the axtToMaf program? Is it necessary for them to be accurate?

MAF's "s" and "e" blocks have a srcSize field (see http://genome.ucsc.edu/FAQ/FAQformat.html#format5). AXT does not include that info, so axtToMaf takes it from the tSizes and qSizes input files.
MAF's inclusion of chromosome sizes makes it possible to calculate forward-strand coordinates from the reverse-strand coordinates when the strand field is "-".

Hope that helps,
Angie

----- Original Message -----
From: "MING Chen,evolgen" <[email protected]>
To: [email protected]
Sent: Tuesday, November 22, 2011 10:07:27 AM
Subject: [Genome] the supercontigs information of gorilla and the usage of axtToMaf

Dear UCSC,

I'm trying to convert the pairwise alignment files between human and gorilla (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsGorGor1/axtNet/) from axt format to maf, using the axtToMaf program. But axtToMaf needs tSizes and qSizes. The human genome sizes are easy to get, but the gorilla genome (gorGor1) sizes are difficult to find, especially for the supercontigs. I have searched NCBI WGS (http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=CABD01) and Ensembl (ftp://ftp.ensembl.org/pub/release-57/fasta/gorilla_gorilla/dna/), but there is no supercontig information matching your pairwise alignment files, for example:

0 chr1 10974 20818 Supercontig_0000035 107816 117704 + 890977

So could you please tell me where I can find the correct information about the supercontigs, especially their length information?

Could you please explain the function of tSizes and qSizes in the axtToMaf program? Is it necessary for them to be accurate?

Thanks very much

PS: The assemblies in the pairwise alignment are:
target/reference: Human (hg19, Feb. 2009, GRCh37 Genome Reference Consortium Human Reference 37 (GCA_000001405.1))
query: Gorilla (gorGor1, Oct. 2008, Sanger Institute Oct 2008 (NCBI project 31265, CABD01000000))

Chen Ming
Evolutionary Genomics (Evolgen)
CAS-MPG Partner Institute for Computational Biology (PICB)
Shanghai Institutes for Biological Sciences (SIBS)
Chinese Academy of Sciences (CAS)
320 Yue Yang Rd.
Shanghai, P.R. China 200031
TEL: +86-21-5492-0467
http://www.picb.ac.cn/evolgen/

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
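
A note on Angie's point about srcSize: the following is only a minimal sketch of the coordinate arithmetic, not code from axtToMaf. It assumes the MAF convention described in the FAQ linked above, where start is zero-based and, for a "-" strand line, is counted from the start of the reverse-complemented source sequence. The function name maf_forward_interval and the example numbers are hypothetical.

```python
def maf_forward_interval(start, size, strand, src_size):
    """Return the zero-based, half-open forward-strand interval (begin, end)
    covered by a MAF "s" line.

    start, size, strand and src_size correspond to the fields of the same
    name on the "s" line. The name of this helper is illustrative only.
    """
    if strand == "+":
        # Already on the forward strand: nothing to convert.
        return start, start + size
    if strand == "-":
        # Coordinates are relative to the reverse complement, so they are
        # mirrored around the sequence length. This is why srcSize is needed.
        return src_size - (start + size), src_size - start
    raise ValueError("strand must be '+' or '-', got %r" % (strand,))


if __name__ == "__main__":
    # Hypothetical values: a 5-base block starting at offset 10 on the
    # reverse strand of a 100-base sequence.
    print(maf_forward_interval(10, 5, "-", 100))  # (85, 90)
```

Without a correct size for the sequence, the "-" strand case cannot be computed, which is why the tSizes and qSizes files given to axtToMaf need to be accurate.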
