Please use the UCSC 2bit files for these genomes, for example:

Alpaca: ftp://hgdownload.cse.ucsc.edu/gbdb/vicPac1/
Armadillo: ftp://hgdownload.cse.ucsc.edu/gbdb/dasNov2/
Dolphin: ftp://hgdownload.cse.ucsc.edu/gbdb/turTru1/
Baboon: ftp://hgdownload.cse.ucsc.edu/gbdb/papHam1/
Sloth: ftp://hgdownload.cse.ucsc.edu/gbdb/choHof1/
Tarsier: ftp://hgdownload.cse.ucsc.edu/gbdb/tarSyr1/
Not all genome downloads you can find on the internet are equivalent. These UCSC 2bit files contain the sequence used in the pairwise alignments.

--Hiram

----- Original Message -----
From: "guzhili" <[email protected]>
To: "genome" <[email protected]>
Cc: "wangyuting" <[email protected]>
Sent: Friday, February 10, 2012 1:10:50 AM
Subject: [Genome] A problem about generating synAxt files

Dear staff members,

Greetings! We are doing a project that needs alignment files in syntenicNet format. Since there is no direct download of these files for some species, we decided to generate them ourselves. During this work, syntenicNet files were successfully generated for most species, but we ran into a problem with some others.

For example, to generate the human-alpaca (Vicugna pacos) alignment file, we first downloaded the alpaca genome sequence from the Broad Institute (ftp://ftp.broadinstitute.org/pub/assemblies/mammals/alpaca/VicPac1.0/assembly_supers.fasta.gz), following your documentation. (The alpaca genome is not available at your site.) Then we downloaded the all.chain file and the net file from your site (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsVicPac1/). Finally, we generated the synAxt file using these files. The program stopped with the message "chr15 twoBitReadSeqFrag in scaffold_168 end (1214406) >= seqSize (1214360)".

The same problem also happened in 5 other species:

1. Armadillo. The genome was downloaded from the Broad Institute. The message is "Processing chr10 twoBitReadSeqFrag in scaffold_41684 end (14002) >= seqSize (12349)".
2. Baboon. The genome was downloaded from http://www.hgsc.bcm.tmc.edu/ftp-archive/Phamadryas/fasta/Pham_1.0/contigs/ . The message is "chr10 scaffold25053 is not found".
3. Dolphin. The genome was downloaded from the Broad Institute. The message is "chr10 twoBitReadSeqFrag in scaffold_110652 end (-113207) >= seqSize (28288)".
4. Sloth. The genome was downloaded from the Broad Institute. The message is "chr1 twoBitReadSeqFrag in scaffold_34916 end (14118) >= seqSize (14091)".
5. Tarsier. The genome was downloaded from the Broad Institute. The message is "chr10 twoBitReadSeqFrag in scaffold_134510 end (6753) >= seqSize (5490)".

It seems the errors are due to a mismatch in the genomes we downloaded, but we don't know what is going wrong. We look forward to your reply.

Best,
Gu Zhili
2012-02-10

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
