Thanks, Pauline. I manually checked a couple of chromosomes and found that you have updated the chr*.fa.gz files at ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/chromosomes, though their date was not updated to July 2012. These fa files are now the same as those from the rn4.2bit file or from chromFa.tar.gz at ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips.
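For anyone who wants to repeat this check on every chromosome rather than a couple, here is a minimal Python sketch. It is only one possible way to do it, not the procedure used at UCSC: the function names (open_text, fasta_digests, same_sequences) are made up for illustration, and using MD5 over the residues is my own assumption. Line wrapping is ignored, but letter case is kept, so a masking difference still counts as a mismatch.

```python
import gzip
import hashlib


def open_text(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_digests(path):
    """Return {sequence name: MD5 hex digest of its residues}.

    Line breaks are ignored, so files that wrap lines differently still
    compare equal. Letter case is kept, so a soft-masking difference
    changes the digest.
    """
    digests = {}
    name, md5 = None, None
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: store the digest of the previous one.
                if name is not None:
                    digests[name] = md5.hexdigest()
                parts = line[1:].split()
                name = parts[0] if parts else ""
                md5 = hashlib.md5()
            elif name is not None:
                md5.update(line.encode("ascii"))
    if name is not None:
        digests[name] = md5.hexdigest()
    return digests


def same_sequences(path_a, path_b):
    """True if both files hold the same names with identical sequences."""
    return fasta_digests(path_a) == fasta_digests(path_b)
```

With the files from this thread, the call would be same_sequences("chr1.fa.gz", "chromFa/1/chr1.fa"), assuming both are in the current directory.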
Best,
Yongchao

On Fri, Jul 6, 2012 at 6:01 PM, Pauline Fujita <[email protected]> wrote:
> Hello Yongchao,
>
> Thank you for reporting this issue. You are correct that these should match
> and we have updated the relevant files accordingly. Please do not hesitate
> to contact us again if you are still seeing discrepancies.
>
> Best regards,
>
> Pauline Fujita
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
> On 7/3/12 7:35 AM, Yongchao Ge wrote:
>>
>> Hi,
>>
>> I am working on the sequence data for rn4. There seem to be three ways
>> to access the sequence data:
>>
>> 1. chr*.fa.gz:
>> ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/chromosomes/chr*.fa.gz
>> 2. chromFa.tar.gz:
>> ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/chromFa.tar.gz
>> 3. rn4.2bit: ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/rn4.2bit
>> and then use the twoBitToFa command to convert the data into fasta
>> format.
>>
>> Options 2 (chromFa.tar.gz) and 3 (rn4.2bit) give identical sequences
>> for chr1. However, there are differences between options 1
>> (chr1.fa.gz) and 2 (chromFa.tar.gz). The output of the diff command
>> comparing the two files on my Linux computer is at the end of this email.
>>
>> My understanding is that both files should be identical, since
>> "Repeats from RepeatMasker and Tandem Repeats Finder (with period
>> of 12 or less) are shown in lower case; non-repeating sequence is
>> shown in upper case."
>>
>> My questions are: what caused the difference between the two files?
>> Was it possibly caused by different versions of RepeatMasker or Tandem
>> Repeats Finder, or by different parameter settings in those two programs?
>> And which file should I use for extracting the sequence?
>>
>> Thanks,
>>
>> Yongchao
>>
>> -------------------------------------------------------------------------------------------------------
>> chr1.fa is the unzipped file chr1.fa.gz (option 1) and
>> chromFa/1/chr1.fa is extracted from the file chromFa.tar.gz (option
>> 2).
>>
>> $ diff chr1.fa chromFa/1/chr1.fa |less
>> 342,343c342,343
>> < ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTC
>> < AGAAGCCTAAACATAttgagaacaggcaatctccattaatgggaggttgc
>> ---
>> > ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTc
>> > agaagcctaaacatattgagaacaggcaatctccattaatgggaggttgc
>>
>> 385,386c385,386
>> < AGCATATCCAAGATATTGTACTGTTTAATTTTTATCACCTTGATAAAATT
>> < AGAACCATTTGAGAGAAGGAAATGAGAACATGAGTTTAAGGGCCTTCTTT
>> ---
>> > AGCATATCCAAGATATtgtactgtttaatttttatcaccttgataaaatt
>> > agaaccatttgagagaaggaaaTGAGAACATGAGTTTAAGGGCCTTCTTT
>>
>> 653,654c653,654
>> < acagtcaatgtctggcactgtggtatcccaaatatctgctagatatcttA
>> < AGTTtcatagcactgagtgcctccacaataaaacaggagatagcatgcat
>> ---
>> > acagtcaatgtctggcactgtggtatcccaaatatctgctagatatctta
>> > agtttcatagcactgagtgcctcCACAATAaaacaggagatagcatgcat
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
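
The hunks in the quoted diff appear to differ only in letter case, i.e. in soft-masking, not in the bases themselves. A small Python sketch along the following lines can confirm that for a pair of sequences. The function name masking_differences is hypothetical, and the code assumes both sequences have already been read into strings of equal length with line breaks removed; it is an illustration, not a UCSC tool.

```python
def masking_differences(seq_a, seq_b):
    """Compare two equal-length sequences position by position.

    Returns (base_mismatches, case_runs):
      base_mismatches: 0-based positions where the bases differ even
                       when letter case is ignored
      case_runs:       half-open (start, end) intervals where the bases
                       agree but the letter case (masking) differs
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length")
    base_mismatches = []
    case_runs = []
    run_start = None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        case_only = a != b and a.upper() == b.upper()
        if case_only:
            # Start a new run unless one is already open.
            if run_start is None:
                run_start = i
        else:
            # Close any open case-only run before handling this position.
            if run_start is not None:
                case_runs.append((run_start, i))
                run_start = None
            if a != b:
                base_mismatches.append(i)
    if run_start is not None:
        case_runs.append((run_start, len(seq_a)))
    return base_mismatches, case_runs
```

An empty base_mismatches list together with a non-empty case_runs list would mean the two files carry the same bases and differ only in which stretches are shown in lower case.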
