Hello Florian,

Hopefully we can help clarify the data a bit.

Your results depend on how you did the filtering, but our guess is that you filtered on the coordinates directly. If so, it would not be unexpected for two RefSeq transcripts to have the same 5' UTR but a different 3' UTR (or the reverse). Nor would it be unexpected for two RefSeq transcripts to have the same 5' and 3' UTRs but a different CDS (coding) region. Transcripts that share one end but not the other collapse into a single entry in one list while remaining distinct in the other, which would explain the different counts.

The RefSeq track would not be expected to contain any completely redundant entries that are not labeled with the same RefSeq identifier. That is, a single distinct RefSeq transcript may map to the genome in more than one location (as documented in the track Methods), but two or more distinct RefSeq transcripts would not be expected to be identical in content.

For the RefSeq Genes track in the UCSC Browser, our data processing consists only of aligning the transcripts to the genome; we do not curate the content of the dataset as published by NCBI.

If you want to explore this further, the best advice is to examine the transcripts that you think are redundant and first eliminate those with the same identifier (NM_* name). If you do find two records that are exactly the same in the UCSC Browser, confirm this by examining the most recent records at NCBI. Then contact NCBI to share the evidence of redundancy, as it would indicate a problem that they may or may not be aware of.

Thanks!
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 2/26/10 3:29 AM, Florian Wagner wrote:
> Dear Sir/Madam,
>
> I fetched upstream and downstream regions for all RefSeq genes in a
> certain region of chr2, based on NCBI36/hg18. Initially, the lists had
> the same number of entries, but after filtering for redundant entries I
> ended up with slightly different numbers (138 for upstream, 130 for
> downstream). This is a bit surprising, as I would expect equal numbers
> (it is based on the same RefSeq genes). Could you please comment on this?
>
> Thank you and best regards,
> Florian Wagner
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
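
To illustrate the point in the reply about filtering on coordinates, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the transcript names (tx_A to tx_F), the coordinates, and the fixed 1000-base flank size are made up and are not RefSeq data, and the filtering actually used in the original question may have worked differently. It only shows that de-duplicating upstream and downstream regions by coordinates can give different counts when transcripts share one end but not the other.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str      # hypothetical identifier, not a real RefSeq accession
    chrom: str
    strand: str    # "+" or "-"
    tx_start: int  # lower genomic coordinate of the transcript
    tx_end: int    # higher genomic coordinate of the transcript


def upstream(t: Transcript, size: int = 1000) -> tuple:
    """Flank of `size` bases before the 5' end, respecting strand."""
    if t.strand == "+":
        return (t.chrom, max(0, t.tx_start - size), t.tx_start, t.strand)
    return (t.chrom, t.tx_end, t.tx_end + size, t.strand)


def downstream(t: Transcript, size: int = 1000) -> tuple:
    """Flank of `size` bases after the 3' end, respecting strand."""
    if t.strand == "+":
        return (t.chrom, t.tx_end, t.tx_end + size, t.strand)
    return (t.chrom, max(0, t.tx_start - size), t.tx_start, t.strand)


# Made-up transcripts: some share a 5' end, some share a 3' end.
transcripts = [
    Transcript("tx_A", "chr2", "+", 10_000, 20_000),
    Transcript("tx_B", "chr2", "+", 10_000, 25_000),  # same 5' end as tx_A
    Transcript("tx_C", "chr2", "-", 40_000, 50_000),
    Transcript("tx_D", "chr2", "-", 42_000, 50_000),  # same 5' end as tx_C (minus strand)
    Transcript("tx_E", "chr2", "+", 60_000, 70_000),
    Transcript("tx_F", "chr2", "+", 62_000, 70_000),  # same 3' end as tx_E
]

# De-duplicate purely on coordinates, ignoring the transcript names.
unique_upstream = {upstream(t) for t in transcripts}
unique_downstream = {downstream(t) for t in transcripts}

print(len(transcripts), len(unique_upstream), len(unique_downstream))
# prints: 6 4 5
```

Both lists start with six entries, but coordinate-based filtering leaves four upstream regions and five downstream regions, because more of these example transcripts share a 5' end than share a 3' end.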
