Hi Sebastian,

One of our engineers commented:

"The rules are not actually as stated. The actual rule is encoded by
this SQL statement:

    select name from refGene where cdsStart != cdsEnd and cdsStart != txStart and cdsEnd != txEnd

that is, there has to be a coding portion, and the gene model has to
have annotated 5' *and* 3' UTRs."

Please let us know if you have any additional questions: [email protected]

- Greg Roe
UCSC Genome Bioinformatics Group

On 3/5/12 3:16 AM, Sebastian Ohse wrote:
> Hi Greg,
>
> thanks for adding the link. There is one other question I have with
> regard to the provided "upstream1000.fa.gz" file. It mentions in the
> FAQ that only "RefSeq genes that have annotated 5' UTRs" are included.
>
> This is probably because information about the TSS is needed to compute
> the correct upstream sequences. However, what exactly are the criteria
> used to decide if a RefSeq gene has a properly annotated 5' UTR? Is
> there a public database which contains this subgroup or is some UCSC
> pipeline used instead?
>
> Best,
> Sebastian
>
>
>
> On Tue, 2012-02-21 at 13:18 -0800, Greg Roe wrote:
>> Ah, yeah, we try to limit search engines from cataloging our test
>> server, but sometimes things get through.
>>
>> To get to the folder from our home page (http://genome.ucsc.edu), click
>> the downloads link down the left side menu. Then click Human, and then
>> the "Full Data Set" link.
>>
>> Good idea to add a link from that FAQ page. I'll do that today.
>>
>> - Greg
>>
>>
>> On 2/21/12 12:03 PM, Sebastian Ohse wrote:
>>> It comes up in the first results when you google for "bigzips ucsc". How
>>> do people normally find the folder? The below reference mentioned it's
>>> in the bigZips download directory but there was no link...
>>>
>>> http://genome.ucsc.edu/FAQ/FAQdownloads.html#download18
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> On Tue, 2012-02-21 at 11:52 -0800, Greg Roe wrote:
>>>> Just out of curiosity. How did you end up with a URL to our test server
>>>> (hgdownload-test.cse.ucsc.edu)?
>>>>
>>>> - Greg
>>>>
>>>> On 2/18/12 6:21 AM, Sebastian Ohse wrote:
>>>>> Hi,
>>>>>
>>>>> I have a question about the "upstream1000.fa.gz" file in the bigZips
>>>>> directory of the current assembly.
>>>>>
>>>>> http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/bigZips/
>>>>>
>>>>> It is mentioned in the Readme.txt of the same directory that the
>>>>> "upstream1000.fa.gz" file is updated weekly. However, the date of last
>>>>> modification is displayed as "20-Mar-2009 09:53". Clearly it's old.
>>>>>
>>>>> Furthermore, a previous post on the mailing list (back from the hg18
>>>>> assembly) mentioned that these download files are only created once and
>>>>> thus anyone requiring an up-to-date version has to create one
>>>>> themselves.
>>>>>
>>>>> https://lists.soe.ucsc.edu/pipermail/genome/2008-December/017777.html
>>>>>
>>>>> Can someone tell me if the information in the Readme.txt is indeed
>>>>> incorrect and that I now have to construct an up-to-date version myself?
>>>>>
>>>>> Thanks,
>>>>> Sebastian
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Genome maillist - [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
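
For anyone who wants to see how the SQL rule from the engineer's comment behaves, here is a minimal sketch that runs the same statement on a few made-up rows. It uses Python's built-in sqlite3 module with an in-memory table. The rows, the gene names, and the helper name selected_names are invented for illustration, and the toy table holds only the columns the query touches, not the full refGene schema. It is not the pipeline UCSC uses to build upstream1000.fa.gz.

```python
import sqlite3

# The filter quoted in the reply, unchanged apart from line breaks.
QUERY = """
select name from refGene
where cdsStart != cdsEnd
  and cdsStart != txStart
  and cdsEnd != txEnd
"""

# Invented rows for illustration: (name, txStart, txEnd, cdsStart, cdsEnd).
TOY_ROWS = [
    ("NM_both_utrs", 100, 900, 200, 800),     # coding, UTR at both ends
    ("NM_cds_at_txStart", 100, 900, 100, 800),  # cdsStart == txStart
    ("NM_cds_at_txEnd", 100, 900, 200, 900),    # cdsEnd == txEnd
    ("NR_noncoding", 100, 900, 900, 900),       # cdsStart == cdsEnd
]


def selected_names(rows):
    """Load rows into a toy in-memory refGene table and apply QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table refGene ("
        "name text, txStart integer, txEnd integer, "
        "cdsStart integer, cdsEnd integer)"
    )
    conn.executemany("insert into refGene values (?, ?, ?, ?, ?)", rows)
    names = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return names


if __name__ == "__main__":
    print(selected_names(TOY_ROWS))
```

Only the first toy row passes all three conditions. Because the rule requires a gap between the coding region and the transcript boundary at both ends, it does not need to look at the strand to decide which end is the 5' UTR.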
