Hi Brooke, 

Thank you very much for your very informative email. 

I followed your instructions and downloaded the RefSeq and Ensembl 
transcripts from Galaxy exactly as you described. I also downloaded the 
gbStatus table and did a "join" on the transcript name. 

Now I need to know which versions of Ensembl and RefSeq I have 
downloaded. Could you please let me know how and where I can retrieve 
this information? 

To summarize: which versions of the Ensembl and RefSeq transcripts are 
currently on the UCSC website? How often are they updated there? Is there 
an online link where this information is provided?


Another question: is Galaxy guaranteed to be in sync with all updates from 
the UCSC website? Perhaps this is a question for Galaxy, but I wanted to ask 
you as well in case you know. When users access "UCSC Main" from Galaxy, are 
they connected to the live UCSC website or to some "in-house UCSC browser" 
within Galaxy?

Once again, thank you very much for your help. 

Laura
 


________________________________
 From: Brooke Rhead <[email protected]>
To: Laura Smith <[email protected]> 
Cc: "[email protected]" <[email protected]> 
Sent: Monday, June 4, 2012 5:15 PM
Subject: Re: [Genome] Downloading old refseq and ensemble transcripts with the 
"version numbers" in the accession IDs.
 
Hi Laura,

It looks like the Table Browser is timing out on this large query. 
There are a couple of ways you could work around this:

You could try limiting the output by pasting a list of the RefSeq 
identifiers that you are interested in.  When I followed the 
instructions in the link you sent but pasted in a single identifier, I 
was able to get results.

Another way to get the information you want would be to download the two 
tables you are working with from our downloads server:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/
(this page takes a while to load)

Specifically, you would need:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gbStatus.txt.gz

Then you could join the two tables yourself.
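
As a rough illustration of such a join, here is a minimal Python sketch.
It assumes both files are tab-separated with no header row, that the
accession is the second column of refGene (refGene.name) and the first
column of gbStatus (gbStatus.acc), and that gbStatus.version is the second
column of gbStatus. Please check these positions against the table schemas
before relying on them. The function names are made up for this example.

```python
import gzip


def load_versions(gbstatus_lines, acc_col=0, version_col=1):
    """Map accession -> version from tab-separated gbStatus rows.

    The column positions are assumptions; verify them against the schema.
    """
    versions = {}
    for line in gbstatus_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(acc_col, version_col):
            continue  # skip short or empty lines
        versions[fields[acc_col]] = fields[version_col]
    return versions


def join_refgene(refgene_lines, versions, name_col=1):
    """Yield (versioned_id, fields) for refGene rows found in gbStatus."""
    for line in refgene_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= name_col:
            continue
        acc = fields[name_col]
        if acc in versions:
            # e.g. "NM_021219" + "." + "2" -> "NM_021219.2"
            yield f"{acc}.{versions[acc]}", fields


def join_files(refgene_path="refGene.txt.gz", gbstatus_path="gbStatus.txt.gz"):
    """Join the two downloaded files on refGene.name / gbStatus.acc."""
    with gzip.open(gbstatus_path, "rt") as gb:
        versions = load_versions(gb)
    with gzip.open(refgene_path, "rt") as rg:
        return list(join_refgene(rg, versions))
```

With both files in the current directory, join_files() returns one
(versioned ID, refGene fields) pair per matching transcript. Rows whose
accession is missing from gbStatus are dropped, as in an inner join.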

If you don't have a good way to accomplish a join of the tables, you 
could use Galaxy: https://main.g2.bx.psu.edu/.  You would need to first 
fetch each of the tables separately using the "UCSC Main table browser" 
link (under "Get Data"), and then join them on the 
refGene.name/gbStatus.acc fields using the "Join two Datasets" link 
(under "Join, Subtract and Group").

If you have any questions about using Galaxy, please contact their 
helpdesk at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 6/1/12 1:53 PM, Laura Smith wrote:
> Hello Steve,
>
> Thank you very much for your reply. Based on your suggestion, I
> decided to download the newest RefSeq and Ensembl transcripts from the
> UCSC Browser with all of the gbStatus subfields.
>
> I have tried to download these files with the gbStatus fields;
> however, I keep getting an error from the UCSC Genome Browser website.
> I am following the directions listed here:
>
> https://lists.soe.ucsc.edu/pipermail/genome/2011-September/027099.html
>
> Is there something I am doing wrong, perhaps? Please see the attached 2
> files for the screenshots of the error messages from the UCSC browser.
>
> If not, since I am not able to download these files, would it be
> possible for you to send me, or provide a link to, the latest
> RefSeq and Ensembl transcripts with all of the gbStatus subfields?
> Or, if you could let me know how I can download them with the
> gbStatus fields, I would very much appreciate it.
>
> Thank you,
> Laura
>
>
>
>
> ________________________________
>   From: Steve Heitner<[email protected]>
> To: 'Laura Smith'<[email protected]>; [email protected]
> Sent: Wednesday, May 30, 2012 10:47 AM
> Subject: RE: [Genome] Downloading old refseq and ensemble transcripts with 
> the "version numbers" in the accession IDs.
>
> Hello, Laura.
>
> As you mentioned, the refGene table does not actually list the Genbank
> version number.  The hg19.gbStatus.version field does list the version
> number, but the problem is that this is the current version number and not
> necessarily the version that was current as of June 23, 2011.  There is also
> a field called hg19.gbStatus.modDate that lists the last modified date, but
> there are two problems with this.  First, our modDate does not necessarily
> coincide precisely with the official Genbank version date (e.g., our modDate
> for NM_021219.2 is March 21, 2012 while Genbank lists it as April 21, 2012).
> Also, if the particular transcript you are looking at is a version 3 (e.g.,
> NR_001458.3), the gbStatus table does not keep a history of previous
> versions and modDates, so there is no way to know whether it was NR_001458.1
> or NR_001458.2 on June 23, 2011.
>
> We do not keep histories of the refGene table, so there is no June 23, 2011
> version of refGene that we can direct you to.  There is no easy way to get a
> snapshot of the data as it existed on June 23, 2011.  It is possible to look
> directly at Genbank to find the dates corresponding with the various
> transcript versions (e.g., http://www.ncbi.nlm.nih.gov/nuccore/NM_021219.1
> shows that NM_021219.1 was released on April 24, 2002 and
> http://www.ncbi.nlm.nih.gov/nuccore/NM_021219.2  shows that NM_021219.2 was
> released on April 21, 2012), but if you have a large number of IDs, this
> would be very tedious without some kind of custom script.
>
> Please contact us again at [email protected] if you have any further
> questions.
>
> ---
> Steve Heitner
> UCSC Genome Bioinformatics Group
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Laura Smith
> Sent: Tuesday, May 29, 2012 1:56 PM
> To: [email protected]
> Subject: [Genome] Downloading old refseq and ensemble transcripts with the
> "version numbers" in the accession IDs.
>
> Hello,
>
> I have been using the RefSeq and Ensembl transcripts that I downloaded
> from the UCSC Genome Browser tables on June 23, 2011. The transcript IDs
> in these datasets do not include version numbers
> (as in NM_134564.2, where ".2" after the period is the version number).
>
> Recently, however, it turned out that I need the version number of
> each transcript. I tried to find and download them using the
> information provided here, but there is no way for me to select the
> RefSeq transcripts as of June 23, 2011:
>
> https://lists.soe.ucsc.edu/pipermail/genome/2011-September/027099.html
>
>
> Would it be possible for you to send me the RefSeq and Ensembl
> transcripts for June 23, 2011 from your archives, including the
> version number of each transcript?
>
> Or, if there is a way that I could access this data myself, I would
> very much appreciate it if you could let me know.
>
>
> Thank you,
> Laura
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
>
>
