Hi Laura,

The gbStatus table is part of the suite of tables that supports our 
Genbank tracks; it is not related to the Ensembl tracks.
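
For the RefSeq side, the join suggested in my earlier message (refGene.name
to gbStatus.acc) can also be scripted instead of run in Galaxy. The sketch
below is illustrative only, not a supported UCSC tool. It assumes the hg19
column layout: `bin` is refGene's first column and `name` its second, and
`acc` and `version` are gbStatus's first two columns. Please check the
matching .sql files on the downloads server before relying on it. It reports
the current version stored in gbStatus, not the version as of a past date.
Ensembl IDs have no rows in gbStatus, so they produce no matches, which is
the same empty result you saw in Galaxy.

```python
import gzip
import sys
from typing import Dict, Iterable, Set


def refgene_names(refgene_lines: Iterable[str]) -> Set[str]:
    """Collect transcript names from refGene rows (column 0 is 'bin', column 1 is 'name')."""
    return {line.rstrip("\n").split("\t")[1] for line in refgene_lines if line.strip()}


def versioned_accessions(names: Set[str], gbstatus_lines: Iterable[str]) -> Dict[str, str]:
    """Inner join on refGene.name = gbStatus.acc, returning acc -> "acc.version".

    Assumes gbStatus rows start with acc, version. gbStatus is large, so it is
    streamed line by line and only rows whose acc is in `names` are kept.
    """
    joined: Dict[str, str] = {}
    for line in gbstatus_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] in names:
            joined[fields[0]] = f"{fields[0]}.{fields[1]}"
    return joined


def main(refgene_path: str, gbstatus_path: str) -> None:
    # gzip.open in text mode ("rt") yields decoded lines.
    with gzip.open(refgene_path, "rt") as rg:
        names = refgene_names(rg)
    with gzip.open(gbstatus_path, "rt") as gb:
        joined = versioned_accessions(names, gb)
    for acc in sorted(joined):
        print(f"{acc}\t{joined[acc]}")
    # Names with no gbStatus row (e.g. Ensembl IDs) simply do not appear above.
    print(f"{len(names) - len(joined)} names had no gbStatus match", file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python join_versions.py refGene.txt.gz gbStatus.txt.gz
    main(sys.argv[1], sys.argv[2])
```

Streaming gbStatus and keeping only the refGene accessions avoids holding
the whole table in memory.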

I looked at the Ensembl website for an example of the version number you 
are referring to.  I see that they do list a version number for each 
transcript or gene on their pages.  However, we do not keep the version 
in any of our tables.  You might be able to get the version numbers for 
a specific genebuild (version 65, in this case) directly from Ensembl.

If you have further questions, please contact us again at 
[email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 6/7/12 9:20 AM, Laura Smith wrote:
> Hi Brooke,
>
> Thank you again for your email.
>
> I have a question about gbStatus. I downloaded the gbStatus.txt file
> from the link you sent me:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gbStatus.txt.gz
>
> However, this file only contains "REFSEQ transcripts". It does not
> contain "ENSEMBL transcripts".
>
> I also would like to get the gbStatus for ENSEMBL transcripts.
>
> Does the UCSC browser provide them?
>
> To be more clear:
>
> For example, on the UCSC Genome Browser website, when I look at a random
> gene's RefSeq and Ensembl transcripts, I notice that the RefSeq
> transcripts show the accession ID with a version number, as in NM_123.4,
> where ".4" is the version. However, the Ensembl transcripts do not have
> version numbers on the UCSC website; they only have accession numbers
> such as ENS_123. Ensembl transcripts should also have version numbers,
> as listed on the Ensembl website.
>
> So, do you know why this information is not included on the UCSC
> Genome Browser website?
>
>
> Another thing I tried: when I fetch gbStatus from the Galaxy website
> and join it with the Ensembl transcripts, an empty set is returned. I am
> guessing that this is because the original gbStatus.txt file contains no
> Ensembl transcripts, so there is nothing to join on the accession IDs.
>
> Is there any plan to include Ensembl versions in the gbStatus.txt file
> in the near future? If not, is there another way for me to retrieve them
> from the UCSC Genome Browser?
>
> If you could provide any recommendations, I would greatly appreciate
> it.
>
> Thank you,
> Laura
>
>
>
>
> ------------------------------------------------------------------------
> *From:* Brooke Rhead <[email protected]>
> *To:* Laura Smith <[email protected]>
> *Cc:* "[email protected]" <[email protected]>
> *Sent:* Monday, June 4, 2012 5:15 PM
> *Subject:* Re: [Genome] Downloading old refseq and ensemble transcripts
> with the "version numbers" in the accession IDs.
>
> Hi Laura,
>
> It looks like the Table Browser is timing out on this large query.
> There are a couple of ways you could work around this:
>
> You could try limiting the output by pasting a list of the RefSeq
> identifiers that you are interested in. When I followed the
> instructions in the link you sent but pasted in a single identifier, I
> was able to get results.
>
> Another way to get the information you want would be to download the two
> tables you are working with from our downloads server:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/
> (this page takes a while to load)
>
> Specifically, you would need:
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gbStatus.txt.gz
>
> Then you could join the two tables yourself.
>
> If you don't have a good way to accomplish a join of the tables, you
> could use Galaxy: https://main.g2.bx.psu.edu/. You would need to first
> fetch each of the tables separately using the "UCSC Main table browser"
> link (under "Get Data"), and then join them on the
> refGene.name/gbStatus.acc fields using the "Join two Datasets" link
> (under "Join, Subtract and Group").
>
> If you have any questions about using Galaxy, please contact their
> helpdesk at [email protected].
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> On 6/1/12 1:53 PM, Laura Smith wrote:
>  > Hello Steve,
>  >
>  > Thank you very much for your reply. Based on your suggestion, I
>  > decided to download the newest RefSeq and Ensembl transcripts from the
>  > UCSC Genome Browser with all of the gbStatus subfields.
>  >
>  > I have tried to download these files with the gbStatus fields;
>  > however, I keep getting an error from the UCSC Genome Browser website.
>  > I am following the directions listed here:
>  >
>  > https://lists.soe.ucsc.edu/pipermail/genome/2011-September/027099.html
>  >
>  > Is there something I am doing wrong? Please see the two attached
>  > files for screenshots of the error messages from the UCSC browser.
>  >
>  >
>  > If not, since I am not able to download these files, would it be
>  > possible for you to send me the latest RefSeq and Ensembl transcripts
>  > with all of the gbStatus subfields, or a link to them? Alternatively,
>  > if you could let me know how I can download them with the gbStatus
>  > fields, I would very much appreciate it.
>  >
>  > Thank you,
>  > Laura
>  >
>  >
>  >
>  >
>  > ________________________________
>  > From: Steve Heitner <[email protected]>
>  > To: 'Laura Smith' <[email protected]>; [email protected]
>  > Sent: Wednesday, May 30, 2012 10:47 AM
>  > Subject: RE: [Genome] Downloading old refseq and ensemble transcripts
>  > with the "version numbers" in the accession IDs.
>  >
>  > Hello, Laura.
>  >
>  > As you mentioned, the refGene table does not actually list the Genbank
>  > version number. The hg19.gbStatus.version field does list the version
>  > number, but the problem is that this is the current version number and
>  > not necessarily the version that was current as of June 23, 2011. There
>  > is also a field called hg19.gbStatus.modDate that lists the last
>  > modified date, but there are two problems with this. First, our modDate
>  > does not necessarily coincide precisely with the official Genbank
>  > version date (e.g., our modDate for NM_021219.2 is March 21, 2012, while
>  > Genbank lists it as April 21, 2012). Second, if the particular
>  > transcript you are looking at is a version 3 (e.g., NR_001458.3), the
>  > gbStatus table does not keep a history of previous versions and
>  > modDates, so there is no way to know whether it was NR_001458.1 or
>  > NR_001458.2 on June 23, 2011.
>  >
>  > We do not keep histories of the refGene table, so there is no June 23,
>  > 2011 version of refGene that we can direct you to. There is no easy way
>  > to get a snapshot of the data as it existed on June 23, 2011. It is
>  > possible to look directly at Genbank to find the dates corresponding
>  > with the various transcript versions (e.g.,
>  > http://www.ncbi.nlm.nih.gov/nuccore/NM_021219.1 shows that NM_021219.1
>  > was released on April 24, 2002, and
>  > http://www.ncbi.nlm.nih.gov/nuccore/NM_021219.2 shows that NM_021219.2
>  > was released on April 21, 2012), but if you have a large number of IDs,
>  > this would be very tedious without some kind of custom script.
>  >
>  > Please contact us again at [email protected] if you have any further
>  > questions.
>  >
>  > ---
>  > Steve Heitner
>  > UCSC Genome Bioinformatics Group
>  >
>  > -----Original Message-----
>  > From: [email protected] [mailto:[email protected]] On
>  > Behalf Of Laura Smith
>  > Sent: Tuesday, May 29, 2012 1:56 PM
>  > To: [email protected]
>  > Subject: [Genome] Downloading old refseq and ensemble transcripts with
>  > the "version numbers" in the accession IDs.
>  >
>  > Hello,
>  >
>  > I have been using the RefSeq and Ensembl transcripts that I downloaded
>  > from the UCSC Genome Browser tables on June 23, 2011. The transcript IDs
>  > in these datasets do not include version numbers (as in NM_134564.2,
>  > where ".2" after the period is the version number).
>  >
>  > However, it turns out that I now need the version number of each
>  > transcript. I tried to find and download them using the information
>  > provided here, but there is no way for me to choose the RefSeq
>  > transcripts as of June 23, 2011:
>  >
>  > https://lists.soe.ucsc.edu/pipermail/genome/2011-September/027099.html
>  >
>  >
>  > Would it be possible for you to send me the RefSeq and Ensembl
>  > transcripts as of June 23, 2011 from your archives, including the
>  > version number for each transcript?
>  >
>  >
>  > Or, if there is a way for me to access this data myself, please let me
>  > know; I would very much appreciate it.
>  >
>  >
>  > Thank you,
>  > Laura
>  > _______________________________________________
>  > Genome maillist - [email protected]
>  > https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
>
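
Steve's earlier note that checking version dates at NCBI "would be very
tedious without some kind of custom script" can be made a little more
concrete. The following is a hedged sketch, not something UCSC provides.
It assumes NCBI's E-utilities efetch endpoint returns the GenBank flat file
for an accession.version, and it treats the date at the end of that
record's LOCUS line as a stand-in for the version's release date. That date
is a modification date and may not match the date shown on the NCBI page,
so please spot-check a few accessions against the nuccore pages first.
Given a date per version, the version current on June 23, 2011 is the
highest one released on or before that day.

```python
import datetime
import time
import urllib.parse
import urllib.request
from typing import Dict, Optional

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(acc_version: str) -> str:
    """Build an efetch URL for one GenBank flat-file record, e.g. NM_021219.1."""
    params = {"db": "nuccore", "id": acc_version, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urllib.parse.urlencode(params)}"


def locus_date(record: str) -> datetime.date:
    """Parse the DD-MON-YYYY date ending the LOCUS line (assumed proxy for release date)."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            # strptime matches month names case-insensitively; %b assumes an English/C locale.
            return datetime.datetime.strptime(line.split()[-1], "%d-%b-%Y").date()
    raise ValueError("record has no LOCUS line")


def fetch_locus_date(acc_version: str) -> datetime.date:
    """Download one record and return its LOCUS date (requires network access)."""
    with urllib.request.urlopen(efetch_url(acc_version)) as resp:
        text = resp.read().decode("utf-8")
    time.sleep(0.34)  # stay under ~3 requests/second, NCBI's limit without an API key
    return locus_date(text)


def version_on(release_dates: Dict[int, datetime.date], target: datetime.date) -> Optional[int]:
    """Highest version released on or before `target`, or None if none was."""
    eligible = [v for v, d in release_dates.items() if d <= target]
    return max(eligible) if eligible else None


if __name__ == "__main__":
    # Dates quoted in Steve's message for NM_021219, entered by hand (no network).
    dates = {1: datetime.date(2002, 4, 24), 2: datetime.date(2012, 4, 21)}
    print(version_on(dates, datetime.date(2011, 6, 23)))  # prints 1
```

For many accessions, one would loop over versions 1 up to the current
version from gbStatus, call fetch_locus_date on each, and pass the results
to version_on.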