Hi Steve,

As requested, here is the forwarded message. Could you advise me on what to do now? There seems to be a missing entry in the kgXref file, which I downloaded via Galaxy (twice now).

Best wishes,
Beate

-------- Original Message --------
Subject: RE: [Genome] Missing chr13 entries on table: kgXref, track: UCSC Genes, group: Genes and Gene Prediction Tracks, assembly: Mar.2006
Date: Tue, 19 Jun 2012 16:40:48 -0700
From: Steve Heitner <[email protected]>
Reply-To: <[email protected]>
To: 'Beate St. Pourcain' <[email protected]>

Hello, Beate.

Could I ask that you please also send your response to [email protected]? This way, everyone will be able to see the question and answer. Thanks!

--Steve

From: Beate St. Pourcain [mailto:[email protected]]
Sent: Tuesday, June 19, 2012 4:02 PM
To: [email protected]
Subject: Re: [Genome] Missing chr13 entries on table: kgXref, track: UCSC Genes, group: Genes and Gene Prediction Tracks, assembly: Mar.2006

Hello Steve,

I just double-checked the files and downloaded kgXref again. The file does not contain the uc010aga.1 entry (see below). I went via Galaxy and obtained:

5: UCSC Main on Human: kgXref (genome)
66,803 lines, 1 comments
format: tabular, database: hg18

Is there another, safer way to download kgXref? It usually works fine via Galaxy otherwise. It puzzled me that specifically chr13 is missing, as I don't select for it or do anything specifically with that chromosome. I really need the whole genome.

Best wishes,
Beate

######################################################################
kgXref<-read.table(paste(d.dir,"kgXref.tabular",sep=""),sep="\t",header=F)
names(kgXref)<-c("name","mRNA","spID","spDisplayID", "geneSymbol", "refseq", "protAcc", "description")
> kgXref[kgXref$name=="uc010aga.1",]
[1] name mRNA spID spDisplayID geneSymbol refseq protAcc description
<0 rows> (or 0-length row.names)

kgXref_new<-read.table(paste(d.dir,"kgXref_new.tabular",sep=""),sep="\t",header=F)
names(kgXref_new)<-c("name","mRNA","spID","spDisplayID", "geneSymbol", "refseq", "protAcc", "description")
> kgXref_new[kgXref_new$name=="uc010aga.1",]
[1] name mRNA spID spDisplayID geneSymbol refseq protAcc description
<0 rows> (or 0-length row.names)

On 19/06/2012 23:23, Steve Heitner wrote:

Hello, Beate.

I just double-checked using our Table Browser to ensure that the transcripts you listed would provide valid results when linked with kgXref, which they do. You are correct that kgXref does not have a chromosome entry. It is merely a table to cross-reference UCSC Gene IDs with other information such as gene symbols, RefSeq IDs, etc. Not every entry in kgXref will have a value in every field, but the entry for uc010aga.1 certainly does.

The first thing to do would be to make sure that both of your tables have proper entries for uc010aga.1. The entry from kgXref should contain the following:

kgID: uc010aga.1
mRNA: NM_001127692
spID: Q8WXQ7
spDisplayID: Q8WXQ7_HUMAN
geneSymbol: PCCA
refseq: NM_001127692
protAcc: NP_001121164
description: propionyl-Coenzyme A carboxylase, alpha

If kgXref does not have this entry for uc010aga.1, you need to re-download your copy of kgXref and try again. If there is a valid entry for uc010aga.1 in knownGene and kgXref does contain this entry, there is likely some kind of problem with your R script. If that is the case, we unfortunately cannot provide assistance. I would recommend finding an online forum specifically for R support.

Please contact us again at [email protected] if you have any further questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Beate St. Pourcain
Sent: Tuesday, June 19, 2012 2:08 PM
To: [email protected]
Subject: [Genome] Missing chr13 entries on table: kgXref, track: UCSC Genes, group: Genes and Gene Prediction Tracks, assembly: Mar.2006

Hi,

I have been checking some gene predictions, and for this reason I aligned the tables "knownGene" with "kgXref" from track: UCSC Genes, group: Genes and Gene Prediction Tracks, assembly: Mar.2006.

This alignment deleted (excluded) all chr13 entries from "knownGene". "kgXref" has no chromosome entry, but it looks as if all entries referring to chr13 are missing from the file. See the R snippet:

knowngenes<-read.table(paste(d.dir,"knownGene_hg18_full.tabular",sep=""),sep="\t",header=F)
names(knowngenes)<-c("name", "chrom","strand","txStart","txEnd", "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID")
kgXref<-read.table(paste(d.dir,"kgXref.tabular",sep=""),sep="\t",header=F)
names(kgXref)<-c("name","mRNA","spID","spDisplayID", "geneSymbol", "refseq", "protAcc", "description")
knowngenes_ref<-merge(knowngenes,kgXref,by="name", all.x=T)
knowngenes_ref[knowngenes_ref$chrom=="chr13",]

See below. Could you help with that?

Many thanks,
Beate

57554 NP_001121164 uc010aga.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57555 O95965       uc010agb.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57556 Q86UB2-2     uc010agc.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57557              uc010agd.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57558              uc010age.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57559 P49917       uc010agf.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57560 A8K8Q4       uc010agg.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57561 P49917       uc010agh.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57562 Q7L211       uc010agi.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57563 Q7Z5J2       uc010agj.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57564 A6H8Y0       uc010agk.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57565              uc010agl.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57566 Q9UK53-3     uc010agm.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57567 NP_003890    uc010agn.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
57568              uc010ago.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
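
One possible explanation on the R side, offered as an assumption rather than anything confirmed in this thread: by default read.table() treats both " and ' as quote characters and # as a comment character. An apostrophe inside a tab-separated description field can therefore swallow the following lines into a single field, and those rows disappear without an error. The sketch below uses a small made-up file to compare the number of lines in the file with the number of rows R actually parsed, with and without quote handling. Only the uc010aga.1 / PCCA row mirrors the thread; all other IDs, symbols and descriptions are hypothetical placeholders.

```r
# Small made-up stand-in for kgXref.tabular; only the uc010aga.1 / PCCA row
# mirrors the thread, every other row is a hypothetical placeholder.
demo <- tempfile(fileext = ".tabular")
writeLines(c(
  "id0001\tGENE_A\tplaceholder one",
  "id0002\tGENE_B\tplaceholder two",
  "id0003\tGENE_C\tplaceholder three",
  "id0004\tGENE_D\tplaceholder four",
  "id0005\tGENE_E\tplaceholder five",
  "id0006\tGENE_F\t5' placeholder enzyme",
  "uc010aga.1\tPCCA\tpropionyl-Coenzyme A carboxylase, alpha",
  "id0008\tGENE_H\t3' placeholder enzyme",
  "id0009\tGENE_I\tplaceholder nine"
), demo)

# Number of physical lines in the file.
n_lines <- length(readLines(demo))

# Default settings (quote = "\"'", comment.char = "#"). The call is wrapped in
# tryCatch() because it may return too few rows or stop with an error.
n_default <- tryCatch(
  nrow(read.table(demo, sep = "\t", header = FALSE)),
  error = function(e) NA_integer_
)

# Quote and comment handling switched off: every line becomes one row.
safe_read <- read.table(demo, sep = "\t", header = FALSE, quote = "",
                        comment.char = "", stringsAsFactors = FALSE)

cat("lines in file:        ", n_lines, "\n")
cat("rows, default options:", n_default, "\n")
cat("rows, quote = \"\":     ", nrow(safe_read), "\n")

# The row that went missing in the thread should be present in the safe read.
print(safe_read[safe_read$V1 == "uc010aga.1", ])
```

For the real download, the same comparison would be length(readLines(...)) on kgXref.tabular against nrow(kgXref). Galaxy reported 66,803 lines and 1 comment line for the kgXref download, so a much smaller row count would point to a parsing problem rather than a missing entry in the table. If the file begins with a # header line, that line is read as data once comment.char = "" is set, so skip = 1 may be needed as well.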
