Hello Jen,

About a year ago I wrote to you with a question about GNF Atlas 2 expression data that I had downloaded using the UCSC Table Browser. I've come back to this data for a different project. While reviewing our correspondence (see the previous emails below) and checking it against the downloaded data, I found what looks like a significant discrepancy that I hope you can clarify.
Essentially, I want to be certain which tissue and replicate each expression value in the output file corresponds to. I downloaded the GNF Atlas 2 absolute expression values for both original samples/replicates from the Table Browser as follows:

1) Selected Clade: Mammal, Genome: Human, Assembly: Feb 2009, Group: Expression, Track: GNF Atlas 2, Table: hgFixed.gnfHumanAtlas2All
2) Selected output format: 'all fields from selected table'
3) Clicked 'Get output', which opens an HTML window with the requested data. The header and first two rows are:

#name expCount expScores
1007_s_at 158 3621,3212,1078,1130,475,408,375,528,668,482,543,392,745,996,696,649,1124,1259,291,451,707,745,1022,1296,2956,2359,1462,2318,1157,1437,1662,841,1288,1575,3465,2565,1281,1504,1203,1415,1919,1330,292,112,1039,1498,1868,1679,1855,2219,2701,3162,3561,2943,3455,4784,4332,4136,3441,3333,3043,2922,3291,4413,2727,5157,3332,3064,6515,6949,4237,5045,1896,1810,2531,2425,2542,2070,8931,9319,4300,4765,2586,2623,3334,5043,1872,2320,1515,2165,2561,2859,5122,5007,1572,1717,5614,5501,4380,4137,2087,2416,4298,4484,1867,2184,2081,1932,5530,6309,1077,1149,3709,1832,2859,8037,1718,1876,1303,1537,1441,925,864,978,1571,1110,2494,1825,4551,2741,1588,1161,726,1428,1434,1005,1687,1509,775,996,930,1187,768,800,1110,1114,1436,1281,1211,1171,1225,1455,2559,2741,3083,4111,2179,2653,
1053_at 158 1041,522,265,351,222,244,519,248,272,247,297,538,191,60,195,102,390,635,526,384,510,700,549,657,1436,1441,316,253,301,228,530,905,757,530,247,296,228,301,182,229,175,99,453,329,239,130,30,32,29,79,147,75,42,104,74,112,142,121,50,76,98,28,119,124,24,129,24,109,30,194,110,48,122,19,17,172,27,158,221,60,38,231,17,60,378,242,170,318,54,212,17,74,42,170,30,126,224,199,136,123,153,135,155,25,293,396,303,214,270,145,159,31,62,95,118,111,153,122,57,171,174,214,73,30,29,106,16,225,67,24,131,48,76,28,172,46,70,35,34,117,29,75,22,25,59,97,21,72,38,127,130,74,156,31,31,17,55,33,

Because this table does not include experiment IDs or names, I cannot tell which tissue and replicate each of the 158 values per probe belongs to. So my first question is: are the values ordered by tissue ID with the two replicates adjacent (0,0,1,1,2,2,3,3,...), ordered by tissue ID for the first replicate and then again for the second (0,1,2,3,...; 0,1,2,3,...), or in some other order?

My second question: are the tissue IDs listed in the 'hgFixed.gnfHumanAtlas2MedianExps' table (pasted below) the IDs I should use to index the absolute expression values in the 'hgFixed.gnfHumanAtlas2All' table?
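The two candidate orderings can be compared empirically, since true replicates of the same tissue should have similar values. Below is a minimal Python sketch, assuming each output row is whitespace-separated as `name expCount expScores` (as in the rows above) and that scores are positive. The function names are hypothetical. This is a heuristic check on the data, not confirmation of how UCSC ordered the columns.

```python
import math


def parse_row(line):
    """Split one 'all fields' row into (name, list of expScores)."""
    name, count, scores = line.split()
    # expScores ends with a trailing comma, so skip the empty last field
    values = [float(v) for v in scores.split(",") if v]
    if len(values) != int(count):
        raise ValueError(f"{name}: expCount={count} but {len(values)} scores")
    return name, values


def pairs_adjacent(n_tissues):
    """Hypothesis A: order 0,0,1,1,2,2,... (replicates side by side)."""
    return [(2 * t, 2 * t + 1) for t in range(n_tissues)]


def pairs_blocked(n_tissues):
    """Hypothesis B: 0,1,...,n-1 for replicate 1, then again for replicate 2."""
    return [(t, t + n_tissues) for t in range(n_tissues)]


def mean_abs_log2_ratio(values, pairs):
    """Average |log2(a/b)| over candidate replicate pairs (smaller = closer)."""
    # clamp at 1 so a zero or negative score cannot break the logarithm
    ratios = [abs(math.log2(max(values[i], 1) / max(values[j], 1)))
              for i, j in pairs]
    return sum(ratios) / len(ratios)
```

To use it, pass each non-header row of the saved output (skip the line starting with `#`) to `parse_row`, set `n = len(values) // 2`, and compare `mean_abs_log2_ratio(values, pairs_adjacent(n))` with `mean_abs_log2_ratio(values, pairs_blocked(n))` across many probes. A consistently lower score for one layout suggests, but does not prove, that it is the true ordering.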
I ask this in particular because your reply of March 30, 2010 indicated:

> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields are:
>
> id name
>
> 0 ColorectalAdenocarcinoma
>
> 1 ColorectalAdenocarcinoma 2

This differs from the tissue ID/description pairs I get when I select and output the 'hgFixed.gnfHumanAtlas2MedianExps' table:

#id description
0 fetal brain
1 whole brain
2 temporal lobe
3 parietal lobe
4 occipital lobe
5 prefrontal cortex
6 cingulate cortex
7 cerebellum
8 cerebellum peduncles
9 amygdala
10 hypothalamus
11 thalamus
12 subthalamic nucleus
13 caudate nucleus
14 globus pallidus
15 olfactory bulb
16 pons
17 medulla oblongata
18 spinal cord
19 ciliary ganglion
20 trigeminal ganglion
21 superior cervical ganglion
22 dorsal root ganglion
23 thymus
24 tonsil
25 lymph node
26 bone marrow
27 BM-CD71+ early erythroid
28 BM-CD33+ myeloid
29 BM-CD105+ endothelial
30 BM-CD34+
31 whole blood
32 PB-BDCA4+ dentritic cells
33 PB-CD14+ monocytes
34 PB-CD56+ NKCells
35 PB-CD4+ Tcells
36 PB-CD8+ Tcells
37 PB-CD19+ Bcells
38 leukemia lymphoblastic(molt4)
39 721 B lymphoblasts
40 lymphoma Burkitts Raji
41 leukemia promyelocytic(hl60)
42 lymphoma Burkitts Daudi
43 leukemia chronic myelogenous(k562)
44 colorectal adenocarcinoma
45 appendix
46 skin
47 adipocyte
48 fetal thyroid
49 thyroid
50 pituitary gland
51 adrenal gland
52 adrenal cortex
53 prostate
54 salivary gland
55 pancreas
56 pancreatic islets
57 atrioventricular node
58 heart
59 cardiac myocytes
60 skeletal muscle
61 tongue
62 smooth muscle
63 uterus
64 uterus corpus
65 trachea
66 bronchial epithelial cells
67 fetal lung
68 lung
69 kidney
70 fetal liver
71 liver
72 placenta
73 testis
74 testis Leydig cell
75 testis germ cell
76 testis interstitial
77 testis seminiferous tubule
78 ovary

This list has 79 tissues (IDs 0 through 78); with two replicates each, that gives 158 values, which matches expCount (76 tissues, as mentioned in your earlier reply, would give only 152).

Thank you for any assistance you may provide.
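Jen's reply (quoted below) says the order of the 158 scores is given by gnfHumanAtlas2AllExps.id and the tissue names by gnfHumanAtlas2AllExps.name. If that reading is right, labels would come from the AllExps table rather than from the MedianExps IDs, whose numbering differs (ID 0 is "fetal brain" in MedianExps but "ColorectalAdenocarcinoma" in AllExps). The following minimal Python sketch works under that assumption. The function names are hypothetical, and the rule that a trailing " 2" marks the second replicate is inferred from example names such as "ColorectalAdenocarcinoma 2". It is not documented behavior.

```python
from collections import defaultdict


def label_scores(values, exps):
    """Attach a name to each score, ASSUMING position i matches AllExps id i.

    `exps` is a list of (id, name) pairs, e.g. exported from
    hgFixed.gnfHumanAtlas2AllExps.
    """
    names = dict(exps)
    missing = [i for i in range(len(values)) if i not in names]
    if missing:
        raise KeyError(f"no experiment name for positions {missing[:5]}")
    return [(i, names[i], v) for i, v in enumerate(values)]


def group_replicates(labeled):
    """Collect scores per tissue; ASSUMES a trailing ' 2' marks replicate 2."""
    groups = defaultdict(list)
    for _, name, value in labeled:
        base = name[:-2] if name.endswith(" 2") else name
        groups[base].append(value)
    return dict(groups)
```

With illustrative values, `label_scores([100, 110], [(0, "ColorectalAdenocarcinoma"), (1, "ColorectalAdenocarcinoma 2")])` pairs each score with its AllExps name, and `group_replicates` then collects both values under "ColorectalAdenocarcinoma". The explicit `KeyError` makes a mismatch between the number of scores and the number of AllExps rows visible instead of silently mislabeling columns.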
Kathleen

On Tue, Mar 30, 2010 at 3:51 PM, Jennifer Jackson <[email protected]> wrote:
> Hello Kathleen,
>
> There are 76 distinct tissues with two replicates per experiment, which
> brings the number of values = 158 scores. The order of the tissues is in the
> gnfHumanAtlas2AllExps.id field, the tissue names are in the
> gnfHumanAtlas2AllExps.name field.
>
> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields are:
>
> id name
>
> 0 ColorectalAdenocarcinoma
>
> 1 ColorectalAdenocarcinoma 2
>
> This replication per-tissue is explained in the track's description page
> (open Assembly browser and click on track name - or - open the Table browser
> to the track, leave the primary table as-is, click on "describe table
> schema", then scroll to the bottom on the page.
>
> Hopefully this addresses your questions, but please let us know if you need
> more information,
> Jen
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
> On 3/30/10 5:49 AM, kathleen askland wrote:
>>
>> I have recently downloaded human expression data via UCSC genome Table
>> Browser using the following query parameters: Mammal, human, Assembly:
>> Feb 2009(GRCh37/hg19), Group: Expression, Track: GNFAtlas2, Table:
>> hgFixed.gnfHumanAtlas2All, as I wanted all available replicates
>> available for each probe.
>>
>> However, the file output is very difficult to understand. There were
>> 44775 probes (as expected) for which data are available. Each probe
>> has a corresponding 'hgFixed.gnfHumanAtlas2All.expCount' value= 158,
>> suggesting there should be 158 expression values per probe and, in
>> fact, the column headed 'hgFixed.gnfHumanAtlas2All.expScores' does in
>> fact contain 158 comma-separated absolute expression values.
>>
>> However, I am not able to obtain the EXP ids (i.e., tissue name)
>> associated with each of the 158 expression values in the sequence so
>> how is one supposed to figure out which tissue each of the 158
>> expression scores corresponds to?
>>
>> I have attempted to obtain those expression IDs in several ways, by
>> selecting different associated tables to join and seemingly relevant
>> variables to no avail. Moreover, even more confusingly, when I select
>> from associated table gnfHumanAtlas2MedianExps the variables
>> 'hgFixed.gnfHumanAtlas2AllExps.id' and
>> 'hgFixed.gnfHumanAtlas2AllExps.name' which would seem like the desired
>> information, I get a series of comma-separated EXP ids and the
>> corresponding EXP id tissue names (e.g., 112 and Pancreas,
>> respectively), but there are generally not 158 entries in each of
>> these cells and many probes have 'n/a' in both columns.
>>
>> So, for example, probe '1007_s_at' has the following associated data:
>> hgFixed.gnfHumanAtlas2All.expCount='158',
>> hgFixed.gnfHumanAtlas2All.expScores=
>> '3621,3212,1078,1130,475,408,375,528,...' (158 distinct values
>> comma-separated)
>> hgFixed.gnfHumanAtlas2AllExps.id= '112'
>> hgFixed.gnfHumanAtlas2AllExps.name='Pancreas'
>>
>> While probe '117_at' gives:
>> hgFixed.gnfHumanAtlas2All.expCount='158',
>> hgFixed.gnfHumanAtlas2All.expScores=
>> '338,277,2383,2456,617,423,...'(158 comma-separated values)
>> hgFixed.gnfHumanAtlas2AllExps.id=
>> '52,74,75,85,94,96,98,112,121,127,129,137,'
>>
>> hgFixed.gnfHumanAtlas2AllExps.name='cerebellum,CingulateCortex,CingulateCortex
>> 2,Lung 2,Uterus,Thyroid,fetalThyroid,Pancreas,TestisGermCell
>> 2,salivarygland 2,trachea 2,skin 2,'
>>
>> Since the number of expression values listed under
>> 'hgFixed.gnfHumanAtlas2All.expScores' does not correspond to the
>> number of Expression IDs/names listed under
>> 'hgFixed.gnfHumanAtlas2AllExps.id' and
>> 'hgFixed.gnfHumanAtlas2AllExps.name', respectively, how is one
>> supposed to figure out which tissue each of the 158 expression scores
>> corresponds to?

--
Kathleen Askland, MD
Assistant Professor
Department of Psychiatry & Human Behavior
Warren Alpert School of Medicine
Brown University/Butler Hospital
