Hello Jen,

I wrote you about a year ago with a question about gnf2 expression
data that I downloaded using the UCSC genome table browser. I've come
back to this data for a different project and was reviewing our
correspondence (see previous emails at bottom of page) and checking it
against some downloaded data. There seems to be a significant
discrepancy that I hope you can clarify.

Essentially, I want to be certain that I know which tissue and
replicate each of the expression values in the output file corresponds
to.

So, I downloaded the GNF Atlas 2 absolute expression values for both
original samples/replicates by opening the Table Browser and
proceeding as follows:
1) Selected Clade: Mammal, Genome: Human, Assembly: Feb 2009, Group:
Expression, Track: GNF Atlas 2, Table: hgFixed.gnfHumanAtlas2All
2) Next, I selected output format: 'all fields from selected table'
3) Then I clicked 'Get output', which opens an HTML window with the
requested data, the first two lines of which are as follows:

#name   expCount        expScores
1007_s_at       158     
3621,3212,1078,1130,475,408,375,528,668,482,543,392,745,996,696,649,1124,1259,291,451,707,745,1022,1296,2956,2359,1462,2318,1157,1437,1662,841,1288,1575,3465,2565,1281,1504,1203,1415,1919,1330,292,112,1039,1498,1868,1679,1855,2219,2701,3162,3561,2943,3455,4784,4332,4136,3441,3333,3043,2922,3291,4413,2727,5157,3332,3064,6515,6949,4237,5045,1896,1810,2531,2425,2542,2070,8931,9319,4300,4765,2586,2623,3334,5043,1872,2320,1515,2165,2561,2859,5122,5007,1572,1717,5614,5501,4380,4137,2087,2416,4298,4484,1867,2184,2081,1932,5530,6309,1077,1149,3709,1832,2859,8037,1718,1876,1303,1537,1441,925,864,978,1571,1110,2494,1825,4551,2741,1588,1161,726,1428,1434,1005,1687,1509,775,996,930,1187,768,800,1110,1114,1436,1281,1211,1171,1225,1455,2559,2741,3083,4111,2179,2653,

1053_at 158     
1041,522,265,351,222,244,519,248,272,247,297,538,191,60,195,102,390,635,526,384,510,700,549,657,1436,1441,316,253,301,228,530,905,757,530,247,296,228,301,182,229,175,99,453,329,239,130,30,32,29,79,147,75,42,104,74,112,142,121,50,76,98,28,119,124,24,129,24,109,30,194,110,48,122,19,17,172,27,158,221,60,38,231,17,60,378,242,170,318,54,212,17,74,42,170,30,126,224,199,136,123,153,135,155,25,293,396,303,214,270,145,159,31,62,95,118,111,153,122,57,171,174,214,73,30,29,106,16,225,67,24,131,48,76,28,172,46,70,35,34,117,29,75,22,25,59,97,21,72,38,127,130,74,156,31,31,17,55,33,
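
For reference, here is a minimal Python sketch of how such a row could be split into a probe name and a list of scores. It assumes the downloaded file is tab-separated with each row on a single line (the wrapping above comes from the email) and that expScores ends with a trailing comma, as in the rows shown. The function name parse_row and the shortened four-value row are illustrative only.

```python
def parse_row(line):
    """Split one tab-separated row into (probe name, list of scores)."""
    name, exp_count, exp_scores = line.rstrip("\n").split("\t")
    # expScores ends with a trailing comma, so skip the empty last field.
    scores = [float(s) for s in exp_scores.split(",") if s]
    # Sanity check: expCount should equal the number of scores on the row.
    if len(scores) != int(exp_count):
        raise ValueError(f"{name}: expected {exp_count} scores, found {len(scores)}")
    return name, scores


# Shortened illustrative row (real rows carry 158 values, not 4).
name, scores = parse_row("1007_s_at\t4\t3621,3212,1078,1130,\n")
print(name, len(scores))
```

The length check is there only to catch rows that were truncated or wrapped during download.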

Since this particular table does not have the expression IDs or
descriptive names, I do not know which tissues/replicates each of the
158 values for each probe corresponds to. So, my first question is:
Are the expression values in order of the tissue ID with each pair of
replicates adjacent to one another (i.e., 0,0,1,1,2,2,3,3,etc...), or
ordered by tissue ID for first replicate then by tissue ID for second
replicate (i.e., 0,1,2,3,....; 0,1,2,3,...), or in some other order?
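
To make the two candidate orderings concrete, here is a small Python sketch that maps a position in the 158-value list to a (tissue ID, replicate) pair under each hypothesis. Neither layout is confirmed; which one applies (if either) is exactly the question. The function names are hypothetical, and the tissue count of 79 is simply 158 divided by 2.

```python
N_SCORES = 158
N_TISSUES = N_SCORES // 2  # 79, assuming exactly two replicates per tissue


def adjacent_layout(i):
    """Hypothesis A: replicates side by side (0,0,1,1,2,2,...)."""
    return i // 2, i % 2


def blocked_layout(i, n_tissues=N_TISSUES):
    """Hypothesis B: all first replicates, then all second (0,1,2,...;0,1,2,...)."""
    return i % n_tissues, i // n_tissues


# The same list position means different things under each hypothesis.
for i in (0, 1, 3, 80):
    print(i, adjacent_layout(i), blocked_layout(i))
```

For example, position 3 would be tissue 1, second replicate under hypothesis A, but tissue 3, first replicate under hypothesis B.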

Finally, I want to be sure that the tissue IDs listed in the table
'hgFixed.gnfHumanAtlas2MedianExps' (pasted below) are the same tissue
IDs that I should be using to reference the absolute expression data
provided in the 'hgFixed.gnfHumanAtlas2All' table. I ask this, in
particular, because your correspondence of March 30, 2010 indicated: "
> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields are:
>
> id      name
>
> 0       ColorectalAdenocarcinoma
>
> 1       ColorectalAdenocarcinoma 2 "

which is different from the tissue ID-to-description matches
listed when I select and output the 'hgFixed.gnfHumanAtlas2MedianExps'
table, for which I get the following list:

#id     description
0       fetal brain
1       whole brain
2       temporal lobe
3       parietal lobe
4       occipital lobe
5       prefrontal cortex
6       cingulate cortex
7       cerebellum
8       cerebellum peduncles
9       amygdala
10      hypothalamus
11      thalamus
12      subthalamic nucleus
13      caudate nucleus
14      globus pallidus
15      olfactory bulb
16      pons
17      medulla oblongata
18      spinal cord
19      ciliary ganglion
20      trigeminal ganglion
21      superior cervical ganglion
22      dorsal root ganglion
23      thymus
24      tonsil
25      lymph node
26      bone marrow
27      BM-CD71+ early erythroid
28      BM-CD33+ myeloid
29      BM-CD105+ endothelial
30      BM-CD34+
31      whole blood
32      PB-BDCA4+ dentritic cells
33      PB-CD14+ monocytes
34      PB-CD56+ NKCells
35      PB-CD4+ Tcells
36      PB-CD8+ Tcells
37      PB-CD19+ Bcells
38      leukemia lymphoblastic(molt4)
39      721 B lymphoblasts
40      lymphoma Burkitts Raji
41      leukemia promyelocytic(hl60)
42      lymphoma Burkitts Daudi
43      leukemia chronic myelogenous(k562)
44      colorectal adenocarcinoma
45      appendix
46      skin
47      adipocyte
48      fetal thyroid
49      thyroid
50      pituitary gland
51      adrenal gland
52      adrenal cortex
53      prostate
54      salivary gland
55      pancreas
56      pancreatic islets
57      atrioventricular node
58      heart
59      cardiac myocytes
60      skeletal muscle
61      tongue
62      smooth muscle
63      uterus
64      uterus corpus
65      trachea
66      bronchial epithelial cells
67      fetal lung
68      lung
69      kidney
70      fetal liver
71      liver
72      placenta
73      testis
74      testis Leydig cell
75      testis germ cell
76      testis interstitial
77      testis seminiferous tubule
78      ovary
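
As a consistency check, the list above has 79 entries (IDs 0 to 78), and 2 x 79 = 158, which matches the expCount value. A minimal Python sketch of loading such a two-column listing into a lookup table follows. The function name parse_tissue_table and the three-row sample are illustrative; it assumes whitespace or tabs separate the ID from the description.

```python
def parse_tissue_table(text):
    """Build {tissue ID: description} from '#id description' style output."""
    tissues = {}
    for line in text.splitlines():
        # Skip blank lines and the '#id description' header.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on the first run of whitespace; descriptions may contain spaces.
        tissue_id, description = line.split(None, 1)
        tissues[int(tissue_id)] = description.strip()
    return tissues


# Three-row excerpt of the listing above, used only for illustration.
sample = """#id     description
0       fetal brain
1       whole brain
2       temporal lobe
"""
tissues = parse_tissue_table(sample)
print(tissues[2])
print(sorted(tissues) == list(range(len(tissues))))  # IDs contiguous from 0?
```

With the full listing loaded, 2 * len(tissues) can be compared against expCount before any ordering is assumed.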

Thank you for any assistance you may provide.

Kathleen


On Tue, Mar 30, 2010 at 3:51 PM, Jennifer Jackson <[email protected]> wrote:
> Hello Kathleen,
>
> There are 79 distinct tissues with two replicates per experiment, which
> brings the number of values to 158 scores. The order of the tissues is in the
> gnfHumanAtlas2AllExps.id field, the tissue names are in the
> gnfHumanAtlas2AllExps.name field.
>
> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields are:
>
> id      name
>
> 0       ColorectalAdenocarcinoma
>
> 1       ColorectalAdenocarcinoma 2
>
> This replication per-tissue is explained in the track's description page
> (open Assembly browser and click on track name - or - open the Table browser
> to the track, leave the primary table as-is, click on "describe table
> schema", then scroll to the bottom of the page).
>
> Hopefully this addresses your questions, but please let us know if you need
> more information,
> Jen
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
> On 3/30/10 5:49 AM, kathleen askland wrote:
>>
>> I have recently downloaded human expression data via UCSC genome Table
>> Browser using the following query parameters: Mammal, human, Assembly:
>> Feb 2009(GRCh37/hg19), Group: Expression, Track: GNFAtlas2, Table:
>> hgFixed.gnfHumanAtlas2All, as I wanted all replicates
>> available for each probe.
>>
>> However, the file output is very difficult to understand. There were
>> 44775 probes (as expected) for which data are available.  Each probe
>> has a corresponding 'hgFixed.gnfHumanAtlas2All.expCount' value= 158,
>> suggesting there should be 158 expression values per probe and, in
>> fact, the column headed 'hgFixed.gnfHumanAtlas2All.expScores' does
>> contain 158 comma-separated absolute expression values.
>>
>> However, I am not able to obtain the EXP ids (i.e., tissue name)
>> associated with each of the 158 expression values in the sequence so
>> how is one supposed to figure out which tissue each of the 158
>> expression scores corresponds to?
>>
>> I have attempted to obtain those expression IDs in several ways, by
>> selecting different associated tables to join and seemingly relevant
>> variables to no avail. Moreover, even more confusingly, when I select
>> from associated table gnfHumanAtlas2MedianExps the variables
>> 'hgFixed.gnfHumanAtlas2AllExps.id' and
>> 'hgFixed.gnfHumanAtlas2AllExps.name' which would seem like the desired
>> information, I get a series of comma-separated EXP ids and the
>> corresponding EXP id tissue names (e.g., 112 and Pancreas,
>> respectively), but there are generally not 158 entries in each of
>> these cells and many probes have 'n/a' in both columns.
>>
>> So, for example, probe '1007_s_at' has the following associated data:
>> hgFixed.gnfHumanAtlas2All.expCount='158',
>> hgFixed.gnfHumanAtlas2All.expScores=
>> '3621,3212,1078,1130,475,408,375,528,...' (158 distinct values
>> comma-separated)
>> hgFixed.gnfHumanAtlas2AllExps.id= '112'
>> hgFixed.gnfHumanAtlas2AllExps.name='Pancreas'
>>
>> While probe '117_at' gives:
>> hgFixed.gnfHumanAtlas2All.expCount='158',
>> hgFixed.gnfHumanAtlas2All.expScores=
>> '338,277,2383,2456,617,423,...'(158 comma-separated values)
>> hgFixed.gnfHumanAtlas2AllExps.id=
>> '52,74,75,85,94,96,98,112,121,127,129,137,'
>>
>> hgFixed.gnfHumanAtlas2AllExps.name='cerebellum,CingulateCortex,CingulateCortex
>> 2,Lung 2,Uterus,Thyroid,fetalThyroid,Pancreas,TestisGermCell
>> 2,salivarygland 2,trachea 2,skin 2,'
>>
>> Since the number of expression values listed under
>> 'hgFixed.gnfHumanAtlas2All.expScores' does not correspond to the
>> number of Expression IDs/names listed under
>> 'hgFixed.gnfHumanAtlas2AllExps.id' and
>> 'hgFixed.gnfHumanAtlas2AllExps.name', respectively, how is one
>> supposed to figure out which tissue each of the 158 expression scores
>> corresponds to?
>>
>



-- 
Kathleen Askland, MD
Assistant Professor
Department of Psychiatry & Human Behavior
Warren Alpert School of Medicine
Brown University/Butler Hospital

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
