Thanks, Luvina.
I'll disregard the output and perform the join myself.
Best,
Kathleen
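
For anyone doing this join by hand, here is a minimal sketch in Python (the reply below suggests Perl; Python is used here only for illustration). It assumes, as the quoted replies describe, that the i-th value in expScores belongs to the row with id = i in hgFixed.gnfHumanAtlas2AllExps. The function names and the two-row sample mapping are hypothetical, not part of any UCSC tool.

```python
def parse_exp_scores(exp_scores):
    """Split a comma-separated expScores string; the trailing comma is ignored."""
    return [float(v) for v in exp_scores.split(",") if v.strip()]


def label_scores(exp_scores, exp_names_by_id):
    """Pair each score with the experiment name whose id equals its position.

    Assumption: the i-th score belongs to gnfHumanAtlas2AllExps.id == i.
    Positions with no matching id are labeled "unknown".
    """
    scores = parse_exp_scores(exp_scores)
    return [(exp_names_by_id.get(i, "unknown"), s) for i, s in enumerate(scores)]


# Hypothetical two-row excerpt of hgFixed.gnfHumanAtlas2AllExps (id -> name).
exp_names_by_id = {0: "ColorectalAdenocarcinoma", 1: "ColorectalAdenocarcinoma 2"}

# First two values for probe 1007_s_at, as quoted later in this thread.
labeled = label_scores("3621,3212,", exp_names_by_id)
for name, score in labeled:
    print(name, score)
```

In practice the mapping would be built from the full 158-row gnfHumanAtlas2AllExps download rather than typed in, and the same function applied to each probe's expScores field.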

On Mon, Jun 13, 2011 at 2:40 PM, Luvina Guruvadoo <[email protected]> wrote:
> Hi Kathleen,
>
> To answer your questions:
>
> 1. 'Original order' refers to the order in which we received the data
> from GNF.
> 2. The Table Browser is not performing the join correctly, so you can
> disregard the output. We recommend writing a Perl script to perform the
> join. Please feel free to contact us if you need help doing this.
>
> Regards,
> ---
> Luvina Guruvadoo
> UCSC Genome Bioinformatics Group
>
>
>
> kathleen askland wrote:
>>
>> ---------- Forwarded message ----------
>> From: Mary Goldman <[email protected]>
>> Date: Fri, Jun 10, 2011 at 12:40 PM
>> Subject: Re: [Genome] GNF Atlas 2 data structure
>> To: kathleen askland <[email protected]>
>>
>>
>> Hi Kathleen,
>>
>> I would appreciate it if you resent your email to [email protected].
>> This will put your question into our tracking system, allowing our
>> whole team to work on answering your question. Additionally, other
>> users with similar issues will be able to benefit from your question
>> and our answers. Thank you for your understanding in this matter.
>>
>> Much thanks,
>> Mary
>> ------------------
>> Mary Goldman
>> UCSC Bioinformatics Group
>>
>> On 6/10/11 9:33 AM, kathleen askland wrote:
>>
>>>
>>> Hello Mary,
>>> Thank you so much for getting back to me and for your explanation.
>>> I do have a couple of follow-up questions based on the data I have
>>> downloaded.
>>> 1) You note that hgFixed.gnfHumanAtlas2AllExps table has the tissues
>>> listed in the 'original' order. Just curious what is meant by
>>> 'original' (or are you just contrasting to the changed order used for
>>> the hgFixed.gnfHumanAtlas2MedianExps table?)
>>> 2) You note that hgFixed.gnfHumanAtlas2AllExps table has the tissues
>>> listed with replicates side-by-side: A,A,B,B... and that this connects
>>> to the expScores in the hgFixed.gnfHumanAtlas2All table. However, if I
>>> attempt to download the hgFixed.gnfHumanAtlas2All as the primary
>>> table and attempt to join the 'id' and 'name' fields from the
>>> hgFixed.gnfHumanAtlas2AllExps table to it, the output (at least in the
>>> UCSC Browser window) is very unclear. First of all, not all 158 tissue
>>> replicate names are listed in the 'name' field (thus there are
>>> fewer tissue names than there are expression values) and, second, the
>>> tissue names are in a different order than would be expected
>>> according to the hgFixed.gnfHumanAtlas2AllExps table (i.e., starting
>>> with 0=ColorectalAdenocarcinoma, 1=ColorectalAdenocarcinoma2, etc...).
>>> Is there an error in the join function?
>>> Thanks again for your time and help.
>>> Kathleen
>>>
>>>
>>> On Wed, Jun 1, 2011 at 1:04 PM, Mary Goldman <[email protected]> wrote:
>>>
>>>>
>>>> Hi Kathleen,
>>>>
>>>> Thank you for your patience while we worked on your question!
>>>>
>>>> The table hgFixed.gnfHumanAtlas2AllExps has the tissues listed in the
>>>> original order (with replicates being side-by-side: A,A,B,B,C,C, etc).
>>>> This is the table that connects the expScores in
>>>> hgFixed.gnfHumanAtlas2All to tissue types and contains the data you want.
>>>>
>>>> The table hgFixed.gnfHumanAtlas2MedianExps was made to connect with
>>>> tables that had only the median of the two replicates (like gnfAtlas2).
>>>> When this was done, the tissues were also reordered to group similar
>>>> tissue types. When the output format "microarray names" is chosen for
>>>> gnfAtlas2, it obtains the tissue names from this table
>>>> (hgFixed.gnfHumanAtlas2MedianExps).
>>>>
>>>> Unfortunately, at this point in time, if you select the track GNF Atlas 2
>>>> from the table browser, it will not let you select
>>>> hgFixed.gnfHumanAtlas2AllExps - we are working on this and hope to have a
>>>> fix out soon. To get the data from the hgFixed.gnfHumanAtlas2AllExps table,
>>>> you will need to select the following in the table browser:
>>>>
>>>> group: All Tables
>>>> database: hgFixed
>>>> table: hgFixed.gnfHumanAtlas2AllExps
>>>>
>>>> I hope this information is helpful. Please feel free to contact the mail
>>>> list again if you require further assistance.
>>>>
>>>> Best,
>>>> Mary
>>>> ------------------
>>>> Mary Goldman
>>>> UCSC Bioinformatics Group
>>>>
>>>>
>>>>
>>>> On 5/17/11 9:54 AM, kathleen askland wrote:
>>>>
>>>>>
>>>>> Hello Jen,
>>>>>
>>>>> I wrote you about a year ago with a question about gnf2 expression
>>>>> data that I downloaded using the UCSC genome table browser. I've come
>>>>> back to this data for a different project and was reviewing our
>>>>> correspondence (see previous emails at bottom of page) and checking it
>>>>> against some downloaded data. There seems to be a significant
>>>>> discrepancy that I hope you can clarify.
>>>>>
>>>>> Essentially, I want to be certain that I know which tissue and
>>>>> replicate each of the expression values in the output file corresponds
>>>>> to.
>>>>> So, I downloaded the GNF Atlas 2 absolute expression values for both
>>>>> original samples/replicates by opening the Table Browser and
>>>>> proceeding as follows:
>>>>> 1) Selected Clade: Mammal, Genome: Human, Assembly: Feb 2009, Group:
>>>>> Expression, Track: GNF Atlas 2, Table: hgFixed.gnfHumanAtlas2All
>>>>> 2) Next, I selected output format: 'all fields from selected table'
>>>>> 3) then I clicked 'Get output,' which opens an html window with the
>>>>> requested data, the first lines of which are as follows:
>>>>>
>>>>> #name   expCount        expScores
>>>>> 1007_s_at       158
>>>>>
>>>>> 3621,3212,1078,1130,475,408,375,528,668,482,543,392,745,996,696,649,1124,1259,291,451,707,745,1022,1296,2956,2359,1462,2318,1157,1437,1662,841,1288,1575,3465,2565,1281,1504,1203,1415,1919,1330,292,112,1039,1498,1868,1679,1855,2219,2701,3162,3561,2943,3455,4784,4332,4136,3441,3333,3043,2922,3291,4413,2727,5157,3332,3064,6515,6949,4237,5045,1896,1810,2531,2425,2542,2070,8931,9319,4300,4765,2586,2623,3334,5043,1872,2320,1515,2165,2561,2859,5122,5007,1572,1717,5614,5501,4380,4137,2087,2416,4298,4484,1867,2184,2081,1932,5530,6309,1077,1149,3709,1832,2859,8037,1718,1876,1303,1537,1441,925,864,978,1571,1110,2494,1825,4551,2741,1588,1161,726,1428,1434,1005,1687,1509,775,996,930,1187,768,800,1110,1114,1436,1281,1211,1171,1225,1455,2559,2741,3083,4111,2179,2653,
>>>>>
>>>>> 1053_at 158
>>>>>
>>>>> 1041,522,265,351,222,244,519,248,272,247,297,538,191,60,195,102,390,635,526,384,510,700,549,657,1436,1441,316,253,301,228,530,905,757,530,247,296,228,301,182,229,175,99,453,329,239,130,30,32,29,79,147,75,42,104,74,112,142,121,50,76,98,28,119,124,24,129,24,109,30,194,110,48,122,19,17,172,27,158,221,60,38,231,17,60,378,242,170,318,54,212,17,74,42,170,30,126,224,199,136,123,153,135,155,25,293,396,303,214,270,145,159,31,62,95,118,111,153,122,57,171,174,214,73,30,29,106,16,225,67,24,131,48,76,28,172,46,70,35,34,117,29,75,22,25,59,97,21,72,38,127,130,74,156,31,31,17,55,33,
>>>>>
>>>>> Since this particular table does not have the expression IDs or
>>>>> descriptive names, I do not know which tissues/replicates each of the
>>>>> 158 values for each probe corresponds to. So, my first question is:
>>>>> Are the expression values in order of the tissue ID with each pair of
>>>>> replicates adjacent to one another (i.e., 0,0,1,1,2,2,3,3,etc...), or
>>>>> ordered by tissue ID for first replicate then by tissue ID for second
>>>>> replicate (i.e., 0,1,2,3,....; 0,1,2,3,...), or in some other order?
>>>>>
>>>>> Finally, I want to be sure that the tissue IDs listed in the table
>>>>> 'hgFixed.gnfHumanAtlas2MedianExps' (pasted below) are the same tissue
>>>>> IDs that I should be using to reference the absolute expression data
>>>>> provided in the 'hgFixed.gnfHumanAtlas2All' table.  I ask this, in
>>>>> particular, because your correspondence of March 30, 2010 indicated: "
>>>>>
>>>>>>
>>>>>> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields
>>>>>> are:
>>>>>>
>>>>>> id      name
>>>>>>
>>>>>> 0       ColorectalAdenocarcinoma
>>>>>>
>>>>>> 1       ColorectalAdenocarcinoma 2 "
>>>>>>
>>>>>
>>>>> which is different from the tissue ID-tissue description matches
>>>>> listed when I select and output the 'hgFixed.gnfHumanAtlas2MedianExps'
>>>>> table, for which I get the following list:
>>>>>
>>>>> #id     description
>>>>> 0       fetal brain
>>>>> 1       whole brain
>>>>> 2       temporal lobe
>>>>> 3       parietal lobe
>>>>> 4       occipital lobe
>>>>> 5       prefrontal cortex
>>>>> 6       cingulate cortex
>>>>> 7       cerebellum
>>>>> 8       cerebellum peduncles
>>>>> 9       amygdala
>>>>> 10      hypothalamus
>>>>> 11      thalamus
>>>>> 12      subthalamic nucleus
>>>>> 13      caudate nucleus
>>>>> 14      globus pallidus
>>>>> 15      olfactory bulb
>>>>> 16      pons
>>>>> 17      medulla oblongata
>>>>> 18      spinal cord
>>>>> 19      ciliary ganglion
>>>>> 20      trigeminal ganglion
>>>>> 21      superior cervical ganglion
>>>>> 22      dorsal root ganglion
>>>>> 23      thymus
>>>>> 24      tonsil
>>>>> 25      lymph node
>>>>> 26      bone marrow
>>>>> 27      BM-CD71+ early erythroid
>>>>> 28      BM-CD33+ myeloid
>>>>> 29      BM-CD105+ endothelial
>>>>> 30      BM-CD34+
>>>>> 31      whole blood
>>>>> 32      PB-BDCA4+ dentritic cells
>>>>> 33      PB-CD14+ monocytes
>>>>> 34      PB-CD56+ NKCells
>>>>> 35      PB-CD4+ Tcells
>>>>> 36      PB-CD8+ Tcells
>>>>> 37      PB-CD19+ Bcells
>>>>> 38      leukemia lymphoblastic(molt4)
>>>>> 39      721 B lymphoblasts
>>>>> 40      lymphoma Burkitts Raji
>>>>> 41      leukemia promyelocytic(hl60)
>>>>> 42      lymphoma Burkitts Daudi
>>>>> 43      leukemia chronic myelogenous(k562)
>>>>> 44      colorectal adenocarcinoma
>>>>> 45      appendix
>>>>> 46      skin
>>>>> 47      adipocyte
>>>>> 48      fetal thyroid
>>>>> 49      thyroid
>>>>> 50      pituitary gland
>>>>> 51      adrenal gland
>>>>> 52      adrenal cortex
>>>>> 53      prostate
>>>>> 54      salivary gland
>>>>> 55      pancreas
>>>>> 56      pancreatic islets
>>>>> 57      atrioventricular node
>>>>> 58      heart
>>>>> 59      cardiac myocytes
>>>>> 60      skeletal muscle
>>>>> 61      tongue
>>>>> 62      smooth muscle
>>>>> 63      uterus
>>>>> 64      uterus corpus
>>>>> 65      trachea
>>>>> 66      bronchial epithelial cells
>>>>> 67      fetal lung
>>>>> 68      lung
>>>>> 69      kidney
>>>>> 70      fetal liver
>>>>> 71      liver
>>>>> 72      placenta
>>>>> 73      testis
>>>>> 74      testis Leydig cell
>>>>> 75      testis germ cell
>>>>> 76      testis interstitial
>>>>> 77      testis seminiferous tubule
>>>>> 78      ovary
>>>>>
>>>>> Thank you for any assistance you may provide.
>>>>>
>>>>> Kathleen
>>>>>
>>>>>
>>>>> On Tue, Mar 30, 2010 at 3:51 PM, Jennifer Jackson <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> Hello Kathleen,
>>>>>>
>>>>>> There are 79 distinct tissues with two replicates per tissue, which
>>>>>> brings the number of values to 158 scores. The order of the tissues is
>>>>>> in the gnfHumanAtlas2AllExps.id field; the tissue names are in the
>>>>>> gnfHumanAtlas2AllExps.name field.
>>>>>>
>>>>>> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields
>>>>>> are:
>>>>>>
>>>>>> id      name
>>>>>>
>>>>>> 0       ColorectalAdenocarcinoma
>>>>>>
>>>>>> 1       ColorectalAdenocarcinoma 2
>>>>>>
>>>>>> This replication per-tissue is explained in the track's description
>>>>>> page (open the Assembly browser and click on the track name - or - open
>>>>>> the Table browser to the track, leave the primary table as-is, click on
>>>>>> "describe table schema", then scroll to the bottom of the page).
>>>>>>
>>>>>> Hopefully this addresses your questions, but please let us know if you
>>>>>> need more information,
>>>>>> Jen
>>>>>>
>>>>>> ---------------------------------
>>>>>> Jennifer Jackson
>>>>>> UCSC Genome Informatics Group
>>>>>> http://genome.ucsc.edu/
>>>>>>
>>>>>> On 3/30/10 5:49 AM, kathleen askland wrote:
>>>>>>
>>>>>>>
>>>>>>> I have recently downloaded human expression data via the UCSC Genome
>>>>>>> Table Browser using the following query parameters: Mammal, human,
>>>>>>> Assembly: Feb 2009 (GRCh37/hg19), Group: Expression, Track: GNFAtlas2,
>>>>>>> Table: hgFixed.gnfHumanAtlas2All, as I wanted all replicates
>>>>>>> available for each probe.
>>>>>>>
>>>>>>> However, the file output is very difficult to understand. There were
>>>>>>> 44775 probes (as expected) for which data are available.  Each probe
>>>>>>> has a corresponding 'hgFixed.gnfHumanAtlas2All.expCount' value= 158,
>>>>>>> suggesting there should be 158 expression values per probe and, in
>>>>>>> fact, the column headed 'hgFixed.gnfHumanAtlas2All.expScores' does
>>>>>>> contain 158 comma-separated absolute expression values.
>>>>>>>
>>>>>>> However, I am not able to obtain the EXP ids (i.e., tissue name)
>>>>>>> associated with each of the 158 expression values in the sequence so
>>>>>>> how is one supposed to figure out which tissue each of the 158
>>>>>>> expression scores corresponds to?
>>>>>>>
>>>>>>> I have attempted to obtain those expression IDs in several ways, by
>>>>>>> selecting different associated tables to join and seemingly relevant
>>>>>>> variables to no avail. Moreover, even more confusingly, when I select
>>>>>>> from associated table gnfHumanAtlas2MedianExps the variables
>>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.id' and
>>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.name' which would seem like the
>>>>>>> desired information, I get a series of comma-separated EXP ids and the
>>>>>>> corresponding EXP id tissue names (e.g., 112 and Pancreas,
>>>>>>> respectively), but there are generally not 158 entries in each of
>>>>>>> these cells and many probes have 'n/a' in both columns.
>>>>>>>
>>>>>>> So, for example, probe '1007_s_at' has the following associated data:
>>>>>>> hgFixed.gnfHumanAtlas2All.expCount='158',
>>>>>>> hgFixed.gnfHumanAtlas2All.expScores=
>>>>>>> '3621,3212,1078,1130,475,408,375,528,...' (158 distinct values
>>>>>>> comma-separated)
>>>>>>> hgFixed.gnfHumanAtlas2AllExps.id= '112'
>>>>>>> hgFixed.gnfHumanAtlas2AllExps.name='Pancreas'
>>>>>>>
>>>>>>> While probe '117_at' gives:
>>>>>>> hgFixed.gnfHumanAtlas2All.expCount='158',
>>>>>>> hgFixed.gnfHumanAtlas2All.expScores=
>>>>>>> '338,277,2383,2456,617,423,...'(158 comma-separated values)
>>>>>>> hgFixed.gnfHumanAtlas2AllExps.id=
>>>>>>> '52,74,75,85,94,96,98,112,121,127,129,137,'
>>>>>>> hgFixed.gnfHumanAtlas2AllExps.name=
>>>>>>> 'cerebellum,CingulateCortex,CingulateCortex 2,Lung 2,Uterus,Thyroid,
>>>>>>> fetalThyroid,Pancreas,TestisGermCell 2,salivarygland 2,trachea 2,skin 2,'
>>>>>>>
>>>>>>> Since the number of expression values listed under
>>>>>>> 'hgFixed.gnfHumanAtlas2All.expScores' does not correspond to the
>>>>>>> number of Expression IDs/names listed under
>>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.id' and
>>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.name', respectively, how is one
>>>>>>> supposed to figure out which tissue each of the 158 expression scores
>>>>>>> corresponds to?
>>>>>>>
>>>>>>>
>>>
>>>
>>
>>
>>
>>
>
>



-- 
Kathleen Askland, MD
Assistant Professor
Department of Psychiatry & Human Behavior
Warren Alpert School of Medicine
Brown University/Butler Hospital

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
