Hi Anton,

The rn4 refGene table, referenced here:

> inclusive format.) We are updating the rn4 table now. I expect the
> new version to be on the public MySQL server by next Monday.

has now been updated and is available and is available on the public 
mysql server.  I get about 18,000 rows using the second method, as 
opposed to only about 7,500 rows with the first method.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 03/08/11 16:12, Brooke Rhead wrote:
> Hi Anton,
> 
> The Table Browser alone is not able to do MySQL joins, so it is not 
> possible to do what you are asking within the Table Browser only. 
> Galaxy, which works in conjunction with the Table Browser, can do joins 
> (as Greg described).
> 
> There is another solution, however:  we have a public MySQL server that 
> you can query directly.  Instructions for using it are here:
> 
> http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29
> 
> Greg came up with a MySQL query equivalent to the Galaxy instructions:
> 
> SELECT mm9.refGene.name AS "mm9GeneID",
>         mm9.refGene.name2 AS "mm9GeneName",
>         rn4.kgXref.geneSymbol AS "rn4GeneName",
>         rn4.refGene.name AS "rn4GeneID"
>    FROM  mm9.refGene, rn4.refGene, rn4.kgXref
>    WHERE mm9.refGene.name2 = rn4.kgXref.geneSymbol
>       AND   rn4.kgXref.refSeq = rn4.refGene.name
>    ORDER BY mm9GeneName;
> 
> This should get you a table that looks like:
> 
> +--------------+---------------+---------------+--------------+
> | mm9GeneID    | mm9GeneName   | rn4GeneName   | rn4GeneID    |
> +--------------+---------------+---------------+--------------+
> | NM_020003    | 0610031J06Rik | 0610031j06rik | NM_001004226 |
> | NM_001081067 | A1bg          | A1bg          | NM_022258    |
> | NM_175628    | A2m           | A2m           | NM_012488    |
> | NM_030210    | Aacs          | Aacs          | NM_023104    |
> ...
> 
> It will have ~7500 rows in it.
> 
> I want to point out that you may want a slightly different query, though:
> 
> SELECT mm9.refGene.name AS "mm9GeneID",
>         mm9.refGene.name2 AS "mm9GeneName",
>         rn4.refGene.name2 AS "rn4GeneName",
>         rn4.refGene.name AS "rn4GeneID"
>    FROM  mm9.refGene, rn4.refGene
>    WHERE mm9.refGene.name2 = rn4.refGene.name2
>    ORDER BY mm9GeneName;
> 
> This one will use the gene symbols from the refGene tables directly, 
> which might be preferable, as this data is updated nightly and is not 
> tied to our UCSC Genes track, which may not contain everything in the 
> RefSeq Genes track.
> 
> However, there is currently a problem with this approach: the rn4 
> refGene table is in a different format that does not include the 'name2' 
> field.  (The rn4 table was initially made before we switched to a more 
> inclusive format.)  We are updating the rn4 table now.  I expect the new 
> version to be on the public MySQL server by next Monday.
> 
> I hope this is helpful.  If you have further questions, please feel free 
> to contact us again at [email protected].
> 
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
> 
> 
> On 03/07/11 20:07, Anton Kratz wrote:
>> Hi Greg,
>>
>> thanks; I do not wish to use Galaxy though, as I need such a table of mouse
>> RefSeq <-> rat RefSeq as part of an analysis pipeline for comparisons
>> between many such lists, which I have written in Perl and which in turn call
>> other scripts, R, etc... so including a manual step involving Galaxy is not
>> feasible.
>>
>> What I need is a plain, tab-separated list of mouse RefSeqs and their
>> corresping rat RefSeqs, or vice versa.
>>
>> Could you please explain a little more detailed how to get such a list with
>> the table browser?
>>
>> What I tried, in the table browser, I select:
>> clade: Mammal, genome: Mouse, assembly: July 2007 (NCBI37/mm9)
>> group: Genes and Gene Prediction Tracks, track: RefSeq Genes
>> table: kgXref
>> output format: selected fields from primary and related tables
>>
>> Then click on Get Output.
>>
>> On the next page (with areas "Select Fields from mm9.kgXref" and "Linked
>> Tables"), is where I am stuck.
>>
>> Anton
>>
>> On Tue, Mar 8, 2011 at 8:27 AM, Greg Roe <[email protected]> wrote:
>>
>>>  Hi Anton,
>>>
>>> Yes, you can use the table browser (with the refSeq > kgXref table) to get
>>> the gene symbols for both lists and compare.  Though, Galaxy makes this a
>>> bit more automated (http://main.g2.bx.psu.edu/).
>>>
>>> Go to Galaxy and click Get Data > UCSC 
>>> Main<http://genome.ucsc.edu/cgi-bin/hgTables?GALAXY_URL=http%3A//main.g2.bx.psu.edu/tool_runner&tool_id=ucsc_table_direct1&hgta_compressType=none&sendToGalaxy=1&hgta_outputType=bed>table
>>>  browser (upper left).  Select the genome, assembly, etc, for rat and
>>> use refSeq with the kgXref Table.  It will show you UCSC's table browser in
>>> the process, and paste your list of gene identifiers in. The "send output to
>>> Galaxy" box should  be checked by default. When you click output it will
>>> allow you to send the output to Galaxy. Then do the same for mouse.  Once
>>> both data sets are there, you can use Galaxy's "Join, Subtract and Group"
>>> tool to compare the data sets ("Compare Two Data sets" tool). Just join them
>>> on the geneSymbol column. Your output should then be a list of genes they
>>> have in common.
>>>
>>> Let me know if you have any additional questions.
>>>
>>> -
>>> Greg Roe
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>>
>>> On 3/7/11 2:25 AM, Anton Kratz wrote:
>>>
>>> Dear UCSC team,
>>>
>>> I have a list of some RefSeq-genes in mouse, and a list of some RefSeq-genes
>>> in rat.
>>>
>>> I want to find out which genes are present in both lists. But I can not just
>>> compare the identifiers, as the genes have different IDs in the two species.
>>>
>>> For example, Grid2 (glutamate receptor, ionotropic, delta 2) in rat would
>>> have the identifier NM_024379 and be located at chr4:92642427-94054757,
>>> while in mouse it would be NM_008167 at chr6:63206851-64616273. Still
>>> NM_024379 in rat is the "same" as NM_008167 in mouse.
>>>
>>> What is the best approach to find the same genes in two different lists of
>>> mouse and rat as described? Is it possible to get a list of equivalent
>>> RefSeq identifiers in mouse and rat throught he table browser?
>>>
>>> Thanks,
>>> Anton
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>
>>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to