Hi Anton, The rn4 refGene table, referenced here:
> inclusive format.) We are updating the rn4 table now. I expect the > new version to be on the public MySQL server by next Monday. has now been updated and is available and is available on the public mysql server. I get about 18,000 rows using the second method, as opposed to only about 7,500 rows with the first method. -- Brooke Rhead UCSC Genome Bioinformatics Group On 03/08/11 16:12, Brooke Rhead wrote: > Hi Anton, > > The Table Browser alone is not able to do MySQL joins, so it is not > possible to do what you are asking within the Table Browser only. > Galaxy, which works in conjunction with the Table Browser, can do joins > (as Greg described). > > There is another solution, however: we have a public MySQL server that > you can query directly. Instructions for using it are here: > > http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29 > > Greg came up with a MySQL query equivalent to the Galaxy instructions: > > SELECT mm9.refGene.name AS "mm9GeneID", > mm9.refGene.name2 AS "mm9GeneName", > rn4.kgXref.geneSymbol AS "rn4GeneName", > rn4.refGene.name AS "rn4GeneID" > FROM mm9.refGene, rn4.refGene, rn4.kgXref > WHERE mm9.refGene.name2 = rn4.kgXref.geneSymbol > AND rn4.kgXref.refSeq = rn4.refGene.name > ORDER BY mm9GeneName; > > This should get you a table that looks like: > > +--------------+---------------+---------------+--------------+ > | mm9GeneID | mm9GeneName | rn4GeneName | rn4GeneID | > +--------------+---------------+---------------+--------------+ > | NM_020003 | 0610031J06Rik | 0610031j06rik | NM_001004226 | > | NM_001081067 | A1bg | A1bg | NM_022258 | > | NM_175628 | A2m | A2m | NM_012488 | > | NM_030210 | Aacs | Aacs | NM_023104 | > ... > > It will have ~7500 rows in it. > > I want to point out that you may want a slightly different query, though: > > SELECT mm9.refGene.name AS "mm9GeneID", > mm9.refGene.name2 AS "mm9GeneName", > rn4.refGene.name2 AS "rn4GeneName", > rn4.refGene.name AS "rn4GeneID" > FROM mm9.refGene, rn4.refGene > WHERE mm9.refGene.name2 = rn4.refGene.name2 > ORDER BY mm9GeneName; > > This one will use the gene symbols from the refGene tables directly, > which might be preferable, as this data is updated nightly and is not > tied to our UCSC Genes track, which may not contain everything in the > RefSeq Genes track. > > However, there is currently a problem with this approach: the rn4 > refGene table is in a different format that does not include the 'name2' > field. (The rn4 table was initially made before we switched to a more > inclusive format.) We are updating the rn4 table now. I expect the new > version to be on the public MySQL server by next Monday. > > I hope this is helpful. If you have further questions, please feel free > to contact us again at [email protected]. > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > On 03/07/11 20:07, Anton Kratz wrote: >> Hi Greg, >> >> thanks; I do not wish to use Galaxy though, as I need such a table of mouse >> RefSeq <-> rat RefSeq as part of an analysis pipeline for comparisons >> between many such lists, which I have written in Perl and which in turn call >> other scripts, R, etc... so including a manual step involving Galaxy is not >> feasible. >> >> What I need is a plain, tab-separated list of mouse RefSeqs and their >> corresping rat RefSeqs, or vice versa. >> >> Could you please explain a little more detailed how to get such a list with >> the table browser? >> >> What I tried, in the table browser, I select: >> clade: Mammal, genome: Mouse, assembly: July 2007 (NCBI37/mm9) >> group: Genes and Gene Prediction Tracks, track: RefSeq Genes >> table: kgXref >> output format: selected fields from primary and related tables >> >> Then click on Get Output. >> >> On the next page (with areas "Select Fields from mm9.kgXref" and "Linked >> Tables"), is where I am stuck. >> >> Anton >> >> On Tue, Mar 8, 2011 at 8:27 AM, Greg Roe <[email protected]> wrote: >> >>> Hi Anton, >>> >>> Yes, you can use the table browser (with the refSeq > kgXref table) to get >>> the gene symbols for both lists and compare. Though, Galaxy makes this a >>> bit more automated (http://main.g2.bx.psu.edu/). >>> >>> Go to Galaxy and click Get Data > UCSC >>> Main<http://genome.ucsc.edu/cgi-bin/hgTables?GALAXY_URL=http%3A//main.g2.bx.psu.edu/tool_runner&tool_id=ucsc_table_direct1&hgta_compressType=none&sendToGalaxy=1&hgta_outputType=bed>table >>> browser (upper left). Select the genome, assembly, etc, for rat and >>> use refSeq with the kgXref Table. It will show you UCSC's table browser in >>> the process, and paste your list of gene identifiers in. The "send output to >>> Galaxy" box should be checked by default. When you click output it will >>> allow you to send the output to Galaxy. Then do the same for mouse. Once >>> both data sets are there, you can use Galaxy's "Join, Subtract and Group" >>> tool to compare the data sets ("Compare Two Data sets" tool). Just join them >>> on the geneSymbol column. Your output should then be a list of genes they >>> have in common. >>> >>> Let me know if you have any additional questions. >>> >>> - >>> Greg Roe >>> UCSC Genome Bioinformatics Group >>> >>> >>> >>> On 3/7/11 2:25 AM, Anton Kratz wrote: >>> >>> Dear UCSC team, >>> >>> I have a list of some RefSeq-genes in mouse, and a list of some RefSeq-genes >>> in rat. >>> >>> I want to find out which genes are present in both lists. But I can not just >>> compare the identifiers, as the genes have different IDs in the two species. >>> >>> For example, Grid2 (glutamate receptor, ionotropic, delta 2) in rat would >>> have the identifier NM_024379 and be located at chr4:92642427-94054757, >>> while in mouse it would be NM_008167 at chr6:63206851-64616273. Still >>> NM_024379 in rat is the "same" as NM_008167 in mouse. >>> >>> What is the best approach to find the same genes in two different lists of >>> mouse and rat as described? Is it possible to get a list of equivalent >>> RefSeq identifiers in mouse and rat throught he table browser? >>> >>> Thanks, >>> Anton >>> _______________________________________________ >>> Genome maillist - [email protected] >>> https://lists.soe.ucsc.edu/mailman/listinfo/genome >>> >>> >> _______________________________________________ >> Genome maillist - [email protected] >> https://lists.soe.ucsc.edu/mailman/listinfo/genome > _______________________________________________ > Genome maillist - [email protected] > https://lists.soe.ucsc.edu/mailman/listinfo/genome _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
