Hi Stefanie,

If you are using the Gene Sorter for human, mouse, or rat, the gene set that is used is "UCSC Genes" (or "Known Genes" if you are using an older assembly).
The UCSC Genes set is now created using data from RefSeq, GenBank, CCDS, and UniProt, and it is based on more than a simple merging of databases. You can read about the methods used to create the set on the UCSC Genes description page. To get to it from the Gene Sorter, click on a link in the "Description" column, scroll to the bottom, and click the link that says "Click here for details on how this gene model was made and data restrictions if any." It should take you to this page:

http://genome.ucsc.edu/cgi-bin/hgGene?hgg_do_kgMethod=1

An easy way to get a protein FASTA file for all of the UCSC Genes is to use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). Select your assembly, then set:

  group: Genes and Gene Prediction Tracks
  track: UCSC Genes
  table: knownGene
  region: genome
  output format: sequence
  output file: (enter a name for the file you will download)

Hit "get output" and choose "protein" on the next page. The output should be a protein FASTA file of all of the UCSC Genes for your assembly. Note that this will include ALL splice variants. To get only the splice variants that appear by default in the Gene Sorter, you would need to first get a list of gene names from the 'knownCanonical' table and upload them via the "identifiers (names/accessions)" button in the Table Browser.

I hope this helps. If you need any further assistance, please feel free to write back to the mailing list.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 09/19/10 09:56, Stefanie Gerstberger wrote:
> Hi,
> I am trying to find the current reference list of genes ( and their protein
> fasta formats) used by the gene sorter, the original publication says it used a
> synthesis of refseq, genebank and swissprot. Is this still correct that all
> genes of these databases were simply merged into one file or where can I find
> the genes (and protein fasta files) for the genes currently displayed on the
> gene sorter?
> Is there a link on UCSC genome browser to access the currently
> updated gene sorter file?
> Thanks a lot,
> Stefanie
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
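If you would rather do the knownCanonical filtering step from the reply above on your own machine instead of uploading identifiers, here is a minimal Python sketch. It is not a UCSC tool. It assumes you exported the knownCanonical table from the Table Browser as tab-separated text with a header line beginning with '#' and a column named 'transcript' holding the UCSC Genes IDs, and that each header line in the downloaded protein FASTA file starts with that same ID. The function names, the column layout in the demo, and the sample IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def read_canonical_ids(lines: Iterable[str], column: str = "transcript") -> Set[str]:
    """Collect IDs from a tab-separated knownCanonical export.

    Assumption: the first line is a header starting with '#' (e.g. '#chrom')
    and one of its columns is named `column`.
    """
    rows = iter(lines)
    header = next(rows).rstrip("\r\n").lstrip("#").split("\t")
    idx = header.index(column)  # raises ValueError if the column is missing
    ids: Set[str] = set()
    for line in rows:
        line = line.rstrip("\r\n")
        if line:  # skip blank lines
            ids.add(line.split("\t")[idx])
    return ids


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Dict[str, str]:
    """Return {id: sequence} for FASTA records whose ID is in `keep`.

    Assumption: the ID is the first whitespace-separated token after '>'.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Store the record collected so far if its ID is wanted.
        if current is not None and current in keep:
            records[current] = "".join(chunks)

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            flush()  # finish the previous record before starting a new one
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks = []
        elif line:
            chunks.append(line)  # sequence lines may be wrapped
    flush()  # finish the last record
    return records


if __name__ == "__main__":
    # Hypothetical sample data with an assumed column layout.
    canonical = [
        "#chrom\tchromStart\tchromEnd\tclusterId\ttranscript\tprotein",
        "chr1\t100\t900\t1\tuc000aaa.1\tP00001",
    ]
    fasta = [">uc000aaa.1", "MKT", "LLV", ">uc000aab.1", "MKA"]
    print(filter_fasta(fasta, read_canonical_ids(canonical)))
    # prints {'uc000aaa.1': 'MKTLLV'}
```

Both functions accept any iterable of lines, so with real downloads you could pass open file objects directly (for example a file you saved as "knownCanonical.txt"; that name is only an illustration).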
