On Fri, 25 Aug 2006, Alexandre Gattiker wrote:

> Hello,
>
> In the document
> ftp://ftp.ebi.ac.uk/pub/software/biomart/user-docs.pdf
>
> Figure 5.4.1 shows the reversed star schema of the hsapiens_gene_ensembl
> gene and transcript tables.
>
> I'm wondering why the gene__main table is at all necessary. It appears
> that all its information is present in, and can be fetched from, the
> transcript__main table.

The way the mart query system works, only one main table is used at a time, and SELECT DISTINCT is not used because it tends to reduce performance. If the query only involves gene-level data, then just the gene__main table is used. For example, if you are just exporting gene_stable_ids for chromosome 1, the query will resolve to something like:

  SELECT gene_stable_id FROM hsapiens_gene_ensembl__gene__main WHERE chr_name = 1;

Running the query against the transcript main table instead would give duplicated rows wherever a gene has more than one transcript.

If you then change the query to include some transcript-level data, the software knows to switch the main table to transcript__main, e.g.:

  SELECT gene_stable_id, transcript_stable_id FROM hsapiens_gene_ensembl__transcript__main WHERE chr_name = 1;

Best wishes
Damian

> The figure also appears to have the following errors:
>
> gene__main.chrom_start -> gene__main.gene_chrom_start
> gene__main.chrom_end -> gene__main.gene_chrom_end
> transcript__main.gene_stable_id (missing)
>
> Also it reads "gene_id_key (PK)". This is erroneous for the transcript
> table. For the gene table, it makes sense, but I noticed that in the
> actual tables on the ensembldb server, the primary key is not defined.
>
>
> Cheers
> Alexandre
>
> --
> Alexandre Gattiker
> Swiss Institute of Bioinformatics, Genome Bioinformatics Group
> Biozentrum                   Tel. +41 61 267 1579
> Klingelbergstrasse 50        Fax  +41 61 267 1585
> 4056 Basel                   [EMAIL PROTECTED]
> Switzerland                  http://www.biozentrum.unibas.ch/primig
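
P.S. The duplicated-rows point in the reply can be reproduced with a small sketch. This is only an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory database rather than the real MySQL mart, the table names are shortened to gene__main and transcript__main, and the rows (GENE_A, TRANS_A1, and so on) are invented toy data, not Ensembl identifiers.

```python
import sqlite3


def build_demo_mart():
    """Create an in-memory database with two toy main tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene__main (
            gene_id_key INTEGER, gene_stable_id TEXT, chr_name TEXT);
        CREATE TABLE transcript__main (
            transcript_id_key INTEGER, gene_id_key INTEGER,
            gene_stable_id TEXT, transcript_stable_id TEXT, chr_name TEXT);
        -- Two genes on chromosome 1; GENE_A has two transcripts.
        INSERT INTO gene__main VALUES (1, 'GENE_A', '1'), (2, 'GENE_B', '1');
        INSERT INTO transcript__main VALUES
            (10, 1, 'GENE_A', 'TRANS_A1', '1'),
            (11, 1, 'GENE_A', 'TRANS_A2', '1'),
            (12, 2, 'GENE_B', 'TRANS_B1', '1');
    """)
    return conn


def gene_ids(conn, table):
    """Run the gene-level query against the given main table, without DISTINCT."""
    # 'table' is only ever one of the two fixed demo table names above.
    sql = (f"SELECT gene_stable_id FROM {table} "
           "WHERE chr_name = '1' ORDER BY gene_stable_id")
    return [row[0] for row in conn.execute(sql).fetchall()]


conn = build_demo_mart()
from_gene = gene_ids(conn, "gene__main")
from_transcript = gene_ids(conn, "transcript__main")

print(from_gene)        # one row per gene
print(from_transcript)  # GENE_A appears once per transcript
```

With the gene-level table each gene comes back once; with the transcript-level table GENE_A is returned twice, which is the duplication that would otherwise need SELECT DISTINCT to remove.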
