[mart-dev] scerevisiae_gene_ensembl__exon_transcript__dm table population issue

Arnaud Kerhornou Thu, 21 Aug 2008 06:48:11 -0700

Hi,

We're running into a problem in populating our biomart databases for our Ensembl datasets. Theproblem lies in the population of the *_gene_ensembl__exon_transcript__dm mart tables.


I take scerevisiae as an example because it's being done fine by the Ensembl 
team, so I don't understand what's wrong with our procedure.

We're using a saccharomyces_cerevisiae_core_50_1i database installed onour local mysql server. The data were dumped from ensembl ftp site, andwe're using ensembl_50.xml that Ensembl gave us. We just adapted it to ourdatabase connectivity.

The problem is that this table has only data for cerevisiae genesattached to the mitochondrial chromosome.


The problem lies with this mart building query:

create table fungal_mart_redo_50.TEMP28 as select a.*,b.value asvalue_1060,b.attrib_type_id as attrib_type_id_1060 fromfungal_mart_redo_50.TEMP26 as a inner joinsaccharomyces_cerevisiae_core_50_1i.seq_region_attrib as b ona.seq_region_id_1015=b.seq_region_id and (b.attrib_type_id=11 orb.attrib_type_id is null);

TEMP26 is fine and has data for every chromosomes, TEMP28 has only datafor the mitochondrial chromosome.

What happens is that seq_region_attr table has only attrib_type_id=11for the mitochondrial seq_region. I haven't seen any rows whereseq_region_attr.attrib_type_id is null


attrib_type_id=11 corresponds to 'Codon Table' attribute.

So I guess it tries to add codon table information into*_gene_ensembl__exon_transcript__dm. It seems that ensembl by defaultdoesn't attach a codon table to a seq_region, I guess because they areall 1, except for mitochndrial or chloroplast genes.

So, we must be missing something in our mart building procedure ? Doesit ring a bell to you ?

I can fix this issue by replacing the inner join by a left join. Should this be 
enough ?

We're facing another problem. attrib_type_id=11 might not be true for all datasets. We also have Ensembl core databases that we built ourselves, and it turns out thatthe attrib_type corresponding to 'Codon Table' has attrib_type_id=3.I mean, we want to set up a fungal mart for two datasets, scerevisiae and spombe. So we have a build xml file with attrib_type_id=11 so we get scerevisiae ok (assuming we replace the above inner join by a left join),but we don't get spombe right as it requires attrib_type_id=3 !

So, is the only way to fix it, to have consistently the same attrib_type_id value for all datasets or could you suggest another solution that does not rely on a predefinedattrib_type_id value ?


Thanks
Arnaud

[mart-dev] scerevisiae_gene_ensembl__exon_transcript__dm table population issue

Reply via email to