[mart-dev] problem with dataset partitioning

Dan Staines Thu, 21 Aug 2008 03:38:27 -0700

Hi chaps,

I'm having some problems with partitioning my ensembl multi-speciesdatabase into datasets on the basis of a column value indicating speciesin one of the tables in the dataset, and wondered if anyone could offerany advice as discussions with Syed and Andreas here at the EBI haven'tcome up with a solution.

Following the advice of Richard after an earlier mail to mart-dev, I'veused MartBuilder to use coord_system as a partition table forgene_ensembl based on the value of species_id. This all looks fine inmartbuilder, and the saved XML is:

<datasetPartitionTable name="coord_system" selectedColumns="species_id_107">

<partitionApplication name="gene_ensembl"pCols="species_id_107" dsCols="1.7.0.specie\

s_id" relationIds="1955" nameCols="species_id_107" compounds="0"/>
        </datasetPartitionTable>

However, when I try and build the mart (either directly with thegenerated SQL or using martrunner), the first dataset is builtcorrectly, but all subsequence datasets fail with this kind of errorwhen building the transcript_main table:


Unknown column 'a.ox_GO_bool' in 'field list'

which comes from the following bit of SQL (in this case from the secondspecies):create table multi_bacterial_mart_51.TEMP400 as selecta.ox_RefSeqDNA_bool,a.ox_GO_bool,a.ox_integr8_gene_type_1_bool,a.db_display_name_1018,a.display_label_1074,a.analysis_id_1020,a.description_1020,a.status_1020,a.seq_region_strand_1020,a.seq_region_id_1020,a.length_1059,a.coord_system_id_1059,a.ox_EMBLDNA_bool,a.ox_integr8_name_1_bool,a.description_106,a.biotype_1020,a.gene_id_1020_key,a.ox_BioCyc_bool,a.ox_BioCyc_1_bool,a.name_106,a.ox_Rfamncrna_bool,a.name_1059,a.species_id_107,a.code_106,a.seq_region_end_1020,a.band_1027,a.attrib_type_id_1060,a.value_1060,a.seq_region_start_1020,a.transcript_count,a.is_current_1020,a.display_xref_id_1020,a.value_1022,a.ox_integr8_gene_type_bool,a.ox_ENTREZGENE_bool,a.ox_GeneID_bool,a.stable_id_1023,a.ox_GeneID_1_bool,a.ox_integr8_name_bool,a.ox_GO_1_bool,a.source_1020from multi_bacterial_mart_51.pyrococcus_gene_ensembl_37__gene__main as a;

Digging a bit further, it appears that the optimiser column ox_GO_1_boolis added to pyrococcus_gene_ensembl_37__gene__main, instead ofox_GO_bool (which is the name used for this column in the firstdataset). This SQL tries to read both ox_GO_bool and ox_GO_1_bool (andthis is true for all the object_xref optimisers). Looking at the thirdand fourth species datasets, this statement also fails, but is trying tolook at ox_GO_bool, ox_GO_1_bool and ox_GO_2_bool (for the third) andox_GO_bool, ox_GO_1_bool, ox_GO_2_bool and ox_GO_3_bool (for thefourth), so it seems that for some reason the SQL generated is referringthe old column names from the previous partitions.

Does anyone have any suggestions about why this is happening? Any advicegratefully received...


Cheers,

Dan.

--
Dan Staines, PhD
EMBL-European Bioinformatics Institute    Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus, Hinxton     Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK                    email: [EMAIL PROTECTED]

[mart-dev] problem with dataset partitioning

Reply via email to