On Tue, 27 Feb 2007, Arek Kasprzyk wrote:
On 27 Feb 2007, at 14:25, Will Spooner wrote:
<snip>
WormMart has the same gene/transcript (CDS in WormBase's case) issues. I
solve this by 'merging' the multiple transcript values into a single
attribute in the gene_main table. See this example;
http://tinyurl.com/2x3bj5
The 'merged' attributes are pretty vital for WormBase, where the 'cross
dimension multiplicity' (unconstrained dimension joins) is much more of an
issue than for Ensembl. I would like to see this approach supported
natively by BioMart/MartView, and hopefully MartBuilder as well.
This is what I was referring to as one of the ways of 'unifying' the data at
the higher level.
This can be done in many ways, e.g. de-normalization at the main table level,
which would solve the main->main multiplicity problem as it is in your case,
or perhaps having the dimensions with unified annotation on the higher level
This is certainly one of the options which we are now looking at for MBuilder
to support.
I'm not sure about the distinction between 'de-normalization on the main
table' and 'dimensions with unified annotation'. Here is the WormMart
approach;
For the dimension named 'foo', the main table has the following
attributes;
foo_count - the number of corresponding records in foo__dm
foo_dmlist - the list of names of the corresponding foos, e.g.
'foo1 | foo2 | foo3'
foo_dminfo - the list of summaries of the corresponding foos, e.g.
'[foo1] first foo | [foo2] another foo | [foo3] more foo'
In addition, the dimension table has the following attributes;
foo - The name of the foo record, e.g. 'foo1'
info - The summary of the foo record, e.g. '[foo1] first foo'
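To make the scheme above concrete, here is a minimal sketch of how the three merged attributes could be derived from the dimension rows. It is illustrative only, not the actual WormMart build code: the function name merge_dimension, the gene_key column and the in-memory tuple format are assumptions; only foo_count, foo_dmlist, foo_dminfo, foo and info come from the description above.

```python
from collections import defaultdict


def merge_dimension(dm_rows):
    """Collapse dimension rows into per-gene merged attributes.

    dm_rows is an iterable of (gene_key, foo, info) tuples, standing in for
    rows of a hypothetical foo__dm table; gene_key is an assumed column name.
    """
    grouped = defaultdict(list)
    for gene_key, foo, info in dm_rows:
        grouped[gene_key].append((foo, info))

    merged = {}
    for gene_key, records in grouped.items():
        merged[gene_key] = {
            # number of corresponding records in the dimension
            "foo_count": len(records),
            # names joined with ' | ', in input order
            "foo_dmlist": " | ".join(foo for foo, _ in records),
            # summaries joined with ' | ', in input order
            "foo_dminfo": " | ".join(info for _, info in records),
        }
    return merged


rows = [
    ("gene1", "foo1", "[foo1] first foo"),
    ("gene1", "foo2", "[foo2] another foo"),
    ("gene1", "foo3", "[foo3] more foo"),
]
print(merge_dimension(rows)["gene1"])
```

Because each gene ends up with exactly one value per merged attribute, selecting them alongside other main-table attributes does not multiply the result rows.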
or even screening the output dynamically as it comes out. The first two seem
to be error-free and would perform correctly. I have some doubts about the
last one but I suppose we should investigate all the options.
I am also worried about this one. A possible formatter-specific approach
would be e.g. for the HTML formatter to screen for duplicates on a single
results page, with an indication that duplicates have been screened.
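A rough sketch of what such per-page screening might look like, purely to illustrate the idea. The function name screen_page and the row format are assumptions, not part of any existing MartView formatter.

```python
def screen_page(rows):
    """Drop exact duplicate rows from one results page, keeping first-seen order.

    Returns the screened rows plus the number of duplicates removed, so the
    formatter can tell the user that screening took place.
    """
    rows = list(rows)
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # rows are assumed to be sequences of hashable values
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique, len(rows) - len(unique)


page = [("geneA", "foo1"), ("geneA", "foo1"), ("geneB", "foo2")]
unique_rows, removed = screen_page(page)
print(unique_rows, removed)
```

Note that this only catches duplicates within a single page; the same row appearing on two different pages would still be shown twice, which is part of why this option is worrying.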
Will
a.
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------
--
---
William Spooner
WormMart Developer