Re: [mart-dev] MartBuilder question

2007-05-24 Thread Guðmundur Árni Þórisson
Richard, if I may pitch in a question: can one use MartBuilder to  
pull data out of say PostgreSQL and create + load into a mart in  
MySQL? The answer might be in the manual somewhere but I'm sure you  
have the quick answer!
   It's possible I might for my project end up having the master  
database in PostgreSQL (for various reasons), but then have the  
corresponding Mart in MySQL + read-only MyISAM tables for speed.



 Mummi

---
Gudmundur A. Thorisson, PhD student, Brookes lab
 Department of Genetics
 University of Leicester
 University Road
 Leicester, LE1 7RH, UK
 E-mail: [EMAIL PROTECTED]
 Tel: +44 (0)116 252-3055

On 24 May 2007, at 16:34, Richard Holland wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

The public thing is interesting.

When you originally connected MartBuilder to your source schema, I'm
assuming you used database=postgres and schema=bioseqdb?

What target schema did you give it when you asked it to generate the
SQL? Was is bioseqdbmart?

Are both the bioseqdb and bioseqdbmart schemas in the same  
database, ie.

postgres?

cheers,
Richard

Brian Osborne wrote:

Richard,

So the SQL file's name is bioseqdbmart.sql, the source is bioseqdb  
and the

mart or target is bioseqdbmart. OK, so:

psql bioseqdbmart postgres < bioseqdbmart.sql

might not work, I should be logged in. Let's log in...

Hmm. I see:

bioseqdbmart=# \i bioseqdbmart.sql
psql:bioseqdbmart.sql:1: ERROR:  schema "bioseqdbmart" already exists
SET
psql:bioseqdbmart.sql:6: ERROR:  relation "public.bioentry" does  
not exist

SET
psql:bioseqdbmart.sql:9: ERROR:  relation "bioseqdbmart.temp__1"  
does not

exist
SET

...and more errors.

"public.bioentry" is troubling, it should be  
"public.bioseqdb.bioentry" or

"bioseqdb.bioentry", I think. That error comes from this:

bioseqdbmart=# create table bioseqdbmart.TEMP__1 as select  
a.accession as

accession,a.biodatabase_id as biodatabase_id,
bioseqdbmart-# a.bioentry_id as bioentry_id_key,a.description as
description,a.division as division,a.identifier as
bioseqdbmart-# identifier,a.name as name,a.taxon_id as  
taxon_id,a.version as

version,b.authority as authority,
bioseqdbmart-# b.description as description_biodatabase,b.name as
name_biodatabase from public.bioentry as a left join
bioseqdbmart-# public.biodatabase as b on  
a.biodatabase_id=b.biodatabase_id;

ERROR:  relation "public.bioentry" does not exist
bioseqdbmart=#


Brian O.


On 5/24/07 10:36 AM, "Richard Holland" <[EMAIL PROTECTED]> wrote:


run whilst logged in as the target schema owner.




-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGVbCJ4C5LeMEKA/QRAr1lAKCfLBi6HpDwCo1liCikscGmVxxdjQCeKljT
zHNwNkyL8Y8DDVlSjGaKJeY=
=6VHt
-END PGP SIGNATURE-




Re: [mart-dev] MartBuilder question

2007-05-25 Thread Guðmundur Árni Þórisson
Of course, of course (silly me!). However, it may prove useful to  
have MBuilder optionally produce the table-definition SQL +  SQL- 
script to dump out the data to textfiles, to support heterogenous db- 
engine setups. The tables could then be created in the target  
database and textfiles subsequently loaded.



   Mummi

On 24 May 2007, at 16:59, Richard Holland wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello.

The quick answer is 'no'.

The slightly longer answer is because it builds marts by executing
'create X as select Y from Z' statements against a single database  
- and
so both source and target schemas need to be living in the database  
for

this to work.

cheers,
Richard

Guðmundur Árni Þórisson wrote:
Richard, if I may pitch in a question: can one use MartBuilder to  
pull
data out of say PostgreSQL and create + load into a mart in MySQL?  
The
answer might be in the manual somewhere but I'm sure you have the  
quick

answer!
   It's possible I might for my project end up having the master
database in PostgreSQL (for various reasons), but then have the
corresponding Mart in MySQL + read-only MyISAM tables for speed.


 Mummi

---
Gudmundur A. Thorisson, PhD student, Brookes lab
 Department of Genetics
 University of Leicester
 University Road
 Leicester, LE1 7RH, UK
 E-mail: [EMAIL PROTECTED]
 Tel: +44 (0)116 252-3055

On 24 May 2007, at 16:34, Richard Holland wrote:

The public thing is interesting.

When you originally connected MartBuilder to your source schema, I'm
assuming you used database=postgres and schema=bioseqdb?

What target schema did you give it when you asked it to generate the
SQL? Was is bioseqdbmart?

Are both the bioseqdb and bioseqdbmart schemas in the same  
database, ie.

postgres?

cheers,
Richard

Brian Osborne wrote:

Richard,

So the SQL file's name is bioseqdbmart.sql, the source is bioseqdb
and the
mart or target is bioseqdbmart. OK, so:

psql bioseqdbmart postgres < bioseqdbmart.sql

might not work, I should be logged in. Let's log in...

Hmm. I see:

bioseqdbmart=# \i bioseqdbmart.sql
psql:bioseqdbmart.sql:1: ERROR:  schema "bioseqdbmart" already  
exists

SET
psql:bioseqdbmart.sql:6: ERROR:  relation "public.bioentry"  
does not

exist
SET
psql:bioseqdbmart.sql:9: ERROR:  relation  
"bioseqdbmart.temp__1" does

not
exist
SET

...and more errors.

"public.bioentry" is troubling, it should be
"public.bioseqdb.bioentry" or
"bioseqdb.bioentry", I think. That error comes from this:

bioseqdbmart=# create table bioseqdbmart.TEMP__1 as select
a.accession as
accession,a.biodatabase_id as biodatabase_id,
bioseqdbmart-# a.bioentry_id as bioentry_id_key,a.description as
description,a.division as division,a.identifier as
bioseqdbmart-# identifier,a.name as name,a.taxon_id as
taxon_id,a.version as
version,b.authority as authority,
bioseqdbmart-# b.description as description_biodatabase,b.name as
name_biodatabase from public.bioentry as a left join
bioseqdbmart-# public.biodatabase as b on
a.biodatabase_id=b.biodatabase_id;
ERROR:  relation "public.bioentry" does not exist
bioseqdbmart=#


Brian O.


On 5/24/07 10:36 AM, "Richard Holland" <[EMAIL PROTECTED]> wrote:


run whilst logged in as the target schema owner.




-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGVbZe4C5LeMEKA/QRAioZAJ93gc2kLvyRHoj16iVIe+gE/bTszQCbBOYe
OYGNosThAaDoIBJ7b7pd2yo=
=ryuw
-END PGP SIGNATURE-




[mart-dev] MartBuilder for multi-table joins, how deep?

2007-06-01 Thread Guðmundur Árni Þórisson
I'm starting to use MartBuilder for real now and slowly getting the  
hang of things. I've managed to create some simple datasets without  
problems (just works!), but when going into more complex things I am  
stumbling. Specifically, I do not see how to tell MBuilder to do  
certain complex multi-table joins to pull various pieces of data  
together.
   Here's an example of this sort of thing is genotype frequency  
data I want to pull out of my database. Some tables hold data for  
filters and/or attributes, others are merely linking tables:


Study->Experiment->Usedmarkerset->GenotypeFrequencyCluster- 
>GenotypeFrequency->[couple more tables]


  I see some indication of how MartBuilder is doing things when it  
has created a dimension table out of two tables, and click 'Explain  
table'. It's also doing something clever when I select multiple  
tables in the 'Create datasets' dialog. But is this procedure  
entirely automatic, offering no control beyond masking dimension  
tables *after* MBuilder has created the dataset? How deep does it go,  
in terms of following the PK-FK trail?


 I guess I'm hunting for a clue as to whether I can add tables to a  
dataset semi-manually, one by one? Or is it entirely a matter of  
telling MBuilder to do X number of tables, and then sorting out the  
mess afterwards via mask/hide and so on?




Mummi

PS I tried the 'Add restriction to table' and it's just wicked,  
extremely useful! My particular use case is that I have an Experiment  
table with different types of experiment-thingies: frequency  
determination, association analysis etc. This function let's me  
easily create seperate datasets for each!




---
Gudmundur A. Thorisson, PhD student, Brookes lab
 Department of Genetics
 University of Leicester
 University Road
 Leicester, LE1 7RH, UK
 E-mail: [EMAIL PROTECTED]
 Tel: +44 (0)116 252-3055






Re: [mart-dev] MartBuilder for multi-table joins, how deep?

2007-06-03 Thread Guðmundur Árni Þórisson
Very nice. Thanks for the info Richard. I'll give 'force relation' a  
try and get back to you if I run into any trouble. sounds like it  
would do the trick for me.


 The 'restriction to relation' you explain in your other mail is  
something I'd seen but didn't see obviously how it was different from  
the other kind (which is no longer in 0.6 as you say). But it seems  
very useful and I'll probably have use for the 'hard restriction'  
feature as well. And/or the compound relation feature.  The  The  
'Explain ...' is extremely powerful from what I've seen so far. Lots  
of goodies in MBuilder to play with!!



Mummi

On 3 Jun 2007, at 19:46, Richard Holland wrote:

MBuilder follows all 1:1 and M:1 relations it can find, until there  
are no
more. It will never follow 1:M or N:M relations except in the case  
where a
1:M (never N:M) relation leads off the initial table or a table  
that is

reached purely by a chain of 1:1 relations, in which case they become
dimensions, which have new relation chains of their own starting at  
the

point they branch off the main table.

In MBuilder 0.5 you basically could like it or lump it! But in  
MBuilder

0.6 there is a 'force relation' feature which you can use to make it
follow a relation beyond the point where it would normally have  
stopped. I
won't go as far as Arek and call the lack of this option in 0.5 a  
'bug in

the algorithm'... I'd prefer to call the ability to include additional
relations beyond those covered by the default algorithm a 'new  
feature'
instead. :) After all, the algorithm must have a defined and  
predictable

stopping point else it would end up merging the entire schema into a
single table!

By forcing a relation you make MBuilder treat it as though it were  
another

1:1 relation in the chain (even if logically it is not), and it will
continue merging relations past that point until it reaches the next
logical stopping point. So you will usually end up with more relations
than the one you just clicked on included in the final table. You  
can then
use masking to remove the ones you don't want. Because the forced  
relation
is treated as a 1:1 relation this also means that you can  
potentially end

up with new dimensions popping up as well from 1:M relations it finds
beyond the forced relation, if the forced relation comes after an  
unbroken
chain of 1:1 relations from the starting point. There is no limit  
to the

number of relations you can force.

Also new in MBuilder 0.6 is the ability to alter the transformation
behaviour on a per-table or per-dataset basis, allowing you to make  
it use
different choices for each separate main or dimension table should  
you so
desire. Combined with the new compound relation feature which  
allows you
to follow one relation multiple times and apply different  
restrictions to
each iteration, this can allow MBuilder to gather much more of the  
schema
into a single dataset table, or even generate multiple dimensions  
from the

same 1:M  relation, with each one including different information.

cheers,
Richard

On Fri, June 1, 2007 10:13 pm, Arek Kasprzyk wrote:


On 1 Jun 2007, at 16:06, Guðmundur Árni Þórisson wrote:


I'm starting to use MartBuilder for real now and slowly getting the
hang of things. I've managed to create some simple datasets without
problems (just works!), but when going into more complex things I am
stumbling. Specifically, I do not see how to tell MBuilder to do
certain complex multi-table joins to pull various pieces of data
together.
   Here's an example of this sort of thing is genotype frequency  
data

I want to pull out of my database. Some tables hold data for filters
and/or attributes, others are merely linking tables:

Study->Experiment->Usedmarkerset->GenotypeFrequencyCluster-

GenotypeFrequency->[couple more tables]


  I see some indication of how MartBuilder is doing things when  
it has

created a dimension table out of two tables, and click 'Explain
table'. It's also doing something clever when I select multiple  
tables

in the 'Create datasets' dialog. But is this procedure entirely
automatic, offering no control beyond masking dimension tables  
*after*

MBuilder has created the dataset? How deep does it go, in terms of
following the PK-FK trail?



the algorithm follows the PK/FK and derived cardinalities.  
whenever it

hits 1:n or n:m it stops and creates a dm table
or in case of a dm table just stops.



 I guess I'm hunting for a clue as to whether I can add tables to a
dataset semi-manually, one by one?


you can override the algorithm and make MBuilder to merge table
according to you wishes.
Not sure if available in 0.5 but definitely possible in 0.6 Richard
will tell you more about it



 Or is it entirely a matter of telling MBuilder to do X number of
tables, and 

Re: [mart-dev] MartBuilder for multi-table joins, how deep?

2007-06-04 Thread Guðmundur Árni Þórisson

Understood. Thanks.


  Mummi

On 4 Jun 2007, at 09:25, Arek Kasprzyk wrote:



I agree with you entirely :-) This was never a bug in the  
algorithm, because the algorithm
has been around for 5 years now, years before even the idea of  
MBuilder was conceived and it worked correctly so far.
This was  a bug in the implementation of the algorithm. The 0.5  
version simply cut it's path too early (The focus point for the dm
transformation was  the first 1:m table rather than the first link  
table as it was customary for our earlier implementations - the  
starting point was implemented incorrectly. The stopping point was  
correct and it always was there anyway). The good news is that this  
is now fixed for 0.6 so people should not be 'short changed' on  
their tables anymore :)


Mummi - as a general rule, you should very rarely need to add  
tables to the path manually. This is a sign
of an unusual situation and breaking 1:n/n:m cutoff.  Most of the  
times you should be simply removing tables from the path as the  
algorithm
is 'greedy' in this respect and tend to include more tables that  
you normally require.


anyway, happy building
a.




[mart-dev] RE: MartEditor reditting problems

2007-06-05 Thread Guðmundur Árni Þórisson
Damian, I'm running into the same problems as Trevor with MartEditor:  
after several iterations of successful editing & exporting to db of a  
given dataset, MEditor starts importing configuration for the  
template ($datasetname_template). It subsequently insists on saving   
this back to the database, with no effect on the actual dataset I was  
trying to edit.


  I'll try to use the workaround Trevor mentioned to salvage my  
config, but annoying nonetheless. I am using martj from CVS, is this  
a lesser problem in the official 0.5 release perhaps? I am reluctant  
to invest much work in configuring datasets if this keeps happening.




BTW   In case it gives a clue in resolving this bug: when I do 'Save  
all' to save config XML to disk, I get one file named $dataset.xml  
and another named $dataset_template.template.xml for each dataset in  
my mart. Here's a sample diff of the first few elements in one of  
those pairs. Besides the extra filter in the second XML (the one I'm  
trying to add but can't!!), there's differences in the DatasetConfig  
element,I reckon that's where Trevor did his editing before loading  
back to the db. Dunno if that's normal for template vs non-template XML?




[EMAIL PROTECTED] /tmp/allelefrequency.xml  /tmp/ 
allelefrequency_template.template.xml

3c3,4
< displayName="AlleleFrequency" hideDisplay="false"  
interfaces="default" internalName="default" martUsers="default"  
modified="2007-06-05 14:52:41" softwareVersion="0.6"  
template="AlleleFrequency" type="TableSet" visible="1">

---
> displayName="AlleleFrequency" hideDisplay="false"  
interfaces="default" internalName="template" martUsers="default"  
modified="2007-06-05 11:44:23.589" softwareVersion="0.6"  
template="AlleleFrequency" type="TableSet" visible="1">

> 
10a12,16
> displayType="text" field="frequencyasproportion" hidden="false"  
hideDisplay="false" internalName="frequencyasproportion"  
key="AlleleFrequencyID_key" legal_qualifiers="=" qualifier="="  
tableConstraint="main" type="text">
> internalName="0.1" isSelectable="1" value="0.1" />
> internalName="0.3" isSelectable="1" value="0.3" />
> internalName="0.3" isSelectable="1" value="0.3" />

> 






 Mummi

Damian Smedley
Tue, 13 Mar 2007 08:55:48 -0800

sounds like the template file which is what you are really editing  
when

using MartEditor has lost its mapping to the actual datasetConfig XML.
This template code in MartEditor is not particularly robust yet so  
the XML
has probably got into a mess - do you want to send it to us and we  
can try

and spot what the problem is

cheers
Damian


On Tue, 13 Mar 2007, trevor paterson (RI) wrote:

> I am getting into difficulties trying to redit dataset configuration
> files using MartEditor...
>
> changes that i make don't seem to get 'Export'-ed back to the  
database -

> (though they do hang around in the Marteditor memory)
>
>
> eg
>
> if i have a dataset called maps...
>
> if i connect to the dataset witwith MartEditor and edit the  
config of

> filters on maps..
>
> 'Export' does not write changes back to the database config tables
>
> if I 'SaveAll' the edits do get saved to a file e.g  
maps.template.xml...

>
> I can then copy and rename this to maps.xml  (changing a couple of
> attributes in the file)
>
> and then overwrite the changes to the database using 'Upload All'  
- this

> then works
>
>
> however this is very laborious and error prone - are their known  
issues

> with this process, or any work arounds ? -
>
> the logic of the two files maps.xml and maps.template.xml is  
opaque to

> me
>
> cheers
>
>
>
>
> Trevor Paterson


Re: [mart-dev] RE: MartEditor reditting problems

2007-06-05 Thread Guðmundur Árni Þórisson
Thanks for the heads-up, Richard. I knew from before that there were  
plans to revamp the config system completely, but thought those plans  
had been postponed or whatever when MEditor stayed the largely the  
same (!).


  I'll try the MEditor 0.5 release for stability. I don't  think I'm  
doing anything fancy that requires 0.6. Unless MartBuilder 0.6  (from  
CVS) is incompatible with 0.5 MEditor which would cause me trouble  
elsewhere.



  Mummi

On 5 Jun 2007, at 16:33, Richard Holland wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I appreciate the problems you're having, but we are no longer  
developing
MartEditor after the 0.6 release (except for fixing show-stopping  
bugs).


Instead we are concentrating on writing an entirely new config tool  
from
scratch for the 0.7 release of BioMart. The 0.6 release is imminent  
and

the current intention is to release 0.7 in around 6 months time.

This new config tool will accompany a rethink of our entire
configuration system which is beginning to have scaling problems
associated with the ever-increasing range of species available in
Ensembl marts. It will also address the problems associated with the
parallel maintenance of the template- and dataset-config XMLs, one of
which you appear to have encountered already if I read your email  
correctly.


So all I can really say is hold in there... I promise it will get a  
lot

better after 0.7!

cheers,
Richard



Guðmundur Árni Þórisson wrote:

Damian, I'm running into the same problems as Trevor with MartEditor:
after several iterations of successful editing & exporting to db of a
given dataset, MEditor starts importing configuration for the  
template
($datasetname_template). It subsequently insists on saving  this  
back to
the database, with no effect on the actual dataset I was trying to  
edit.


  I'll try to use the workaround Trevor mentioned to salvage my  
config,

but annoying nonetheless. I am using martj from CVS, is this a lesser
problem in the official 0.5 release perhaps? I am reluctant to invest
much work in configuring datasets if this keeps happening.



BTW   In case it gives a clue in resolving this bug: when I do 'Save
all' to save config XML to disk, I get one file named $dataset.xml  
and

another named $dataset_template.template.xml for each dataset in my
mart. Here's a sample diff of the first few elements in one of those
pairs. Besides the extra filter in the second XML (the one I'm  
trying to

add but can't!!), there's differences in the DatasetConfig element,I
reckon that's where Trevor did his editing before loading back to the
db. Dunno if that's normal for template vs non-template XML?



[EMAIL PROTECTED] /tmp/allelefrequency.xml
/tmp/allelefrequency_template.template.xml
3c3,4
< displayName="AlleleFrequency" hideDisplay="false"  
interfaces="default"

internalName="default" martUsers="default" modified="2007-06-05
14:52:41" softwareVersion="0.6" template="AlleleFrequency"
type="TableSet" visible="1">
---

displayName="AlleleFrequency" hideDisplay="false"  
interfaces="default"

internalName="template" martUsers="default" modified="2007-06-05
11:44:23.589" softwareVersion="0.6" template="AlleleFrequency"
type="TableSet" visible="1">
internalName="AlleleFrequency" />

10a12,16


displayType="text" field="frequencyasproportion" hidden="false"
hideDisplay="false" internalName="frequencyasproportion"
key="AlleleFrequencyID_key" legal_qualifiers="=" qualifier="="
tableConstraint="main" type="text">


internalName="0.1" isSelectable="1" value="0.1" />


internalName="0.3" isSelectable="1" value="0.3" />


internalName="0.3" isSelectable="1" value="0.3" />









 Mummi

Damian Smedley
Tue, 13 Mar 2007 08:55:48 -0800

sounds like the template file which is what you are really  
editing when
using MartEditor has lost its mapping to the actual datasetConfig  
XML.
This template code in MartEditor is not particularly robust yet  
so the

XML
has probably got into a mess - do you want to send it to us and  
we can

try
and spot what the problem is

cheers
Damian


On Tue, 13 Mar 2007, trevor paterson (RI) wrote:

I am getting into difficulties trying to redit dataset  
configuration

files using MartEditor...

changes that i make don't seem to get 'Export'-ed back to the

database -

(though they do hang around in the Marteditor memory)


eg

if i have a dataset called maps...

if i connect to the dataset witwith MartEditor and edit the