Hi
Can anyone help with our problem getting federated queries working (see emails below)? We would like the Ensembl database to appear in the datasets section (with our own mart), not in the database section of the mart.
Junjun has not been available to continue replying to our problem. Hope you
can help...
Thanks
Jenny + Luca
-----Original Message-----
From: Mead, Jennifer
Sent: 16 June 2009 16:50
To: Junjun Zhang
Subject: RE: [mart-dev] Federating queries
Hi Junjun,
When we try to restart apache, we get this error message (although we did not
edit the conf file it refers to ourselves), and the mart fails to display at
all.
Syntax error on line 71 of /home/jenny/biomart/biomart-perl/conf/httpd.conf:
Invalid command 'PerlOptions', perhaps misspelled or defined by a module not
included in the server configuration
However, using an old version of the httpd conf file, from before we made any
changes to the mart or registry file, we get the 2 databases, where you can
select either the Ensembl or the 'trans' datasets but cannot query all datasets
together. This is the problem we had before.
We have made the changes you described to the mart editor, but we still do not
get the desired result. Any ideas?
Thanks, J + L
________________________________________
From: Junjun Zhang [[email protected]]
Sent: 16 June 2009 15:03
To: Mead, Jennifer
Cc: [email protected]
Subject: RE: [mart-dev] Federating queries
Hi Jenny and Luca,
You may want to use the current version (ie, 54) of the Ensembl mart.
In order to federate your own mart with a fully functional Ensembl mart, you need to
have the following four entries in your registry file together with your own one
between the <virtualSchema> tags:
<MartDBLocation databaseType="mysql" host="martdb.ensembl.org" database="ensembl_mart_54" name="ensembl"
    displayName="ENSEMBL 54 GENES (SANGER UK)" port="5316" schema="ensembl_mart_54" user="anonymous"
    password="" visible="1" default="1" martUser="" includeDatasets="" />
<MartDBLocation databaseType="mysql" host="martdb.ensembl.org" database="genomic_features_mart_54" name="genomic_features"
    displayName="ENSEMBL 54 GENOMIC FEATURES (SANGER UK)" port="5316" schema="genomic_features_mart_54" user="anonymous"
    password="" visible="0" default="0" martUser="" includeDatasets="" />
<MartDBLocation databaseType="mysql" host="martdb.ensembl.org" database="ontology_mart_54" name="ontology"
    displayName="ENSEMBL 54 ONTOLOGY (SANGER UK)" port="5316" schema="ontology_mart_54" user="anonymous"
    password="" visible="0" default="0" martUser="" includeDatasets="" />
<MartDBLocation databaseType="mysql" host="martdb.ensembl.org" database="sequence_mart_54" name="sequence"
    displayName="ENSEMBL 54 SEQUENCE (SANGER UK)" port="5316" schema="sequence_mart_54" user="anonymous"
    password="" visible="0" default="0" martUser="" includeDatasets="" />
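To check that every mart really sits between the <virtualSchema> tags, a quick
script along these lines may help. It is only a sketch in Python, not part of
BioMart: the sample registry is cut down to a few attributes, the 'trans' entry
and the MartRegistry root element are assumptions about your file, and
marts_by_schema is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Cut-down sample registry. The 'trans' entry and the <MartRegistry> root are
# assumptions about the local file; only a few attributes are kept.
REGISTRY = """<MartRegistry>
  <virtualSchema name="default" visible="1">
    <MartDBLocation name="trans" database="trans" visible="1" />
    <MartDBLocation name="ensembl" database="ensembl_mart_54" visible="1" />
    <MartDBLocation name="sequence" database="sequence_mart_54" visible="0" />
  </virtualSchema>
</MartRegistry>"""

LOCATION_TAGS = {"MartDBLocation", "MartURLLocation"}


def marts_by_schema(registry_xml):
    """Return (marts inside each virtualSchema, marts outside any schema)."""
    root = ET.fromstring(registry_xml)
    inside = {}
    for schema in root.iter("virtualSchema"):
        # Direct children of the schema element that are mart locations.
        inside[schema.get("name")] = [
            el.get("name") for el in schema if el.tag in LOCATION_TAGS
        ]
    # Mart locations placed directly under the root are outside every schema.
    outside = [el.get("name") for el in root if el.tag in LOCATION_TAGS]
    return inside, outside


inside, outside = marts_by_schema(REGISTRY)
print(inside)
print(outside)
```

If a mart name shows up in the second list, it is outside the virtualSchema
block in the registry file.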
As for defining the Importable/Exportable pair, since Ensembl already has an
Exportable defined (see the attached screenshot), you will just need to define
an Importable in your dataset that matches Ensembl's Exportable. The other
attached screenshot shows how to define the Importable in your dataset.
Let us know how it goes, and please feel free to get back to us should you have
any questions.
Best regards,
Junjun
________________________________
From: Mead, Jennifer [mailto:[email protected]]
Sent: Tuesday, June 16, 2009 9:02 AM
To: Junjun Zhang
Cc: [email protected]
Subject: RE: [mart-dev] Federating queries
Hi Junjun,
We are struggling to get federated queries working. We want to have our mart
db called 'trans' as the database, and two federated datasets: 1) ensembl gene
and 2) our own 'trans-A' dataset. Both ensembl and trans-A have
ensembl_gene_id, so we want to link on this. In trans this field is called
'ensg_id_1017'; in ensembl it is 'ensembl_gene_id', but they correspond to the
same values.
Where we are so far:
1. We have added the URL location for the ensembl biomart to the registry
file (as Syed suggested):
<MartURLLocation database="genomic_features_mart_53" default="0"
displayName="ENSEMBL 53 GENOMIC FEATURES (SANGER UK)"
host="www.biomart.org" includeDatasets="" martUser=""
name="genomic_features" path="/biomart/martservice" port="80"
serverVirtualSchema="default" visible="0" />
We added this between the <virtualSchema> tags, so there are 2 blocks of
text, one for trans (real) and one for ensembl (URL virtual), within the
<virtualSchema> tags. We're not sure we put this in the right position - should
both the trans and ensembl marts be inside these tags?
2. We added importable/exportable specifiers in mart editor. But we don't know
if we did it correctly.
The specific values are probably wrong. Should internalname be the ensembl
field name or the trans field name in our case? And should it be the trans
field name for the exportable one, and the ensembl name for the importable
one? Also, what is the linkname in this case? The ensembl field name or the
trans one?
When we refresh the config file, restart apache etc., we see 2 databases in our
martview: trans and ensembl. There is no way to query both at the same time,
only separately as discrete DBs. The link is not working between the two.
Can you help?
Thanks
Jenny and Luca
From: Junjun Zhang [mailto:[email protected]]
Sent: 29 April 2009 15:30
To: Mead, Jennifer
Subject: Re: [mart-dev] Federating queries
Hi Jennifer,
Just thought the following email I sent to other users may be helpful to you as
well.
Please feel free to let us know if you have any further questions.
Regards,
Junjun
______________________________________________
Let me try to give you a brief explanation of how a join query is done in BioMart.
Joining different datasets is done through an Importable/Exportable pair
predefined using MartEditor. An Importable acts as a filter: it points to a
filter (which you have defined earlier under one of the FilterPages). Similarly,
an Exportable acts as an attribute: it points to an attribute. See the
screenshot below for an example of how an Importable/Exportable pair is defined.
Once the pair is defined in MartEditor (Importable in one dataset, Exportable
in the other; for more detailed instructions, please see the document Christina
sent to you, page 10?), we then prepare a registry file that includes both
datasets. Finally, we run bin/configure.pl to configure martview with the
registry settings. It is at this step that all the links (Importable/Exportable
pairs) are determined and stored.
Using the datasets in the screenshot as the example: you select both datasets
in the MartView GUI, choose attributes from both datasets, set filters on one or
both datasets, then fire the query. Let's say dataset MSD would return 100 rows
and dataset gene_ensembl would return 3000 rows if the query were run
independently on each dataset. Since there is a link between these two
datasets, dataset MSD exports all 100 'pdb_id' values to dataset gene_ensembl as
an ID list, which is used to filter (search for matching IDs in) the 3000 rows
returned by gene_ensembl. After filtering, only rows with matching pdb_ids (the
intersection set) are joined and reported; hence, a join query.
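The join described above can be sketched roughly as follows. This is plain
Python for illustration, not BioMart code: federated_join, the row values and
the 'pdb' filter column on the gene_ensembl side are all made up, and the two
result sets are cut down from 100 and 3000 rows to two rows each.

```python
def federated_join(exporting_rows, importing_rows, export_attr, import_filter):
    """Join two result sets through an Exportable/Importable pair."""
    # The exporting dataset hands over the values of its Exportable attribute
    # as an ID list.
    id_list = {row[export_attr] for row in exporting_rows}

    # The importing dataset applies that ID list through its Importable filter.
    kept = [row for row in importing_rows if row[import_filter] in id_list]

    # Rows with matching IDs (the intersection set) are joined and reported.
    by_id = {}
    for row in exporting_rows:
        by_id.setdefault(row[export_attr], []).append(row)
    return [{**left, **row} for row in kept for left in by_id[row[import_filter]]]


# Made-up stand-ins for the MSD and gene_ensembl result sets.
msd_rows = [
    {"pdb_id": "1abc", "experiment_type": "x-ray"},
    {"pdb_id": "2xyz", "experiment_type": "nmr"},
]
gene_rows = [
    {"ensembl_gene_id": "ENSG_A", "pdb": "1abc"},
    {"ensembl_gene_id": "ENSG_B", "pdb": "9zzz"},
]

result = federated_join(msd_rows, gene_rows, "pdb_id", "pdb")
print(result)
```

Only the row whose ID appears in both result sets is reported; rows found in
just one dataset are dropped.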
As in your example with multiple datasets to be joined, we can do it by
chaining the process described above, i.e., exporting IDs from dataset 1 to
dataset 2, exporting the intersection IDs to dataset 3, and so on until the
last one. For non-standard IDs, as Christina pointed out, you can keep the
standard ID and all of its synonyms in a dimension table and define a filter on
the ID synonym column. When filtering on this filter, matching any of the
synonyms will lead to retrieving the desired row.
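As a rough sketch of the chaining and the synonym lookup (again plain Python
for illustration, with made-up IDs; chain_ids and resolve_synonym are
hypothetical helper names, not BioMart functions):

```python
def chain_ids(id_sets):
    """Pass the surviving IDs from one dataset to the next, in join order."""
    ids = set(id_sets[0])
    for next_ids in id_sets[1:]:
        # Each dataset keeps only the IDs it also contains, then exports them.
        ids &= set(next_ids)
    return ids


def resolve_synonym(synonym_table, name):
    """Return the standard ID whose synonym list (or own name) matches."""
    for standard_id, synonyms in synonym_table.items():
        if name == standard_id or name in synonyms:
            return standard_id
    return None


# Made-up link IDs for three datasets.
survivors = chain_ids([{"g1", "g2", "g3"}, {"g2", "g3", "g4"}, {"g3", "g5"}])

# Made-up dimension table: standard ID mapped to its synonyms.
synonyms = {"g3": ["gene-three", "G3_OLD"], "g4": ["gene-four"]}
hit = resolve_synonym(synonyms, "G3_OLD")
miss = resolve_synonym(synonyms, "unknown")
print(survivors, hit, miss)
```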
As you may know, the current BioMart 0.7 release supports joining two datasets.
Joining more than two datasets will be supported in the next release.
Hope this is useful. Please feel free to write to us should you have any
questions.
Junjun