On 16 Feb 2007, at 21:11, Steffen Durinck wrote:
Hi Arek,
Today I noticed that Gramene is present in the central registry and I
tried to get access to it via biomaRt.
Getting the datasets and xml configuration file works fine but an
actual query fails on the biomaRt side.
The reason is a slight difference in output of the webservices:
For example the following query:
"<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
virtualSchemaName = 'default' count = '0' softwareVersion = '0.5'
requestId= 'biomaRt'> <Dataset name =
'athaliana_gene_ensembl'><Attribute name =
'ensembl_gene_id'/><ValueFilter name = 'chromosome_name' value = '1'
/></Dataset></Query>"
to http://www.gramene.org/Multi/martservice returns:
"AT1G78350-TAIR-G\t\nAT1G31960-TAIR-G\t\nAT1G30610-TAIR-
G\t\nAT1G13760-TAIR-G\t\nAT1G11060-TAIR-G\t\n"
While the similar query:
"<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
virtualSchemaName = 'default' count = '0' softwareVersion = '0.5'
requestId= 'biomaRt'> <Dataset name =
'hsapiens_gene_ensembl'><Attribute name =
'ensembl_gene_id'/><ValueFilter name = 'chromosome_name' value = '22'
/></Dataset></Query>"
to the Ensembl webservice returns:
"ENSG00000100151\nENSG00000100395\nENSG00000100399\n"
So there is an extra \t in the output of Gramene which makes my
results parser fail as it expects an extra data field after a \t. Are
there any rules on how the output of the martservices is formatted
(especially the use of \n and \t) so I can make my results parser work
for all BioMart databases?
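
One way a results parser could tolerate both output styles is sketched below. This is only an illustration in Python, not biomaRt code: the function name parse_mart_rows and its parameters are hypothetical, and it assumes the client knows how many attributes it requested, so a single extra empty field at the end of a row can be recognised as a trailing tab rather than as data.

```python
def parse_mart_rows(text, n_attributes):
    """Split tab/newline-delimited output into rows of n_attributes fields.

    Hypothetical helper: tolerates rows that end with one extra tab.
    """
    lines = text.split("\n")
    # The output ends with a newline, so the last element is an empty string.
    if lines and lines[-1] == "":
        lines.pop()

    rows = []
    for line in lines:
        fields = line.split("\t")
        # A tab printed after every field yields one extra, empty last field.
        if len(fields) == n_attributes + 1 and fields[-1] == "":
            fields.pop()
        if len(fields) != n_attributes:
            raise ValueError(
                "expected %d fields, got %d in %r"
                % (n_attributes, len(fields), line)
            )
        rows.append(fields)
    return rows


if __name__ == "__main__":
    gramene = "AT1G78350-TAIR-G\t\nAT1G31960-TAIR-G\t\n"
    ensembl = "ENSG00000100151\nENSG00000100395\n"
    print(parse_mart_rows(gramene, 1))
    print(parse_mart_rows(ensembl, 1))
```

Checking the field count, rather than blindly stripping a trailing tab, keeps a genuinely empty last attribute intact.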
Hi Steffen,
hmm ... yes, there is a slight inconsistency there. The
new 0.5 prints results correctly through a formatter as:
return join($FIELD_DELIMITER, @{$row}) . $RECORD_DELIMITER;
while the old one prints them, not quite correctly, directly through martservice:
foreach my $result (@{$row}) {
    print "$result\t";
}
print "\n";
so you get an additional '\t'.
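
The difference between the two approaches can be reproduced in a few lines. The sketch below is a Python rendering for illustration only, not the BioMart Perl code: the function names are hypothetical, and the delimiter values (tab and newline) are an assumption inferred from the Ensembl output quoted above.

```python
FIELD_DELIMITER = "\t"    # assumed value, inferred from the quoted output
RECORD_DELIMITER = "\n"   # assumed value, inferred from the quoted output


def format_row_joined(row):
    """Join-based formatting: delimiters appear only between fields."""
    return FIELD_DELIMITER.join(row) + RECORD_DELIMITER


def format_row_per_field(row):
    """Per-field printing: a tab follows every field, including the last."""
    out = ""
    for result in row:
        out += result + "\t"
    return out + "\n"


if __name__ == "__main__":
    row = ["AT1G78350-TAIR-G"]
    print(repr(format_row_joined(row)))
    print(repr(format_row_per_field(row)))
```

With a single-attribute row, the first function ends the record right after the value, while the second emits a tab before the newline, which matches the Gramene output shown earlier.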
Let us strip off this trailing '\t' so that Gramene is in sync with
0.5. We'll get back to you when it is fixed.
a.
Cheers,
Steffen
--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------