Hi Steffen,
Can you try the query again? The trailing '\t' should now have been
removed.
Thanks,
Will
On Fri, 16 Feb 2007, Arek Kasprzyk wrote:
On 16 Feb 2007, at 21:11, Steffen Durinck wrote:
Hi Arek,
Today I noticed that Gramene is present in the central registry and I
tried to get access to it via biomaRt.
Getting the datasets and xml configuration file works fine but an actual
query fails on the biomaRt side.
The reason is a slight difference in output of the webservices:
For example the following query:
"<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
virtualSchemaName = 'default' count = '0' softwareVersion = '0.5'
requestId= 'biomaRt'> <Dataset name = 'athaliana_gene_ensembl'><Attribute
name = 'ensembl_gene_id'/><ValueFilter name = 'chromosome_name' value = '1'
/></Dataset></Query>"
to http://www.gramene.org/Multi/martservice returns:
"AT1G78350-TAIR-G\t\nAT1G31960-TAIR-G\t\nAT1G30610-TAIR-G\t\nAT1G13760-TAIR-G\t\nAT1G11060-TAIR-G\t\n"
While the similar query:
"<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
virtualSchemaName = 'default' count = '0' softwareVersion = '0.5'
requestId= 'biomaRt'> <Dataset name = 'hsapiens_gene_ensembl'><Attribute
name = 'ensembl_gene_id'/><ValueFilter name = 'chromosome_name' value =
'22' /></Dataset></Query>"
to the Ensembl webservice returns:
"ENSG00000100151\nENSG00000100395\nENSG00000100399\n"
So there is an extra \t in the output of Gramene which makes my results
parser fail as it expects an extra data field after a \t. Are there any
rules on how the output of the martservices is formatted (especially the use
of \n and \t) so I can make my results parser work for all BioMart
databases?
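Until the output is consistent across servers, a client-side parser can be
made tolerant of both shapes. The following is a minimal sketch in Python,
not biomaRt's actual parser: the function name parse_mart_rows and the
n_attributes parameter are hypothetical, and it assumes that fields are
separated by '\t', that records end with '\n', and that the number of
requested attributes is known from the query.

```python
def parse_mart_rows(text, n_attributes):
    """Split raw martservice output into rows of fields.

    Assumes tab-separated fields and newline-terminated records.
    A single trailing tab (as in the Gramene output) is dropped only
    when it yields exactly one surplus empty field, so a genuinely
    empty last attribute is preserved.
    """
    rows = []
    for line in text.split("\n"):
        if line == "":
            # Skip the empty string left after the final newline.
            continue
        fields = line.split("\t")
        if len(fields) == n_attributes + 1 and fields[-1] == "":
            # Surplus empty field caused by a trailing tab.
            fields = fields[:-1]
        rows.append(fields)
    return rows


gramene = "AT1G78350-TAIR-G\t\nAT1G31960-TAIR-G\t\n"
ensembl = "ENSG00000100151\nENSG00000100395\n"

# Both shapes yield one field per row when one attribute was requested.
gramene_rows = parse_mart_rows(gramene, 1)
ensembl_rows = parse_mart_rows(ensembl, 1)
```

Checking the field count rather than stripping every trailing tab avoids
discarding a real empty value in the last column.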
Hi Steffen,
hmm ... yes, there is a slight inconsistency there. The
new 0.5 prints results correctly through a formatter as:
return join($FIELD_DELIMITER, @{$row}) . $RECORD_DELIMITER;
while the old one prints them, not quite correctly, directly through martservice:
foreach my $result (@{$row}){
    print "$result\t";
}
print "\n";
so you get an additional '\t'.
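The difference between the two Perl snippets above can be reproduced in a
few lines. This is only an illustrative Python transliteration, assuming
that $FIELD_DELIMITER is a tab and $RECORD_DELIMITER is a newline (their
values are not shown in this thread); the function names are made up for
the example.

```python
FIELD_DELIMITER = "\t"   # assumed value of $FIELD_DELIMITER
RECORD_DELIMITER = "\n"  # assumed value of $RECORD_DELIMITER


def format_row_joined(row):
    """Formatter style: the delimiter goes only between fields."""
    return FIELD_DELIMITER.join(row) + RECORD_DELIMITER


def format_row_per_field(row):
    """Old martservice style: a tab is printed after every field,
    including the last one, which leaves a trailing tab."""
    out = ""
    for result in row:
        out += result + "\t"
    return out + "\n"


row = ["AT1G78350-TAIR-G"]
joined = format_row_joined(row)        # "AT1G78350-TAIR-G\n"
per_field = format_row_per_field(row)  # "AT1G78350-TAIR-G\t\n"
```

Joining places the delimiter strictly between fields, so a row with N
fields always carries N-1 tabs; the per-field loop always emits N.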
Let us strip off this trailing '\t' so we can get Gramene in sync
with 0.5.
We'll get back to you when it is fixed.
a.
Cheers,
Steffen
--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------