On 12 Mar 2007, at 18:53, Julian Catchen wrote:

That worked like a charm -- I am now downloading my sequence data. The web services interface is slick and a whole lot cleaner than the previous URL GET method.


:-)



One detail with regard to the XML query builder: it doesn't add formatting options to the output, even if you selected them in the web interface, which might be useful to document a little better.

This is probably even better fixed rather than documented ;) I noticed the same problem. Formatting options should be automatically added.

That leads me to one last question: how do I specify that I want the downloaded data to be gzipped?

I'm not sure this is currently supported, but we'll look into it and see what we can do.
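Until server-side compression is confirmed, one possible workaround is to compress the data on the client as it is written to disk. This is only a rough Python sketch, assuming the response arrives as an iterable of byte chunks; the helper name `write_gzipped` and the file name in the usage line are made up for illustration and are not part of BioMart.

```python
import gzip


def write_gzipped(chunks, path):
    """Write an iterable of byte chunks to a gzip-compressed file.

    Returns the number of uncompressed bytes written.
    """
    total = 0
    with gzip.open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total
```

Something like `write_gzipped(response_chunks, "cdna.fa.gz")` would then leave a compressed FASTA file on disk, although the transfer itself is still uncompressed.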

 Last last question: is there a published DTD for the query format?


I am afraid not. We tend to rely on people building their XML through martview, and the format is really trivial, but we could certainly publish one as you suggest.

a.


Thanks very much,

julian

Here is an example query that will produce a FASTA file of cDNA sequences, in case anyone else was interested:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" Header = "1" formatter = "FASTA" count = "" softwareVersion = "0.5" >
<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
    <Attribute name = "cdna" />
    <Attribute name = "str_chrom_name" />
    <Attribute name = "gene_stable_id_v" />
    <Attribute name = "transcript_stable_id_v" />
    <Attribute name = "translation_stable_id_v" />
    <Attribute name = "transcript_chrom_strand" />
</Dataset>
</Query>
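
For anyone generating such queries from a script rather than by hand, the query above could be assembled roughly as follows. This is a sketch in Python, not an official client: `build_query` is a hypothetical helper, and the fixed `Query` attributes are simply copied from the example in this message.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, formatter=None, filters=None):
    """Return a BioMart-style query document as a string.

    The element and attribute names mirror the example in this thread;
    nothing here is checked against the server.
    """
    query_attrs = {
        "virtualSchemaName": "default",
        "Header": "1",
        "count": "",
        "softwareVersion": "0.5",
    }
    if formatter:
        # The formatter has to be added explicitly, e.g. "FASTA".
        query_attrs["formatter"] = formatter
    query = ET.Element("Query", query_attrs)
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body
```

Calling `build_query("hsapiens_gene_ensembl", ["cdna", "str_chrom_name"], formatter="FASTA")` yields a document of the same shape as the one above, with the formatter set explicitly.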

Arek Kasprzyk wrote:
On 12 Mar 2007, at 18:00, Julian Catchen wrote:
Hi Arek,

Thanks very much for the reply. When I press the XML button from within martview (looking for cDNA sequences for human) it only gives me the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" Header = "1" count = "" softwareVersion = "0.5" >
</Query>

The web interface still delivers the correct data, so I am fairly sure I am asking for the right things. However, I am still unable to construct an XML query that will give me FASTA-formatted sequence data.

When I only ask for sequence IDs, such as in the example you posted in your message below, the XML output works, and I get a proper copy of the XML query.

Any additional help or documentation would be greatly appreciated--

julian

Hi Julian,
there seems to be a small but annoying bug in the XML dumping. If you remove the 'biotype' attribute, the silly thing resets itself, giving you an 'empty' XML as in your example above. We'll be dealing with this problem shortly. Meanwhile, if you want to remove biotype, you need to re-check the attributes again to get a correct XML. Annoyingly, this seems to affect only this particular header attribute in sequences; the rest seems to be working fine.
Please give us a shout if you spot anything else,
a.
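
Until the bug is fixed, a script can guard against submitting the 'empty' XML described above by checking that the query names at least one dataset with at least one attribute. A minimal Python sketch; `is_empty_query` is a hypothetical helper, not part of any BioMart tool:

```python
import xml.etree.ElementTree as ET


def is_empty_query(xml_text):
    """Return True if the query has no Dataset, or no Dataset with an Attribute."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    datasets = root.findall("Dataset")
    if not datasets:
        return True
    return all(ds.find("Attribute") is None for ds in datasets)
```

If it returns True for the XML copied from martview, re-check the attributes in the web interface and export the XML again.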
Arek Kasprzyk wrote:
On 12 Mar 2007, at 03:48, Julian Catchen wrote:
Hello,

Does anyone have any example XML queries they are using to poll the Ensembl biomart interface? I have gotten some simple examples working that pull down lists of ensembl IDs by using examples from the documentation. However, I can't seem to find any examples of how to query for FASTA formatted cDNA sequences or translations. Also, I can't find any documentation of how to request gzipped data.

Hi Julian,
please go to www.biomart.org/biomart/martview, create your favourite query using MView and click the XML button. This will give you the exact XML format required for your web service query. In principle, anything that you can do with MView you should also be able to do with webservice XML. If not, then we need to fix it.
In order to invoke a formatter you simply need to add 'formatter="FASTA"' to your XML query. For example, the query below will give you peptides from chromosome 22 in FASTA format:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" Header = "1" count = "" softwareVersion = "0.5" formatter="FASTA" >
        <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
                <Attribute name = "peptide" />
                <Attribute name = "str_chrom_name" />
                <Attribute name = "gene_stable_id" />
                <Attribute name = "biotype" />
                <Filter name = "chromosome_name" value = "22"/>
        </Dataset>
</Query>
You can run this query using the webExample.pl script:
http://cvs.sanger.ac.uk/cgi-bin/viewcvs.cgi/biomart-perl/scripts/webExample.pl?view=markup
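
For those not using Perl, the same kind of request could be sketched in Python along these lines. Two things here are assumptions rather than facts stated in this thread: the service address (only the martview address is given above, so the `martservice` path is a guess at its web service counterpart) and the name of the form field, `query`, used to carry the XML. The helper names are made up for illustration.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: this thread only mentions the martview address.
MARTSERVICE_URL = "http://www.biomart.org/biomart/martservice"


def make_request(query_xml, url=MARTSERVICE_URL):
    """Build (but do not send) a POST request carrying the query XML.

    The form field name "query" is an assumption.
    """
    data = urllib.parse.urlencode({"query": query_xml}).encode("ascii")
    return urllib.request.Request(url, data=data, method="POST")


def fetch(query_xml, url=MARTSERVICE_URL):
    """Send the query and return the raw response body as bytes."""
    with urllib.request.urlopen(make_request(query_xml, url)) as resp:
        return resp.read()
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be posted before any network call is made.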
I used to have all of these automated through simple URL GET queries that no longer seem to work with Ensembl post version 41.

URL GET queries still work, but the format has changed. We have not yet documented it properly. I am cc-ing your email to mart-dev so that someone there can send you a few examples.
hope that helps,
a.
Any pointers to examples or documentation for XML queries would be appreciated.

Thanks,

julian

-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------

--
Julian M Catchen
Computer and Information Science |
Institute of Neuroscience        | [EMAIL PROTECTED]
University of Oregon             | http://www.cs.uoregon.edu/~catchen/