On 11 May 2006, at 16:29, Amir Karger wrote:
From: Arek Kasprzyk [mailto:[EMAIL PROTECTED]
On 9 May 2006, at 21:11, Amir Karger wrote:
You need to watch for attributes that are named as follows:
"dataset.attribute". These are so-called pointer (placeholder) attributes:
they refer to a different dataset from the one they are presented in, so in
your example you cannot use them in hsapiens_gene_ensembl but only in
hsapiens_gene_ensembl_structure.
This rather annoying arrangement for pointer attributes is going to be
removed in 0.5. For now, however, you need to either find the appropriate
dataset for them yourself or just skip any attribute with the
"dataset.attribute" format.
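A minimal sketch of the "just skip them" option, in Python rather than the Perl of biomart-plib. It assumes that a dot in an attribute name is the only marker of a pointer attribute; the function name and the attribute names are hypothetical and are not part of the biomart-plib API.

```python
def split_pointer_attributes(attribute_names):
    """Separate plain attribute names from "dataset.attribute" pointers."""
    plain, pointers = [], []
    for name in attribute_names:
        # Assumption: any name containing a dot is a pointer attribute.
        if "." in name:
            pointers.append(name)
        else:
            plain.append(name)
    return plain, pointers


# Hypothetical attribute names, for illustration only.
names = [
    "gene_stable_id",
    "chr_name",
    "hsapiens_gene_ensembl_structure.exon_start",
]
plain, pointers = split_pointer_attributes(names)
print("usable in this dataset:", plain)
print("skipped pointer attributes:", pointers)
```

Only the names in the first list would then be requested from the dataset they were presented in.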
Perhaps I'm missing something. I want to input HGNC gene names (like
BRCA1) and get out FASTA sequences for all exons in those genes. I would
like the header to have chromosome & gene ID along with exon start, end,
strand. I'm able to do that in martView without selecting a second
dataset. Does it magically work there, but not in biomart-plib (0.4)?
Hi Amir,
In order to use a sequence dataset you also need the (invisible) structure
dataset, which contains the placeholder attributes that get selected in
MartView under the FASTA header. Behind the scenes MartView finds out that
these are placeholders and attaches the structure dataset to which they
belong. For an example of how to do this explicitly, please look at
http://cvs.sanger.ac.uk/cgi-bin/viewcvs.cgi/biomart-plib/scripts/sequenceExample.pl?rev=1.1.2.2&view=markup
MartView only presents visible datasets to the user as the 'second dataset'
choice, i.e.:
bigmac: ~[arek] mysql -uanonymous -hensembldb.ensembl.org -e'select dataset from meta_conf__dataset__main where visible="1"' ensembl_mart_38
+----------------------------+
| dataset                    |
+----------------------------+
| agambiae_gene_ensembl      |
| amellifera_gene_ensembl    |
| btaurus_gene_ensembl       |
| hsapiens_gene_ensembl      |
| cfamiliaris_gene_ensembl   |
| cintestinalis_gene_ensembl |
| dmelanogaster_gene_ensembl |
| drerio_gene_ensembl        |
| frubripes_gene_ensembl     |
| ggallus_gene_ensembl       |
| mdomestica_gene_ensembl    |
| mmulatta_gene_ensembl      |
| mmusculus_gene_ensembl     |
| ptroglodytes_gene_ensembl  |
| rnorvegicus_gene_ensembl   |
| scerevisiae_gene_ensembl   |
| tnigroviridis_gene_ensembl |
| xtropicalis_gene_ensembl   |
| celegans_gene_ensembl      |
+----------------------------+
bigmac: ~[arek]
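The placeholder lookup that MartView does behind the scenes can be pictured roughly as follows. This is only a sketch in Python, not the MartView or biomart-plib code: it assumes the part before the dot in a placeholder name is the dataset the attribute belongs to, and the function and attribute names are hypothetical.

```python
def datasets_to_attach(main_dataset, attribute_names):
    """List the other datasets referenced by "dataset.attribute" names."""
    extra = []
    for name in attribute_names:
        dataset, dot, _attribute = name.partition(".")
        # Names without a dot belong to the main dataset and need nothing extra.
        if dot and dataset != main_dataset and dataset not in extra:
            extra.append(dataset)
    return extra


# Hypothetical attribute selection, for illustration only.
selected = [
    "gene_stable_id",
    "hsapiens_gene_ensembl_structure.exon_start",
    "hsapiens_gene_ensembl_structure.exon_end",
]
print(datasets_to_attach("hsapiens_gene_ensembl", selected))
```

With 0.4 the client has to add each dataset in the returned list to the query itself, as in sequenceExample.pl.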
I admit that the current (0.4) logic for driving both API and web service
queries is slightly complicated, as it requires people to understand visible
and invisible datasets, placeholders, links and, in more complex cases,
pretty much the execution path of the query.
All of this will be removed in 0.5, which will not require clients to deal
with any of it explicitly; the library will deal with it instead.
a.
Thanks,
-Amir
a.
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------