On Nov 14, 2007, at 7:28 AM, Arek Kasprzyk wrote:
On 14 Nov 2007, at 07:32, David M. Goodstein wrote:
I was wondering if there are any shortcuts that enable UCSC table
browser-style intersection queries in BioMart. The typical
application would be to grab all the genes that overlap a given set
of sequences (e.g., ESTs) aligned to the reference genome. Or does
one need to retrieve all the spans for the alignments in question
and then directly query BioMart for overlap with each span?
Hi David,
we are currently testing additional functionality in mart where you
could intersect two gene lists, e.g.
gene list A (UniProt) and gene list B (Entrez genes), and retrieve the
intersection between them based
on mapping to the same core IDs, i.e. Ensembl. Is this what you are
after, or do you mean something else?
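
To make the idea concrete, here is a minimal sketch of intersecting two gene lists through a shared core ID. It is only an illustration, not how mart implements it: the function names, the mapping dictionaries, and every identifier are hypothetical placeholders, and it assumes each external ID has already been mapped to zero or more core IDs.

```python
def to_core_ids(ids, mapping):
    """Collect every core ID that any of the given external IDs maps to."""
    return {core for ext in ids for core in mapping.get(ext, ())}


def intersect_gene_lists(list_a, map_a, list_b, map_b):
    """Return the core IDs reachable from both lists."""
    return to_core_ids(list_a, map_a) & to_core_ids(list_b, map_b)


# Placeholder identifiers, invented for illustration only.
uniprot_to_core = {"UNIPROT_1": ["CORE_X"], "UNIPROT_2": ["CORE_Y"]}
entrez_to_core = {"ENTREZ_1": ["CORE_Y"], "ENTREZ_2": ["CORE_Z"]}

shared = intersect_gene_lists(
    ["UNIPROT_1", "UNIPROT_2"], uniprot_to_core,
    ["ENTREZ_1", "ENTREZ_2"], entrez_to_core,
)
print(sorted(shared))
```

This only works when both lists can be mapped to the same core ID space ahead of time.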
That would work if there is an a priori mapping between the lists, or
to a third set. I think this capability exists already in some form in
terms of the "Second dataset" option in BioMart. The problem for my
application is I would need to precompute that mapping (essentially
calculating the overlap between features in list 1 and list 2), and
including the linkage between them. That's not prohibitive, but it
would be static (I wouldn't be able, for example, to adjust the %id
threshold on-the-fly for the EST track and get back a different set of
overlapping genes).
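
As a rough sketch of the on-the-fly behaviour I mean, the overlap could be computed at query time after filtering the alignments by %id, rather than being stored as a static linkage. Everything here is an assumption made for illustration: the Feature type, the function names, the 1-based inclusive coordinates, and the example features are all hypothetical, and the all-against-all comparison is the naive approach, not something BioMart or Gbrowse provides.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int             # assumed 1-based, inclusive
    end: int               # assumed inclusive
    pct_id: float = 100.0  # alignment percent identity (unused for genes)


def overlaps(a, b):
    """True if two features share at least one base on the same sequence."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_overlapping(genes, alignments, min_pct_id):
    """Names of genes overlapping any alignment at or above the %id cutoff."""
    kept = [a for a in alignments if a.pct_id >= min_pct_id]
    return [g.name for g in genes if any(overlaps(g, a) for a in kept)]


# Invented example features.
genes = [
    Feature("geneA", "chr1", 100, 500),
    Feature("geneB", "chr1", 1000, 2000),
    Feature("geneC", "chr2", 100, 500),
]
ests = [
    Feature("est1", "chr1", 450, 600, 98.0),
    Feature("est2", "chr1", 1500, 1600, 90.0),
    Feature("est3", "chr2", 600, 700, 99.0),
]

strict = genes_overlapping(genes, ests, min_pct_id=95.0)
relaxed = genes_overlapping(genes, ests, min_pct_id=85.0)
print(strict, relaxed)
```

Changing min_pct_id changes the returned gene set without recomputing any stored mapping, which is the flexibility a precomputed linkage would lose. For large tracks a sorted sweep or an interval index would be needed in place of the nested loop.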
What I'm really hoping for is a combination of Gbrowse's ability to
handle the upload and display of a user track in real-time and
BioMart's ability to filter a gene set (in my app, the filter would
include the requirement that members of the gene set overlap features
in the user track). It sounds like my best bet is to create a Gbrowse
plugin to handle this (it was always going to be a bit iffy to have
BioMart handle temporary user-uploaded feature sets), but I wanted to
avoid reinventing the wheel (and get the bonus of the existing BioMart
filtering capabilities) if possible.
a.
regards,
-David
David M. Goodstein
Computational Genomics Group
Joint Genome Institute
Lawrence Berkeley National Lab
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468