On Nov 14, 2007, at 7:28 AM, Arek Kasprzyk wrote:


On 14 Nov 2007, at 07:32, David M. Goodstein wrote:

I was wondering if there are any shortcuts that enable UCSC table browser-style intersection queries in BioMart. The typical application would be to grab all the genes that overlap a given set of sequences (e.g., ESTs) aligned to the reference genome. Or does one need to retrieve all the spans for the alignments in question and then directly query BioMart for overlap with each span?


Hi David,
we are currently testing additional functionality in mart that would let you intersect two gene lists, e.g. gene list A (uniprot) and gene list B (entrez genes), and retrieve the intersection between them based on mapping to the same core ids, i.e. ensembl. Is this what you are after, or do you mean something else?
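
The list intersection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not BioMart's implementation: the function name, the mapping dictionaries, and every identifier in the example data are hypothetical placeholders.

```python
def intersect_by_core_id(list_a, list_b, a_to_core, b_to_core):
    """Return the core IDs that are reachable from both input lists.

    a_to_core / b_to_core map each external ID to the core IDs it
    resolves to (hypothetical, precomputed mappings).
    """
    core_a = {core for ext in list_a for core in a_to_core.get(ext, ())}
    core_b = {core for ext in list_b for core in b_to_core.get(ext, ())}
    return core_a & core_b


# Placeholder identifiers, invented for illustration only.
uniprot_to_core = {"UP_1": ["CORE_1"], "UP_2": ["CORE_2"], "UP_3": ["CORE_3"]}
entrez_to_core = {"EG_10": ["CORE_2"], "EG_11": ["CORE_3"], "EG_12": ["CORE_4"]}

shared = intersect_by_core_id(
    ["UP_1", "UP_2", "UP_3"],
    ["EG_10", "EG_11", "EG_12"],
    uniprot_to_core,
    entrez_to_core,
)
```

Note that this approach only works when both lists already map to the same core ID space.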

That would work if there is an a priori mapping between the lists, or to a third set. I think this capability exists already in some form in terms of the "Second dataset" option in BioMart. The problem for my application is that I would need to precompute that mapping (essentially calculating the overlap between features in list 1 and list 2) and include the linkage between them. That's not prohibitive, but it would be static (I wouldn't be able, for example, to adjust the %id threshold on the fly for the EST track and get back a different set of overlapping genes).
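
To make the on-the-fly case concrete, here is a minimal sketch of computing the overlap at query time, with the %id threshold as a parameter rather than baked into a precomputed mapping. It assumes 1-based inclusive coordinates and a naive all-pairs comparison; the record types, function names, and example data are hypothetical and not part of BioMart or Gbrowse.

```python
from collections import namedtuple

# Hypothetical record types; coordinates are assumed 1-based and inclusive.
Gene = namedtuple("Gene", "name chrom start end")
Alignment = namedtuple("Alignment", "name chrom start end pct_id")


def overlaps(a, b):
    """True if two features share at least one base on the same sequence."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_overlapping(genes, alignments, min_pct_id=0.0):
    """Names of genes overlapping any alignment with pct_id >= min_pct_id."""
    kept = [al for al in alignments if al.pct_id >= min_pct_id]
    return [g.name for g in genes if any(overlaps(g, al) for al in kept)]


# Invented example data.
genes = [
    Gene("geneA", "chr1", 100, 200),
    Gene("geneB", "chr1", 500, 600),
    Gene("geneC", "chr2", 100, 200),
]
ests = [
    Alignment("est1", "chr1", 150, 250, 98.0),
    Alignment("est2", "chr1", 550, 560, 90.0),
    Alignment("est3", "chr2", 300, 400, 99.0),
]

loose = genes_overlapping(genes, ests)                    # no threshold
strict = genes_overlapping(genes, ests, min_pct_id=95.0)  # stricter %id cutoff
```

Changing `min_pct_id` changes the returned gene set without recomputing any stored linkage. The all-pairs scan is quadratic, so for large tracks an interval index would be a more realistic choice.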

What I'm really hoping for is a combination of Gbrowse's ability to handle the upload and display of a user track in real time and BioMart's ability to filter a gene set (in my app, the filter would include the requirement that members of the gene set overlap features in the user track). It sounds like my best bet is to create a Gbrowse plugin to handle this (it was always going to be a bit iffy to have BioMart handle temporary user-uploaded feature sets), but I wanted to avoid reinventing the wheel (and get the bonus of the existing BioMart filtering capabilities) if possible.


a.


regards,
-David


David M. Goodstein
Computational Genomics Group
Joint Genome Institute
Lawrence Berkeley National Lab



-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------



