David M. Goodstein
Joint Genome Institute / Lawrence Berkeley National Lab
Center for Integrative Genomics / UCBerkeley



On 18 Nov 2007, at 01:10, Arek Kasprzyk wrote:


On 18 Nov 2007, at 04:42, David M. Goodstein wrote:

I think that would work quite well, as long as the regions could also specify strand. Is that possible?


Yes, strand is certainly possible. Could you also give us an idea of how many features (regions) your typical 'intersection use case' consists of (10s, 100s, 1,000s, 10,000s, 100,000s, etc.)? I must warn you that this kind of intersection scales quite poorly for large numbers of regions, so it would be good if we could assess the practicality of such a solution beforehand. In the long run it will be possible to optimize it properly, but in the short term you would have to cope with the performance as it stands at the moment.


If this achieves acceptable performance (15 to 30 seconds) for 10,000 regions, that's great. If it needs to be 1,000 or fewer, that's still OK, because I can programmatically break up the sets into smaller non-overlapping sets (e.g., quicksort and partitioning, to drop the calculation time by roughly a factor of the number of subsets). It would be nice if the algorithm took care of this automatically, but it's not essential.
--David
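
A minimal sketch of the client-side splitting described above, assuming regions are held as (chromosome, start, end) tuples. The function name `partition_regions` and the default batch size are hypothetical and not part of BioMart; each batch would then be submitted as its own query.

```python
from typing import List, Tuple

# Hypothetical representation of one region: (chromosome, start, end).
Region = Tuple[str, int, int]


def partition_regions(regions: List[Region], max_per_batch: int = 1000) -> List[List[Region]]:
    """Sort regions by chromosome and start, then split them into disjoint batches."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    # Tuples sort by chromosome name (as a string), then start, then end.
    ordered = sorted(regions)
    # Consecutive slices give disjoint batches that together cover every region.
    return [ordered[i:i + max_per_batch] for i in range(0, len(ordered), max_per_batch)]
```

Sorting first keeps neighbouring regions in the same batch, which is the point of partitioning; the batch size would need tuning against the actual query times.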


cheers,
a.


--David


Syed Haider wrote:
Hi David,
For both the Perl and web service APIs, it will look like a normal filter
representing a genomic region (chr,start,end). If you look at BioMart
MartView, Ensembl Gene -> human, under filters you can see 'Encode
region', which has preset values. You should be able to assign your own
set of values to the new genomic region filter in just the same way. You
will be able to upload as many segments as you want.

for instance, in webservice call,

<Filter name = "genomic_region" value = "7:115597757:117475182"/>

or

<Filter name = "genomic_region" value =
"7:115597757:117475182,1:100:100000,12:1000000:4000000"/>

The equivalent Perl API call would be
$query->addFilter("genomic_region", ["7:115597757:117475182"]);
or $query->addFilter("genomic_region",
["7:115597757:117475182,1:100:100000,12:1000000:4000000"]);

Hope that will enable you to feed the BioMart system with multiple region
values.

regards
syed
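
The filter value in the examples above could be assembled programmatically along these lines. This is only a sketch: the filter name `genomic_region` and the chr:start:end format are taken from the proposal in this message, the helper names are hypothetical, and strand is left out because its syntax is not specified here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Hypothetical representation of one region: (chromosome, start, end).
Region = Tuple[str, int, int]


def format_regions(regions: Iterable[Region]) -> str:
    """Join regions into the comma-separated chr:start:end form shown above."""
    parts = []
    for chrom, start, end in regions:
        if start > end:
            raise ValueError(f"start {start} is after end {end} on {chrom}")
        parts.append(f"{chrom}:{start}:{end}")
    return ",".join(parts)


def genomic_region_filter(regions: Iterable[Region]) -> str:
    """Build the <Filter .../> element as a string (attribute values are escaped by ElementTree)."""
    element = ET.Element("Filter", {"name": "genomic_region", "value": format_regions(regions)})
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(genomic_region_filter([("7", 115597757, 117475182), ("1", 100, 100000)]))
```

Building the element through an XML library rather than string concatenation avoids quoting mistakes if region names ever contain special characters.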


On Sat, 2007-11-17 at 19:10 +0000, Arek Kasprzyk wrote:

On 14 Nov 2007, at 07:32, David M. Goodstein wrote:


I was wondering if there are any shortcuts that enable UCSC table browser-style intersection queries in BioMart. The typical application would be to grab all the genes that overlap a given set of sequences (e.g., ESTs) aligned to the reference genome. Or does one need to retrieve all the spans for the alignments in question and then directly query BioMart for overlap with each span?
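
To make "overlap" concrete, here is a small client-side sketch of the intersection being asked for. It assumes inclusive coordinates and compares every gene against every alignment; the `Span` type and function names are hypothetical and say nothing about how BioMart or the UCSC table browser implement this.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """A named interval on a chromosome (inclusive coordinates assumed)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Span, b: Span) -> bool:
    """Two spans overlap if they share a chromosome and neither ends before the other starts."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_overlapping(genes: List[Span], alignments: List[Span]) -> List[Span]:
    """Return the genes that overlap at least one alignment (all-pairs comparison)."""
    return [g for g in genes if any(overlaps(g, aln) for aln in alignments)]
```

Because this checks every pair, the cost grows with the product of the two list sizes, so it is only practical for modest numbers of genes and alignments.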


Hi David,
Syed (cc'ed on this email) looked into the fix in more detail and it appears that we would be able to implement it without too much trouble. This means that we could add it to the existing Ensembl mart config so that it is available to you as early as Ensembl 48 (scheduled for early December).
Would that work for you?

Syed, could you give David a code snippet for the api and web service of how this would work in theory so we could make sure that this is what he wants before implementing anything? :)

cheers,
a.




regards,
-David


David M. Goodstein
Computational Genomics Group
Joint Genome Institute
Lawrence Berkeley National Lab


-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------








