David M. Goodstein
Joint Genome Institute / Lawrence Berkeley National Lab
Center for Integrative Genomics / UC Berkeley
On 18 Nov 2007, at 01:10, Arek Kasprzyk wrote:
On 18 Dec 2007, at 04:42, David M. Goodstein wrote:
I think that would work quite well, as long as the regions could
also specify strand. Is that possible?
Yes, strand is certainly possible. Could you also give us an idea of
how many features (regions) your typical 'intersection use case'
consists of? (10s, 100s, 1000s, 10000s, 100000s, etc.?)
I must warn you that this kind of intersection scales quite poorly
for a large number of regions, so it would be good if we could assess
the practicality of such a solution beforehand. In the long run it
will be possible to optimize it properly, but in the short term you
would have to cope with the performance as it stands at the moment.
If this achieves acceptable performance (15 to 30 seconds) for 10,000
regions, that's great. If it needs to be 1,000 or fewer, that's still
OK, because I can programmatically break the sets up into smaller
non-overlapping sets (e.g., quicksort and partitioning, which should
cut the calculation time roughly by a factor of the number of
subsets). It would be nice if the algorithm took care of this
automatically, but it's not essential.
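The sort-and-partition step described above could look roughly like the Perl sketch below. It assumes regions are held as [chr, start, end] array refs and that each batch is sent as one comma-separated "chr:start:end" filter value, as in Syed's message; the batch_regions name and the batch limit are hypothetical. Sorting by chromosome and start before chunking keeps each batch to a contiguous, non-interleaved stretch of the sorted genome.

```perl
use strict;
use warnings;

# Hypothetical helper: split regions into batches of at most $max regions.
# Each region is an array ref [chr, start, end] (assumed layout).
sub batch_regions {
    my ($regions, $max) = @_;

    # Sort by chromosome (plain string order), then start, then end, so
    # each batch holds a contiguous run of the sorted regions.
    my @sorted = sort {
        $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
    } @$regions;

    my @batches;
    while (my @chunk = splice(@sorted, 0, $max)) {
        # Join each batch into one "chr:start:end,chr:start:end" value.
        push @batches, join(',', map { join(':', @$_) } @chunk);
    }
    return @batches;
}

my @regions = (
    ['7',  115597757, 117475182],
    ['1',  100,       100000],
    ['12', 1000000,   4000000],
);
print "$_\n" for batch_regions(\@regions, 2);
```

The chromosome sort is lexical ('1' < '12' < '7'); that is enough to keep each chromosome's regions together, so a natural chromosome order is not needed just for batching.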
--David
cheers,
a.
--David
Syed Haider wrote:
Hi David,
For both the Perl and web service APIs, it will look like a normal
filter representing a genomic region (chr, start, end). If you look
at BioMart MartView (Ensembl Gene -> human), under filters you will
see 'Encode region', which has preset values. You should be able to
assign your own set of values to the new genomic region filter in
just the same way, and you will be able to upload as many segments
as you want.
For instance, in a web service call:
<Filter name="genomic_region" value="7:115597757:117475182"/>
or
<Filter name="genomic_region" value="7:115597757:117475182,1:100:100000,12:1000000:4000000"/>
The equivalent Perl API call would be:
$query->addFilter("genomic_region", ["7:115597757:117475182"]);
or
$query->addFilter("genomic_region", ["7:115597757:117475182,1:100:100000,12:1000000:4000000"]);
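Because the comma separates regions and the colon separates fields, a stray comma inside a region (e.g. "12:1000000,4000000") silently turns one region into two malformed ones. A minimal client-side check in Perl, assuming the "chr:start:end" layout above; the parse_regions helper is hypothetical, not part of the BioMart API, and the optional fourth strand field (1 or -1) is only an assumption, since the thread says strand is possible but does not fix its syntax.

```perl
use strict;
use warnings;

# Hypothetical validator for a genomic_region value such as
# "chr:start:end[,chr:start:end...]". An optional 4th field is treated
# as strand (1 or -1); that syntax is an assumption.
sub parse_regions {
    my ($value) = @_;
    my @regions;
    for my $item (split /,/, $value) {
        my @f = split /:/, $item;
        die "malformed region '$item'\n" unless @f == 3 || @f == 4;
        my ($chr, $start, $end, $strand) = @f;
        die "missing chromosome in '$item'\n" unless length $chr;
        die "non-numeric coordinates in '$item'\n"
            unless $start =~ /^\d+$/ && $end =~ /^\d+$/;
        die "start > end in '$item'\n" if $start > $end;
        die "bad strand in '$item'\n"
            if defined $strand && $strand !~ /^-?1$/;
        push @regions, { chr => $chr, start => $start, end => $end,
                         strand => $strand };
    }
    return @regions;
}

my @ok = parse_regions("7:115597757:117475182,1:100:100000");
printf "%d regions\n", scalar @ok;

# A comma in place of the second colon splits one region into two bad ones.
eval { parse_regions("12:1000000,4000000") };
print "rejected: $@" if $@;
```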
Hope that will enable you to feed the BioMart system with multiple
region values.
regards
syed
On Sat, 2007-11-17 at 19:10 +0000, Arek Kasprzyk wrote:
On 14 Nov 2007, at 07:32, David M. Goodstein wrote:
I was wondering if there are any shortcuts that enable UCSC
table browser-style intersection queries in BioMart. The
typical application would be to grab all the genes that
overlap a given set of sequences (e.g., ESTs) aligned to the
reference genome. Or does one need to retrieve all the spans
for the alignments in question and then directly query
BioMart for overlap with each span?
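The intersection asked about here reduces to an interval overlap test: a gene overlaps a span when both are on the same chromosome and each starts no later than the other ends. A naive Perl sketch of that test, assuming 1-based inclusive coordinates and hypothetical hash layouts for genes and spans (the data below is invented, not real Ensembl genes or ESTs). It compares every gene with every span, so its cost grows with the product of the two counts; it is an illustration of the test, not BioMart's implementation.

```perl
use strict;
use warnings;

# Two 1-based inclusive intervals overlap when each starts no later than
# the other ends. Strand is compared only when both records carry one.
sub overlaps {
    my ($g, $s) = @_;
    return 0 unless $g->{chr} eq $s->{chr};
    return 0 if defined $g->{strand} && defined $s->{strand}
                && $g->{strand} != $s->{strand};
    return $g->{start} <= $s->{end} && $s->{start} <= $g->{end};
}

# IDs of genes overlapping at least one span (naive O(genes x spans)).
sub genes_overlapping {
    my ($genes, $spans) = @_;
    my @hits;
    for my $g (@$genes) {
        for my $s (@$spans) {
            if (overlaps($g, $s)) { push @hits, $g->{id}; last }
        }
    }
    return @hits;
}

# Hypothetical example data.
my @genes = (
    { id => 'geneA', chr => '7', start => 1000, end => 5000, strand => 1 },
    { id => 'geneB', chr => '7', start => 8000, end => 9000, strand => -1 },
    { id => 'geneC', chr => '1', start => 100,  end => 200,  strand => 1 },
);
my @spans = (
    { chr => '7', start => 4500, end => 4700, strand => 1 },  # inside geneA
    { chr => '7', start => 8500, end => 8600, strand => 1 },  # geneB, other strand
);
print join(',', genes_overlapping(\@genes, \@spans)), "\n";
```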
Hi David,
Syed (cc'ed on this email) looked into the fix in more detail
and it appears that we would be able to implement it without
too much trouble.
This means that we could add it to the existing Ensembl mart
config so that it is available to you as early as Ensembl 48
(scheduled for early December).
Would that work for you?
Syed, could you give David a code snippet for the API and web
service showing how this would work in theory, so we can make sure
that this is what he wants before implementing anything? :)
cheers,
a.
regards,
-David
David M. Goodstein
Computational Genomics Group
Joint Genome Institute
Lawrence Berkeley National Lab
------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
------------------------------------------------------------------------