Hi David,

It isn't the processing of coordinates that takes the major proportion of the time; usually it's the time taken by the database to return the results. Could you please check whether indices are in place on the table that serves the sequence coordinates?
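
As a rough illustration of that check, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table name seq_coords, its columns, and the helper has_index_on are hypothetical and are not taken from any real mart schema. On a MySQL-backed mart the equivalent check would be SHOW INDEX FROM followed by the table name.

```python
import sqlite3


def has_index_on(conn, table, column):
    """Return True if some index on `table` has `column` as its leading column."""
    for row in conn.execute(f"PRAGMA index_list('{table}')"):
        index_name = row[1]  # row layout: (seq, name, unique, origin, partial)
        cols = [r[2] for r in conn.execute(f"PRAGMA index_info('{index_name}')")]
        if cols and cols[0] == column:
            return True
    return False


# Hypothetical coordinates table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seq_coords ("
    "transcript_id INTEGER, chrom TEXT, seq_start INTEGER, seq_end INTEGER)"
)

# No index exists yet on the lookup column.
before = has_index_on(conn, "seq_coords", "transcript_id")

# Add an index on the column used to look up coordinates, then check again.
conn.execute("CREATE INDEX idx_seq_coords_tid ON seq_coords (transcript_id)")
after = has_index_on(conn, "seq_coords", "transcript_id")

print(before, after)
```

If the lookup column turns out to have no index, adding one is usually the first thing to try before looking at the sequence-processing code.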

thanks
Syed


David M. Goodstein wrote:




On 7 Jul 2009, at 12:23, Syed Haider wrote:

Hi Rochak,

Rochak Neupane wrote:
When querying for sequences on a dataset without any filters (so as to grab the complete set of sequences), marts seem to be quite slow. Pulling peptides for human, for example, took in excess of 7 minutes from the Ensembl mart. Grabbing peptide sequences for Caenorhabditis elegans (wormbase db, gene dataset from biomart.org <http://biomart.org>) also took about 7 minutes for a file that turns out to be 9 MB. Our own mart is quite slow when querying a complete set of sequences from an organism. Is it typical for BioMart to take 7-8 minutes or more when querying for whole-genome sequences?

a- Are you using GenomicSequence to retrieve sequences?

b- Do you have the ORDER BY property set on the sequence exportables (if 'a' is true)? Setting ORDER BY slows down the response considerably, and it's only required to cope with inconsistencies in the row order that should really be fixed at the mart (database) construction level.

Best,
Syed


Removing the ORDER BY does improve performance (from 15 minutes down to 5 minutes for an unfiltered FASTA grab of approx 30k peptides), but that is still not really user-tolerable. Is that really the expected behavior?

--David

David M. Goodstein
Joint Genome Institute / Lawrence Berkeley National Lab
Center for Integrative Genomics / UCBerkeley
http://www.phytozome.net



Thanks,
rochak
