The various public BioMart servers are made available with average-case usage 
in mind. They're configured to give the best performance to a reasonable number 
of small queries. Large single queries, or large numbers of repeated small 
queries, can cause them real problems unless they're configured to cope, but 
that configuration would come at the expense of adequately servicing the 
normal workload. Server admins therefore tend to configure for the average 
case, as this keeps the largest number of people happy.

As with most public resources of this nature, many server admins also have 
policies about restricting access from IPs that over-use the service, in order 
to prevent the service from dying or otherwise degrading in quality for other 
users.

If you need to run large queries, or lots of queries, there are usually three 
options:

  1. Reassess your query. Do you really need to do it that way? Are there 
alternative methods? Can you filter your data more effectively (e.g. instead 
of a long list of genes, specify a range that includes them all and 
post-process the results to remove the ones you're not interested in)? Can you 
reduce the number of attributes (e.g. do you need the gene name returned if 
you've only filtered for one gene)?

  2. Modify your query so that you can break it into lots of small ones, and 
space the small ones out over time so that they don't overload the server or 
cause your IP to get noticed and blocked by the admins.

  3. Install a local copy of BioMart so that you can configure it to cope with 
the pattern of queries you need. Ideally you'd also want local copies of the 
databases involved, if you believe the database connections themselves may be 
getting blocked by admins.

cheers,
Richard

On 16 Nov 2009, at 12:44, Jennifer A. Drummond wrote:

> On Mon, Nov 16, 2009 at 09:41:47AM +0000, Syed Haider wrote:
>> Hi Ariel,
>> 
>> I am pretty confident that it's the size of the SNP dataset that makes 
>> large queries take a long time to run. The safest approach with the SNP 
>> dataset is to use the web interface and ask for the results via the email 
>> option; that should work.
>> 
>> Best,
>> Syed
> 
> I wouldn't normally hijack a thread, but the original poster's question 
> here is so similar to mine from yesterday....
> 
> I was also having trouble pulling a SNP dataset, because I was filtering 
> on human-mouse homologs, which meant putting about 10K gene IDs in the 
> ensembl_gene filter box. This gave timeouts or URL-length errors even in 
> email mode.
> 
> Since I only wanted counts of SNPs per gene anyway, I wrote a script to 
> pull the counts for each individual gene, and that worked fine for a 
> while. But then I got a socket error at about the three thousandth query, 
> and kept getting it when I repeated the query -- even from other machines. 
> Moreover, the machine I'd been using originally stopped being able to pull 
> up Biomart at all, even in a browser; it got server-closed-connection 
> errors.
> 
> The next day, I tried again and everything was suddenly fine; the rest of 
> my queries ran without incident.
> 
> So my question is: did I run into some sort of number of queries limit or 
> bandwidth limit or something, or was it just some kind of odd session bug? 
> 
> It's not just an idle question, because I might need to do some more runs 
> of this type. Also, if there's an obviously better way of getting this 
> sort of data (counts of SNPs per gene), let me know...I may very well be 
> missing something. Thanks!
> 
> =-=-> Jennifer Drummond // [email protected]
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/
