Arek Kasprzyk wrote:
On 23 Aug 2006, at 15:48, Tom Oinn wrote:
Damian Smedley wrote:
How about a clear policy as to what forms of access are legal - a
sensible service interface suggests that bulk querying is legitimate
surely?
I don't want to ban people from doing bulk querying, though putting all the
IDs into one query is obviously much more efficient.
I'm not sure it's always equivalent, though? I agree it's a problem;
obviously if the server's going down you need to do something to
resolve that, but hopefully we can work out some kind of best practice
/ code change in Taverna to help out as well.
David Withers is our developer for the BioMart side of things. I now
know relatively little of how it works internally, but I believe he's
on the list as well :)
Cheers,
Tom
ok, some more details about this problem. I hope we can work this out
together, as we do not want to ban anybody from doing anything but
simply to optimize the access so it works optimally for Taverna as
well as for us.
(apologies for the massive cross-posting, but I'm not sure which list all
the relevant people are subscribed to :))
please feel free to redirect or narrow down this discussion, or even
reject it if you do not recognize the Taverna request pattern :)
This should be going to taverna-hackers; everyone appropriate is on
there I think. Add mart-dev if there are people at your end who need to see it.
ok, here it goes:
The BioMart central server went down twice after a series of over 100,000
requests coming from a single source over a relatively short period of
time. After analyzing the access logs and contacting the people who
were firing those requests, it seems that they originated from
Taverna workflows.
The requests came in the following pattern:
<snip>
After further analysis of the logs, it seems that those users wanted
sequences for ~300 Ensembl transcripts. This in itself is a perfectly
valid and sensible use case.
However, what is unclear to me is why it is necessary to request each
sequence individually and, more importantly, why for each query the
software (Taverna?) needs to undergo a full configuration (as above).
Surely this could be done once and then be followed either by individual
queries if necessary or, better still, by fewer queries making requests
in batches. This is normally a lightweight and sensible request when done
properly. For comparison, I enclose below an example of exactly the same
usage sent as a single query, together with a small Perl script which
quickly and harmlessly retrieves it from our web service, so you can run
and compare.
In this case that sounds plausible; the time when you want to run one
query per identifier is when you're getting more than one result
returned and want to maintain the mapping from input to output. This
could be done by altering the workflow as well, but the most obvious way
to use the BioMart process within Taverna will tend to make lots of
distinct queries.
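The input-to-output mapping concern above can be handled while still batching, provided the batched result echoes the query ID on each row. A minimal sketch, assuming a tab-separated result whose first column is the input ID (the column layout and IDs here are invented):

```python
from collections import defaultdict

def group_by_input_id(rows):
    """Group tab-separated result rows by their first column (the query ID),
    preserving the input-to-output mapping even when one input ID yields
    several result rows."""
    mapping = defaultdict(list)
    for row in rows:
        transcript_id, value = row.split("\t", 1)
        mapping[transcript_id].append(value)
    return dict(mapping)

# One batched query, many rows back; the mapping is rebuilt afterwards.
rows = [
    "ENST_A\tresult1",
    "ENST_A\tresult2",
    "ENST_B\tresult3",
]
grouped = group_by_input_id(rows)
```

With this, a workflow can issue one batched query and still iterate per-input downstream, which was the reason for the one-query-per-identifier pattern.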
The issue with retrieval of dataset configs has, I think, been fixed in
CVS, but David can confirm or deny that. That should massively reduce the
number of queries once we deploy the new code.
Cheers,
Tom