Arek Kasprzyk wrote:
> On 23 Aug 2006, at 15:48, Tom Oinn wrote:
>
>> Damian Smedley wrote:
>>
>>>> How about a clear policy as to what forms of access are legal - a
>>>> sensible service interface suggests that bulk querying is legitimate
>>>> surely?
>>> I don't want to ban people doing bulk querying though putting all the
>>> IDs into one query is obviously much more efficient.
>> I'm not sure it's always equivalent though? I agree it's a problem,
>> obviously if the server's going down you need to do something to
>> resolve that but hopefully we can work out some kind of best practice
>> / code change in Taverna to help out as well.
>>
>> David Withers is our developer for the biomart side of things, I now
>> know relatively little of how it works internally but I believe he's
>> on the list as well :)
>>
>> Cheers,
>>
>> Tom
>>
>
> ok, some more details about this problem. I hope we can work this out
> together as we do not want to ban anybody from doing anything but
> simply to optimize the access so it works in an optimal way for
> taverna as well as for us.
> (apologies for the massive cross-posting but not sure what list all the
> relevant people are subscribed to :))
> please feel free to redirect, narrow down this discussion or even
> reject if you do not recognize the taverna request pattern :)
>
> ok, here it goes:
> BioMart central server went down twice after a series of over 100 000
> requests coming from a single source over a relatively short period of
> time. After analyzing the access logs and contacting the guys who were
> firing those requests it seems that they originated from taverna
> workflows.
>
> the requests came in the following pattern:
>
>
> - - [18/Aug/2006:22:12:33 +0100] "GET /biomart/martservice?type=datasets&mart=sequence HTTP/1.1" 200 1503
> - - [18/Aug/2006:22:12:33 +0100] "GET /biomart/martservice?type=datasets&mart=sequence HTTP/1.1" 200 1503
> - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=snp HTTP/1.1" 200 640
> - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=snp HTTP/1.1" 200 640
> - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=vega HTTP/1.1" 200 343
> - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=vega HTTP/1.1" 200 343
> - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=uniprot HTTP/1.1" 200 490
> - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=uniprot HTTP/1.1" 200 490
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=msd HTTP/1.1" 200 74
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=msd HTTP/1.1" 200 74
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=wormbase HTTP/1.1" 200 336
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=wormbase HTTP/1.1" 200 336
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=configuration&dataset=hsapiens_genomic_sequence&virtualschema=default HTTP/1.1" 200 9161
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=configuration&dataset=hsapiens_genomic_sequence&virtualschema=default HTTP/1.1" 200 9161
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?
> query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF
> -8%22%3F%3E%0D%0A%3C%21DOCTYPE+Query%3E%0D%0A%3CQuery+virtualSchemaName%
> 3D%22default%22+count%3D%220%22%3E%3CDataset+name%3D%22hsapiens_gene_ens
> embl_structure%22%3E%3CAttribute+name%3D%223utr_start%22+%2F%3E%3CAttrib
> ute+name%3D%223utr_end%22+%2F%3E%3CAttribute+name%3D%22gene_stable_id_v%
> 22+%2F%3E%3CAttribute+name%3D%22transcript_stable_id%22+%2F%3E%3CAttribu
> te+name%3D%22str_chrom_name%22+%2F%3E%3C%2FDataset%3E%3CDataset+name%3D%
> 22hsapiens_genomic_sequence%22%3E%3CAttribute+name%3D%223utr%22+%2F%3E%3
> C%2FDataset%3E%3CDataset+name%3D%22hsapiens_gene_ensembl%22%3E%3CFilter+
> name%3D%22ensembl_transcript_id%22+value%3D%22ENST00000358646%22+%2F%3E%
> 3C%2FDataset%3E%3CLinks+source%3D%22hsapiens_gene_ensembl%22+target%3D%2
> 2hsapiens_gene_ensembl_structure%22+defaultLink%3D%22hsapiens_internal_t
> ranscript_id%22+%2F%3E%3CLinks+source%3D%22hsapiens_gene_ensembl_structu
> re%22+target%3D%22hsapiens_genomic_sequence%22+defaultLink%3D%223utr%22+
> %2F%3E%3C%2FQuery%3E%0D%0A HTTP/1.1" 200 5
> - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?
> query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF
> -8%22%3F%3E%0D%0A%3C%21DOCTYPE+Query%3E%0D%0A%3CQuery+virtualSchemaName%
> 3D%22default%22+count%3D%220%22%3E%3CDataset+name%3D%22hsapiens_gene_ens
> embl_structure%22%3E%3CAttribute+name%3D%223utr_start%22+%2F%3E%3CAttrib
> ute+name%3D%223utr_end%22+%2F%3E%3CAttribute+name%3D%22gene_stable_id_v%
> 22+%2F%3E%3CAttribute+name%3D%22transcript_stable_id%22+%2F%3E%3CAttribu
> te+name%3D%22str_chrom_name%22+%2F%3E%3C%2FDataset%3E%3CDataset+name%3D%
> 22hsapiens_genomic_sequence%22%3E%3CAttribute+name%3D%223utr%22+%2F%3E%3
> C%2FDataset%3E%3CDataset+name%3D%22hsapiens_gene_ensembl%22%3E%3CFilter+
> name%3D%22ensembl_transcript_id%22+value%3D%22ENST00000358646%22+%2F%3E%
> 3C%2FDataset%3E%3CLinks+source%3D%22hsapiens_gene_ensembl%22+target%3D%2
> 2hsapiens_gene_ensembl_structure%22+defaultLink%3D%22hsapiens_internal_t
> ranscript_id%22+%2F%3E%3CLinks+source%3D%22hsapiens_gene_ensembl_structu
> re%22+target%3D%22hsapiens_genomic_sequence%22+defaultLink%3D%223utr%22+
> %2F%3E%3C%2FQuery%3E%0D%0A HTTP/1.1" 200 5
>
> after further analyzing the logs it seems like those users wanted
> sequences for ~300 ensembl transcripts. This in itself is a perfectly
> valid and sensible use case.
> However, what is unclear to me is why it is necessary to request each
> sequence individually and, more importantly, why for each query the
> software (taverna?) needs to undergo a full configuration (as above).
> Surely this could be done once and then be followed either by
> individual queries if necessary or, better still, by fewer queries
> doing requests in batches.
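
To make the batching suggestion concrete, here is a minimal Python sketch (not Taverna's actual code) that builds one request per batch of transcript IDs instead of one per ID. The query skeleton is copied from the logged request above. The function names and the batch size of 50 are hypothetical, and it assumes the ensembl_transcript_id filter accepts a comma-separated list of values, which should be checked against the server before relying on it. Nothing is sent over the network here; the dataset and configuration lookups would be done once, outside any loop.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Query skeleton taken from the logged request; only the filter value varies.
QUERY_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\r\n'
    '<!DOCTYPE Query>\r\n'
    '<Query virtualSchemaName="default" count="0">'
    '<Dataset name="hsapiens_gene_ensembl_structure">'
    '<Attribute name="3utr_start" /><Attribute name="3utr_end" />'
    '<Attribute name="gene_stable_id_v" />'
    '<Attribute name="transcript_stable_id" />'
    '<Attribute name="str_chrom_name" /></Dataset>'
    '<Dataset name="hsapiens_genomic_sequence">'
    '<Attribute name="3utr" /></Dataset>'
    '<Dataset name="hsapiens_gene_ensembl">'
    '<Filter name="ensembl_transcript_id" value={ids} /></Dataset>'
    '<Links source="hsapiens_gene_ensembl" '
    'target="hsapiens_gene_ensembl_structure" '
    'defaultLink="hsapiens_internal_transcript_id" />'
    '<Links source="hsapiens_gene_ensembl_structure" '
    'target="hsapiens_genomic_sequence" defaultLink="3utr" />'
    '</Query>\r\n'
)


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(transcript_ids):
    """Return one query XML covering every ID in `transcript_ids`.

    Assumption: the filter accepts comma-separated values.
    """
    # quoteattr escapes the value and wraps it in quotes.
    return QUERY_TEMPLATE.format(ids=quoteattr(",".join(transcript_ids)))


def batch_request_paths(transcript_ids, batch_size=50):
    """Return one martservice request path per batch of IDs."""
    return [
        "/biomart/martservice?" + urlencode({"query": build_query(batch)})
        for batch in batched(list(transcript_ids), batch_size)
    ]
```

With roughly 300 transcript IDs and a batch size of 50, this yields six query requests rather than 300, and the configuration requests need not be repeated for each of them.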
Hi Arek,

I just want to check a couple of things as I investigate this.

1. Are you seeing the above sequence of requests happening 100,000 times?
2. Does each request happen twice or is the logger reporting it twice?

Thanks,
David.

--
David Withers
School of Computer Science, University of Manchester,
Oxford Road, Manchester, M13 9PL, UK.
Tel: +44(0)161 275 0145
