Arek Kasprzyk wrote:
> On 23 Aug 2006, at 15:48, Tom Oinn wrote:
> 
>> Damian Smedley wrote:
>>
>>>> How about a clear policy as to what forms of access are legal - a  
>>>> sensible service interface suggests that bulk querying is legitimate  
>>>> surely?
>>> I don't want to ban people from doing bulk querying, though putting all
>>> the IDs into one query is obviously much more efficient.
>> I'm not sure it's always equivalent, though? I agree it's a problem;
>> obviously if the server's going down you need to do something to resolve
>> that, but hopefully we can work out some kind of best practice / code
>> change in Taverna to help out as well.
>>
>> David Withers is our developer for the BioMart side of things; I now
>> know relatively little of how it works internally, but I believe he's
>> on the list as well :)
>>
>> Cheers,
>>
>> Tom
>>
> 
> ok, some more details about this problem. I hope we can work this out
> together, as we do not want to ban anybody from doing anything but simply
> to optimize the access so it works in an optimal way for Taverna as well
> as for us.
> (apologies for the massive cross-posting, but I am not sure which lists
> all the relevant people are subscribed to :))
> please feel free to redirect or narrow down this discussion, or even
> reject it if you do not recognize the Taverna request pattern :)
> 
> ok, here it goes:
> the BioMart central server went down twice after a series of over 100,000
> requests coming from a single source over a relatively short period of
> time. After analyzing the access logs and contacting the guys who were
> firing those requests, it seems that they originated from Taverna
> workflows.
> 
> the requests came in the following pattern:
> 
> 
>   - - [18/Aug/2006:22:12:33 +0100] "GET /biomart/martservice?type=datasets&mart=sequence HTTP/1.1" 200 1503
>   - - [18/Aug/2006:22:12:33 +0100] "GET /biomart/martservice?type=datasets&mart=sequence HTTP/1.1" 200 1503
>   - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=snp HTTP/1.1" 200 640
>   - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=snp HTTP/1.1" 200 640
>   - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=vega HTTP/1.1" 200 343
>   - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=vega HTTP/1.1" 200 343
>   - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=uniprot HTTP/1.1" 200 490
>   - - [18/Aug/2006:22:12:34 +0100] "GET /biomart/martservice?type=datasets&mart=uniprot HTTP/1.1" 200 490
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=msd HTTP/1.1" 200 74
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=msd HTTP/1.1" 200 74
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=wormbase HTTP/1.1" 200 336
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=datasets&mart=wormbase HTTP/1.1" 200 336
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=configuration&dataset=hsapiens_genomic_sequence&virtualschema=default HTTP/1.1" 200 9161
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?type=configuration&dataset=hsapiens_genomic_sequence&virtualschema=default HTTP/1.1" 200 9161
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%0D%0A%3C%21DOCTYPE+Query%3E%0D%0A%3CQuery+virtualSchemaName%3D%22default%22+count%3D%220%22%3E%3CDataset+name%3D%22hsapiens_gene_ensembl_structure%22%3E%3CAttribute+name%3D%223utr_start%22+%2F%3E%3CAttribute+name%3D%223utr_end%22+%2F%3E%3CAttribute+name%3D%22gene_stable_id_v%22+%2F%3E%3CAttribute+name%3D%22transcript_stable_id%22+%2F%3E%3CAttribute+name%3D%22str_chrom_name%22+%2F%3E%3C%2FDataset%3E%3CDataset+name%3D%22hsapiens_genomic_sequence%22%3E%3CAttribute+name%3D%223utr%22+%2F%3E%3C%2FDataset%3E%3CDataset+name%3D%22hsapiens_gene_ensembl%22%3E%3CFilter+name%3D%22ensembl_transcript_id%22+value%3D%22ENST00000358646%22+%2F%3E%3C%2FDataset%3E%3CLinks+source%3D%22hsapiens_gene_ensembl%22+target%3D%22hsapiens_gene_ensembl_structure%22+defaultLink%3D%22hsapiens_internal_transcript_id%22+%2F%3E%3CLinks+source%3D%22hsapiens_gene_ensembl_structure%22+target%3D%22hsapiens_genomic_sequence%22+defaultLink%3D%223utr%22+%2F%3E%3C%2FQuery%3E%0D%0A HTTP/1.1" 200 5
>   - - [18/Aug/2006:22:12:35 +0100] "GET /biomart/martservice?query=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%0D%0A%3C%21DOCTYPE+Query%3E%0D%0A%3CQuery+virtualSchemaName%3D%22default%22+count%3D%220%22%3E%3CDataset+name%3D%22hsapiens_gene_ensembl_structure%22%3E%3CAttribute+name%3D%223utr_start%22+%2F%3E%3CAttribute+name%3D%223utr_end%22+%2F%3E%3CAttribute+name%3D%22gene_stable_id_v%22+%2F%3E%3CAttribute+name%3D%22transcript_stable_id%22+%2F%3E%3CAttribute+name%3D%22str_chrom_name%22+%2F%3E%3C%2FDataset%3E%3CDataset+name%3D%22hsapiens_genomic_sequence%22%3E%3CAttribute+name%3D%223utr%22+%2F%3E%3C%2FDataset%3E%3CDataset+name%3D%22hsapiens_gene_ensembl%22%3E%3CFilter+name%3D%22ensembl_transcript_id%22+value%3D%22ENST00000358646%22+%2F%3E%3C%2FDataset%3E%3CLinks+source%3D%22hsapiens_gene_ensembl%22+target%3D%22hsapiens_gene_ensembl_structure%22+defaultLink%3D%22hsapiens_internal_transcript_id%22+%2F%3E%3CLinks+source%3D%22hsapiens_gene_ensembl_structure%22+target%3D%22hsapiens_genomic_sequence%22+defaultLink%3D%223utr%22+%2F%3E%3C%2FQuery%3E%0D%0A HTTP/1.1" 200 5
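> 
> (decoded for readability, the query parameter in those last two requests
> is the following query XML, with whitespace added:)
> 
>   <?xml version="1.0" encoding="UTF-8"?>
>   <!DOCTYPE Query>
>   <Query virtualSchemaName="default" count="0">
>     <Dataset name="hsapiens_gene_ensembl_structure">
>       <Attribute name="3utr_start" />
>       <Attribute name="3utr_end" />
>       <Attribute name="gene_stable_id_v" />
>       <Attribute name="transcript_stable_id" />
>       <Attribute name="str_chrom_name" />
>     </Dataset>
>     <Dataset name="hsapiens_genomic_sequence">
>       <Attribute name="3utr" />
>     </Dataset>
>     <Dataset name="hsapiens_gene_ensembl">
>       <Filter name="ensembl_transcript_id" value="ENST00000358646" />
>     </Dataset>
>     <Links source="hsapiens_gene_ensembl" target="hsapiens_gene_ensembl_structure" defaultLink="hsapiens_internal_transcript_id" />
>     <Links source="hsapiens_gene_ensembl_structure" target="hsapiens_genomic_sequence" defaultLink="3utr" />
>   </Query>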
> 
> 
> 
> after further analyzing the logs, it seems those users wanted sequences
> for ~300 Ensembl transcripts. This in itself is a perfectly valid and
> sensible use case.
> However, what is unclear to me is why it is necessary to request each
> sequence individually and, more importantly, why for each query the
> software (Taverna?) needs to fetch the full dataset configuration (as
> above). Surely this could be done once and then be followed either by
> individual queries if necessary or, better still, by fewer queries doing
> requests in batches.
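> 
> e.g. a single batched query could pass several transcript IDs in one
> Filter (a sketch, assuming the ensembl_transcript_id filter accepts a
> comma-separated list of values; the IDs other than ENST00000358646 below
> are hypothetical placeholders):
> 
>   <Query virtualSchemaName="default" count="0">
>     <!-- same Dataset/Attribute/Links elements as in the decoded query above -->
>     <Dataset name="hsapiens_gene_ensembl">
>       <!-- ENST00000358646 is from the log; the other IDs are made up -->
>       <Filter name="ensembl_transcript_id"
>               value="ENST00000358646,ENST00000000001,ENST00000000002" />
>     </Dataset>
>   </Query>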

Hi Arek,

I just want to check a couple of things as I investigate this.

1. Are you seeing the above sequence of requests happening 100,000 times?
2. Does each request happen twice or is the logger reporting it twice?

Thanks,

David.

-- 
David Withers
School of Computer Science, University of Manchester,
Oxford Road, Manchester, M13 9PL, UK.
Tel: +44(0)161 275 0145
