Let me try to answer this question...

On 03/09/10 12:47, Vasileios Kontorinis wrote:
Akara and Shanti hi,
    Now that Olio works I have a ton more questions... :-)
1) Are there any plans to support dynamic addition/removal of servers (web, db, filestore) during steady state? For example, I realize that the webserver is the performance bottleneck because it cannot keep up with the requests. One way to fix this is to add more servers.

Olio is an open source project and I'm really glad to hear feedback. This is certainly a desired feature, but there is not yet a concrete plan to deliver it. I encourage you to file a Jira issue for this. If you need it soon, I'd also encourage you to contribute it to Olio.

2) Are there any ways of figuring out where the actual bottleneck is (besides http://blogs.sun.com/shanti/entry/olio_on_nehalem) other than high-level tools such as vmstat, mpstat, sar? In my setup I believe I/O to be the limiting factor.

iostat certainly provides you plenty of info on I/O bottlenecks. If on Solaris, there are plenty of things you can do based on dtrace. Unfortunately, the same does not apply to Linux.
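
On Linux, one rough substitute is to sample the aggregate "cpu" line of /proc/stat twice and look at how much of the elapsed CPU time went to iowait. The following is only a sketch of that idea, not part of Olio; the function names are made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Sum only user..steal (first 8 fields); guest time is already
    # accounted for inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()


def measure_iowait(interval_sec=5.0):
    """Hypothetical helper: sample /proc/stat twice, interval_sec apart."""
    before = read_cpu_line()
    time.sleep(interval_sec)
    return iowait_percent(before, read_cpu_line())
```

Running something like measure_iowait(5.0) inside each VM would give a per-machine iowait figure comparable to what vmstat or mpstat report; it says nothing about which device or request is responsible.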

As I scale the number of users, I see the percentage of CPU time the webservers spend waiting for I/O increase until it saturates at around 40-60% on average (sys and user increase as well, but at a much slower pace) for 1000 concurrent users. However, I monitor the network traffic and the Gb Ethernet does not seem to be saturated. (I am reading/writing ~10-15 MB/s while it should support up to 125 MB/s; all the virtual machines share one Ethernet interface.)

Small request/response sizes hamper throughput.
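
To put numbers on this: 10-15 MB/s on a 1 Gb link is only about 8-12% of the nominal 125 MB/s. The sketch below does that arithmetic and also shows how many responses per second it would take to fill the link at a given average response size. The 20 KB size in the example is purely hypothetical (I have not measured Olio's response sizes), and protocol overhead is ignored.

```python
LINK_BITS_PER_SEC = 1_000_000_000  # nominal 1 Gb Ethernet


def link_capacity_mb_per_sec(bits_per_sec=LINK_BITS_PER_SEC):
    """Nominal link capacity in MB/s (1 MB = 10**6 bytes)."""
    return bits_per_sec / 8 / 1_000_000


def utilization_percent(observed_mb_per_sec, bits_per_sec=LINK_BITS_PER_SEC):
    """Observed traffic as a percentage of nominal capacity."""
    return 100.0 * observed_mb_per_sec / link_capacity_mb_per_sec(bits_per_sec)


def responses_per_sec_to_fill_link(avg_response_kb, bits_per_sec=LINK_BITS_PER_SEC):
    """Response rate needed to fill the link (1 KB = 1000 bytes, no overhead)."""
    return link_capacity_mb_per_sec(bits_per_sec) * 1000.0 / avg_response_kb


if __name__ == "__main__":
    for observed in (10, 15):  # MB/s, the range reported in the question
        print(observed, "MB/s ->", utilization_percent(observed), "% of the link")
    # 20 KB is a hypothetical average response size, for illustration only.
    print(responses_per_sec_to_fill_link(20), "responses/s to fill the link")
```

The smaller the average response, the higher the request rate the servers must sustain before the wire becomes the limit, so an unsaturated link by itself does not rule out the web tier as the bottleneck.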

In terms of the hard drives, the filestores have a write throughput of around 2-4 MB/s and a read throughput of around 0.1-0.5 MB/s, while the webservers and db do much less (webservers write ~0.5-1 MB/s, read too small; db write 0.1-0.3 MB/s, read too small, with memcache), and it feels like there should be more bandwidth there.
Still I get failing results for EventDetail.
Is there any way to monitor the internals of Olio, something like a breakdown of the end-to-end response time? (I know this is tough, but even an approximate one would be a nice feature.) Any knowledge beyond just the average response time / throughput would be beneficial.

Certainly not as part of Olio itself. DTrace can help a lot.

3) If someone wanted to create a model for Olio performance (I know this defeats the purpose of creating a benchmark to measure things :-) ), how would they model the interaction between the different tiers? There have been suggestions in the literature to apply queueing theory for such purposes (especially for RUBiS-like benchmarks). So each Olio operation requires a number of web server accesses (6 on average) to generate the HTML content and db queries (10 on average) for the data [Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris™ Operating System]. If we had a way to monitor the average delay for each webserver/db request as well as the geocoder requests, then a model is possible, right?
Have you ever done something similar for debugging purposes?

We have looked into this a bit more. Since you mentioned OpenSolaris, you can certainly use dtrace to find out such latencies.
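
Once you have such latencies, a first-cut model could look like the sketch below. It is not something we ship or have validated against Olio; it treats each tier as a single M/M/1 queue in an open network, which is a strong simplification. The visit counts (6 web accesses, 10 db queries per operation) are the averages you quoted; the per-request service times and the throughput are hypothetical placeholders you would replace with measured values.

```python
def tier_metrics(throughput, visits, service_time):
    """Return (utilization, residence_time) for one tier.

    throughput   -- operations per second arriving at the system
    visits       -- average requests to this tier per operation
    service_time -- average seconds of service per request (to be measured)
    """
    demand = visits * service_time        # service demand per operation
    utilization = throughput * demand     # utilization law: U = X * D
    if utilization >= 1.0:
        raise ValueError("tier is saturated at this throughput")
    residence = demand / (1.0 - utilization)  # M/M/1 residence time
    return utilization, residence


def response_time(throughput, tiers):
    """Sum of per-tier residence times for one operation, in seconds."""
    return sum(tier_metrics(throughput, visits, service_time)[1]
               for visits, service_time in tiers.values())


# Visit counts are from the question; service times are hypothetical.
EXAMPLE_TIERS = {
    "web": (6, 0.005),   # 6 web accesses per operation, 5 ms each (assumed)
    "db": (10, 0.002),   # 10 db queries per operation, 2 ms each (assumed)
}

if __name__ == "__main__":
    ops_per_sec = 20.0  # hypothetical load
    for name, (visits, service_time) in EXAMPLE_TIERS.items():
        print(name, tier_metrics(ops_per_sec, visits, service_time))
    print("estimated response time (s):", response_time(ops_per_sec, EXAMPLE_TIERS))
```

The per-tier utilization is the useful part for debugging: the tier whose utilization approaches 1 first is the one the model predicts as the bottleneck. A geocoder tier could be added as another entry in the same dictionary.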


It would also be nice if we can have this info per server, e.g. the specific web-server is over-committed, or this db is under-committed let's put some more requests there.

Yes, this is not automated. You need to know the probes for each layer. There are certainly MySQL probes you can use (google for it).

Thanks,
-Akara
