Let me try to answer this question...
On 03/09/10 12:47, Vasileios Kontorinis wrote:
Akara and Shanti hi,
Now that Olio works I have a ton more questions... :-)
1) Are there any plans in the future to support dynamic addition/removal
of servers (web,db,filestore) during steady state. e.g. I realize that
the webserver is the performance bottleneck, because it cannot keep up
with the requests. One way to fix this is to add more servers.
Olio is an open source project and I'm really glad to hear feedback.
This is certainly a desired feature, but there is not yet a concrete
plan to deliver such functionality. I encourage you to file a Jira
issue for this. If you want it very soon, I'd also encourage
contributing it to Olio.
2) Are there any ways of figuring out where the actual bottleneck is
(besides http://blogs.sun.com/shanti/entry/olio_on_nehalem) other
than high level tools such as vmstat, mpstat, sar? In my setup I believe
the IO to be the limiting factor.
iostat certainly provides you plenty of info on I/O bottlenecks. If on
Solaris, there are plenty of things you can do based on dtrace.
Unfortunately, the same does not apply to Linux.
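
On Linux, one rough way to quantify the I/O wait discussed here is to
sample the aggregate cpu line of /proc/stat twice and compare the tick
counters. The sketch below is only an illustration and not part of Olio;
the function names are made up for this example, and it assumes the
usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before_line, after_line):
    """Return the share of CPU time spent in iowait between two samples."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    # Use only user..steal (first 8 fields); guest time is already
    # counted inside user, so adding it would double count.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Index 4 is the iowait counter in the assumed field order.
    return 100.0 * deltas[4] / total
```

To use it, read the first line of /proc/stat, wait a few seconds, read
it again, and pass both lines to iowait_percent().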
As I scale the number of users I see
the percentage of cpu time the webservers spend waiting for IO
increase until it saturates around 40-60% on average (sys and user
increase as well, but at a much slower pace) for 1000 concurrent users.
However, I monitor the network traffic and it does not seem that the Gb
ethernet is saturated. (I am reading/writing ~10-15MB/s while it should
support up to 125MB/s; all the virtual machines share one ethernet
interface.)
Small request/response sizes hamper throughput.
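
As a quick sanity check on the figures quoted above, the arithmetic can
be written out. This is just a sketch; the function name is made up for
this example, and it assumes 1 Gb/s means 1000 Mb/s with no protocol
overhead.

```python
def link_utilization(observed_mb_per_s, link_gbit_per_s=1.0):
    """Fraction of raw link capacity used, ignoring protocol overhead."""
    # 1 Gb/s = 1000 Mb/s = 125 MB/s (8 bits per byte).
    capacity_mb_per_s = link_gbit_per_s * 1000.0 / 8.0
    return observed_mb_per_s / capacity_mb_per_s


# The observed 10-15 MB/s range on a 1 Gb/s link.
low = link_utilization(10.0)
high = link_utilization(15.0)
print("utilization: {:.0%} to {:.0%}".format(low, high))
```

So the observed traffic is roughly a tenth of the raw link capacity,
which is consistent with the link itself not being the bottleneck.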
As for the hard drives, the filestores have a write throughput of
around 2-4MB/s and a read throughput of around 0.1-0.5MB/s, while the
webservers and db see much less (webservers write ~0.5-1MB/s, read too
small; db write 0.1-0.3MB/s, read too small, with memcache), and it
feels like there should be more bandwidth available there.
Still I get failing results for EventDetail.
Is there any way to monitor the internals of olio, something like a
breakdown of the end-to-end response time (I know this is tough but
even approximate would be a nice feature). Any knowledge beyond just the
average response time / throughput would be beneficial.
Not as part of Olio itself, but DTrace can help a lot.
3) If someone wanted to create a model for Olio performance (I know this
defeats the purpose of creating a benchmark to measure things :-) how
would he model the interaction between the different tiers?
There have been suggestions in the literature to apply queueing theory
for such purposes (especially for RUBiS-like benchmarks). So each Olio
operation requires a number of web server accesses (6 on average) to
generate the html content and db queries (10 on average) for the data
[/deploying web 2.0 applications on sun servers and the opensolaris™
operating system/]. If we had a way to monitor average delay for each
webserver/db request as well as the geocoder requests then a model is
possible, right?
Have you ever done something similar for debugging purposes?
We have looked into this a bit more. Since you mentioned OpenSolaris,
you can certainly use dtrace to find out such latencies.
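
To make the modeling idea above a bit more concrete, here is a minimal
sketch. It is not something Olio provides. It assumes each tier behaves
like a single M/M/1 queue, that the 6 web accesses and 10 db queries per
operation happen one after another, and that the geocoder adds a fixed
delay. All function and parameter names are made up for this example,
and the service times would have to come from your own measurements.

```python
WEB_ACCESSES_PER_OP = 6   # average web server accesses per Olio operation
DB_QUERIES_PER_OP = 10    # average db queries per Olio operation


def mm1_response_time(service_time_s, arrival_rate_per_s):
    """Mean response time of an M/M/1 queue: S / (1 - utilization)."""
    utilization = service_time_s * arrival_rate_per_s
    if utilization >= 1.0:
        raise ValueError("tier is saturated (utilization >= 1)")
    return service_time_s / (1.0 - utilization)


def operation_response_time(ops_per_s, web_service_s, db_service_s,
                            geocoder_s=0.0):
    """Estimate the mean end-to-end time of one operation, in seconds."""
    # Each operation fans out into several requests on each tier, so the
    # per-tier arrival rate is the operation rate times that fan-out.
    web = mm1_response_time(web_service_s, ops_per_s * WEB_ACCESSES_PER_OP)
    db = mm1_response_time(db_service_s, ops_per_s * DB_QUERIES_PER_OP)
    # Assumption: accesses are sequential, so their delays simply add up.
    return WEB_ACCESSES_PER_OP * web + DB_QUERIES_PER_OP * db + geocoder_s
```

For example, operation_response_time(10, 0.005, 0.002) estimates the
mean time per operation at 10 operations/s with hypothetical 5 ms web
and 2 ms db service times. The ValueError marks the load at which the
model says a tier can no longer keep up.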
It would also be nice if we can have this info per server, e.g. the
specific web-server is over-committed, or this db is under-committed
let's put some more requests there.
Yes, this is not automated. You need to know the probes for each layer.
There are certainly MySQL probes you can use (google for it).
Thanks,
-Akara