The default pvfs2 configuration files (and the matching #defines in
pvfs2-server.h) list the following job timeout values:
    ServerJobBMITimeoutSecs    30
    ServerJobFlowTimeoutSecs   30
    ClientJobBMITimeoutSecs   300
    ClientJobFlowTimeoutSecs  300
The flow timeouts trigger when X seconds have passed without moving any
data, while the BMI timeouts trigger if the entire operation hasn't
completed in X seconds.
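To make the distinction concrete, here is a small sketch of the two
checks; this is just an illustration of the semantics, not the actual
PVFS2 job code:

    /* Illustrative sketch of the two timeout styles described above;
     * not the actual PVFS2 job code. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    struct op_timers {
        time_t start;          /* when the operation was posted   */
        time_t last_progress;  /* last time any bytes moved       */
        int    bmi_timeout;    /* e.g. ServerJobBMITimeoutSecs    */
        int    flow_timeout;   /* e.g. ServerJobFlowTimeoutSecs   */
    };

    /* BMI-style: the whole operation must finish within bmi_timeout. */
    static bool bmi_timed_out(const struct op_timers *t, time_t now)
    {
        return (now - t->start) > t->bmi_timeout;
    }

    /* Flow-style: only trips after flow_timeout seconds with no
     * progress; any data movement resets the clock. */
    static bool flow_timed_out(const struct op_timers *t, time_t now)
    {
        return (now - t->last_progress) > t->flow_timeout;
    }

    int main(void)
    {
        time_t now = time(NULL);
        struct op_timers t = { now, now, 30, 30 };
        /* Data keeps trickling (progress noted 25s in), so the flow
         * timer never fires, but after 40s total the BMI-style
         * deadline on the whole operation does. */
        t.last_progress = now + 25;
        now += 40;
        printf("bmi timed out: %d, flow timed out: %d\n",
               (int)bmi_timed_out(&t, now), (int)flow_timed_out(&t, now));
        return 0;
    }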
This means that on the server side, each individual Trove write
operation (the flow breaks transfers into 256K chunks) has to complete
within 30 seconds; that works out to under 9 KB/s of sustained
throughput, so hitting the limit indicates a stall rather than a merely
slow device. If a chunk does miss the deadline, a write flow timeout
triggers and the flow is cancelled (to be restarted by the client).
We have recently found some test scenarios where 30 seconds isn't really
long enough. In particular, if you have the following combination:
- fast server with a lot of RAM
- relatively high latency storage (old SAN hardware)
- very heavy write workload
If the system isn't tuned at all (more on that in a later email), then
what happens is the server cooks along accepting writes into its buffer
cache until the buffer cache is practically exhausted. At that point it
tries to flush an enormous amount of data to the storage device, and
that data has to hop across the HBA, switch, controller, RAID array,
etc. on its way out. During this time, newly posted writes take a long
time to complete.
The end result is that, even in standalone benchmarks, we occasionally
see writes take as long as 50 seconds to finish, despite being only
256K in size.
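A stall like that is easy to observe with a trivial probe along these
lines (a sketch, not the actual benchmark we ran), which times each
256K write() and reports the outliers:

    /* Sketch of a standalone write-latency probe (not the benchmark we
     * actually ran): write 256K chunks until the buffer cache fills,
     * and report any write() that stalls. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK (256 * 1024)

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "probe.dat";
        char *buf = malloc(CHUNK);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0 || !buf) { perror("setup"); return 1; }
        memset(buf, 0, CHUNK);

        /* ~25 GB total, enough to push past the buffer cache on the
         * large-RAM servers described above. */
        for (int i = 0; i < 100000; i++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); break; }
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double secs = (t1.tv_sec - t0.tv_sec)
                        + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            if (secs > 1.0)   /* anything near 30s trips the flow timeout */
                printf("write %d took %.1f s\n", i, secs);
        }
        close(fd);
        free(buf);
        return 0;
    }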
Most likely all writes during this buffer-flush period will take a
while, regardless of the API used. It is worth noting, though, that the
glibc AIO implementation queues all I/O to a given file descriptor to
be serviced sequentially by a single thread dedicated to that fd. If
you have many clients writing to the same file, you will probably end
up with N delayed writes rather than just one, and with
timeout/cancellation scenarios across several clients.
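For example, here is a small sketch (my illustration, not PVFS2's Trove
code) using glibc's POSIX AIO: four writes are posted to the same fd at
once, but the per-fd worker thread services them one at a time, so a
stall on the first delays all four:

    /* Sketch demonstrating glibc's POSIX AIO behavior described above:
     * all requests target one fd, so glibc services them sequentially
     * on a single worker thread. Link with -lrt on older glibc. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NREQS 4
    #define CHUNK (256 * 1024)   /* same 256K size as the flow buffers */

    int main(void)
    {
        struct aiocb cbs[NREQS];
        char *buf = malloc(CHUNK);
        int fd = open("aio-demo.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0 || !buf) { perror("setup"); return 1; }
        memset(buf, 0xab, CHUNK);

        /* Post all four writes "concurrently"... */
        for (int i = 0; i < NREQS; i++) {
            memset(&cbs[i], 0, sizeof(cbs[i]));
            cbs[i].aio_fildes = fd;
            cbs[i].aio_buf    = buf;
            cbs[i].aio_nbytes = CHUNK;
            cbs[i].aio_offset = (off_t)i * CHUNK;
            if (aio_write(&cbs[i]) != 0) { perror("aio_write"); return 1; }
        }

        /* ...but they complete one after another: if the first write
         * stalls behind a buffer-cache flush, the other three sit in
         * the per-fd queue and stall with it. */
        for (int i = 0; i < NREQS; i++) {
            const struct aiocb *list[1] = { &cbs[i] };
            while (aio_error(&cbs[i]) == EINPROGRESS)
                aio_suspend(list, 1, NULL);
            printf("write %d returned %zd\n", i, aio_return(&cbs[i]));
        }
        close(fd);
        free(buf);
        return 0;
    }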
I think we are going to run with the two ServerJob timeouts set to 300
seconds (as is already done for the client), but I just wanted to pass
along the information in case there is interest in changing the stock
default values.
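For anyone who wants to try the same change, it would look something
like this in the server config (assuming these directives live in the
<Defaults> section of your fs.conf):

    <Defaults>
        # ... other defaults unchanged ...
        ServerJobBMITimeoutSecs 300
        ServerJobFlowTimeoutSecs 300
    </Defaults>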
-Phil