osaf/services/saf/immsv/immnd/ImmModel.cc |  7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)


The IMMND server maintains request continuations that record the destination
for replies to non-CCB requests, such as PRT data updates, class-create,
schema-change and class-delete.

Some of these requests can take longer to process than is normal for an
IMM request. A class change can trigger instance migration, and there may
be many instances. While there should be only one or two sqlite transactions
for the entire procedure, the number of sqlite commands can be large.
Processing may take a long time (a minute or more has been observed), and
the user then gets ERR_TIMEOUT on the request.
If the "user" is a shell script or program, ERR_TIMEOUT is typically
treated as a failure to perform the operation, which in turn fails an
upgrade campaign. But ERR_TIMEOUT cannot be interpreted as a failure
to execute the request. Its true meaning is that it is unknown whether the
request failed or succeeded. Typically the request is still being processed
when the client times out.
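
One way a client could act on the "unknown outcome" meaning is to check
whether the change became visible before declaring failure. The sketch below
is only an illustration of that idea: Result, Verdict, resolveOutcome() and
the probe callback are hypothetical names, not part of the IMM API.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes standing in for the IMM API return values.
enum class Result { Ok, Timeout, Failed };

// Final verdict a script or program should act on.
enum class Verdict { Succeeded, Failed };

// Resolve the outcome of a request. On Timeout the outcome is unknown, so
// poll the caller-supplied probe (for example "is the new schema visible?")
// instead of reporting failure immediately.
// probe is a hypothetical callback; maxProbes bounds the polling.
inline Verdict resolveOutcome(Result r, const std::function<bool()>& probe,
                              int maxProbes) {
    assert(maxProbes >= 0);
    if (r == Result::Ok) return Verdict::Succeeded;
    if (r == Result::Failed) return Verdict::Failed;
    for (int i = 0; i < maxProbes; ++i) {
        if (probe()) return Verdict::Succeeded;  // the request did complete
        // A real client would sleep between probes.
    }
    return Verdict::Failed;  // still not visible after bounded polling
}
```

A real client would also need a sensible delay between probes and an upper
bound that matches how long the operation can plausibly take.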

The client can increase the API-level synchronous timeout by setting the
environment variable IMMA_SYNCR_TIMEOUT. The problem is that the
IMMND server has its own internal timeout that determines when it garbage
collects the continuation records it keeps for pending replies.
IMMA_SYNCR_TIMEOUT does not control this server-internal timeout.
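
The interaction of the two timeouts can be modelled roughly as below. This is
a simplified sketch, not the actual IMMA/IMMND logic: replyReachesClient() is
a hypothetical helper, all values are in seconds, and it ignores how often
the server runs its cleanup.

```cpp
#include <cassert>

// Hypothetical model: a reply reaches the client only if processing finishes
// before the client stops waiting AND before the server discards the
// continuation record that holds the reply destination.
inline bool replyReachesClient(int processingSec, int clientTimeoutSec,
                               int serverTimeoutSec) {
    assert(processingSec >= 0);
    return processingSec < clientTimeoutSec &&
           processingSec < serverTimeoutSec;
}
```

Under this model a 60 second class change with a 300 second client timeout
still gets no reply while the server timeout is 6 seconds, which is why
raising the client timeout alone does not help.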

This patch increases the hardwired server-side timeout for cleanup of
PRT request continuations from 6 seconds to 120 seconds.
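
For readers unfamiliar with the cleanup, the following stand-alone sketch
shows the general pattern of expiring continuation records by age. It is not
the ImmModel code: Continuation, ContinuationMap and collectExpired() are
hypothetical names, and the timeout is passed in rather than hardwired.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <vector>

// Hypothetical stand-in for a PRT request continuation record.
struct Continuation {
    unsigned int conn;       // connection to reply on
    std::time_t createTime;  // when the request was accepted
};

using ContinuationMap = std::map<unsigned long long, Continuation>;

// Remove every continuation at least timeoutSec old and return the
// connections whose pending reply was abandoned.
inline std::vector<unsigned int> collectExpired(ContinuationMap& pending,
                                                std::time_t now,
                                                std::time_t timeoutSec) {
    assert(timeoutSec > 0);
    std::vector<unsigned int> expired;
    for (auto it = pending.begin(); it != pending.end();) {
        if (now - it->second.createTime >= timeoutSec) {
            expired.push_back(it->second.conn);
            it = pending.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
    return expired;
}
```

With a 6 second timeout a record created 60 seconds ago is discarded; with
120 seconds it survives, so a slow request can still be answered.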

diff --git a/osaf/services/saf/immsv/immnd/ImmModel.cc b/osaf/services/saf/immsv/immnd/ImmModel.cc
--- a/osaf/services/saf/immsv/immnd/ImmModel.cc
+++ b/osaf/services/saf/immsv/immnd/ImmModel.cc
@@ -11196,8 +11196,11 @@ ImmModel::cleanTheBasement(InvocVector& 
 
     ci2=sPbeRtReqContinuationMap.begin(); 
     while(ci2!=sPbeRtReqContinuationMap.end()) {
-        //TODO the timeout should not be hardwired, but for now it is.
-        if(now - ci2->second.mCreateTime >= DEFAULT_TIMEOUT_SEC) {
+        //Timeout on PRT request continuation is hardwired but long.
+        //It needs to be long to allow reply on larger batch jobs such as a
+        //schema/class change with instance migration and slow file system.
+        //It can not be infinite as that could cause a memory leak.
+           if(now - ci2->second.mCreateTime >= (DEFAULT_TIMEOUT_SEC * 20)) {
             TRACE_5("Timeout on PbeRtReqContinuation %llu", ci2->first);
             pbePrtoReqs.push_back(ci2->second.mConn);
             sPbeRtReqContinuationMap.erase(ci2);
