 osaf/services/saf/immsv/immnd/ImmModel.cc |  7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

The IMMND server maintains request continuations that hold the reply destination for non-CCB requests, such as PRT data updates, class-create, schema-change and class-delete.

Some of these requests can take longer to process than is normal for an IMM request. A class change can trigger instance migration, and there may be many instances. While there should be only one or two sqlite transactions for the entire procedure, the number of sqlite commands can be large. Processing may therefore take a long time (a minute or more has been observed), and the user then gets ERR_TIMEOUT on the request.

If the "user" is a shell script or program, ERR_TIMEOUT is typically treated as a failure to perform the operation, which in turn fails an upgrade campaign. However, ERR_TIMEOUT cannot be interpreted as a failure to execute the request. Its true meaning is that it is unknown whether the request failed or succeeded. Typically the request is still being processed when the client gets the timeout.

The client can increase the API-level synchronous timeout by setting the environment variable IMMA_SYNCR_TIMEOUT. The problem here is that the IMMND server has its own internal timeout that decides when to garbage collect the continuation records it keeps for pending replies, and IMMA_SYNCR_TIMEOUT does not control that server-internal timeout.

This patch increases the hardwired server-side timeout for cleanup of PRT requests from 6 seconds to 120 seconds.

diff --git a/osaf/services/saf/immsv/immnd/ImmModel.cc b/osaf/services/saf/immsv/immnd/ImmModel.cc
--- a/osaf/services/saf/immsv/immnd/ImmModel.cc
+++ b/osaf/services/saf/immsv/immnd/ImmModel.cc
@@ -11196,8 +11196,11 @@ ImmModel::cleanTheBasement(InvocVector&
     ci2=sPbeRtReqContinuationMap.begin();
     while(ci2!=sPbeRtReqContinuationMap.end()) {
-        //TODO the timeout should not be hardwired, but for now it is.
-        if(now - ci2->second.mCreateTime >= DEFAULT_TIMEOUT_SEC) {
+        //Timeout on PRT request continuation is hardwired but long.
+        //It needs to be long to allow reply on larger batch jobs such as a
+        //schema/class change with instance migration and slow file system.
+        //It can not be infinite as that could cause a memory leak.
+        if(now - ci2->second.mCreateTime >= (DEFAULT_TIMEOUT_SEC * 20)) {
             TRACE_5("Timeout on PbeRtReqContinuation %llu", ci2->first);
             pbePrtoReqs.push_back(ci2->second.mConn);
             sPbeRtReqContinuationMap.erase(ci2);

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
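
For readers who want to experiment with the cleanup logic outside IMMND, the following is a minimal standalone sketch of a timeout-based purge over a continuation map. It is not the ImmModel.cc code: `Continuation`, `ContinuationMap`, `purgeExpired` and the two constants are hypothetical names, and the value 6 for the default timeout is assumed from the "6 seconds to 120 seconds" statement in the commit message. The hunk does not show how the loop advances after `erase`, so the sketch simply uses the iterator returned by `std::map::erase`.

```cpp
#include <cstdint>
#include <ctime>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration; not the IMMND definitions.
constexpr std::time_t kDefaultTimeoutSec = 6;                    // assumed from the commit message
constexpr std::time_t kPrtTimeoutSec = kDefaultTimeoutSec * 20;  // 120 seconds

struct Continuation {
    std::uint32_t conn;      // connection that is waiting for the reply
    std::time_t createTime;  // time at which the request was registered
};

using ContinuationMap = std::map<std::uint64_t, Continuation>;

// Removes every continuation that is at least kPrtTimeoutSec old and
// returns the connections of the removed entries so the caller can
// answer them with a timeout.
std::vector<std::uint32_t> purgeExpired(ContinuationMap& pending, std::time_t now) {
    std::vector<std::uint32_t> expired;
    for (auto it = pending.begin(); it != pending.end();) {
        if (now - it->second.createTime >= kPrtTimeoutSec) {
            expired.push_back(it->second.conn);
            it = pending.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
    return expired;
}
```

The bound stays finite for the reason given in the patch comment: a continuation whose reply never arrives would otherwise remain in the map indefinitely.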