- **status**: unassigned --> assigned
- **assigned_to**: Nagendra Kumar
- **Milestone**: future --> 4.4.FC



---

**[tickets:#426] IMM returns ERR_TRY_AGAIN for saImmOiRtObjectUpdate() in an 
IMM initiated callback**

**Status:** assigned
**Created:** Fri May 31, 2013 06:31 AM UTC by Nagendra Kumar
**Last Updated:** Fri May 31, 2013 06:31 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2821

Changeset: 3730
Redundancy model: NWAY


Using an SMF campaign, performed lock and lock-instantiation (lock-in) on an SU, 
then deleted its components and their SupportedCSTypes (class CompCSType).


Observed that AMF performs saImmOiRtObjectUpdate on the attributes below, which 
should not happen because they are not runtime-modifiable:
1. saAmfCompNumCurrActiveCSIs
2. saAmfSURestartCount 
3. saAmfSUNumCurrStandbySIs
4. saAmfSUNumCurrActiveSIs


/var/log/messages shows:
==============================
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safSupportedCsType=safVersion=4.0.0\,safCSType=NWAYCSBASETYPE_PI,safComp=COMP1SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP'
 saAmfCompNumCurrActiveCSIs failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safSupportedCsType=safVersion=4.0.0\,safCSType=NWAYCSBASETYPE_PI,safComp=COMP2SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP'
 saAmfCompNumCurrActiveCSIs failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safSupportedCsType=safVersion=4.0.0\,safCSType=NWAYCSBASETYPE_PI,safComp=COMP3SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP'
 saAmfCompNumCurrActiveCSIs failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safComp=COMP1SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP' 
saAmfCompRestartCount failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safComp=COMP1SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP' 
saAmfCompCurrProxyName failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safComp=COMP2SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP' 
saAmfCompRestartCount failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safComp=COMP2SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP' 
saAmfCompCurrProxyName failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safComp=COMP3SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP' 
saAmfCompRestartCount failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safComp=COMP3SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP' 
saAmfCompCurrProxyName failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safSu=SU5,safSg=SGONE,safApp=NWAYAPP' saAmfSURestartCount failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safSu=SU5,safSg=SGONE,safApp=NWAYAPP' saAmfSUNumCurrStandbySIs failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safSu=SU5,safSg=SGONE,safApp=NWAYAPP' saAmfSUNumCurrActiveSIs failed with 6
Sep 24 17:59:30 SLES11-SLOT-1 osafimmnd[22297]: NO Ccb 140 COMMITTED 
(SMFSERVICE)
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: ER job_exec_imm_objupdate: 
update FAILED 12
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: ER job_exec_imm_objupdate: 
update FAILED 12
Sep 24 17:59:30 SLES11-SLOT-1 osafimmnd[22297]: NO Create of PERSISTENT runtime 
object 
'smfRollbackData=00000001,smfRollbackElement=ccb_00000009,smfRollbackElement=ProcWrapup,safSmfProc=amfClusterProc-1,safSmfCampaign=Campaign,safApp=safSmfService'
 by Impl safSmfCampaign=Campaign,safApp=safSmfService
Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: ER job_exec_imm_objupdate: 
update FAILED 12



Changed 8 months ago by hafe
  - version changed from 4.2.2 to 4.2.1
Kind of a duplicate of http://devel.opensaf.org/ticket/2227


Changed 8 months ago by hafe
Could you please rerun the test case with amfd trace on?


Changed 8 months ago by hrishikesh.chenna
  - attachment osafamfd.tgz added
Attached the amfd trace file covering the same timestamp as the log snippet in 
the ticket.


Changed 8 months ago by hafe
  - summary changed from "Non modifiable runtime attributes are being modified 
by AMF and returns ERR_TRY_AGAIN." to "IMM returns ERR_TRY_AGAIN for 
saImmOiRtObjectUpdate() in an IMM initiated callback"
The AMF model contains classes with pure runtime attributes. What the trace 
shows is just a normal read of such an object, which results in a callback to 
avd so that it can return the attribute values to IMM. The ticket summary was 
therefore wrong and has now been changed.


The problem here is that IMM responds with TRY_AGAIN in the context of an IMM 
callback! The complete syslog and the immnd trace are not attached, so there is 
not much more to be done. It would be nice if they were provided.


However, this log message:


Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: ER job_exec_imm_objupdate: 
update FAILED 12


could be avoided. It means that a deferred object update (queued after an 
earlier TRY_AGAIN) failed because the object no longer exists in IMM (12 is 
SA_AIS_ERR_NOT_EXIST). When an object is deleted, avd could walk the deferred 
IMM job list and prune the affected jobs.
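A minimal sketch of that pruning idea in C. The queue type, `job_enqueue()`, `dn_is_same_or_child()` and `job_queue_prune()` are hypothetical stand-ins, not AMFD's actual job-queue code; the sketch only assumes that each deferred job records the DN of the object it will update. Because IMM DNs put the leaf RDN first, a job counts as affected when its DN equals the deleted DN or ends with `,` followed by it (an escaped `\,` inside an RDN value is not treated as a separator).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical deferred job: only the target DN matters for pruning. */
typedef struct imm_job {
    char dn[256];
    struct imm_job *next;
} imm_job_t;

/* Hypothetical FIFO of updates parked after IMM answered TRY_AGAIN. */
typedef struct {
    imm_job_t *head;
    imm_job_t *tail;
} job_queue_t;

static void job_enqueue(job_queue_t *q, const char *dn) {
    assert(q != NULL && dn != NULL);
    imm_job_t *job = calloc(1, sizeof *job);
    if (job == NULL)
        return; /* a real daemon would handle out-of-memory explicitly */
    strncpy(job->dn, dn, sizeof job->dn - 1); /* calloc keeps it NUL-terminated */
    if (q->tail != NULL)
        q->tail->next = job;
    else
        q->head = job;
    q->tail = job;
}

/* True if dn names the deleted object itself or one of its descendants,
 * i.e. dn ends with ",<deleted>" where that comma is not escaped. */
static bool dn_is_same_or_child(const char *dn, const char *deleted) {
    size_t n = strlen(dn), m = strlen(deleted);
    if (n == m)
        return strcmp(dn, deleted) == 0;
    if (n < m || strcmp(dn + (n - m), deleted) != 0)
        return false;
    size_t sep = n - m - 1; /* index of the character just before the suffix */
    return dn[sep] == ',' && (sep == 0 || dn[sep - 1] != '\\');
}

/* Remove and free every queued job that targets the deleted object or a
 * descendant of it. Returns the number of jobs removed. */
static size_t job_queue_prune(job_queue_t *q, const char *deleted) {
    assert(q != NULL && deleted != NULL);
    size_t removed = 0;
    imm_job_t **link = &q->head;
    q->tail = NULL; /* rebuilt while walking the surviving jobs */
    while (*link != NULL) {
        imm_job_t *job = *link;
        if (dn_is_same_or_child(job->dn, deleted)) {
            *link = job->next;
            free(job);
            removed++;
        } else {
            q->tail = job;
            link = &job->next;
        }
    }
    return removed;
}
```

In this sketch the prune would be called wherever avd handles the deletion of an object, so that the deferred-update executor never picks up a job whose object is already gone.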


Changed 8 months ago by anders
The severity of:


Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: ER job_exec_imm_objupdate: 
update FAILED 12


should not be ER unless AMFD is terminating due to this, which would seem 
excessive.


The severity of this:


Sep 24 17:59:30 SLES11-SLOT-1 osafamfd[22351]: WA saImmOiRtObjectUpdate of 
'safComp=COMP3SU5NWAYAPP,safSu=SU5,safSg=SGONE,safApp=NWAYAPP' 
saAmfCompCurrProxyName failed with 6


should not be WA, and should not even be logged, since the error is simply 
TRY_AGAIN.
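The severity rule proposed here fits in one small function. This is only a sketch: `imm_err_priority()` and the two context values are hypothetical, the enum reproduces just the `SaAisErrorT` values seen in this ticket (6 = TRY_AGAIN, 12 = NOT_EXIST), and mapping all other failures to warning, leaving error for the path where AMFD actually gives up, is an assumption drawn from the comment above.

```c
#include <assert.h>
#include <syslog.h>

/* Local copy of the SaAisErrorT values that appear in this ticket
 * (numbering as in the SAF AIS specification). */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_TRY_AGAIN = 6,
    SA_AIS_ERR_NOT_EXIST = 12
} SaAisErrorT;

/* Hypothetical contexts in which AMFD sees an IMM return code. */
typedef enum {
    IMM_CTX_RT_UPDATE_IN_CALLBACK, /* update made inside the IMM callback */
    IMM_CTX_DEFERRED_JOB           /* update retried later from the job queue */
} imm_ctx_t;

/* Choose a syslog priority for an IMM return code:
 *  - OK and TRY_AGAIN are routine, so trace level only (LOG_DEBUG);
 *  - a deferred update whose object was deleted meanwhile: LOG_NOTICE;
 *  - anything else: LOG_WARNING. LOG_ERR is reserved for the code path
 *    where AMFD actually gives up or terminates. */
static int imm_err_priority(imm_ctx_t ctx, SaAisErrorT rc) {
    assert(ctx == IMM_CTX_RT_UPDATE_IN_CALLBACK || ctx == IMM_CTX_DEFERRED_JOB);
    if (rc == SA_AIS_OK || rc == SA_AIS_ERR_TRY_AGAIN)
        return LOG_DEBUG;
    if (ctx == IMM_CTX_DEFERRED_JOB && rc == SA_AIS_ERR_NOT_EXIST)
        return LOG_NOTICE;
    return LOG_WARNING;
}
```

A caller would pass the result straight to `syslog()`, or skip the call for `LOG_DEBUG` when tracing is off. Notice level matches how the IMMND log above reports routine events (`NO`).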


If the service/OI has to give up its retry attempts because of its real-time 
behavior requirements and duties towards other tasks, it should simply give up 
and return ERR_NO_RESOURCES from the callback. (Returning ERR_TRY_AGAIN would be 
ideal, but the spec does not allow it.)
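A sketch of that give-up policy, assuming the OI retries its runtime-attribute update a bounded number of times inside the callback. `rt_update_with_budget()`, the `rt_update_fn` stand-in for a downcall such as `saImmOiRtObjectUpdate_2()`, and the `flaky_update()` test stub are hypothetical; the attempt budget and sleep interval are tuning assumptions, not OpenSAF values.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Local copy of the SaAisErrorT values used here (SAF AIS numbering). */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_TRY_AGAIN = 6,
    SA_AIS_ERR_NO_RESOURCES = 18
} SaAisErrorT;

/* Hypothetical stand-in for the downcall made from the callback;
 * ctx carries whatever that call needs (handle, DN, attribute mods). */
typedef SaAisErrorT (*rt_update_fn)(void *ctx);

/* Retry the update while IMM answers TRY_AGAIN, up to max_attempts calls
 * with sleep_ms between them. The callback itself may not return
 * TRY_AGAIN, so when the budget runs out (or any other error occurs)
 * this sketch reports NO_RESOURCES as the callback's result. */
static SaAisErrorT rt_update_with_budget(rt_update_fn update, void *ctx,
                                         int max_attempts, long sleep_ms) {
    assert(update != NULL && max_attempts > 0 && sleep_ms >= 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        SaAisErrorT rc = update(ctx);
        if (rc == SA_AIS_OK)
            return SA_AIS_OK;
        if (rc != SA_AIS_ERR_TRY_AGAIN)
            break; /* not a transient condition: stop retrying */
        if (attempt < max_attempts && sleep_ms > 0) {
            struct timespec ts = { .tv_sec = sleep_ms / 1000,
                                   .tv_nsec = (sleep_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return SA_AIS_ERR_NO_RESOURCES;
}

/* Hypothetical test stub: answers TRY_AGAIN while *ctx > 0, counting down. */
static SaAisErrorT flaky_update(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return SA_AIS_ERR_TRY_AGAIN;
    }
    return SA_AIS_OK;
}
```

A small budget bounds how long the OI is held inside one callback; whether to retry in the callback at all, rather than failing at once, is a design choice the ticket leaves open.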



Changed 8 months ago by anders
  - milestone changed from future_releases to 4.2.3
Changed 8 months ago by anders
The description for this ticket suggests that the local IMMND has crashed and 
restarted.
If that is the case, then that should of course be investigated.
If it is not a known and fixed problem, then a ticket should be written.


It could, I suppose, be an effect of a node being shut down as part of the SMF 
campaign.


That would be covered by http://devel.opensaf.org/ticket/2099


Changed 8 months ago by hafe
Replying to anders:

> The description for this ticket suggests that the local IMMND has crashed and 
> restarted.
> If that is the case, then that should of course be investigated.
> If it is not a known and fixed problem, then a ticket should be written.
>
> It could, I suppose, be an effect of a node being shut down as part of the SMF 
> campaign.
>
> That would be covered by http://devel.opensaf.org/ticket/2099



From the traces I can see that IMMND has not crashed. It is alive and executing 
a CCB in which objects are deleted. What is strange is that an object delete is 
followed by a read of that same object while the CCB is still not applied. The 
object contains pure runtime attributes, so AMFD's SaImmOiRtAttrUpdateCallbackT 
is invoked. The callback tries to return the attribute values to IMMND with 
saImmOiRtObjectUpdate() but receives TRY_AGAIN. This is logged at warning level 
since it is perceived as strange! TRY_AGAIN causes the update of that object to 
be put into AMFD's job queue. Later, when the job is processed, the object no 
longer exists (for real), and a logging bug in job_exec_imm_objupdate() causes 
this to be logged at error level.


So my conclusions:
1) AMFD's logging could be changed and corrected (but that would only hide the 
problem).
2) IMMND returns TRY_AGAIN while at the same time calling an implementer's 
SaImmOiRtAttrUpdateCallbackT. Weird... Why bother calling the implementer if it 
is not allowed to reply anyway?
3) Most likely SMF is constantly reading deleted (CCB not yet applied) objects! 
Why is that? If it really is necessary, SMF is probably only interested in the 
config attributes anyway?


Would it be possible to get SMFD and IMMND traces also?


Changed 5 months ago by anders
I went back to look at this unsolved ticket because it has similarities with 
#2922.
It could be (it is likely) that the TRY_AGAIN from IMMND on the RtUpdate is due 
to the active IMMD being down.


Unfortunately, the only information attached to this ticket is the osafamfd 
trace. But it appears that the AMFD gets stuck while performing the switchover. 
The key question is whether the AMFD can get *blocked* on IMM requests during 
the switchover. My understanding was that when the AMFD gets TRY_AGAIN on such 
a job, it parks the IMM runtime update in its job queue. So it should not get 
stuck, and the switchover should complete.
Perhaps that is indeed the case.
It is not entirely clear what problem this ticket is reporting.


The ticket summary complains that an IMM downcall inside an IMM callback gets 
TRY_AGAIN from IMM. But that is the way it is: the cluster state may change in 
the time between the IMMND sending the callback request to the IMMA client and 
the IMMA client acting on that callback. This is a possibility, not an error.
The implementer of the callback could return an error from the callback as a 
way to get rid of that job.


Changed 4 months ago by hafe
  - milestone changed from 4.2.3 to future_releases
Removing the milestone: the ticket is not accepted and is therefore not 
scheduled for 4.2.3.





---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.