A handler is being created that will contain all the IMM handling needed to
modify the IMM model. This includes:
* Create, Modify and Delete of objects
* An easy-to-use generic C++ API where no IMM APIs have to be handled
* Handling all needed IMM (C) APIs
* Handling all rules associated with usage of the IMM APIs
* Handling all possible recovery when an IMM API returns something other than OK
* Etc...

Attached is a .h file with a proposed API for this handling.


Attachments:

- [immccb.h](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2179c610/dd3c/attachment/immccb.h) (14.1 kB; application/octet-stream)


---

** [tickets:#1398] smf: Add capability to redo CCBs that fail **

**Status:** accepted
**Milestone:** 5.18.01
**Created:** Wed Jul 01, 2015 02:07 PM UTC by Rafael Odzakow
**Last Updated:** Mon Nov 20, 2017 03:46 PM UTC
**Owner:** elunlen


CCBs may fail for a variety of resource-related reasons. SMF campaigns can
be made more robust if they are capable of redoing/replaying a CCB that has
been aborted. A CCB that is aborted due to a validation error will not succeed
when replayed, but no damage will be done either. A CCB that is aborted for
resource reasons may succeed when replayed, avoiding the abandonment of the
whole campaign.


During the final stages of an upgrade campaign PBE is enabled. PBE is not ready 
until it attaches, so CCB operations will get TRY_AGAIN in that window. Once the
PBE has attached the IMM is persistent-write-available and CCB operations are
allowed again.
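
The TRY_AGAIN window described above could be handled with a bounded retry
loop. The following is only a minimal sketch: `AisRc` is a local stand-in for
the AIS return codes (real code would use `SaAisErrorT`), and
`RetryWhileTryAgain` is a hypothetical helper, not part of SMF or of the
attached immccb.h.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Local stand-in for the AIS return codes mentioned in this ticket.
// Real code would use SaAisErrorT from the OpenSAF headers instead.
enum class AisRc { kOk = 1, kTryAgain = 6, kFailedOperation = 21 };

// Hypothetical helper: calls `op` until it returns something other than
// kTryAgain, or until `max_attempts` calls have been made.
// Returns the result of the last call.
inline AisRc RetryWhileTryAgain(const std::function<AisRc()>& op,
                                int max_attempts,
                                std::chrono::milliseconds delay) {
  assert(op && max_attempts > 0);
  AisRc rc = AisRc::kTryAgain;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    rc = op();
    if (rc != AisRc::kTryAgain) break;  // OK or a real error: stop retrying
    if (attempt < max_attempts) std::this_thread::sleep_for(delay);
  }
  return rc;
}
```

The caller wraps a single CCB operation in `op`. If the helper still returns
`kTryAgain` after the last attempt, the PBE has presumably not attached within
the allowed time and the caller has to decide how to proceed.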

Any CCB that was started and had operations added *before* the PBE was enabled
by a CCB is doomed. Its operations were generated before the PBE had even
started, so the PBE is unaware of these pre-PBE-enable operations. Such a CCB
fails an op-count check when its commit is processed in the PBE.
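
Why such a CCB is doomed can be illustrated with a toy model. This is not
OpenSAF code: `ToyPbe` and its methods are hypothetical and only mimic the idea
that the PBE counts the operations it has seen and compares that count with the
one claimed at commit.

```cpp
#include <cassert>
#include <map>

// Toy model of the op-count check (hypothetical, not the real PBE).
class ToyPbe {
 public:
  void Enable() { enabled_ = true; }

  // Called for every operation added to a CCB. Operations that arrive
  // while the PBE is not yet enabled are never seen by it.
  void OnCcbOperation(unsigned ccb_id) {
    if (enabled_) ++seen_ops_[ccb_id];
  }

  // The commit passes only if the PBE saw exactly `op_count` operations.
  bool Commit(unsigned ccb_id, unsigned op_count) const {
    assert(op_count > 0);  // toy precondition: empty CCBs are not committed
    auto it = seen_ops_.find(ccb_id);
    unsigned seen = (it == seen_ops_.end()) ? 0 : it->second;
    return seen == op_count;
  }

 private:
  bool enabled_ = false;
  std::map<unsigned, unsigned> seen_ops_;
};
```

In this model a CCB that adds one operation before `Enable()` and one after it
claims two operations at commit while the PBE has seen only one, so `Commit`
returns false. A CCB started entirely after `Enable()` passes the check.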

In 4.7-tentative an enhancement #1261 was implemented in the IMM service
to make this abort cleaner, i.e. to avoid the ugly op-count error in the PBE.
The PBE generates an admin-operation to abort *all* open CCBs (all CCBs that
are active but not critical), just before attaching. The problem was that the
first implementation of #1261 resulted in the PBE often attaching as OI *before*
the abort of non-critical CCBs had been processed. When the abort requested by
the PBE was finally processed, it also aborted "innocent" CCBs that had actually
started *after* the PBE was attached as PBE-OI.

The syndrome as such, i.e. attach of PBE causing the abort of a valid CCB,
could still happen on earlier releases but was quite rare. The syslog
would then show the op-count error reported by the PBE. 

A possible improvement in SMF is to read the runtime-attribute:

   opensafImmNostdFlags

in the OpenSAF IMM object opensafImm=opensafImm,safApp=safImmService

and check that it is not <Empty> which would mean that PBE is attached.
But it is not really clear why this is needed in 4.7-tentative when it was
not needed earlier. 

CCBs may actually get aborted due to resource error at any time and not only in
conjunction with PBE enable. A general increase of the robustness of SMF
campaigns could be achieved by adding logic for redoing CCBs that fail
unexpectedly.
If such a CCB was valid, i.e. it was aborted due to resource error and not
validation error, then it has a high probability of succeeding when retried.
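
The redo logic could be sketched roughly as follows. An aborted CCB cannot
simply be applied again, so the sketch keeps the list of operations and rebuilds
the whole CCB on each attempt. All names here (`CcbOperation`, `CcbRunner`,
`ApplyWithRedo`) are hypothetical and are not taken from SMF or from the
attached immccb.h. The runner is assumed to open a fresh CCB, add the
operations and apply it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

enum class CcbResult { kCommitted, kAborted };

// One recorded change, kept so that the whole CCB can be rebuilt.
struct CcbOperation {
  enum class Kind { kCreate, kModify, kDelete };
  Kind kind;
  std::string object_dn;
};

// Hypothetical hook: opens a fresh CCB, adds `ops` and applies it.
using CcbRunner = std::function<CcbResult(const std::vector<CcbOperation>&)>;

// Replays the complete CCB until it commits or `max_attempts` is reached.
// Returns the number of attempts used, or 0 if the CCB never committed.
inline int ApplyWithRedo(const std::vector<CcbOperation>& ops,
                         const CcbRunner& run_ccb, int max_attempts) {
  assert(run_ccb && max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (run_ccb(ops) == CcbResult::kCommitted) return attempt;
  }
  return 0;
}
```

The attempt limit matters because a CCB that was aborted due to a validation
error fails on every replay. With a bound, such a CCB costs a few extra attempts
and then fails the campaign step as before.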


IMM ticket related to this: #1261


Jun 29 10:36:35 SC-2-2 osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
Jun 29 10:36:35 SC-2-2 osafimmpbed: NO Update epoch 63 committing with ccbId:100000185/4294967685
Jun 29 10:36:36 SC-2-2 osafsmfd[4726]: NO CAMP: Start campaign complete actions (95)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Create of PERSISTENT runtime object 'smfRollbackElement=CampComplete,safSmfCampaign=ERIC-CMWUpgrade,safApp=safSmfService' (safSmfCampaign).
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 305 COMMITTED (immcfg_SC-2-1_14718)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 306 COMMITTED (immcfg_SC-2-1_14741)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 307 COMMITTED (immcfg_SC-2-1_14764)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 308 COMMITTED (immcfg_SC-2-1_14787)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 309 COMMITTED (immcfg_SC-2-1_14810)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 310 COMMITTED (immcfg_SC-2-1_14833)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 311 COMMITTED (immcfg_SC-2-1_14856)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 312 COMMITTED (immcfg_SC-2-1_14879)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Create of PERSISTENT runtime object 'smfRollbackElement=ccb_00000002,smfRollbackElement=CampComplete,safSmfCampaign=ERIC-CMWUpgrade,safApp=safSmfService' (safSmfCampaign).
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO CCB 313 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Timeout while waiting for implementer, aborting ccb:313
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 313 ABORTED (SMFSERVICE)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA >>s_info->to_svc == 0<< reply context destroyed before this reply could be made
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Failed to send response to agent/client over MDS
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb <313> not in correct state (12) for Apply ignoring request
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Spurious and redundant ccb-apply request ignored ccbId:313
Jun 29 10:37:37 SC-2-2 osafsmfd[4726]: NO saImmOmCcbApply failed rc=SA_AIS_ERR_FAILED_OPERATION (21)
Jun 29 10:37:37 SC-2-2 osafimmnd[4476]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call
Jun 29 10:37:37 SC-2-2 osafimmnd[4476]: WA IMMND - Client Node Get Failed for cli_hdl 158634617209359
Jun 29 10:37:37 SC-2-2 osafsmfd[4726]: ER SmfCampaignWrapup campCompleteAction 2 failed, rc=SA_AIS_ERR_FAILED_OPERATION (21)
Jun 29 10:37:37 SC-2-2 osafsmfd[4726]: ER Campaign wrapup executing campaign complete actions failed




