[devel] [PATCH 3/3] ntftest: Add new test cases of suite 41 for cold sync and checkpoint of reader APIs [#2757]
---
 src/ntf/apitest/tet_coldsync.c | 690 -
 1 file changed, 688 insertions(+), 2 deletions(-)

diff --git a/src/ntf/apitest/tet_coldsync.c b/src/ntf/apitest/tet_coldsync.c
index b2d4cab..949c4f6 100644
--- a/src/ntf/apitest/tet_coldsync.c
+++ b/src/ntf/apitest/tet_coldsync.c
@@ -204,12 +204,698 @@ void test_coldsync_saNtfNotificationReadNext_01(void)
     safassert(saNtfFinalize(ntfHandle), SA_AIS_OK);
     test_validate(errorCode, SA_AIS_ERR_NOT_EXIST); /* read all notifications!! */
 }
+/**
+ * This test function is to verify the reader Id to be
+ * cold sync to standby.
+ * Steps:
+ * - Send alarms
+ * - Call read initialize
+ * - Reboot the standby, the alarms should be cold sync
+ * - switch over to make the standby become active
+ * - Read alarms, reader is successful to read alarms from the new active
+ */
+void test_coldsync_saNtfNotificationReadInitialize_01(void)
+{
+    saNotificationAllocationParamsT myNotificationAllocationParams;
+    saNotificationFilterAllocationParamsT
+        myNotificationFilterAllocationParams;
+    saNotificationParamsT myNotificationParams;
+
+    SaNtfSearchCriteriaT searchCriteria;
+    SaNtfAlarmNotificationFilterT myAlarmFilter;
+    SaNtfNotificationTypeFilterHandlesT myNotificationFilterHandles = {
+        0, 0, 0, 0, 0};
+    SaNtfReadHandleT readHandle;
+    SaNtfHandleT ntfHandle;
+    SaNtfNotificationsT returnedNotification;
+    SaNtfAlarmNotificationT myNotification;
+    searchCriteria.searchMode = SA_NTF_SEARCH_ONLY_FILTER;
+    SaAisErrorT errorCode;
+    SaUint32T readCounter = 0;
+
+    fillInDefaultValues(&myNotificationAllocationParams,
+        &myNotificationFilterAllocationParams,
+        &myNotificationParams);
+
+    safassert(ntftest_saNtfInitialize(&ntfHandle, &ntfCallbacks, &ntfVersion),
+        SA_AIS_OK);
+
+    safassert(ntftest_saNtfAlarmNotificationFilterAllocate(
+        ntfHandle, /* handle to Notification Service instance */
+        &myAlarmFilter, /* put filter here */
+        /* number of event types */
+        myNotificationFilterAllocationParams.numEventTypes,
+        /* number of notification objects */
+        myNotificationFilterAllocationParams.numNotificationObjects,
+        /* number of notifying objects */
+        myNotificationFilterAllocationParams.numNotifyingObjects,
+        /* number of notification class ids */
+        myNotificationFilterAllocationParams.numNotificationClassIds,
+        /* number of probable causes */
+        myNotificationFilterAllocationParams.numProbableCauses,
+        /* number of perceived severities */
+        myNotificationFilterAllocationParams.numPerceivedSeverities,
+        /* number of trend indications */
+        myNotificationFilterAllocationParams.numTrends),
+        SA_AIS_OK);
+
+    myNotificationFilterHandles.alarmFilterHandle =
+        myAlarmFilter.notificationFilterHandle;
+    myAlarmFilter.perceivedSeverities[0] = SA_NTF_SEVERITY_WARNING;
+    myAlarmFilter.perceivedSeverities[1] = SA_NTF_SEVERITY_CLEARED;
+
+    /* Send one alarm notification */
+    safassert(ntftest_saNtfAlarmNotificationAllocate(
+        ntfHandle, /* handle to Notification Service instance */
+        &myNotification,
+        /* number of correlated notifications */
+        myNotificationAllocationParams.numCorrelatedNotifications,
+        /* length of additional text */
+        myNotificationAllocationParams.lengthAdditionalText,
+        /* number of additional info items*/
+        myNotificationAllocationParams.numAdditionalInfo,
+        /* number of specific problems */
+        myNotificationAllocationParams.numSpecificProblems,
+        /* number of monitored attributes */
+        myNotificationAllocationParams.numMonitoredAttributes,
+        /* number of proposed repair actions */
+        myNotificationAllocationParams.numProposedRepairActions,
+        /* use default allocation size */
+        myNotificationAllocationParams.variableDataSize),
+        SA_AIS_OK);
+
+    myNotificationParams.eventType = myNotificationParams.alarmEventType;
+
+    fill_header_part(&myNotification.notificationHeader,
+        (saNotificationParamsT *)&myNotificationParams,
+        myNotificationAllocationParams.lengthAdditionalText);
+
+    /* determine perceived severity */
+    *(myNotification.perceivedSeverity) =
+        myNotificationParams.perceivedSeverity;
+
+    /* set probable cause*/
+    *(myNotification.probableCause) = myNotificationParams.probableC
[devel] [PATCH 2/3] ntfd: Cold sync reader to the standby ntfd after rebooting the standby controller [#2757]
Assume that reader information is updated on the standby ntfd via
checkpoint upon reception of reader API requests. However, if the
standby controller reboots and comes back up, the standby ntfd has none
of the reader information that is available at the active ntfd. If a
switchover then happens, the new active will not be able to process
reader API requests that use existing reader handles.

This patch adds reader information to the cold sync.
---
 src/ntf/common/ntfsv_enc_dec.c |   1 -
 src/ntf/ntfd/NtfAdmin.cc       |  63 +-
 src/ntf/ntfd/NtfAdmin.h        |   5 ++
 src/ntf/ntfd/NtfClient.cc      |   7 ++
 src/ntf/ntfd/NtfReader.cc      |  18 -
 src/ntf/ntfd/NtfReader.h       |   1 +
 src/ntf/ntfd/ntfs_com.c        |  42 +---
 src/ntf/ntfd/ntfs_com.h        |   8 +++
 src/ntf/ntfd/ntfs_mbcsv.c      | 147 -
 9 files changed, 235 insertions(+), 57 deletions(-)

diff --git a/src/ntf/common/ntfsv_enc_dec.c b/src/ntf/common/ntfsv_enc_dec.c
index d38c401..5cecabf 100644
--- a/src/ntf/common/ntfsv_enc_dec.c
+++ b/src/ntf/common/ntfsv_enc_dec.c
@@ -2275,7 +2275,6 @@ uint32_t ntfsv_enc_read_finalize_msg(NCS_UBAID *uba, ntfsv_reader_finalize_req_t
     ncs_encode_32bit(&p8, param->client_id);
     ncs_encode_32bit(&p8, param->readerId);
     ncs_enc_claim_space(uba, 8);
-
     TRACE_LEAVE();
     return NCSCC_RC_SUCCESS;
 }
diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc
index 1242129..2cb9945 100644
--- a/src/ntf/ntfd/NtfAdmin.cc
+++ b/src/ntf/ntfd/NtfAdmin.cc
@@ -627,6 +627,12 @@ void NtfAdmin::syncRequest(NCS_UBAID *uba) {
       LOG_ER("sendNewClient: %u failed", client->getClientId());
     }
   }
+  // Need to sync cached notification before syncing readers.
+  // In decoding side, all notifications will be restored before
+  // decoding readers, so that the internal cache list of every
+  // readers will be up-to-dated
+  logger.syncRequest(uba);
+
   for (pos = clientMap.begin(); pos != clientMap.end(); pos++) {
     pos->second->syncRequest(uba); /* subscriptions are synched here */
   }
@@ -659,7 +665,6 @@ void NtfAdmin::syncRequest(NCS_UBAID *uba) {
     NtfSmartPtr notification = posNot->second;
     notification->syncRequest(uba);
   }
-  logger.syncRequest(uba);
   TRACE_LEAVE();
 }

@@ -770,6 +775,50 @@ NtfReader* NtfAdmin::createReaderWithoutFilter(ntfsv_reader_init_req_t rp,
   return newReader;
 }
 /**
+ * The method is called in cold sync to restore reader instance
+ * with filter at standby NTFD
+ *
+ * @param rp: the original reader initialize request version 2
+ * @param readerId: the current reader Id of reader instance exists
+ * at active side
+ * @param fIter: current iteration to read the notification
+ * @param firstRead: flag is used in NtfReader::next
+ * @return none
+ */
+void NtfAdmin::restoreReaderWithFilter(ntfsv_reader_init_req_2_t rp,
+    uint32_t readerId, uint32_t fIter, bool firstRead) {
+  TRACE_ENTER();
+  NtfReader *reader = createReaderWithFilter(rp, NULL);
+  if (reader != nullptr) {
+    reader->setReaderId(readerId);
+    reader->setReaderIteration(fIter);
+    reader->setFirstRead(firstRead);
+  }
+  TRACE_LEAVE();
+}
+/**
+ * The method is called in cold sync to restore reader instance
+ * without at standby NTFD
+ *
+ * @param rp: the original reader initialize request version 2
+ * @param readerId: the current reader Id of reader instance exists
+ * at active side
+ * @param fIter: current iteration to read the notification
+ * @param firstRead: flag is used in NtfReader::next
+ * @return none
+ */
+void NtfAdmin::restoreReaderWithoutFilter(ntfsv_reader_init_req_t rp,
+    uint32_t readerId, uint32_t fIter, bool firstRead) {
+  TRACE_ENTER();
+  NtfReader *reader = createReaderWithoutFilter(rp, NULL);
+  if (reader != nullptr) {
+    reader->setReaderId(readerId);
+    reader->setReaderIteration(fIter);
+    reader->setFirstRead(firstRead);
+  }
+  TRACE_LEAVE();
+}
+/**
  * The method create a new instance of NtfReader that
  * has filter
  *
@@ -1144,6 +1193,18 @@ void createReaderWithFilter(ntfsv_reader_init_req_2_t rp, MDS_SYNC_SND_CTXT *mds
   NtfAdmin::theNtfAdmin->createReaderWithFilter(rp, mdsCtxt);
 }

+void restoreReaderWithFilter(ntfsv_reader_init_req_2_t rp, uint32_t readerId,
+    uint32_t fIter, bool firstRead) {
+  osafassert(NtfAdmin::theNtfAdmin != NULL);
+  NtfAdmin::theNtfAdmin->restoreReaderWithFilter(rp, readerId, fIter, firstRead);
+}
+
+void restoreReaderWithoutFilter(ntfsv_reader_init_req_t rp, uint32_t readerId,
+    uint32_t fIter, bool firstRead) {
+  osafassert(NtfAdmin::theNtfAdmin != NULL);
+  NtfAdmin::theNtfAdmin->restoreReaderWithoutFilter(rp, readerId, fIter, firstRead);
+}
+
 void readNext(ntfsv_read_next_req_t reqNextReq, MDS_SYNC_SND_CTXT *mdsCtxt) {
   osafassert(NtfAdmin::theNtfAdmin != NULL);
diff --git a/src/ntf/ntfd/NtfAdmin.h b/src/ntf/ntfd/NtfAdmin
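
To illustrate why the commit syncs cached notifications before readers, here is a minimal C++ sketch of the standby side. It is not the ntfd implementation: `StandbyModel` and `ReaderState` are hypothetical names, and the idea that a restored reader copies the notification cache and treats the iteration value as an index into that copy is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical per-reader state carried in the cold sync, mirroring the
// three values passed to restoreReaderWith(out)Filter in the patch.
struct ReaderState {
  uint32_t reader_id;
  uint32_t iteration;  // assumed here: index of the next notification to read
  bool first_read;     // carried over unchanged; its role is not modelled
};

// Toy model of a standby that receives cold sync data.
class StandbyModel {
 public:
  // Restore one cached notification (identified only by its id here).
  void RestoreNotification(uint64_t notification_id) {
    cache_.push_back(notification_id);
  }

  // Restore a reader. The reader snapshots the cache as it is right now,
  // which is why notifications have to be restored first.
  void RestoreReader(const ReaderState& state) {
    Reader reader;
    reader.state = state;
    reader.snapshot = cache_;
    readers_[state.reader_id] = reader;
  }

  // Returns true and stores the next notification id in *out, or returns
  // false if the reader is unknown or has nothing left to read.
  bool ReadNext(uint32_t reader_id, uint64_t* out) {
    std::map<uint32_t, Reader>::iterator it = readers_.find(reader_id);
    if (it == readers_.end()) return false;
    Reader& reader = it->second;
    if (reader.state.iteration >= reader.snapshot.size()) return false;
    *out = reader.snapshot[reader.state.iteration++];
    return true;
  }

 private:
  struct Reader {
    ReaderState state;
    std::vector<uint64_t> snapshot;
  };
  std::vector<uint64_t> cache_;
  std::map<uint32_t, Reader> readers_;
};
```

In this model, restoring the reader before the notifications leaves its snapshot empty, so a later read finds nothing. That mirrors the ordering concern stated in the code comment of the patch.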
[devel] [PATCH 0/3] Review Request for ntf: Checkpoint and cold sync reader information [#2757]
Summary: ntfd: Checkpoint reader to the standby when processes reader API requests [#2757]
Review request for Ticket(s): 2757
Peer Reviewer(s): Lennart, Srinivas, Canh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2757
Base revision: ee105cb3bf44eee4e8785e3de7d24f907641e2ab
Personal repository: git://git.code.sf.net/u/minh-chau/review

Impacted area          Impact y/n
---------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 74da3370accfa44a34a7abf9830ceaeae3ab5d4f
Author: Minh Chau
Date: Mon, 22 Jan 2018 15:08:59 +1100

ntftest: Add new test cases of suite 41 for cold sync and checkpoint of reader APIs [#2757]

revision ad38745b1c411bc52905725281c84c69e4147fef
Author: Minh Chau
Date: Mon, 22 Jan 2018 15:03:42 +1100

ntfd: Cold sync reader to the standby ntfd after rebooting the standby controller [#2757]

Assume that reader information is updated on the standby ntfd via
checkpoint upon reception of reader API requests. However, if the
standby controller reboots and comes back up, the standby ntfd has none
of the reader information that is available at the active ntfd. If a
switchover then happens, the new active will not be able to process
reader API requests that use existing reader handles.

This patch adds reader information to the cold sync.

revision 47cf18850e6819c2db4642eb1e639aff5f0d8282
Author: Minh Chau
Date: Mon, 22 Jan 2018 14:12:00 +1100

ntfd: Checkpoint reader to the standby when processes reader API requests [#2757]

When the active ntfd receives reader API requests (ReaderInitialize,
ReadNext, ReadFinalize), it does not update the reader information on
the standby. Thus, when either a switchover or a failover happens, the
client cannot continue to use the reader APIs, because the reader
information is not available in the new active.

This patch checkpoints reader information to the standby once the active
has finished processing a reader API request.

Complete diffstat:
------------------
 src/ntf/agent/ntfa_mds.c       |  51 +--
 src/ntf/apitest/tet_coldsync.c | 690 -
 src/ntf/common/ntfsv_enc_dec.c |  88 +-
 src/ntf/common/ntfsv_enc_dec.h |  12 +-
 src/ntf/ntfd/NtfAdmin.cc       | 145 +++--
 src/ntf/ntfd/NtfAdmin.h        |  17 +-
 src/ntf/ntfd/NtfClient.cc      |  68 +++-
 src/ntf/ntfd/NtfClient.h       |  11 +-
 src/ntf/ntfd/NtfLogger.cc      |   2 +-
 src/ntf/ntfd/NtfReader.cc      |  84 +++--
 src/ntf/ntfd/NtfReader.h       |  13 +-
 src/ntf/ntfd/ntfs_com.c        | 105 +++
 src/ntf/ntfd/ntfs_com.h        |  25 +-
 src/ntf/ntfd/ntfs_evt.c        |  14 +-
 src/ntf/ntfd/ntfs_mbcsv.c      | 287 ++---
 src/ntf/ntfd/ntfs_mbcsv.h      |  16 +
 src/ntf/ntfd/ntfs_mds.c        |  42 +--
 17 files changed, 1430 insertions(+), 240 deletions(-)

Testing Commands:
-----------------
Run all test cases of suite 41, and legacy suites

Testing, Expected Results:
--------------------------
All pass

Conditions of Submission:
-------------------------
ack from reviewers

Arch         Built     Started    Linux distro
-----------------------------------------------
mips           n          n
mips64         n          n
x86            n          n
x86_64         y          y
powerpc        n          n
powerpc64      n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technic
[devel] [PATCH 1/3] ntfd: Checkpoint reader to the standby when processes reader API requests [#2757]
When the active ntfd receives reader API requests (ReaderInitialize,
ReadNext, ReadFinalize), it does not update the reader information on
the standby. Thus, when either a switchover or a failover happens, the
client cannot continue to use the reader APIs, because the reader
information is not available in the new active.

This patch checkpoints reader information to the standby once the active
has finished processing a reader API request.
---
 src/ntf/agent/ntfa_mds.c       |  51 +++
 src/ntf/common/ntfsv_enc_dec.c |  89 ++
 src/ntf/common/ntfsv_enc_dec.h |  12 ++--
 src/ntf/ntfd/NtfAdmin.cc       |  82
 src/ntf/ntfd/NtfAdmin.h        |  12 ++--
 src/ntf/ntfd/NtfClient.cc      |  61 +-
 src/ntf/ntfd/NtfClient.h       |  11 ++--
 src/ntf/ntfd/NtfLogger.cc      |   2 +-
 src/ntf/ntfd/NtfReader.cc      |  66 ++-
 src/ntf/ntfd/NtfReader.h       |  12 +++-
 src/ntf/ntfd/ntfs_com.c        |  79 +++
 src/ntf/ntfd/ntfs_com.h        |  17 +++--
 src/ntf/ntfd/ntfs_evt.c        |  14 ++---
 src/ntf/ntfd/ntfs_mbcsv.c      | 140 -
 src/ntf/ntfd/ntfs_mbcsv.h      |  16 +
 src/ntf/ntfd/ntfs_mds.c        |  42 +++--
 16 files changed, 516 insertions(+), 190 deletions(-)

diff --git a/src/ntf/agent/ntfa_mds.c b/src/ntf/agent/ntfa_mds.c
index 41d58e8..0b088ea 100644
--- a/src/ntf/agent/ntfa_mds.c
+++ b/src/ntf/agent/ntfa_mds.c
@@ -162,7 +162,8 @@ static uint32_t ntfa_enc_send_not_msg(NCS_UBAID *uba, ntfsv_msg_t *msg)
 **/
 static uint32_t ntfa_enc_reader_initialize_msg(NCS_UBAID *uba, ntfsv_msg_t *msg)
 {
-    return ntfsv_enc_reader_initialize_msg(uba, msg);
+    return ntfsv_enc_reader_initialize_msg(uba,
+        &msg->info.api_info.param.reader_init);
 }

 /
@@ -178,10 +179,11 @@ static uint32_t ntfa_enc_reader_initialize_msg(NCS_UBAID *uba, ntfsv_msg_t *msg)
 Notes : None.
 **/
-static uint32_t ntfa_enc_reader_initialize_msg_2(NCS_UBAID *uba,
+static uint32_t ntfa_enc_reader_initialize_2_msg(NCS_UBAID *uba,
     ntfsv_msg_t *msg)
 {
-    return ntfsv_enc_reader_initialize_msg_2(uba, msg);
+    return ntfsv_enc_reader_initialize_2_msg(uba,
+        &msg->info.api_info.param.reader_init_2);
 }

 /
@@ -198,25 +200,8 @@ static uint32_t ntfa_enc_reader_initialize_msg_2(NCS_UBAID *uba,
 **/
 static uint32_t ntfa_enc_reader_finalize_msg(NCS_UBAID *uba, ntfsv_msg_t *msg)
 {
-    uint8_t *p8;
-    ntfsv_reader_finalize_req_t *param =
-        &msg->info.api_info.param.reader_finalize;
-
-    TRACE_ENTER();
-    osafassert(uba != NULL);
-
-    /** encode the contents **/
-    p8 = ncs_enc_reserve_space(uba, 8);
-    if (!p8) {
-        TRACE("NULL pointer");
-        return NCSCC_RC_OUT_OF_MEM;
-    }
-    ncs_encode_32bit(&p8, param->client_id);
-    ncs_encode_32bit(&p8, param->readerId);
-    ncs_enc_claim_space(uba, 8);
-
-    TRACE_LEAVE();
-    return NCSCC_RC_SUCCESS;
+    return ntfsv_enc_read_finalize_msg(uba,
+        &msg->info.api_info.param.reader_finalize);
 }

 /
@@ -233,25 +218,7 @@ static uint32_t ntfa_enc_reader_finalize_msg(NCS_UBAID *uba, ntfsv_msg_t *msg)
 **/
 static uint32_t ntfa_enc_read_next_msg(NCS_UBAID *uba, ntfsv_msg_t *msg)
 {
-    uint8_t *p8;
-    ntfsv_read_next_req_t *param = &msg->info.api_info.param.read_next;
-
-    TRACE_ENTER();
-    osafassert(uba != NULL);
-
-    /** encode the contents **/
-    p8 = ncs_enc_reserve_space(uba, 10);
-    if (!p8) {
-        TRACE("NULL pointer");
-        return NCSCC_RC_OUT_OF_MEM;
-    }
-    ncs_encode_32bit(&p8, param->client_id);
-    ncs_encode_8bit(&p8, param->searchDirection);
-    ncs_encode_32bit(&p8, param->readerId);
-    ncs_enc_claim_space(uba, 10);
-
-    TRACE_LEAVE();
-    return NCSCC_RC_SUCCESS;
+    return ntfsv_enc_read_next_msg(uba, &msg->info.api_info.param.read_next);
 }

 /
@@ -576,7 +543,7 @@ static uint32_t ntfa_mds_enc(struct ncsmds_callback_info *info)
         break;

     case NTFSV_READER_INITIALIZE_REQ_2:
-        rc = ntfa_enc_reader_initialize_msg_2(uba, msg);
+        rc = ntfa_enc_reade
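
The hunks above replace hand-written encoding in the agent with calls to shared `ntfsv_enc_*` routines that take the request parameters directly. A minimal C++ sketch of that idea follows, using the reader finalize request (two 32-bit fields, 8 bytes in total). The names `ReaderFinalizeReq`, `Encode32`, `Decode32`, `EncodeReaderFinalize`, and `DecodeReaderFinalize` are hypothetical, and most-significant-byte-first ordering is an assumption of the sketch, not something stated in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the two fields encoded for a reader finalize
// request in the removed agent code (client_id and readerId).
struct ReaderFinalizeReq {
  uint32_t client_id;
  uint32_t reader_id;
};

// Append a 32-bit value, most significant byte first (assumed byte order).
void Encode32(std::vector<uint8_t>& buf, uint32_t value) {
  buf.push_back(static_cast<uint8_t>(value >> 24));
  buf.push_back(static_cast<uint8_t>(value >> 16));
  buf.push_back(static_cast<uint8_t>(value >> 8));
  buf.push_back(static_cast<uint8_t>(value));
}

// Read back a 32-bit value written by Encode32 at the given offset.
uint32_t Decode32(const std::vector<uint8_t>& buf, std::size_t offset) {
  assert(offset + 4 <= buf.size());
  return (static_cast<uint32_t>(buf[offset]) << 24) |
         (static_cast<uint32_t>(buf[offset + 1]) << 16) |
         (static_cast<uint32_t>(buf[offset + 2]) << 8) |
         static_cast<uint32_t>(buf[offset + 3]);
}

// A single encoder that takes the request itself rather than a whole
// message, so one routine could serve both the agent-to-server message
// and an active-to-standby checkpoint.
void EncodeReaderFinalize(std::vector<uint8_t>& buf,
                          const ReaderFinalizeReq& req) {
  Encode32(buf, req.client_id);
  Encode32(buf, req.reader_id);
}

// Matching decoder for the receiving side.
ReaderFinalizeReq DecodeReaderFinalize(const std::vector<uint8_t>& buf) {
  assert(buf.size() >= 8);
  ReaderFinalizeReq req;
  req.client_id = Decode32(buf, 0);
  req.reader_id = Decode32(buf, 4);
  return req;
}
```

Keeping the encoder independent of the enclosing message type is the design choice being illustrated: callers pick the field out of their own message and pass it in.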
Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]
Hi Hans

I will respond to your questions in another reply.

On 19/01/18 23:47, Hans Nordebäck wrote:

- In Consensus::FenceNode, UINT_MAX is used as argument to opensaf_reboot, does this work?

[Gary] Yes, I called opensaf_reboot with UINT_MAX and it worked with stonith.

Thanks
Gary

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]
Hi Anders/Hans

On 20/01/18 00:56, Anders Widell wrote:

Ack from me also, with comments:

* I think my major comment is that I had originally envisioned that you would use the "etcdctl lock" command (in the V3 API) and that the active SC would hold the lock for as long as it is active. The lock would not be needed for reading. Your approach of only creating the lock when you wish to change active controller could be fine though. However, you shouldn't need the lock for reading - only when you wish to update the active controller.

  Regarding the Watch: I think you should have the watch on the lock instead of (or in addition to) the data you are protecting. At a fail-over, the old standby would acquire the lock and wait a while to give the old active enough time to detect that a fail-over is pending (it notices that the lock has been created). The old active would then be able to remove the lock and prevent the fail-over from happening. We can look into this in the next iteration (next release) and keep it as it is for now.

[Gary] I will remove the opensaf_active_controller key and just have a key for the lock. The node is stored in the corresponding value. It's a lot simpler that way so I will do it for this release. The lock will not have a TTL and be persistent (until removed by another controller).

* You ought to utilise the test-and-set functionality in the etcd v2 protocol, in the cases where you are changing the value of a key and know (think you know) the previous value. unlock is an example of this, fail-over probably also. We could add this later but I think you should at least extend the plugin API already now, so that it takes a "previous value" parameter where applicable.

* You have a try-again loop when you acquire the lock, but if the maximum number of retries have been done then you continue as if the lock was acquired successfully. It doesn't seem to be correct?

[Gary] Yes, will fix.

* It is not obvious (to me) that no more Watch thread can be created simultaneously. Could you add a flag that keeps track of if there is an existing thread, and add assert statements checking that there is no existing thread when you call MonitorActive() to create a new one?

[Gary] OK, will add a conditional statement.

* As Hans points out below, it seems that it is also possible that the watch thread could disappear silently in some error case.

[Gary] Will make it assert in that case.

* As already pointed out by Hans, we should store our keys in some directory in the etcd database, so that the same database can be used for other purposes as well. I think the plugin (shell script) should add a directory prefix to the key.

[Gary] Yes, good idea. The directory prefix will be handled by the plugin, in case the underlying key-value store doesn't handle directories etc.

AndersW> Split-brain should not be possible, however the current algorithm will not guarantee that the active SC will be in the largest partition in case TIPC connectivity is broken (partitioned). So it could happen that a single isolated node (from TIPC point of view) is the active SC, even though a larger TIPC partition exists. I think this could be solved by writing the size of the cluster into the lock. An existing active SC shall reject a fail-over if it is being initiated from a node in a smaller partition.

[Gary] Can we postpone this for the next release?

Thanks
Gary
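
Two of the review comments above, the retry loop that continues as if the lock were held and the suggestion to pass a previous value when unlocking, can be sketched together. The following C++ is only an illustration under stated assumptions: `FakeKeyValueStore` is an in-memory stand-in for the key-value store, and `AcquireLock` and `ReleaseLock` are hypothetical names, not the plugin API discussed in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the key-value store. Only the two
// conditional operations discussed in the review are modelled.
class FakeKeyValueStore {
 public:
  // Create `key` only if it does not exist yet; returns true on success.
  bool CreateIfAbsent(const std::string& key, const std::string& value) {
    return data_.emplace(key, value).second;
  }

  // Test-and-set style delete: erase `key` only if it still holds
  // `expected`; returns false if the key is missing or owned by another.
  bool EraseIfEquals(const std::string& key, const std::string& expected) {
    std::map<std::string, std::string>::iterator it = data_.find(key);
    if (it == data_.end() || it->second != expected) return false;
    data_.erase(it);
    return true;
  }

  // Current value of `key`, or an empty string if the key is absent.
  std::string Owner(const std::string& key) const {
    std::map<std::string, std::string>::const_iterator it = data_.find(key);
    return it == data_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> data_;
};

// Try to take the lock at most `max_retries` times. The return value says
// whether the lock is actually held, so running out of retries can never
// be mistaken for success.
bool AcquireLock(FakeKeyValueStore& store, const std::string& lock_key,
                 const std::string& node_name, int max_retries) {
  assert(max_retries > 0);
  for (int attempt = 0; attempt < max_retries; ++attempt) {
    if (store.CreateIfAbsent(lock_key, node_name)) return true;
    // A real implementation would wait or back off here before retrying.
  }
  return false;
}

// Release the lock only if this node is still the recorded owner.
bool ReleaseLock(FakeKeyValueStore& store, const std::string& lock_key,
                 const std::string& node_name) {
  return store.EraseIfEquals(lock_key, node_name);
}
```

A caller would check the result of `AcquireLock` and abandon the role change when it is false, rather than proceeding. Passing the expected owner to `ReleaseLock` means a node cannot remove a lock that another controller has taken in the meantime.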
Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]
Hi Gary,

yes you are correct, I missed that, ee_name is second argument.

/Thanks Hans

On 01/22/2018 07:48 AM, Gary Lee wrote:

Hi Hans

I will respond to your questions in another reply.

On 19/01/18 23:47, Hans Nordebäck wrote:

- In Consensus::FenceNode, UINT_MAX is used as argument to opensaf_reboot, does this work?

[Gary] Yes, I called opensaf_reboot with UINT_MAX and it worked with stonith.

Thanks
Gary