[tickets] [opensaf:tickets] #2415 CKPT node director failed to execute ckpt create request
- **Milestone**: 5.2.0 --> next

---

** [tickets:#2415] CKPT node director failed to execute ckpt create request**

**Status:** assigned
**Milestone:** next
**Created:** Fri Apr 07, 2017 01:30 AM UTC by David Byrne
**Last Updated:** Fri Apr 07, 2017 03:54 AM UTC
**Owner:** A V Mahesh (AVM)

After the following two patches were removed from a build based on OpenSAF CS8701, the CKPT node director fails to execute ckpt create requests (Collocated Checkpoints, Asynchronous Update):

- ph4_01_headless_escalation_for_osaftest.diff
- mds_log_level.diff

CPND_MAX_REPLICAS = 1000
retention_time is set to 30s

Test procedure

1. Send 34 ckpt requests per second. 34*30 = 1020, which is > CPND_MAX_REPLICAS. Failed, which is expected.
2. Send 32 ckpt requests per second. 32*30 = 960, which is < CPND_MAX_REPLICAS. It used to pass, but now fails since the above two patches were removed.

syslog:

Apr 5 01:42:46 SC-2-1 osafckptnd[4958]: ncs_sel_obj_create: socketpair failed - Too many open files
Apr 5 01:42:46 SC-2-1 osafckptnd[4958]: ER cpnd has exceeded the maximum number of allowed replicas (CPND_MAX_REPLICAS)

Test debug info:

Apr 5, 2017 1:46:08 AM INFO ANSWER type: report start-time: 1491349366.360 stop-time: 1491349567.269 total: send=6428 recv=6407 fail=6407

Modified test procedure, for investigation purposes

1. Start the test at 32 ckpt/s. 32*30 = 960, which is < CPND_MAX_REPLICAS. Passed.
   Apr 6, 2017 2:56:27 AM INFO ANSWER type: report start-time: 1491439975.068 stop-time: 1491440187.347 total: send=6792 send-failed=0 recv=6780
2. Then test 34 ckpt/s. Failed.
3. Then test 33 ckpt/s. Failed.
4. Then go back to 32 ckpt/s. Failed.

From this experiment, we can see that once CPND_MAX_REPLICAS has been exceeded, the ckpt service does not recover.

Note: the problem only occurs for Collocated Checkpoints, Asynchronous Update. The same test run for Non-Collocated Checkpoints, Synchronous Update, is OK.

Test Contact: Li Suo

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
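The rate arithmetic used in the test procedure of ticket #2415 can be written out as a small sketch. It assumes that every checkpoint created stays alive for exactly the retention time and is then released, so the number of live replicas is bounded by rate × retention time. The function names are hypothetical; only CPND_MAX_REPLICAS = 1000 and the 30s retention time come from the ticket.

```python
CPND_MAX_REPLICAS = 1000  # limit quoted in the ticket
RETENTION_TIME_S = 30     # retention_time used in the test


def steady_state_replicas(create_rate_per_s, retention_s=RETENTION_TIME_S):
    """Upper bound on live replicas, assuming each one lives for retention_s seconds."""
    return create_rate_per_s * retention_s


def exceeds_limit(create_rate_per_s, limit=CPND_MAX_REPLICAS):
    """True if the assumed steady-state replica count is above the limit."""
    return steady_state_replicas(create_rate_per_s) > limit


if __name__ == "__main__":
    for rate in (32, 33, 34):
        print(rate, steady_state_replicas(rate), exceeds_limit(rate))
```

Under this assumption only 34 ckpt/s is above the limit; 32 and 33 ckpt/s are below it, which is why their failures after the 34 ckpt/s run point to a service that does not recover.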
[tickets] [opensaf:tickets] #2415 CKPT node director failed to execute ckpt create request
- **status**: unassigned --> assigned
- **assigned_to**: A V Mahesh (AVM)
- **Comment**:

Please provide the following:

- Is this understanding right? For Collocated Checkpoints, once the CPND_MAX_REPLICAS limit has been exceeded the ckpt service cannot recover, while Non-Collocated Checkpoints with Synchronous Update work fine.
- I assume you are using the 5.2.RC1 code? changeset: 8700:654ad1d8c491 tag: 5.2.RC1 summary: release: Update configure.ac for version 5.2.RC1
- Which two #ticket patches were removed, and why?

---

** [tickets:#2415] CKPT node director failed to execute ckpt create request**

**Status:** assigned
**Milestone:** 5.2.0
**Created:** Fri Apr 07, 2017 01:30 AM UTC by David Byrne
**Last Updated:** Fri Apr 07, 2017 01:30 AM UTC
**Owner:** A V Mahesh (AVM)
[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)
On 4/6/2017 5:29 PM, Chani Srivastava wrote:

> With this patch the performance figures show a great improvement over before, and the results are comparable to the 5.1 results

Thanks for the testing. This patch provides a rollback option: CKPT can be configured to get the old 5.1 behavior, so the statistics will match 5.1.

---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost double than previous)**

**Status:** review
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Thu Apr 06, 2017 11:26 AM UTC
**Owner:** A V Mahesh (AVM)

Environment details

- OS: Suse 11, 64bit, physical machine
- Changeset: 8634 (5.2.FC)
- Setup: 4 nodes

There is considerable degradation in CKPT performance in 5.2 compared to 5.1. Each time is the difference between timestamps taken just before and just after the API call.

-> For write operations, the checkpoint write API takes 2x the time it took in the earlier release 5.1. The issue is observed in both synchronous and asynchronous mode, for both local and remote replicas.

- synchronous: checkpoint create flags used: SA_CKPT_WR_ALL_REPLICAS
- asynchronous: checkpoint create flags used: SA_CKPT_WR_ACTIVE_REPLICA | SA_CKPT_CHECKPOINT_COLLOCATED

-> For section create operations in asynchronous mode for a local replica, the checkpoint section create API takes more than 70% longer than in 5.1.

-> For read operations in asynchronous mode for a local replica, the checkpoint read API takes twice the time it took in 5.1.

Please check the tickets pushed between 4.7 and 5.0 that affected API performance.
[tickets] [opensaf:tickets] #2415 CKPT node director failed to execute ckpt create request
---

** [tickets:#2415] CKPT node director failed to execute ckpt create request**

**Status:** unassigned
**Milestone:** 5.2.0
**Created:** Fri Apr 07, 2017 01:30 AM UTC by David Byrne
**Last Updated:** Fri Apr 07, 2017 01:30 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2413 smf: coredump, suspend is issued at completed state
- **status**: accepted --> review
- **Comment**:

A suspend is actually not being issued here. The state machine code is implemented such that the suspend is only done in the states Executing, Suspending, or RollingBack.

After getting some more logs from Rafael, it is clear this is a race condition between an async failure in AMF and the campaign commit being executed. Here is what is happening:

1. Campaign commit is performed.
2. Before smfd clears the suMaintenanceCampaign attribute for the SU, a component in that SU fails. This sends an NTF event with the maintenance name.
3. At the same time, the poll routine in smfd processes the TERMINATE upgrade thread event. When it returns, the upgrade campaign thread has been deleted and m_running has been set to false, but the NTF file descriptor has not been processed yet.
4. The poll routine now processes the NTF event, which tries to deliver the asyncFailure event through the upgrade thread, which is gone. Hence the crash.

The solution should be to always have "processEvt" last in the poll routine, so that if m_running is set to false, no other processing is done and the poll loop finishes.

---

** [tickets:#2413] smf: coredump, suspend is issued at completed state**

**Status:** review
**Milestone:** 5.2.0
**Created:** Wed Apr 05, 2017 12:39 PM UTC by Rafael
**Last Updated:** Thu Apr 06, 2017 03:33 PM UTC
**Owner:** Alex Jones
**Attachments:**

- [osafsmfd.9276.SC-2.core.txt](https://sourceforge.net/p/opensaf/tickets/2413/attachment/osafsmfd.9276.SC-2.core.txt) (15.4 kB; text/plain)

ticket #2145 looks to be causing this issue. coredump printout is attached.

Steps to reproduce: run a campaign and have an AMF component fail at the campaign completed state. This triggers an event in SMF which tries to suspend a completed campaign. The function handleAmfObjectStateChangeNotification will try to call asyncFailure(), which is the same as suspend(). Because the campaign is completed and committed, this is not a valid transition. The campaign state instance has most likely been deleted, hence the coredump.

For reference, see figures 5, 6 and 7 in the SMF AIS, starting from section 5.1.3.
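The race in the #2413 comment, and the proposed reordering of the poll routine, can be illustrated with a toy model. This is only a sketch under assumptions: the classes and method names are hypothetical stand-ins, not the smfd code, and only the names processEvt, m_running, asyncFailure and TERMINATE are taken from the ticket.

```python
class UpgradeThread:
    """Hypothetical stand-in for the SMF upgrade campaign thread."""

    def __init__(self):
        self.events = []

    def async_failure(self):
        self.events.append("asyncFailure")


class SmfdLoop:
    """Toy model of one pass through the poll routine."""

    def __init__(self):
        self.upgrade_thread = UpgradeThread()
        self.m_running = True

    def handle_ntf(self):
        # Delivering asyncFailure needs a live upgrade thread.
        self.upgrade_thread.async_failure()

    def process_evt(self, evt):
        if evt == "TERMINATE":
            self.upgrade_thread = None  # thread deleted
            self.m_running = False

    def poll_once(self, ntf_ready, evt, process_evt_last):
        if process_evt_last:
            # Proposed order: NTF first, mailbox event last.
            if ntf_ready:
                self.handle_ntf()
            if evt is not None:
                self.process_evt(evt)
        else:
            # Order described in the ticket: mailbox event first.
            if evt is not None:
                self.process_evt(evt)
            if ntf_ready:
                self.handle_ntf()  # fails after TERMINATE: thread is gone


def crashes(process_evt_last):
    """Run one pass with both events pending; report whether it blows up."""
    loop = SmfdLoop()
    try:
        loop.poll_once(ntf_ready=True, evt="TERMINATE",
                       process_evt_last=process_evt_last)
    except AttributeError:
        return True
    return False
```

With both events pending in the same pass, `crashes(False)` reproduces the use of a deleted thread, while `crashes(True)` handles the NTF event before the thread is torn down and then leaves the loop with m_running cleared.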
[tickets] [opensaf:tickets] #2413 smf: coredump, suspend is issued at completed state
- **status**: unassigned --> accepted
- **assigned_to**: Alex Jones
- **Comment**:

I was able to reproduce the problem. It is a race condition between async failure in AMF on the upgraded SU and the commit being processed in smfd.

---

** [tickets:#2413] smf: coredump, suspend is issued at completed state**

**Status:** accepted
**Milestone:** 5.2.0
**Created:** Wed Apr 05, 2017 12:39 PM UTC by Rafael
**Last Updated:** Thu Apr 06, 2017 11:58 AM UTC
**Owner:** Alex Jones
[tickets] [opensaf:tickets] #2414 amf: Support NoRed model for OpenSAF directors
---

** [tickets:#2414] amf: Support NoRed model for OpenSAF directors**

**Status:** assigned
**Milestone:** next
**Created:** Thu Apr 06, 2017 01:32 PM UTC by Anders Widell
**Last Updated:** Thu Apr 06, 2017 01:32 PM UTC
**Owner:** Anders Widell
**Attachments:**

- [nored.diff.gz](https://sourceforge.net/p/opensaf/tickets/2414/attachment/nored.diff.gz) (2.0 kB; application/gzip)

Currently, the OpenSAF directors can only be configured with the 2N redundancy model. The proposal is to also make it possible to configure them with the No-Redundancy model. The attached patch is a simple prototype that makes it possible to use the No-Redundancy model.
[tickets] [opensaf:tickets] Re: #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)
With this patch the performance figures show a great improvement over before, and the results are comparable to the 5.1 results.

---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost double than previous)**

**Status:** review
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Thu Apr 06, 2017 11:26 AM UTC
**Owner:** A V Mahesh (AVM)
[tickets] [opensaf:tickets] #2413 smf: coredump, suspend is issued at completed state
- Description has changed:

Diff:

--- old
+++ new
@@ -1,4 +1,4 @@
-ticket [#2145] looks to be causing this issue.
+ticket #2145 looks to be causing this issue.
 coredump printout is attached.

- **Comment**:

The following is the analysis:

1. According to Figure 7 of the SMF AIS spec, async-failure is supported in the following campaign states:

   - SA_SMF_CMPG_EXECUTING
   - SA_SMF_CMPG_SUSPENDING_EXECUTION
   - SA_SMF_CMPG_ROLLING_BACK

   From the campaign perspective, mark the campaign as SA_SMF_CMPG_SUSPENDED_BY_ERROR_DETECTED only when the current campaign state is one of the above. This will avoid the smfd segmentation fault.

2. However, saAmfSUMaintenanceCampaign is reset (cleared) when the campaign is committed, as stated in section 4.2.1.3 of the SMF AIS: "When an upgrade campaign is committed, the Software Management Framework must reset all the maintenance status attributes that refer to the campaign being committed. Beyond this point, it cannot determine whether a failed entity was upgraded by the campaign or not."

   When a component fails in a state other than those above (such as SA_SMF_CMPG_EXECUTION_COMPLETED), amfnd will not restart it, since saAmfSUMaintenanceCampaign has not yet been reset. Ideally the failed component has to be reset, because the campaign will not be moved to the error state.

---

** [tickets:#2413] smf: coredump, suspend is issued at completed state**

**Status:** unassigned
**Milestone:** 5.2.0
**Created:** Wed Apr 05, 2017 12:39 PM UTC by Rafael
**Last Updated:** Thu Apr 06, 2017 10:35 AM UTC
**Owner:** nobody
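The guard proposed in point 1 of the analysis on #2413 can be sketched as a state check before the transition. This is an illustration only: the enum members reuse the state names quoted in the ticket, but their values are arbitrary (not the AIS numeric values), the enum lists only the states mentioned here, and the function name is hypothetical.

```python
from enum import Enum, auto


class CampaignState(Enum):
    # Names as quoted in the ticket; values are placeholders.
    SA_SMF_CMPG_EXECUTING = auto()
    SA_SMF_CMPG_SUSPENDING_EXECUTION = auto()
    SA_SMF_CMPG_ROLLING_BACK = auto()
    SA_SMF_CMPG_EXECUTION_COMPLETED = auto()
    SA_SMF_CMPG_SUSPENDED_BY_ERROR_DETECTED = auto()


# States in which the analysis says async-failure is supported.
ASYNC_FAILURE_STATES = frozenset({
    CampaignState.SA_SMF_CMPG_EXECUTING,
    CampaignState.SA_SMF_CMPG_SUSPENDING_EXECUTION,
    CampaignState.SA_SMF_CMPG_ROLLING_BACK,
})


def on_async_failure(state):
    """Return the next campaign state; ignore the event in any other state."""
    if state in ASYNC_FAILURE_STATES:
        return CampaignState.SA_SMF_CMPG_SUSPENDED_BY_ERROR_DETECTED
    return state
```

A failure reported while the campaign is in SA_SMF_CMPG_EXECUTION_COMPLETED then leaves the state unchanged instead of attempting an invalid transition; as point 2 notes, this does not by itself get the failed component restarted.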
[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)
- **status**: assigned --> review

---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost double than previous)**

**Status:** review
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Thu Apr 06, 2017 10:33 AM UTC
**Owner:** A V Mahesh (AVM)
[tickets] [opensaf:tickets] #2413 smf: coredump, suspend is issued at completed state
- Description has changed:

Diff:

--- old
+++ new
@@ -1,4 +1,4 @@
-ticket #2145 looks to be causing this issue.
+ticket [#2145] looks to be causing this issue.
 coredump printout is attached.

---

** [tickets:#2413] smf: coredump, suspend is issued at completed state**

**Status:** unassigned
**Milestone:** 5.2.0
**Created:** Wed Apr 05, 2017 12:39 PM UTC by Rafael
**Last Updated:** Wed Apr 05, 2017 12:39 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)
- **Part**: - --> doc

---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost double than previous)**

**Status:** assigned
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Thu Apr 06, 2017 10:33 AM UTC
**Owner:** A V Mahesh (AVM)
[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)
The reported statistics were not appropriate for OSAF_CKPT_SHM_ALLOC_GUARANTEE=1: they were accidentally taken with TRACES enabled for IMM. We are seeing NORMAL performance without any change to the 5.2.RC2 code.

- If OSAF_CKPT_SHM_ALLOC_GUARANTEE is set to true (export OSAF_CKPT_SHM_ALLOC_GUARANTEE=1, pre-allocated), we are seeing *NORMAL* performance without any change to the 5.2.RC2 code.
- If OSAF_CKPT_SHM_ALLOC_GUARANTEE is set to false (export OSAF_CKPT_SHM_ALLOC_GUARANTEE=0, the default), we are seeing ~70% performance degradation, as expected, without any change to the 5.2.RC2 code.

Do we still need an OSAF_CKPT_SHM_ALLOC_GUARANTEE=2 option (neither pre-allocated nor checked for available memory) as the default? I am not in favor of it, because of the osafckptnd core dump under high memory load reported in [#2202].

In any case, I am creating/pushing a README.SHM explaining the above configuration options for this #ticket.

---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost double than previous)**

**Status:** assigned
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Thu Apr 06, 2017 03:42 AM UTC
**Owner:** A V Mahesh (AVM)
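The OSAF_CKPT_SHM_ALLOC_GUARANTEE values discussed in the #2395 comment can be summarised in a small lookup. This is a sketch, not OpenSAF code: the helper name is hypothetical, the descriptions only paraphrase the comment, and treating an unset variable as "0" is an assumption based on the comment calling 0 the default. README.SHM remains the authoritative description.

```python
import os

# Descriptions paraphrase the ticket comment; they are not taken from README.SHM.
SHM_ALLOC_MODES = {
    "0": "default: ~70% performance degradation reported",
    "1": "pre-allocated: normal performance reported",
    "2": "neither pre-allocated nor checked: proposed only, see [#2202]",
}


def describe_shm_alloc_guarantee(environ=None):
    """Return (value, description) for the variable in the given environment."""
    env = os.environ if environ is None else environ
    # Assumption: an unset variable behaves like "0".
    value = env.get("OSAF_CKPT_SHM_ALLOC_GUARANTEE", "0")
    return value, SHM_ALLOC_MODES.get(value, "unrecognised value")


if __name__ == "__main__":
    print(describe_shm_alloc_guarantee())
```

Calling it with an explicit mapping, for example `describe_shm_alloc_guarantee({"OSAF_CKPT_SHM_ALLOC_GUARANTEE": "1"})`, makes it easy to record which mode a performance run was taken under.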