HI Anders Widell, On 11/28/2016 6:28 PM, Anders Widell wrote: > One possible solution that I can think of for this problem is to set > up a signal handler for SIGBUS
It`s boiling down to prevention of the SIGBUS versus handling after the SIGBUS occurring , I think it would be good if we can have a solution that prevents the problem . Let us explore other options , if we don't find any, we can go with Ramesh proposed solution in LEAP even though it reduces performance of CPSV. Any how we have `OSAF_CKPT_SHM_ALLOC_GUARANTEE = 1` which give natural performance. -AVM On 11/28/2016 11:27 PM, hoang.m...@dektech.com.au wrote: > Hi, > > I also have very surprise this trial patch fix (or maybe hide) the > issue when fstat() is a unreliable function for checking shared memory. > > Here is my understanding, very basic, when > OSAF_CKPT_SHM_ALLOC_GUARANTEE = 0, shared memory is set size by > ftruncate() function. This function do not acquire all required size > but place a black plot on holding space, it will be changed runtime. > So in high memory load, other process occupied black plot space then > the expected address is not mapped to any real address area. Accessing > to that cause seg fault occur. That's the reason the ideal come out. > > I hope to receive more systematic solution than this implementation. > > Thank you and best regards, > Hoang > > > Quoting Anders Widell <anders.wid...@ericsson.com>: > >> I also have a hard time understanding how this code could work. One >> possible solution that I can think of for this problem is to set up a >> signal handler for SIGBUS, as I wrote in a review comment for ticket >> [#1712]. >> >> regards, >> >> Anders Widell >> >> >> On 11/28/2016 12:59 PM, ramesh betham wrote: >>> Hi, >>> >>> I am clueless how this patch is resolving this issue i.e., by checking >>> through fstat(_fd) and comparing the sizes etc. >>> >>> It suppose to check the available space of /dev/shm (tmpfs) just before >>> a write() operation like.., >>> >>> // /* Check File System available space */// >>> /char shm_dir[]="/dev/shm";// >>> struct statvfs info; >>> ////i//f (statvfs(shm_dir, &info) == 0)/ >>> >>> and get available space by calculating >>> "/info.f_bfree*info.f_bsize/" etc. >>> >>> Also make sure to do this check before every write in cpnd. >>> >>> If my interpretation is not correct, would like to understand more on >>> the context of the issue. >>> >>> Thanks and Regards, >>> Ramesh. >>> >>> >>> On 11/28/2016 11:17 AM, A V Mahesh wrote: >>>> Hi Hoang, >>>> >>>> Ok let us only focus on core dump , so solution should be, adding >>>> section to checkpoint will still fail and let us change LOG to >>>> possible error case of SHM deficiency and return error to CPA . >>>> >>>> Cpnd will NOT check the true size of shared memory for each write call >>>> before writing, this will degrade the performance of CPSV. >>>> >>>> Can you please re-send the patch V2 with fulfilling above , and change >>>> the fix description as well. >>>> >>>> -AVM >>>> >>>> On 11/25/2016 4:21 PM, Anders Widell wrote: >>>>> Yes, the problem with setting OSAF_CKPT_SHM_ALLOC_GUARANTEE is that >>>>> the memory consumption will increase. Therefore it is not backwards >>>>> compatible and thus not possible to do as a bug-fix. >>>>> >>>>> regards, >>>>> >>>>> Anders Widell >>>>> >>>>> >>>>> On 11/25/2016 10:46 AM, Vo Minh Hoang wrote: >>>>>> Dear Mahesh, >>>>>> >>>>>> I think this problem is #1712, problem occur when >>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE is not set. >>>>>> >>>>>> My thinking is that because we provide 2 mode (guarantee or not) >>>>>> so we >>>>>> should making sure no coredump happened. >>>>>> Btw, because this is out of my scope of decide, I would like to ask >>>>>> Anders >>>>>> Widell about it. >>>>>> >>>>>> -----Original Message----- >>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com] >>>>>> Sent: Friday, November 25, 2016 4:06 PM >>>>>> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com >>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>> Subject: Re: [PATCH 1 of 1] cpnd: ensure shared memory size before >>>>>> writing >>>>>> [#2202] >>>>>> >>>>>> Hi Hoang, >>>>>> >>>>>> Is this issue coming with enabling #1712 feature ? >>>>>> >>>>>> Exporting `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1` will provide >>>>>> guaranteed CPSV >>>>>> service SHM issue and this is addressed broadly all SHM memory >>>>>> issue. >>>>>> >>>>>> So I don't think we required API level changes. ( may we need to >>>>>> document >>>>>> the usage of Exporting `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1` ) >>>>>> >>>>>> #1712 description >>>>>> ============================================================================ >>>>>> >>>>>> >>>>>> >>>>>> == >>>>>> summary: leap: provide ensured disk space option for shm_open >>>>>> request [1712] >>>>>> >>>>>> description: >>>>>> leap: provide ensured disk space option for shm_open request [1712] >>>>>> Provided >>>>>> ensured disk space is allocated for NCS_OS_POSIX_SHM_REQ_OPEN >>>>>> request >>>>>> using >>>>>> posix_fallocate() so that application such as CPSV subsequent >>>>>> writes to >>>>>> bytes in the specified range are guaranteed not to fail because of >>>>>> lack of >>>>>> disk space. >>>>>> >>>>>> Updated the Opensaf services according to new options based on >>>>>> requirements. >>>>>> >>>>>> Cpsv service uses the ensured disk space option based on >>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE environment variable ,so if user >>>>>> exports as >>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 (true) cpsv provided ensured disk >>>>>> space. >>>>>> ============================================================================ >>>>>> >>>>>> >>>>>> >>>>>> == >>>>>> >>>>>> -AVM >>>>>> >>>>>> >>>>>> On 11/23/2016 3:28 PM, Hoang Vo wrote: >>>>>>> osaf/libs/core/include/ncs_osprm.h | 9 +++++++++ >>>>>>> osaf/libs/core/leap/os_defs.c | 19 >>>>>>> +++++++++++++++++-- >>>>>>> osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 16 ++++++++++++++++ >>>>>>> 3 files changed, 42 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> >>>>>>> problem: when checkpoint service init without shared memory size >>>>>>> guaranteed works in high memory load, core dump occur while adding >>>>>>> section >>>>>> to checkpoint. >>>>>>> solution: check the true size of shared memory before writing >>>>>>> to it. >>>>>>> >>>>>>> diff --git a/osaf/libs/core/include/ncs_osprm.h >>>>>>> b/osaf/libs/core/include/ncs_osprm.h >>>>>>> --- a/osaf/libs/core/include/ncs_osprm.h >>>>>>> +++ b/osaf/libs/core/include/ncs_osprm.h >>>>>>> @@ -557,6 +557,7 @@ typedef enum { >>>>>>> NCS_OS_POSIX_SHM_REQ_UNLINK, /* unlink is shm_unlink */ >>>>>>> NCS_OS_POSIX_SHM_REQ_READ, >>>>>>> NCS_OS_POSIX_SHM_REQ_WRITE, >>>>>>> + NCS_OS_POSIX_SHM_REQ_STATS, >>>>>>> NCS_OS_POSIX_SHM_REQ_MAX >>>>>>> } NCS_OS_POSIX_SHM_REQ_TYPE; >>>>>>> typedef struct ncs_os_posix_shm_req_open_info_tag { @@ -598,6 >>>>>>> +599,13 @@ typedef struct ncs_os_posix_shm_req_writ >>>>>>> uint64_t i_offset; >>>>>>> } NCS_OS_POSIX_SHM_REQ_WRITE_INFO; >>>>>>> +typedef struct ncs_os_posix_shm_req_stats_info_tag { >>>>>>> + uint32_t i_hdl; >>>>>>> + int32_t i_fd; >>>>>>> + bool ensures_space; >>>>>>> + void *o_addr; >>>>>>> +} NCS_OS_POSIX_SHM_REQ_STATS_INFO; >>>>>>> + >>>>>>> typedef struct ncs_shm_req_info { >>>>>>> NCS_OS_POSIX_SHM_REQ_TYPE type; >>>>>>> @@ -607,6 +615,7 @@ typedef struct ncs_shm_req_info { >>>>>>> NCS_OS_POSIX_SHM_REQ_UNLINK_INFO unlink; >>>>>>> NCS_OS_POSIX_SHM_REQ_READ_INFO read; >>>>>>> NCS_OS_POSIX_SHM_REQ_WRITE_INFO write; >>>>>>> + NCS_OS_POSIX_SHM_REQ_STATS_INFO stats; >>>>>>> } info; >>>>>>> } NCS_OS_POSIX_SHM_REQ_INFO; >>>>>>> diff --git a/osaf/libs/core/leap/os_defs.c >>>>>>> b/osaf/libs/core/leap/os_defs.c >>>>>>> --- a/osaf/libs/core/leap/os_defs.c >>>>>>> +++ b/osaf/libs/core/leap/os_defs.c >>>>>>> @@ -799,9 +799,9 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S >>>>>>> } >>>>>>> } else { >>>>>>> if (ftruncate(req->info.open.o_fd, (off_t) >>>>>> shm_size /* off_t == long */ ) < 0) { >>>>>>> - printf("ftruncate failed with errno >>>>>> value %d \n", errno); >>>>>>> + LOG_WA("ftruncate failed with errno >>>>>> value %d \n", errno); >>>>>>> return NCSCC_RC_FAILURE; >>>>>>> - } >>>>>>> + } >>>>>>> } >>>>>>> uint32_t prot_flag = >>>>>> ncs_shm_prot_flags(req->info.open.i_flags); >>>>>>> @@ -859,6 +859,21 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S >>>>>>> req->info.write.i_write_size); >>>>>>> break; >>>>>>> + case NCS_OS_POSIX_SHM_REQ_STATS: >>>>>>> + if (!req->info.stats.o_addr) { >>>>>>> + printf("Output space is not defined\n"); >>>>>>> + return NCSCC_RC_FAILURE; >>>>>>> + } >>>>>>> + >>>>>>> + if (req->info.stats.ensures_space) { >>>>>>> + return NCSCC_RC_SUCCESS; >>>>>>> + } else { >>>>>>> + if(fstat(req->info.stats.i_fd, >>>>>> req->info.stats.o_addr)) { >>>>>>> + return NCSCC_RC_FAILURE; >>>>>>> + } >>>>>>> + } >>>>>>> + break; >>>>>>> + >>>>>>> default: >>>>>>> printf("Option Not supported %d \n", req->type); >>>>>>> return NCSCC_RC_FAILURE; >>>>>>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>>>>>> b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>>>>>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>>>>>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>>>>>> @@ -1851,6 +1851,22 @@ uint32_t cpnd_sec_hdr_update(CPND_CKPT_S >>>>>>> CPSV_SECT_HDR sec_hdr; >>>>>>> uint32_t rc = NCSCC_RC_SUCCESS; >>>>>>> NCS_OS_POSIX_SHM_REQ_INFO write_req; >>>>>>> + struct stat shm_stat; >>>>>>> + memset(&write_req, '\0', sizeof(write_req)); >>>>>>> + memset(&shm_stat, '\0', sizeof(shm_stat)); >>>>>>> + write_req.type = NCS_OS_POSIX_SHM_REQ_STATS; >>>>>>> + write_req.info.stats.i_fd = >>>>>> cp_node->replica_info.open.info.open.o_fd; >>>>>>> + write_req.info.stats.ensures_space = false; >>>>>>> + write_req.info.stats.o_addr = &shm_stat; >>>>>>> + shm_stat.st_size = sizeof(CPSV_CKPT_HDR) + >>>>>>> (sec_info->lcl_sec_id + >>>>>> 1) * (sizeof(CPSV_SECT_HDR) + >>>>>> cp_node->create_attrib.maxSectionSize); >>>>>>> + rc = ncs_os_posix_shm(&write_req); >>>>>>> + if (rc != NCSCC_RC_SUCCESS) { >>>>>>> + return rc; >>>>>>> + } >>>>>>> + >>>>>>> + if (shm_stat.st_size < sizeof(CPSV_CKPT_HDR) + >>>>>>> (sec_info->lcl_sec_id >>>>>> + 1) * (sizeof(CPSV_SECT_HDR) + >>>>>> cp_node->create_attrib.maxSectionSize)) { >>>>>>> + return NCSCC_RC_OUT_OF_MEM; >>>>>>> + } >>>>>>> memset(&write_req, '\0', sizeof(write_req)); >>>>>>> memset(&sec_hdr, '\0', sizeof(CPSV_SECT_HDR)); >>>>>>> sec_hdr.lcl_sec_id = sec_info->lcl_sec_id; >>>> ------------------------------------------------------------------------------ >>>> >>>> >>>> _______________________________________________ >>>> Opensaf-devel mailing list >>>> Opensaf-devel@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel >>> ------------------------------------------------------------------------------ >>> >>> >>> _______________________________________________ >>> Opensaf-devel mailing list >>> Opensaf-devel@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel >>> > > ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel