I also have a hard time understanding how this code could work. One 
possible solution that I can think of for this problem is to set up a 
signal handler for SIGBUS, as I wrote in a review comment for ticket 
[#1712].

regards,

Anders Widell


On 11/28/2016 12:59 PM, ramesh betham wrote:
> Hi,
>
> I am clueless how this patch is resolving this issue i.e., by checking
> through fstat(_fd) and comparing the sizes etc.
>
> It suppose to check the available space of /dev/shm (tmpfs) just before
> a write() operation like..,
>
>      // /* Check File System available space *///
>      /char shm_dir[]="/dev/shm";//
>      struct statvfs info;
>      ////i//f (statvfs(shm_dir, &info) == 0)/
>
>            and get available space by calculating
> "/info.f_bfree*info.f_bsize/" etc.
>
> Also make sure to do this check before every write in cpnd.
>
> If my interpretation is not correct, would like to understand more on
> the context of the issue.
>
> Thanks and Regards,
> Ramesh.
>
>
> On 11/28/2016 11:17 AM, A V Mahesh wrote:
>> Hi Hoang,
>>
>> Ok let us only focus on core dump ,  so solution should be,  adding
>> section to checkpoint will still fail  and let us change LOG to
>> possible error case of SHM deficiency and return error to CPA .
>>
>> Cpnd will NOT check the true size of shared memory for each write call
>> before writing, this will degrade the performance of CPSV.
>>
>> Can you please re-send the patch V2 with fulfilling above , and change
>> the  fix description as well.
>>
>> -AVM
>>
>> On 11/25/2016 4:21 PM, Anders Widell wrote:
>>> Yes, the problem with setting OSAF_CKPT_SHM_ALLOC_GUARANTEE is that
>>> the memory consumption will increase. Therefore it is not backwards
>>> compatible and thus not possible to do as a bug-fix.
>>>
>>> regards,
>>>
>>> Anders Widell
>>>
>>>
>>> On 11/25/2016 10:46 AM, Vo Minh Hoang wrote:
>>>> Dear Mahesh,
>>>>
>>>> I think this problem is #1712, problem occur when
>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE is not set.
>>>>
>>>> My thinking is that because we provide 2 mode (guarantee or not) so we
>>>> should making sure no coredump happened.
>>>> Btw, because this is out of my scope of decide, I would like to ask
>>>> Anders
>>>> Widell about it.
>>>>
>>>> -----Original Message-----
>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>> Sent: Friday, November 25, 2016 4:06 PM
>>>> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 1 of 1] cpnd: ensure shared memory size before
>>>> writing
>>>> [#2202]
>>>>
>>>> Hi Hoang,
>>>>
>>>> Is this issue coming with  enabling #1712  feature  ?
>>>>
>>>> Exporting  `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1`  will  provide
>>>> guaranteed CPSV
>>>> service SHM issue and this is addressed broadly all SHM memory issue.
>>>>
>>>> So I don't think we required  API level  changes. ( may we need to
>>>> document
>>>> the usage of  Exporting  `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1` )
>>>>
>>>> #1712  description
>>>> ============================================================================
>>>>
>>>> ==
>>>> summary:     leap: provide ensured disk space option for shm_open
>>>> request [1712]
>>>>
>>>> description:
>>>> leap: provide ensured disk space option for shm_open request [1712]
>>>> Provided
>>>> ensured disk space is allocated for NCS_OS_POSIX_SHM_REQ_OPEN request
>>>> using
>>>> posix_fallocate() so that application such as CPSV subsequent writes to
>>>> bytes in the specified range are guaranteed not to fail because of
>>>> lack of
>>>> disk space.
>>>>
>>>> Updated the Opensaf services according to new options based on
>>>> requirements.
>>>>
>>>> Cpsv service uses the ensured disk space option based on
>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE environment variable ,so if user
>>>> exports as
>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 (true) cpsv provided ensured disk space.
>>>> ============================================================================
>>>>
>>>> ==
>>>>
>>>> -AVM
>>>>
>>>>
>>>> On 11/23/2016 3:28 PM, Hoang Vo wrote:
>>>>> osaf/libs/core/include/ncs_osprm.h      |   9 +++++++++
>>>>>      osaf/libs/core/leap/os_defs.c           |  19 +++++++++++++++++--
>>>>>      osaf/services/saf/cpsv/cpnd/cpnd_proc.c |  16 ++++++++++++++++
>>>>>      3 files changed, 42 insertions(+), 2 deletions(-)
>>>>>
>>>>>
>>>>> problem: when checkpoint service init without shared memory size
>>>>> guaranteed works in high memory load, core dump occur while adding
>>>>> section
>>>> to checkpoint.
>>>>> solution:  check the true size of shared memory before writing to it.
>>>>>
>>>>> diff --git a/osaf/libs/core/include/ncs_osprm.h
>>>>> b/osaf/libs/core/include/ncs_osprm.h
>>>>> --- a/osaf/libs/core/include/ncs_osprm.h
>>>>> +++ b/osaf/libs/core/include/ncs_osprm.h
>>>>> @@ -557,6 +557,7 @@ typedef enum {
>>>>>        NCS_OS_POSIX_SHM_REQ_UNLINK,    /* unlink is shm_unlink */
>>>>>        NCS_OS_POSIX_SHM_REQ_READ,
>>>>>        NCS_OS_POSIX_SHM_REQ_WRITE,
>>>>> +  NCS_OS_POSIX_SHM_REQ_STATS,
>>>>>        NCS_OS_POSIX_SHM_REQ_MAX
>>>>>      } NCS_OS_POSIX_SHM_REQ_TYPE;
>>>>>      typedef struct ncs_os_posix_shm_req_open_info_tag { @@ -598,6
>>>>> +599,13 @@ typedef struct ncs_os_posix_shm_req_writ
>>>>>        uint64_t i_offset;
>>>>>      } NCS_OS_POSIX_SHM_REQ_WRITE_INFO;
>>>>>      +typedef struct ncs_os_posix_shm_req_stats_info_tag {
>>>>> +  uint32_t i_hdl;
>>>>> +  int32_t i_fd;
>>>>> +  bool ensures_space;
>>>>> +  void *o_addr;
>>>>> +} NCS_OS_POSIX_SHM_REQ_STATS_INFO;
>>>>> +
>>>>>      typedef struct ncs_shm_req_info {
>>>>>        NCS_OS_POSIX_SHM_REQ_TYPE type;
>>>>>      @@ -607,6 +615,7 @@ typedef struct ncs_shm_req_info {
>>>>>          NCS_OS_POSIX_SHM_REQ_UNLINK_INFO unlink;
>>>>>          NCS_OS_POSIX_SHM_REQ_READ_INFO read;
>>>>>          NCS_OS_POSIX_SHM_REQ_WRITE_INFO write;
>>>>> +    NCS_OS_POSIX_SHM_REQ_STATS_INFO stats;
>>>>>        } info;
>>>>>         } NCS_OS_POSIX_SHM_REQ_INFO;
>>>>> diff --git a/osaf/libs/core/leap/os_defs.c
>>>>> b/osaf/libs/core/leap/os_defs.c
>>>>> --- a/osaf/libs/core/leap/os_defs.c
>>>>> +++ b/osaf/libs/core/leap/os_defs.c
>>>>> @@ -799,9 +799,9 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>>>>>                      }
>>>>>                  } else {
>>>>>                      if (ftruncate(req->info.open.o_fd, (off_t)
>>>> shm_size /* off_t == long */ ) < 0) {
>>>>> -                    printf("ftruncate failed with errno
>>>> value %d \n", errno);
>>>>> +                    LOG_WA("ftruncate failed with errno
>>>> value %d \n", errno);
>>>>>                          return NCSCC_RC_FAILURE;
>>>>> -                                }
>>>>> +                }
>>>>>                  }
>>>>>                     uint32_t prot_flag =
>>>> ncs_shm_prot_flags(req->info.open.i_flags);
>>>>> @@ -859,6 +859,21 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>>>>>                     req->info.write.i_write_size);
>>>>>              break;
>>>>>      +    case NCS_OS_POSIX_SHM_REQ_STATS:
>>>>> +        if (!req->info.stats.o_addr) {
>>>>> +            printf("Output space is not defined\n");
>>>>> +            return NCSCC_RC_FAILURE;
>>>>> +        }
>>>>> +
>>>>> +        if (req->info.stats.ensures_space) {
>>>>> +            return NCSCC_RC_SUCCESS;
>>>>> +        } else {
>>>>> +            if(fstat(req->info.stats.i_fd,
>>>> req->info.stats.o_addr)) {
>>>>> +                return NCSCC_RC_FAILURE;
>>>>> +            }
>>>>> +        }
>>>>> +        break;
>>>>> +
>>>>>          default:
>>>>>              printf("Option Not supported %d \n", req->type);
>>>>>              return NCSCC_RC_FAILURE;
>>>>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>> b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>> @@ -1851,6 +1851,22 @@ uint32_t cpnd_sec_hdr_update(CPND_CKPT_S
>>>>>          CPSV_SECT_HDR sec_hdr;
>>>>>          uint32_t rc = NCSCC_RC_SUCCESS;
>>>>>          NCS_OS_POSIX_SHM_REQ_INFO write_req;
>>>>> +    struct stat shm_stat;
>>>>> +    memset(&write_req, '\0', sizeof(write_req));
>>>>> +    memset(&shm_stat, '\0', sizeof(shm_stat));
>>>>> +    write_req.type = NCS_OS_POSIX_SHM_REQ_STATS;
>>>>> +    write_req.info.stats.i_fd =
>>>> cp_node->replica_info.open.info.open.o_fd;
>>>>> +    write_req.info.stats.ensures_space = false;
>>>>> +    write_req.info.stats.o_addr = &shm_stat;
>>>>> +    shm_stat.st_size = sizeof(CPSV_CKPT_HDR) + (sec_info->lcl_sec_id +
>>>> 1) * (sizeof(CPSV_SECT_HDR) + cp_node->create_attrib.maxSectionSize);
>>>>> +    rc = ncs_os_posix_shm(&write_req);
>>>>> +    if (rc != NCSCC_RC_SUCCESS) {
>>>>> +        return rc;
>>>>> +    }
>>>>> +
>>>>> +    if (shm_stat.st_size < sizeof(CPSV_CKPT_HDR) +
>>>>> (sec_info->lcl_sec_id
>>>> + 1) * (sizeof(CPSV_SECT_HDR) +
>>>> cp_node->create_attrib.maxSectionSize)) {
>>>>> +        return NCSCC_RC_OUT_OF_MEM;
>>>>> +    }
>>>>>          memset(&write_req, '\0', sizeof(write_req));
>>>>>          memset(&sec_hdr, '\0', sizeof(CPSV_SECT_HDR));
>>>>>          sec_hdr.lcl_sec_id = sec_info->lcl_sec_id;
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Opensaf-devel mailing list
>> Opensaf-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>


------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to