HI Anders Widell,

On 11/28/2016 6:28 PM, Anders Widell wrote:
> One possible solution that I can think of for this problem is to set 
> up a signal handler for SIGBUS

It`s boiling down to prevention of the SIGBUS versus handling after the  
SIGBUS occurring ,
I think it would be good if we can have a solution that prevents the 
problem .

Let us explore other options , if we don't find any, we can go with 
Ramesh proposed solution in LEAP
even though it reduces performance of CPSV.

Any how we have `OSAF_CKPT_SHM_ALLOC_GUARANTEE = 1`  which give natural 
performance.

-AVM

On 11/28/2016 11:27 PM, hoang.m...@dektech.com.au wrote:
> Hi,
>
> I also have very surprise this trial patch fix (or maybe hide) the 
> issue when fstat() is a unreliable function for checking shared memory.
>
> Here is my understanding, very basic, when 
> OSAF_CKPT_SHM_ALLOC_GUARANTEE = 0, shared memory is set size by 
> ftruncate() function. This function do not acquire all required size 
> but place a black plot on holding space, it will be changed runtime. 
> So in high memory load, other process occupied black plot space then 
> the expected address is not mapped to any real address area. Accessing 
> to that cause seg fault occur. That's the reason the ideal come out.
>
> I hope to receive more systematic solution than this implementation.
>
> Thank you and best regards,
> Hoang
>
>
> Quoting Anders Widell <anders.wid...@ericsson.com>:
>
>> I also have a hard time understanding how this code could work. One 
>> possible solution that I can think of for this problem is to set up a 
>> signal handler for SIGBUS, as I wrote in a review comment for ticket 
>> [#1712].
>>
>> regards,
>>
>> Anders Widell
>>
>>
>> On 11/28/2016 12:59 PM, ramesh betham wrote:
>>> Hi,
>>>
>>> I am clueless how this patch is resolving this issue i.e., by checking
>>> through fstat(_fd) and comparing the sizes etc.
>>>
>>> It suppose to check the available space of /dev/shm (tmpfs) just before
>>> a write() operation like..,
>>>
>>>     // /* Check File System available space *///
>>>     /char shm_dir[]="/dev/shm";//
>>>     struct statvfs info;
>>>     ////i//f (statvfs(shm_dir, &info) == 0)/
>>>
>>>           and get available space by calculating
>>> "/info.f_bfree*info.f_bsize/" etc.
>>>
>>> Also make sure to do this check before every write in cpnd.
>>>
>>> If my interpretation is not correct, would like to understand more on
>>> the context of the issue.
>>>
>>> Thanks and Regards,
>>> Ramesh.
>>>
>>>
>>> On 11/28/2016 11:17 AM, A V Mahesh wrote:
>>>> Hi Hoang,
>>>>
>>>> Ok let us only focus on core dump ,  so solution should be, adding
>>>> section to checkpoint will still fail  and let us change LOG to
>>>> possible error case of SHM deficiency and return error to CPA .
>>>>
>>>> Cpnd will NOT check the true size of shared memory for each write call
>>>> before writing, this will degrade the performance of CPSV.
>>>>
>>>> Can you please re-send the patch V2 with fulfilling above , and change
>>>> the  fix description as well.
>>>>
>>>> -AVM
>>>>
>>>> On 11/25/2016 4:21 PM, Anders Widell wrote:
>>>>> Yes, the problem with setting OSAF_CKPT_SHM_ALLOC_GUARANTEE is that
>>>>> the memory consumption will increase. Therefore it is not backwards
>>>>> compatible and thus not possible to do as a bug-fix.
>>>>>
>>>>> regards,
>>>>>
>>>>> Anders Widell
>>>>>
>>>>>
>>>>> On 11/25/2016 10:46 AM, Vo Minh Hoang wrote:
>>>>>> Dear Mahesh,
>>>>>>
>>>>>> I think this problem is #1712, problem occur when
>>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE is not set.
>>>>>>
>>>>>> My thinking is that because we provide 2 mode (guarantee or not) 
>>>>>> so we
>>>>>> should making sure no coredump happened.
>>>>>> Btw, because this is out of my scope of decide, I would like to ask
>>>>>> Anders
>>>>>> Widell about it.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>> Sent: Friday, November 25, 2016 4:06 PM
>>>>>> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>> Subject: Re: [PATCH 1 of 1] cpnd: ensure shared memory size before
>>>>>> writing
>>>>>> [#2202]
>>>>>>
>>>>>> Hi Hoang,
>>>>>>
>>>>>> Is this issue coming with  enabling #1712  feature  ?
>>>>>>
>>>>>> Exporting  `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1`  will provide
>>>>>> guaranteed CPSV
>>>>>> service SHM issue and this is addressed broadly all SHM memory 
>>>>>> issue.
>>>>>>
>>>>>> So I don't think we required  API level  changes. ( may we need to
>>>>>> document
>>>>>> the usage of  Exporting `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1` )
>>>>>>
>>>>>> #1712  description
>>>>>> ============================================================================
>>>>>>  
>>>>>>
>>>>>>
>>>>>> ==
>>>>>> summary:     leap: provide ensured disk space option for shm_open
>>>>>> request [1712]
>>>>>>
>>>>>> description:
>>>>>> leap: provide ensured disk space option for shm_open request [1712]
>>>>>> Provided
>>>>>> ensured disk space is allocated for NCS_OS_POSIX_SHM_REQ_OPEN 
>>>>>> request
>>>>>> using
>>>>>> posix_fallocate() so that application such as CPSV subsequent 
>>>>>> writes to
>>>>>> bytes in the specified range are guaranteed not to fail because of
>>>>>> lack of
>>>>>> disk space.
>>>>>>
>>>>>> Updated the Opensaf services according to new options based on
>>>>>> requirements.
>>>>>>
>>>>>> Cpsv service uses the ensured disk space option based on
>>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE environment variable ,so if user
>>>>>> exports as
>>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 (true) cpsv provided ensured disk 
>>>>>> space.
>>>>>> ============================================================================
>>>>>>  
>>>>>>
>>>>>>
>>>>>> ==
>>>>>>
>>>>>> -AVM
>>>>>>
>>>>>>
>>>>>> On 11/23/2016 3:28 PM, Hoang Vo wrote:
>>>>>>> osaf/libs/core/include/ncs_osprm.h |   9 +++++++++
>>>>>>>     osaf/libs/core/leap/os_defs.c           |  19 
>>>>>>> +++++++++++++++++--
>>>>>>>     osaf/services/saf/cpsv/cpnd/cpnd_proc.c |  16 ++++++++++++++++
>>>>>>>     3 files changed, 42 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>>
>>>>>>> problem: when checkpoint service init without shared memory size
>>>>>>> guaranteed works in high memory load, core dump occur while adding
>>>>>>> section
>>>>>> to checkpoint.
>>>>>>> solution:  check the true size of shared memory before writing 
>>>>>>> to it.
>>>>>>>
>>>>>>> diff --git a/osaf/libs/core/include/ncs_osprm.h
>>>>>>> b/osaf/libs/core/include/ncs_osprm.h
>>>>>>> --- a/osaf/libs/core/include/ncs_osprm.h
>>>>>>> +++ b/osaf/libs/core/include/ncs_osprm.h
>>>>>>> @@ -557,6 +557,7 @@ typedef enum {
>>>>>>>       NCS_OS_POSIX_SHM_REQ_UNLINK,    /* unlink is shm_unlink */
>>>>>>>       NCS_OS_POSIX_SHM_REQ_READ,
>>>>>>>       NCS_OS_POSIX_SHM_REQ_WRITE,
>>>>>>> +  NCS_OS_POSIX_SHM_REQ_STATS,
>>>>>>>       NCS_OS_POSIX_SHM_REQ_MAX
>>>>>>>     } NCS_OS_POSIX_SHM_REQ_TYPE;
>>>>>>>     typedef struct ncs_os_posix_shm_req_open_info_tag { @@ -598,6
>>>>>>> +599,13 @@ typedef struct ncs_os_posix_shm_req_writ
>>>>>>>       uint64_t i_offset;
>>>>>>>     } NCS_OS_POSIX_SHM_REQ_WRITE_INFO;
>>>>>>>     +typedef struct ncs_os_posix_shm_req_stats_info_tag {
>>>>>>> +  uint32_t i_hdl;
>>>>>>> +  int32_t i_fd;
>>>>>>> +  bool ensures_space;
>>>>>>> +  void *o_addr;
>>>>>>> +} NCS_OS_POSIX_SHM_REQ_STATS_INFO;
>>>>>>> +
>>>>>>>     typedef struct ncs_shm_req_info {
>>>>>>>       NCS_OS_POSIX_SHM_REQ_TYPE type;
>>>>>>>     @@ -607,6 +615,7 @@ typedef struct ncs_shm_req_info {
>>>>>>>         NCS_OS_POSIX_SHM_REQ_UNLINK_INFO unlink;
>>>>>>>         NCS_OS_POSIX_SHM_REQ_READ_INFO read;
>>>>>>>         NCS_OS_POSIX_SHM_REQ_WRITE_INFO write;
>>>>>>> +    NCS_OS_POSIX_SHM_REQ_STATS_INFO stats;
>>>>>>>       } info;
>>>>>>>        } NCS_OS_POSIX_SHM_REQ_INFO;
>>>>>>> diff --git a/osaf/libs/core/leap/os_defs.c
>>>>>>> b/osaf/libs/core/leap/os_defs.c
>>>>>>> --- a/osaf/libs/core/leap/os_defs.c
>>>>>>> +++ b/osaf/libs/core/leap/os_defs.c
>>>>>>> @@ -799,9 +799,9 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>>>>>>>                     }
>>>>>>>                 } else {
>>>>>>>                     if (ftruncate(req->info.open.o_fd, (off_t)
>>>>>> shm_size /* off_t == long */ ) < 0) {
>>>>>>> - printf("ftruncate failed with errno
>>>>>> value %d \n", errno);
>>>>>>> + LOG_WA("ftruncate failed with errno
>>>>>> value %d \n", errno);
>>>>>>>                         return NCSCC_RC_FAILURE;
>>>>>>> -                                }
>>>>>>> +                }
>>>>>>>                 }
>>>>>>>                    uint32_t prot_flag =
>>>>>> ncs_shm_prot_flags(req->info.open.i_flags);
>>>>>>> @@ -859,6 +859,21 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>>>>>>>                    req->info.write.i_write_size);
>>>>>>>             break;
>>>>>>>     +    case NCS_OS_POSIX_SHM_REQ_STATS:
>>>>>>> +        if (!req->info.stats.o_addr) {
>>>>>>> +            printf("Output space is not defined\n");
>>>>>>> +            return NCSCC_RC_FAILURE;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        if (req->info.stats.ensures_space) {
>>>>>>> +            return NCSCC_RC_SUCCESS;
>>>>>>> +        } else {
>>>>>>> +            if(fstat(req->info.stats.i_fd,
>>>>>> req->info.stats.o_addr)) {
>>>>>>> +                return NCSCC_RC_FAILURE;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +        break;
>>>>>>> +
>>>>>>>         default:
>>>>>>>             printf("Option Not supported %d \n", req->type);
>>>>>>>             return NCSCC_RC_FAILURE;
>>>>>>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>>> b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>>> @@ -1851,6 +1851,22 @@ uint32_t cpnd_sec_hdr_update(CPND_CKPT_S
>>>>>>>         CPSV_SECT_HDR sec_hdr;
>>>>>>>         uint32_t rc = NCSCC_RC_SUCCESS;
>>>>>>>         NCS_OS_POSIX_SHM_REQ_INFO write_req;
>>>>>>> +    struct stat shm_stat;
>>>>>>> +    memset(&write_req, '\0', sizeof(write_req));
>>>>>>> +    memset(&shm_stat, '\0', sizeof(shm_stat));
>>>>>>> +    write_req.type = NCS_OS_POSIX_SHM_REQ_STATS;
>>>>>>> +    write_req.info.stats.i_fd =
>>>>>> cp_node->replica_info.open.info.open.o_fd;
>>>>>>> + write_req.info.stats.ensures_space = false;
>>>>>>> +    write_req.info.stats.o_addr = &shm_stat;
>>>>>>> +    shm_stat.st_size = sizeof(CPSV_CKPT_HDR) + 
>>>>>>> (sec_info->lcl_sec_id +
>>>>>> 1) * (sizeof(CPSV_SECT_HDR) + 
>>>>>> cp_node->create_attrib.maxSectionSize);
>>>>>>> +    rc = ncs_os_posix_shm(&write_req);
>>>>>>> +    if (rc != NCSCC_RC_SUCCESS) {
>>>>>>> +        return rc;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (shm_stat.st_size < sizeof(CPSV_CKPT_HDR) +
>>>>>>> (sec_info->lcl_sec_id
>>>>>> + 1) * (sizeof(CPSV_SECT_HDR) +
>>>>>> cp_node->create_attrib.maxSectionSize)) {
>>>>>>> +        return NCSCC_RC_OUT_OF_MEM;
>>>>>>> +    }
>>>>>>>         memset(&write_req, '\0', sizeof(write_req));
>>>>>>>         memset(&sec_hdr, '\0', sizeof(CPSV_SECT_HDR));
>>>>>>>         sec_hdr.lcl_sec_id = sec_info->lcl_sec_id;
>>>> ------------------------------------------------------------------------------
>>>>  
>>>>
>>>> _______________________________________________
>>>> Opensaf-devel mailing list
>>>> Opensaf-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>> ------------------------------------------------------------------------------
>>>  
>>>
>>> _______________________________________________
>>> Opensaf-devel mailing list
>>> Opensaf-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>>
>
>


------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to