Hi,

I also have very surprise this trial patch fix (or maybe hide) the  
issue when fstat() is a unreliable function for checking shared memory.

Here is my understanding, very basic, when  
OSAF_CKPT_SHM_ALLOC_GUARANTEE = 0, shared memory is set size by  
ftruncate() function. This function do not acquire all required size  
but place a black plot on holding space, it will be changed runtime.  
So in high memory load, other process occupied black plot space then  
the expected address is not mapped to any real address area. Accessing  
to that cause seg fault occur. That's the reason the ideal come out.

I hope to receive more systematic solution than this implementation.

Thank you and best regards,
Hoang


Quoting Anders Widell <anders.wid...@ericsson.com>:

> I also have a hard time understanding how this code could work. One  
> possible solution that I can think of for this problem is to set up  
> a signal handler for SIGBUS, as I wrote in a review comment for  
> ticket [#1712].
>
> regards,
>
> Anders Widell
>
>
> On 11/28/2016 12:59 PM, ramesh betham wrote:
>> Hi,
>>
>> I am clueless how this patch is resolving this issue i.e., by checking
>> through fstat(_fd) and comparing the sizes etc.
>>
>> It suppose to check the available space of /dev/shm (tmpfs) just before
>> a write() operation like..,
>>
>>     // /* Check File System available space *///
>>     /char shm_dir[]="/dev/shm";//
>>     struct statvfs info;
>>     ////i//f (statvfs(shm_dir, &info) == 0)/
>>
>>           and get available space by calculating
>> "/info.f_bfree*info.f_bsize/" etc.
>>
>> Also make sure to do this check before every write in cpnd.
>>
>> If my interpretation is not correct, would like to understand more on
>> the context of the issue.
>>
>> Thanks and Regards,
>> Ramesh.
>>
>>
>> On 11/28/2016 11:17 AM, A V Mahesh wrote:
>>> Hi Hoang,
>>>
>>> Ok let us only focus on core dump ,  so solution should be,  adding
>>> section to checkpoint will still fail  and let us change LOG to
>>> possible error case of SHM deficiency and return error to CPA .
>>>
>>> Cpnd will NOT check the true size of shared memory for each write call
>>> before writing, this will degrade the performance of CPSV.
>>>
>>> Can you please re-send the patch V2 with fulfilling above , and change
>>> the  fix description as well.
>>>
>>> -AVM
>>>
>>> On 11/25/2016 4:21 PM, Anders Widell wrote:
>>>> Yes, the problem with setting OSAF_CKPT_SHM_ALLOC_GUARANTEE is that
>>>> the memory consumption will increase. Therefore it is not backwards
>>>> compatible and thus not possible to do as a bug-fix.
>>>>
>>>> regards,
>>>>
>>>> Anders Widell
>>>>
>>>>
>>>> On 11/25/2016 10:46 AM, Vo Minh Hoang wrote:
>>>>> Dear Mahesh,
>>>>>
>>>>> I think this problem is #1712, problem occur when
>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE is not set.
>>>>>
>>>>> My thinking is that because we provide 2 mode (guarantee or not) so we
>>>>> should making sure no coredump happened.
>>>>> Btw, because this is out of my scope of decide, I would like to ask
>>>>> Anders
>>>>> Widell about it.
>>>>>
>>>>> -----Original Message-----
>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>> Sent: Friday, November 25, 2016 4:06 PM
>>>>> To: Hoang Vo <hoang.m...@dektech.com.au>; anders.wid...@ericsson.com
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [PATCH 1 of 1] cpnd: ensure shared memory size before
>>>>> writing
>>>>> [#2202]
>>>>>
>>>>> Hi Hoang,
>>>>>
>>>>> Is this issue coming with  enabling #1712  feature  ?
>>>>>
>>>>> Exporting  `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1`  will  provide
>>>>> guaranteed CPSV
>>>>> service SHM issue and this is addressed broadly all SHM memory issue.
>>>>>
>>>>> So I don't think we required  API level  changes. ( may we need to
>>>>> document
>>>>> the usage of  Exporting  `OSAF_CKPT_SHM_ALLOC_GUARANTEE=1` )
>>>>>
>>>>> #1712  description
>>>>> ============================================================================
>>>>>
>>>>> ==
>>>>> summary:     leap: provide ensured disk space option for shm_open
>>>>> request [1712]
>>>>>
>>>>> description:
>>>>> leap: provide ensured disk space option for shm_open request [1712]
>>>>> Provided
>>>>> ensured disk space is allocated for NCS_OS_POSIX_SHM_REQ_OPEN request
>>>>> using
>>>>> posix_fallocate() so that application such as CPSV subsequent writes to
>>>>> bytes in the specified range are guaranteed not to fail because of
>>>>> lack of
>>>>> disk space.
>>>>>
>>>>> Updated the Opensaf services according to new options based on
>>>>> requirements.
>>>>>
>>>>> Cpsv service uses the ensured disk space option based on
>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE environment variable ,so if user
>>>>> exports as
>>>>> OSAF_CKPT_SHM_ALLOC_GUARANTEE=1 (true) cpsv provided ensured disk space.
>>>>> ============================================================================
>>>>>
>>>>> ==
>>>>>
>>>>> -AVM
>>>>>
>>>>>
>>>>> On 11/23/2016 3:28 PM, Hoang Vo wrote:
>>>>>> osaf/libs/core/include/ncs_osprm.h      |   9 +++++++++
>>>>>>     osaf/libs/core/leap/os_defs.c           |  19 +++++++++++++++++--
>>>>>>     osaf/services/saf/cpsv/cpnd/cpnd_proc.c |  16 ++++++++++++++++
>>>>>>     3 files changed, 42 insertions(+), 2 deletions(-)
>>>>>>
>>>>>>
>>>>>> problem: when checkpoint service init without shared memory size
>>>>>> guaranteed works in high memory load, core dump occur while adding
>>>>>> section
>>>>> to checkpoint.
>>>>>> solution:  check the true size of shared memory before writing to it.
>>>>>>
>>>>>> diff --git a/osaf/libs/core/include/ncs_osprm.h
>>>>>> b/osaf/libs/core/include/ncs_osprm.h
>>>>>> --- a/osaf/libs/core/include/ncs_osprm.h
>>>>>> +++ b/osaf/libs/core/include/ncs_osprm.h
>>>>>> @@ -557,6 +557,7 @@ typedef enum {
>>>>>>       NCS_OS_POSIX_SHM_REQ_UNLINK,    /* unlink is shm_unlink */
>>>>>>       NCS_OS_POSIX_SHM_REQ_READ,
>>>>>>       NCS_OS_POSIX_SHM_REQ_WRITE,
>>>>>> +  NCS_OS_POSIX_SHM_REQ_STATS,
>>>>>>       NCS_OS_POSIX_SHM_REQ_MAX
>>>>>>     } NCS_OS_POSIX_SHM_REQ_TYPE;
>>>>>>     typedef struct ncs_os_posix_shm_req_open_info_tag { @@ -598,6
>>>>>> +599,13 @@ typedef struct ncs_os_posix_shm_req_writ
>>>>>>       uint64_t i_offset;
>>>>>>     } NCS_OS_POSIX_SHM_REQ_WRITE_INFO;
>>>>>>     +typedef struct ncs_os_posix_shm_req_stats_info_tag {
>>>>>> +  uint32_t i_hdl;
>>>>>> +  int32_t i_fd;
>>>>>> +  bool ensures_space;
>>>>>> +  void *o_addr;
>>>>>> +} NCS_OS_POSIX_SHM_REQ_STATS_INFO;
>>>>>> +
>>>>>>     typedef struct ncs_shm_req_info {
>>>>>>       NCS_OS_POSIX_SHM_REQ_TYPE type;
>>>>>>     @@ -607,6 +615,7 @@ typedef struct ncs_shm_req_info {
>>>>>>         NCS_OS_POSIX_SHM_REQ_UNLINK_INFO unlink;
>>>>>>         NCS_OS_POSIX_SHM_REQ_READ_INFO read;
>>>>>>         NCS_OS_POSIX_SHM_REQ_WRITE_INFO write;
>>>>>> +    NCS_OS_POSIX_SHM_REQ_STATS_INFO stats;
>>>>>>       } info;
>>>>>>        } NCS_OS_POSIX_SHM_REQ_INFO;
>>>>>> diff --git a/osaf/libs/core/leap/os_defs.c
>>>>>> b/osaf/libs/core/leap/os_defs.c
>>>>>> --- a/osaf/libs/core/leap/os_defs.c
>>>>>> +++ b/osaf/libs/core/leap/os_defs.c
>>>>>> @@ -799,9 +799,9 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>>>>>>                     }
>>>>>>                 } else {
>>>>>>                     if (ftruncate(req->info.open.o_fd, (off_t)
>>>>> shm_size /* off_t == long */ ) < 0) {
>>>>>> -                    printf("ftruncate failed with errno
>>>>> value %d \n", errno);
>>>>>> +                    LOG_WA("ftruncate failed with errno
>>>>> value %d \n", errno);
>>>>>>                         return NCSCC_RC_FAILURE;
>>>>>> -                                }
>>>>>> +                }
>>>>>>                 }
>>>>>>                    uint32_t prot_flag =
>>>>> ncs_shm_prot_flags(req->info.open.i_flags);
>>>>>> @@ -859,6 +859,21 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>>>>>>                    req->info.write.i_write_size);
>>>>>>             break;
>>>>>>     +    case NCS_OS_POSIX_SHM_REQ_STATS:
>>>>>> +        if (!req->info.stats.o_addr) {
>>>>>> +            printf("Output space is not defined\n");
>>>>>> +            return NCSCC_RC_FAILURE;
>>>>>> +        }
>>>>>> +
>>>>>> +        if (req->info.stats.ensures_space) {
>>>>>> +            return NCSCC_RC_SUCCESS;
>>>>>> +        } else {
>>>>>> +            if(fstat(req->info.stats.i_fd,
>>>>> req->info.stats.o_addr)) {
>>>>>> +                return NCSCC_RC_FAILURE;
>>>>>> +            }
>>>>>> +        }
>>>>>> +        break;
>>>>>> +
>>>>>>         default:
>>>>>>             printf("Option Not supported %d \n", req->type);
>>>>>>             return NCSCC_RC_FAILURE;
>>>>>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>> b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>>>>> @@ -1851,6 +1851,22 @@ uint32_t cpnd_sec_hdr_update(CPND_CKPT_S
>>>>>>         CPSV_SECT_HDR sec_hdr;
>>>>>>         uint32_t rc = NCSCC_RC_SUCCESS;
>>>>>>         NCS_OS_POSIX_SHM_REQ_INFO write_req;
>>>>>> +    struct stat shm_stat;
>>>>>> +    memset(&write_req, '\0', sizeof(write_req));
>>>>>> +    memset(&shm_stat, '\0', sizeof(shm_stat));
>>>>>> +    write_req.type = NCS_OS_POSIX_SHM_REQ_STATS;
>>>>>> +    write_req.info.stats.i_fd =
>>>>> cp_node->replica_info.open.info.open.o_fd;
>>>>>> +    write_req.info.stats.ensures_space = false;
>>>>>> +    write_req.info.stats.o_addr = &shm_stat;
>>>>>> +    shm_stat.st_size = sizeof(CPSV_CKPT_HDR) + (sec_info->lcl_sec_id +
>>>>> 1) * (sizeof(CPSV_SECT_HDR) + cp_node->create_attrib.maxSectionSize);
>>>>>> +    rc = ncs_os_posix_shm(&write_req);
>>>>>> +    if (rc != NCSCC_RC_SUCCESS) {
>>>>>> +        return rc;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (shm_stat.st_size < sizeof(CPSV_CKPT_HDR) +
>>>>>> (sec_info->lcl_sec_id
>>>>> + 1) * (sizeof(CPSV_SECT_HDR) +
>>>>> cp_node->create_attrib.maxSectionSize)) {
>>>>>> +        return NCSCC_RC_OUT_OF_MEM;
>>>>>> +    }
>>>>>>         memset(&write_req, '\0', sizeof(write_req));
>>>>>>         memset(&sec_hdr, '\0', sizeof(CPSV_SECT_HDR));
>>>>>>         sec_hdr.lcl_sec_id = sec_info->lcl_sec_id;
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Opensaf-devel mailing list
>>> Opensaf-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Opensaf-devel mailing list
>> Opensaf-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>



------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to