Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Raghavendra Gowdappa
On Thu, Mar 21, 2019 at 4:16 PM Atin Mukherjee  wrote:

> All,
>
> In the last few releases of glusterfs, with stability as a primary theme
> of the releases, there have been lots of code-optimization changes made
> with the expectation that they would help gluster provide better
> performance. While many of these changes do help, of late we have started
> seeing some adverse effects from them, one especially being the
> calloc-to-malloc conversions. I do understand that a malloc call
> eliminates the extra memset overhead that calloc bears, but with recent
> kernels and strong built-in compiler optimizations I am not sure whether
> that makes any significant difference. As I mentioned, though, if such a
> conversion isn't done carefully it can potentially introduce a lot of
> bugs, and I'm writing this email to share one such experience.
>
> Sanju & I spent the last two days trying to figure out why
> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working on
> Sanju's system while the same fix ran without problems in my gluster
> containers. After spending a significant amount of time, we have now
> figured out that a malloc call [1] (which was a calloc earlier) is the
> culprit. As you can see, in this function we allocate txn_id and copy
> event->txn_id into it through gf_uuid_copy(). But when we stepped
> through this in gdb, txn_id did not end up as an exact copy of
> event->txn_id; it contained some junk values, which caused
> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on,
> so the leaks the fix was meant to eliminate remained.
>
> This was quite painful to debug and took some time to figure out.
> Considering we have converted many such calls in the past, I'd urge
> that we review all such conversions and see if there are any side
> effects. Otherwise we might end up running into many potential
> memory-related bugs later on. Going forward, I'd also request every
> patch owner/maintainer to pay special attention to these conversions
> and verify they are really beneficial and error-free. IMO, the general
> guideline should be: for bigger buffers, malloc makes better sense but
> has to be done carefully; for smaller sizes, we stick to calloc.
>
> What do others think about it?
>

I too am afraid of the unknown effects of this change, as much of the
codebase relies on the assumption of zero-initialized data structures. I
vote for reverting these patches unless it can be demonstrated that the
performance benefits are indeed significant. Otherwise the trade-off in
stability is not worth the cost.
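A minimal sketch of the failure mode described above, using a hypothetical struct and plain malloc/calloc (the real code in glusterd-op-sm.c goes through gluster's allocation wrappers): a field that was implicitly zeroed under calloc is left holding heap garbage under malloc, and any reader that assumes zero-initialisation misbehaves.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a structure whose consumers assume
 * zero-initialisation. */
struct txn_info {
    unsigned char txn_id[16]; /* plays the role of a uuid_t */
    int is_synctask;          /* flag that other code paths read */
};

/* calloc-style allocation: every member starts out zeroed. */
struct txn_info *txn_new_zeroed(void)
{
    return calloc(1, sizeof(struct txn_info));
}

/* malloc-style allocation: only txn_id is filled in; is_synctask is
 * whatever happened to be on the heap.  A later reader that assumes
 * zero-initialisation sees junk, much like the gdb session described
 * above. */
struct txn_info *txn_new_uninit(const unsigned char *src)
{
    struct txn_info *t = malloc(sizeof(struct txn_info));
    if (t == NULL)
        return NULL;
    memcpy(t->txn_id, src, sizeof(t->txn_id)); /* like gf_uuid_copy() */
    /* BUG: t->is_synctask is intentionally left uninitialised. */
    return t;
}
```

The practical rule this suggests: whenever a conversion keeps malloc, every member that any consumer of the struct reads must be explicitly initialised.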


>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Nithya Balachandran
On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:

> All,
>
> [...]
>
> What do others think about it?
>

I believe that replacing calloc with malloc everywhere without adequate
testing and review is not safe and am against doing so for the following
reasons:

   1. Most of these patches have not been tested, especially the error
   paths. I have seen some that introduced issues in error scenarios with
   pointers being non-null.
   2. As Raghavendra said, the code assumes that certain elements will be
   initialized to null/zero and changing that can have consequences which are
   not immediately obvious. I think it might be worthwhile to go through the
   already merged calloc->malloc patches to check error paths and so on to see
   if they are safe.
   3. Making such changes to the libglusterfs code while we are currently
   working to stabilize the product is not a good idea. The patches take time
   to review and any errors introduced in the core pieces affect all processes
   and require significant effort to debug.

Yaniv, while the example you provided might make sense to change to malloc,
a lot of the other changes, in my opinion, are not worth the effort
required. For performance testing, smallfile might be a useful tool to see
if any of the changes make a difference. That said, I am reluctant to take
in patches that change core code significantly without their being tested
or proof of benefits being provided.

We need better performance and scalability, but that is going to need
changes in our algorithms and fop handling, and that is what we need to be
working on. Such changes, when done right, will provide more benefits than
the micro-optimizations. I think it unlikely that the micro-optimizations
will provide much benefit, but am willing to be proven wrong if you have
numbers that show otherwise.

Regards,
Nithya


>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Yaniv Kaul
On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran 
wrote:

>
>
> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:
>
>> All,
>>
>> [...]
>>
>> What do others think about it?
>>
>
> I believe that replacing calloc with malloc everywhere without adequate
> testing and review is not safe and am against doing so for the following
> reasons:
>

No patch should get in without adequate testing and thorough review.

>
>1. Most of these patches have not been tested, especially the error
>paths.I have seen some that introduced issues in error scenarios with
>pointers being non-null.
>
>
You raise an interesting issue. Why are freed memory pointers not
NULLified? Why does FREE() set ptr = (void *)0x and not NULL?
This is a potential cause of failure. A recurring FREE(NULL) is
harmless; a FREE(0x) is a bit more problematic.
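The difference between the two conventions can be sketched with a pair of hypothetical macros (the 0xee poison constant below is an illustrative stand-in, not the value gluster actually uses): free(NULL) is defined as a no-op by the C standard, so a repeated FREE that NULLs the pointer is harmless, while passing a poisoned pointer to free() again is undefined behaviour.

```c
#include <stdlib.h>

/* Hypothetical versions of the two conventions under discussion.
 * The poison constant 0xee is a stand-in for illustration only. */
#define FREE_AND_POISON(ptr) do { free(ptr); (ptr) = (void *)0xee; } while (0)
#define FREE_AND_NULL(ptr)   do { free(ptr); (ptr) = NULL; } while (0)

/* With FREE_AND_NULL, a second call on the same pointer is a harmless
 * free(NULL).  With FREE_AND_POISON, a second call would hand the
 * poison value to free() -- undefined behaviour, but one that crashes
 * loudly instead of silently reusing freed memory. */
```

The design trade-off: the poison convention gives up double-free safety in exchange for making use-after-free bugs fail fast rather than corrupt the heap quietly.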


>2. As Raghavendra said, the code assumes that certain elements will be
>initialized to null/zero and changing that can have consequences which are
>not immediately obvious. I think it might be worthwhile to go through the
>already merged calloc->malloc patches to check error paths and so on to see
>if they are safe.
>
>
Agreed.

>
>2. Making such changes to the libglusterfs code while we are currently
>working to stabilize the product is not a good idea. The patches take time
>to review and any errors introduced in the core pieces affect all processes
>and require significant effort to debug.
>
>
Let me know when we consider the project stable. I'd argue the way to
stabilize it is not to stop improving it, but to improve its testing: from
more tests covering more code, to more static-analysis coverage, to
ensuring CI is rock-solid (including the random errors that pop up from
time to time). Not accepting patches to master is not the right approach,
unless it's time-boxed somehow. If it is, then it means we don't trust our
CI enough, btw.


>
> Yaniv, while the example you provided might make sense to change to
> malloc, a lot of the other changes, in my opinion, do not for the effort
> required. For performance testing, smallfile might be a useful tool to see
> if any of the changes make a difference. That said, I am reluctant to take
> in patches that change core code significantly without being tested or
> providing proof of benefits.
>

Is smallfile part of CI? I am happy to see it documented at
https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/#smallfile-distributed-io-benchmark
so at least one can know how to execute it manually.

>
> [...]

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Yaniv Kaul
On Thu, Mar 21, 2019 at 5:43 PM Yaniv Kaul  wrote:

> [...]

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Sankarshan Mukhopadhyay
On Thu, Mar 21, 2019 at 9:24 PM Yaniv Kaul  wrote:

>> Smallfile is part of CI? I am happy to see it documented @ 
>> https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/#smallfile-distributed-io-benchmark
>>  , so at least one can know how to execute it manually.
>
>
> Following up the above link to the smallfile repo leads to 404 (I'm assuming 
> we don't have a link checker running on our documentation, so it can break 
> from time to time?)

Hmm... that needs to be addressed.

> I assume it's https://github.com/distributed-system-analysis/smallfile ?

Yes.


Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Nithya Balachandran
On Thu, 21 Mar 2019 at 21:14, Yaniv Kaul  wrote:

>
>
> On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran 
> wrote:
>
>>
>>
>> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:
>>
>>> All,
>>>
>>> [...]
>>>
>>> What do others think about it?
>>>
>>
>> I believe that replacing calloc with malloc everywhere without adequate
>> testing and review is not safe and am against doing so for the following
>> reasons:
>>
>
> No patch should get in without adequate testing and thorough review.
>

>>1. Most of these patches have not been tested, especially the error
>>paths.I have seen some that introduced issues in error scenarios with
>>pointers being non-null.
>>
>>
> You raise an interesting issue. Why are free'd memory pointers are not
> NULL'ified? Why does FREE() set ptr = (void *)0x and not NULL?
>
The problem I'm referring to here is in error paths, when incompletely
initialised structures are cleaned up: an attempt will be made to free a
non-NULL pointer that was never allocated.

> This is a potential cause for failure. A re-occuring FREE(NULL) is
> harmless. A FREE(0x) is a bit more problematic.
>
>
>>2. As Raghavendra said, the code assumes that certain elements will
>>be initialized to null/zero and changing that can have consequences which
>>are not immediately obvious. I think it might be worthwhile to go through
>>the already merged calloc->malloc patches to check error paths and so on 
>> to
>>see if they are safe.
>>
>>
> Agreed.
>
>>
>>2. Making such changes to the libglusterfs code while we are
>>currently working to stabilize the product is not a good idea. The patches
>>take time to review and any errors introduced in the core pieces affect 
>> all
>>processes and require significant effort to debug.
>>
>>
> Let me know when we consider the project stable. I'd argue the way to
> stabilize it is not stop improving it, but improving its testing. From more
> tests to cover more code via more tests to more static analysis coverage,
> to ensuring CI is rock-solid (inc. random errors that pop up from time to
> time).
>

Agreed. We need better CI coverage. More patches there would be very
welcome.


> Not accepting patches to master is not the right approach, unless it's
> time-boxed somehow. If it is, then it means we don't trust our CI enough,
> btw.
>

We are not blocking patches to master. We are raising concerns about
patches which are likely to impact code stability (such as the malloc
patches which have introduced issues in some cases) but require
considerable effort to review or test. A patch that solves a known proble

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-26 Thread Vijay Bellur
On Thu, Mar 21, 2019 at 8:44 AM Yaniv Kaul  wrote:

>
>
> On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran 
> wrote:
>
>>
>>
>> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:
>>
>>> All,
>>>
>>> [...]
>>>
>>> What do others think about it?
>>>
>>
>> I believe that replacing calloc with malloc everywhere without adequate
>> testing and review is not safe and am against doing so for the following
>> reasons:
>>
>
> No patch should get in without adequate testing and thorough review.
>


There are lots of interesting points to glean in this thread. However, this
particular one caught my attention. How about we introduce a policy that no
patch gets merged unless it is thoroughly tested? The onus would be on the
developer to provide a .t test case to show completeness in the testing of
that patch. If the developer does not or cannot for any reason, we could
have the maintainer run tests and add a note in gerrit explaining the tests
run. This would provide more assurance about the patches being tested
before getting merged. Obviously, patches that fix typos or that cannot
affect any functionality need not be subject to this policy.

As far as review thoroughness is concerned, it might be better to mandate
acks from respective maintainers before merging a patch that affects
several components. More eyeballs that specialize in particular
component(s) will hopefully catch some of these issues during the review
phase.

Thanks,
Vijay

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-27 Thread Niels de Vos
On Tue, Mar 26, 2019 at 02:52:33PM -0700, Vijay Bellur wrote:
> On Thu, Mar 21, 2019 at 8:44 AM Yaniv Kaul  wrote:
> 
> >
> >
> > On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran 
> > wrote:
> >
> >>
> >>
> >> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:
> >>
> >>> All,
> >>>
> >>> [...]
> >>>
> >>> What do others think about it?
> >>
> >>
> >> I believe that replacing calloc with malloc everywhere without adequate
> >> testing and review is not safe and am against doing so for the following
> >> reasons:
> >>
> >
> > No patch should get in without adequate testing and thorough review.
> >
> 
> 
> There are lots of interesting points to glean in this thread. However, this
> particular one caught my attention. How about we introduce a policy that no
> patch gets merged unless it is thoroughly tested? The onus would be on the
> developer to provide a .t test case to show completeness in the testing of
> that patch. If the developer does not or cannot for any reason, we could
> have the maintainer run tests and add a note in gerrit explaining the tests
> run. This would provide more assurance about the patches being tested
> before getting merged. Obviously, patches that fix typos or that cannot
> affect any functionality need not be subject to this policy.
> 
> As far as review thoroughness is concerned, it might be better to mandate
> acks from respective maintainers before merging a patch that affects
> several components. More eyeballs that specialize in particular
> component(s) will hopefully catch some of these issues during the review
> phase.

Both of these points have always been strongly encouraged. They are also
documented in the
https://docs.gluster.org/en/latest/Contributors-Guide/Guidelines-For-Maintainers/
https://github.com/gluster/glusterdocs/blob/master/docs/Contributors-Guide/Guidelines-For-Maintainers.md
(formatting is broken in the 1st link, but I don't know how to fix it)

We probably need to apply our own guidelines a little better, and
remind developers that >90% of patches (or patch series) should come
with a .t file or a test added to an existing one.

And a big +1 for getting reviews or at least some involvement of the
component maintainers.

Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-27 Thread Poornima Gurusiddaiah
Regarding the use of malloc vs calloc: I had tried this a while back.
Though theoretically it should give a performance improvement, I was not
able to see it in practice for small memory allocations, maybe due to
the optimizations in glibc and the kernel memory management itself. When
I ran a sample program doing millions of random-sized malloc vs calloc
calls that also used the memory (so that it resulted in page faults),
the results were inconclusive: in some runs calloc was slower, in some
runs malloc was slower. In glibc, calloc is not exactly malloc + memset.
After browsing through __libc_calloc(), it turns out the kernel always
hands out zeroed pages for security purposes (with COW); only when the
calloc implementation in libc reuses memory already allocated by its own
process is there a perf difference between malloc and calloc, and even
that is optimised to some extent in glibc. I would personally prefer
keeping calloc, using malloc only for large buffers, and trading off
that one nano perf improvement for ease of coding and uniformity across
the project. As a rule of thumb we have been using calloc for smaller
allocations; we should discuss and conclude on this so it can go into
the developer guide (./doc/developer-guide/coding-standard.md).

Also, while we are on memory allocation optimizations, I would like to
mention another observation. Long back, I had tried changing gf_malloc
and gf_calloc to allocate from the per-thread mem pool if available,
rather than going to libc. We saw that millions of small mallocs/callocs
were happening, so we modified the code to look up the mempool first for
pre-allocated memory, and with this change we saw no improvement! There
were improvements, however, w.r.t. using iobuf_pool and other
pre-allocated pools for dict, inode etc.; something we could explore.
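The per-thread pool idea can be sketched as a small free list per thread, reusing recently freed chunks before falling back to libc. This is a hypothetical illustration, not gluster's actual mem-pool API; names and sizes are invented.

```c
#include <stdlib.h>

/* Hypothetical per-thread pool: a fixed-depth stack of cached chunks of
 * one size, per thread. Not gluster's mem-pool API; illustrative only. */
#define POOL_CHUNK_SIZE 128
#define POOL_DEPTH      64

static __thread void *pool_stack[POOL_DEPTH];
static __thread int pool_top;   /* number of chunks currently cached */

void *pool_alloc(void)
{
    if (pool_top > 0)
        return pool_stack[--pool_top];   /* reuse: no libc call */
    return malloc(POOL_CHUNK_SIZE);      /* pool empty: fall back */
}

void pool_free(void *p)
{
    if (pool_top < POOL_DEPTH)
        pool_stack[pool_top++] = p;      /* cache for the next alloc */
    else
        free(p);                         /* pool full: return to libc */
}
```

As noted above, wiring something like this under gf_malloc/gf_calloc showed no measurable win for small allocations, likely because glibc's own fast paths already cover this case; the pools that did pay off were the larger pre-sized ones (iobuf_pool, and pools for dict, inode etc.).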

Regards,
Poornima

On Wed, Mar 27, 2019 at 2:25 PM Niels de Vos  wrote:

> On Tue, Mar 26, 2019 at 02:52:33PM -0700, Vijay Bellur wrote:
> > On Thu, Mar 21, 2019 at 8:44 AM Yaniv Kaul  wrote:
> >
> > >
> > >
> > > On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran <
> nbala...@redhat.com>
> > > wrote:
> > >
> > >>
> > >>
> > >> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee 
> wrote:
> > >>
> > >>> All,
> > >>>
> > >>> In the last few releases of glusterfs, with stability as a primary
> theme
> > >>> of the releases, there has been lots of changes done on the code
> > >>> optimization with an expectation that such changes will have gluster
> to
> > >>> provide better performance. While many of these changes do help, but
> off
> > >>> late we have started seeing some diverse effects of them, one
> especially
> > >>> being the calloc to malloc conversions. While I do understand that
> malloc
> > >>> syscall will eliminate the extra memset bottleneck which calloc
> bears, but
> > >>> with recent kernels having in-built strong compiler optimizations I
> am not
> > >>> sure whether that makes any significant difference, but as I
> mentioned
> > >>> earlier certainly if this isn't done carefully it can potentially
> introduce
> > >>> lot of bugs and I'm writing this email to share one of such
> experiences.
> > >>>
> > >>> Sanju & I were having troubles for last two days to figure out why
> > >>> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
> > >>> Sanju's system but it had no problems running the same fix in my
> gluster
> > >>> containers. After spending a significant amount of time, what we now
> > >>> figured out is that a malloc call [1] (which was a calloc earlier)
> is the
> > >>> culprit here. As you all can see, in this function we allocate
> txn_id and
> > >>> copy the event->txn_id into it through gf_uuid_copy () . But when we
> were
> > >>> debugging this step wise through gdb, txn_id wasn't exactly copied
> with the
> > >>> exact event->txn_id and it had some junk values which made the
> > >>> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
> > >>> resulting the leaks to remain the same which was the original
> intention of
> > >>> the fix.
> > >>>
> > >>> This was quite painful to debug and we had to spend some time to
> figure
> > >>> this out. Considering we have converted many such calls in past, I'd
> urge
> > >>> that we review all such conversions and see if there're any side
> effects to
> > >>> it. Otherwise we might end up running into many potential memory
> related
> > >>> bugs later on. OTOH, going forward I'd request every patch
> > >>> owners/maintainers to pay some special attention to these
> conversions and
> > >>> see they are really beneficial and error free. IMO, general guideline
> > >>> should be - for bigger buffers, malloc would make better sense but
> has to
> > >>> be done carefully, for smaller size, we stick to calloc.
> > >>>
> > >>
> > >>> What do others think about it?
> > >>>
> > >>
> > >> I believe that replacing calloc with malloc everywhere without
> adequate
> > >> testing and review is not safe and am against