Re: [Gluster-devel] The state of lock heal and inodelk/entrylk heal ?

2019-03-21 Thread Pranith Kumar Karampuri
On Thu, Mar 21, 2019 at 9:15 AM Kinglong Mee  wrote:

> Hello folks,
>
> Lock self healing (recovery or replay) is added at
> https://review.gluster.org/#/c/glusterfs/+/2766/
>
> But it is removed at
> https://review.gluster.org/#/c/glusterfs/+/12363/
>
> I found some information about it at
> https://anoopcs.fedorapeople.org/Lock%20recovery%20in%20GlusterFS.txt
>
> I download the glusterfs source but cannot find any code about lock heal.
>
> I wanna know the state of lock heal, and inodelk/entrylk heal.
>

hi,
 At the moment lock heal doesn't happen. It is an open item that needs
to be fixed, and it is a problem that has been gaining interest recently, so
we are thinking about how to solve it. Did you get a chance to think about
this problem, and do you have any solutions?


>
> Can someone show me some information about it?
>
> thanks,
> Kinglong Mee
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] The state of lock heal and inodelk/entrylk heal ?

2019-03-21 Thread Pranith Kumar Karampuri
On Thu, Mar 21, 2019 at 11:50 AM Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Thu, Mar 21, 2019 at 9:15 AM Kinglong Mee 
> wrote:
>
>> Hello folks,
>>
>> Lock self healing (recovery or replay) is added at
>> https://review.gluster.org/#/c/glusterfs/+/2766/
>>
>> But it is removed at
>> https://review.gluster.org/#/c/glusterfs/+/12363/
>>
>> I found some information about it at
>> https://anoopcs.fedorapeople.org/Lock%20recovery%20in%20GlusterFS.txt
>>
>> I download the glusterfs source but cannot find any code about lock heal.
>>
>> I wanna know the state of lock heal, and inodelk/entrylk heal.
>>
>
> hi,
>  At the moment lock heal doesn't happen. It is an open item that needs
> to be fixed. It is a problem that is gaining interest recently and we are
> thinking of solving this problem. Did you get a chance to think about this
> problem and do you have any solutions?
>

I saw your question first @
https://review.gluster.org/#/c/glusterfs/+/22377/, let us continue the
conversation here so that everyone can get involved :-).


>
>
>>
>> Can someone show me some information about it?
>>
>> thanks,
>> Kinglong Mee
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
> --
> Pranith
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] The state of lock heal and inodelk/entrylk heal ?

2019-03-21 Thread Kinglong Mee
On 2019/3/21 14:59, Pranith Kumar Karampuri wrote:
> On Thu, Mar 21, 2019 at 11:50 AM Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
> On Thu, Mar 21, 2019 at 9:15 AM Kinglong Mee wrote:

Hello folks,

Lock self healing (recovery or replay) is added at
https://review.gluster.org/#/c/glusterfs/+/2766/

But it is removed at
https://review.gluster.org/#/c/glusterfs/+/12363/

I found some information about it at
https://anoopcs.fedorapeople.org/Lock%20recovery%20in%20GlusterFS.txt

I download the glusterfs source but cannot find any code about
lock heal.

I wanna know the state of lock heal, and inodelk/entrylk heal.


hi,
  At the moment lock heal doesn't happen. It is an open item
that needs to be fixed. It is a problem that is gaining interest
recently and we are thinking of solving this problem. Did you get a
chance to think about this problem and do you have any solutions?


I'd like to use the lock-recovery mechanism that NLM/NFS has used for years.
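
For reference, the NLM/NFS mechanism mentioned above works roughly like this:
after a server restart, the server opens a grace period during which clients
re-send ("reclaim") the locks they believe they hold, and ordinary new lock
requests are refused until the grace period ends. The sketch below is purely
illustrative; none of these names exist in GlusterFS or NFS code:

#include <stdbool.h>
#include <time.h>

#define GRACE_SECONDS 90

static time_t server_start_time;

static bool in_grace_period(void)
{
    return time(NULL) < server_start_time + GRACE_SECONDS;
}

/* Called for every incoming lock request after a restart. */
static int handle_lock_request(bool is_reclaim)
{
    if (in_grace_period() && !is_reclaim)
        return -1; /* deny: only reclaims are allowed during grace */
    if (!in_grace_period() && is_reclaim)
        return -1; /* too late: the client must treat its lock as lost */
    /* ... grant or queue the lock as usual ... */
    return 0;
}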

Thanks,
Kinglong Mee




I saw your question first @ 
https://review.gluster.org/#/c/glusterfs/+/22377/, let us continue the 
conversation here so that everyone can get involved :-).



Can someone show me some information about it?

thanks,
Kinglong Mee
___
Gluster-devel mailing list
Gluster-devel@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-devel



-- 
Pranith




--
Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Network Block device (NBD) on top of glusterfs

2019-03-21 Thread Xiubo Li

On 2019/3/21 18:09, Prasanna Kalever wrote:



On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li wrote:


All,

I am one of the contributor for gluster-block [1] project, and also I
contribute to linux kernel and open-iscsi project. [2]

NBD was around for some time, but in recent time, linux kernel’s
Network Block Device (NBD) is enhanced and made to work with more
devices and also the option to integrate with netlink is added.
So, I tried to provide a glusterfs client based NBD driver
recently. Please refer github issue #633 [3], and good
news is I have a working code, with most basic things @ nbd-runner
project [4].

While this email is about announcing the project, and asking for
more collaboration, I would also like to discuss more about the
placement of the project itself. Currently nbd-runner project is
expected to be shared by our friends at Ceph project too, to
provide NBD driver for Ceph. I have personally worked with some of
them closely while contributing to open-iSCSI project, and we
would like to take this project to great success.

Now few questions:

 1. Can I continue to use http://github.com/gluster/nbd-runner as
home for this project, even if its shared by other filesystem
projects?

  * I personally am fine with this.

 2. Should there be a separate organization for this repo?

  * While it may make sense in future, for now, I am not planning
to start any new thing?

It would be great if we have some consensus on this soon as
nbd-runner is a new repository. If there are no concerns, I will
continue to contribute to the existing repository.


Thanks Xiubo Li, for finally sending this email out. Since this email 
is out on gluster mailing list, I would like to take a stand from 
gluster community point of view *only* and share my views.


My honest answer is "If we want to maintain this within gluster org, 
then 80% of the effort is common/duplicate of what we did all these 
days with gluster-block",


The idea originally came from Mike Christie some days ago, and the nbd-runner
project's framework was initially modeled on tcmu-runner. This is why I named
this project nbd-runner; it is meant to work for other distributed storage
systems as well, such as Gluster/Ceph/Azure, as discussed with Mike before.


nbd-runner (NBD protocol) and tcmu-runner (iSCSI protocol) are very similar:
both work at the lower I/O (READ/WRITE/...) level, not at the management
layer the way ceph-iscsi-gateway and gluster-block currently do.


Currently I have only implemented the Gluster handler, and while nbd-runner
uses RPC much like glusterfs and gluster-block do, most of the other code
(about 70%) in nbd-runner deals with the NBD protocol and is very different
from the tcmu-runner/glusterfs/gluster-block projects. There are also many
features in the NBD module that are not yet supported, so the projects will
diverge even further in future.
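
For context on what the Gluster handler boils down to, here is a rough,
hypothetical sketch of how an NBD-style READ could be served through gfapi.
The gluster_handler_* names are made up (this is not nbd-runner's actual
code), and the gfapi calls are shown with their commonly documented
signatures, which can vary between gfapi versions:

#include <fcntl.h>
#include <unistd.h>
#include <glusterfs/api/glfs.h> /* header path may differ across versions */

/* Hypothetical handler state; not nbd-runner's actual API. */
struct gluster_handler {
    glfs_t *fs;
    glfs_fd_t *fd;
};

static int gluster_handler_open(struct gluster_handler *h, const char *host,
                                const char *volume, const char *path)
{
    h->fs = glfs_new(volume);
    if (!h->fs)
        return -1;
    glfs_set_volfile_server(h->fs, "tcp", host, 24007);
    if (glfs_init(h->fs) < 0)
        return -1;
    h->fd = glfs_open(h->fs, path, O_RDWR);
    return h->fd ? 0 : -1;
}

/* Serve an NBD READ: copy 'len' bytes at 'offset' from the gluster file. */
static ssize_t gluster_handler_read(struct gluster_handler *h, void *buf,
                                    size_t len, off_t offset)
{
    if (glfs_lseek(h->fd, offset, SEEK_SET) < 0)
        return -1;
    return glfs_read(h->fd, buf, len, 0);
}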


The framework coding is done, and the nbd-runner project is already stable
and working well for me now.




like:
* rpc/socket code
* cli/daemon parser/helper logics
* gfapi util functions
* logger framework
* inotify & dyn-config threads


Yes, these features originally came from the tcmu-runner project, which Mike
and I coded two years ago. nbd-runner has copied them from tcmu-runner.


I very much appreciate your great ideas here, Prasanna, and I hope nbd-runner
can be used more generically and successfully in future.


BRs

Xiubo Li



* configure/Makefile/specfiles
* docsAboutGluster and etc ..

The gluster-block repository is actually the home for all the block-related
stuff within gluster, and it is designed to accommodate similar
functionality. If I were you, I would have simply copied nbd-runner.c into
https://github.com/gluster/gluster-block/tree/master/daemon/ just like Ceph
does it here
https://github.com/ceph/ceph/blob/master/src/tools/rbd_nbd/rbd-nbd.cc
and been done with it.


Advantages of keeping nbd client within gluster-block:
-> No worry about code maintenance burden
-> No worry about monitoring a new component
-> shipping packages to fedora/centos/rhel is handled
-> This helps improve and stabilize the current gluster-block framework
-> We can build a common CI
-> We can reuse the common test framework, etc.

If you have the impression that gluster-block is for management, then I
would really like to correct you at this point.


Some of my near future plans for gluster-block:
* Allow exporting blocks with FUSE access via fileIO backstore to 
improve large-file workloads, draft: 
https://github.com/gluster/gluster-block/pull/58

* Accommodate kernel loopback handling for local only applications
* The same way we can accommodate nbd app/client, and IMHO this effort 

Re: [Gluster-devel] Network Block device (NBD) on top of glusterfs

2019-03-21 Thread Prasanna Kalever
On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li  wrote:

> All,
>
> I am one of the contributor for gluster-block
> [1] project, and also I
> contribute to linux kernel and open-iscsi 
> project.[2]
>
> NBD was around for some time, but in recent time, linux kernel’s Network
> Block Device (NBD) is enhanced and made to work with more devices and also
> the option to integrate with netlink is added. So, I tried to provide a
> glusterfs client based NBD driver recently. Please refer github issue #633
> [3], and good news is I
> have a working code, with most basic things @ nbd-runner project
> [4].
>
> While this email is about announcing the project, and asking for more
> collaboration, I would also like to discuss more about the placement of the
> project itself. Currently nbd-runner project is expected to be shared by
> our friends at Ceph project too, to provide NBD driver for Ceph. I have
> personally worked with some of them closely while contributing to
> open-iSCSI project, and we would like to take this project to great success.
>
> Now few questions:
>
>1. Can I continue to use http://github.com/gluster/nbd-runner as home
>for this project, even if its shared by other filesystem projects?
>
>
>- I personally am fine with this.
>
>
>1. Should there be a separate organization for this repo?
>
>
>- While it may make sense in future, for now, I am not planning to
>start any new thing?
>
> It would be great if we have some consensus on this soon as nbd-runner is
> a new repository. If there are no concerns, I will continue to contribute
> to the existing repository.
>

Thanks Xiubo Li, for finally sending this email out. Since this email is
out on gluster mailing list, I would like to take a stand from gluster
community point of view *only* and share my views.

My honest answer is "If we want to maintain this within gluster org, then
80% of the effort is common/duplicate of what we did all these days with
gluster-block",

like:
* rpc/socket code
* cli/daemon parser/helper logics
* gfapi util functions
* logger framework
* inotify & dyn-config threads
* configure/Makefile/specfiles
* docsAboutGluster and etc ..

The gluster-block repository is actually the home for all the block-related
stuff within gluster, and it is designed to accommodate similar
functionality. If I were you, I would have simply copied nbd-runner.c into
https://github.com/gluster/gluster-block/tree/master/daemon/ just like Ceph
does it here
https://github.com/ceph/ceph/blob/master/src/tools/rbd_nbd/rbd-nbd.cc
and been done with it.

Advantages of keeping nbd client within gluster-block:
-> No worry about code maintenance burden
-> No worry about monitoring a new component
-> shipping packages to fedora/centos/rhel is handled
-> This helps improve and stabilize the current gluster-block framework
-> We can build a common CI
-> We can reuse the common test framework, etc.

If you have the impression that gluster-block is for management, then I
would really like to correct you at this point.

Some of my near future plans for gluster-block:
* Allow exporting blocks with FUSE access via fileIO backstore to improve
large-file workloads, draft:
https://github.com/gluster/gluster-block/pull/58
* Accommodate kernel loopback handling for local only applications
* The same way we can accommodate the nbd app/client, and IMHO this effort
shouldn't take more than 1 or 2 days to get merged into gluster-block and be
ready to go in a release.


Hope that clarifies it.


Best Regards,
--
Prasanna


> Regards,
> Xiubo Li (@lxbsz)
>
> [1] - https://github.com/gluster/gluster-block
> [2] - https://github.com/open-iscsi
> [3] - https://github.com/gluster/glusterfs/issues/633
> [4] - https://github.com/gluster/nbd-runner
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-21 Thread Raghavendra Gowdappa
On Thu, Mar 21, 2019 at 4:10 PM Mauro Tridici  wrote:

> Hi Raghavendra,
>
> the number of errors reduced, but during last days I received some error
> notifications from Nagios server similar to the following one:
>
> ** Nagios **
> Notification Type: PROBLEM
> Service: Brick - /gluster/mnt5/brick
> Host: s04
> Address: s04-stg
> State: CRITICAL
> Date/Time: Mon Mar 18 19:56:36 CET 2019
> Additional Info: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
>
> The error was related only to s04 gluster server.
>
> So, following your suggestions, I executed the top command on the s04 node.
> You can find the related output in the attachment.
>

The top output doesn't contain cmd/thread names. Was there anything wrong?


> Thank you very much for your help.
> Regards,
> Mauro
>
>
>
> On 14 Mar 2019, at 13:31, Raghavendra Gowdappa 
> wrote:
>
> Thanks Mauro.
>
> On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici 
> wrote:
>
>> Hi Raghavendra,
>>
>> I just changed the client option value to 8.
>> I will check the volume behaviour during the next hours.
>>
>> The GlusterFS version is 3.12.14.
>>
>> I will provide you the logs as soon as the activity load will be high.
>> Thank you,
>> Mauro
>>
>> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa 
>> wrote:
>>
>>
>>
>> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici 
>> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> Yes, server.event-thread has been changed from 4 to 8.
>>>
>>
>> Was client.event-thread value too changed to 8? If not, I would like to
>> know the results of including this tuning too. Also, if possible, can you
>> get the output of following command from problematic clients and bricks
>> (during the duration when load tends to be high and ping-timer-expiry is
>> seen)?
>>
>> # top -bHd 3
>>
>> This will help us to know  CPU utilization of event-threads.
>>
>> And I forgot to ask, what version of Glusterfs are you using?
>>
>> During last days, I noticed that the error events are still here although
>>> they have been considerably reduced.
>>>
>>> So, I used the grep command against the log files in order to give you an
>>> overview of the warning, error and critical events that appeared today
>>> at 06:xx (I hope it may be useful).
>>> I collected the info from the s06 gluster server, but the behaviour is
>>> almost the same on the other gluster servers.
>>>
>>> *ERRORS:  *
>>> *CWD: /var/log/glusterfs *
>>> *COMMAND: grep " E " *.log |grep "2019-03-13 06:"*
>>>
>>> (I can see a lot of this kind of message in the same period but I'm
>>> notifying you only one record for each type of error)
>>>
>>> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042]
>>> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of
>>> /var/run/gluster/tier2_quota_list/
>>>
>>> glustershd.log:[2019-03-13 06:14:28.666562] E
>>> [rpc-clnt.c:350:saved_frames_unwind] (-->
>>> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (-->
>>> /lib64/libgfr
>>> pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (-->
>>> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (-->
>>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup
>>> +0x90)[0x7f4a71ba3640] (-->
>>> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] )
>>> 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3)
>>> op(INODELK(29))
>>> called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50)
>>>
>>> glustershd.log:[2019-03-13 06:17:48.883825] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disco
>>> nnecting socket
>>> glustershd.log:[2019-03-13 06:19:58.931798] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disco
>>> nnecting socket
>>> glustershd.log:[2019-03-13 06:22:08.979829] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disco
>>> nnecting socket
>>> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031]
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>>> operation failed [Transport endpoint
>>> is not connected]
>>> glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031]
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>>> operation failed [Transport endpoint
>>> is not connected]
>>> glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031]
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>>> operation failed [Transport endpoint
>>> is not connected]
>>>
>>> *WARNINGS:*
>>> *CWD: /var/log/glusterfs *
>>> *COMMAND: grep " W " *.log |grep "2019-03-13 06:"*
>>>
>>> (I can see a lot of this kind of message in the same period but I'm
>>> notifying you only one record for each type of warnings)
>>>
>>> glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031]
>>> [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 

[Gluster-devel] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Atin Mukherjee
All,

In the last few releases of glusterfs, with stability as a primary theme of
the releases, there have been lots of changes made for code optimization,
with the expectation that such changes will help gluster provide better
performance. While many of these changes do help, of late we have started
seeing some adverse effects from them, one especially being the calloc to
malloc conversions. While I do understand that malloc will eliminate the
extra memset overhead which calloc bears, with recent kernels and strong
built-in compiler optimizations I am not sure whether that makes any
significant difference; but, as I mentioned earlier, if this isn't done
carefully it can certainly introduce a lot of bugs, and I'm writing this
email to share one such experience.

Sanju & I were having trouble for the last two days figuring out why
https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working on Sanju's
system, while the same fix had no problems running in my gluster containers.
After spending a significant amount of time, what we figured out is that a
malloc call [1] (which was a calloc earlier) is the culprit here. As you can
all see, in this function we allocate txn_id and copy event->txn_id into it
through gf_uuid_copy(). But when we were debugging this stepwise through gdb,
txn_id wasn't copied with the exact event->txn_id and it had some junk
values, which made glusterd_clear_txn_opinfo be invoked with a wrong txn_id
later on, resulting in the leaks remaining, even though removing them was the
original intention of the fix.
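
To make the class of bug concrete, here is a small, self-contained
illustration; it is not the actual glusterd code (the struct and names below
are made up), but it shows how code that implicitly relies on calloc's
zero-initialization starts reading junk once the allocation is blindly
switched to malloc.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct txn {
    char id[16];
    int committed; /* callers assume this starts out as 0 */
};

int main(void)
{
    /* Before: calloc(1, sizeof(*t)) guaranteed t->committed == 0.
     * After a blind conversion to malloc, t->committed holds whatever
     * happens to be on the heap, so the check below is unreliable. */
    struct txn *t = malloc(sizeof(*t));
    if (!t)
        return 1;
    memcpy(t->id, "0123456789abcdef", sizeof(t->id)); /* only id is set */

    if (t->committed) /* reads uninitialized memory: may or may not fire */
        printf("acting on a transaction that was never committed!\n");

    free(t);
    return 0;
}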

This was quite painful to debug and we had to spend some time to figure it
out. Considering we have converted many such calls in the past, I'd urge that
we review all such conversions and see if there are any side effects from
them. Otherwise we might end up running into many potential memory-related
bugs later on. OTOH, going forward I'd request every patch owner/maintainer
to pay special attention to these conversions and check that they are really
beneficial and error-free. IMO, the general guideline should be: for bigger
buffers, malloc may make better sense but has to be done carefully; for
smaller sizes, we should stick to calloc.

What do others think about it?

[1]
https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Raghavendra Gowdappa
On Thu, Mar 21, 2019 at 4:16 PM Atin Mukherjee  wrote:

> All,
>
> In the last few releases of glusterfs, with stability as a primary theme
> of the releases, there has been lots of changes done on the code
> optimization with an expectation that such changes will have gluster to
> provide better performance. While many of these changes do help, but off
> late we have started seeing some diverse effects of them, one especially
> being the calloc to malloc conversions. While I do understand that malloc
> syscall will eliminate the extra memset bottleneck which calloc bears, but
> with recent kernels having in-built strong compiler optimizations I am not
> sure whether that makes any significant difference, but as I mentioned
> earlier certainly if this isn't done carefully it can potentially introduce
> lot of bugs and I'm writing this email to share one of such experiences.
>
> Sanju & I were having troubles for last two days to figure out why
> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
> Sanju's system but it had no problems running the same fix in my gluster
> containers. After spending a significant amount of time, what we now
> figured out is that a malloc call [1] (which was a calloc earlier) is the
> culprit here. As you all can see, in this function we allocate txn_id and
> copy the event->txn_id into it through gf_uuid_copy () . But when we were
> debugging this step wise through gdb, txn_id wasn't exactly copied with the
> exact event->txn_id and it had some junk values which made the
> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
> resulting the leaks to remain the same which was the original intention of
> the fix.
>
> This was quite painful to debug and we had to spend some time to figure
> this out. Considering we have converted many such calls in past, I'd urge
> that we review all such conversions and see if there're any side effects to
> it. Otherwise we might end up running into many potential memory related
> bugs later on. OTOH, going forward I'd request every patch
> owners/maintainers to pay some special attention to these conversions and
> see they are really beneficial and error free. IMO, general guideline
> should be - for bigger buffers, malloc would make better sense but has to
> be done carefully, for smaller size, we stick to calloc.
>
> What do others think about it?
>

I too am afraid of the unknown effects of this change, as much of the
codebase relies on the assumption of zero-initialized data structures. I vote
for reverting these patches unless it can be demonstrated that the
performance benefits are indeed significant. Otherwise the trade-off in
stability is not worth the cost.


>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Yaniv Kaul
On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran 
wrote:

>
>
> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:
>
>> All,
>>
>> In the last few releases of glusterfs, with stability as a primary theme
>> of the releases, there has been lots of changes done on the code
>> optimization with an expectation that such changes will have gluster to
>> provide better performance. While many of these changes do help, but off
>> late we have started seeing some diverse effects of them, one especially
>> being the calloc to malloc conversions. While I do understand that malloc
>> syscall will eliminate the extra memset bottleneck which calloc bears, but
>> with recent kernels having in-built strong compiler optimizations I am not
>> sure whether that makes any significant difference, but as I mentioned
>> earlier certainly if this isn't done carefully it can potentially introduce
>> lot of bugs and I'm writing this email to share one of such experiences.
>>
>> Sanju & I were having troubles for last two days to figure out why
>> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
>> Sanju's system but it had no problems running the same fix in my gluster
>> containers. After spending a significant amount of time, what we now
>> figured out is that a malloc call [1] (which was a calloc earlier) is the
>> culprit here. As you all can see, in this function we allocate txn_id and
>> copy the event->txn_id into it through gf_uuid_copy () . But when we were
>> debugging this step wise through gdb, txn_id wasn't exactly copied with the
>> exact event->txn_id and it had some junk values which made the
>> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
>> resulting the leaks to remain the same which was the original intention of
>> the fix.
>>
>> This was quite painful to debug and we had to spend some time to figure
>> this out. Considering we have converted many such calls in past, I'd urge
>> that we review all such conversions and see if there're any side effects to
>> it. Otherwise we might end up running into many potential memory related
>> bugs later on. OTOH, going forward I'd request every patch
>> owners/maintainers to pay some special attention to these conversions and
>> see they are really beneficial and error free. IMO, general guideline
>> should be - for bigger buffers, malloc would make better sense but has to
>> be done carefully, for smaller size, we stick to calloc.
>>
>
>> What do others think about it?
>>
>
> I believe that replacing calloc with malloc everywhere without adequate
> testing and review is not safe and am against doing so for the following
> reasons:
>

No patch should get in without adequate testing and thorough review.

>
>1. Most of these patches have not been tested, especially the error
>paths.I have seen some that introduced issues in error scenarios with
>pointers being non-null.
>
>
You raise an interesting issue. Why are freed memory pointers not NULLed?
Why does FREE() set ptr = (void *)0x and not NULL?
This is a potential cause of failure. A recurring FREE(NULL) is harmless; a
FREE(0x) is a bit more problematic.
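
As an aside, a free wrapper that resets the caller's pointer is easy to
write; the macro below is only a generic sketch of the idea being discussed
here, not GlusterFS's actual FREE()/GF_FREE() implementation:

#include <stdlib.h>

/* Hypothetical helper: free the buffer and reset the caller's pointer, so an
 * accidental second FREE_AND_NULL(p) becomes a harmless free(NULL). */
#define FREE_AND_NULL(ptr)  \
    do {                    \
        free(ptr);          \
        (ptr) = NULL;       \
    } while (0)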


>1.
>2. As Raghavendra said, the code assumes that certain elements will be
>initialized to null/zero and changing that can have consequences which are
>not immediately obvious. I think it might be worthwhile to go through the
>already merged calloc->malloc patches to check error paths and so on to see
>if they are safe.
>
>
Agreed.

>
>1.
>2. Making such changes to the libglusterfs code while we are currently
>working to stabilize the product is not a good idea. The patches take time
>to review and any errors introduced in the core pieces affect all processes
>and require significant effort to debug.
>
>
Let me know when we consider the project stable. I'd argue the way to
stabilize it is not to stop improving it, but to improve its testing: from
more tests covering more code, to more static-analysis coverage, to ensuring
CI is rock-solid (including the random errors that pop up from time to time).
Not accepting patches to master is not the right approach, unless it's
time-boxed somehow. If it is, then it means we don't trust our CI enough,
btw.


>1.
>
> Yaniv, while the example you provided might make sense to change to
> malloc, a lot of the other changes, in my opinion, do not for the effort
> required. For performance testing, smallfile might be a useful tool to see
> if any of the changes make a difference. That said, I am reluctant to take
> in patches that change core code significantly without being tested or
> providing proof of benefits.
>

Is smallfile part of CI? I am happy to see it documented @
https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/#smallfile-distributed-io-benchmark
, so at least one can know how to execute it manually.

>
> We need better performance and scalability but that is 

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Nithya Balachandran
On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:

> All,
>
> In the last few releases of glusterfs, with stability as a primary theme
> of the releases, there has been lots of changes done on the code
> optimization with an expectation that such changes will have gluster to
> provide better performance. While many of these changes do help, but off
> late we have started seeing some diverse effects of them, one especially
> being the calloc to malloc conversions. While I do understand that malloc
> syscall will eliminate the extra memset bottleneck which calloc bears, but
> with recent kernels having in-built strong compiler optimizations I am not
> sure whether that makes any significant difference, but as I mentioned
> earlier certainly if this isn't done carefully it can potentially introduce
> lot of bugs and I'm writing this email to share one of such experiences.
>
> Sanju & I were having troubles for last two days to figure out why
> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
> Sanju's system but it had no problems running the same fix in my gluster
> containers. After spending a significant amount of time, what we now
> figured out is that a malloc call [1] (which was a calloc earlier) is the
> culprit here. As you all can see, in this function we allocate txn_id and
> copy the event->txn_id into it through gf_uuid_copy () . But when we were
> debugging this step wise through gdb, txn_id wasn't exactly copied with the
> exact event->txn_id and it had some junk values which made the
> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
> resulting the leaks to remain the same which was the original intention of
> the fix.
>
> This was quite painful to debug and we had to spend some time to figure
> this out. Considering we have converted many such calls in past, I'd urge
> that we review all such conversions and see if there're any side effects to
> it. Otherwise we might end up running into many potential memory related
> bugs later on. OTOH, going forward I'd request every patch
> owners/maintainers to pay some special attention to these conversions and
> see they are really beneficial and error free. IMO, general guideline
> should be - for bigger buffers, malloc would make better sense but has to
> be done carefully, for smaller size, we stick to calloc.
>

> What do others think about it?
>

I believe that replacing calloc with malloc everywhere without adequate
testing and review is not safe and am against doing so for the following
reasons:

   1. Most of these patches have not been tested, especially the error
   paths. I have seen some that introduced issues in error scenarios with
   pointers being non-NULL.
   2. As Raghavendra said, the code assumes that certain elements will be
   initialized to null/zero and changing that can have consequences which are
   not immediately obvious. I think it might be worthwhile to go through the
   already merged calloc->malloc patches to check error paths and so on to see
   if they are safe.
   3. Making such changes to the libglusterfs code while we are currently
   working to stabilize the product is not a good idea. The patches take time
   to review and any errors introduced in the core pieces affect all processes
   and require significant effort to debug.

Yaniv, while the example you provided might make sense to change to malloc,
a lot of the other changes, in my opinion, do not justify the effort required.
For performance testing, smallfile might be a useful tool to see if any of
the changes make a difference. That said, I am reluctant to take in patches
that change core code significantly without being tested or providing proof
of benefits.

We need better performance and scalability, but that is going to need changes
in our algorithms and fop handling, and that is what we need to be working
on. Such changes, when done right, will provide more benefit than the
micro-optimizations. I think it unlikely that the micro-optimizations will
provide much benefit, but I am willing to be proven wrong if you have numbers
that show otherwise.

Regards,
Nithya


>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Yaniv Kaul
On Thu, Mar 21, 2019 at 5:43 PM Yaniv Kaul  wrote:

>
>
> On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran 
> wrote:
>
>>
>>
>> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:
>>
>>> All,
>>>
>>> In the last few releases of glusterfs, with stability as a primary theme
>>> of the releases, there has been lots of changes done on the code
>>> optimization with an expectation that such changes will have gluster to
>>> provide better performance. While many of these changes do help, but off
>>> late we have started seeing some diverse effects of them, one especially
>>> being the calloc to malloc conversions. While I do understand that malloc
>>> syscall will eliminate the extra memset bottleneck which calloc bears, but
>>> with recent kernels having in-built strong compiler optimizations I am not
>>> sure whether that makes any significant difference, but as I mentioned
>>> earlier certainly if this isn't done carefully it can potentially introduce
>>> lot of bugs and I'm writing this email to share one of such experiences.
>>>
>>> Sanju & I were having troubles for last two days to figure out why
>>> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
>>> Sanju's system but it had no problems running the same fix in my gluster
>>> containers. After spending a significant amount of time, what we now
>>> figured out is that a malloc call [1] (which was a calloc earlier) is the
>>> culprit here. As you all can see, in this function we allocate txn_id and
>>> copy the event->txn_id into it through gf_uuid_copy () . But when we were
>>> debugging this step wise through gdb, txn_id wasn't exactly copied with the
>>> exact event->txn_id and it had some junk values which made the
>>> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
>>> resulting the leaks to remain the same which was the original intention of
>>> the fix.
>>>
>>> This was quite painful to debug and we had to spend some time to figure
>>> this out. Considering we have converted many such calls in past, I'd urge
>>> that we review all such conversions and see if there're any side effects to
>>> it. Otherwise we might end up running into many potential memory related
>>> bugs later on. OTOH, going forward I'd request every patch
>>> owners/maintainers to pay some special attention to these conversions and
>>> see they are really beneficial and error free. IMO, general guideline
>>> should be - for bigger buffers, malloc would make better sense but has to
>>> be done carefully, for smaller size, we stick to calloc.
>>>
>>
>>> What do others think about it?
>>>
>>
>> I believe that replacing calloc with malloc everywhere without adequate
>> testing and review is not safe and am against doing so for the following
>> reasons:
>>
>
> No patch should get in without adequate testing and thorough review.
>
>>
>>1. Most of these patches have not been tested, especially the error
>>paths.I have seen some that introduced issues in error scenarios with
>>pointers being non-null.
>>
>>
> You raise an interesting issue. Why are free'd memory pointers are not
> NULL'ified? Why does FREE() set ptr = (void *)0x and not NULL?
> This is a potential cause for failure. A re-occuring FREE(NULL) is
> harmless. A FREE(0x) is a bit more problematic.
>
>
>>1.
>>2. As Raghavendra said, the code assumes that certain elements will
>>be initialized to null/zero and changing that can have consequences which
>>are not immediately obvious. I think it might be worthwhile to go through
>>the already merged calloc->malloc patches to check error paths and so on 
>> to
>>see if they are safe.
>>
>>
> Agreed.
>
>>
>>1.
>>2. Making such changes to the libglusterfs code while we are
>>currently working to stabilize the product is not a good idea. The patches
>>take time to review and any errors introduced in the core pieces affect 
>> all
>>processes and require significant effort to debug.
>>
>>
> Let me know when we consider the project stable. I'd argue the way to
> stabilize it is not stop improving it, but improving its testing. From more
> tests to cover more code via more tests to more static analysis coverage,
> to ensuring CI is rock-solid (inc. random errors that pop up from time to
> time). Not accepting patches to master is not the right approach, unless
> it's time-boxed somehow. If it is, then it means we don't trust our CI
> enough, btw.
>
>
>>1.
>>
>> Yaniv, while the example you provided might make sense to change to
>> malloc, a lot of the other changes, in my opinion, do not for the effort
>> required. For performance testing, smallfile might be a useful tool to see
>> if any of the changes make a difference. That said, I am reluctant to take
>> in patches that change core code significantly without being tested or
>> providing proof of benefits.
>>
>
> Smallfile is part of CI? I am happy to see it documented @
> 

Re: [Gluster-devel] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Yaniv Kaul
On Thu, Mar 21, 2019 at 12:45 PM Atin Mukherjee  wrote:

> All,
>
> In the last few releases of glusterfs, with stability as a primary theme
> of the releases, there has been lots of changes done on the code
> optimization with an expectation that such changes will have gluster to
> provide better performance. While many of these changes do help, but off
> late we have started seeing some diverse effects of them, one especially
> being the calloc to malloc conversions. While I do understand that malloc
> syscall will eliminate the extra memset bottleneck which calloc bears, but
> with recent kernels having in-built strong compiler optimizations I am not
> sure whether that makes any significant difference, but as I mentioned
> earlier certainly if this isn't done carefully it can potentially introduce
> lot of bugs and I'm writing this email to share one of such experiences.
>
> Sanju & I were having troubles for last two days to figure out why
> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
> Sanju's system but it had no problems running the same fix in my gluster
> containers. After spending a significant amount of time, what we now
> figured out is that a malloc call [1] (which was a calloc earlier) is the
> culprit here. As you all can see, in this function we allocate txn_id and
> copy the event->txn_id into it through gf_uuid_copy () . But when we were
> debugging this step wise through gdb, txn_id wasn't exactly copied with the
> exact event->txn_id and it had some junk values which made the
> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
> resulting the leaks to remain the same which was the original intention of
> the fix.
>

- I'm not sure I understand what 'wasn't exactly copied' means. Either it
copied event->txn_id or it did not. Is event->txn_id not fully populated
somehow?
- This is a regression caused by 81cbbfd1d870bea49b8aafe7bebb9e8251190918,
which I introduced on August 4th, and we are only now discovering it. This is
not good.
Without looking, I assume almost all CALLOC->MALLOC changes were done on the
positive paths of the code, which means they are not tested well.
This file, while having low code coverage, seems to be covered [1], so I'm
not sure why we are only finding this now?

>
> This was quite painful to debug and we had to spend some time to figure
> this out. Considering we have converted many such calls in past, I'd urge
> that we review all such conversions and see if there're any side effects to
> it. Otherwise we might end up running into many potential memory related
> bugs later on. OTOH, going forward I'd request every patch
> owners/maintainers to pay some special attention to these conversions and
> see they are really beneficial and error free. IMO, general guideline
> should be - for bigger buffers, malloc would make better sense but has to
> be done carefully, for smaller size, we stick to calloc.
>
> What do others think about it?
>

I think I might have been aggressive with the changes, but I do feel they
are important in some areas where it makes sense. For example:
libglusterfs/src/inode.c :
new->inode_hash = (void *)GF_CALLOC(65536, sizeof(struct list_head),
                                    gf_common_mt_list_head);
if (!new->inode_hash)
    goto out;

new->name_hash = (void *)GF_CALLOC(new->hashsize, sizeof(struct list_head),
                                   gf_common_mt_list_head);
if (!new->name_hash)
    goto out;


And just a few lines later:

for (i = 0; i < 65536; i++) {
    INIT_LIST_HEAD(&new->inode_hash[i]);
}

for (i = 0; i < new->hashsize; i++) {
    INIT_LIST_HEAD(&new->name_hash[i]);
}


So this is really a waste of cycles for no good reason. I agree not every
CALLOC is worth converting.
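
For illustration, the conversion being argued for here would look roughly
like the sketch below, which continues the excerpt above. It assumes
GF_MALLOC takes a byte count plus the same memory-accounting type as
GF_CALLOC, and it is only safe because every element is overwritten by the
INIT_LIST_HEAD() loops immediately afterwards:

/* Sketch of the proposed conversion (not a committed patch): the zeroing
 * done by GF_CALLOC is redundant here, since each element is initialized by
 * the INIT_LIST_HEAD() loops right below. */
new->inode_hash = (void *)GF_MALLOC(65536 * sizeof(struct list_head),
                                    gf_common_mt_list_head);
if (!new->inode_hash)
    goto out;

new->name_hash = (void *)GF_MALLOC(new->hashsize * sizeof(struct list_head),
                                   gf_common_mt_list_head);
if (!new->name_hash)
    goto out;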

One more note: I'd love to be able to measure the effect, but there's no CI
job with benchmarks (including CPU and memory consumption) with which we can
evaluate the changes.

And lastly, we need better performance. We need better scalability. We are
not keeping up with HW advancements (especially NVMe, pmem and such) and are
(just like other storage stacks!) becoming somewhat of a performance
bottleneck.
Y.

>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Network Block device (NBD) on top of glusterfs

2019-03-21 Thread Prasanna Kalever
On Thu, Mar 21, 2019 at 6:31 PM Xiubo Li  wrote:

> On 2019/3/21 18:09, Prasanna Kalever wrote:
>
>
>
> On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li  wrote:
>
>> All,
>>
>> I am one of the contributor for gluster-block
>> [1] project, and also I
>> contribute to linux kernel and open-iscsi 
>> project.[2]
>>
>> NBD was around for some time, but in recent time, linux kernel’s Network
>> Block Device (NBD) is enhanced and made to work with more devices and also
>> the option to integrate with netlink is added. So, I tried to provide a
>> glusterfs client based NBD driver recently. Please refer github issue
>> #633 [3], and good news
>> is I have a working code, with most basic things @ nbd-runner project
>> [4].
>>
>> While this email is about announcing the project, and asking for more
>> collaboration, I would also like to discuss more about the placement of the
>> project itself. Currently nbd-runner project is expected to be shared by
>> our friends at Ceph project too, to provide NBD driver for Ceph. I have
>> personally worked with some of them closely while contributing to
>> open-iSCSI project, and we would like to take this project to great success.
>>
>> Now few questions:
>>
>>1. Can I continue to use http://github.com/gluster/nbd-runner as home
>>for this project, even if its shared by other filesystem projects?
>>
>>
>>- I personally am fine with this.
>>
>>
>>1. Should there be a separate organization for this repo?
>>
>>
>>- While it may make sense in future, for now, I am not planning to
>>start any new thing?
>>
>> It would be great if we have some consensus on this soon as nbd-runner is
>> a new repository. If there are no concerns, I will continue to contribute
>> to the existing repository.
>>
>
> Thanks Xiubo Li, for finally sending this email out. Since this email is
> out on gluster mailing list, I would like to take a stand from gluster
> community point of view *only* and share my views.
>
> My honest answer is "If we want to maintain this within gluster org, then
> 80% of the effort is common/duplicate of what we did all these days with
> gluster-block",
>
> The great idea came from Mike Christie days ago and the nbd-runner
> project's framework is initially emulated from tcmu-runner. This is why I
> name this project as nbd-runner, which will work for all the other
> Distributed Storages, such as Gluster/Ceph/Azure, as discussed with Mike
> before.
>
> nbd-runner(NBD proto) and tcmu-runner(iSCSI proto) are almost the same and
> both are working as lower IO(READ/WRITE/...) stuff, not the management
> layer like ceph-iscsi-gateway and gluster-block currently do.
>
> Currently since I only implemented the Gluster handler and also using the
> RPC like glusterfs and gluster-block, most of the other code (about 70%) in
> nbd-runner are for the NBD proto and these are very different from
> tcmu-runner/glusterfs/gluster-block projects, and there are many new
> features in NBD module that not yet supported and then there will be more
> different in future.
>
> The framework coding has been done and the nbd-runner project is already
> stable and could already work well for me now.
>
> like:
> * rpc/socket code
> * cli/daemon parser/helper logics
> * gfapi util functions
> * logger framework
> * inotify & dyn-config threads
>
> Yeah, these features were initially from tcmu-runner project, Mike and I
> coded two years ago. Currently nbd-runner also has copied them from
> tcmu-runner.
>

I don't think tcmu-runner has any of:

-> cli/daemon approach routines
-> rpc low-level clnt/svc routines
-> gfapi-level file create/delete util functions
-> JSON parser support
-> socket bind/listener related functionality
-> autoMake build framework, and
-> many other maintenance files

I could actually go into detail and furnish a long list of the references
made here, and you cannot deny the fact, but it is **all okay** to take
references from other, similar projects. My intention was not to point out
the copying made here, but rather to say that we are just wasting our effort
rewriting, copy-pasting, maintaining and fixing the same functionality
framework.

Again, the point I'm trying to make is: if you want to maintain an nbd client
as part of gluster.org at all, why not use gluster-block itself, which is
well tested and stable enough?

Apart from all the examples I mentioned in my previous email, there are other
great advantages from the user's perspective as well, such as:

* Top layers such as heketi that consume gluster's block storage really don't
have to care whether the backend provider is tcmu-runner or nbd-runner or
qemu-tcmu or kernel loopback or fileIO or something else ... they simply call
gluster-block and get a block device.

* We can reuse gluster-block's existing REST API interface too.


** Believe me, over the years I 

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Sankarshan Mukhopadhyay
On Thu, Mar 21, 2019 at 9:24 PM Yaniv Kaul  wrote:

>> Smallfile is part of CI? I am happy to see it documented @ 
>> https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/#smallfile-distributed-io-benchmark
>>  , so at least one can know how to execute it manually.
>
>
> Following up the above link to the smallfile repo leads to 404 (I'm assuming 
> we don't have a link checker running on our documentation, so it can break 
> from time to time?)

Hmm... that needs to be addressed.

> I assume it's https://github.com/distributed-system-analysis/smallfile ?

Yes.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Nithya Balachandran
On Thu, 21 Mar 2019 at 21:14, Yaniv Kaul  wrote:

>
>
> On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran 
> wrote:
>
>>
>>
>> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee  wrote:
>>
>>> All,
>>>
>>> In the last few releases of glusterfs, with stability as a primary theme
>>> of the releases, there has been lots of changes done on the code
>>> optimization with an expectation that such changes will have gluster to
>>> provide better performance. While many of these changes do help, but off
>>> late we have started seeing some diverse effects of them, one especially
>>> being the calloc to malloc conversions. While I do understand that malloc
>>> syscall will eliminate the extra memset bottleneck which calloc bears, but
>>> with recent kernels having in-built strong compiler optimizations I am not
>>> sure whether that makes any significant difference, but as I mentioned
>>> earlier certainly if this isn't done carefully it can potentially introduce
>>> lot of bugs and I'm writing this email to share one of such experiences.
>>>
>>> Sanju & I were having troubles for last two days to figure out why
>>> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
>>> Sanju's system but it had no problems running the same fix in my gluster
>>> containers. After spending a significant amount of time, what we now
>>> figured out is that a malloc call [1] (which was a calloc earlier) is the
>>> culprit here. As you all can see, in this function we allocate txn_id and
>>> copy the event->txn_id into it through gf_uuid_copy () . But when we were
>>> debugging this step wise through gdb, txn_id wasn't exactly copied with the
>>> exact event->txn_id and it had some junk values which made the
>>> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
>>> resulting the leaks to remain the same which was the original intention of
>>> the fix.
>>>
>>> This was quite painful to debug and we had to spend some time to figure
>>> this out. Considering we have converted many such calls in past, I'd urge
>>> that we review all such conversions and see if there're any side effects to
>>> it. Otherwise we might end up running into many potential memory related
>>> bugs later on. OTOH, going forward I'd request every patch
>>> owners/maintainers to pay some special attention to these conversions and
>>> see they are really beneficial and error free. IMO, general guideline
>>> should be - for bigger buffers, malloc would make better sense but has to
>>> be done carefully, for smaller size, we stick to calloc.
>>>
>>
>>> What do others think about it?
>>>
>>
>> I believe that replacing calloc with malloc everywhere without adequate
>> testing and review is not safe and am against doing so for the following
>> reasons:
>>
>
> No patch should get in without adequate testing and thorough review.
>

>>1. Most of these patches have not been tested, especially the error
>>paths.I have seen some that introduced issues in error scenarios with
>>pointers being non-null.
>>
>>
> You raise an interesting issue. Why are free'd memory pointers are not
> NULL'ified? Why does FREE() set ptr = (void *)0x and not NULL?
>
The problem I'm referring to here is in error paths, when incompletely
initialised structures are cleaned up: a non-NULL pointer that was never
allocated will be passed to free.
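
A distilled example of the hazard (hypothetical code, not from the GlusterFS
tree): with calloc the cleanup path frees NULL pointers harmlessly, but after
a malloc conversion it can free uninitialized junk.

#include <stdlib.h>
#include <string.h>

struct ctx {
    char *name;
    char *buf;
};

/* Error-path hazard: if strdup() fails below, c->buf was never assigned.
 * With calloc(1, sizeof(*c)) it would be NULL and free(c->buf) a no-op;
 * with malloc it is junk and freeing it is undefined behaviour. */
static struct ctx *ctx_new(void)
{
    struct ctx *c = malloc(sizeof(*c)); /* was calloc(1, sizeof(*c)) */
    if (!c)
        return NULL;
    c->name = strdup("example");
    if (!c->name)
        goto err;
    c->buf = malloc(4096);
    if (!c->buf)
        goto err;
    return c;
err:
    free(c->buf);  /* may free an uninitialized pointer */
    free(c->name);
    free(c);
    return NULL;
}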

> This is a potential cause for failure. A re-occuring FREE(NULL) is
> harmless. A FREE(0x) is a bit more problematic.
>
>
>>1.
>>2. As Raghavendra said, the code assumes that certain elements will
>>be initialized to null/zero and changing that can have consequences which
>>are not immediately obvious. I think it might be worthwhile to go through
>>the already merged calloc->malloc patches to check error paths and so on 
>> to
>>see if they are safe.
>>
>>
> Agreed.
>
>>
>>1.
>>2. Making such changes to the libglusterfs code while we are
>>currently working to stabilize the product is not a good idea. The patches
>>take time to review and any errors introduced in the core pieces affect 
>> all
>>processes and require significant effort to debug.
>>
>>
> Let me know when we consider the project stable. I'd argue the way to
> stabilize it is not stop improving it, but improving its testing. From more
> tests to cover more code via more tests to more static analysis coverage,
> to ensuring CI is rock-solid (inc. random errors that pop up from time to
> time).
>

Agreed. We need better CI coverage. More patches there would be very
welcome.


> Not accepting patches to master is not the right approach, unless it's
> time-boxed somehow. If it is, then it means we don't trust our CI enough,
> btw.
>

We are not blocking patches to master. We are raising concerns about
patches which are likely to impact code stability (such as the malloc
patches which have introduced issues in some cases) but require
considerable effort to review or test. A patch that solves a known