Considering that the effort to reduce the number of threads is already in progress, should we mark this as a known issue until that thread-reduction patch is merged?
-Amar

On Thu, Dec 20, 2018 at 2:38 PM Poornima Gurusiddaiah <pguru...@redhat.com> wrote:

> So, this failure is related to the iobuf patch [1]. Thanks to Pranith for
> identifying this. The patch increases memory consumption in the brick mux
> use case (explained below) and causes an OOM kill, but the problem is not
> in the patch itself. The only way to fix it properly is to fix issue [2].
> That said, we cannot wait until that issue is fixed; the possible
> workarounds are:
> - Reduce the volume creation count in the test case mpx-restart-crash.t
>   (temporarily, until [2] is fixed)
> - Increase the resources (RAM to 4G?) on the regression systems
> - Revert the patch until [2] is completely fixed
>
> Root cause:
> Without the iobuf patch [1], we had a pre-allocated pool of minimum size
> 12.5 MB (which can grow), and in many cases this entire size was not used.
> Hence we moved iobuf to the per-thread mem pool as well. With this we
> expected the memory consumption of the processes to go down, and it did go
> down. After creating 20 volumes on the system, the free -m output is:
>
> With this patch:
>               total        used        free      shared  buff/cache   available
> Mem:           3789        2198         290         249        1300         968
> Swap:          3071           0        3071
>
> Without this patch:
>               total        used        free      shared  buff/cache   available
> Mem:           3789        2280         115         488        1393         647
> Swap:          3071           0        3071
>
> This output can vary based on system state, workload etc. It is not
> indicative of the exact amount of memory reduction, only of the fact that
> memory usage went down.
>
> With brick mux, however, the scenario is different. Since patch [1] uses a
> per-thread mem pool for iobuf, the memory consumed by iobufs grows with
> the number of threads, and in the current brick mux implementation the
> brick process for 20 volumes (as in the mpx-restart-crash test) has 1439
> threads. Allocated iobufs (like any other per-thread mem pool memory) are
> not freed until 30s (the garbage collection interval) after they are
> released (e.g. via iobuf_put). As a result, the memory consumption of the
> process appears to increase under brick mux. Reducing the number of
> threads to <100 [2] will solve this issue. To prove this theory, if we add
> a 30s delay between each volume create in mpx-restart-crash, the memory
> consumption is:
>
> With this patch, after adding a 30s delay between volume creates:
>               total        used        free      shared  buff/cache   available
> Mem:           3789        1344         947         488        1497        1606
> Swap:          3071           0        3071
>
> With this patch:
>               total        used        free      shared  buff/cache   available
> Mem:           3789        1710         840         235        1238        1494
> Swap:          3071           0        3071
>
> Without this patch:
>               total        used        free      shared  buff/cache   available
> Mem:           3789        1413         969         355        1406        1668
> Swap:          3071           0        3071
>
> Regards,
> Poornima
>
> [1] https://review.gluster.org/#/c/glusterfs/+/20362/
> [2] https://github.com/gluster/glusterfs/issues/475
>
> On Thu, Dec 20, 2018 at 10:28 AM Amar Tumballi <atumb...@redhat.com> wrote:
>
>> Since yesterday, at least 10+ patches have failed regression on
>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>
>> Help debugging them soon would be appreciated.
>>
>> Regards,
>> Amar
>>
>> --
>> Amar Tumballi (amarts)

--
Amar Tumballi (amarts)
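For reference, the 1439-thread figure above is the kind of number that can be
checked directly via procfs; a minimal sketch, assuming a single multiplexed
brick process named glusterfsd (the pgrep pattern and the single-process
assumption are ours, not part of the original report):

    # Count the threads in the (single, multiplexed) brick process.
    BRICK_PID=$(pgrep -x glusterfsd | head -n 1)
    grep Threads /proc/"$BRICK_PID"/status   # e.g. "Threads: 1439"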
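The delay experiment described above amounts to something like the following
inside tests/bugs/core/bug-1432542-mpx-restart-crash.t. This is a sketch
using the Gluster test framework's TEST/$CLI/$H0/$B0 conventions; the volume
names, brick paths, and volume layout are illustrative, not the actual test
contents:

    # Pause 30s between volume creates so the per-thread mem pool garbage
    # collector (which runs on a ~30s cycle) can reclaim released iobufs
    # before the next create adds more brick threads.
    for i in $(seq 1 20); do
        TEST $CLI volume create patchy$i $H0:$B0/brick$i
        TEST $CLI volume start patchy$i
        sleep 30   # let the mem-pool sweeper free per-thread allocations
    done

Note that the sleep does not change what is allocated, only when it is
reclaimed, which is why it works as a diagnostic for the theory rather than
as a fix.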
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel