Re: [Gluster-devel] too many failures on mpx-restart-crash.t on master branch

2018-12-20 Thread Amar Tumballi
Merged the patch from Mohit:
https://review.gluster.org/#/c/glusterfs/+/21898/ is now merged. The issue
is not completely *fixed*, but the memory consumption has been RCA'd. We are
working on fixing the issues; meanwhile, the above merge helps unblock the
currently pending patches.

Please rebase your patches to latest master to pass through the regression.


Regards,
Amar

On Thu 20 Dec, 2018, 2:47 PM Poornima Gurusiddaiah wrote:

> Yeah, but Pranith mentioned that the issue is seen even without the iobuf
> patch, so the test may fail even after fixing the thread count? Hence
> reducing the volume count as suggested may be a better option.
>
> Regards,
> Poornima
>
> On Thu, Dec 20, 2018 at 2:41 PM Amar Tumballi  wrote:
>
>> Considering that the effort to reduce the thread count is already in
>> progress, should we mark this as a known issue until the thread-reduction
>> patch is merged?
>>
>> -Amar
>>
>> On Thu, Dec 20, 2018 at 2:38 PM Poornima Gurusiddaiah <
>> pguru...@redhat.com> wrote:
>>
>>> So, this failure is related to the iobuf patch [1]. Thanks to Pranith for
>>> identifying this. The patch increases the memory consumption in the brick-mux
>>> use case and causes an OOM kill, but it is not a problem with the patch
>>> itself. The only way to fix it properly is to fix issue [2]. That said, we
>>> cannot wait until that issue is fixed, so the possible workarounds are:
>>> - Reduce the volume creation count in the test case mpx-restart-crash.t
>>> (temporarily, until [2] is fixed)
>>> - Increase the resources (RAM to 4 GB?) on the regression system
>>> - Revert the patch until [2] is completely fixed
>>>
>>> Root Cause:
>>> Without the iobuf patch [1], we had a pre-allocated pool with a minimum size
>>> of 12.5 MB (which can grow); in many cases this entire size is not completely
>>> used. Hence we moved to a per-thread mem pool for iobuf as well. With this we
>>> expected the memory consumption of the processes to go down, and it did go
>>> down. After creating 20 volumes on the system, the free -m output is:
>>>
>>> With this patch:
>>>               total       used       free     shared  buff/cache   available
>>> Mem:           3789       2198        290        249        1300         968
>>> Swap:          3071          0       3071
>>>
>>> Without this patch:
>>>               total       used       free     shared  buff/cache   available
>>> Mem:           3789       2280        115        488        1393         647
>>> Swap:          3071          0       3071
>>>
>>> This output can vary based on system state, workload, etc. It is not
>>> indicative of the exact amount of memory reduction, only of the fact that
>>> memory usage is reduced.
>>>
>>> But with brick mux the scenario is different. Since patch [1] uses a
>>> per-thread mem pool for iobuf, the memory consumption due to iobuf increases
>>> as the number of threads increases. In the current brick-mux implementation,
>>> for 20 volumes (as in the mpx-restart-crash test), the number of threads is
>>> 1439. In addition, the allocated iobufs (or any other per-thread mem-pool
>>> memory) do not get freed until 30 s (the garbage-collection interval) after
>>> the free is issued (e.g. iobuf_put). As a result, the memory consumption of
>>> the process appears to increase under brick mux. Reducing the number of
>>> threads to fewer than 100 [2] will solve this issue. To prove this theory, we
>>> added a 30-second delay between each volume create in mpx-restart-crash; the
>>> memory consumption is:
>>>
>>> With this patch, after adding a 30 s delay between each volume create:
>>>               total       used       free     shared  buff/cache   available
>>> Mem:           3789       1344        947        488        1497        1606
>>> Swap:          3071          0       3071
>>>
>>> With this patch:
>>>               total       used       free     shared  buff/cache   available
>>> Mem:           3789       1710        840        235        1238        1494
>>> Swap:          3071          0       3071
>>>
>>> Without this patch:
>>>               total       used       free     shared  buff/cache   available
>>> Mem:           3789       1413        969        355        1406        1668
>>> Swap:          3071          0       3071
>>>
>>> Regards,
>>> Poornima
>>>
>>> [1] https://review.gluster.org/#/c/glusterfs/+/20362/
>>> [2] https://github.com/gluster/glusterfs/issues/475
>>>
>>> On Thu, Dec 20, 2018 at 10:28 AM Amar Tumballi 
>>> wrote:
>>>
 Since yesterday, at least 10 patches have failed regression on
 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t


 Help debugging them soon would be appreciated.


 Regards,

 Amar


 --
 Amar Tumballi (amarts)
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>
>> --
>> Amar Tumballi (amarts)
>>
>

Re: [Gluster-devel] too many failures on mpx-restart-crash.t on master branch

2018-12-20 Thread Poornima Gurusiddaiah
Yeah, but Pranith mentioned that the issue is seen even without the iobuf
patch, so the test may fail even after fixing the thread count? Hence
reducing the volume count as suggested may be a better option.

Regards,
Poornima

On Thu, Dec 20, 2018 at 2:41 PM Amar Tumballi  wrote:

> Considering that the effort to reduce the thread count is already in
> progress, should we mark this as a known issue until the thread-reduction
> patch is merged?
>
> -Amar
>
> On Thu, Dec 20, 2018 at 2:38 PM Poornima Gurusiddaiah 
> wrote:
>
>> So, this failure is related to the iobuf patch [1]. Thanks to Pranith for
>> identifying this. The patch increases the memory consumption in the brick-mux
>> use case and causes an OOM kill, but it is not a problem with the patch
>> itself. The only way to fix it properly is to fix issue [2]. That said, we
>> cannot wait until that issue is fixed, so the possible workarounds are:
>> - Reduce the volume creation count in the test case mpx-restart-crash.t
>> (temporarily, until [2] is fixed)
>> - Increase the resources (RAM to 4 GB?) on the regression system
>> - Revert the patch until [2] is completely fixed
>>
>> Root Cause:
>> Without the iobuf patch [1], we had a pre-allocated pool with a minimum size
>> of 12.5 MB (which can grow); in many cases this entire size is not completely
>> used. Hence we moved to a per-thread mem pool for iobuf as well. With this we
>> expected the memory consumption of the processes to go down, and it did go
>> down. After creating 20 volumes on the system, the free -m output is:
>>
>> With this patch:
>>               total       used       free     shared  buff/cache   available
>> Mem:           3789       2198        290        249        1300         968
>> Swap:          3071          0       3071
>>
>> Without this patch:
>>               total       used       free     shared  buff/cache   available
>> Mem:           3789       2280        115        488        1393         647
>> Swap:          3071          0       3071
>>
>> This output can vary based on system state, workload, etc. It is not
>> indicative of the exact amount of memory reduction, only of the fact that
>> memory usage is reduced.
>>
>> But with brick mux the scenario is different. Since patch [1] uses a
>> per-thread mem pool for iobuf, the memory consumption due to iobuf increases
>> as the number of threads increases. In the current brick-mux implementation,
>> for 20 volumes (as in the mpx-restart-crash test), the number of threads is
>> 1439. In addition, the allocated iobufs (or any other per-thread mem-pool
>> memory) do not get freed until 30 s (the garbage-collection interval) after
>> the free is issued (e.g. iobuf_put). As a result, the memory consumption of
>> the process appears to increase under brick mux. Reducing the number of
>> threads to fewer than 100 [2] will solve this issue. To prove this theory, we
>> added a 30-second delay between each volume create in mpx-restart-crash; the
>> memory consumption is:
>>
>> With this patch, after adding a 30 s delay between each volume create:
>>               total       used       free     shared  buff/cache   available
>> Mem:           3789       1344        947        488        1497        1606
>> Swap:          3071          0       3071
>>
>> With this patch:
>>               total       used       free     shared  buff/cache   available
>> Mem:           3789       1710        840        235        1238        1494
>> Swap:          3071          0       3071
>>
>> Without this patch:
>>               total       used       free     shared  buff/cache   available
>> Mem:           3789       1413        969        355        1406        1668
>> Swap:          3071          0       3071
>>
>> Regards,
>> Poornima
>>
>> [1] https://review.gluster.org/#/c/glusterfs/+/20362/
>> [2] https://github.com/gluster/glusterfs/issues/475
>>
>> On Thu, Dec 20, 2018 at 10:28 AM Amar Tumballi 
>> wrote:
>>
>>> Since yesterday, at least 10 patches have failed regression on
>>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>
>>>
>>> Help debugging them soon would be appreciated.
>>>
>>>
>>> Regards,
>>>
>>> Amar
>>>
>>>
>>> --
>>> Amar Tumballi (amarts)
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>
> --
> Amar Tumballi (amarts)
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] too many failures on mpx-restart-crash.t on master branch

2018-12-20 Thread Amar Tumballi
Considering that the effort to reduce the thread count is already in
progress, should we mark this as a known issue until the thread-reduction
patch is merged?

-Amar

On Thu, Dec 20, 2018 at 2:38 PM Poornima Gurusiddaiah 
wrote:

> So, this failure is related to the iobuf patch [1]. Thanks to Pranith for
> identifying this. The patch increases the memory consumption in the brick-mux
> use case and causes an OOM kill, but it is not a problem with the patch
> itself. The only way to fix it properly is to fix issue [2]. That said, we
> cannot wait until that issue is fixed, so the possible workarounds are:
> - Reduce the volume creation count in the test case mpx-restart-crash.t
> (temporarily, until [2] is fixed)
> - Increase the resources (RAM to 4 GB?) on the regression system
> - Revert the patch until [2] is completely fixed
>
> Root Cause:
> Without the iobuf patch [1], we had a pre-allocated pool with a minimum size
> of 12.5 MB (which can grow); in many cases this entire size is not completely
> used. Hence we moved to a per-thread mem pool for iobuf as well. With this we
> expected the memory consumption of the processes to go down, and it did go
> down. After creating 20 volumes on the system, the free -m output is:
>
> With this patch:
>               total       used       free     shared  buff/cache   available
> Mem:           3789       2198        290        249        1300         968
> Swap:          3071          0       3071
>
> Without this patch:
>               total       used       free     shared  buff/cache   available
> Mem:           3789       2280        115        488        1393         647
> Swap:          3071          0       3071
>
> This output can vary based on system state, workload, etc. It is not
> indicative of the exact amount of memory reduction, only of the fact that
> memory usage is reduced.
>
> But with brick mux the scenario is different. Since patch [1] uses a
> per-thread mem pool for iobuf, the memory consumption due to iobuf increases
> as the number of threads increases. In the current brick-mux implementation,
> for 20 volumes (as in the mpx-restart-crash test), the number of threads is
> 1439. In addition, the allocated iobufs (or any other per-thread mem-pool
> memory) do not get freed until 30 s (the garbage-collection interval) after
> the free is issued (e.g. iobuf_put). As a result, the memory consumption of
> the process appears to increase under brick mux. Reducing the number of
> threads to fewer than 100 [2] will solve this issue. To prove this theory, we
> added a 30-second delay between each volume create in mpx-restart-crash; the
> memory consumption is:
>
> With this patch, after adding a 30 s delay between each volume create:
>               total       used       free     shared  buff/cache   available
> Mem:           3789       1344        947        488        1497        1606
> Swap:          3071          0       3071
>
> With this patch:
>               total       used       free     shared  buff/cache   available
> Mem:           3789       1710        840        235        1238        1494
> Swap:          3071          0       3071
>
> Without this patch:
>               total       used       free     shared  buff/cache   available
> Mem:           3789       1413        969        355        1406        1668
> Swap:          3071          0       3071
>
> Regards,
> Poornima
>
> [1] https://review.gluster.org/#/c/glusterfs/+/20362/
> [2] https://github.com/gluster/glusterfs/issues/475
>
> On Thu, Dec 20, 2018 at 10:28 AM Amar Tumballi 
> wrote:
>
>> Since yesterday, at least 10 patches have failed regression on
>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>
>>
>> Help debugging them soon would be appreciated.
>>
>>
>> Regards,
>>
>> Amar
>>
>>
>> --
>> Amar Tumballi (amarts)
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>

-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] too many failures on mpx-restart-crash.t on master branch

2018-12-20 Thread Poornima Gurusiddaiah
So, this failure is related to the iobuf patch [1]. Thanks to Pranith for
identifying this. The patch increases the memory consumption in the brick-mux
use case and causes an OOM kill, but it is not a problem with the patch
itself. The only way to fix it properly is to fix issue [2]. That said, we
cannot wait until that issue is fixed, so the possible workarounds are:
- Reduce the volume creation count in the test case mpx-restart-crash.t
(temporarily, until [2] is fixed)
- Increase the resources (RAM to 4 GB?) on the regression system
- Revert the patch until [2] is completely fixed

Root Cause:
Without the iobuf patch [1], we had a pre-allocated pool with a minimum size
of 12.5 MB (which can grow); in many cases this entire size is not completely
used. Hence we moved to a per-thread mem pool for iobuf as well. With this we
expected the memory consumption of the processes to go down, and it did go
down. After creating 20 volumes on the system, the free -m output is:

With this patch:
              total       used       free     shared  buff/cache   available
Mem:           3789       2198        290        249        1300         968
Swap:          3071          0       3071

Without this patch:
              total       used       free     shared  buff/cache   available
Mem:           3789       2280        115        488        1393         647
Swap:          3071          0       3071

This output can vary based on system state, workload, etc. It is not
indicative of the exact amount of memory reduction, only of the fact that
memory usage is reduced.
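To make the contrast above concrete, here is a minimal C sketch (hypothetical
names, not the actual glusterfs iobuf/mem-pool code) of the two allocation
strategies: the old pool reserves its minimum arena up front whether or not it
is ever used, while per-thread allocation only grows with demand.

/* Illustrative sketch only -- not glusterfs source. */
#include <stdlib.h>
#include <string.h>

#define OLD_POOL_MIN_BYTES ((size_t)(12.5 * 1024 * 1024)) /* ~12.5 MB minimum */

/* Old approach: one global arena carved out at startup, counted against the
 * process whether or not its buffers are ever handed out. */
struct prealloc_pool {
    unsigned char *arena;
    size_t         size;
    size_t         used;
};

static int prealloc_pool_init(struct prealloc_pool *p)
{
    p->arena = malloc(OLD_POOL_MIN_BYTES);
    if (!p->arena)
        return -1;
    memset(p->arena, 0, OLD_POOL_MIN_BYTES); /* touched, so it is resident */
    p->size = OLD_POOL_MIN_BYTES;
    p->used = 0;
    return 0;
}

/* New approach: nothing reserved up front; buffers are allocated (and cached
 * per thread) only when requested, so an idle process stays small -- which is
 * what the lower "used" numbers above show. */
static void *per_thread_alloc(size_t size)
{
    return malloc(size);
}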

But with brick mux the scenario is different. Since patch [1] uses a
per-thread mem pool for iobuf, the memory consumption due to iobuf increases
as the number of threads increases. In the current brick-mux implementation,
for 20 volumes (as in the mpx-restart-crash test), the number of threads is
1439. In addition, the allocated iobufs (or any other per-thread mem-pool
memory) do not get freed until 30 s (the garbage-collection interval) after
the free is issued (e.g. iobuf_put). As a result, the memory consumption of
the process appears to increase under brick mux. Reducing the number of
threads to fewer than 100 [2] will solve this issue. To prove this theory, we
added a 30-second delay between each volume create in mpx-restart-crash; the
memory consumption is:

With this patch, after adding a 30 s delay between each volume create:
              total       used       free     shared  buff/cache   available
Mem:           3789       1344        947        488        1497        1606
Swap:          3071          0       3071

With this patch:
              total       used       free     shared  buff/cache   available
Mem:           3789       1710        840        235        1238        1494
Swap:          3071          0       3071

Without this patch:
              total       used       free     shared  buff/cache   available
Mem:           3789       1413        969        355        1406        1668
Swap:          3071          0       3071
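
To illustrate the mechanism described above, here is a minimal C sketch
(hypothetical code, not the actual glusterfs mem-pool/iobuf implementation) of
a per-thread free list with delayed reclamation: a "freed" buffer is parked in
its thread's cache and only returned to the allocator once a periodic sweep
finds it idle for 30 s.

/* Illustrative only -- not glusterfs source.  Models the behaviour described
 * above: frees park buffers in a per-thread cache, and a periodic sweep
 * releases only the entries that have been idle longer than SWEEP_SECS.
 * Until then the memory still counts against the process, and the effect
 * scales with the number of threads. */
#include <stdlib.h>
#include <time.h>

#define SWEEP_SECS 30 /* retention before a cached buffer is really released */

struct cached_buf {
    struct cached_buf *next;
    time_t             freed_at; /* when the owning thread released it */
    size_t             size;
    void              *mem;
};

/* One cache per thread: no locking on the hot path.  (A real implementation
 * would also register each cache in a global list so a sweeper can visit it.) */
static __thread struct cached_buf *thread_cache;

void *pool_get(size_t size)
{
    /* Reuse a cached buffer of the same size if this thread has one. */
    for (struct cached_buf **pp = &thread_cache; *pp; pp = &(*pp)->next) {
        if ((*pp)->size == size) {
            struct cached_buf *hit = *pp;
            void *mem = hit->mem;
            *pp = hit->next;
            free(hit);
            return mem;
        }
    }
    return malloc(size);
}

void pool_put(void *mem, size_t size)
{
    /* "Freeing" only parks the buffer in the calling thread's cache. */
    struct cached_buf *cb = malloc(sizeof(*cb));
    if (!cb) {
        free(mem);
        return;
    }
    cb->mem = mem;
    cb->size = size;
    cb->freed_at = time(NULL);
    cb->next = thread_cache;
    thread_cache = cb;
}

/* Sweep the calling thread's cache: only entries idle for >= SWEEP_SECS are
 * actually returned to the allocator.  Run periodically per thread. */
void pool_sweep(void)
{
    time_t now = time(NULL);
    struct cached_buf **pp = &thread_cache;
    while (*pp) {
        if (now - (*pp)->freed_at >= SWEEP_SECS) {
            struct cached_buf *old = *pp;
            *pp = old->next;
            free(old->mem);
            free(old);
        } else {
            pp = &(*pp)->next;
        }
    }
}

With roughly 1439 threads, even a handful of cached-but-idle buffers per
thread adds up, which is consistent with the numbers above: giving the test a
30-second pause per volume create leaves time for the sweep to run, and the
reported usage drops.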

Regards,
Poornima

[1] https://review.gluster.org/#/c/glusterfs/+/20362/
[2] https://github.com/gluster/glusterfs/issues/475

On Thu, Dec 20, 2018 at 10:28 AM Amar Tumballi  wrote:

> Since yesterday, at least 10 patches have failed regression on
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>
>
> Help debugging them soon would be appreciated.
>
>
> Regards,
>
> Amar
>
>
> --
> Amar Tumballi (amarts)
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] too many failures on mpx-restart-crash.t on master branch

2018-12-19 Thread Amar Tumballi
Since yesterday, at least 10 patches have failed regression on
./tests/bugs/core/bug-1432542-mpx-restart-crash.t


Help debugging them soon would be appreciated.


Regards,

Amar


-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel