Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread huting3

I have tried this, but the size of the file is huge and the mail bounced
back. I uploaded the file to Google Drive instead; the link is:
https://drive.google.com/file/d/1ZlttTzt4E56Qtk9j7b4I9GkZC2W3mJgp/view?usp=sharing

huting3
huti...@corp.netease.com

On 08/9/2018 13:48,Nithya Balachandran wrote: 


Is it possible for you to send us the statedump file? It will be easier
than going back and forth over emails.

Thanks,
Nithya

On 9 August 2018 at 09:25, huting3 wrote:

Yes, I got the dump file and found there are many huge num_allocs just
like following:

I found the memusage of 4 variable types is extremely huge.

[protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
size=47202352
num_allocs=2030212
max_size=47203074
max_num_allocs=2030235
total_allocs=26892201

[protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
size=24362448
num_allocs=2030204
max_size=24367560
max_num_allocs=2030226
total_allocs=17830860

[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=2497947552
num_allocs=4578229
max_size=2459135680
max_num_allocs=7123206
total_allocs=41635232

[mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
size=4038730976
num_allocs=1
max_size=4294962264
max_num_allocs=37
total_allocs=150049981

huting3
huti...@corp.netease.com



On 08/9/2018 11:36,Raghavendra Gowdappa wrote: 


On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:







Hi expert:

I meet a problem when I use glusterfs. The problem is that the fuse
client consumes huge memory when writing a lot of files (over a million)
to the gluster volume, eventually getting killed by the OS OOM killer.
The memory the fuse process consumes can grow up to 100G! I wonder
whether there are memory leaks in the gluster fuse process, or some
other cause.

Can you get statedump of fuse process consuming huge memory?

My gluster version is 3.13.2, the gluster volume info is listed as
following:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
Status: Started
Snapshot Count: 0
Number of Bricks: 19 x 3 = 57
Transport-type: tcp

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread Nithya Balachandran
Is it possible for you to send us the statedump file? It will be easier
than going back and forth over emails.

Thanks,
Nithya

On 9 August 2018 at 09:25, huting3  wrote:

> Yes, I got the dump file and found there are many huge num_allocs just
> like following:
>
> I found the memusage of 4 variable types is extremely huge.
>
>  [protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
> size=47202352
> num_allocs=2030212
> max_size=47203074
> max_num_allocs=2030235
> total_allocs=26892201
>
> [protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
> size=24362448
> num_allocs=2030204
> max_size=24367560
> max_num_allocs=2030226
> total_allocs=17830860
>
> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
> size=2497947552
> num_allocs=4578229
> max_size=2459135680
> max_num_allocs=7123206
> total_allocs=41635232
>
> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
> size=4038730976
> num_allocs=1
> max_size=4294962264
> max_num_allocs=37
> total_allocs=150049981
> 
>
>
>
> huting3
> huti...@corp.netease.com
>
> 
>
> On 08/9/2018 11:36,Raghavendra Gowdappa
>  wrote:
>
>
>
> On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:
>
>> Hi expert:
>>
>> I meet a problem when I use glusterfs. The problem is that the fuse
>> client consumes huge memory when writing a lot of files (over a million)
>> to the gluster volume, eventually getting killed by the OS OOM killer.
>> The memory the fuse process consumes can grow up to 100G! I wonder
>> whether there are memory leaks in the gluster fuse process, or some
>> other cause.
>>
>
> Can you get statedump of fuse process consuming huge memory?
>
>
>> My gluster version is 3.13.2, the gluster volume info is listed as
>> following:
>>
>> Volume Name: gv0
>> Type: Distributed-Replicate
>> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 19 x 3 = 57
>> Transport-type: tcp
>> Bricks:
>> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick20: dl33.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick21: dl34.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick22: dl23.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick23: dl24.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick24: dl25.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick25: dl26.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick26: dl27.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick27: dl28.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick28: dl29.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick29: dl30.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick30: dl31.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick31: dl32.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick32: dl33.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick33: dl34.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick34: dl23.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick35: dl24.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick36: dl25.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick37: dl26.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick38: dl27.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick39: dl28.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick40: dl29.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick41: dl30.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick42: dl31.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick43: dl32.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick44: 

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread huting3

Em, the data set is complicated. There are many big files as well as small
files. There is about 50T of data on the gluster servers, so I do not know
exactly how many files are in the dataset. Can the inode cache consume such
huge memory? How can I limit the inode cache?

ps:
$ grep itable glusterdump.109182.dump.1533730324 | grep lru | wc -l
191728

When I dumped the process info, the fuse process consumed about 30G memory.

huting3
huti...@corp.netease.com



On 08/9/2018 13:13,Raghavendra Gowdappa wrote: 


On Thu, Aug 9, 2018 at 10:36 AM, huting3  wrote:







grep count will output nothing, so I grep size, the results are:

$ grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
xlator.mount.fuse.itable.lru_size=191726

Kernel is holding too many inodes in its cache. What's the data set like?
Do you have too many directories? How many files do you have?

$ grep itable glusterdump.109182.dump.1533730324 | grep active | grep size
xlator.mount.fuse.itable.active_size=17

huting3
huti...@corp.netease.com



On 08/9/2018 12:36,Raghavendra Gowdappa wrote: 


Can you get the output of following cmds?

# grep itable  | grep lru | grep count

# grep itable  | grep active | grep count

On Thu, Aug 9, 2018 at 9:25 AM, huting3 wrote:







Yes, I got the dump file and found there are many huge num_allocs just
like following:

I found the memusage of 4 variable types is extremely huge.

[protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
size=47202352
num_allocs=2030212
max_size=47203074
max_num_allocs=2030235
total_allocs=26892201

[protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
size=24362448
num_allocs=2030204
max_size=24367560
max_num_allocs=2030226
total_allocs=17830860

[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=2497947552
num_allocs=4578229
max_size=2459135680
max_num_allocs=7123206
total_allocs=41635232

[mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
size=4038730976
num_allocs=1
max_size=4294962264
max_num_allocs=37
total_allocs=150049981

huting3
huti...@corp.netease.com



On 08/9/2018 11:36,Raghavendra Gowdappa wrote: 


On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:







Hi expert:

I meet a problem when I use glusterfs. The problem is that the fuse
client consumes huge memory when writing a lot of files (over a million)
to the gluster volume, eventually getting killed by the OS OOM killer.
The memory the fuse process consumes can grow up to 100G! I wonder
whether there are memory leaks in the gluster fuse process, or some
other cause.

Can you get statedump of fuse process consuming huge memory?

My gluster version is 3.13.2, the gluster volume info is listed as
following:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
Status: Started
Snapshot Count: 0
Number of Bricks: 19 x 3 = 57
Transport-type: tcp

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
On Thu, Aug 9, 2018 at 10:43 AM, Raghavendra Gowdappa 
wrote:

>
>
> On Thu, Aug 9, 2018 at 10:36 AM, huting3  wrote:
>
>> grep count will output nothing, so I grep size, the results are:
>>
>> $ grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
>> xlator.mount.fuse.itable.lru_size=191726
>>
>
> Kernel is holding too many inodes in its cache. What's the data set like?
> Do you have too many directories? How many files do you have?
>

Just to be sure, can you give the output of following cmd too:

# grep itable  | grep lru | wc -l


>
>> $ grep itable glusterdump.109182.dump.1533730324 | grep active | grep
>> size
>> xlator.mount.fuse.itable.active_size=17
>>
>>
>> huting3
>> huti...@corp.netease.com
>>
>> 
>>
>> On 08/9/2018 12:36,Raghavendra Gowdappa
>>  wrote:
>>
>> Can you get the output of following cmds?
>>
>> # grep itable  | grep lru | grep count
>>
>> # grep itable  | grep active | grep count
>>
>> On Thu, Aug 9, 2018 at 9:25 AM, huting3  wrote:
>>
>>> Yes, I got the dump file and found there are many huge num_allocs just
>>> like following:
>>>
>>> I found the memusage of 4 variable types is extremely huge.
>>>
>>>  [protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
>>> size=47202352
>>> num_allocs=2030212
>>> max_size=47203074
>>> max_num_allocs=2030235
>>> total_allocs=26892201
>>>
>>> [protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
>>> size=24362448
>>> num_allocs=2030204
>>> max_size=24367560
>>> max_num_allocs=2030226
>>> total_allocs=17830860
>>>
>>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>>> size=2497947552
>>> num_allocs=4578229
>>> max_size=2459135680
>>> max_num_allocs=7123206
>>> total_allocs=41635232
>>>
>>> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
>>> size=4038730976
>>> num_allocs=1
>>> max_size=4294962264
>>> max_num_allocs=37
>>> total_allocs=150049981
>>> 
>>>
>>>
>>>
>>> huting3
>>> huti...@corp.netease.com
>>>
>>> 
>>>
>>> On 08/9/2018 11:36,Raghavendra Gowdappa
>>>  wrote:
>>>
>>>
>>>
>>> On Thu, Aug 9, 2018 at 8:55 AM, huting3 
>>> wrote:
>>>
 Hi expert:

 I meet a problem when I use glusterfs. The problem is that the fuse
 client consumes huge memory when writing a lot of files (over a million)
 to the gluster volume, eventually getting killed by the OS OOM killer.
 The memory the fuse process consumes can grow up to 100G! I wonder
 whether there are memory leaks in the gluster fuse process, or some other
 cause.

>>>
>>> Can you get statedump of fuse process consuming huge memory?
>>>
>>>
 My gluster version is 3.13.2, the gluster volume info is listed as
 following:

 Volume Name: gv0
 Type: Distributed-Replicate
 Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
 Status: Started
 Snapshot Count: 0
 Number of Bricks: 19 x 3 = 57
 Transport-type: tcp
 Bricks:
 Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
 Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
 Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
 Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
 Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
 Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
 Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
 Brick20: 

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
On Thu, Aug 9, 2018 at 10:36 AM, huting3  wrote:

> grep count will output nothing, so I grep size, the results are:
>
> $ grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
> xlator.mount.fuse.itable.lru_size=191726
>

Kernel is holding too many inodes in its cache. What's the data set like?
Do you have too many directories? How many files do you have?
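
A hedged sketch of how that can be checked and reined in, in case it helps
(the dump file name is the one from the grep above; the drop_caches step is
a generic kernel knob, the lru-limit fuse mount option is only there in
newer clients and is not something verified on 3.13.2, and the 50000 value
is only illustrative):

# inodes the client keeps only because the kernel still remembers them (lru)
# versus inodes actually in use (active)
grep itable glusterdump.109182.dump.1533730324 | grep lru_size
grep itable glusterdump.109182.dump.1533730324 | grep active_size

# ask the kernel to drop its dentry/inode caches; the fuse client can then
# purge the matching itable entries
echo 2 > /proc/sys/vm/drop_caches

# newer clients can also cap the client-side inode table at mount time
mount -t glusterfs -o lru-limit=50000 dl20.dg.163.org:/gv0 /mnt/gv0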


> $ grep itable glusterdump.109182.dump.1533730324 | grep active | grep size
> xlator.mount.fuse.itable.active_size=17
>
>
> huting3
> huti...@corp.netease.com
>
> 
>
> On 08/9/2018 12:36,Raghavendra Gowdappa
>  wrote:
>
> Can you get the output of following cmds?
>
> # grep itable  | grep lru | grep count
>
> # grep itable  | grep active | grep count
>
> On Thu, Aug 9, 2018 at 9:25 AM, huting3  wrote:
>
>> Yes, I got the dump file and found there are many huge num_allocs just
>> like following:
>>
>> I found the memusage of 4 variable types is extremely huge.
>>
>>  [protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
>> size=47202352
>> num_allocs=2030212
>> max_size=47203074
>> max_num_allocs=2030235
>> total_allocs=26892201
>>
>> [protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
>> size=24362448
>> num_allocs=2030204
>> max_size=24367560
>> max_num_allocs=2030226
>> total_allocs=17830860
>>
>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>> size=2497947552
>> num_allocs=4578229
>> max_size=2459135680
>> max_num_allocs=7123206
>> total_allocs=41635232
>>
>> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
>> size=4038730976
>> num_allocs=1
>> max_size=4294962264
>> max_num_allocs=37
>> total_allocs=150049981
>> 
>>
>>
>>
>> huting3
>> huti...@corp.netease.com
>>
>> 
>>
>> On 08/9/2018 11:36,Raghavendra Gowdappa
>>  wrote:
>>
>>
>>
>> On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:
>>
>>> Hi expert:
>>>
>>> I meet a problem when I use glusterfs. The problem is that the fuse
>>> client consumes huge memory when writing a lot of files (over a million)
>>> to the gluster volume, eventually getting killed by the OS OOM killer.
>>> The memory the fuse process consumes can grow up to 100G! I wonder
>>> whether there are memory leaks in the gluster fuse process, or some
>>> other cause.
>>>
>>
>> Can you get statedump of fuse process consuming huge memory?
>>
>>
>>> My gluster version is 3.13.2, the gluster volume info is listed as
>>> following:
>>>
>>> Volume Name: gv0
>>> Type: Distributed-Replicate
>>> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 19 x 3 = 57
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
>>> Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
>>> Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
>>> Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick20: dl33.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick21: dl34.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick22: dl23.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick23: dl24.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick24: dl25.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick25: 

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread huting3

grep count will output nothing, so I grep size, the results are:

$ grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
xlator.mount.fuse.itable.lru_size=191726

$ grep itable glusterdump.109182.dump.1533730324 | grep active | grep size
xlator.mount.fuse.itable.active_size=17

huting3
huti...@corp.netease.com



On 08/9/2018 12:36,Raghavendra Gowdappa wrote: 


Can you get the output of following cmds?

# grep itable  | grep lru | grep count

# grep itable  | grep active | grep count

On Thu, Aug 9, 2018 at 9:25 AM, huting3 wrote:







Yes, I got the dump file and found there are many huge num_allocs just
like following:

I found the memusage of 4 variable types is extremely huge.

[protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
size=47202352
num_allocs=2030212
max_size=47203074
max_num_allocs=2030235
total_allocs=26892201

[protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
size=24362448
num_allocs=2030204
max_size=24367560
max_num_allocs=2030226
total_allocs=17830860

[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=2497947552
num_allocs=4578229
max_size=2459135680
max_num_allocs=7123206
total_allocs=41635232

[mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
size=4038730976
num_allocs=1
max_size=4294962264
max_num_allocs=37
total_allocs=150049981

huting3
huti...@corp.netease.com



On 08/9/2018 11:36,Raghavendra Gowdappa wrote: 


On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:







Hi expert:

I meet a problem when I use glusterfs. The problem is that the fuse
client consumes huge memory when writing a lot of files (over a million)
to the gluster volume, eventually getting killed by the OS OOM killer.
The memory the fuse process consumes can grow up to 100G! I wonder
whether there are memory leaks in the gluster fuse process, or some
other cause.

Can you get statedump of fuse process consuming huge memory?

My gluster version is 3.13.2, the gluster volume info is listed as
following:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
Status: Started
Snapshot Count: 0
Number of Bricks: 19 x 3 = 57
Transport-type: tcp

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
Can you get the output of following cmds?

# grep itable  | grep lru | grep count

# grep itable  | grep active | grep count
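
For example, with the dump file huting3 reports further down the thread
(the count lines there turn out to be named lru_size/active_size, so size
and wc -l are what actually return data):

# grep itable glusterdump.109182.dump.1533730324 | grep lru | wc -l
# grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
# grep itable glusterdump.109182.dump.1533730324 | grep active | grep size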

On Thu, Aug 9, 2018 at 9:25 AM, huting3  wrote:

> Yes, I got the dump file and found there are many huge num_allocs just
> like following:
>
> I found the memusage of 4 variable types is extremely huge.
>
>  [protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
> size=47202352
> num_allocs=2030212
> max_size=47203074
> max_num_allocs=2030235
> total_allocs=26892201
>
> [protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
> size=24362448
> num_allocs=2030204
> max_size=24367560
> max_num_allocs=2030226
> total_allocs=17830860
>
> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
> size=2497947552
> num_allocs=4578229
> max_size=2459135680
> max_num_allocs=7123206
> total_allocs=41635232
>
> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
> size=4038730976
> num_allocs=1
> max_size=4294962264
> max_num_allocs=37
> total_allocs=150049981
> 
>
>
>
> huting3
> huti...@corp.netease.com
>
> 
>
> On 08/9/2018 11:36,Raghavendra Gowdappa
>  wrote:
>
>
>
> On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:
>
>> Hi expert:
>>
>> I meet a problem when I use glusterfs. The problem is that the fuse
>> client consumes huge memory when writing a lot of files (over a million)
>> to the gluster volume, eventually getting killed by the OS OOM killer.
>> The memory the fuse process consumes can grow up to 100G! I wonder
>> whether there are memory leaks in the gluster fuse process, or some
>> other cause.
>>
>
> Can you get statedump of fuse process consuming huge memory?
>
>
>> My gluster version is 3.13.2, the gluster volume info is listed as
>> following:
>>
>> Volume Name: gv0
>> Type: Distributed-Replicate
>> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 19 x 3 = 57
>> Transport-type: tcp
>> Bricks:
>> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick20: dl33.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick21: dl34.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick22: dl23.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick23: dl24.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick24: dl25.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick25: dl26.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick26: dl27.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick27: dl28.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick28: dl29.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick29: dl30.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick30: dl31.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick31: dl32.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick32: dl33.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick33: dl34.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick34: dl23.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick35: dl24.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick36: dl25.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick37: dl26.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick38: dl27.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick39: dl28.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick40: dl29.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick41: dl30.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick42: dl31.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick43: dl32.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick44: 

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread huting3

Yes, I got the dump file and found there are many huge num_allocs just
like following:

I found the memusage of 4 variable types is extremely huge.

[protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
size=47202352
num_allocs=2030212
max_size=47203074
max_num_allocs=2030235
total_allocs=26892201

[protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
size=24362448
num_allocs=2030204
max_size=24367560
max_num_allocs=2030226
total_allocs=17830860

[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=2497947552
num_allocs=4578229
max_size=2459135680
max_num_allocs=7123206
total_allocs=41635232

[mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
size=4038730976
num_allocs=1
max_size=4294962264
max_num_allocs=37
total_allocs=150049981
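
A rough way to rank the biggest consumers straight from such a dump (a
sketch that relies only on the usage-type/size= lines shown above):

$ awk '/usage-type/ {sec=$0} /^size=/ {sub(/^size=/,""); print $0, sec}' \
      glusterdump.109182.dump.1533730324 | sort -rn | head -5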

huting3
huti...@corp.netease.com



On 08/9/2018 11:36,Raghavendra Gowdappa wrote: 


On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:







Hi expert:

I meet a problem when I use glusterfs. The problem is that the fuse
client consumes huge memory when writing a lot of files (over a million)
to the gluster volume, eventually getting killed by the OS OOM killer.
The memory the fuse process consumes can grow up to 100G! I wonder
whether there are memory leaks in the gluster fuse process, or some
other cause.

Can you get statedump of fuse process consuming huge memory?

My gluster version is 3.13.2, the gluster volume info is listed as
following:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
Status: Started
Snapshot Count: 0
Number of Bricks: 19 x 3 = 57
Transport-type: tcp

Re: [Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:

> Hi expert:
>
> I meet a problem when I use glusterfs. The problem is that the fuse client
> consumes huge memory when writing a lot of files (over a million) to the
> gluster volume, eventually getting killed by the OS OOM killer. The memory
> the fuse process consumes can grow up to 100G! I wonder whether there are
> memory leaks in the gluster fuse process, or some other cause.
>

Can you get statedump of fuse process consuming huge memory?
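
In case it helps, the usual way to grab one from the fuse client is to send
it SIGUSR1; the dump should show up under the statedump directory
(/var/run/gluster by default) as glusterdump.<pid>.dump.<timestamp>:

# find the glusterfs client process serving the gv0 mount
ps ax | grep '[g]lusterfs.*gv0'

# trigger the statedump (replace <pid> with the pid found above)
kill -USR1 <pid>

ls /var/run/gluster/glusterdump.*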


> My gluster version is 3.13.2, the gluster volume info is listed as
> following:
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 19 x 3 = 57
> Transport-type: tcp
> Bricks:
> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick20: dl33.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick21: dl34.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick22: dl23.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick23: dl24.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick24: dl25.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick25: dl26.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick26: dl27.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick27: dl28.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick28: dl29.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick29: dl30.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick30: dl31.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick31: dl32.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick32: dl33.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick33: dl34.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick34: dl23.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick35: dl24.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick36: dl25.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick37: dl26.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick38: dl27.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick39: dl28.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick40: dl29.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick41: dl30.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick42: dl31.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick43: dl32.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick44: dl33.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick45: dl34.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick46: dl0.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick47: dl1.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick48: dl2.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick49: dl3.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick50: dl5.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick51: dl6.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick52: dl9.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick53: dl10.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick54: dl11.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick55: dl12.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick56: dl13.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick57: dl14.dg.163.org:/glusterfs_brick/brick1/gv0
> Options Reconfigured:
> performance.cache-size: 10GB
> performance.parallel-readdir: on
> performance.readdir-ahead: on
> network.inode-lru-limit: 20
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> features.inode-quota: off
> features.quota: off
> cluster.quorum-reads: on
> cluster.quorum-count: 2
> cluster.quorum-type: fixed
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> cluster.server-quorum-ratio: 51%
>
>
> huting3
> huti...@corp.netease.com
>
> 
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> 

[Gluster-devel] gluster fuse consumes huge memory

2018-08-08 Thread huting3

Hi expert:

I meet a problem when I use glusterfs. The problem is that the fuse
client consumes huge memory when writing a lot of files (over a million)
to the gluster volume, eventually getting killed by the OS OOM killer.
The memory the fuse process consumes can grow up to 100G! I wonder
whether there are memory leaks in the gluster fuse process, or some
other cause.

My gluster version is 3.13.2, the gluster volume info is listed as
following:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
Status: Started
Snapshot Count: 0
Number of Bricks: 19 x 3 = 57
Transport-type: tcp
Bricks:
Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
Brick20: dl33.dg.163.org:/glusterfs_brick/brick1/gv0
Brick21: dl34.dg.163.org:/glusterfs_brick/brick1/gv0
Brick22: dl23.dg.163.org:/glusterfs_brick/brick2/gv0
Brick23: dl24.dg.163.org:/glusterfs_brick/brick2/gv0
Brick24: dl25.dg.163.org:/glusterfs_brick/brick2/gv0
Brick25: dl26.dg.163.org:/glusterfs_brick/brick2/gv0
Brick26: dl27.dg.163.org:/glusterfs_brick/brick2/gv0
Brick27: dl28.dg.163.org:/glusterfs_brick/brick2/gv0
Brick28: dl29.dg.163.org:/glusterfs_brick/brick2/gv0
Brick29: dl30.dg.163.org:/glusterfs_brick/brick2/gv0
Brick30: dl31.dg.163.org:/glusterfs_brick/brick2/gv0
Brick31: dl32.dg.163.org:/glusterfs_brick/brick2/gv0
Brick32: dl33.dg.163.org:/glusterfs_brick/brick2/gv0
Brick33: dl34.dg.163.org:/glusterfs_brick/brick2/gv0
Brick34: dl23.dg.163.org:/glusterfs_brick/brick3/gv0
Brick35: dl24.dg.163.org:/glusterfs_brick/brick3/gv0
Brick36: dl25.dg.163.org:/glusterfs_brick/brick3/gv0
Brick37: dl26.dg.163.org:/glusterfs_brick/brick3/gv0
Brick38: dl27.dg.163.org:/glusterfs_brick/brick3/gv0
Brick39: dl28.dg.163.org:/glusterfs_brick/brick3/gv0
Brick40: dl29.dg.163.org:/glusterfs_brick/brick3/gv0
Brick41: dl30.dg.163.org:/glusterfs_brick/brick3/gv0
Brick42: dl31.dg.163.org:/glusterfs_brick/brick3/gv0
Brick43: dl32.dg.163.org:/glusterfs_brick/brick3/gv0
Brick44: dl33.dg.163.org:/glusterfs_brick/brick3/gv0
Brick45: dl34.dg.163.org:/glusterfs_brick/brick3/gv0
Brick46: dl0.dg.163.org:/glusterfs_brick/brick1/gv0
Brick47: dl1.dg.163.org:/glusterfs_brick/brick1/gv0
Brick48: dl2.dg.163.org:/glusterfs_brick/brick1/gv0
Brick49: dl3.dg.163.org:/glusterfs_brick/brick1/gv0
Brick50: dl5.dg.163.org:/glusterfs_brick/brick1/gv0
Brick51: dl6.dg.163.org:/glusterfs_brick/brick1/gv0
Brick52: dl9.dg.163.org:/glusterfs_brick/brick1/gv0
Brick53: dl10.dg.163.org:/glusterfs_brick/brick1/gv0
Brick54: dl11.dg.163.org:/glusterfs_brick/brick1/gv0
Brick55: dl12.dg.163.org:/glusterfs_brick/brick1/gv0
Brick56: dl13.dg.163.org:/glusterfs_brick/brick1/gv0
Brick57: dl14.dg.163.org:/glusterfs_brick/brick1/gv0
Options Reconfigured:
performance.cache-size: 10GB
performance.parallel-readdir: on
performance.readdir-ahead: on
network.inode-lru-limit: 20
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.inode-quota: off
features.quota: off
cluster.quorum-reads: on
cluster.quorum-count: 2
cluster.quorum-type: fixed
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%
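
For what it is worth, the growth (and the eventual OOM kill) can be watched
from the outside while such a write workload runs; the volume name below is
the one from this mail, everything else is only an example:

# resident memory of the fuse client for this volume, sampled every minute
watch -n 60 "ps -o pid,rss,vsz,cmd -C glusterfs | grep gv0"

# after the client disappears, confirm it was the OOM killer
dmesg | grep -i -e oom -e 'killed process'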

huting3
huti...@corp.netease.com





___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] tests/bugs/glusterd/quorum-validation.t ==> glusterfsd core

2018-08-08 Thread Atin Mukherjee
See https://build.gluster.org/job/line-coverage/435/consoleFull . The core
file can be extracted from [1].

The core seems to be coming from the changelog xlator. Please note that
line-cov doesn't run with brick mux enabled.

[1]
http://builder100.cloud.gluster.org/archived_builds/build-install-line-coverage-435.tar.bz2
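
If someone wants to poke at it locally, roughly the following should work
(the paths inside the tarball are a guess, adjust to whatever the archive
actually contains):

$ wget http://builder100.cloud.gluster.org/archived_builds/build-install-line-coverage-435.tar.bz2
$ tar -xjf build-install-line-coverage-435.tar.bz2
$ gdb build/install/sbin/glusterfsd <path-to-extracted-core> -ex 'thread apply all bt' -ex quit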
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)

2018-08-08 Thread Atin Mukherjee
On Thu, 9 Aug 2018 at 06:34, Shyam Ranganathan  wrote:

> Today's patch set 7 [1], included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> gerrit now).
>
> New failures: (added to the spreadsheet)
> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
> ./tests/bugs/quick-read/bug-846240.t
>
> Older tests that had not recurred, but failed today: (moved up in the
> spreadsheet)
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>
> Other issues:
> Test ./tests/basic/ec/ec-5-2.t core dumped again
> Few geo-rep failures, Kotresh should have more logs to look at with
> these runs
> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again


>
> Atin/Amar, we may need to merge some of the patches that have proven to
> be holding up and fixing issues today, so that we do not leave
> everything to the last. Check and move them along or lmk.


Ack. I’ll be merging those patches.


>
> Shyam
>
> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
> [2] Runs against patch set 7 and its status (incomplete as some runs
> have not completed):
>
> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
> (also updated in the spreadsheet)
>
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > Deserves a new beginning, threads on the other mail have gone deep
> enough.
> >
> > NOTE: (5) below needs your attention, rest is just process and data on
> > how to find failures.
> >
> > 1) We are running the tests using the patch [2].
> >
> > 2) Run details are extracted into a separate sheet in [3] named "Run
> > Failures" use a search to find a failing test and the corresponding run
> > that it failed in.
> >
> > 3) Patches that are fixing issues can be found here [1], if you think
> > you have a patch out there, that is not in this list, shout out.
> >
> > 4) If you own up a test case failure, update the spreadsheet [3] with
> > your name against the test, and also update other details as needed (as
> > comments, as edit rights to the sheet are restricted).
> >
> > 5) Current test failures
> > We still have the following tests failing and some without any RCA or
> > attention, (If something is incorrect, write back).
> >
> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> > attention)
> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> > (Atin)
> > ./tests/bugs/ec/bug-1236065.t (Ashish)
> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> > ./tests/basic/ec/ec-1468261.t (needs attention)
> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
> >
> > Here are some newer failures, but mostly one-off failures except cores
> > in ec-5-2.t. All of the following need attention as these are new.
> >
> > ./tests/00-geo-rep/00-georep-verify-setup.t
> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> > ./tests/basic/stats-dump.t
> > ./tests/bugs/bug-1110262.t
> >
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> > ./tests/basic/ec/ec-data-heal.t
> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> >
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> > ./tests/basic/ec/ec-5-2.t
> >
> > 6) Tests that are addressed or are not occurring anymore are,
> >
> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> > ./tests/bitrot/bug-1373520.t
> > ./tests/bugs/distribute/bug-1117851.t
> > ./tests/bugs/glusterd/quorum-validation.t
> > ./tests/bugs/distribute/bug-1042725.t
> >
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> > ./tests/bugs/quota/bug-1293601.t
> > ./tests/bugs/bug-1368312.t
> > ./tests/bugs/distribute/bug-1122443.t
> > ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >
> > Shyam (and Atin)
> >
> > On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> >> Health on master as of the last nightly run [4] is still the same.
> >>
> >> Potential patches that rectify the situation (as in [1]) are bunched in
> >> a patch [2] that Atin and myself have put through several regressions
> >> (mux, normal and line coverage) and these have also not passed.
> >>
> >> Till we rectify the situation we are locking down master branch commit
> >> rights to the following people, Amar, Atin, Shyam, Vijay.
> >>
> >> The intention is to stabilize master and not 

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-08 Thread Shyam Ranganathan
Today's patch set 7 [1], included fixes provided till last evening IST,
and its runs can be seen here [2] (yay! we can link to comments in
gerrit now).

New failures: (added to the spreadsheet)
./tests/bugs/protocol/bug-808400-repl.t (core dumped)
./tests/bugs/quick-read/bug-846240.t

Older tests that had not recurred, but failed today: (moved up in the
spreadsheet)
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t

Other issues:
Test ./tests/basic/ec/ec-5-2.t core dumped again
Few geo-rep failures, Kotresh should have more logs to look at with
these runs
Test ./tests/bugs/glusterd/quorum-validation.t dumped core again

Atin/Amar, we may need to merge some of the patches that have proven to
be holding up and fixing issues today, so that we do not leave
everything to the last. Check and move them along or lmk.

Shyam

[1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
[2] Runs against patch set 7 and its status (incomplete as some runs
have not completed):
https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
(also updated in the spreadsheet)

On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that may
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that 

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 09:43 AM, Shyam Ranganathan wrote:
> On 08/08/2018 09:41 AM, Kotresh Hiremath Ravishankar wrote:
>> For the geo-rep test retries: could you take this instrumentation patch [1]
>> and give it a run?
>> I have tried thrice on the patch, with brick mux enabled and without, but
>> couldn't hit the geo-rep failure. Maybe it is some race that is not
>> happening with the instrumentation patch.
>>
>> [1] https://review.gluster.org/20477
> 
> Will do in my refresh today, thanks.
> 

Kotresh, this run may have the additional logs that you are looking for.
As this is a failed run on one of the geo-rep test cases.

https://build.gluster.org/job/line-coverage/434/consoleFull
19:10:55, 1 test(s) failed
19:10:55, ./tests/00-geo-rep/georep-basic-dr-tarssh.t
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 04:56 AM, Nigel Babu wrote:
> Also, Shyam was saying that in case of retries, the old (failure) logs
> get overwritten by the retries which are successful. Can we disable
> re-trying the .ts when they fail just for this lock down period
> alone so
> that we do have the logs?
> 
> 
> Please don't apply a band-aid. Please fix run-test.sh so that the second
> run has a -retry attached to the file name or some such, please.

Posted patch https://review.gluster.org/c/glusterfs/+/20682 that
achieves this.

I do not like the fact that I use the gluster CLI in run-scripts.sh,
alternatives welcome.
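
(For illustration only, the shape of the idea rather than the actual patch,
which also gathers logs through the gluster CLI; the archive path and names
here are made up:)

# sketch: inside the per-test loop, keep the failed attempt's logs before
# the retry overwrites them
if ! prove -v "$t"; then
    tar -czf "/archived-logs/$(basename "$t" .t)-retry.tgz" /var/log/glusterfs
    prove -v "$t"   # second attempt
fi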

If it looks functionally fine, then I will merge it into the big patch
[1] that we are using to run multiple tests (so that at least we start
getting retry logs from there).

Prior to this I had done this within include.rc and in cleanup, but that
gets invoked twice (at least) per test, and so generated far too many
empty tarballs for no reason.

Also, the change above does not prevent half-complete logs if a test calls
cleanup partway through (as that would create an intermediate tarball that
gets overwritten by the last invocation of cleanup).
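
To illustrate the naming scheme being discussed, here is a minimal sketch of a hypothetical tar_logs helper inside run-tests.sh; the function and path names are assumptions for illustration only, not the contents of patch 20682:

# Sketch only: archive the logs of each attempt under a unique name, so a
# successful retry does not overwrite the logs of the earlier failure.
# tar_logs and /archived_builds are hypothetical names.
tar_logs ()
{
    local t=$1          # e.g. ./tests/basic/afr/add-brick-self-heal.t
    local attempt=$2    # 1 for the first run, 2 for the retry
    local name suffix=""

    name=$(echo "${t}" | tr '/' '-' | sed 's/^\.-//')
    [ "${attempt}" -gt 1 ] && suffix="-retry"

    tar -czf "/archived_builds/${name}${suffix}.tgz" /var/log/glusterfs 2>/dev/null
}

# First run, then the retry on failure:
#   tar_logs ./tests/basic/afr/add-brick-self-heal.t 1
#   tar_logs ./tests/basic/afr/add-brick-self-heal.t 2

Keeping this in run-tests.sh rather than in include.rc's cleanup avoids the duplicate, mostly empty tarballs mentioned above.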

Shyam

[1] big patch: https://review.gluster.org/c/glusterfs/+/20637


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Shyam Ranganathan
On 08/08/2018 09:41 AM, Kotresh Hiremath Ravishankar wrote:
> For the geo-rep test retries, could you take this instrumentation patch [1]
> and give it a run?
> I have tried thrice on the patch, with brick mux enabled and without, but
> couldn't hit the geo-rep failure. It may be some race that is not
> happening with the instrumentation patch.
> 
> [1] https://review.gluster.org/20477

Will do in my refresh today, thanks.


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Kotresh Hiremath Ravishankar
Hi Atin/Shyam

For the geo-rep test retries, could you take this instrumentation patch [1]
and give it a run?
I have tried thrice on the patch, with brick mux enabled and without, but
couldn't hit the geo-rep failure. It may be some race that is not happening
with the instrumentation patch.

[1] https://review.gluster.org/20477

Thanks,
Kotresh HR


On Wed, Aug 8, 2018 at 4:00 PM, Pranith Kumar Karampuri  wrote:

>
>
> On Wed, Aug 8, 2018 at 5:08 AM Shyam Ranganathan 
> wrote:
>
>> Deserves a new beginning; threads on the other mail have gone deep enough.
>>
>> NOTE: (5) below needs your attention, rest is just process and data on
>> how to find failures.
>>
>> 1) We are running the tests using the patch [2].
>>
>> 2) Run details are extracted into a separate sheet in [3] named "Run
>> Failures" use a search to find a failing test and the corresponding run
>> that it failed in.
>>
>> 3) Patches that are fixing issues can be found here [1], if you think
>> you have a patch out there, that is not in this list, shout out.
>>
>> 4) If you own up a test case failure, update the spreadsheet [3] with
>> your name against the test, and also update other details as needed (as
>> comments, as edit rights to the sheet are restricted).
>>
>> 5) Current test failures
>> We still have the following tests failing and some without any RCA or
>> attention, (If something is incorrect, write back).
>>
>> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>> attention)
>> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> (Atin)
>> ./tests/bugs/ec/bug-1236065.t (Ashish)
>> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>> ./tests/basic/ec/ec-1468261.t (needs attention)
>> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>>
>
> Sent https://review.gluster.org/#/c/glusterfs/+/20681 for the failure
> above. Because it was retried there were no logs. Entry heal succeeded, but
> the data/metadata heal after that didn't succeed. I found only one such
> case, based on code reading and the point at which the .t failed.
>
>
>> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>> ./tests/bugs/replicate/bug-1363721.t (Ravi)
>>
>> Here are some newer failures, but mostly one-off failures except cores
>> in ec-5-2.t. All of the following need attention as these are new.
>>
>> ./tests/00-geo-rep/00-georep-verify-setup.t
>> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>> ./tests/basic/stats-dump.t
>> ./tests/bugs/bug-1110262.t
>> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>> ./tests/basic/ec/ec-data-heal.t
>> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
>> ./tests/basic/ec/ec-5-2.t
>>
>> 6) Tests that are addressed or are not occurring anymore are,
>>
>> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> ./tests/bitrot/bug-1373520.t
>> ./tests/bugs/distribute/bug-1117851.t
>> ./tests/bugs/glusterd/quorum-validation.t
>> ./tests/bugs/distribute/bug-1042725.t
>> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>> ./tests/bugs/quota/bug-1293601.t
>> ./tests/bugs/bug-1368312.t
>> ./tests/bugs/distribute/bug-1122443.t
>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>
>> Shyam (and Atin)
>>
>> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> > Health on master as of the last nightly run [4] is still the same.
>> >
>> > Potential patches that rectify the situation (as in [1]) are bunched in
>> > a patch [2] that Atin and I have put through several regressions
>> > (mux, normal and line coverage) and these have also not passed.
>> >
>> > Till we rectify the situation, we are locking down master branch commit
>> > rights to the following people: Amar, Atin, Shyam, Vijay.
>> >
>> > The intention is to stabilize master and not add more patches that may
>> > destabilize it.
>> >
>> > Test cases that are tracked as failures and need action are present here
>> > [3].
>> >
>> > @Nigel, request you to apply the commit rights change when you see this
>> > mail, and let the list know as well.
>> >
>> > Thanks,
>> > Shyam
>> >
>> > [1] Patches that address regression failures:
>> > https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>> >
>> > [2] Bunched up patch against which regressions were run:
>> > https://review.gluster.org/#/c/20637
>> >
>> > [3] Failing tests list:
>> > https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
>> >
>> > [4] 

[Gluster-devel] Post-upgrade issues

2018-08-08 Thread Nigel Babu
Hello folks,

We have two post-upgrade issues

1. Jenkins jobs are failing because git clones fail. This is now fixed.
2. git.gluster.org shows no repos at the moment. I'm currently debugging
this.

-- 
nigelb

Re: [Gluster-devel] [Gluster-infra] Fwd: Gerrit downtime on Aug 8, 2016

2018-08-08 Thread Nigel Babu
On Wed, Aug 8, 2018 at 4:59 PM Yaniv Kaul  wrote:

>
> Nice, thanks!
> I'm trying out the new UI. Needs getting used to, I guess.
> Have we upgraded to NoteDB?
>

Yep! Account information is now completely in NoteDB and no longer in
ReviewDB (which is backed by PostgreSQL for us).

Re: [Gluster-devel] [Gluster-infra] Fwd: Gerrit downtime on Aug 8, 2016

2018-08-08 Thread Yaniv Kaul
On Wed, Aug 8, 2018 at 1:28 PM, Deepshikha Khandelwal 
wrote:

> Gerrit is now upgraded to the newer version and is back online.
>

Nice, thanks!
I'm trying out the new UI. Needs getting used to, I guess.
Have we upgraded to NoteDB?
Y.


>
> Please file a bug if you face any issue.
> On Tue, Aug 7, 2018 at 11:53 AM Nigel Babu  wrote:
> >
> > Reminder, this upgrade is tomorrow.
> >
> > -- Forwarded message -
> > From: Nigel Babu 
> > Date: Fri, Jul 27, 2018 at 5:28 PM
> > Subject: Gerrit downtime on Aug 8, 2016
> > To: gluster-devel 
> > Cc: gluster-infra , <
> automated-test...@gluster.org>
> >
> >
> > Hello,
> >
> > It's been a while since we upgraded Gerrit. We plan to do a full upgrade
> and move to 2.15.3. Among other changes, this brings in the new PolyGerrit
> interface which brings significant frontend changes. You can take a look at
> how this would look on the staging site[1].
> >
> > ## Outage Window
> > 0330 EDT to 0730 EDT
> > 0730 UTC to 1130 UTC
> > 1300 IST to 1700 IST
> >
> > The actual time needed for the upgrade is about an hour, but we want
> > to keep a larger window open to roll back in the event of any problems
> > during the upgrade.
> >
> > --
> > nigelb
> >
> >
> > --
> > nigelb

Re: [Gluster-devel] [Gluster-infra] Fwd: Gerrit downtime on Aug 8, 2016

2018-08-08 Thread Deepshikha Khandelwal
Gerrit is now upgraded to the newer version and is back online.

Please file a bug if you face any issue.
On Tue, Aug 7, 2018 at 11:53 AM Nigel Babu  wrote:
>
> Reminder, this upgrade is tomorrow.
>
> -- Forwarded message -
> From: Nigel Babu 
> Date: Fri, Jul 27, 2018 at 5:28 PM
> Subject: Gerrit downtime on Aug 8, 2016
> To: gluster-devel 
> Cc: gluster-infra , 
>
>
> Hello,
>
> It's been a while since we upgraded Gerrit. We plan to do a full upgrade and 
> move to 2.15.3. Among other changes, this brings in the new PolyGerrit 
> interface which brings significant frontend changes. You can take a look at 
> how this would look on the staging site[1].
>
> ## Outage Window
> 0330 EDT to 0730 EDT
> 0730 UTC to 1130 UTC
> 1300 IST to 1700 IST
>
> The actual time needed for the upgrade is about an hour, but we want to
> keep a larger window open to roll back in the event of any problems during
> the upgrade.
>
> --
> nigelb
>
>
> --
> nigelb


Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Karthik Subrahmanya
On Wed, Aug 8, 2018 at 2:28 PM Nigel Babu  wrote:

>
>
> On Wed, Aug 8, 2018 at 2:00 PM Ravishankar N 
> wrote:
>
>>
>> On 08/08/2018 05:07 AM, Shyam Ranganathan wrote:
>> > 5) Current test failures
>> > We still have the following tests failing and some without any RCA or
>> > attention, (If something is incorrect, write back).
>> >
>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>>  From the runs captured at https://review.gluster.org/#/c/20637/ , I saw
>> that the latest runs where this particular .t failed were at
>> https://build.gluster.org/job/line-coverage/415 and
>> https://build.gluster.org/job/line-coverage/421/.
>> In both of these runs, there are no gluster 'regression' logs available
>> at https://build.gluster.org/job/line-coverage//artifact.
>> I have raised BZ 1613721 for it.
>
>
> We've fixed this for newer runs, but we can do nothing for older runs,
> sadly.
>
Thanks Nigel! I'm also blocked on this. The failures are not reproducible
locally, and without the logs we cannot debug the issue. I will wait for the
new runs to complete.

>
>
>>
>> Also, Shyam was saying that in the case of retries, the old (failure) logs
>> get overwritten by the retries that are successful. Can we disable
>> re-trying the .t files when they fail, just for this lock-down period,
>> so that we do have the logs?
>
>
> Please don't apply a band-aid. Please fix run-tests.sh so that the second
> run has a -retry suffix attached to the file name, or some such.

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Nigel Babu
On Wed, Aug 8, 2018 at 2:00 PM Ravishankar N  wrote:

>
> On 08/08/2018 05:07 AM, Shyam Ranganathan wrote:
> > 5) Current test failures
> > We still have the following tests failing and some without any RCA or
> > attention, (If something is incorrect, write back).
> >
> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>  From the runs captured at https://review.gluster.org/#/c/20637/ , I saw
> that the latest runs where this particular .t failed were at
> https://build.gluster.org/job/line-coverage/415 and
> https://build.gluster.org/job/line-coverage/421/.
> In both of these runs, there are no gluster 'regression' logs available
> at https://build.gluster.org/job/line-coverage//artifact.
> I have raised BZ 1613721 for it.
>

We've fixed this for newer runs, but we can do nothing for older runs,
sadly.


>
> Also, Shyam was saying that in the case of retries, the old (failure) logs
> get overwritten by the retries that are successful. Can we disable
> re-trying the .t files when they fail, just for this lock-down period,
> so that we do have the logs?


Please don't apply a band-aid. Please fix run-tests.sh so that the second
run has a -retry suffix attached to the file name, or some such.

Re: [Gluster-devel] Master branch lock down status

2018-08-08 Thread Ravishankar N



On 08/08/2018 05:07 AM, Shyam Ranganathan wrote:

5) Current test failures
We still have the following tests failing and some without any RCA or
attention, (If something is incorrect, write back).

./tests/basic/afr/add-brick-self-heal.t (needs attention)
From the runs captured at https://review.gluster.org/#/c/20637/ , I saw 
that the latest runs where this particular .t failed were at 
https://build.gluster.org/job/line-coverage/415 and 
https://build.gluster.org/job/line-coverage/421/.
In both of these runs, there are no gluster 'regression' logs available 
at https://build.gluster.org/job/line-coverage//artifact. 
I have raised BZ 1613721 for it.


Also, Shyam was saying that in the case of retries, the old (failure) logs
get overwritten by the retries that are successful. Can we disable
re-trying the .t files when they fail, just for this lock-down period, so
that we do have the logs?


Regards,
Ravi



Re: [Gluster-devel] Master branch lock down status

2018-08-08 Thread Ashish Pandey
I think the problem with this failure is the same one Shyam suspected for
the other EC failure:
connections to the bricks are not being set up after killing the bricks and
starting the volume using force.

./tests/basic/ec/ec-1468261.t 
Failure reported:

23:03:05 ok 34, LINENUM:79
23:03:05 not ok 35 Got "5" instead of "6", LINENUM:80
23:03:05 FAILED COMMAND: 6 ec_child_up_count patchy 0
23:03:05 not ok 36 Got "1298" instead of "^0$", LINENUM:83
23:03:05 FAILED COMMAND: ^0$ get_pending_heal_count patchy
23:03:05 ok 37, LINENUM:86
23:03:05 ok 38, LINENUM:87
23:03:05 not ok 39 Got "3" instead of "4", LINENUM:88
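
For orientation, the failing checks correspond roughly to a fragment like the following; this is a sketch reconstructed from the FAILED COMMAND lines above using the usual test-framework helpers, and the exact kill/start steps in ec-1468261.t may differ:

# Sketch of the checks around lines 79-88, reconstructed from the TAP output
# above (not the literal test file). TEST, EXPECT_WITHIN, $CLI, $V0, the
# timeouts and the *_count helpers come from the tests/*.rc framework.
. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc
. $(dirname $0)/../../ec.rc

# (some bricks were killed earlier in the test)
TEST $CLI volume start $V0 force                              # bring killed bricks back
EXPECT_WITHIN $CHILD_UP_TIMEOUT "6" ec_child_up_count $V0 0   # line ~80: got "5"
EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0  # line ~83: got "1298"

If the SETVOLUME failures below persist past the child-up timeout, the up count stays at 5 and the pending heal count can never drain to zero, which matches the reported values.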
 
When I look at the glustershd log, I can see that there is an issue while
starting the volume by force to start the killed bricks: the bricks are not
getting connected. I am seeing the following logs in glustershd:
== 
[2018-08-06 23:05:45.077699] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 
0-dict: key 'trusted.ec.size' is would not be sent on wire in future [Invalid 
argument] 
[2018-08-06 23:05:45.077724] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 
0-dict: key 'trusted.ec.dirty' is would not be sent on wire in future [Invalid 
argument] 
[2018-08-06 23:05:45.077744] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 
0-dict: key 'trusted.ec.version' is would not be sent on wire in future 
[Invalid argument] 
[2018-08-06 23:05:46.695719] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 
0-patchy-client-1: changing port to 49152 (from 0) 
[2018-08-06 23:05:46.699766] W [MSGID: 114043] 
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set 
the volume [Resource temporarily unavailable] 
[2018-08-06 23:05:46.699809] W [MSGID: 114007] 
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 
'process-uuid' from reply dict [Invalid argument] 
[2018-08-06 23:05:46.699833] E [MSGID: 114044] 
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on 
remote-host failed: cleanup flag is set for xlator.  Try again later [Resource 
temporarily unavailable] 
[2018-08-06 23:05:46.699855] I [MSGID: 114051] 
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending 
CHILD_CONNECTING event 
[2018-08-06 23:05:46.699920] I [MSGID: 114018] 
[client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from 
patchy-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available 
[2018-08-06 23:05:50.702806] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 
0-patchy-client-1: changing port to 49152 (from 0) 
[2018-08-06 23:05:50.706726] W [MSGID: 114043] 
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set 
the volume [Resource temporarily unavailable] 
[2018-08-06 23:05:50.706783] W [MSGID: 114007] 
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 
'process-uuid' from reply dict [Invalid argument] 
[2018-08-06 23:05:50.706808] E [MSGID: 114044] 
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on 
remote-host failed: cleanup flag is set for xlator.  Try again later [Resource 
temporarily unavailable] 
[2018-08-06 23:05:50.706831] I [MSGID: 114051] 
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending 
CHILD_CONNECTING event 
[2018-08-06 23:05:50.706904] I [MSGID: 114018] 
[client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from 
patchy-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available 
[2018-08-06 23:05:54.713490] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 
0-patchy-client-1: changing port to 49152 (from 0) 
[2018-08-06 23:05:54.717417] W [MSGID: 114043] 
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set 
the volume [Resource temporarily unavailable] 
[2018-08-06 23:05:54.717483] W [MSGID: 114007] 
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 
'process-uuid' from reply dict [Invalid argument] 
[2018-08-06 23:05:54.717508] E [MSGID: 114044] 
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on 
remote-host failed: cleanup flag is set for xlator.  Try again later [Resource 
temporarily unavailable] 
[2018-08-06 23:05:54.717530] I [MSGID: 114051] 
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending 
CHILD_CONNECTING event 
[2018-08-06 23:05:54.717605] I [MSGID: 114018] 
[client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from 
patchy-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available 
[2018-08-06 23:05:58.204494]:++ G_LOG:./tests/basic/ec/ec-1468261.t: TEST: 83 ^0$ get_pending_heal_count patchy ++
There are many more such logs in this duration.

Time at which the test at line no. 80 started:
[2018-08-06 23:05:38.652297]:++

Re: [Gluster-devel] [Gluster-Maintainers] Test: ./tests/bugs/distribute/bug-1042725.t

2018-08-08 Thread Nithya Balachandran
On 8 August 2018 at 06:11, Shyam Ranganathan  wrote:

> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > 6) Tests that are addressed or are not occurring anymore are,
> >
> > ./tests/bugs/distribute/bug-1042725.t
>
> The above test fails, I think, due to cleanup not completing after the
> previous test failure.
>
> The failed runs are:
> https://build.gluster.org/job/line-coverage/405/consoleFull
> https://build.gluster.org/job/line-coverage/415/consoleFull


These runs do not have any logs we can check. Has this failed in the recent
runs which now collect logs?


>
>
> The logs are similar, where test 1042725.t fails to start glusterd and
> the previous test ./tests/bugs/core/multiplex-limit-issue-151.t has
> timed out.
>
> I am thinking we need to increase the cleanup time on timed-out tests as
> well, from 5 seconds to 10 seconds, to prevent these. Thoughts?
>
Worth a try.

>
> This timer:
> https://github.com/gluster/glusterfs/blob/master/run-tests.sh#L16
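
A minimal sketch of the kind of grace-period bump being discussed, assuming the timer in question is the grace given to a test between SIGTERM and SIGKILL; the variable and function names below are hypothetical and not taken from run-tests.sh:

# Sketch only: per-.t timeout with a separate cleanup grace period.
# TEST_TIMEOUT_S and CLEANUP_GRACE_S are hypothetical names.
TEST_TIMEOUT_S=200     # matches the "timed out after 200 seconds" in the logs below
CLEANUP_GRACE_S=10     # proposed bump from 5s so the previous test's cleanup can finish

run_one_test ()
{
    local t=$1
    # GNU timeout sends SIGTERM at TEST_TIMEOUT_S and only SIGKILLs after the
    # extra --kill-after grace, giving cleanup traps time to run.
    timeout --kill-after=${CLEANUP_GRACE_S} ${TEST_TIMEOUT_S} prove -v "${t}"
}

# run_one_test ./tests/bugs/distribute/bug-1042725.t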
>
> Logs look as follows:
> 16:24:48
> 
> 
> 16:24:48 [16:24:51] Running tests in file
> ./tests/bugs/core/multiplex-limit-issue-151.t
> 16:28:08 ./tests/bugs/core/multiplex-limit-issue-151.t timed out after
> 200 seconds
> 16:28:08 ./tests/bugs/core/multiplex-limit-issue-151.t: bad status 124
> 16:28:08
> 16:28:08*
> 16:28:08*   REGRESSION FAILED   *
> 16:28:08* Retrying failed tests in case *
> 16:28:08* we got some spurious failures *
> 16:28:08*
> 16:28:08
> 16:31:28 ./tests/bugs/core/multiplex-limit-issue-151.t timed out after
> 200 seconds
> 16:31:28 End of test ./tests/bugs/core/multiplex-limit-issue-151.t
> 16:31:28
> 
> 
> 16:31:28
> 16:31:28
> 16:31:28
> 
> 
> 16:31:28 [16:31:31] Running tests in file
> ./tests/bugs/distribute/bug-1042725.t
> 16:32:35 ./tests/bugs/distribute/bug-1042725.t ..
> 16:32:35 1..16
> 16:32:35 Terminated
> 16:32:35 not ok 1 , LINENUM:9
> 16:32:35 FAILED COMMAND: glusterd

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Atin Mukherjee
On Wed, Aug 8, 2018 at 5:08 AM Shyam Ranganathan 
wrote:

> Deserves a new beginning; threads on the other mail have gone deep enough.
>
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
>
> 1) We are running the tests using the patch [2].
>
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
>
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
>
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
>
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
>
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
>

This one is fixed through https://review.gluster.org/20651, as I see no
failures for this test in the latest report from patch set 6.

> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
>
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
>
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
>
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>

This failed because of https://review.gluster.org/20584. I believe there's
some timing issue introduced by this patch. As I highlighted in a comment on
https://review.gluster.org/#/c/20637, I'd request you to revert this change
and include https://review.gluster.org/20658

> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
>
> 6) Tests that are addressed or are not occurring anymore are,
>
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
>
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>
> Shyam (and Atin)
>
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> > Health on master as of the last nightly run [4] is still the same.
> >
> > Potential patches that rectify the situation (as in [1]) are bunched in
> > a patch [2] that Atin and I have put through several regressions
> > (mux, normal and line coverage) and these have also not passed.
> >
> > Till we rectify the situation, we are locking down master branch commit
> > rights to the following people: Amar, Atin, Shyam, Vijay.
> >
> > The intention is to stabilize master and not add more patches that may
> > destabilize it.
> >
> > Test cases that are tracked as failures and need action are present here
> > [3].
> >
> > @Nigel, request you to apply the commit rights change when you see this
> > mail, and let the list know as well.
> >
> > Thanks,
> > Shyam
> >
> > [1] Patches that address regression failures:
> > https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
> >
> > [2] Bunched up patch against which regressions were run:
> > https://review.gluster.org/#/c/20637
> >
> > [3] Failing tests list:
> >
> > https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
> >
> > [4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/