Sorry, what I meant was, if I start the transfer now and get glusterd into 
zombie status, it's unlikely that I can fully recover the server without a 
reboot.

> On Aug 5, 2018, at 02:55, Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
> 
> 
> 
> On Sun, Aug 5, 2018 at 1:22 PM, Yuhao Zhang <zzy...@gmail.com> wrote:
> This is a semi-production server and I can't bring it down right now. Will 
> try to get the monitoring output when I get a chance. 
> 
> Collecting top output doesn't require bringing down the server.
> 
> 
> As I recall, the high CPU processes are brick daemons (glusterfsd) and htop 
> showed they were in status D. However, I saw zero zpool IO as clients were 
> all hanging.
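
For reference, a minimal sketch of how the D (uninterruptible sleep) state could be confirmed from the shell, assuming procps `ps` is available. The function names `filter_d_state` and `list_d_state` are made up for illustration and are not Gluster tools.

```shell
# Hypothetical helpers, not part of Gluster or of the original thread.

# Read "PID STAT WCHAN COMMAND" lines on stdin (first line is the header)
# and keep only processes whose state starts with D (uninterruptible sleep).
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# List D-state processes together with the kernel function they wait in.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

Running `list_d_state` while the clients hang would show whether the glusterfsd processes are really in D state, and the WCHAN column gives a hint about what they are blocked on.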
> 
> 
>> On Aug 5, 2018, at 02:38, Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>> 
>> 
>> 
>> On Sun, Aug 5, 2018 at 12:44 PM, Yuhao Zhang <zzy...@gmail.com> wrote:
>> Hi,
>> 
>> I am running into a situation where heavy writes put the Gluster server into 
>> a zombie state, with many high-CPU processes and all clients hanging. It is 
>> almost 100% reproducible on my machine. I hope someone can help.
>> 
>> Can you give us the output of monitoring these high-CPU processes, captured 
>> while your tests are running?
>> 
>> MON_INTERVAL=10 # can be increased for very long runs
>> top -bd $MON_INTERVAL > /tmp/top_proc.${HOSTNAME}.txt # CPU utilization by process
>> top -bHd $MON_INTERVAL > /tmp/top_thr.${HOSTNAME}.txt # CPU utilization by thread
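
As written, the two top commands above run in the foreground until interrupted. A possible variant, sketched here under the assumption of procps `top` (which accepts `-n` for a sample count), runs both captures in the background and lets them stop on their own. The function name `capture_top` and its arguments are hypothetical.

```shell
# Hypothetical helper, not from the original thread.
# $1 = seconds between samples, $2 = number of samples, $3 = output directory
capture_top() {
    local interval=$1 count=$2 out_dir=$3
    local host=${HOSTNAME:-$(uname -n)}

    # -b batch mode, -d delay between samples, -n number of samples
    top -b -d "$interval" -n "$count" > "$out_dir/top_proc.$host.txt" &
    local proc_pid=$!

    # -H adds the per-thread view
    top -b -H -d "$interval" -n "$count" > "$out_dir/top_thr.$host.txt" &
    local thr_pid=$!

    # Return once both captures have written their last sample.
    wait "$proc_pid" "$thr_pid"
}
```

For example, `capture_top 10 360 /tmp &` started just before the transfer would record roughly one hour of samples at 10-second intervals, without any need to stop the server.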
>> 
>> 
>> I started to observe this issue when running rsync to copy files from 
>> another server and I thought it might be because Gluster doesn't like 
>> rsync's delta transfer with a lot of small writes. However, I was able to 
>> reproduce this with "rsync --whole-file --inplace", or even with cp or scp. 
>> It usually appears a few hours after the transfer starts, but can 
>> sometimes happen within several minutes.
>> 
>> Since this is a single node Gluster distributed volume, I tried to transfer 
>> files directly onto the server bypassing Gluster clients, but it still 
>> caused the same issue.
>> 
>> It is running on top of a ZFS RAIDZ2 dataset. Options are attached. Also, I 
>> attached the statedump generated when my clients hung, and volume options.
>> 
>> - Ubuntu 16.04 x86_64 / 4.4.0-116-generic
>> - GlusterFS 3.12.8
>> 
>> Thank you,
>> Yuhao
>> 
>> 
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
> 
