On Mon, Jul 5, 2021 at 2:13 PM Nir Soffer <nsof...@redhat.com> wrote:

>
> >
> > vdsm     14342  3270  0 11:17 ?        00:00:03 /usr/bin/qemu-img
> convert -p -t none -T none -f raw
> /rhev/data-center/mnt/blockSD/679c0725-75fb-4af7-bff1-7c447c5d789c/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379
> -O raw -o preallocation=falloc /rhev/data-center/mnt/172.16.1.137:
> _nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379
>
> -o preallocation + NFS 4.0 + very slow NFS is your problem.
>
> qemu-img is using posix_fallocate() to preallocate the entire image at
> the start of the copy. With NFS 4.2
> this uses the Linux-specific fallocate() syscall, which allocates the space
> very efficiently in no time. With older
> NFS versions, this becomes a very slow loop, writing one byte for
> every 4k block.
>
> If you see -o preallocation, it means you are using an old vdsm
> version; we stopped using -o preallocation
> in 4.4.2, see https://bugzilla.redhat.com/1850267.
>

OK. As I said at the beginning, the environment is the latest 4.3.
We are going to upgrade to 4.4, and in the meantime we are making some
complementary backups, for safety.
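
If I understand correctly, on NFS < 4.2 the preallocation then degrades to
something like the loop below (illustration only, with a hypothetical path and
size, not something to run against the real image):

  IMG=/path/to/image            # hypothetical path
  SIZE=$(stat -c %s "$IMG")
  # one synchronous 1-byte write at the last byte of every 4 KiB block
  for ((off = 4095; off < SIZE; off += 4096)); do
      dd if=/dev/zero of="$IMG" bs=1 count=1 seek=$off conv=notrunc status=none
  done

which is exactly the pattern I see in the strace output further down.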


> > On the hypervisor the ls commands basically hang, so from another hypervisor
> > I see that the disk size seems to stay at 4 GB even though the timestamp updates...
> >
> > # ll /rhev/data-center/mnt/172.16.1.137
> \:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/
> > total 4260941
> > -rw-rw----. 1 nobody nobody 4363202560 Jul  5 11:23
> d2a89b5e-7d62-4695-96d8-b762ce52b379
> > -rw-r--r--. 1 nobody nobody        261 Jul  5 11:17
> d2a89b5e-7d62-4695-96d8-b762ce52b379.meta
> >
> > On the host console I see a throughput of about 4 Mbit/s...
> >
> > # strace -p 14342
>
> This shows only the main thread; use -f to show all threads.
>

 # strace -f -p 14342
strace: Process 14342 attached with 2 threads
[pid 14342] ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8
<unfinished ...>
[pid 14343] pwrite64(12, "\0", 1, 16474968063) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474972159) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474976255) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474980351) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474984447) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474988543) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474992639) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474996735) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475000831) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475004927) = 1
. . . and so on . . .
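
Just to put a number on how slow this loop is (the 4 KiB stride is from the
strace above, the image size and latency here are purely hypothetical):

  # one 1-byte pwrite per 4 KiB block, e.g. for a 100 GiB image:
  echo $((100 * 1024 * 1024 * 1024 / 4096))    # -> 26214400 writes
  # at ~1 ms per synchronous NFS round trip that is ~26214 s,
  # i.e. more than 7 hours spent only on the preallocation pass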


>
> > This is a test oVirt env, so I can wait and possibly test something...
> > Let me know your suggestions.
>
> I would start by changing the NFS storage domain to version 4.2.
>

I'm going to try. Right now it is set to the default, auto-negotiated...
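
To double-check what was actually negotiated for that mount, the vers= option
in /proc/mounts should show it (path fragment taken from the mount above):

  # grep EXPORT-DOMAIN /proc/mounts

and look for vers=4.0 / 4.1 / 4.2 in the mount options.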


> 1. kill the hung qemu-img (it probably cannot be killed, but worth
> trying)
> 2. deactivate the storage domain
> 3. fix the ownership on the storage domain (should be vdsm:kvm, not
> nobody:nobody)
>

Unfortunately it is an appliance. I have asked the people in charge of it
whether we can change the ownership.
Thanks for the other explanations.

Gianluca
