On 2018-04-09 08:04, Stefan Hajnoczi wrote:
> On Sun, Apr 08, 2018 at 10:35:16PM +0300, Benny Zlotnik wrote:
>
> What type of storage are the source and destination images? (e.g.
> source is a local qcow2 file on xfs, destination is a raw file on NFS)
>
>> $ gdb -p 13024 -batch -ex "thread apply all bt"
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> 0x00007f98275cfaff in ppoll () from /lib64/libc.so.6
>>
>> Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
>> #0 0x00007f98275cfaff in ppoll () from /lib64/libc.so.6
>> #1 0x000055b55cf59d69 in qemu_poll_ns ()
>> #2 0x000055b55cf5ba45 in aio_poll ()
>> #3 0x000055b55ceedc0f in bdrv_get_block_status_above ()
>> #4 0x000055b55cea3611 in convert_iteration_sectors ()
>
> CCing Max Reitz in case this is familiar.

Hmm, not really, no...

The culprit I know of (sensing block status outside of qemu) would block in
lseek64() under find_allocation(). I didn't have any luck reproducing
the issue either...

Whenever I had some hang in ppoll(), it was usually during a drain, but
that doesn't seem to be the case here either.

So I have no idea. Maybe I'll test some other configurations at another
time, but so far I didn't experience any hangs and I have no idea what
could be provoking them (other than some network issue outside of qemu,
but well...).

Max

>> #5 0x000055b55cea4352 in img_convert ()
>> #6 0x000055b55ce9d819 in main ()
>>
>>
>> On Sun, Apr 8, 2018 at 10:28 PM, Nir Soffer <nir...@gmail.com> wrote:
>>
>>> On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik <bzlot...@redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> As part of copy operation initiated by rhev got stuck for more than a day
>>>> and consumes plenty of CPU
>>>> vdsm 13024 3117 99 Apr07 ? 1-06:58:43 /usr/bin/qemu-img
>>>> convert
>>>> -p -t none -T none -f qcow2
>>>> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/
>>>> 26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-
>>>> 19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
>>>> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:
>>>> _vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/
>>>> 9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
>>>>
>>>> The target image appears to have no data yet:
>>>> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
>>>> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
>>>> file format: raw
>>>> virtual size: 120G (128849018880 bytes)
>>>> disk size: 0
>>>>
>>>> strace -p 13024 -tt -T -f shows only:
>>>> ...
>>>> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0,
>>>> 0},
>>>> NULL, 8) = 0 (Timeout) <0.000010>
>>>> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0,
>>>> 0},
>>>> NULL, 8) = 0 (Timeout) <0.000009>
>>>> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0,
>>>> 0},
>>>> NULL, 8) = 0 (Timeout) <0.000009>
>>>> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0,
>>>> 0},
>>>> NULL, 8) = 0 (Timeout) <0.000010>
>>>>
>>>> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
>>>>
>>>> What could cause this? I'll provide any additional information needed
>>>>
>>>
>>> A backtrace may help, try:
>>>
>>> gdb -p 13024 -batch -ex "thread apply all bt"
>>>
>>> Also adding Kevin and qemu-block.
>>>
>>> Nir
>>>
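
For reference, the kind of lseek()-based block-status probing mentioned in the reply (walking a file with SEEK_DATA / SEEK_HOLE) can be sketched roughly as below. This is only an illustration in Python, not qemu's find_allocation(); the helper name data_extents() is made up for this sketch, and it assumes a platform where os.SEEK_DATA and os.SEEK_HOLE are available (e.g. Linux). On a slow or unresponsive network filesystem each lseek() call here could block, which is the failure mode described above and is different from the ppoll() spin seen in the strace output.

```python
import errno
import os
import sys


def data_extents(path):
    """Return a list of (offset, length) ranges that lseek() reports as data.

    Hypothetical helper for illustration only; not part of qemu.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte at or after offset that holds data.
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    # Only a hole remains between offset and end of file.
                    break
                raise
            # Find where that data run ends (next hole, or end of file).
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            extents.append((data, hole - data))
            offset = hole
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__" and len(sys.argv) > 1:
    for start, length in data_extents(sys.argv[1]):
        print(f"data at offset {start}, length {length}")
```

Run against an image file, this prints one line per allocated range. What counts as a hole depends on the filesystem, so the output for a sparse file may differ between, say, xfs and NFS.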