Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On 09/27/2018 01:16 PM, Bjoern Walk wrote: > Michal Privoznik [2018-09-27, 12:07PM +0200]: >> On 09/27/2018 11:11 AM, Bjoern Walk wrote: >>> I still don't understand why we need a timeout at all. If virtlockd is >>> unable to get the lock, just bail and continue with what you did after >>> the timeout runs out. Is this some kind of safety-measure? Against what? >> >> When there are two threads fighting over the lock. Imagine two threads >> trying to set security label over the same file. Or imagine two daemons >> on two distant hosts trying to set label on the same file on NFS. >> virtlockd implements try-or-fail approach. So we need to call lock() >> repeatedly until we succeed (or timeout). > > One thread wins and the other fails with lock contention? I don't see > how repeatedly trying to acquire the lock will improve things, but maybe > I am not getting it right now. The point of metadata locking is not to prevent metadata change, but to be able to atomically change it. As I said in other thread, there is not much point in this feature alone since chown() and setfcon() are atomic themselves. But this is preparing the code for original label remembering which will be stored in XATTRs at which point the whole operation is not going to be atomic. https://www.redhat.com/archives/libvir-list/2018-September/msg01349.html Long story short, at the first chown() libvirt will record the original owner of the file into XATTRs and then on the last restore it will use that owner to return the file to instead of root:root. The locks are there because reading XATTR, parsing it, increasing/decreasing refcounter and possibly doing chown() is not atomic. But with locks it is. > >>> There IS a lock held on the image, from the FIRST domain that we >>> started. The second domain, which is using the SAME image, unshared, >>> runs into the locking timeout. Sorry if I failed to describe this setup >>> appropriately. I am starting to believe that this is expected behaviour, >>> although it is not intuitive. >>> >>> # lslocks -u >>> COMMANDPID TYPE SIZE MODE M STARTEND PATH >>> ... >>> virtlockd 199062 POSIX 1.5G WRITE 0 0 0 >>> /var/lib/libvirt/images/u1604.qcow2 >> >> But this should not clash, because metadata lock use different bytes: >> the regular locking uses offset=0, metadata lock uses offset=1. Both >> locks lock just one byte in length. > > It's the only lock in the output for the image. Should I see the > metadata lock at start=1 as well? You should see it if you run lslocks at the right moment. But since metadata locking happens only for a fraction of a second, it is very unlikely that you'll be able to run it at the right moment. > >> However, from the logs it looks like virtlockd doesn't try to actually >> acquire the lock because it found in internal records that the path is >> locked (even though it is locked with different offset). Anyway, now I >> am finally able to reproduce the issue and I'll look into this too. > > Good. 
> >> Quick and dirty patch might look something like this: >> >> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c >> index 79fa48d365..3c51d7926b 100644 >> --- i/src/util/virlockspace.c >> +++ w/src/util/virlockspace.c >> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr >> lockspace, >> virMutexLock(&lockspace->lock); >> >> if ((res = virHashLookup(lockspace->resources, resname))) { >> -if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) && >> -(flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) { >> +if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED && >> +flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) || >> +start == 1) { >> >> if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0) >> goto cleanup; > > Had a quick test with this, but now it seems like that the whole > metadata locking does nothing. When starting the second domain, I get an > internal error from QEMU, failing to get the write lock. The SELinux > labels of the image are changed as well. This is the same behaviour as > with metadata locking disabled. Not entirely sure what should happen or > if this is a error in my setup or not. I will have to think about this. Metadata locking is not supposed to prevent you from running two domains with the same disk. Nor to prevent chown() on the file. But since you are using virtlockd to lock the disk contents, it should prevent running such domains, not qemu. I wonder if this is a misconfiguration or something. Perhaps one domain has disk RW and the other has it RO + shared? Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
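To make the XATTR-based owner remembering described above concrete, here is a minimal sketch of the record-and-chown step. It is not libvirt's actual code: the attribute names and the metadata_lock()/metadata_unlock() helpers are hypothetical stand-ins for the virtlockd-backed lock. The point it illustrates is that reading the refcount, updating it and doing the chown() must all happen under that lock to be atomic as a whole:

/* Illustration only, not libvirt's implementation.  The XATTR names and the
 * metadata_lock()/metadata_unlock() helpers are hypothetical stand-ins for
 * the virtlockd-backed lock. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

#define XATTR_OWNER "trusted.example.original-owner"  /* hypothetical name */
#define XATTR_REF   "trusted.example.refcount"        /* hypothetical name */

static int  metadata_lock(const char *path)   { (void)path; return 0; }
static void metadata_unlock(const char *path) { (void)path; }

static int
remember_and_chown(const char *path, uid_t uid, gid_t gid)
{
    char buf[64];
    ssize_t len;
    long ref = 0;
    int ret = -1;

    if (metadata_lock(path) < 0)
        return -1;

    len = getxattr(path, XATTR_REF, buf, sizeof(buf) - 1);
    if (len < 0) {
        /* First label change: remember who owned the file originally. */
        struct stat sb;
        if (stat(path, &sb) < 0)
            goto cleanup;
        snprintf(buf, sizeof(buf), "%u:%u",
                 (unsigned)sb.st_uid, (unsigned)sb.st_gid);
        if (setxattr(path, XATTR_OWNER, buf, strlen(buf), 0) < 0)
            goto cleanup;
    } else {
        buf[len] = '\0';
        ref = strtol(buf, NULL, 10);
    }

    /* Bump the refcount and hand the file to the new owner, still locked. */
    snprintf(buf, sizeof(buf), "%ld", ref + 1);
    if (setxattr(path, XATTR_REF, buf, strlen(buf), 0) < 0 ||
        chown(path, uid, gid) < 0)
        goto cleanup;

    ret = 0;
 cleanup:
    metadata_unlock(path);
    return ret;
}

The matching restore step would, again under the same lock, decrement the refcount and chown() the file back to the recorded owner once the count drops to zero.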
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Michal Privoznik [2018-09-27, 12:07PM +0200]: > On 09/27/2018 11:11 AM, Bjoern Walk wrote: > > I still don't understand why we need a timeout at all. If virtlockd is > > unable to get the lock, just bail and continue with what you did after > > the timeout runs out. Is this some kind of safety-measure? Against what? > > When there are two threads fighting over the lock. Imagine two threads > trying to set security label over the same file. Or imagine two daemons > on two distant hosts trying to set label on the same file on NFS. > virtlockd implements try-or-fail approach. So we need to call lock() > repeatedly until we succeed (or timeout). One thread wins and the other fails with lock contention? I don't see how repeatedly trying to acquire the lock will improve things, but maybe I am not getting it right now. > > There IS a lock held on the image, from the FIRST domain that we > > started. The second domain, which is using the SAME image, unshared, > > runs into the locking timeout. Sorry if I failed to describe this setup > > appropriately. I am starting to believe that this is expected behaviour, > > although it is not intuitive. > > > > # lslocks -u > > COMMANDPID TYPE SIZE MODE M STARTEND PATH > > ... > > virtlockd 199062 POSIX 1.5G WRITE 0 0 0 > > /var/lib/libvirt/images/u1604.qcow2 > > But this should not clash, because metadata lock use different bytes: > the regular locking uses offset=0, metadata lock uses offset=1. Both > locks lock just one byte in length. It's the only lock in the output for the image. Should I see the metadata lock at start=1 as well? > However, from the logs it looks like virtlockd doesn't try to actually > acquire the lock because it found in internal records that the path is > locked (even though it is locked with different offset). Anyway, now I > am finally able to reproduce the issue and I'll look into this too. Good. > Quick and dirty patch might look something like this: > > diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c > index 79fa48d365..3c51d7926b 100644 > --- i/src/util/virlockspace.c > +++ w/src/util/virlockspace.c > @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr > lockspace, > virMutexLock(&lockspace->lock); > > if ((res = virHashLookup(lockspace->resources, resname))) { > -if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) && > -(flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) { > +if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED && > +flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) || > +start == 1) { > > if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0) > goto cleanup; Had a quick test with this, but now it seems like that the whole metadata locking does nothing. When starting the second domain, I get an internal error from QEMU, failing to get the write lock. The SELinux labels of the image are changed as well. This is the same behaviour as with metadata locking disabled. Not entirely sure what should happen or if this is a error in my setup or not. I will have to think about this. > But as I say, this is very dirty hack. I need to find a better solution. > At least you might have something to test. > > Michal > > -- > libvir-list mailing list > libvir-list@redhat.com > https://www.redhat.com/mailman/listinfo/libvir-list > -- IBM Systems Linux on Z & Virtualization Development -- IBM Deutschland Research & Development GmbH Schönaicher Str. 
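The offsets discussed above (regular content lock at offset 0, metadata lock at offset 1) are plain POSIX byte-range locks on the same file, which is why the two should not clash at the fcntl() level. A small standalone demo of that behaviour (illustrative only; the file name and the 60 second sleep are arbitrary, but the one-byte ranges at offset 0 and offset 1 mirror what the thread describes):

/* Run two copies against the same file: "./lockdemo /tmp/demo.img 0" and
 * "./lockdemo /tmp/demo.img 1" both acquire their lock (different ranges),
 * while a second copy on the same offset reports the range as busy.  While
 * a copy is sleeping, "lslocks -u" shows the lock with the matching
 * START/END columns. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/tmp/demo.img";
    off_t offset = argc > 2 ? atol(argv[2]) : 0;  /* 0 = content, 1 = metadata */
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = offset,
        .l_len = 1,          /* lock exactly one byte, as described above */
    };

    if (fd < 0)
        return 1;
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        printf("offset %ld: busy\n", (long)offset);
        return 1;
    }
    printf("offset %ld: acquired, sleeping so it shows up in lslocks\n",
           (long)offset);
    sleep(60);
    return 0;
}

This also matches why the metadata lock is hard to catch in the lslocks output: in libvirt it is only held for the short moment the labels are actually being changed.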
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On 09/27/2018 11:11 AM, Bjoern Walk wrote: > Michal Privoznik [2018-09-27, 10:15AM +0200]: >> On 09/27/2018 09:01 AM, Bjoern Walk wrote: >>> Michal Privoznik [2018-09-19, 11:45AM +0200]: On 09/19/2018 11:17 AM, Bjoern Walk wrote: > Bjoern Walk [2018-09-12, 01:17PM +0200]: >> Michal Privoznik [2018-09-12, 11:32AM >> +0200]: >> > > Still seeing the same timeout. Is this expected behaviour? > Nope. I wonder if something has locked the path and forgot to unlock it (however, virtlockd should have unlocked all the paths owned by PID on connection close), or something is still holding the lock and connection opened. Can you see the timeout even when you turn off the selinux driver (security_driver="none' in qemu.conf)? I tried to reproduce the issue yesterday and was unsuccessful. Do you have any steps to reproduce? >>> >>> So, I haven't been able to actually dig into the debugging but we >>> have been able to reproduce this behaviour on x86 (both with SELinux >>> and DAC). Looks like a general problem, if a problem at all, because >>> from what I can see in the code, the 60 seconds timeout is actually >>> intended, or not? The security manager does try for 60 seconds to >>> acquire the lock and only then bails out. Why is this? >> >> The ideal solution would be to just tell virlockd "these are the paths I >> want you to lock on my behalf" and virtlockd would use F_SETLKW so that >> the moment all paths are unlocked virtlockd will lock them and libvirtd >> can continue its execution (i.e. chown() and setfcon()). However, we >> can't do this because virtlockd runs single threaded [1] and therefore >> if one thread is waiting for lock to be acquired there is no other >> thread to unlock the path. >> >> Therefore I had to move the logic into libvirtd which tries repeatedly >> to lock all the paths needed. And this is where the timeout steps in - >> the lock acquiring attempts are capped at 60 seconds. This is an >> arbitrary chosen timeout. We can make it smaller, but that will not >> solve your problem - only mask it. > > I still don't understand why we need a timeout at all. If virtlockd is > unable to get the lock, just bail and continue with what you did after > the timeout runs out. Is this some kind of safety-measure? Against what? When there are two threads fighting over the lock. Imagine two threads trying to set security label over the same file. Or imagine two daemons on two distant hosts trying to set label on the same file on NFS. virtlockd implements try-or-fail approach. So we need to call lock() repeatedly until we succeed (or timeout). > >> >>> >>> However, an actual bug is that while the security manager waits for >>> the lock acquire the whole libvirtd hangs, but from what I understood >>> and Marc explained to me, this is more of a pathological error in >>> libvirt behaviour with various domain locks. >>> >> >> Whole libvirtd shouldn't hang. Only those threads which try to acquire >> domain object lock. IOW it should be possible to run APIs over different >> domains (or other objects for that matter). For instance dumpxml over >> different domain works just fine. > > Yes, but from a user perspective, this is still pretty bad and > unexpected. libvirt should continue to operate as usual while virtlockd > is figuring out the locking. Agreed. I will look into this. > >> Anyway, we need to get to the bottom of this. Looks like something keeps >> the file locked and then when libvirt wants to lock if for metadata the >> timeout is hit and whole operation fails. 
Do you mind running 'lslocks >> -u' when starting a domain and before the timeout is hit? > > There IS a lock held on the image, from the FIRST domain that we > started. The second domain, which is using the SAME image, unshared, > runs into the locking timeout. Sorry if I failed to describe this setup > appropriately. I am starting to believe that this is expected behaviour, > although it is not intuitive. > > # lslocks -u > COMMANDPID TYPE SIZE MODE M STARTEND PATH > ... > virtlockd 199062 POSIX 1.5G WRITE 0 0 0 > /var/lib/libvirt/images/u1604.qcow2 But this should not clash, because metadata lock use different bytes: the regular locking uses offset=0, metadata lock uses offset=1. Both locks lock just one byte in length. However, from the logs it looks like virtlockd doesn't try to actually acquire the lock because it found in internal records that the path is locked (even though it is locked with different offset). Anyway, now I am finally able to reproduce the issue and I'll look into this too. Quick and dirty patch might look something like this: diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c index 79fa48d365..3c51d7926b 100644 --- i/src/util/virlockspace.c +++ w/src/util/virlockspace.c @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr
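The other half of the design described above, the libvirtd-side retry, is roughly a capped busy-wait: keep asking virtlockd for the locks until the request succeeds or about 60 seconds pass. A simplified sketch of that loop follows; it is not libvirt's code (the real implementation uses virTimeBackOffWait, visible in the backtraces elsewhere in this thread), and try_lock_paths() is a hypothetical stand-in for the RPC that asks virtlockd to lock all paths at once:

#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#define LOCK_ACQUIRE_TIMEOUT_SEC 60   /* mirrors LOCK_ACQUIRE_TIMEOUT in the series */

/* Hypothetical: asks virtlockd to lock all paths at once; returns 0 on
 * success, -1 with errno == EBUSY when some other owner holds a lock. */
int try_lock_paths(const char **paths, size_t npaths);

static int
lock_paths_with_timeout(const char **paths, size_t npaths)
{
    time_t deadline = time(NULL) + LOCK_ACQUIRE_TIMEOUT_SEC;
    useconds_t delay = 10 * 1000;               /* start at 10 ms ... */

    for (;;) {
        if (try_lock_paths(paths, npaths) == 0)
            return 0;
        if (errno != EBUSY || time(NULL) >= deadline)
            return -1;                          /* hard error or timed out */
        usleep(delay);
        if (delay < 500 * 1000)
            delay *= 2;                         /* ... backing off up to 0.5 s */
    }
}

Whatever the exact backoff, the visible effect is the one reported in this thread: when the lock is genuinely held by someone else, the caller sits in this loop for the full timeout before failing.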
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Michal Privoznik [2018-09-27, 10:15AM +0200]: > On 09/27/2018 09:01 AM, Bjoern Walk wrote: > > Michal Privoznik [2018-09-19, 11:45AM +0200]: > >> On 09/19/2018 11:17 AM, Bjoern Walk wrote: > >>> Bjoern Walk [2018-09-12, 01:17PM +0200]: > Michal Privoznik [2018-09-12, 11:32AM > +0200]: > >> > > >>> > >>> Still seeing the same timeout. Is this expected behaviour? > >>> > >> > >> Nope. I wonder if something has locked the path and forgot to > >> unlock it (however, virtlockd should have unlocked all the paths > >> owned by PID on connection close), or something is still holding > >> the lock and connection opened. > >> > >> Can you see the timeout even when you turn off the selinux driver > >> (security_driver="none' in qemu.conf)? I tried to reproduce the > >> issue yesterday and was unsuccessful. Do you have any steps to > >> reproduce? > > > > So, I haven't been able to actually dig into the debugging but we > > have been able to reproduce this behaviour on x86 (both with SELinux > > and DAC). Looks like a general problem, if a problem at all, because > > from what I can see in the code, the 60 seconds timeout is actually > > intended, or not? The security manager does try for 60 seconds to > > acquire the lock and only then bails out. Why is this? > > The ideal solution would be to just tell virlockd "these are the paths I > want you to lock on my behalf" and virtlockd would use F_SETLKW so that > the moment all paths are unlocked virtlockd will lock them and libvirtd > can continue its execution (i.e. chown() and setfcon()). However, we > can't do this because virtlockd runs single threaded [1] and therefore > if one thread is waiting for lock to be acquired there is no other > thread to unlock the path. > > Therefore I had to move the logic into libvirtd which tries repeatedly > to lock all the paths needed. And this is where the timeout steps in - > the lock acquiring attempts are capped at 60 seconds. This is an > arbitrary chosen timeout. We can make it smaller, but that will not > solve your problem - only mask it. I still don't understand why we need a timeout at all. If virtlockd is unable to get the lock, just bail and continue with what you did after the timeout runs out. Is this some kind of safety-measure? Against what? > > > > > However, an actual bug is that while the security manager waits for > > the lock acquire the whole libvirtd hangs, but from what I understood > > and Marc explained to me, this is more of a pathological error in > > libvirt behaviour with various domain locks. > > > > Whole libvirtd shouldn't hang. Only those threads which try to acquire > domain object lock. IOW it should be possible to run APIs over different > domains (or other objects for that matter). For instance dumpxml over > different domain works just fine. Yes, but from a user perspective, this is still pretty bad and unexpected. libvirt should continue to operate as usual while virtlockd is figuring out the locking. > Anyway, we need to get to the bottom of this. Looks like something keeps > the file locked and then when libvirt wants to lock if for metadata the > timeout is hit and whole operation fails. Do you mind running 'lslocks > -u' when starting a domain and before the timeout is hit? There IS a lock held on the image, from the FIRST domain that we started. The second domain, which is using the SAME image, unshared, runs into the locking timeout. Sorry if I failed to describe this setup appropriately. 
I am starting to believe that this is expected behaviour, although it is not intuitive. # lslocks -u COMMANDPID TYPE SIZE MODE M STARTEND PATH ... virtlockd 199062 POSIX 1.5G WRITE 0 0 0 /var/lib/libvirt/images/u1604.qcow2 ... > > Michal > > 1: The reason that virtlockd has to run single threaded is stupidity of > POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW, > ...) and entering critical section. If another thread does open() + > close() on the same file the file is unlocked. Because we can't > guarantee this will not happen in multithreaded libvirt we had to > introduce a separate process to take care of that. And this process has > to be single threaded so there won't ever be the second thread to call > close() and unintentionally release the lock. Thanks for the explanation, I will have some reading to do to get a better overview of the locking process in Linux. > > Perhaps we could use OFD locks but those are not part of POSIX and are > available on Linux only. > > -- > libvir-list mailing list > libvir-list@redhat.com > https://www.redhat.com/mailman/listinfo/libvir-list > -- IBM Systems Linux on Z & Virtualization Development -- IBM Deutschland Research & Development GmbH Schönaicher Str. 220, 71032 Böblingen Phone: +49 7031 16 1819 -- Vorsitze
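The footnote quoted above is easy to reproduce in isolation. The following small program (illustrative only; any scratch path works) takes a write lock through one descriptor and then loses it merely because a second, unrelated descriptor for the same file is closed, which is exactly the hazard that forces virtlockd to stay single threaded:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/lock-demo";
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1 };

    if (fd1 < 0 || fd2 < 0)
        return 1;
    if (fcntl(fd1, F_SETLK, &fl) < 0) {   /* another copy already holds it */
        perror("F_SETLK");
        return 1;
    }
    /* ... critical section entered via fd1 ... */

    close(fd2);   /* POSIX: this drops ALL of this process's locks on the
                     file, including the one taken via fd1 above */

    pause();      /* the file now looks unlocked to everyone else; verify
                     with lslocks or by starting a second copy */
    return 0;
}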
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Michal Privoznik [2018-09-27, 10:14AM +0200]: > Right, so this is just waiting for virtlockd to lock the paths. > virtlockd is obviously unable to do that (as I suggested in my previous > e-mail - is perhaps some other process holding the lock?). It cannot lock the paths, because those paths are already locked by the first domain. This is intentional. But it should not take 60 seconds to resolve this and block the whole libvirt daemon in the process. > Can you please enable debug logs for virtlockd, reproduce and share the > log with me? Setting this in /etc/libvirt/virtlockd.conf should be enough: > > log_level=1 > log_filters="4:event 3:rpc" > log_outputs="1:file:/var/log/virtlockd.log" See attached: virtlockd.log.gz
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On Thu, Sep 27, 2018 at 10:15 AM +0200, Michal Privoznik wrote: > On 09/27/2018 09:01 AM, Bjoern Walk wrote: >> Michal Privoznik [2018-09-19, 11:45AM +0200]: >>> On 09/19/2018 11:17 AM, Bjoern Walk wrote: Bjoern Walk [2018-09-12, 01:17PM +0200]: > Michal Privoznik [2018-09-12, 11:32AM > +0200]: >>> > Still seeing the same timeout. Is this expected behaviour? >>> >>> Nope. I wonder if something has locked the path and forgot to >>> unlock it (however, virtlockd should have unlocked all the paths >>> owned by PID on connection close), or something is still holding >>> the lock and connection opened. >>> >>> Can you see the timeout even when you turn off the selinux driver >>> (security_driver="none' in qemu.conf)? I tried to reproduce the >>> issue yesterday and was unsuccessful. Do you have any steps to >>> reproduce? >> >> So, I haven't been able to actually dig into the debugging but we >> have been able to reproduce this behaviour on x86 (both with SELinux >> and DAC). Looks like a general problem, if a problem at all, because >> from what I can see in the code, the 60 seconds timeout is actually >> intended, or not? The security manager does try for 60 seconds to >> acquire the lock and only then bails out. Why is this? > > The ideal solution would be to just tell virlockd "these are the paths I > want you to lock on my behalf" and virtlockd would use F_SETLKW so that > the moment all paths are unlocked virtlockd will lock them and libvirtd > can continue its execution (i.e. chown() and setfcon()). However, we > can't do this because virtlockd runs single threaded [1] and therefore > if one thread is waiting for lock to be acquired there is no other > thread to unlock the path. > > Therefore I had to move the logic into libvirtd which tries repeatedly > to lock all the paths needed. And this is where the timeout steps in - > the lock acquiring attempts are capped at 60 seconds. This is an > arbitrary chosen timeout. We can make it smaller, but that will not > solve your problem - only mask it. > >> >> However, an actual bug is that while the security manager waits for >> the lock acquire the whole libvirtd hangs, but from what I understood >> and Marc explained to me, this is more of a pathological error in >> libvirt behaviour with various domain locks. >> > > Whole libvirtd shouldn't hang. The main loop doesn’t hang. > Only those threads which try to acquire > domain object lock. IOW it should be possible to run APIs over different > domains (or other objects for that matter). For instance dumpxml over > different domain works just fine. Oh sry, my assumption that the thread A is also holding the virDomainObjList lock is wrong! 
Here is the actual backtrace of a waiting thread (in this case the “worker thread for the "virsh list" command”) (gdb) bt #0 0x7f850887b7ed in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x7f8508874cf4 in pthread_mutex_lock () from /lib64/libpthread.so.0 #2 0x7f850c26cfd9 in virMutexLock (m=) at util/virthread.c:89 #3 0x7f850c2455df in virObjectLock (anyobj=) at util/virobject.c:429 #4 0x7f850c2ceaba in virDomainObjListFilter (list=list@entry=0x7f84f47a97c0, nvms=nvms@entry=0x7f84f47a97c8, conn=conn@entry=0x7f84dc01ad00, filter=filter@entry=0x7f850c318c70 , flags=flags@entry=1) at conf/virdomainobjlist.c:919 #5 0x7f850c2cfef2 in virDomainObjListCollect (domlist=0x7f848c10e290, conn=conn@entry=0x7f84dc01ad00, vms=vms@entry=0x7f84f47a9820, nvms=nvms@entry=0x7f84f47a9830, filter=0x7f850c318c70 , flags=1) at conf/virdomainobjlist.c:961 #6 0x7f850c2d01a6 in virDomainObjListExport (domlist=, conn=0x7f84dc01ad00, domains=0x7f84f47a9890, filter=, flags=) at conf/virdomainobjlist.c:1042 #7 0x7f850c426be9 in virConnectListAllDomains (conn=0x7f84dc01ad00, domains=0x7f84f47a9890, flags=1) at libvirt-domain.c:6493 #8 0x55a82b2d905d in remoteDispatchConnectListAllDomains (server=0x55a82c9e16d0, msg=0x55a82ca53380, args=0x7f84dc01ae30, args=0x7f84dc01ae30, ret=0x7f84dc005bb0, rerr=0x7f84f47a9960, client=) at remote/remote_daemon_dispatch_stubs.h:1372 #9 remoteDispatchConnectListAllDomainsHelper (server=0x55a82c9e16d0, client=, msg=0x55a82ca53380, rerr=0x7f84f47a9960, args=0x7f84dc01ae30, ret=0x7f84dc005bb0) at remote/remote_daemon_dispatch_stubs.h:1348 #10 0x7f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53380, client=0x55a82ca52a40, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437 … virDomainObjListFilter needs the lock of every domain… That’s the actual problem. > > Anyway, we need to get to the bottom of this. Looks like something keeps > the file locked and then when libvirt wants to lock if for metadata the > timeout is hit and whole operation fails. Do you mind running 'lslocks > -u' when starting a domain and before the timeout is hit? > > Michal > > 1: The reason that virtlockd has to run single threaded is stupidity of > POS
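In other words, the daemon-wide symptom comes from ordinary per-domain object locks, not from the file locks themselves. A rough sketch of the pattern (not libvirt's code), assuming a plain mutex per domain object:

/* Rough shape of the contention seen in the backtrace above.  Thread A holds
 * dom->lock for the whole domain start, including the up-to-60-second
 * metadata-lock wait; thread B, servicing "virsh list", walks the domain
 * list and blocks on that same mutex. */
#include <pthread.h>
#include <stddef.h>

struct domain {
    pthread_mutex_t lock;      /* per-domain object lock */
    /* ... definition, state, ... */
};

/* Thread B: collect/filter all domains (cf. virDomainObjListFilter). */
static void
list_domains(struct domain **doms, size_t ndoms)
{
    for (size_t i = 0; i < ndoms; i++) {
        pthread_mutex_lock(&doms[i]->lock);    /* blocks here if a start
                                                  operation owns the lock */
        /* ... check ACLs / copy the name ... */
        pthread_mutex_unlock(&doms[i]->lock);
    }
}

/* Thread A: start a domain (cf. qemuProcessStart -> qemuSecuritySetAllLabel). */
static void
start_domain(struct domain *dom)
{
    pthread_mutex_lock(&dom->lock);
    /* ... fork the namespace helper, wait up to 60s for the metadata lock ... */
    pthread_mutex_unlock(&dom->lock);
}

So any API that needs the stuck domain's object lock, or that walks the whole domain list like virsh list above, blocks for up to the same 60 seconds, while APIs touching only other domains keep working. That is consistent with both Marc's analysis and Michal's earlier remark.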
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On Thu, Sep 27, 2018 at 10:14 AM +0200, Michal Privoznik wrote: > On 09/27/2018 09:57 AM, Marc Hartmayer wrote: >> On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk >> wrote: >>> Michal Privoznik [2018-09-19, 11:45AM +0200]: On 09/19/2018 11:17 AM, Bjoern Walk wrote: > Bjoern Walk [2018-09-12, 01:17PM +0200]: >> Michal Privoznik [2018-09-12, 11:32AM +0200]: >> > > Still seeing the same timeout. Is this expected behaviour? > Nope. I wonder if something has locked the path and forgot to unlock it (however, virtlockd should have unlocked all the paths owned by PID on connection close), or something is still holding the lock and connection opened. Can you see the timeout even when you turn off the selinux driver (security_driver="none' in qemu.conf)? I tried to reproduce the issue yesterday and was unsuccessful. Do you have any steps to reproduce? >>> >>> So, I haven't been able to actually dig into the debugging but we have >>> been able to reproduce this behaviour on x86 (both with SELinux and >>> DAC). Looks like a general problem, if a problem at all, because from >>> what I can see in the code, the 60 seconds timeout is actually intended, >>> or not? The security manager does try for 60 seconds to acquire the lock >>> and only then bails out. Why is this? >> >> Backtrace of libvirtd where it’s hanging (in thread A) >> >> (gdb) bt >> #0 read () from /lib64/libpthread.so.0 >> #1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at >> /usr/include/bits/unistd.h:44 >> #2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at >> util/virfile.c:1061 >> #3 saferead_lim (fd=31, max_len=max_len@entry=1024, >> length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345 >> #4 virFileReadHeaderFD (fd=, maxlen=maxlen@entry=1024, >> buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376 >> #5 virProcessRunInMountNamespace () at util/virprocess.c:1149 >> #6 virSecuritySELinuxTransactionCommit (mgr=, pid=23977) at >> security/security_selinux.c:1106 >> #7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, >> pid=pid@entry=23977) at security/security_manager.c:324 >> #8 virSecurityStackTransactionCommit (mgr=, pid=23977) at >> security/security_stack.c:166 >> #9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, >> pid=pid@entry=23977) at security/security_manager.c:324 >> #10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, >> vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at >> _security.c:56 >> #11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, >> driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, >> asyncJob=a=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, >> snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, >> flags=1 qemu/qemu_process.c:6329 >> #12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, >> driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, >> updatedCPU==0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, >> migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, >> snapshot=0x=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at >> qemu/qemu_process.c:6816 > > > This is just waiting for forked child to finish. Nothing suspicious here. 
> >> >> … >> >> #25 in start_thread () from /lib64/libpthread.so.0 >> #26 in clone () from /lib64/libc.so.6 >> >> >> Backtrace of the forked process where the process is trying to get the >> meta data lock for 60 seconds.> >> #0 0x7f8508572730 in nanosleep () from target:/lib64/libc.so.6 >> #1 0x7f850859dff4 in usleep () from target:/lib64/libc.so.6 >> #2 0x7f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) >> at util/virtime.c:453 >> #3 0x7f850c3108d8 in virSecurityManagerMetadataLock >> (mgr=0x7f848c183850, paths=, npaths=) at >> security/security_manager.c:1345 >> #4 0x7f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, >> opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226 >> #5 0x7f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, >> cb=0x7f850c30e330 , pid=23977, errfd=33) at >> util/virprocess.c:1100 >> #6 virProcessRunInMountNamespace () at util/virprocess.c:1140 >> #7 0x7f850c30e55c in virSecurityDACTransactionCommit (mgr=> out>, pid=23977) at security/security_dac.c:565 >> #8 0x7f850c30eeca in virSecurityManagerTransactionCommit >> (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324 >> #9 0x7f850c30b36b in virSecurityStackTransactionCommit (mgr=> out>, pid=23977) at security/security_stack.c:166 >> #10 0x7f850c30eeca in virSecurityManagerTransactionCommit >> (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324 >> #11 0x7f84e0586bf2 in qemuSecuritySetAllLabel >> (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, >> stdin_path=stdin_path@entry=0x0) at qemu/qemu_secu
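The frames above (virSecurityDACTransactionRun inside virProcessRunInMountNamespace) reflect the transaction design from the cover letter: pending label changes are queued and then committed in one place, possibly from a helper forked into the domain's mount namespace, with the metadata lock wrapped around that single commit. A heavily reduced sketch of the idea, with hypothetical lock helpers and not the real security-driver code:

/* Very reduced sketch of the "transaction" idea: label changes are queued
 * and commit() performs them in one place, so metadata locking reduces to a
 * lock/unlock pair around a single function. */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

struct tx_item {
    const char *path;
    uid_t uid;
    gid_t gid;
};

struct transaction {
    struct tx_item items[64];
    size_t nitems;
};

/* Stand-ins for the virtlockd-backed metadata lock (hypothetical). */
static int  metadata_lock_all(struct transaction *tx)   { (void)tx; return 0; }
static void metadata_unlock_all(struct transaction *tx) { (void)tx; }

static int
transaction_commit(struct transaction *tx)
{
    int ret = -1;

    if (metadata_lock_all(tx) < 0)      /* this is the call that may wait 60s */
        return -1;

    for (size_t i = 0; i < tx->nitems; i++) {
        /* the real code may also read/update XATTRs here first */
        if (chown(tx->items[i].path, tx->items[i].uid, tx->items[i].gid) < 0)
            goto cleanup;
    }
    ret = 0;
 cleanup:
    metadata_unlock_all(tx);
    return ret;
}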
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On 09/27/2018 09:01 AM, Bjoern Walk wrote: > Michal Privoznik [2018-09-19, 11:45AM +0200]: >> On 09/19/2018 11:17 AM, Bjoern Walk wrote: >>> Bjoern Walk [2018-09-12, 01:17PM +0200]: Michal Privoznik [2018-09-12, 11:32AM +0200]: >> >>> >>> Still seeing the same timeout. Is this expected behaviour? >>> >> >> Nope. I wonder if something has locked the path and forgot to >> unlock it (however, virtlockd should have unlocked all the paths >> owned by PID on connection close), or something is still holding >> the lock and connection opened. >> >> Can you see the timeout even when you turn off the selinux driver >> (security_driver="none' in qemu.conf)? I tried to reproduce the >> issue yesterday and was unsuccessful. Do you have any steps to >> reproduce? > > So, I haven't been able to actually dig into the debugging but we > have been able to reproduce this behaviour on x86 (both with SELinux > and DAC). Looks like a general problem, if a problem at all, because > from what I can see in the code, the 60 seconds timeout is actually > intended, or not? The security manager does try for 60 seconds to > acquire the lock and only then bails out. Why is this? The ideal solution would be to just tell virlockd "these are the paths I want you to lock on my behalf" and virtlockd would use F_SETLKW so that the moment all paths are unlocked virtlockd will lock them and libvirtd can continue its execution (i.e. chown() and setfcon()). However, we can't do this because virtlockd runs single threaded [1] and therefore if one thread is waiting for lock to be acquired there is no other thread to unlock the path. Therefore I had to move the logic into libvirtd which tries repeatedly to lock all the paths needed. And this is where the timeout steps in - the lock acquiring attempts are capped at 60 seconds. This is an arbitrary chosen timeout. We can make it smaller, but that will not solve your problem - only mask it. > > However, an actual bug is that while the security manager waits for > the lock acquire the whole libvirtd hangs, but from what I understood > and Marc explained to me, this is more of a pathological error in > libvirt behaviour with various domain locks. > Whole libvirtd shouldn't hang. Only those threads which try to acquire domain object lock. IOW it should be possible to run APIs over different domains (or other objects for that matter). For instance dumpxml over different domain works just fine. Anyway, we need to get to the bottom of this. Looks like something keeps the file locked and then when libvirt wants to lock if for metadata the timeout is hit and whole operation fails. Do you mind running 'lslocks -u' when starting a domain and before the timeout is hit? Michal 1: The reason that virtlockd has to run single threaded is stupidity of POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW, ...) and entering critical section. If another thread does open() + close() on the same file the file is unlocked. Because we can't guarantee this will not happen in multithreaded libvirt we had to introduce a separate process to take care of that. And this process has to be single threaded so there won't ever be the second thread to call close() and unintentionally release the lock. Perhaps we could use OFD locks but those are not part of POSIX and are available on Linux only. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
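For completeness, the OFD locks mentioned in footnote [1] look like the sketch below. They are Linux-only (hence not a portable fix), but because the lock is owned by the open file description rather than the process, the accidental-close() problem goes away. A minimal sketch, assuming glibc exposes F_OFD_SETLK with _GNU_SOURCE:

/* OFD lock sketch (Linux-only, not POSIX): the lock belongs to the open
 * file description behind fd1, so closing the unrelated fd2 no longer
 * releases it -- contrast with plain F_SETLK behaviour. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/ofd-demo";
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1,
                        .l_pid = 0 };   /* must be 0 for OFD locks */

    if (fd1 < 0 || fd2 < 0)
        return 1;
    if (fcntl(fd1, F_OFD_SETLK, &fl) < 0) {
        perror("F_OFD_SETLK");
        return 1;
    }

    close(fd2);   /* the OFD lock taken via fd1 survives this */

    pause();      /* lock is held until fd1 (and any dup of it) is closed */
    return 0;
}

With OFD semantics a multithreaded virtlockd (or even libvirtd itself) could in principle block in F_OFD_SETLKW without fearing that another thread's close() silently drops the lock.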
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On 09/27/2018 09:57 AM, Marc Hartmayer wrote: > On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk > wrote: >> Michal Privoznik [2018-09-19, 11:45AM +0200]: >>> On 09/19/2018 11:17 AM, Bjoern Walk wrote: Bjoern Walk [2018-09-12, 01:17PM +0200]: > Michal Privoznik [2018-09-12, 11:32AM +0200]: >>> > Still seeing the same timeout. Is this expected behaviour? >>> >>> Nope. I wonder if something has locked the path and forgot to unlock it >>> (however, virtlockd should have unlocked all the paths owned by PID on >>> connection close), or something is still holding the lock and connection >>> opened. >>> >>> Can you see the timeout even when you turn off the selinux driver >>> (security_driver="none' in qemu.conf)? I tried to reproduce the issue >>> yesterday and was unsuccessful. Do you have any steps to reproduce? >> >> So, I haven't been able to actually dig into the debugging but we have >> been able to reproduce this behaviour on x86 (both with SELinux and >> DAC). Looks like a general problem, if a problem at all, because from >> what I can see in the code, the 60 seconds timeout is actually intended, >> or not? The security manager does try for 60 seconds to acquire the lock >> and only then bails out. Why is this? > > Backtrace of libvirtd where it’s hanging (in thread A) > > (gdb) bt > #0 read () from /lib64/libpthread.so.0 > #1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at > /usr/include/bits/unistd.h:44 > #2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at > util/virfile.c:1061 > #3 saferead_lim (fd=31, max_len=max_len@entry=1024, > length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345 > #4 virFileReadHeaderFD (fd=, maxlen=maxlen@entry=1024, > buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376 > #5 virProcessRunInMountNamespace () at util/virprocess.c:1149 > #6 virSecuritySELinuxTransactionCommit (mgr=, pid=23977) at > security/security_selinux.c:1106 > #7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, > pid=pid@entry=23977) at security/security_manager.c:324 > #8 virSecurityStackTransactionCommit (mgr=, pid=23977) at > security/security_stack.c:166 > #9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, > pid=pid@entry=23977) at security/security_manager.c:324 > #10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, > vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at _security.c:56 > #11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, > driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, > asyncJob=a=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, > snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=1 > qemu/qemu_process.c:6329 > #12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, > driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, > updatedCPU==0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, > migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, > snapshot=0x=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at > qemu/qemu_process.c:6816 This is just waiting for forked child to finish. Nothing suspicious here. 
> > … > > #25 in start_thread () from /lib64/libpthread.so.0 > #26 in clone () from /lib64/libc.so.6 > > > Backtrace of the forked process where the process is trying to get the > meta data lock for 60 seconds.> > #0 0x7f8508572730 in nanosleep () from target:/lib64/libc.so.6 > #1 0x7f850859dff4 in usleep () from target:/lib64/libc.so.6 > #2 0x7f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) > at util/virtime.c:453 > #3 0x7f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, > paths=, npaths=) at > security/security_manager.c:1345 > #4 0x7f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, > opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226 > #5 0x7f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, > cb=0x7f850c30e330 , pid=23977, errfd=33) at > util/virprocess.c:1100 > #6 virProcessRunInMountNamespace () at util/virprocess.c:1140 > #7 0x7f850c30e55c in virSecurityDACTransactionCommit (mgr= out>, pid=23977) at security/security_dac.c:565 > #8 0x7f850c30eeca in virSecurityManagerTransactionCommit > (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324 > #9 0x7f850c30b36b in virSecurityStackTransactionCommit (mgr= out>, pid=23977) at security/security_stack.c:166 > #10 0x7f850c30eeca in virSecurityManagerTransactionCommit > (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324 > #11 0x7f84e0586bf2 in qemuSecuritySetAllLabel > (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, > stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56 > #12 0x7f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, > driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, > asyncJo
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk wrote: > Michal Privoznik [2018-09-19, 11:45AM +0200]: >> On 09/19/2018 11:17 AM, Bjoern Walk wrote: >> > Bjoern Walk [2018-09-12, 01:17PM +0200]: >> >> Michal Privoznik [2018-09-12, 11:32AM +0200]: >> >> >> >> > >> > Still seeing the same timeout. Is this expected behaviour? >> > >> >> Nope. I wonder if something has locked the path and forgot to unlock it >> (however, virtlockd should have unlocked all the paths owned by PID on >> connection close), or something is still holding the lock and connection >> opened. >> >> Can you see the timeout even when you turn off the selinux driver >> (security_driver="none' in qemu.conf)? I tried to reproduce the issue >> yesterday and was unsuccessful. Do you have any steps to reproduce? > > So, I haven't been able to actually dig into the debugging but we have > been able to reproduce this behaviour on x86 (both with SELinux and > DAC). Looks like a general problem, if a problem at all, because from > what I can see in the code, the 60 seconds timeout is actually intended, > or not? The security manager does try for 60 seconds to acquire the lock > and only then bails out. Why is this? Backtrace of libvirtd where it’s hanging (in thread A) (gdb) bt #0 read () from /lib64/libpthread.so.0 #1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at /usr/include/bits/unistd.h:44 #2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at util/virfile.c:1061 #3 saferead_lim (fd=31, max_len=max_len@entry=1024, length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345 #4 virFileReadHeaderFD (fd=, maxlen=maxlen@entry=1024, buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376 #5 virProcessRunInMountNamespace () at util/virprocess.c:1149 #6 virSecuritySELinuxTransactionCommit (mgr=, pid=23977) at security/security_selinux.c:1106 #7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, pid=pid@entry=23977) at security/security_manager.c:324 #8 virSecurityStackTransactionCommit (mgr=, pid=23977) at security/security_stack.c:166 #9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324 #10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at _security.c:56 #11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=a=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=1 qemu/qemu_process.c:6329 #12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU==0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816 … #25 in start_thread () from /lib64/libpthread.so.0 #26 in clone () from /lib64/libc.so.6 Backtrace of the forked process where the process is trying to get the meta data lock for 60 seconds. 
#0 0x7f8508572730 in nanosleep () from target:/lib64/libc.so.6 #1 0x7f850859dff4 in usleep () from target:/lib64/libc.so.6 #2 0x7f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at util/virtime.c:453 #3 0x7f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, paths=, npaths=) at security/security_manager.c:1345 #4 0x7f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226 #5 0x7f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, cb=0x7f850c30e330 , pid=23977, errfd=33) at util/virprocess.c:1100 #6 virProcessRunInMountNamespace () at util/virprocess.c:1140 #7 0x7f850c30e55c in virSecurityDACTransactionCommit (mgr=, pid=23977) at security/security_dac.c:565 #8 0x7f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324 #9 0x7f850c30b36b in virSecurityStackTransactionCommit (mgr=, pid=23977) at security/security_stack.c:166 #10 0x7f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324 #11 0x7f84e0586bf2 in qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56 #12 0x7f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329 #13 0x7f84e051ee2e in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Michal Privoznik [2018-09-19, 11:45AM +0200]: > On 09/19/2018 11:17 AM, Bjoern Walk wrote: > > Bjoern Walk [2018-09-12, 01:17PM +0200]: > >> Michal Privoznik [2018-09-12, 11:32AM +0200]: > > >> > > > > Still seeing the same timeout. Is this expected behaviour? > > > > Nope. I wonder if something has locked the path and forgot to unlock it > (however, virtlockd should have unlocked all the paths owned by PID on > connection close), or something is still holding the lock and connection > opened. > > Can you see the timeout even when you turn off the selinux driver > (security_driver="none' in qemu.conf)? I tried to reproduce the issue > yesterday and was unsuccessful. Do you have any steps to reproduce? So, I haven't been able to actually dig into the debugging but we have been able to reproduce this behaviour on x86 (both with SELinux and DAC). Looks like a general problem, if a problem at all, because from what I can see in the code, the 60 seconds timeout is actually intended, or not? The security manager does try for 60 seconds to acquire the lock and only then bails out. Why is this? However, an actual bug is that while the security manager waits for the lock acquire the whole libvirtd hangs, but from what I understood and Marc explained to me, this is more of a pathological error in libvirt behaviour with various domain locks. -- IBM Systems Linux on Z & Virtualization Development -- IBM Deutschland Research & Development GmbH Schönaicher Str. 220, 71032 Böblingen Phone: +49 7031 16 1819 -- Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 signature.asc Description: PGP signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Michal Privoznik [2018-09-19, 11:45AM +0200]: > On 09/19/2018 11:17 AM, Bjoern Walk wrote: > > Bjoern Walk [2018-09-12, 01:17PM +0200]: > >> Michal Privoznik [2018-09-12, 11:32AM +0200]: > > >> > > > > Still seeing the same timeout. Is this expected behaviour? > > > > Nope. I wonder if something has locked the path and forgot to unlock it > (however, virtlockd should have unlocked all the paths owned by PID on > connection close), or something is still holding the lock and connection > opened.

I can reproduce on a freshly booted machine. There should be no rogue lock held anywhere.

> Can you see the timeout even when you turn off the selinux driver > (security_driver="none' in qemu.conf)? I tried to reproduce the issue

Yes, same issue.

> yesterday and was unsuccessful. Do you have any steps to reproduce?

0. Host setup:

# /usr/sbin/sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Memory protection checking: actual (secure)
Max kernel policy version: 31

# grep -E "^[^#]" /etc/libvirt/qemu.conf
lock_manager = "lockd"
metadata_lock_manager = "lockd"

1. Define two domains, u1604-1 and u1604-2, using the same image, not shared:

u1604-1 e49679de-eca9-4a72-8d1a-56f437541157 524288 524288 2 hvm destroy restart preserve /usr/bin/qemu-system-s390x

2. Start domain u1604-1:

# ls -Z /var/lib/libvirt/images/u1604.qcow2
system_u:object_r:svirt_image_t:s0:c387,c937 /var/lib/libvirt/images/u1604.qcow2

3. Start domain u1604-2.

4. Result: libvirtd hangs for 60 seconds, after which the expected locking message appears. Security labels of the image are not changed:

# virsh start u1604-2
error: Failed to start domain u1604-2
error: internal error: child reported: resource busy: Lockspace resource '/var/lib/libvirt/images/u1604.qcow2' is locked

# ls -Z /var/lib/libvirt/images/u1604.qcow2
system_u:object_r:svirt_image_t:s0:c387,c937 /var/lib/libvirt/images/u1604.qcow2

Backtrace is the same as the one I sent earlier. Let me know if you need more info or if anything is borked in my setup.

> Michal
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On 09/19/2018 11:17 AM, Bjoern Walk wrote: > Bjoern Walk [2018-09-12, 01:17PM +0200]: >> Michal Privoznik [2018-09-12, 11:32AM +0200]: >> > > Still seeing the same timeout. Is this expected behaviour? > Nope. I wonder if something has locked the path and forgot to unlock it (however, virtlockd should have unlocked all the paths owned by PID on connection close), or something is still holding the lock and connection opened. Can you see the timeout even when you turn off the selinux driver (security_driver="none' in qemu.conf)? I tried to reproduce the issue yesterday and was unsuccessful. Do you have any steps to reproduce? Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Bjoern Walk [2018-09-12, 01:17PM +0200]: > Michal Privoznik [2018-09-12, 11:32AM +0200]: > > On 09/12/2018 07:19 AM, Bjoern Walk wrote: > > > Michal Privoznik [2018-09-10, 11:36AM +0200]: > > >> Technically, this is v4 of: > > >> > > >> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html > > >> > > >> However, this is implementing different approach than any of the > > >> previous versions. > > >> > > >> One of the problems with previous version was that it was too > > >> complicated. The main reason for that was that we could not close the > > >> connection whilst there was a file locked. So we had to invent a > > >> mechanism that would prevent that (on the client side). > > >> > > >> These patches implement different approach. They rely on secdriver's > > >> transactions which bring all the paths we want to label into one place > > >> so that they can be relabelled within different namespace. > > >> I'm extending this idea so that transactions run all the time > > >> (regardless of domain namespacing) and only at the very last moment is > > >> decided which namespace would the relabeling run in. > > >> > > >> Metadata locking is then as easy as putting lock/unlock calls around one > > >> function. > > >> > > >> You can find the patches at my github too: > > >> > > >> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt > > > > > > Hey Michal, > > > > > > is was running a quick test with this patch series with two domains > > > sharing a disk image without and SELinux enabled. When > > > starting the second domain, the whole libvirtd daemon hangs for almost a > > > minute until giving the error that the image is locked. I haven't > > > debugged it yet to figure out what happens. > > > > Is this on two different hosts or one? > > On the same host. > > > I'm unable to reproduce, so can you please attach debugger and share 't > > a a bt' output with me? 
> > (gdb) thread apply all bt > > Thread 17 (Thread 0x3ff48dff910 (LWP 193353)): > #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 > #2 0x03ff6678c5a8 in udevEventHandleThread (opaque=) at > node_device/node_device_udev.c:1603 > #3 0x03ff8c6bad16 in virThreadHelper () from /lib64/libvirt.so.0 > #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 > #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 > > Thread 16 (Thread 0x3ff4acfe910 (LWP 193312)): > #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 > #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 > #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 > #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 > #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 > > Thread 15 (Thread 0x3ff4b4ff910 (LWP 193311)): > #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 > #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 > #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 > #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 > #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 > > Thread 14 (Thread 0x3ff64efd910 (LWP 193310)): > #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 > #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 > #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 > #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 > #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 > > Thread 13 (Thread 0x3ff656fe910 (LWP 193309)): > #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 > #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 > #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 > #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 > #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 > > Thread 12 (Thread 0x3ff65eff910 (LWP 193308)): > #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 > #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 > #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 > #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 > #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 > > Thread 11 (Thread 0
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On 09/10/2018 11:36 AM, Michal Privoznik wrote: > Hopefully, fixed all the nits John found, and pushed. Thanks John for the review! Michal
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Michal Privoznik [2018-09-12, 11:32AM +0200]: > On 09/12/2018 07:19 AM, Bjoern Walk wrote: > > Michal Privoznik [2018-09-10, 11:36AM +0200]: > >> Technically, this is v4 of: > >> > >> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html > >> > >> However, this is implementing different approach than any of the > >> previous versions. > >> > >> One of the problems with previous version was that it was too > >> complicated. The main reason for that was that we could not close the > >> connection whilst there was a file locked. So we had to invent a > >> mechanism that would prevent that (on the client side). > >> > >> These patches implement different approach. They rely on secdriver's > >> transactions which bring all the paths we want to label into one place > >> so that they can be relabelled within different namespace. > >> I'm extending this idea so that transactions run all the time > >> (regardless of domain namespacing) and only at the very last moment is > >> decided which namespace would the relabeling run in. > >> > >> Metadata locking is then as easy as putting lock/unlock calls around one > >> function. > >> > >> You can find the patches at my github too: > >> > >> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt > > > > Hey Michal, > > > > is was running a quick test with this patch series with two domains > > sharing a disk image without and SELinux enabled. When > > starting the second domain, the whole libvirtd daemon hangs for almost a > > minute until giving the error that the image is locked. I haven't > > debugged it yet to figure out what happens. > > Is this on two different hosts or one? On the same host. > I'm unable to reproduce, so can you please attach debugger and share 't > a a bt' output with me? 
(gdb) thread apply all bt Thread 17 (Thread 0x3ff48dff910 (LWP 193353)): #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 #2 0x03ff6678c5a8 in udevEventHandleThread (opaque=) at node_device/node_device_udev.c:1603 #3 0x03ff8c6bad16 in virThreadHelper () from /lib64/libvirt.so.0 #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 Thread 16 (Thread 0x3ff4acfe910 (LWP 193312)): #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 Thread 15 (Thread 0x3ff4b4ff910 (LWP 193311)): #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 Thread 14 (Thread 0x3ff64efd910 (LWP 193310)): #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 Thread 13 (Thread 0x3ff656fe910 (LWP 193309)): #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 Thread 12 (Thread 0x3ff65eff910 (LWP 193308)): #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 #2 0x03ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0 #3 0x03ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0 #4 0x03ff8b7079a8 in start_thread () from /lib64/libpthread.so.0 #5 0x03ff8b5f9706 in thread_start () from /lib64/libc.so.6 Thread 11 (Thread 0x3ff67fff910 (LWP 193307)): #0 0x03ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x03ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0 #2 0x03ff8c6bbba0 in virThreadPoolWorker () from /lib64/li
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
On 09/12/2018 07:19 AM, Bjoern Walk wrote: > Michal Privoznik [2018-09-10, 11:36AM +0200]: >> Technically, this is v4 of: >> >> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html >> >> However, this is implementing different approach than any of the >> previous versions. >> >> One of the problems with previous version was that it was too >> complicated. The main reason for that was that we could not close the >> connection whilst there was a file locked. So we had to invent a >> mechanism that would prevent that (on the client side). >> >> These patches implement different approach. They rely on secdriver's >> transactions which bring all the paths we want to label into one place >> so that they can be relabelled within different namespace. >> I'm extending this idea so that transactions run all the time >> (regardless of domain namespacing) and only at the very last moment is >> decided which namespace would the relabeling run in. >> >> Metadata locking is then as easy as putting lock/unlock calls around one >> function. >> >> You can find the patches at my github too: >> >> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt > > Hey Michal, > > is was running a quick test with this patch series with two domains > sharing a disk image without and SELinux enabled. When > starting the second domain, the whole libvirtd daemon hangs for almost a > minute until giving the error that the image is locked. I haven't > debugged it yet to figure out what happens. Is this on two different hosts or one? I'm unable to reproduce, so can you please attach debugger and share 't a a bt' output with me? When I try to reproduce I always get one domain running and the other failing to start because qemu is unable to do its locking. When I put in both, they start successfully. TBH, I don't run SELinux enabled host so I'm testing DAC only, but the changes to the code are merely the same. So I'm wondering if this is really an issue in my SELinux impl or somewhere else. BTW: The one minute timeout comes from patch 16/23: #define LOCK_ACQUIRE_TIMEOUT 60 Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Michal Privoznik [2018-09-10, 11:36AM +0200]: > Technically, this is v4 of: > > https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html > > However, this is implementing different approach than any of the > previous versions. > > One of the problems with previous version was that it was too > complicated. The main reason for that was that we could not close the > connection whilst there was a file locked. So we had to invent a > mechanism that would prevent that (on the client side). > > These patches implement different approach. They rely on secdriver's > transactions which bring all the paths we want to label into one place > so that they can be relabelled within different namespace. > I'm extending this idea so that transactions run all the time > (regardless of domain namespacing) and only at the very last moment is > decided which namespace would the relabeling run in. > > Metadata locking is then as easy as putting lock/unlock calls around one > function. > > You can find the patches at my github too: > > https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt Hey Michal, I was running a quick test with this patch series with two domains sharing a disk image (without <shareable/>) and SELinux enabled. When starting the second domain, the whole libvirtd daemon hangs for almost a minute until giving the error that the image is locked. I haven't debugged it yet to figure out what happens. Otherwise it's looking good, relabeling is prevented as expected. Bjoern