This [1] should fix the multiple release lock issue.

[1] https://gerrit.ovirt.org/#/c/90077/
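For context, the root cause described below is that the command callback's
onFailure() can run more than once for the same request (once on the immediate
ConnectException and again when ResponseTracker times the request out), and
every run releases the monitoring lock. The sketch below only illustrates the
general idea of making the release single-shot, using hypothetical names
(SingleReleaseCallback, releaseMonitoringLock); it is not the content of [1]:

    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical sketch, not engine code: the callback owns a single-shot
    // guard so onFailure() releases the monitoring lock at most once, no
    // matter how many times it is invoked.
    public class SingleReleaseCallback {

        private final Runnable releaseMonitoringLock;
        private final AtomicBoolean released = new AtomicBoolean(false);

        public SingleReleaseCallback(Runnable releaseMonitoringLock) {
            this.releaseMonitoringLock = releaseMonitoringLock;
        }

        public void onFailure(Throwable t) {
            // compareAndSet guarantees the release runs at most once, even if
            // onFailure is called concurrently from different threads.
            if (released.compareAndSet(false, true)) {
                releaseMonitoringLock.run();
            }
            // Later invocations are ignored instead of releasing a lock that
            // may meanwhile belong to another flow (e.g. SetupNetworks).
        }
    }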
On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori <rn...@redhat.com> wrote:
> Working on a patch, will post a fix.
>
> Thanks
>
> Ravi
>
> On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan <alkap...@redhat.com> wrote:
>
>> Hi all,
>>
>> Looking at the log it seems that the new GetCapabilitiesAsync is
>> responsible for the mess.
>>
>> *- 08:29:47 - engine loses connectivity to host
>> 'lago-basic-suite-4-2-host-0'.*
>>
>> *- Every 3 seconds a getCapabilitiesAsync request is sent to the host
>> (unsuccessfully).*
>>
>>   * Before each "getCapabilitiesAsync" the monitoring lock is taken
>>     (VdsManager.refreshImpl).
>>
>>   * "getCapabilitiesAsync" immediately fails and throws
>>     'VDSNetworkException: java.net.ConnectException: Connection refused'.
>>     The exception is caught by
>>     'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand', which calls
>>     'onFailure' of the callback and re-throws the exception.
>>
>>     catch (Throwable t) {
>>         getParameters().getCallback().onFailure(t);
>>         throw t;
>>     }
>>
>>   * The 'onFailure' of the callback releases the "monitoringLock"
>>     ('postProcessRefresh() -> afterRefreshTreatment() -> if (!succeeded)
>>     lockManager.releaseLock(monitoringLock);').
>>
>>   * 'VdsManager.refreshImpl' catches the network exception, marks
>>     'releaseLock = true' and *tries to release the already released lock*.
>>
>>     The following warning is printed to the log -
>>
>>     WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>>     (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release
>>     exclusive lock which does not exist, lock key:
>>     'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'
>>
>> *- 08:30:51 - a successful getCapabilitiesAsync is sent.*
>>
>> *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6).*
>>
>>   * SetupNetworks takes the monitoring lock.
>>
>> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from
>> 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout
>> occurred.*
>>
>>   * When the first request is removed from the queue
>>     ('ResponseTracker.remove()'), *'Callback.onFailure' is invoked (for the
>>     second time) -> the monitoring lock is released (the lock taken by
>>     SetupNetworks!).*
>>
>>   * *The other requests removed from the queue also try to release the
>>     monitoring lock*, but there is nothing to release.
>>
>>     The following warning log is printed -
>>
>>     WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>>     (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release
>>     exclusive lock which does not exist, lock key:
>>     'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'
>>
>> *- 08:33:00 - SetupNetworks fails on timeout ~4 seconds after it started.*
>> Why? I'm not 100% sure, but I guess the root cause is the late processing
>> of 'getCapabilitiesAsync', which causes the loss of the monitoring lock,
>> plus the late and repeated processing of the failure.
>>
>> Ravi, the 'getCapabilitiesAsync' failure is handled twice and the lock
>> release is attempted three times. Please share your opinion regarding how
>> it should be fixed.
>>
>> Thanks,
>>
>> Alona.
>>
>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>
>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas <eh...@redhat.com> wrote:
>>>
>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri <ee...@redhat.com> wrote:
>>>>
>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851.
>>>>> Is it still failing?
>>>>>
>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren <bkor...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>>>> > No, I am afraid that we have not managed to understand why setting an
>>>>>> > ipv6 address took the host off the grid. We shall continue researching
>>>>>> > this next week.
>>>>>> >
>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but
>>>>>> > could it possibly be related (I really doubt that)?
>>>>>> >
>>>>>>
>>>>>
>>>>
>>>> Sorry, but I do not see how this problem is related to VDSM.
>>>> There is nothing that indicates that there is a VDSM problem.
>>>>
>>>> Has the RPC connection between Engine and VDSM failed?
>>>>
>>>
>>> Further up the thread, Piotr noticed that (at least on one failure of
>>> this test) the Vdsm host lost connectivity to its storage, and the Vdsm
>>> process was restarted. However, this does not seem to happen in all cases
>>> where this test fails.
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>
>>