On Wed, Apr 11, 2018 at 12:45 PM, Alona Kaplan <alkap...@redhat.com> wrote:
> On Tue, Apr 10, 2018 at 6:52 PM, Gal Ben Haim <gbenh...@redhat.com> wrote:
>
>> I'm seeing the same error in [1], during 006_migrations.migrate_vm.
>>
>> [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/
>
> Seems like another bug. The migration failed since, for some reason, the VM
> is already defined on the destination host.
>
> 2018-04-10 11:08:08,685-0400 ERROR (jsonrpc/0) [api] FINISH create
> error=Virtual machine already exists (api:129)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create
>     raise exception.VMExists()
> VMExists: Virtual machine already exists

Milan, Francesco, could it be that because of [1], which appears on the
destination host right after the VM is shut down, the VM remained defined on
that host?

[1] 2018-04-10 11:01:40,005-0400 ERROR (libvirt/events) [vds] Error running VM
callback (clientIF:683)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in dispatchLibvirtEvents
    v.onLibvirtLifecycleEvent(event, detail, None)
AttributeError: 'NoneType' object has no attribute 'onLibvirtLifecycleEvent'

>> On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan <alkap...@redhat.com> wrote:
>>
>>> Hi all,
>>>
>>> Looking at the log, it seems that the new GetCapabilitiesAsync is
>>> responsible for the mess.
>>>
>>> - 08:29:47 - engine loses connectivity to host
>>>   'lago-basic-suite-4-2-host-0'.
>>>
>>> - Every 3 seconds a getCapabilitiesAsync request is sent to the host
>>>   (unsuccessfully).
>>>
>>>   * Before each getCapabilitiesAsync, the monitoring lock is taken
>>>     (VdsManager.refreshImpl).
>>>
>>>   * getCapabilitiesAsync immediately fails and throws 'VDSNetworkException:
>>>     java.net.ConnectException: Connection refused'. The exception is caught
>>>     by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand', which
>>>     calls 'onFailure' of the callback and re-throws the exception.
>>>
>>>         catch (Throwable t) {
>>>             getParameters().getCallback().onFailure(t);
>>>             throw t;
>>>         }
>>>
>>>   * The 'onFailure' of the callback releases the monitoring lock
>>>     ('postProcessRefresh() -> afterRefreshTreatment() -> if (!succeeded)
>>>     lockManager.releaseLock(monitoringLock);').
>>>
>>>   * 'VdsManager.refreshImpl' catches the network exception, marks
>>>     'releaseLock = true' and *tries to release the already-released lock*.
>>>     The following warning is printed to the log:
>>>
>>>         WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>>>         (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to
>>>         release exclusive lock which does not exist, lock key:
>>>         'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'
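>>>
>>> To make the double release concrete, here is a minimal, self-contained
>>> sketch of the pattern. All class names and the lock key are made-up
>>> stand-ins, not the real engine code: the command's catch block fires the
>>> callback, which releases the lock, and then rethrows, so the caller's
>>> catch block releases the same lock a second time and triggers the warning
>>> above.
>>>
>>>     import java.util.Map;
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>
>>>     public class DoubleReleaseSketch {
>>>
>>>         /** Stand-in for InMemoryLockManager: releasing an absent key only warns. */
>>>         static class LockManager {
>>>             private final Map<String, Boolean> locks = new ConcurrentHashMap<>();
>>>
>>>             boolean acquire(String key) {
>>>                 return locks.putIfAbsent(key, Boolean.TRUE) == null;
>>>             }
>>>
>>>             void release(String key) {
>>>                 if (locks.remove(key) == null) {
>>>                     System.out.println("WARN Trying to release exclusive lock"
>>>                             + " which does not exist, lock key: '" + key + "'");
>>>                 }
>>>             }
>>>         }
>>>
>>>         interface Callback {
>>>             void onFailure(Throwable t);
>>>         }
>>>
>>>         static final LockManager LOCK_MANAGER = new LockManager();
>>>         static final String MONITORING_LOCK = "<vdsId>VDS_INIT"; // made-up key
>>>
>>>         /** Mirrors the quoted catch block: notify the callback, then rethrow. */
>>>         static void getCapabilitiesAsync(Callback callback) {
>>>             try {
>>>                 throw new RuntimeException("Connection refused");
>>>             } catch (Throwable t) {
>>>                 callback.onFailure(t); // first release happens in here
>>>                 throw t;               // the caller catches and releases again
>>>             }
>>>         }
>>>
>>>         public static void main(String[] args) {
>>>             LOCK_MANAGER.acquire(MONITORING_LOCK); // as in refreshImpl
>>>             Callback callback = t -> LOCK_MANAGER.release(MONITORING_LOCK);
>>>             try {
>>>                 getCapabilitiesAsync(callback);
>>>             } catch (Throwable t) {
>>>                 // Like refreshImpl's 'releaseLock = true' path: a second
>>>                 // release, which prints the "does not exist" warning.
>>>                 LOCK_MANAGER.release(MONITORING_LOCK);
>>>             }
>>>         }
>>>     }
>>>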
>>> - 08:30:51 - a successful getCapabilitiesAsync is sent.
>>>
>>> - 08:32:55 - The failing test starts (Setup Networks for setting ipv6).
>>>
>>>   * SetupNetworks takes the monitoring lock.
>>>
>>> - 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from
>>>   4 minutes ago from its queue and prints 'VDSNetworkException: Vds timeout
>>>   occured'.
>>>
>>>   * When the first request is removed from the queue
>>>     ('ResponseTracker.remove()'), *'Callback.onFailure' is invoked (for the
>>>     second time) -> the monitoring lock is released (the lock taken by
>>>     SetupNetworks!)*.
>>>
>>>   * *The other requests removed from the queue also try to release the
>>>     monitoring lock*, but there is nothing to release. The following
>>>     warning is printed to the log:
>>>
>>>         WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>>>         (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to
>>>         release exclusive lock which does not exist, lock key:
>>>         'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'
>>>
>>> - *08:33:00 - SetupNetworks fails on timeout ~4 seconds after it started*.
>>>   Why? I'm not 100% sure, but I guess the root cause is the late processing
>>>   of the 'getCapabilitiesAsync' failures, which caused the loss of the
>>>   monitoring lock, combined with the late and multiple handling of each
>>>   failure.
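>>>
>>> One possible direction (only a sketch with a hypothetical API, not the
>>> engine's actual lock manager): hand out an ownership token per acquisition
>>> and honor a release only while the token still matches. A late
>>> ResponseTracker callback holding a stale token then cannot free the lock
>>> that SetupNetworks has meanwhile acquired, and making 'onFailure'
>>> idempotent per request would likewise stop the repeated releases.
>>>
>>>     import java.util.Map;
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>     import java.util.concurrent.atomic.AtomicLong;
>>>
>>>     public class OwnedLockSketch {
>>>
>>>         static class OwnedLockManager {
>>>             private final AtomicLong tokens = new AtomicLong();
>>>             private final Map<String, Long> locks = new ConcurrentHashMap<>();
>>>
>>>             /** Acquire the key; returns an ownership token, or null if held. */
>>>             Long acquire(String key) {
>>>                 Long token = tokens.incrementAndGet();
>>>                 return locks.putIfAbsent(key, token) == null ? token : null;
>>>             }
>>>
>>>             /** Release succeeds only for the current owner's token. */
>>>             boolean release(String key, Long token) {
>>>                 return locks.remove(key, token);
>>>             }
>>>         }
>>>
>>>         public static void main(String[] args) {
>>>             OwnedLockManager manager = new OwnedLockManager();
>>>             String key = "<vdsId>VDS_INIT"; // made-up key
>>>
>>>             // 08:29:47 - monitoring acquires the lock, releases it on failure.
>>>             Long monitoringToken = manager.acquire(key);
>>>             manager.release(key, monitoringToken);
>>>
>>>             // 08:32:55 - SetupNetworks acquires the same key.
>>>             Long setupNetworksToken = manager.acquire(key);
>>>
>>>             // 08:33:00 - the late ResponseTracker callback invokes onFailure
>>>             // again and retries the release with its stale token: refused.
>>>             System.out.println("stale release: "
>>>                     + manager.release(key, monitoringToken));      // false
>>>
>>>             // SetupNetworks still owns the lock and releases it normally.
>>>             System.out.println("owner release: "
>>>                     + manager.release(key, setupNetworksToken));   // true
>>>         }
>>>     }
>>>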
>>> Ravi, the 'getCapabilitiesAsync' failure is handled twice, and the lock is
>>> released (or a release is attempted) three times. Please share your opinion
>>> regarding how it should be fixed.
>>>
>>> Thanks,
>>>
>>> Alona.
>>>
>>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>
>>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas <eh...@redhat.com> wrote:
>>>>
>>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri <ee...@redhat.com> wrote:
>>>>>
>>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851.
>>>>>> Is it still failing?
>>>>>>
>>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren <bkor...@redhat.com> wrote:
>>>>>>
>>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>>>>> > No, I am afraid that we have not managed to understand why setting
>>>>>>> > an ipv6 address took the host off the grid. We shall continue
>>>>>>> > researching this next week.
>>>>>>> >
>>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but
>>>>>>> > could it possibly be related (I really doubt that)?
>>>>>
>>>>> Sorry, but I do not see how this problem is related to VDSM.
>>>>> There is nothing that indicates that there is a VDSM problem.
>>>>>
>>>>> Has the RPC connection between Engine and VDSM failed?
>>>>
>>>> Further up the thread, Piotr noticed that (at least on one failure of
>>>> this test) the Vdsm host lost connectivity to its storage, and the Vdsm
>>>> process was restarted. However, this does not seem to happen in all cases
>>>> where this test fails.
>>
>> --
>> Gal Ben Haim
>> RHV DevOps

_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel