Re: [ovirt-users] oVirt DR: ansible with 4.1, only a subset of storage domain replicated

2018-02-06 Thread Maor Lipchuk
On Tue, Feb 6, 2018 at 11:32 AM, Luca 'remix_tj' Lorenzetto <
lorenzetto.l...@gmail.com> wrote:

> On Mon, Feb 5, 2018 at 7:20 PM, Maor Lipchuk  wrote:
> > Hi Luca,
> >
> > Thank you for your interest in the Disaster Recovery Ansible solution; it
> > is great to see users getting familiar with it.
> > Please see my comments inline
> >
> > Regards,
> > Maor
> >
> > On Mon, Feb 5, 2018 at 7:54 PM, Yaniv Kaul  wrote:
> >>
> >>
> >>
> >> On Feb 5, 2018 5:00 PM, "Luca 'remix_tj' Lorenzetto"
> >>  wrote:
> >>
> >> Hello,
> >>
> >> I'm starting the implementation of our disaster recovery site with RHV
> >> 4.1.latest for our production environment.
> >>
> >> Our production setup is quite simple, with the self-hosted engine in DC
> >> KVMPDCA and virtual machines in both the KVMPDCA and KVMPD DCs. Our whole
> >> setup has an FC storage backend: EMC VPLEX/VMAX in KVMPDCA and EMC
> >> VNX8000. Both storage arrays support replication via their own
> >> replication protocols (SRDF, MirrorView), so we'd like to delegate to
> >> them the replication of data to the remote site, which is located in
> >> another datacenter.
> >>
> >> In the KVMPD DC we have some storage domains that contain non-critical
> >> VMs, which we don't want to replicate to the remote site (in case of
> >> failure they have a low priority and will be restored from a backup).
> >> Since we won't replicate them, they will not be available for
> >> attachment on the remote site. Could this be an issue? Do we need to
> >> replicate everything?
> >
> >
> > No, it is not required to replicate everything.
> > If none of the disks attached to your critical VMs/Templates reside on
> > those storage domains, you don't have to include them in your mapping
> > var file.
> >
>
> Excellent.
>
> >>
> >> What about the master domain? Does the master storage domain need to
> >> stay on a replicated volume, or can it be any of the available ones?
> >
> >
> >
> > You can choose which storage domains you want to recover.
> > Basically, if a storage domain is marked as "master" in the mapping var
> > file, then it should be attached first to the Data Center.
> > If your secondary setup already contains a master storage domain which
> > you don't care to replicate and recover, then you can configure your
> > mapping var file to attach only regular storage domains: simply set
> > "dr_master_domain: False" in dr_import_storages for all the storage
> > domains. (You can contact me on IRC if you need some guidance with it.)
> >
>
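The rule in Maor's answer (leave non-replicated domains out of dr_import_storages and mark every imported domain with dr_master_domain: False, because the secondary site already has its own master) can be sketched in Python. Only dr_import_storages and dr_master_domain come from this thread; the other keys (dr_domain_type, dr_primary_name, dr_primary_dc_name, dr_secondary_name, dr_secondary_dc_name) and the input list shape are assumptions about the ovirt-ansible-disaster-recovery mapping file, so compare them with the template your version of the role generates. The output is JSON, which Ansible should also accept as a vars file.

```python
import json


def build_import_storages(domains, primary_dc, secondary_dc):
    """Build the dr_import_storages section of a DR mapping var file.

    `domains` is a hypothetical input: a list of dicts with keys
    "name", "type" and "replicated" (plus an optional "secondary_name").
    """
    entries = []
    for dom in domains:
        if not dom["replicated"]:
            # Not replicated by the storage array: simply leave it out.
            continue
        entries.append({
            # Key names below (except dr_master_domain) are assumptions
            # about the mapping file format; verify against your role version.
            "dr_domain_type": dom["type"],
            "dr_primary_name": dom["name"],
            "dr_primary_dc_name": primary_dc,
            "dr_secondary_name": dom.get("secondary_name", dom["name"]),
            "dr_secondary_dc_name": secondary_dc,
            # The secondary site already has a master domain, so every
            # imported domain is attached as a regular one.
            "dr_master_domain": False,
        })
    return {"dr_import_storages": entries}


def to_vars_file(mapping):
    """Serialize as JSON, which YAML-based tools such as Ansible generally accept."""
    return json.dumps(mapping, indent=2)
```

Non-critical domains never reach the file, so nothing on the remote site is expected to provide them.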
> Good, that's my case. I don't need a new master domain on the remote
> side, because it is an already up-and-running setup where I want to
> attach the replicated storage and run the critical VMs.
>
>
>
> >>
> >>
> >> I've seen that since 4.1 there's an API for updating OVF_STORE disks.
> >> Do we need to invoke it at a frequency compatible with the
> >> replication frequency on the storage side?
> >
> >
> >
> > No, you don't have to call the OVF_STORE update API for replication.
> > The OVF_STORE disk is updated every 60 minutes (the default
> > configuration value).
> >
>
> What I need is for the VM information to be replicated to the remote
> site along with the disks.
> In an older test I had the issue that the disks were replicated to the
> remote site, but the VM configuration was not!
> I found the disks in the "Disk" tab of the storage domain, but nothing
> in VM Import.
>


Can you reproduce it and attach the logs of the setup before the disaster
and after the recovery?
That could happen with newly created VMs and Templates which were not
yet written to the OVF_STORE disk, because the OVF_STORE update process
had not run yet before the disaster.
Since the time of a disaster can't be anticipated, gaps like this can
happen.


>
> >>
> >> At the moment we have set the RPO to 1 hr (even though the planned RPO
> >> is 2 hrs). Does the OVF_STORE get updated at the required frequency?
> >
> >
> >
> > The OVF_STORE disk is updated every 60 minutes, but keep in mind that
> > the OVF_STORE is updated internally by the engine, so it might not be
> > synced with the RPO you configured.
> > If I understood correctly, you are right in noting that the storage
> > domain data will be synced at approximately 2 hours = RPO of 1 hr +
> > OVF_STORE update interval of 1 hr.
> >
>
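Maor's estimate is a worst-case sum of two independent intervals. A small Python sketch of that arithmetic, assuming the storage replication runs on a fixed interval and the engine rewrites the OVF_STORE every 60 minutes (the default mentioned above; as far as I know the engine-config key is OvfUpdateIntervalInMinutes, but verify that on your engine before relying on it):

```python
def worst_case_config_age(replication_interval_min, ovf_update_interval_min=60):
    """Upper bound, in minutes, on how stale recovered VM definitions can be.

    Worst alignment: a VM is changed right after an OVF_STORE update, the
    next update lands right after a replication cycle, and the disaster
    strikes just before the following cycle ships it.
    """
    return replication_interval_min + ovf_update_interval_min


def meets_target(target_min, replication_interval_min, ovf_update_interval_min=60):
    """True if the worst case stays within the required recovery point."""
    worst = worst_case_config_age(replication_interval_min, ovf_update_interval_min)
    return worst <= target_min
```

With Luca's numbers (1 hr replication, 2 hr requirement) the bound is met exactly, with no margin; shortening either interval adds margin.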
> We need to be able to recover VMs in the state they were in up to 2
> hours ago. From what you say, I think we'll manage that even in the
> worst case.
>
> [cut]
> >
> > Indeed.
> > We also introduced several functionalities, like detaching the master
> > storage domain and attaching a "dirty" master storage domain, which the
> > failover process depends on, so unfortunately, to support a full
> > recovery process you will need an oVirt 4.2 environment.
> >
>
> OK, but if I keep the master storage domain on a non-replicated volume,
> do I still need this functionality?
>


Basically it should also fail on VM/Template registration in oVirt 4.1
since there are also other functionalities like mapping of O

Re: [ovirt-users] Memory leaks in ovirt-ha-agent, vdsmd

2018-02-06 Thread Roy Golan
On Tue, 6 Feb 2018 at 19:12 Chris Adams  wrote:

> I regularly see memory leaks in ovirt-ha-agent and vdsmd.  For example,
> I have a two-node 4.2.0 test setup with a hosted engine on iSCSI.  Right
> now, vdsmd on one node is using 7.8G RAM, and ovirt-ha-agent is using
> 1.1G on each node.
>
> I've had this kind of problem with 4.1 production systems as well; it
> just seems to be a recurring issue.  I have to periodically go through
> and restart these services on the nodes.  Occasionally I see sanlock use
> up a bunch of RAM as well.
>
>
Can you please report it on bugzilla?


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Memory leaks in ovirt-ha-agent, vdsmd

2018-02-06 Thread Chris Adams
I regularly see memory leaks in ovirt-ha-agent and vdsmd.  For example,
I have a two-node 4.2.0 test setup with a hosted engine on iSCSI.  Right
now, vdsmd on one node is using 7.8G RAM, and ovirt-ha-agent is using
1.1G on each node.

I've had this kind of problem with 4.1 production systems as well; it
just seems to be a recurring issue.  I have to periodically go through
and restart these services on the nodes.  Occasionally I see sanlock use
up a bunch of RAM as well.

-- 
Chris Adams 
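For a bug report, a series of memory readings usually helps more than a single snapshot. A minimal Python sketch (not an oVirt tool) that reads VmRSS from /proc/<pid>/status for every process whose command line contains a given substring. The patterns you would pass ("vdsm", "ovirt-ha-agent", "sanlock") are assumptions about how these daemons appear on your hosts, so check `ps` first; note that "vdsm" will also match supervdsmd.

```python
import os


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def rss_by_pattern(pattern, proc="/proc"):
    """Map pid -> resident memory in kB for processes whose command line contains pattern."""
    result = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace")
            if pattern not in cmdline:
                continue
            with open(os.path.join(base, "status")) as f:
                rss = parse_vmrss_kb(f.read())
        except OSError:
            continue  # the process exited while we were reading it
        if rss is not None:
            result[int(entry)] = rss
    return result
```

Run hourly from cron and append a timestamped line per pid to a file; a number that keeps growing for the same pid between restarts is the pattern worth attaching to the report.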


[ovirt-users] test

2018-02-06 Thread Konstantinos Bonaros
test, please ignore


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Martin Sivak
> This is expected behaviour, even if it’s not very bright. It’s being used as
> a way to detect whether the network is operating correctly.

Correct, it is used to check whether users can reach the host and the
VM that runs on it. There aren't many ways to check that, and all of
them require data exchange of some kind (ICMP request/response, TCP
SYN/ACK, some UDP echo...).
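Sending ICMP echo from Python needs raw sockets and therefore root, so the stdlib-only sketch below illustrates the TCP SYN/ACK option instead: it reports whether a TCP handshake with a host and port completes within a timeout. It is only an illustration of the alternative; the hosted-engine agent's actual check is a ping, per Martin's other message in this thread.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP handshake to (host, port) completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable all count as "not reachable".
        return False
```

The trade-off is that the probe now depends on a listening service at the target, which is a different failure mode from a firewall dropping ICMP.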

> It is insane, as there are so many ways it can break. My network admin turns
> off ICMP responses, and it’s death to the network.

ICMP is an important signaling mechanism. Seriously, it is usually a
bad idea to block it.

> I got this trying to install on a network without a gateway.

How were your users accessing the VMs? Was this some kind of super
secure deployment with no outside connectivity?


Best regards

Martin Sivak

On Tue, Feb 6, 2018 at 4:32 PM, Ben De Luca  wrote:
> This is expected behaviour, even if it’s not very bright. It’s being used as
> a way to detect whether the network is operating correctly.
>
> I got this trying to install on a network without a gateway.
>
> It is insane, as there are so many ways it can break. My network admin turns
> off ICMP responses, and it’s death to the network.


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Martin Sivak
Hi,

We use the ping check to see whether the host running the hosted engine
has connectivity with the rest of the cluster and the users. We kill the
VM in the hope that some other host will make the engine available to
users again.

We use the gateway by default, as it is pretty common to have a separate
network for the data center, but you can change the address if your
topology is different.
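Changing the probe address means editing the hosted-engine configuration on the hosts. Assuming the address is stored as a `gateway=` line in /etc/ovirt-hosted-engine/hosted-engine.conf (an assumption; check where your oVirt version keeps it, and whether it also has a shared copy, before editing anything), a minimal Python helper to rewrite that line might look like this. It validates the address first, since a bogus probe target would presumably trigger exactly the engine restarts described in this thread.

```python
import ipaddress


def set_conf_value(conf_text, key, value):
    """Return conf_text with `key=value`, replacing the existing line or appending one."""
    prefix = key + "="
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            break
    else:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


def set_probe_address(conf_text, address):
    """Set the (assumed) `gateway=` probe address after checking it is a valid IP."""
    ipaddress.ip_address(address)  # raises ValueError on typos like "10.0.0.300"
    return set_conf_value(conf_text, "gateway", address)
```

Read the file, keep a backup, write the result back, and repeat on every hosted-engine host so they all probe the same address.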

Best regards

Martin Sivak

On Tue, Feb 6, 2018 at 4:27 PM, Alex K  wrote:
> Hi,
>
> I have seen hosts rendered unresponsive when the gateway is lost.
> I will be able to provide more info once I prepare an environment and test
> this further.
>
> Thanx,
> Alex


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Ben De Luca
This is expected behaviour, even if it’s not very bright. It’s being used
as a way to detect whether the network is operating correctly.

I got this trying to install on a network without a gateway.

It is insane, as there are so many ways it can break. My network admin turns
off ICMP responses, and it’s death to the network.

On Tue 6. Feb 2018 at 16:27, Alex K  wrote:

> Hi,
>
> I have seen hosts rendered unresponsive when the gateway is lost.
> I will be able to provide more info once I prepare an environment and test
> this further.
>
> Thanx,
> Alex


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Alex K
Hi,

I have seen hosts rendered unresponsive when the gateway is lost.
I will be able to provide more info once I prepare an environment and test
this further.

Thanx,
Alex

On Tue, Feb 6, 2018 at 10:40 AM, Yaniv Kaul  wrote:

>
>
> On Feb 5, 2018 2:21 PM, "Alex K"  wrote:
>
> Hi all,
>
> I have a 3-node oVirt 4.1 cluster, self-hosted on top of GlusterFS. The
> cluster is used to host several VMs.
> I have observed that when the gateway is lost (say the gateway device is
> down), the oVirt cluster goes down.
>
>
> Is the cluster down, or just the self-hosted engine?
>
>
> This seems rather extreme, especially when one does not care whether the
> hosted VMs have connectivity to the Internet or not.
>
>
> Are the VMs down?
> The hosts?
> Y.
>
>
> Can this behavior be disabled?
>
> Thanx,
> Alex
>


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Darrell Budic
I’ve seen something like this happen on my systems: the gateway IP goes down
for some reason, and the engine restarts repeatedly, rendering it unusable,
even though it’s on the same IP subnet as all the host boxes and can still
talk to the VDSMs. In my case it doesn’t hurt the cluster or DC, but it’s
annoying and unnecessary in my environment, where the gateway isn’t important
for cluster communications.

I can understand why pinging the gateway IP became a proxy test for network
connectivity, but it isn’t always valid, and maybe the local admin should
have a choice in how it’s used. Something like the current fencing option
for “50% hosts down” could serve as a double check: if you can still reach
the VDSM hosts, don’t restart the engine VM.
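Darrell's proposal can be written down as a small decision rule. This is a hypothetical sketch, not an existing oVirt option: Probe, should_restart_engine_vm and the 0.5 threshold are illustrative names modeled on the "50% hosts down" fencing idea.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    gateway_ok: bool      # result of the existing gateway ping
    hosts_reachable: int  # VDSM hosts that still answered
    hosts_total: int      # VDSM hosts in the cluster


def should_restart_engine_vm(p: Probe, min_host_fraction: float = 0.5) -> bool:
    """Restart only if the gateway check fails AND fewer than
    min_host_fraction of the hosts are reachable."""
    if p.gateway_ok:
        return False
    if p.hosts_total == 0:
        # Nothing to double-check against: fall back to gateway-only behavior.
        return True
    return p.hosts_reachable / p.hosts_total < min_host_fraction
```

With this rule, losing only the gateway while every host still answers leaves the engine VM alone, which is the behavior Darrell is asking for.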

  -Darrell
> From: Yaniv Kaul 
> Subject: Re: [ovirt-users] ovirt and gateway behavior
> Date: February 6, 2018 at 2:40:14 AM CST
> To: Alex
> Cc: Ovirt Users
> 
> 
> 
> On Feb 5, 2018 2:21 PM, "Alex K" wrote:
> Hi all, 
> 
> I have a 3-node oVirt 4.1 cluster, self-hosted on top of GlusterFS. The
> cluster is used to host several VMs.
> I have observed that when the gateway is lost (say the gateway device is
> down), the oVirt cluster goes down.
> 
> Is the cluster down, or just the self-hosted engine? 
> 
> 
> This seems rather extreme, especially when one does not care whether the
> hosted VMs have connectivity to the Internet or not.
> 
> Are the VMs down? 
> The hosts? 
> Y. 
> 
> 
> Can this behavior be disabled?
> 
> Thanx, 
> Alex
> 


Re: [ovirt-users] Add a disk and set the console for a VM in the user portal

2018-02-06 Thread nicolas
I can't even see other options, like adding NICs or changing the
machine type (server, desktop)... Was this removed on purpose, or is there
some permission I need to grant?


On 2018-02-06 11:45, nico...@devels.es wrote:

> Hi,
>
> We recently upgraded to oVirt 4.2.0 and we're testing things so we can
> determine whether our production system could also be upgraded. We make
> extensive use of the User Portal. I've granted the VmCreator and
> DiskProfileUser privileges to a user (the user has a quota as well), and
> after logging in to the user portal I can successfully create a VM,
> setting its memory and CPUs, but:
>
> 1) I can't see a way to change the console type. By default, when the
> machine is created, SPICE is chosen as the mechanism, and I'd like to
> change it to VNC, but I can't find a way.
> 2) I can't see a way to add a disk to the VM.
>
> I'm attaching a screenshot of what I see in the panel.
>
> Are some new privileges needed to add a disk or change the console
> type?
>
>
> Thanks



[ovirt-users] Problem - Ubuntu 16.04.3 Guest Weekly Freezes

2018-02-06 Thread Andrei V
Hi!

I have a strange and annoying problem with one VM on oVirt Node 4.2: weekly
freezes of an Ubuntu 16.04.3 guest running ISPConfig 3.1.
ISPConfig is a web GUI frontend (written in PHP) to Apache, Postfix, Dovecot,
Amavis, Clam and ProFTPd.
The engine runs on a separate PC (not a hosted engine).

Ubuntu 16.04.3 LTS (Xenial Xerus), 2 cores allocated, 8 GB RAM (only a
fraction is being used).
kernel 4.13.0-32-generic
6300ESB Watchdog Timer

Memory ballooning is disabled, and there are always about 7 GB of free RAM left.
4 VMs are active, and the CPU load on the node is low.
I have tried several kernel versions, with no change.

I can’t find any trace of the problem in the Ubuntu guest’s logs. Even the
6300ESB watchdog timer, configured to reset the VM, does nothing (which is
really strange).
The VM stops responding even to pings, and the VM screen is also frozen.
The oVirt engine no longer displays the VM’s IP address, which means
ovirt-guest-agent is dead.

The VM is in the DMZ and is not connected to ovirtmgmt, but rather to a
bridged Ethernet interface.
In oVirt I have defined the network "DMZ Node10-NIC2".
On the node:
cd /etc/sysconfig/network-scripts/
tail ifcfg-enp3s4f1

DEVICE=enp3s4f1
BRIDGE=ond04ad91e59c14
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

Googling doesn’t show anything useful except suggestions to change the kernel
version, which I have already done.

1) Any idea how to fix this freeze?

2) Until the problem is fixed, I can create a cron script on the oVirt engine
PC to handle the stubborn VM.
Q: How can I force a power off and then start this VM again (after a timeout,
e.g. 20 sec) from a bash or Python script?
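One possible answer to question 2, as a stdlib-only Python sketch that talks to the engine's REST API (v4) directly: POST /vms/{id}/stop is a hard power-off (unlike /shutdown, which asks the guest, useless when it is frozen), GET /vms/{id} returns the VM with its <status>, and POST /vms/{id}/start boots it. The official Python SDK (ovirtsdk4) wraps the same calls and is the more usual route; this version only avoids the dependency. The engine URL, credentials and CA file are placeholders you must supply, and the script has not been tested against a live engine.

```python
import base64
import ssl
import time
import urllib.request
import xml.etree.ElementTree as ET


def make_sender(api_url, user, password, ca_file=None):
    """Return send(method, path, body) bound to one engine, e.g.
    api_url="https://engine.example.com/ovirt-engine/api" (hypothetical host)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": "Basic " + token,
        "Accept": "application/xml",
        "Content-Type": "application/xml",
    }
    ctx = ssl.create_default_context(cafile=ca_file)

    def send(method, path, body=None):
        req = urllib.request.Request(api_url.rstrip("/") + path, data=body,
                                     headers=headers, method=method)
        with urllib.request.urlopen(req, context=ctx, timeout=60) as resp:
            return resp.read()

    return send


def vm_status(send, vm_id):
    """Read the <status> element of the VM (e.g. "up", "down")."""
    return ET.fromstring(send("GET", f"/vms/{vm_id}")).findtext("status")


def power_cycle(send, vm_id, delay=20, poll=5, timeout=300, sleep=time.sleep):
    """Hard power-off, wait until the engine reports "down", pause, then start."""
    # "stop" pulls the virtual plug; "shutdown" would ask the frozen guest nicely.
    send("POST", f"/vms/{vm_id}/stop", b"<action/>")
    waited = 0
    while vm_status(send, vm_id) != "down":
        if waited >= timeout:
            raise TimeoutError(f"VM {vm_id} did not reach 'down' in {timeout}s")
        sleep(poll)
        waited += poll
    sleep(delay)  # the 20 s pause asked for above
    send("POST", f"/vms/{vm_id}/start", b"<action/>")
```

From cron you would call `power_cycle(make_sender(url, "admin@internal", password, ca_file), vm_id)`; the VM id can be looked up once with GET /vms?search=name%3D<vm name>.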

Thanks in advance for any help.
Andrei



[ovirt-users] Add a disk and set the console for a VM in the user portal

2018-02-06 Thread nicolas

Hi,

We recently upgraded to oVirt 4.2.0 and we're testing things so we can
determine whether our production system could also be upgraded. We make
extensive use of the User Portal. I've granted the VmCreator and
DiskProfileUser privileges to a user (the user has a quota as well), and
after logging in to the user portal I can successfully create a VM,
setting its memory and CPUs, but:


1) I can't see a way to change the console type. By default, when the 
machine is created, SPICE is chosen as the mechanism, and I'd like to 
change it to VNC, but I can't find a way.

2) I can't see a way to add a disk to the VM.

I'm attaching a screenshot of what I see in the panel.

Are some new privileges needed to add a disk or change the console type?

Thanks


Re: [ovirt-users] Documentation about vGPU in oVirt 4.2

2018-02-06 Thread Martin Polednik

On 05/02/18 14:38 +0100, Gianluca Cecchi wrote:
> On Fri, Feb 2, 2018 at 12:13 PM, Jordan, Marcel wrote:
>
>> Hi,
>>
>> I have some NVIDIA Tesla P100 and V100 GPUs in our oVirt 4.2 cluster and
>> am searching for documentation on how to use the new vGPU feature. Is
>> there any documentation out there on how to configure it correctly?
>>
>> --
>> Marcel Jordan
>
> Possibly check what will become the official documentation for RHEV 4.2,
> even if it may not map one-to-one to oVirt.
>
> Admin guide here:
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2-beta/html/administration_guide/sect-host_tasks#Preparing_GPU_Passthrough
>
> Planning and prerequisites guide here:
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2-Beta/html/planning_and_prerequisites_guide/requirements#pci_device_requirements
>
> In the oVirt 4.2 release notes I see these Bugzilla entries that can help too:
> https://bugzilla.redhat.com/show_bug.cgi?id=1481007
> https://bugzilla.redhat.com/show_bug.cgi?id=1482033

There are also blog posts about vGPU in 4.1.4/4.2 that you might find useful:

https://mpolednik.github.io/2017/09/13/vgpu-in-ovirt/
https://mpolednik.github.io/2017/05/21/vfio-mdev/

mpolednik

> HIH,
> Gianluca
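The vfio-mdev post linked above covers the kernel side: a vGPU-capable card registers as a mediated-device parent and lists the vGPU types it supports in sysfs. As a quick way to see what a host offers before configuring VMs, here is a minimal Python sketch that walks /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/ and reads the name and available_instances attributes (the layout of the kernel's VFIO mediated-device interface). It assumes the NVIDIA vGPU host driver is installed and loaded; how oVirt consumes the type id is described in the posts.

```python
import os


def _read(path):
    """Return the stripped contents of a sysfs attribute, or "" if absent."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def list_mdev_types(root="/sys/class/mdev_bus"):
    """List parent device, mdev type, human-readable name and free instances."""
    found = []
    if not os.path.isdir(root):
        return found  # no mdev-capable parent devices (or driver not loaded)
    for parent in sorted(os.listdir(root)):
        types_dir = os.path.join(root, parent, "mdev_supported_types")
        if not os.path.isdir(types_dir):
            continue
        for mdev_type in sorted(os.listdir(types_dir)):
            tdir = os.path.join(types_dir, mdev_type)
            avail = _read(os.path.join(tdir, "available_instances"))
            found.append({
                "parent": parent,
                "type": mdev_type,
                "name": _read(os.path.join(tdir, "name")),  # optional attribute
                "available_instances": int(avail) if avail.isdigit() else 0,
            })
    return found
```

An empty list on a host with P100/V100 cards usually means the host driver side is not set up yet, which is worth ruling out before looking at the oVirt configuration.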





Re: [ovirt-users] oVirt 4.2 , VM stuck in "Migrating from" state.

2018-02-06 Thread Eduardo Mayoral
Worked like a charm. Double thanks, for helping and for helping so fast!

Best regards,

Eduardo Mayoral Jimeno (emayo...@arsys.es)
Administrador de sistemas. Departamento de Plataformas. Arsys internet.
+34 941 620 145 ext. 5153

On 06/02/18 12:25, Arik Hadas wrote:
> Hi,
>
> The problem you had is fixed already by
> https://gerrit.ovirt.org/#/c/86367/.
> I'm afraid you'll need to manually set the VM to Down in the database:
> update vm_dynamic set status=0 where vm_guid in  (select vm_guid from
> vm_static where vm_name='')
>
> On Tue, Feb 6, 2018 at 11:20 AM, Eduardo Mayoral wrote:
>
> Hi,
>
> I've got a problem with oVirt 4.2.
>
> While putting a host into maintenance mode, a VM failed to migrate.
> The end state is that the Web UI shows the VM as "Migrating from".
>
> The VM is not running on any host in the cluster.
>
> This is the relevant message in the /var/log/ovirt-engine/engine.log
>
> 2018-02-06 09:09:05,379Z INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] VM 'ab158ff3-a716-4655-9269-11738cd53b05'(repositorionuget) is running in db and not running on VDS '82b49615-9c65-4d8e-80e0-f10089cb4225'(llkh456.arsyslan.es)
> 2018-02-06 09:09:05,381Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Failed during monitoring vm: ab158ff3-a716-4655-9269-11738cd53b05 , error is: {}: java.lang.NullPointerException
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.auditVmMigrationAbort(VmAnalyzer.java:440) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.abortVmMigration(VmAnalyzer.java:432) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDisappearedVm(VmAnalyzer.java:794) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:135) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:136) [vdsbroker.jar:]
>     at java.util.ArrayList.forEach(ArrayList.java:1255) [rt.jar:1.8.0_151]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:131) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:94) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:43) [vdsbroker.jar:]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_151]
>     at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent-1.0.jar:]
>     at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent-1.0.jar:]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151]
>     at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]
>     at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
>     at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
>
> 2018-02-06 09:09:05,381Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Exception:: java.lang.NullPointerException
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.auditVmMigrationAbort(VmAnalyzer.java:440) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.abortVmMigration(VmAnalyzer.java:432) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDisappearedVm(VmAnalyzer.java:794) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:135) [vdsbroker.jar:]

Re: [ovirt-users] oVirt 4.2 , VM stuck in "Migrating from" state.

2018-02-06 Thread Arik Hadas
Hi,

The problem you had is fixed already by https://gerrit.ovirt.org/#/c/86367/.
I'm afraid you'll need to manually set the VM to Down in the database:
update vm_dynamic set status=0 where vm_guid in  (select vm_guid from
vm_static where vm_name='')
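Arik's query can be rehearsed before it touches a real engine database. The sketch below runs the same UPDATE against a throwaway in-memory SQLite database containing only the two tables and columns named in the query (the real engine database is PostgreSQL with many more columns, so this checks the logic, not your schema). It passes the VM name as a parameter (the name itself is elided in the archived query) and refuses to commit unless exactly one row changes. Taking a backup with engine-backup before running the real statement seems prudent.

```python
import sqlite3


def mark_vm_down(conn, vm_name):
    """Set a VM's vm_dynamic.status to 0 (Down, per the message above) by name.

    Rolls back unless exactly one row was touched, so a typo in the name or a
    duplicate name cannot silently update the wrong rows.
    """
    cur = conn.execute(
        "UPDATE vm_dynamic SET status = 0 "
        "WHERE vm_guid IN (SELECT vm_guid FROM vm_static WHERE vm_name = ?)",
        (vm_name,),
    )
    if cur.rowcount != 1:
        conn.rollback()
        raise ValueError(f"expected to update 1 row, updated {cur.rowcount}")
    conn.commit()
    return cur.rowcount
```

The same one-row guard is worth reproducing by hand on the real database: run the UPDATE inside a transaction and commit only if the reported row count is 1.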


Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-02-06 Thread Richard W.M. Jones
On Tue, Feb 06, 2018 at 11:11:37AM +0100, Luca 'remix_tj' Lorenzetto wrote:
> On 6 Feb 2018 10:52 AM, "Yaniv Kaul" wrote:
> 
> 
> I assume its network interfaces are also a bottleneck. Certainly if
> they are 1G.
> Y.
> 
> 
> That's not the case: vCenter uses 10G, as do all the involved hosts.
> 
> We first supposed the culprit was the network, but our investigation
> ruled it out. Network usage is under 40% with 4 ongoing migrations.

The problem is two-fold and is common to all vCenter transformations:

(1) A single https connection is used and each block of data that is
requested is processed serially.

(2) vCenter has to forward each request to the ESXi hypervisor.

(1) + (2) => most time is spent waiting on the lengthy round trips for
each requested block of data.

This is why running multiple conversions in parallel works: although
each conversion is just as slow, overall throughput improves, because
you're filling in the long idle gaps by serving other conversions.

This is also why other methods perform so much better.  VMX over SSH
uses a single connection but connects directly to the ESXi hypervisor,
so cause (2) is eliminated.  VMX over NFS eliminates VMware servers
entirely and can make multiple parallel requests, eliminating (1) and
(2).  VDDK [in ideal circumstances] can mount the FC storage directly
on the conversion host meaning the ordinary network is not even used
and all requests travel over the SAN.
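Rich's explanation can be put into numbers with a deliberately simplified model: one conversion sends one block request at a time, so each block costs a full round trip (which, over vCenter, includes the extra hop to ESXi) plus its transfer time, and parallel conversions add up until the link is saturated. Block size, round-trip time and link bandwidth are hypothetical parameters; TLS, disk and vCenter processing costs are ignored.

```python
def per_conversion_throughput(block_bytes, rtt_s, link_bytes_per_s):
    """Bytes/s for one conversion issuing one request at a time.

    Each block costs a full round trip plus its transfer time, and the next
    request is only sent after the previous reply arrives.
    """
    return block_bytes / (rtt_s + block_bytes / link_bytes_per_s)


def aggregate_throughput(n, block_bytes, rtt_s, link_bytes_per_s):
    """n independent conversions fill each other's idle gaps until the link is full."""
    return min(n * per_conversion_throughput(block_bytes, rtt_s, link_bytes_per_s),
               link_bytes_per_s)
```

For instance, with a hypothetical 10 ms effective round trip and 1 MB blocks on a 1 GB/s link, a single conversion is limited to roughly 90 MB/s however fast the link is, and about eleven parallel conversions would be needed to fill it. Removing the vCenter hop (VMX over SSH) attacks the round-trip term; parallel requests (VMX over NFS) attack the one-at-a-time term.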

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.2 vdsclient

2018-02-06 Thread Irit Goihman
Hi,
The command is `vdsm-client Task getInfo taskID=`

You can see available arguments in JSON format using `vdsm-client Task
getInfo -h` command.
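A small Python wrapper sketch around these CLI calls. It relies only on what this thread shows: the `Task` methods from the `-h` output and the `taskID=` argument. The `Host getAllTasksStatuses` call for listing task IDs is an assumption (a guessed counterpart of `vdsClient -s 0 getAllTasksStatuses`), as is the JSON output format, so check both with `vdsm-client Host -h` first:

```python
import json
import subprocess
from typing import List

# Methods listed by `vdsm-client Task -h` in this thread.
TASK_METHODS = {"getInfo", "getStatus", "revert", "clear", "stop"}

# Assumption: Host.getAllTasksStatuses replaces `vdsClient -s 0 getAllTasksStatuses`.
LIST_TASKS = ["vdsm-client", "Host", "getAllTasksStatuses"]


def task_command(method: str, task_id: str) -> List[str]:
    """Build argv for `vdsm-client Task <method> taskID=<task_id>`.

    Omitting taskID produces the "__init__() takes exactly 2 arguments
    (1 given)" error quoted in this thread.
    """
    if method not in TASK_METHODS:
        raise ValueError(f"unknown Task method: {method!r}")
    if not task_id:
        raise ValueError("taskID is required for every Task method")
    return ["vdsm-client", "Task", method, f"taskID={task_id}"]


def run_json(argv: List[str]):
    """Run a vdsm-client command on the SPM host and decode its output,
    assuming vdsm-client prints the result as JSON."""
    out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

For example, `task_command("stop", some_id)` builds what should be the 4.2 counterpart of `vdsClient -s 0 stopTask`, where `some_id` is a task UUID taken from the status listing.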

On Tue, Feb 6, 2018 at 10:36 AM, Alex K  wrote:

> Hi Benny,
>
> I was trying to do it with vdsm-client without success.
>
> vdsm-client Task -h
> usage: vdsm-client Task [-h] method [arg=value] ...
>
> optional arguments:
>   -h, --help  show this help message and exit
>
> Task methods:
>   method [arg=value]
>     getInfo     Get information about a Task.
>     getStatus   Get Task status information.
>     revert      Rollback a Task to restore the previous system state.
>     clear       Discard information about a finished Task.
>     stop        Stop a currently running Task.
> [root@v0 common]# vdsm-client Task getInfo
> vdsm-client: Command Task.getInfo with args {} failed:
> (code=-32603, message=Internal JSON-RPC error: {'reason': '__init__()
> takes exactly 2 arguments (1 given)'})
> [root@v0 common]# vdsm-client Task getStatus
> vdsm-client: Command Task.getStatus with args {} failed:
> (code=-32603, message=Internal JSON-RPC error: {'reason': '__init__()
> takes exactly 2 arguments (1 given)'})
>
> What other arguments does this expect? When using the Host namespace I am
> able to run the available options.
>
> Thanx,
> Alex
>
>
> On Tue, Feb 6, 2018 at 10:24 AM, Benny Zlotnik 
> wrote:
>
>> It was replaced by vdsm-client[1]
>>
>> [1] - https://www.ovirt.org/develop/developer-guide/vdsm/vdsm-client/
>>
>> On Tue, Feb 6, 2018 at 10:17 AM, Alex K  wrote:
>>
>>> Hi all,
>>>
>>> I have a stuck snapshot removal on a VM which is preventing the VM from
>>> starting.
>>> In oVirt 4.1 I was able to cancel the stuck task by running, on the SPM
>>> host:
>>>
>>> vdsClient -s 0 getAllTasksStatuses
>>> vdsClient -s 0 stopTask 
>>>
>>> Is there a similar way to do this in oVirt 4.2?
>>>
>>> Thanx,
>>> Alex
>>>
>>
>


-- 

IRIT GOIHMAN

SOFTWARE ENGINEER

EMEA VIRTUALIZATION R&D

Red Hat EMEA 


TRIED. TESTED. TRUSTED. 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-02-06 Thread Luca 'remix_tj' Lorenzetto
On 6 Feb 2018 10:52 AM, "Yaniv Kaul"  wrote:


I assume its network interfaces are a bottleneck as well, certainly if
they are 1g.
Y.


That's not the case: vCenter and all the involved hosts use 10g.

We first suspected the network, but our investigations ruled it out.
Network usage is under 40% with 4 ongoing migrations.

Luca
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-02-06 Thread Yaniv Kaul
On Feb 6, 2018 11:06 AM, "Luca 'remix_tj' Lorenzetto" <
lorenzetto.l...@gmail.com> wrote:

On Mon, Feb 5, 2018 at 11:13 PM, Richard W.M. Jones 
wrote:
> http://libguestfs.org/virt-v2v.1.html#vmware-vcenter-resources
>
> You should be able to run multiple conversions in parallel
> to improve throughput.
>
> The only long-term solution is to use a different method such as VMX
> over SSH.  vCenter is just fundamentally bad.

Running 4 conversions in parallel works, but each one is very slow. I
think the vCenter CPU, which is stuck at 100%, is to blame.


I assume its network interfaces are a bottleneck as well, certainly if
they are 1g.
Y.


Thank you for the directions and suggestions,

Luca

--
"It is absurd to employ men of excellent intelligence to do
calculations that could be entrusted to anyone if machines
were used"
Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)

"The Internet is the largest library in the world.
But the problem is that the books are all scattered on the floor"
John Allen Paulos, Mathematician (born 1945)

Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <
lorenzetto.l...@gmail.com>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] qemu-kvm images corruption

2018-02-06 Thread Yaniv Kaul
On Feb 6, 2018 11:09 AM, "Nicolas Ecarnot"  wrote:

Hello,

On our two 3.6 DCs, we're still seeing qcow2 corruption, even on freshly
installed VMs (CentOS7, win2012, win2008...).


Please provide complete information on the issue. When, how often, which
storage, etc.


(We are still hoping to find some time to migrate all this to 4.2, but it's
a big job and our one-person team - me - is overwhelmed.)


Understood. Note that we have some scripts that can assist somewhat.


My workaround is described in my previous thread below, but it's just a
workaround.

Reading further, I found this:

https://forum.proxmox.com/threads/qcow2-corruption-after-snapshot-or-heavy-disk-i-o.32865/page-2

There are many things I don't know or understand, and I'd like your opinion:

- Is "virtio" a synonym for "virtio-blk"?


Yes.

- Is it true that virtio-scsi is actively developed while development of
virtio has stopped?


No.

- People in the proxmox forum seem to say that no qcow2 corruption occurs
when using IDE (not an option for me) or virtio-scsi.


Anecdotal evidence or properly reproduced?
Have they filed an issue?

Has anyone at Red Hat ever heard of this?


I'm not aware of an existing corruption issue.

- Is converting all my VMs to use virtio-scsi a guarantee against further
corruption?


No.

- Which driver do oVirt devs unofficially recommend, in terms of future
development and stability?


Depends. I like virtio-scsi for its features (DISCARD mainly), but in some
workloads virtio-blk may be somewhat faster (supposedly lower overhead).
Both interfaces are stable.

We should focus on properly reporting the issue so the qemu folks can look
at this.
Y.


Regards,

-- 
Nicolas ECARNOT


On 15/09/2017 at 14:06, Nicolas Ecarnot wrote:

> TL;DR:
> How to avoid images corruption?
>
>
> Hello,
>
> On two of our old 3.6 DCs, a recent series of VM migrations led to some
> issues:
> - I'm putting a host into maintenance mode
> - most of the VM are migrating nicely
> - one remaining VM never migrates, and the logs show:
>
> * engine.log : "...VM has been paused due to I/O error..."
> * vdsm.log : "...Improbable extension request for volume..."
>
> After digging through the RH BZ tickets, I saved the day by:
> - stopping the VM
> - running lvchange -ay on the appropriate /dev/...
> - qemu-img check [-r all] /rhev/blahblah
> - lvchange -an...
> - boot the VM
> - enjoy!
>
> Yesterday this worked for a VM where only one error occurred on the qemu
> image, and the repair was easily done by qemu-img.
>
> Today, facing the same issue on another VM, the repair failed because the
> errors were very numerous, and also because of this message:
>
> [...]
> Rebuilding refcount structure
> ERROR writing refblock: No space left on device
> qemu-img: Check failed: No space left on device
> [...]
>
> The PV/VG/LV are far from being full, so I don't know where to look.
> I tried many ways to solve it, but I'm not comfortable at all with qemu
> images, corruption and repair, so I ended up exporting this VM (to an NFS
> export domain) and importing it into another DC: this had the side effect of
> running qemu-img convert from qcow2 to qcow2, which (maybe?) fixed some
> errors???
> I also copied it into another qcow2 file using qemu-img convert in the same
> way, and this also led to a clean qcow2 image without errors.
>
> I saw that some bugs about VM migrations were fixed in 4.x, but this is not
> the point here.
> I checked my SANs, my network layers, my blades, the OS (CentOS 7.2) of my
> hosts, but I see nothing special.
>
> The real reason behind my message is not to learn how to repair anything,
> but rather to understand what could have led to this situation.
> What should I keep a close eye on?
>
>

-- 
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt DR: ansible with 4.1, only a subset of storage domain replicated

2018-02-06 Thread Luca 'remix_tj' Lorenzetto
On Mon, Feb 5, 2018 at 7:20 PM, Maor Lipchuk  wrote:
> Hi Luca,
>
> Thank you for your interest in the Disaster Recovery Ansible solution; it is
> great to see users getting familiar with it.
> Please see my comments inline
>
> Regards,
> Maor
>
> On Mon, Feb 5, 2018 at 7:54 PM, Yaniv Kaul  wrote:
>>
>>
>>
>> On Feb 5, 2018 5:00 PM, "Luca 'remix_tj' Lorenzetto"
>>  wrote:
>>
>> Hello,
>>
>> I'm starting the implementation of our disaster recovery site with RHV
>> 4.1.latest for our production environment.
>>
>> Our production setup is very simple, with a self-hosted engine in DC
>> KVMPDCA, and virtual machines in both the KVMPDCA and KVMPD DCs. Our whole
>> setup has an FC storage backend, which is EMC VPLEX/VMAX in KVMPDCA
>> and EMC VNX8000. Both storage arrays support replication via their
>> own replication protocols (SRDF, MirrorView), so we'd like to delegate
>> to them the replication of data to the remote site, which is located
>> in another datacenter.
>>
>> In the KVMPD DC we have some storage domains that contain non-critical
>> VMs, which we don't want to replicate to the remote site (in case of
>> failure they have a low priority and will be restored from a backup).
>> Since we won't replicate them, they will not be available for
>> attachment on the remote site. Could this be an issue? Are we required
>> to replicate everything?
>
>
> No, it is not required to replicate everything.
> If there are no disks on those storage domains that attached to your
> critical VMs/Templates you don't have to use them as part of yout mapping
> var file
>

Excellent.

>>
>> What about the master domain? Does the master storage domain need to
>> stay on a replicated volume, or can it be any of the available ones?
>
>
>
> You can choose which storage domains you want to recover.
> Basically, if a storage domain is indicated as "master" in the mapping var
> file then it should be attached first to the Data Center.
> If your secondary setup already contains a master storage domain which you
> don't need to replicate and recover, then you can configure your mapping var
> file to attach only regular storage domains: simply set
> "dr_master_domain: False" in dr_import_storages for all the storage
> domains. (You can contact me on IRC if you need some guidance with it.)
>
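A minimal sketch of the mapping change described above, treating the parsed mapping var file as a plain Python dict. Only `dr_import_storages` and `dr_master_domain` come from this thread; the `name` key and the domain names are hypothetical placeholders, so compare against your own generated mapping file:

```python
from typing import Any, Dict, List


def attach_as_regular_domains(mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Set dr_master_domain to False on every entry of dr_import_storages,
    so recovery only attaches regular domains to a DC that already has a master."""
    for storage in mapping.get("dr_import_storages", []):
        storage["dr_master_domain"] = False
    return mapping


def master_domains(mapping: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Entries still flagged as master; empty after attach_as_regular_domains()."""
    return [s for s in mapping.get("dr_import_storages", [])
            if s.get("dr_master_domain")]


# Hypothetical parsed mapping: the "name" key and its values are placeholders.
example = {
    "dr_import_storages": [
        {"name": "critical_fc_01", "dr_master_domain": True},
        {"name": "critical_fc_02"},
    ]
}
attach_as_regular_domains(example)
print(master_domains(example))  # -> []
```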

Good,

that's my case. I don't need a new master domain on the remote side,
because it is an already up and running setup where I want to attach
the replicated storage and run the critical VMs.



>>
>>
>> I've seen that since 4.1 there's an API for updating OVF_STORE disks.
>> Do we need to invoke it at a frequency compatible with the
>> replication frequency on the storage side?
>
>
>
> No, you don't have to use the OVF_STORE disk update for replication.
> The OVF_STORE disk is updated every 60 minutes (the default
> configuration value).
>

What I need is for the VM information to be replicated to the remote
site along with the disks.
In an older test I had the issue that disks were replicated to the remote
site, but the VM configuration was not!
I found the disks in the "Disk" tab of the storage domain, but nothing under VM Import.

>>
>> At the moment we have set the
>> RPO to 1hr (even though the planned RPO requires 2hrs). Does OVF_STORE get
>> updated with the required frequency?
>
>
>
> The OVF_STORE disk is updated every 60 minutes, but keep in mind that the
> OVF_STORE is updated internally by the engine, so it might not be
> synced with the RPO which you configured.
> If I understood correctly, then you are right that the data of
> the storage domain will be synced within approximately 2 hours = RPO of 1hr +
> OVF_STORE update of 1hr.
>
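The worst-case estimate above can be written out as simple arithmetic, following the reasoning in this exchange: the replica may lag by a full RPO, and the VM configuration inside it may lag the engine by a full OVF_STORE update interval. A sketch using the 60-minute default and the RPO values from this thread:

```python
from datetime import timedelta


def worst_case_config_age(rpo: timedelta, ovf_update_interval: timedelta) -> timedelta:
    """Upper bound on how stale recovered VM configuration can be:
    replication lag (RPO) plus OVF_STORE update lag."""
    return rpo + ovf_update_interval


rpo = timedelta(hours=1)              # replication RPO configured on the arrays
ovf_interval = timedelta(minutes=60)  # default OVF_STORE update interval
planned_rpo = timedelta(hours=2)      # RPO the DR plan requires

age = worst_case_config_age(rpo, ovf_interval)
print(age, age <= planned_rpo)  # -> 2:00:00 True
```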

We need to be able to recover VMs to a state at most 2 hours
old. From what you say, I think we'll manage that even in the worst case.

[cut]
>
> Indeed,
> we also introduced several functionalities, like detaching the master storage
> domain and attaching a "dirty" master storage domain, which the failover
> process relies on, so unfortunately to support a full recovery process you
> will need an oVirt 4.2 environment.
>

OK, but if I keep the master storage domain on a non-replicated volume, do
I need this functionality?

I have to admit that, for subscription and support reasons, I need to
use RHV rather than oVirt. I've seen that 4.2 is coming on that side
too, and we'll certainly upgrade when it's available.


[cut]
>
>
> Please feel free to share your comments and questions; I would really
> appreciate hearing about your user experience.

Sure, I will! And I'll bother you on IRC if I need some guidance :-)

Thank you so much,

Luca


-- 
"It is absurd to employ men of excellent intelligence to do
calculations that could be entrusted to anyone if machines
were used"
Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)

"The Internet is the largest library in the world.
But the problem is that the books are all scattered on the floor"
John Allen Paulos, Mathematician (born 1945)

Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , 

[ovirt-users] oVirt 4.2 , VM stuck in "Migrating from" state.

2018-02-06 Thread Eduardo Mayoral
Hi,

    Got a problem with oVirt 4.2

While putting a Host into maintenance mode, a VM failed to migrate.
The end state is that the Web UI shows the VM as "Migrating from".

The VM is not running in any Host in the cluster.

This is the relevant message in /var/log/ovirt-engine/engine.log:

2018-02-06 09:09:05,379Z INFO 
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(EE-ManagedThreadFactory-engineScheduled-Thread-14) [] VM
'ab158ff3-a716-4655-9269-11738cd53b05'(repositorionuget) is running in
db and not running on VDS
'82b49615-9c65-4d8e-80e0-f10089cb4225'(llkh456.arsyslan.es)
2018-02-06 09:09:05,381Z ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Failed during
monitoring vm: ab158ff3-a716-4655-9269-11738cd53b05 , error is: {}:
java.lang.NullPointerException
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.auditVmMigrationAbort(VmAnalyzer.java:440)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.abortVmMigration(VmAnalyzer.java:432)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDisappearedVm(VmAnalyzer.java:794)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:135)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:136)
[vdsbroker.jar:]
    at java.util.ArrayList.forEach(ArrayList.java:1255)
[rt.jar:1.8.0_151]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:131)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:94)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:43)
[vdsbroker.jar:]
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[rt.jar:1.8.0_151]
    at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[rt.jar:1.8.0_151]
    at
org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
[javax.enterprise.concurrent-1.0.jar:]
    at
org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
[javax.enterprise.concurrent-1.0.jar:]
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[rt.jar:1.8.0_151]
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[rt.jar:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]
    at
org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)
[javax.enterprise.concurrent-1.0.jar:]
    at
org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)

2018-02-06 09:09:05,381Z ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Exception::
java.lang.NullPointerException
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.auditVmMigrationAbort(VmAnalyzer.java:440)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.abortVmMigration(VmAnalyzer.java:432)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDisappearedVm(VmAnalyzer.java:794)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:135)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:136)
[vdsbroker.jar:]
    at java.util.ArrayList.forEach(ArrayList.java:1255)
[rt.jar:1.8.0_151]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:131)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:94)
[vdsbroker.jar:]
    at
org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:43)
[vdsbroker.jar:]
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[rt.jar:1.8.0_151]
    at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[rt.jar:1.8.0_151]
    at
org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
[javax.enterprise.concurrent-1.0.jar:]
    at
org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
[javax.enterprise.concurrent-1.0.jar:]
    at
java.util.concurre

Re: [ovirt-users] qemu-kvm images corruption

2018-02-06 Thread Nicolas Ecarnot

Hello,

On our two 3.6 DCs, we're still seeing qcow2 corruption, even on 
freshly installed VMs (CentOS7, win2012, win2008...).


(We are still hoping to find some time to migrate all this to 4.2, but 
it's a big job and our one-person team - me - is overwhelmed.)


My workaround is described in my previous thread below, but it's just a 
workaround.


Reading further, I found this:

https://forum.proxmox.com/threads/qcow2-corruption-after-snapshot-or-heavy-disk-i-o.32865/page-2

There are many things I don't know or understand, and I'd like your 
opinion:


- Is "virtio" a synonym for "virtio-blk"?
- Is it true that virtio-scsi is actively developed while development of 
virtio has stopped?
- People in the proxmox forum seem to say that no qcow2 corruption 
occurs when using IDE (not an option for me) or virtio-scsi. Has 
anyone at Red Hat ever heard of this?
- Is converting all my VMs to use virtio-scsi a guarantee against 
further corruption?
- Which driver do oVirt devs unofficially recommend, in terms of 
future development and stability?


Regards,

--
Nicolas ECARNOT

On 15/09/2017 at 14:06, Nicolas Ecarnot wrote:

TL;DR:
How to avoid images corruption?


Hello,

On two of our old 3.6 DCs, a recent series of VM migrations led to some 
issues:

- I'm putting a host into maintenance mode
- most of the VM are migrating nicely
- one remaining VM never migrates, and the logs show:

* engine.log : "...VM has been paused due to I/O error..."
* vdsm.log : "...Improbable extension request for volume..."

After digging through the RH BZ tickets, I saved the day by:
- stopping the VM
- running lvchange -ay on the appropriate /dev/...
- qemu-img check [-r all] /rhev/blahblah
- lvchange -an...
- boot the VM
- enjoy!
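The manual procedure above (activate the LV, check and repair the image, deactivate the LV) can be scripted roughly as follows. This is a sketch: the LV and image paths are hypothetical placeholders because the originals are elided, the VM must already be stopped, and the exit-code meanings are the ones documented for `qemu-img check` in its man page, so verify them against your qemu version:

```python
import subprocess
from typing import List

# Exit codes documented for `qemu-img check`.
CHECK_RESULTS = {
    0: "consistent",
    1: "check not completed (internal error)",
    2: "corrupted",
    3: "leaked clusters only, not corrupted",
    63: "format does not support checks",
}


def repair_commands(lv_path: str, image_path: str, repair: bool = True) -> List[List[str]]:
    """Activate the LV, check (and optionally repair) the qcow2 image, deactivate the LV."""
    check_cmd = ["qemu-img", "check"]
    if repair:
        check_cmd += ["-r", "all"]
    check_cmd.append(image_path)
    return [["lvchange", "-ay", lv_path], check_cmd, ["lvchange", "-an", lv_path]]


def run_repair(lv_path: str, image_path: str) -> str:
    activate, check_cmd, deactivate = repair_commands(lv_path, image_path)
    subprocess.run(activate, check=True)
    try:
        # qemu-img check reports findings via non-zero exit codes, so don't raise.
        rc = subprocess.run(check_cmd).returncode
    finally:
        # Mirror the last manual step: deactivate the LV even if the check fails.
        subprocess.run(deactivate, check=True)
    return CHECK_RESULTS.get(rc, f"unexpected exit code {rc}")
```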

Yesterday this worked for a VM where only one error occurred on the qemu 
image, and the repair was easily done by qemu-img.


Today, facing the same issue on another VM, the repair failed because the 
errors were very numerous, and also because of this message:


[...]
Rebuilding refcount structure
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
[...]

The PV/VG/LV are far from being full, so I don't know where to look.
I tried many ways to solve it, but I'm not comfortable at all with qemu 
images, corruption and repair, so I ended up exporting this VM (to an 
NFS export domain) and importing it into another DC: this had the side 
effect of running qemu-img convert from qcow2 to qcow2, which (maybe?) 
fixed some errors???
I also copied it into another qcow2 file using qemu-img convert in the 
same way, and this also led to a clean qcow2 image without errors.
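The export/import round trip may have helped because it ran `qemu-img convert`, which reads the guest-visible data and writes it into a newly created image, so the destination gets freshly built metadata. The same copy can be made directly; a sketch of the command line, with hypothetical source and destination paths:

```python
from typing import List


def convert_command(src: str, dst: str, show_progress: bool = True) -> List[str]:
    """qcow2 -> qcow2 copy: -f names the input format, -O the output format,
    and -p prints progress while qemu-img writes a new image at dst."""
    argv = ["qemu-img", "convert", "-f", "qcow2", "-O", "qcow2"]
    if show_progress:
        argv.append("-p")
    return argv + [src, dst]


# Hypothetical paths: substitute the real image locations.
print(" ".join(convert_command("/path/to/broken.qcow2", "/path/to/copy.qcow2")))
```

Running `qemu-img check` on the copy afterwards confirms whether the new image is clean.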


I saw that some bugs about VM migrations were fixed in 4.x, but this is 
not the point here.
I checked my SANs, my network layers, my blades, the OS (CentOS 7.2) of 
my hosts, but I see nothing special.


The real reason behind my message is not to learn how to repair anything, 
but rather to understand what could have led to this situation.

What should I keep a close eye on?




--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-02-06 Thread Luca 'remix_tj' Lorenzetto
On Mon, Feb 5, 2018 at 11:13 PM, Richard W.M. Jones  wrote:
> http://libguestfs.org/virt-v2v.1.html#vmware-vcenter-resources
>
> You should be able to run multiple conversions in parallel
> to improve throughput.
>
> The only long-term solution is to use a different method such as VMX
> over SSH.  vCenter is just fundamentally bad.

Running 4 conversions in parallel works, but each one is very slow. I
think the vCenter CPU, which is stuck at 100%, is to blame.
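Running several conversions in parallel, as suggested above, amounts to a bounded worker pool. A sketch using Python's `concurrent.futures`; `convert_one` is a hypothetical stand-in for whatever launches a single conversion (for example a `virt-v2v` run), and the default of 4 simply mirrors the number used here:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def convert_all(vm_names: Iterable[str],
                convert_one: Callable[[str], object],
                max_parallel: int = 4) -> Dict[str, object]:
    """Run convert_one(vm) for every VM, at most max_parallel at a time.

    Returns {vm_name: result or exception} so one failed conversion
    does not stop the others.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(convert_one, vm): vm for vm in vm_names}
        for fut in as_completed(futures):
            vm = futures[fut]
            try:
                results[vm] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[vm] = exc
    return results
```

If `convert_one` only waits on an external process (e.g. via `subprocess.run`), threads are enough. Raising `max_parallel` helps only until another bottleneck, such as the vCenter CPU, saturates.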

Thank you for the directions and suggestions,

Luca

-- 
"It is absurd to employ men of excellent intelligence to do
calculations that could be entrusted to anyone if machines
were used"
Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)

"The Internet is the largest library in the world.
But the problem is that the books are all scattered on the floor"
John Allen Paulos, Mathematician (born 1945)

Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Yaniv Kaul
On Feb 5, 2018 2:21 PM, "Alex K"  wrote:

Hi all,

I have a 3-node oVirt 4.1 cluster, self-hosted on top of GlusterFS. The
cluster is used to host several VMs.
I have observed that when the gateway is lost (say the gateway device is down)
the oVirt cluster goes down.


Is the cluster down, or just the self-hosted engine?


This seems rather extreme behavior, especially when one does not care whether
the hosted VMs have Internet connectivity or not.


Are the VMs down?
The hosts?
Y.


Can this behavior be disabled?

Thanx,
Alex

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.2 vdsclient

2018-02-06 Thread Alex K
Hi Benny,

I was trying to do it with vdsm-client without success.

vdsm-client Task -h
usage: vdsm-client Task [-h] method [arg=value] ...

optional arguments:
  -h, --help  show this help message and exit

Task methods:
  method [arg=value]
    getInfo     Get information about a Task.
    getStatus   Get Task status information.
    revert      Rollback a Task to restore the previous system state.
    clear       Discard information about a finished Task.
    stop        Stop a currently running Task.
[root@v0 common]# vdsm-client Task getInfo
vdsm-client: Command Task.getInfo with args {} failed:
(code=-32603, message=Internal JSON-RPC error: {'reason': '__init__() takes
exactly 2 arguments (1 given)'})
[root@v0 common]# vdsm-client Task getStatus
vdsm-client: Command Task.getStatus with args {} failed:
(code=-32603, message=Internal JSON-RPC error: {'reason': '__init__() takes
exactly 2 arguments (1 given)'})

What other arguments does this expect? When using the Host namespace I am
able to run the available options.

Thanx,
Alex


On Tue, Feb 6, 2018 at 10:24 AM, Benny Zlotnik  wrote:

> It was replaced by vdsm-client[1]
>
> [1] - https://www.ovirt.org/develop/developer-guide/vdsm/vdsm-client/
>
> On Tue, Feb 6, 2018 at 10:17 AM, Alex K  wrote:
>
>> Hi all,
>>
>> I have a stuck snapshot removal on a VM which is preventing the VM from
>> starting.
>> In oVirt 4.1 I was able to cancel the stuck task by running, on the SPM
>> host:
>>
>> vdsClient -s 0 getAllTasksStatuses
>> vdsClient -s 0 stopTask 
>>
>> Is there a similar way to do this in oVirt 4.2?
>>
>> Thanx,
>> Alex
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Failed upgrade from 4.1.9 to 4.2.x

2018-02-06 Thread nicolas

On 2018-02-05 14:48, Martin Perina wrote:

On Mon, Feb 5, 2018 at 3:08 PM,  wrote:


On 2018-02-05 14:03, Simone Tiraboschi wrote:
On Mon, Feb 5, 2018 at 2:46 PM,  wrote:

Hi,

We're trying to upgrade from 4.1.9 to 4.2.x and we're bumping into
an error we don't know how to solve. As per [1] we run the
'engine-setup' command and it fails with:

[ INFO  ] Rolling back to the previous PostgreSQL instance
(postgresql).
[ ERROR ] Failed to execute stage 'Misc configuration': Command
'/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup' failed to
execute
[ INFO  ] Yum Performing yum transaction rollback
[ INFO  ] Stage: Clean up
          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log

[ INFO  ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20180205133354-setup.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Execution of setup failed

In the /var/log/ovirt-engine/setup/ovirt-engine-setup-20180205133116-sm2xd1.log
file I could see this:

 * upgrading from 'postgresql.service' to 'rh-postgresql95-postgresql.service'
 * Upgrading database.
ERROR: pg_upgrade tool failed
ERROR: Upgrade failed.
 * See /var/lib/pgsql/upgrade_rh-postgresql95-postgresql.log for
details.

And this file contains this information:

  Performing Consistency Checks
  -----------------------------
  Checking cluster versions                                   ok
  Checking database user is the install user                  ok
  Checking database connection settings                       ok
  Checking for prepared transactions                          ok
  Checking for reg* system OID user data types                ok
  Checking for contrib/isn with bigint-passing mismatch       ok
  Checking for invalid "line" user columns                    ok
  Creating dump of global objects                             ok
  Creating dump of database schemas
    django
    engine
    ovirt_engine_history
    postgres
    template1
                                                              ok
  Checking for presence of required libraries                 fatal

  Your installation references loadable libraries that are missing from the
  new installation.  You can add these libraries to the new installation,
  or remove the functions using them from the old installation.  A list of
  problem libraries is in the file:
  loadable_libraries.txt

  Failure, exiting

I'm attaching full logs FWIW. Also, I'd like to mention that we
created two custom triggers on the engine's 'users' table, but as I
understand from the error this is not the issue (we upgraded several
times within the same minor version and had no issues with that).

Could someone shed some light on this error and how to debug it?

Hi,
can you please also attach loadable_libraries.txt?
 


 Could not load library "$libdir/plpython2"
 ERROR:  could not access file "$libdir/plpython2": No such file or directory

Hmm, you probably need to install the
rh-postgresql95-postgresql-plpython package. It is not installed by
default with oVirt, as we don't use it.


Indeed, this made it. Thank you very much.
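When `pg_upgrade` stops at "Checking for presence of required libraries", the libraries it could not find are listed in `loadable_libraries.txt`. A sketch that extracts them from that file's text and suggests a package; the only mapping included is the one given in this thread, and anything else is left for manual lookup:

```python
import re
from typing import Dict, List

# Matches lines like: Could not load library "$libdir/plpython2"
_LIB_RE = re.compile(r'Could not load library "([^"]+)"')

# From the reply above: plpython2 for rh-postgresql95 ships in this package.
KNOWN_PACKAGES: Dict[str, str] = {
    "$libdir/plpython2": "rh-postgresql95-postgresql-plpython",
}


def missing_libraries(report_text: str) -> List[str]:
    """Library names pg_upgrade reported as missing, in order, without duplicates."""
    seen: List[str] = []
    for lib in _LIB_RE.findall(report_text):
        if lib not in seen:
            seen.append(lib)
    return seen


def packages_to_install(report_text: str) -> Dict[str, str]:
    """Map each missing library to a package name, or a note if unknown."""
    return {lib: KNOWN_PACKAGES.get(lib, "unknown: look it up manually")
            for lib in missing_libraries(report_text)}


sample = ('Could not load library "$libdir/plpython2"\n'
          'ERROR:  could not access file "$libdir/plpython2": No such file or directory\n')
print(packages_to_install(sample))
```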




Well, it definitely has to do with the triggers... The trigger uses
plpython2u to replicate some entries in a different database. Is
there a way I can get rid of this error other than disabling
plpython2 before upgrading and re-enabling it after the upgrade?

Thanks.

  [1]: https://www.ovirt.org/release/4.2.0/


--

Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.2 vdsclient

2018-02-06 Thread Benny Zlotnik
It was replaced by vdsm-client[1]

[1] - https://www.ovirt.org/develop/developer-guide/vdsm/vdsm-client/

On Tue, Feb 6, 2018 at 10:17 AM, Alex K  wrote:

> Hi all,
>
> I have a stuck snapshot removal on a VM which is preventing the VM from
> starting.
> In oVirt 4.1 I was able to cancel the stuck task by running, on the SPM
> host:
>
> vdsClient -s 0 getAllTasksStatuses
> vdsClient -s 0 stopTask 
>
> Is there a similar way to do this in oVirt 4.2?
>
> Thanx,
> Alex
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Alex K
Hi Edward,

So this is not expected behavior?
I will collect logs as soon as I reproduce it.

Thanx,
Alex

On Tue, Feb 6, 2018 at 9:36 AM, Edward Haas  wrote:

> Hi Alex,
>
> Please provide Engine logs from when this is occurring and mention the
> date/time we should focus on.
>
> Thanks,
> Edy.
>
>
> On Mon, Feb 5, 2018 at 2:19 PM, Alex K  wrote:
>
>> Hi all,
>>
>> I have a 3-node oVirt 4.1 cluster, self-hosted on top of GlusterFS. The
>> cluster is used to host several VMs.
>> I have observed that when the gateway is lost (say the gateway device is
>> down) the oVirt cluster goes down.
>>
>> This seems rather extreme behavior, especially when one does not care
>> whether the hosted VMs have Internet connectivity or not.
>>
>> Can this behavior be disabled?
>>
>> Thanx,
>> Alex
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] ovirt 4.2 vdsclient

2018-02-06 Thread Alex K
Hi all,

I have a stuck snapshot removal on a VM which is preventing the VM from
starting.
In oVirt 4.1 I was able to cancel the stuck task by running, on the SPM
host:

vdsClient -s 0 getAllTasksStatuses
vdsClient -s 0 stopTask 

Is there a similar way to do this in oVirt 4.2?

Thanx,
Alex
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users