[ovirt-devel] Re: dynamic ownership changes

2018-05-31 Thread Martin Polednik

On 31/05/18 14:58 +0300, Elad Ben Aharon wrote:

Martin, please update; if you think the failures are not related to your
patch, I'll test with the master as Nir suggested.

Thanks


I believe the failures are related to the patch, and more specifically
to the way libvirt handles seclabel for snapshots.

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1584682.


On Thu, May 31, 2018 at 1:19 PM, Nir Soffer  wrote:


On Thu, May 31, 2018 at 1:05 PM Martin Polednik 
wrote:


On 31/05/18 12:47 +0300, Elad Ben Aharon wrote:
>Execution is done, 59/65 cases passed. The latest 4.2.4 execution ended with
>100%, so the failures were probably caused by the changes done in the patch.
>Failures are mainly on preview snapshots.



Can we run the same job on the patch before Martin's patch? Maybe
the issues are already in master, caused by other patches?



>
>Execution info provided to Martin separately.

I'm currently investigating the snapshot breakage, thanks Elad!

>On Wed, May 30, 2018 at 5:44 PM, Elad Ben Aharon 
>wrote:
>
>> Triggered a sanity automation execution using [1], which covers all the
>> requested areas, on iSCSI, NFS and Gluster.
>> I'll update with the results.
>>
>> [1]
>> *https://gerrit.ovirt.org/#/c/90906/ <https://gerrit.ovirt.org/#/c/
90906/>*
>> vdsm-4.20.28-6.gitc23aef6.el7.x86_64
>>
>>
>> On Tue, May 29, 2018 at 4:26 PM, Martin Polednik 
>> wrote:
>>
>>> On 29/05/18 15:30 +0300, Elad Ben Aharon wrote:
>>>
>>>> Hi Martin,
>>>>
>>>> Can you please create a cherry-pick patch that is based on 4.2?
>>>>
>>>
>>> See https://gerrit.ovirt.org/#/c/90906/. The CI failure is unrelated
>>> (storage needs real env).
>>>
>>> mpolednik
>>>
>>>
>>>
>>>> Thanks
>>>>
>>>> On Tue, May 29, 2018 at 1:34 PM, Dan Kenigsberg 
>>>> wrote:
>>>>
>>>> On Tue, May 29, 2018 at 1:21 PM, Elad Ben Aharon <
ebena...@redhat.com>
>>>>> wrote:
>>>>> > Hi Dan,
>>>>> >
>>>>> > In the last execution, the success rate was very low due to a
large
>>>>> number
>>>>> > of failures on start VM caused, according to Michal, by the
>>>>> > vdsm-hook-allocate_net that was installed on the host.
>>>>> >
>>>>> > This is the latest status here, would you like me to re-execute?
>>>>>
>>>>> yes, of course. but you should rebase Polednik's code on top of
>>>>> *current* ovirt-4.2.3 branch.
>>>>>
>>>>> > If so, with
>>>>> > or W/O vdsm-hook-allocate_net installed?
>>>>>
>>>>> There was NO reason to have that installed. Please keep it (and any
>>>>> other needless code) out of the test environment.
>>>>>
>>>>> >
>>>>> > On Tue, May 29, 2018 at 1:14 PM, Dan Kenigsberg <
dan...@redhat.com>
>>>>> wrote:
>>>>> >>
>>>>> >> On Mon, May 7, 2018 at 3:53 PM, Michal Skrivanek
>>>>> >>  wrote:
>>>>> >> > Hi Elad,
>>>>> >> > why did you install vdsm-hook-allocate_net?
>>>>> >> >
>>>>> >> > adding Dan as I think the hook is not supposed to fail this
badly
>>>>> in
>>>>> any
>>>>> >> > case
>>>>> >>
>>>>> >> yep, this looks bad and deserves a little bug report. Installing
this
>>>>> >> little hook should not block vm startup.
>>>>> >>
>>>>> >> But more importantly - what is the conclusion of this thread? Do
we
>>>>> >> have a green light from QE to take this in?
>>>>> >>
>>>>> >>
>>>>> >> >
>>>>> >> > Thanks,
>>>>> >> > michal
>>>>> >> >
>>>>> >> > On 5 May 2018, at 19:22, Elad Ben Aharon 
>>>>> wrote:
>>>>> >> >
>>>>> >> > Start VM fails on:
>>>>> >> >
>>>>> >> > 2018-05-05 17:53:27,399+0300 INFO  (vm/e6ce66ce) [virt.vm]
>>>>> >> > (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') drive 'vda'
path:
>>>>> >> >
>>>>> >> > 'dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-
>>>>> 9a78-bdd13a843c62/images/6

[ovirt-devel] Re: dynamic ownership changes

2018-05-31 Thread Martin Polednik

On 31/05/18 12:47 +0300, Elad Ben Aharon wrote:

Execution is done, 59/65 cases passed. The latest 4.2.4 execution ended with
100%, so the failures were probably caused by the changes done in the patch.
Failures are mainly on preview snapshots.

Execution info provided to Martin separately.


I'm currently investigating the snapshot breakage, thanks Elad!


On Wed, May 30, 2018 at 5:44 PM, Elad Ben Aharon 
wrote:


Triggered a sanity automation execution using [1], which covers all the
requested areas, on iSCSI, NFS and Gluster.
I'll update with the results.

[1]
*https://gerrit.ovirt.org/#/c/90906/ <https://gerrit.ovirt.org/#/c/90906/>*
vdsm-4.20.28-6.gitc23aef6.el7.x86_64


On Tue, May 29, 2018 at 4:26 PM, Martin Polednik 
wrote:


On 29/05/18 15:30 +0300, Elad Ben Aharon wrote:


Hi Martin,

Can you please create a cherry-pick patch that is based on 4.2?



See https://gerrit.ovirt.org/#/c/90906/. The CI failure is unrelated
(storage needs real env).

mpolednik




Thanks

On Tue, May 29, 2018 at 1:34 PM, Dan Kenigsberg 
wrote:

On Tue, May 29, 2018 at 1:21 PM, Elad Ben Aharon 

wrote:
> Hi Dan,
>
> In the last execution, the success rate was very low due to a large number
> of failures on start VM caused, according to Michal, by the
> vdsm-hook-allocate_net that was installed on the host.
>
> This is the latest status here, would you like me to re-execute?

yes, of course. but you should rebase Polednik's code on top of
*current* ovirt-4.2.3 branch.

> If so, with
> or W/O vdsm-hook-allocate_net installed?

There was NO reason to have that installed. Please keep it (and any
other needless code) out of the test environment.

>
> On Tue, May 29, 2018 at 1:14 PM, Dan Kenigsberg 
wrote:
>>
>> On Mon, May 7, 2018 at 3:53 PM, Michal Skrivanek
>>  wrote:
>> > Hi Elad,
>> > why did you install vdsm-hook-allocate_net?
>> >
>> > adding Dan as I think the hook is not supposed to fail this badly in any
>> > case
>>
>> yep, this looks bad and deserves a little bug report. Installing this
>> little hook should not block vm startup.
>>
>> But more importantly - what is the conclusion of this thread? Do we
>> have a green light from QE to take this in?
>>
>>
>> >
>> > Thanks,
>> > michal
>> >
>> > On 5 May 2018, at 19:22, Elad Ben Aharon 
wrote:
>> >
>> > Start VM fails on:
>> >
>> > 2018-05-05 17:53:27,399+0300 INFO  (vm/e6ce66ce) [virt.vm]
>> > (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') drive 'vda' path:
>> >
>> > 'dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-
9a78-bdd13a843c62/images/6cdabfe5-
>> > d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211'
->
>> >
>> > u'*dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-
9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-
9834f235d1c8/7ef97445-30e6-4435-8425-
>> > f35a01928211' (storagexml:334)
>> > 2018-05-05 17:53:27,888+0300 INFO  (jsonrpc/1) [vdsm.api] START
>> > getSpmStatus(spUUID='940fe6f3-b0c6-4d0c-a921-198e7819c1cc',
>> > options=None)
>> > from=:::10.35.161.127,53512,
>> > task_id=c70ace39-dbfe-4f5c-ae49-a1e3a82c
>> > 2758 (api:46)
>> > 2018-05-05 17:53:27,909+0300 INFO  (vm/e6ce66ce) [root]
>> > /usr/libexec/vdsm/hooks/before_device_create/10_allocate_net: rc=2
>> > err=vm
>> > net allocation hook: [unexpected error]: Traceback (most recent
call
>> > last):
>> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_ne
t",
>> > line
>> > 105, in 
>> >main()
>> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_ne
t",
>> > line
>> > 93, in main
>> >allocate_random_network(device_xml)
>> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_ne
t",
>> > line
>> > 62, in allocate_random_network
>> >net = _get_random_network()
>> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_ne
t",
>> > line
>> > 50, in _get_random_network
>> >available_nets = _parse_nets()
>> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_ne
t",
>> > line
>> > 46, in _parse_nets
>> >return [net for net in os.environ[AVAIL_NETS_KEY].split()]
>> >  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
>> >raise KeyError(key)
>> > KeyError: 'equivnets'
>> >
>> >
>> > (hooks:110)
>> > 2018-05-05 17:53:27,915+0300 ERROR (vm/e6ce66ce) [virt.vm]
>> > (vmId='e6ce66ce-852f-48c5

[ovirt-devel] Re: dynamic ownership changes

2018-05-29 Thread Martin Polednik
omxml_
preprocess.py",
>> > line 240, in replace_device_xml_with_hooks_xml
>> >dev_custom)
>> >  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line
134,
>> > in
>> > before_device_create
>> >params=customProperties)
>> >  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line
120,
>> > in
>> > _runHooksDir
>> >raise exception.HookError(err)
>> > HookError: Hook Error: ('vm net allocation hook: [unexpected error]:
>> > Traceback (most recent call last):\n  File
>> > "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line
>> > 105, in
>> > \nmain()\n
>> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net",
>> > line
>> > 93, in main\nallocate_random_network(device_xml)\n  File
>> > "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line
62,
>> > i
>> > n allocate_random_network\nnet = _get_random_network()\n  File
>> > "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line
50,
>> > in
>> > _get_random_network\navailable_nets = _parse_nets()\n  File "/us
>> > r/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46,
in
>> > _parse_nets\nreturn [net for net in
>> > os.environ[AVAIL_NETS_KEY].split()]\n  File
>> > "/usr/lib64/python2.7/UserDict.py", line 23, in __getit
>> > em__\nraise KeyError(key)\nKeyError: \'equivnets\'\n\n\n',)
>> >
>> >
>> >
>> > Hence, the success rate was 28% against 100% running with d/s (d/s).
If
>> > needed, I'll compare against the latest master, but I think you get
the
>> > picture with d/s.
>> >
>> > vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
>> > libvirt-3.9.0-14.el7_5.3.x86_64
>> > qemu-kvm-rhev-2.10.0-21.el7_5.2.x86_64
>> > kernel 3.10.0-862.el7.x86_64
>> > rhel7.5
>> >
>> >
>> > Logs attached
>> >
>> > On Sat, May 5, 2018 at 1:26 PM, Elad Ben Aharon 
>> > wrote:
>> >>
>> >> nvm, found gluster 3.12 repo, managed to install vdsm
>> >>
>> >> On Sat, May 5, 2018 at 1:12 PM, Elad Ben Aharon 
>> >> wrote:
>> >>>
>> >>> No, vdsm requires it:
>> >>>
>> >>> Error: Package: vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
>> >>> (/vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64)
>> >>>   Requires: glusterfs-fuse >= 3.12
>> >>>   Installed: glusterfs-fuse-3.8.4-54.8.el7.x86_64
(@rhv-4.2.3)
>> >>>
>> >>> Therefore, vdsm package installation is skipped upon force install.
>> >>>
>> >>> On Sat, May 5, 2018 at 11:42 AM, Michal Skrivanek
>> >>>  wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 5 May 2018, at 00:38, Elad Ben Aharon 
wrote:
>> >>>>
>> >>>> Hi guys,
>> >>>>
>> >>>> The vdsm build from the patch requires glusterfs-fuse > 3.12. This
is
>> >>>> while the latest 4.2.3-5 d/s build requires 3.8.4
(3.4.0.59rhs-1.el7)
>> >>>>
>> >>>>
>> >>>> because it is still oVirt, not a downstream build. We can’t really
do
>> >>>> downstream builds with unmerged changes:/
>> >>>>
>> >>>> Trying to get this gluster-fuse build, so far no luck.
>> >>>> Is this requirement intentional?
>> >>>>
>> >>>>
>> >>>> it should work regardless, I guess you can force install it without
>> >>>> the
>> >>>> dependency
>> >>>>
>> >>>>
>> >>>> On Fri, May 4, 2018 at 2:38 PM, Michal Skrivanek
>> >>>>  wrote:
>> >>>>>
>> >>>>> Hi Elad,
>> >>>>> to make it easier to compare, Martin backported the change to 4.2
so
>> >>>>> it
>> >>>>> is actually comparable with a run without that patch. Would you
>> >>>>> please try
>> >>>>> that out?
>> >>>>> It would be best to have 4.2 upstream and this[1] run to really
>> >>>>> minimize the noise.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> michal

Re: [ovirt-devel] dynamic ownership changes

2018-04-27 Thread Martin Polednik

On 24/04/18 00:37 +0300, Elad Ben Aharon wrote:

I will update with the results of the next tier1 execution on latest 4.2.3


That isn't master but an old branch, though. Could you run it against
*current* VDSM master?


On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik <mpoled...@redhat.com>
wrote:


On 23/04/18 01:23 +0300, Elad Ben Aharon wrote:


Hi, I've triggered another execution [1] due to some issues I saw in the
first which are not related to the patch.

The success rate is 78%, which is low compared to tier1 executions with
code from downstream builds (95-100% success rates) [2].



Could you run the current master (without the dynamic_ownership patch)
so that we have a viable comparison?

From what I could see so far, there is an issue with move and copy
operations to and from Gluster domains. For example [3].

The logs are attached.


[1]
*https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv
-4.2-ge-runner-tier1-after-upgrade/7/testReport/
<https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv
-4.2-ge-runner-tier1-after-upgrade/7/testReport/>*



[2]
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/

rhv-4.2-ge-runner-tier1-after-upgrade/7/



[3]
2018-04-22 13:06:28,316+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH
deleteImage error=Image does not exist in domain:
'image=cabb8846-7a4b-4244-9835-5f603e682f33,
domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
from=:
::10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935,
task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task]
(Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
in
_run
  return fn(*args, **kargs)
File "", line 2, in deleteImage
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
method
  ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503,
in
deleteImage
  raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain:
'image=cabb8846-7a4b-4244-9835-5f603e682f33,
domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'

2018-04-22 13:06:28,317+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task]
(Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted:
"Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-
5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268
(task:1181)
2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH
deleteImage error=Image does not exist in domain:
'image=cabb8846-7a4b-4244-9835-5f603e682f33,
domain=e5fd29c8-52ba-467e-be09
-ca40ff054d
d4' (dispatcher:82)



On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon <ebena...@redhat.com>
wrote:

Triggered a sanity tier1 execution [1] using [2], which covers all the

requested areas, on iSCSI, NFS and Gluster.
I'll update with the results.

[1]
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2
_dev/job/rhv-4.2-ge-flow-storage/1161/

[2]
https://gerrit.ovirt.org/#/c/89830/
vdsm-4.30.0-291.git77aef9a.el7.x86_64



On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik <mpoled...@redhat.com>
wrote:

On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:


Hi Martin,


I see [1] requires a rebase, can you please take care?



Should be rebased.

At the moment, our automation is stable only on iSCSI, NFS, Gluster and FC.
Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not
stable enough at the moment.



That is still pretty good.


[1] https://gerrit.ovirt.org/#/c/89830/




Thanks

On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpoled...@redhat.com
>
wrote:

On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:



Hi, sorry if I misunderstood, I waited for more input regarding what areas
have to be tested here.


I'd say that you have quite a bit of freedom in this regard.

GlusterFS
should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
that covers basic operations (start & stop VM, migrate it), snapshots
and merging them, and whatever else would be important for storage
sanity.

mpolednik


On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <
mpoled...@redhat.com
>

wrote:


On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:



We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will
have to check, since usually, we don't execute our automation on them.


Any update on this? I believe the gluster tests were successful, OST
passes fine and unit tests pass fine, that makes the storage backends
test the last required piece.


On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <rata...@redhat.com>
wrote:


+Elad



On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <dan...@redhat.com

>
wrote:

On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsof...@redh

Re: [ovirt-devel] dynamic ownership changes

2018-04-23 Thread Martin Polednik

On 23/04/18 01:23 +0300, Elad Ben Aharon wrote:

Hi, I've triggered another execution [1] due to some issues I saw in the
first which are not related to the patch.

The success rate is 78%, which is low compared to tier1 executions with
code from downstream builds (95-100% success rates) [2].


Could you run the current master (without the dynamic_ownership patch)
so that we have a viable comparison?


From what I could see so far, there is an issue with move and copy
operations to and from Gluster domains. For example [3].

The logs are attached.


[1]
*https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
<https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/>*



[2]
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/
rhv-4.2-ge-runner-tier1-after-upgrade/7/



[3]
2018-04-22 13:06:28,316+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH
deleteImage error=Image does not exist in domain:
'image=cabb8846-7a4b-4244-9835-5f603e682f33,
domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
from=:
::10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935,
task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task]
(Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in
_run
  return fn(*args, **kargs)
File "", line 2, in deleteImage
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
method
  ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in
deleteImage
  raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain:
'image=cabb8846-7a4b-4244-9835-5f603e682f33,
domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'

2018-04-22 13:06:28,317+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task]
(Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted:
"Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-
5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268
(task:1181)
2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH
deleteImage error=Image does not exist in domain:
'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09
-ca40ff054d
d4' (dispatcher:82)



On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon <ebena...@redhat.com>
wrote:


Triggered a sanity tier1 execution [1] using [2], which covers all the
requested areas, on iSCSI, NFS and Gluster.
I'll update with the results.

[1]
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2
_dev/job/rhv-4.2-ge-flow-storage/1161/

[2]
https://gerrit.ovirt.org/#/c/89830/
vdsm-4.30.0-291.git77aef9a.el7.x86_64



On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik <mpoled...@redhat.com>
wrote:


On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:


Hi Martin,

I see [1] requires a rebase, can you please take care?



Should be rebased.

At the moment, our automation is stable only on iSCSI, NFS, Gluster and FC.
Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not
stable enough at the moment.



That is still pretty good.


[1] https://gerrit.ovirt.org/#/c/89830/



Thanks

On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpoled...@redhat.com>
wrote:

On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:


Hi, sorry if I misunderstood, I waited for more input regarding what areas
have to be tested here.



I'd say that you have quite a bit of freedom in this regard. GlusterFS
should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
that covers basic operations (start & stop VM, migrate it), snapshots
and merging them, and whatever else would be important for storage
sanity.

mpolednik


On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpoled...@redhat.com
>


wrote:

On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:



We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will
have to check, since usually, we don't execute our automation on them.


Any update on this? I believe the gluster tests were successful, OST
passes fine and unit tests pass fine, that makes the storage backends
test the last required piece.


On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <rata...@redhat.com>
wrote:



+Elad



On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <dan...@redhat.com>
wrote:

On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsof...@redhat.com>
wrote:



On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <ee...@redhat.com>
wrote:



Please make sure to run as much OST suites on this patch as possible
before merging ( using 'ci please build' )



But note that OST is not a way to verify the patch.



Such changes require testing with all storage type

Re: [ovirt-devel] dynamic ownership changes

2018-04-19 Thread Martin Polednik

On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:

Hi Martin,

I see [1] requires a rebase, can you please take care?


Should be rebased.


At the moment, our automation is stable only on iSCSI, NFS, Gluster and FC.
Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not
stable enough at the moment.


That is still pretty good.


[1] https://gerrit.ovirt.org/#/c/89830/


Thanks

On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpoled...@redhat.com>
wrote:


On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:


Hi, sorry if I misunderstood, I waited for more input regarding what areas
have to be tested here.



I'd say that you have quite a bit of freedom in this regard. GlusterFS
should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
that covers basic operations (start & stop VM, migrate it), snapshots
and merging them, and whatever else would be important for storage
sanity.

mpolednik


On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpoled...@redhat.com>

wrote:

On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:


We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder,

will
have to check, since usually, we don't execute our automation on them.



Any update on this? I believe the gluster tests were successful, OST
passes fine and unit tests pass fine, that makes the storage backends
test the last required piece.


On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <rata...@redhat.com> wrote:



+Elad



On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <dan...@redhat.com>
wrote:

On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsof...@redhat.com>
wrote:



On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <ee...@redhat.com> wrote:



Please make sure to run as much OST suites on this patch as possible


before merging ( using 'ci please build' )


But note that OST is not a way to verify the patch.


Such changes require testing with all storage types we support.

Nir

On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <
mpoled...@redhat.com
>

wrote:


Hey,



I've created a patch[0] that is finally able to activate libvirt's
dynamic_ownership for VDSM while not negatively affecting
functionality of our storage code.

That of course comes with quite a bit of code removal, mostly in
the
area of host devices, hwrng and anything that touches devices;
bunch
of test changes and one XML generation caveat (storage is handled
by
VDSM, therefore disk relabelling needs to be disabled on the VDSM
level).

Because of the scope of the patch, I welcome storage/virt/network
people to review the code and consider the implication this change
has
on current/future features.

[0] https://gerrit.ovirt.org/#/c/89830/


In particular:  dynamic_ownership was set to 0 prehistorically (as part
of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt,
running as root, was not able to play properly with root-squash nfs
mounts.

Have you attempted this use case?

I join to Nir's request to run this with storage QE.





--


Raz Tamir
Manager, RHV QE




___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] dynamic ownership changes

2018-04-18 Thread Martin Polednik

On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:

Hi, sorry if I misunderstood, I waited for more input regarding what areas
have to be tested here.


I'd say that you have quite a bit of freedom in this regard. GlusterFS
should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
that covers basic operations (start & stop VM, migrate it), snapshots
and merging them, and whatever else would be important for storage
sanity.

mpolednik


On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpoled...@redhat.com>
wrote:


On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:


We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will
have to check, since usually, we don't execute our automation on them.



Any update on this? I believe the gluster tests were successful, OST
passes fine and unit tests pass fine, that makes the storage backends
test the last required piece.


On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <rata...@redhat.com> wrote:


+Elad


On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <dan...@redhat.com>
wrote:

On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsof...@redhat.com> wrote:


On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <ee...@redhat.com> wrote:


Please make sure to run as much OST suites on this patch as possible

before merging ( using 'ci please build' )



But note that OST is not a way to verify the patch.

Such changes require testing with all storage types we support.

Nir

On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpoled...@redhat.com
>


wrote:

Hey,


I've created a patch[0] that is finally able to activate libvirt's
dynamic_ownership for VDSM while not negatively affecting
functionality of our storage code.

That of course comes with quite a bit of code removal, mostly in the
area of host devices, hwrng and anything that touches devices; bunch
of test changes and one XML generation caveat (storage is handled by
VDSM, therefore disk relabelling needs to be disabled on the VDSM
level).

Because of the scope of the patch, I welcome storage/virt/network
people to review the code and consider the implication this change
has
on current/future features.

[0] https://gerrit.ovirt.org/#/c/89830/



In particular:  dynamic_ownership was set to 0 prehistorically (as part
of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt,
running as root, was not able to play properly with root-squash nfs
mounts.

Have you attempted this use case?

I join to Nir's request to run this with storage QE.





--


Raz Tamir
Manager, RHV QE



___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] dynamic ownership changes

2018-04-18 Thread Martin Polednik

On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:

We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will
have to check, since usually, we don't execute our automation on them.


Any update on this? I believe the gluster tests were successful, OST
passes fine and unit tests pass fine, that makes the storage backends
test the last required piece.


On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <rata...@redhat.com> wrote:


+Elad

On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <dan...@redhat.com> wrote:


On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsof...@redhat.com> wrote:


On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <ee...@redhat.com> wrote:


Please make sure to run as much OST suites on this patch as possible
before merging ( using 'ci please build' )



But note that OST is not a way to verify the patch.

Such changes require testing with all storage types we support.

Nir

On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpoled...@redhat.com>

wrote:


Hey,

I've created a patch[0] that is finally able to activate libvirt's
dynamic_ownership for VDSM while not negatively affecting
functionality of our storage code.

That of course comes with quite a bit of code removal, mostly in the
area of host devices, hwrng and anything that touches devices; bunch
of test changes and one XML generation caveat (storage is handled by
VDSM, therefore disk relabelling needs to be disabled on the VDSM
level).

Because of the scope of the patch, I welcome storage/virt/network
people to review the code and consider the implication this change has
on current/future features.

[0] https://gerrit.ovirt.org/#/c/89830/




In particular:  dynamic_ownership was set to 0 prehistorically (as part
of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt,
running as root, was not able to play properly with root-squash nfs mounts.

Have you attempted this use case?

I join to Nir's request to run this with storage QE.





--


Raz Tamir
Manager, RHV QE


___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] dynamic ownership changes

2018-04-11 Thread Martin Polednik

On 11/04/18 16:28 +0300, Dan Kenigsberg wrote:

On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsof...@redhat.com> wrote:


On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <ee...@redhat.com> wrote:


Please make sure to run as much OST suites on this patch as possible
before merging ( using 'ci please build' )



But note that OST is not a way to verify the patch.

Such changes require testing with all storage types we support.

Nir

On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpoled...@redhat.com>

wrote:


Hey,

I've created a patch[0] that is finally able to activate libvirt's
dynamic_ownership for VDSM while not negatively affecting
functionality of our storage code.

That of course comes with quite a bit of code removal, mostly in the
area of host devices, hwrng and anything that touches devices; bunch
of test changes and one XML generation caveat (storage is handled by
VDSM, therefore disk relabelling needs to be disabled on the VDSM
level).

Because of the scope of the patch, I welcome storage/virt/network
people to review the code and consider the implication this change has
on current/future features.

[0] https://gerrit.ovirt.org/#/c/89830/




In particular:  dynamic_ownership was set to 0 prehistorically (as part of
https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt,
running as root, was not able to play properly with root-squash nfs mounts.

Have you attempted this use case?


I have not. Added this to my to-do list.

The important part to note about this patch (compared to my previous
attempts) is that it explicitly disables dynamic_ownership
for FILE/BLOCK-backed disks. That means, unless `seclabel` is broken
on the libvirt side, the behavior would be unchanged for storage.
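
For illustration, the mechanism boils down to emitting a per-device DAC
seclabel with relabelling turned off for VDSM-managed disks, so libvirt's
dynamic_ownership leaves them alone. A minimal sketch (not the actual patch;
the helper name and sample path are made up):

# Sketch only: opt a VDSM-managed disk out of libvirt relabelling by
# adding <seclabel model='dac' relabel='no'/> under its <source>.
import xml.etree.ElementTree as ET

def disable_relabel(disk_xml):
    disk = ET.fromstring(disk_xml)
    source = disk.find('source')
    if source is not None and source.find('seclabel') is None:
        ET.SubElement(source, 'seclabel', model='dac', relabel='no')
    return ET.tostring(disk, encoding='unicode')

print(disable_relabel(
    "<disk type='file' device='disk'>"
    "<source file='/rhev/data-center/mnt/example/image'/>"  # made-up path
    "</disk>"))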


I join to Nir's request to run this with storage QE.

___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] dynamic ownership changes

2018-04-11 Thread Martin Polednik

On 11/04/18 12:27 +, Nir Soffer wrote:

On Wed, Apr 11, 2018 at 12:38 PM Eyal Edri <ee...@redhat.com> wrote:


On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsof...@redhat.com> wrote:


On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <ee...@redhat.com> wrote:


Please make sure to run as much OST suites on this patch as possible
before merging ( using 'ci please build' )



But note that OST is not a way to verify the patch.

Such changes require testing with all storage types we support.



Well, we already have the HE suite that runs on iSCSI, so at least we have
NFS+iSCSI on nested; for real storage testing, you'll have to do it manually.



We need glusterfs (both native and fuse based), and cinder/ceph storage.

But we cannot practically test all flows with all types of storage for
every patch.


That leads to a question - how do I go about verifying such a patch
without a sufficient environment? Is there someone from storage QA who
could assist with this?


Nir







Nir

On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpoled...@redhat.com>

wrote:


Hey,

I've created a patch[0] that is finally able to activate libvirt's
dynamic_ownership for VDSM while not negatively affecting
functionality of our storage code.

That of course comes with quite a bit of code removal, mostly in the
area of host devices, hwrng and anything that touches devices; bunch
of test changes and one XML generation caveat (storage is handled by
VDSM, therefore disk relabelling needs to be disabled on the VDSM
level).

Because of the scope of the patch, I welcome storage/virt/network
people to review the code and consider the implication this change has
on current/future features.

[0] https://gerrit.ovirt.org/#/c/89830/

mpolednik
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel





--

Eyal edri


MANAGER

RHV DevOps

EMEA VIRTUALIZATION R&D


Red Hat EMEA <https://www.redhat.com/>
<https://red.ht/sig> TRIED. TESTED. TRUSTED.
<https://redhat.com/trusted>
phone: +972-9-7692018 <+972%209-769-2018>
irc: eedri (on #tlv #rhev-dev #rhev-integ)
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel






--

Eyal edri


MANAGER

RHV DevOps

EMEA VIRTUALIZATION R&D


Red Hat EMEA <https://www.redhat.com/>
<https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
phone: +972-9-7692018 <+972%209-769-2018>
irc: eedri (on #tlv #rhev-dev #rhev-integ)


___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] dynamic ownership changes

2018-04-10 Thread Martin Polednik

Hey,

I've created a patch[0] that is finally able to activate libvirt's
dynamic_ownership for VDSM while not negatively affecting
functionality of our storage code.

That of course comes with quite a bit of code removal, mostly in the
area of host devices, hwrng and anything that touches devices; bunch
of test changes and one XML generation caveat (storage is handled by
VDSM, therefore disk relabelling needs to be disabled on the VDSM
level).

Because of the scope of the patch, I welcome storage/virt/network
people to review the code and consider the implication this change has
on current/future features.

[0] https://gerrit.ovirt.org/#/c/89830/

mpolednik
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] high performance VM preset

2017-05-24 Thread Martin Polednik

On 24/05/17 12:57 +0200, Michal Skrivanek wrote:

Hi all,
we plan to work on an improvement in VM definition for high performance 
workloads which do not require desktop-class devices and generally favor 
highest possible performance in expense of less flexibility.
We’re thinking of adding a new VM preset in addition to current Desktop and 
Server in New VM dialog, which would automatically pre-select existing options 
in the right way, and suggest/warn on suboptimal configuration
All the presets and warning can be changed and ignored. There are few things we 
already identified as boosting performance and/or minimize the complexity of 
the VM, so we plan the preset to:
- remove all graphical consoles and set the VM as headless, making it 
accessible by serial console.
- disable all USB.
- disable soundcard.
- enable I/O Threads, just one for all disks by default.
- set host cpu passthrough (effectively disabling VM live migration), add I/O 
Thread pinning in a similar way as the existing CPU pinning.
We plan the following checks and suggestions: perform CPU pinning, host topology
== guest topology (number of cores per socket and threads per core should
match), NUMA topology of host and guest should match, and check and suggest I/O
thread pinning.
A popup on a VM dialog save seems suitable.


As for the checks, I'd prefer to see slightly more fine-grained
topology analysis. We don't need the guest topology to overlap with the
host's topology, but it needs to fit in. So the checks should be
(ordered by their topology significance):

1) #guest_numa_nodes <= #host_numa_nodes
2) #guest_sockets_per_node <= #host_sockets_per_node
3) #guest_cpus_per_socket <= #host_cpus_per_socket (this check has to
account for cores X threads difference)
4) guest_ram_per_node <= host_ram_per_node

These four checks guarantee that each guest NUMA node fits onto a host NUMA
node and that we're not requesting more nodes than we have at our disposal.
Now if these don't pass, each check can have a suggestion on how to
proceed:

1) lower the number of numa nodes but increase sockets/cores/memory
per node
2) increase number of numa nodes
3) increase number of sockets
4) increase number of numa nodes

Should these checks fail, the VM can't be started in high performance mode.
In short, we've relaxed the requirement that host topology == guest
topology to host topology >= guest topology.
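
A minimal sketch of those four checks and the corresponding suggestions (the
dictionaries and key names are assumptions for illustration, not an existing
engine or VDSM API):

# Sketch: verify that the guest topology fits into the host topology.
def topology_fits(guest, host):
    checks = [
        # (suggestion when the check fails, passed?)
        ('lower the number of NUMA nodes, increase resources per node',
         guest['numa_nodes'] <= host['numa_nodes']),
        ('increase the number of NUMA nodes',
         guest['sockets_per_node'] <= host['sockets_per_node']),
        # host side must account for cores x threads
        ('increase the number of sockets',
         guest['cpus_per_socket'] <=
         host['cores_per_socket'] * host['threads_per_core']),
        ('increase the number of NUMA nodes',
         guest['ram_per_node'] <= host['ram_per_node']),
    ]
    suggestions = [hint for hint, ok in checks if not ok]
    return not suggestions, suggestions

ok, hints = topology_fits(
    guest={'numa_nodes': 2, 'sockets_per_node': 1,
           'cpus_per_socket': 8, 'ram_per_node': 32},
    host={'numa_nodes': 2, 'sockets_per_node': 1,
          'cores_per_socket': 10, 'threads_per_core': 2,
          'ram_per_node': 64})
print(ok, hints)  # (True, []) for this example host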

If the checks pass, we should recommend pinning the numa nodes (and do
strict/preferred pinning rather than interleave), numa
memory and CPUs to make sure it will match the topology. Additionally,
if the VM uses host devices (incl. SR-IOV), the pinning should be as
close to the node where devices' MMIO resides as possible.

Additionally, we should suggest leaving some CPUs within the numa node
and pin iothreads/emulator against these.
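
As a hedged illustration (the element names are libvirt's, the cpuset values
are made up), the preset could end up emitting something along these lines in
the domain XML:

# Sketch: vCPUs pinned inside a NUMA node, emulator and I/O thread
# pinned to the CPUs deliberately left aside.
import xml.etree.ElementTree as ET

cputune = ET.Element('cputune')
ET.SubElement(cputune, 'vcpupin', vcpu='0', cpuset='2')
ET.SubElement(cputune, 'vcpupin', vcpu='1', cpuset='3')
ET.SubElement(cputune, 'emulatorpin', cpuset='0-1')
ET.SubElement(cputune, 'iothreadpin', iothread='1', cpuset='0-1')
print(ET.tostring(cputune, encoding='unicode'))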

BTW what about virtio-scsi vs virtio-blk in this case? High
performance may be the case where virtio-blk is reasonable.


currently identified task and status can be followed on trello card[1]

Please share your thoughts, questions, any kind of feedback…

Thanks,
michal


[1] https://trello.com/c/MHRDD8ZO


___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] Engine XML: metadata and devices from XML

2017-03-23 Thread Martin Polednik

On 22/03/17 16:52 +0100, Francesco Romani wrote:

On 03/18/2017 01:14 PM, Nir Soffer wrote:



On Fri, Mar 17, 2017 at 4:58 PM Francesco Romani <from...@redhat.com
<mailto:from...@redhat.com>> wrote:

On 03/16/2017 08:03 PM, Francesco Romani wrote:
> On 03/16/2017 01:26 PM, Francesco Romani wrote:
>> On 03/16/2017 11:47 AM, Michal Skrivanek wrote:
>>>> On 16 Mar 2017, at 09:45, Francesco Romani
<from...@redhat.com <mailto:from...@redhat.com>> wrote:
>>>>
>>>> We talked about sending storage device purely on metadata,
letting Vdsm
>>>> rebuild them and getting the XML like today.
>>>>
>>>> In the other direction, Vdsm will pass through the XML
(perhaps only
>>>> parts of it, e.g. the devices subtree) like before.
>>>>
>>>> This way we can minimize the changes we are uncertain of, and
more
>>>> importantly, we can minimize the risky changes.
>>>>
>>>>
>>>> The following is  a realistic example of how the XML could
look like if
>>>> we send all but the storage devices. It is built using my
pyxmlpickle
>>>> module (see [3] below).
>>> That’s quite verbose. How much work would it need to actually
minimize it and turn it into something more simple.
>>> Most such stuff should go away and I believe it would be
    beneficial to make it difficult to use to discourage using
metadata as a generic junkyard
>> It is verbose because it is generic - indeed perhaps too generic.
>> I can try something else based on a concept from Martin
Polednik. Will
>> follow up soon.
> Early preview:
>

https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:virt-metadata-compact
>
> still plenty of TODOs, I expect to be reviewable material worst case
> monday morning.

This is how typical XML could look like:







Why do we need this nesting?


We use libvirt metadata for elements; so each direct child of metadata
is a separate metadata group:

ovirt-tune:qos is one
ovirt-vm:vm is one

and so forth.  I don't want to mess up with the existing elements. So we
could be backward compatible at XML level (read: 4.1 XML works without
changes to 4.2)








What is ovirt-instance?


Gone, merged with ovirt-vm

ovirt-vm will hold both per-vm metadata and per-device metadata.





[example XML stripped by the archive; only bare vm.conf values remain, e.g.
true, 192.168.1.51, en-us, DEFAULT,
smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir, ovirtmgmt]






Why do we need this nesting?


*Here* we had this nesting because the example took vm.conf and brutally
translated it to XML. It is a worst-case scenario.

Why is it relevant? There was an initial discussion about how to deal
with complex devices; to minimize changes, we could
marshal the existing vm.conf into the device metadata, then unmarshal on
Vdsm side and just use it to rebuild the devices
with the very same code we have today (yes, this means
sneaking/embedding vm.conf into the XML)

Should we go that way, it could look like the above.


Let's talk in general now. There are three main use cases requiring nesting:

1. per-vm metadata. Attach key/value pairs. We need one level of nesting
to avoid messing up other data. So it could look like


 
[example XML stripped by the archive; only the key/value contents 1 and 2 remain]


this is simple and nice and I think is not bothering anyone (hopefully :))

2. per-device metadata: it has to fit into vm section, and we could
possibly have more than one device with metadata, so the simplest format
is something like


 
[example XML stripped by the archive; only the per-VM values 1 and 2 and the
per-device values true and false remain]

 


This is the minimal nesting level we need. We could gather the
per-device metadata in a dict and feed the device with it, e.g. with a new
"meta" argument to the device constructor, much like "custom" and "specParams".

Would that look good?


OK to me. Won't be easy to design an API that doesn't look bad, isn't
too generic and is semi-hard to use, but the design seems fine.
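
For illustration only, since the inline XML examples were stripped by the
archive: a fragment with that minimal nesting (per-VM keys plus one per-device
group) could be built roughly like this; the namespace URI, element names and
the device alias are assumptions of the sketch, not the final schema:

# Sketch: one ovirt-vm metadata group holding per-VM key/value pairs
# and a nested per-device subsection.
import xml.etree.ElementTree as ET

NS = 'http://ovirt.org/vm/1.0'  # assumed namespace URI
ET.register_namespace('ovirt-vm', NS)

vm = ET.Element('{%s}vm' % NS)
ET.SubElement(vm, '{%s}exampleKey' % NS).text = '1'

dev = ET.SubElement(vm, '{%s}device' % NS, alias='ua-example')
ET.SubElement(dev, '{%s}exampleDeviceKey' % NS).text = 'true'

print(ET.tostring(vm, encoding='unicode'))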


3. QoS. we need to support the current layout for obvious backward
compatibility questions. We could postpone this and use existing code
for some more time, but ultimately this should handled by metadata
module, just because it is supposed to be
the process-wide metadata gateway.


I'd say let's wait with QoS and consider options to change it without
breaking backwards compatibility first. If that fails, let's see how
new code could handle that.




 

Re: [ovirt-devel] Lowering the bar for wiki contribution?

2017-01-04 Thread Martin Polednik

On 04/01/17 09:57 +0200, Roy Golan wrote:

I'm getting the feeling I'm not alone in this: authoring and publishing a
wiki page hasn't been as easy as it used to be for a long time.

I want to suggest a bit lighter workflow:

1.  Everyone can merge their page - (it's a wiki)
 Same as with (public and open) code, no one has the motivation to publish
a badly written
 wiki page under their name. True, it can have an impact, but not as with
broken code

2. Use Page-Status marker
The author first merges the draft. It's now out there and should be updated
as time goes and its
status is DRAFT. Maintainers will come later and after review would change
the status to
PUBLISH. That could be a header in on the page:
---
page status: DRAFT/PUBLISH
---

Simple I think, and should work.


+1, github's contribution workflow is terrible and doesn't make any
sense for wiki pages.


___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] New test failure on travis

2016-12-08 Thread Martin Polednik

On 08/12/16 11:26 +0200, Edward Haas wrote:

On Thu, Dec 8, 2016 at 10:12 AM, Martin Polednik <mpoled...@redhat.com>
wrote:


On 08/12/16 09:28 +0200, Edward Haas wrote:


On Wed, Dec 7, 2016 at 11:54 PM, Nir Soffer <nsof...@redhat.com> wrote:

broken_on_ci uses the default name="OVIRT_CI"; to mark it also for
travis, we need another broken_on_ci with name="TRAVIS_CI".

Maybe this test should run only if nm is active on the machine?



We need the test to always run when expected.
If NM is not running, the test will not run (silently) and we will never
know if there is a problem or not.

It is not convenient to mark each CI type as broken; why does the test code
need to know we have multiple CIs?



I believe this is a great point - we should just mark the test as broken
on *any* CI to create pressure to get it fixed.

Slight off-topic addition: I don't understand why a patch marking a test
as broken on CI takes more than 5 minutes to get merged when given a
pointer to the failure.



Because it is a wrong approach. :)
If a test fails, it is a smell that something bad is happening and we may have a
production problem.
So before marking and excluding the test, one should feel very guilty that
this check is no longer covered and better understand why it fails.


Those are 1-in-N case breakages. The fact the test is unstable should
be noted by a maintainer, but shouldn't block any other patches or
series (which is what often happens).

I don't feel any guilt marking a bad test as bad.
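
(For illustration, "broken on *any* CI" could be a decorator roughly like the
one below; the environment variable names are assumptions and this is not the
existing testValidation API.)

# Sketch: skip a test whenever any known CI marker variable is set.
import os
from functools import wraps
from unittest import SkipTest

CI_ENV_VARS = ('OVIRT_CI', 'TRAVIS_CI')  # assumed marker variables

def broken_on_any_ci(reason):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            for var in CI_ENV_VARS:
                if var in os.environ:
                    raise SkipTest('%s (broken on %s)' % (reason, var))
            return f(*args, **kwargs)
        return wrapper
    return decorator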






Currently, we run on CI tests that are not marked as 'functional'.

Perhaps we need another test type that can be marked not to run on simple
CI.
"power-integration", "super-integration"?






On Wed, Dec 7, 2016 at 11:23 PM, Dan Kenigsberg <dan...@redhat.com>
wrote:
> On Wed, Dec 7, 2016 at 2:03 PM, Nir Soffer <nsof...@redhat.com> wrote:
>> Looks like we need @brokentest("reason...", name="TRAVIC_CI") on this:
>
> Odd, the code already has
>
> @broken_on_ci('NetworkManager should not be started on CI nodes')
>
>
>>
>> See https://travis-ci.org/oVirt/vdsm/jobs/181933329
>>
>> 
==
>>
>> ERROR: test suite for > '/vdsm/tests/network/nmdbus_test.py'>
>>
>> 
--
>>
>> Traceback (most recent call last):
>>
>>   File "/usr/lib/python2.7/site-packages/nose/suite.py", line 209, in
run
>>
>> self.setUp()
>>
>>   File "/usr/lib/python2.7/site-packages/nose/suite.py", line 292, in
setUp
>>
>> self.setupContext(ancestor)
>>
>>   File "/usr/lib/python2.7/site-packages/nose/suite.py", line 315, in
>> setupContext
>>
>> try_run(context, names)
>>
>>   File "/usr/lib/python2.7/site-packages/nose/util.py", line 471, in
try_run
>>
>> return func()
>>
>>   File "/vdsm/tests/testValidation.py", line 191, in wrapper
>>
>> return f(*args, **kwargs)
>>
>>   File "/vdsm/tests/testValidation.py", line 97, in wrapper
>>
>> return f(*args, **kwargs)
>>
>>   File "/vdsm/tests/network/nmdbus_test.py", line 48, in setup_module
>>
>> NMDbus.init()
>>
>>   File "/vdsm/lib/vdsm/network/nm/nmdbus/__init__.py", line 33, in
init
>>
>> NMDbus.bus = dbus.SystemBus()
>>
>>   File "/usr/lib64/python2.7/site-packages/dbus/_dbus.py", line 194,
in __new__
>>
>> private=private)
>>
>>   File "/usr/lib64/python2.7/site-packages/dbus/_dbus.py", line 100,
in __new__
>>
>> bus = BusConnection.__new__(subclass, bus_type,
mainloop=mainloop)
>>
>>   File "/usr/lib64/python2.7/site-packages/dbus/bus.py", line 122, in
__new__
>>
>> bus = cls._new_for_bus(address_or_type, mainloop=mainloop)
>>
>> DBusException: org.freedesktop.DBus.Error.FileNotFound: Failed to
>> connect to socket /var/run/dbus/system_bus_socket: No such file or
>> directory
>>
>>  >> begin captured logging << 
>>
>> 2016-12-07 11:48:33,458 DEBUG (MainThread) [root] /usr/bin/taskset
>> --cpu-list 0-1 /bin/systemctl status NetworkManager (cwd None)
>> (commands:69)
>>
>> 2016-12-07 11:48:33,465 DEBUG (MainThread) [root] FAILED:  =
>> 'Failed to get D-Bus connection: Operation not permitted\n';  = 1
>> (commands:93)
>>
>> 2016-12-07 11:48:33,465 DEBUG (MainThread) [root] /usr/bin/taskset
>> --cpu-list 0-1 /bin/systemctl start NetworkManager (cwd None)
>> (commands:69)
>>
>> 2016-12-07 11:48:33,470 DEBUG (MainThread) [root] FAILED:  =
>> 'Failed to get D-Bus connection: Operation not permitted\n';  = 1
>> (commands:93)
>>
>> - >> end captured logging << -



___

Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel





___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] New test failure on travis

2016-12-08 Thread Martin Polednik

On 08/12/16 09:28 +0200, Edward Haas wrote:

On Wed, Dec 7, 2016 at 11:54 PM, Nir Soffer  wrote:


broken_on_ci uses the default name="OVIRT_CI"; to mark it also for
travis, we need another broken_on_ci with name="TRAVIS_CI".

Maybe this test should run only if nm is active on the machine?



We need the test to always run when expected.
If NM is not running, the test will not run (silently) and we will never
know if there is a problem or not.

It is not convenient to mark each CI type as broken; why does the test code
need to know we have multiple CIs?


I believe this is a great point - we should just mark the test as broken
on *any* CI to create pressure to get it fixed.

Slight off-topic addition: I don't understand why a patch marking a test
as broken on CI takes more than 5 minutes to get merged when given a
pointer to the failure.


Currently, we run on CI tests that are not marked as 'functional'.
Perhaps we need another test type that can be marked not to run on simple CI.
"power-integration", "super-integration"?






On Wed, Dec 7, 2016 at 11:23 PM, Dan Kenigsberg  wrote:
> On Wed, Dec 7, 2016 at 2:03 PM, Nir Soffer  wrote:
>> Looks like we need @brokentest("reason...", name="TRAVIC_CI") on this:
>
> Odd, the code already has
>
> @broken_on_ci('NetworkManager should not be started on CI nodes')
>
>
>>
>> See https://travis-ci.org/oVirt/vdsm/jobs/181933329
>>
>> ==
>>
>> ERROR: test suite for > '/vdsm/tests/network/nmdbus_test.py'>
>>
>> --
>>
>> Traceback (most recent call last):
>>
>>   File "/usr/lib/python2.7/site-packages/nose/suite.py", line 209, in
run
>>
>> self.setUp()
>>
>>   File "/usr/lib/python2.7/site-packages/nose/suite.py", line 292, in
setUp
>>
>> self.setupContext(ancestor)
>>
>>   File "/usr/lib/python2.7/site-packages/nose/suite.py", line 315, in
>> setupContext
>>
>> try_run(context, names)
>>
>>   File "/usr/lib/python2.7/site-packages/nose/util.py", line 471, in
try_run
>>
>> return func()
>>
>>   File "/vdsm/tests/testValidation.py", line 191, in wrapper
>>
>> return f(*args, **kwargs)
>>
>>   File "/vdsm/tests/testValidation.py", line 97, in wrapper
>>
>> return f(*args, **kwargs)
>>
>>   File "/vdsm/tests/network/nmdbus_test.py", line 48, in setup_module
>>
>> NMDbus.init()
>>
>>   File "/vdsm/lib/vdsm/network/nm/nmdbus/__init__.py", line 33, in init
>>
>> NMDbus.bus = dbus.SystemBus()
>>
>>   File "/usr/lib64/python2.7/site-packages/dbus/_dbus.py", line 194,
in __new__
>>
>> private=private)
>>
>>   File "/usr/lib64/python2.7/site-packages/dbus/_dbus.py", line 100,
in __new__
>>
>> bus = BusConnection.__new__(subclass, bus_type, mainloop=mainloop)
>>
>>   File "/usr/lib64/python2.7/site-packages/dbus/bus.py", line 122, in
__new__
>>
>> bus = cls._new_for_bus(address_or_type, mainloop=mainloop)
>>
>> DBusException: org.freedesktop.DBus.Error.FileNotFound: Failed to
>> connect to socket /var/run/dbus/system_bus_socket: No such file or
>> directory
>>
>>  >> begin captured logging << 
>>
>> 2016-12-07 11:48:33,458 DEBUG (MainThread) [root] /usr/bin/taskset
>> --cpu-list 0-1 /bin/systemctl status NetworkManager (cwd None)
>> (commands:69)
>>
>> 2016-12-07 11:48:33,465 DEBUG (MainThread) [root] FAILED:  =
>> 'Failed to get D-Bus connection: Operation not permitted\n';  = 1
>> (commands:93)
>>
>> 2016-12-07 11:48:33,465 DEBUG (MainThread) [root] /usr/bin/taskset
>> --cpu-list 0-1 /bin/systemctl start NetworkManager (cwd None)
>> (commands:69)
>>
>> 2016-12-07 11:48:33,470 DEBUG (MainThread) [root] FAILED:  =
>> 'Failed to get D-Bus connection: Operation not permitted\n';  = 1
>> (commands:93)
>>
>> - >> end captured logging << -




___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] VDSM review & development tools

2016-12-07 Thread Martin Polednik

Hello developers,

this e-mail is mostly aimed at VDSM contributors (current and
potential) and should serve as continuation of the idea started at our
weekly meeting.

We've had a proposal of moving the project to github[1] and using
reviewable[2] as a code review due to github's code review interface.

I would like to start a *discussion* regarding how we would like to
develop VDSM in future. So far, the suggested options were (1st
implied):

1) gerrit (stay as we are),
2) github & reviewable,
3) mailing list.

What would be your favorite review & development tool and why? Do you
hate any of them? Let the flamewar^Wconstructive discussion begin! :)



My preferred tool is mailing list with the main tree mirrored to
github. Why mailing list in 2016 (almost 2017)?

a) stack - We're built on top of libvirt, libvirt is
built on top of qemu and qemu utilizes kvm which is a kernel module.
Each of these projects uses mailing lists for development.

b) tooling - Everyone is free to use tools of choice. Any sane e-mail
client can handle mailing list patches, and it's up to reviewers to
choose the best way to handle the review. As for sending the patches,
there is the universal fallback in the form of git-send-email.

c) freedom - It's up to us to decide how would we handle such
development. As many system-level projects already use mailing lists
(see a), there is enough inspiration for workflow design[3].

d) accessibility - VDSM is in an unfortunate position between "cool",
high level projects and "boring" low level projects. I believe that we
should be more accessible to the developers from below the stack
rather than to general public. Having unified workflow that doesn't
require additional accounts and is compatible with their workflows
makes that easier.



[1] https://github.com/
[2] https://reviewable.io/
[3] e.g. https://www.kernel.org/doc/Documentation/SubmittingPatches

Regards,
mpolednik
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] Experimental Flow for Master Fails to Run a VM

2016-12-02 Thread Martin Polednik

On 02/12/16 10:55 +0100, Anton Marchukov wrote:

Hello All.

Engine log can be viewed here:

http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3838/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log

I see the following exception there:

2016-12-02 04:29:24,030-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
(ResponseWorker) [83b6b5d] Message received: {"jsonrpc": "2.0", "id":
"ec254aad-441b-47e7-a644-aebddcc1d62c", "result": true}
2016-12-02 04:29:24,030-05 ERROR
[org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker)
[83b6b5d] Not able to update response for
"ec254aad-441b-47e7-a644-aebddcc1d62c"
2016-12-02 04:29:24,041-05 DEBUG
[org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
(DefaultQuartzScheduler3) [47a31d72] Rescheduling
DEFAULT.org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshLightWeightData#-9223372036854775775
as there is no unfired trigger.
2016-12-02 04:29:24,024-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (default
task-12) [d932871a-af4f-4fc9-9ee5-f7a0126a7b85] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Timeout during xml-rpc call
at 
org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:73)
[vdsbroker.jar:]



2016-12-02 04:29:24,042-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (default
task-12) [d932871a-af4f-4fc9-9ee5-f7a0126a7b85] Timeout waiting for
VDSM response: Internal timeout occured
2016-12-02 04:29:24,044-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(default task-12) [d932871a-af4f-4fc9-9ee5-f7a0126a7b85] START,
GetCapabilitiesVDSCommand(HostName = lago-basic-suite-master-host0,
VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
hostId='5eb7019e-28a3-4f93-9188-685b6c64a2f5',
vds='Host[lago-basic-suite-master-host0,5eb7019e-28a3-4f93-9188-685b6c64a2f5]'}),
log id: 58f448b8
2016-12-02 04:29:24,044-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.reactors.stomp.impl.Message] (default
task-12) [d932871a-af4f-4fc9-9ee5-f7a0126a7b85] SEND
destination:jms.topic.vdsm_requests
reply-to:jms.topic.vdsm_responses
content-length:105


Please note that this runs on localhost with local bridge. So it is not
likely to be network itself.


The main issue I see is that the VM run command has actually failed
due to libvirt not accepting /dev/urandom as an RNG source[1]. This was
done as an engine patch and, according to git log, posted around Mon Nov
28. Also adding Jakub - this should either not happen from the engine's
point of view, or the lago host is outdated.

[1]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3838/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-host0/_var_log_vdsm/vdsm.log
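
For context, the RNG device element in question looks roughly like this
(a sketch only; the exact XML the engine generated is an assumption on
my part):

<rng model='virtio'>
  <backend model='random'>/dev/urandom</backend>
</rng>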


Anton.

On Fri, Dec 2, 2016 at 10:43 AM, Anton Marchukov 
wrote:


FYI. The experimental flow for master currently fails to run a VM. The test
times out after waiting for 180 seconds:

http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
master/3838/testReport/(root)/004_basic_sanity/vm_run/

This is reproducible: over 23 runs of this happened tonight, which sounds like a
regression to me:

http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/

I will update here with additional information once I find it.

Last successful run was with this patch:

https://gerrit.ovirt.org/#/c/66416/ (vdsm: API: move vm parameters fixup
in a method)

Known to start failing around this patch:

https://gerrit.ovirt.org/#/c/67647/ (vdsmapi: fix a typo in string
formatting)

Please note that we do not have gating implemented yet, so anything
that was merged in between those patches might have caused this (not
necessarily in the vdsm project).

Anton.
--
Anton Marchukov
Senior Software Engineer - RHEV CI - Red Hat





--
Anton Marchukov
Senior Software Engineer - RHEV CI - Red Hat



___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel




Re: [ovirt-devel] aarch64 / Raspberry Pi3

2016-11-30 Thread Martin Polednik

On 28/10/16 09:06 +0200, Sandro Bonazzola wrote:

Hi,
some time ago we've been asked to provide aarch64 build of qemu-kvm-ev for
CentOS Cloud SIG consumption.
We did it, and while at it we also built oVirt 4.0 VDSM dependencies for
aarch64 in CentOS Virt SIG.

Testing repositories have been created and are now publicly available:

[centos-qemu-ev-test]
name=CentOS-$releasever - QEMU EV Testing
baseurl=
http://buildlogs.centos.org/centos/$releasever/virt/$basearch/kvm-common/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization

[centos-ovirt40-test]
name=CentOS-$releasever - oVirt 4.0 Testing
baseurl=
http://buildlogs.centos.org/centos/$releasever/virt/$basearch/ovirt-4.0/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization

We don't officially support aarch64 and this is highly experimental, so
expect bugs. But if you have a Raspberry Pi 3 and you decide to give VDSM a
run, please share your feedback.


I'd like to revive this thread as I've experimented with master VDSM
on an rpi 3. First, there are close to no aarch64 builds for the Raspberry Pi,
except for images by Kraxel https://www.kraxel.org/repos/rpi2/images/.

Even with these images, it doesn't seem to be easy to get KVM up and
running due to various HW quirks. Additionally, there is no network
bridge/tunnel support built into the kernel.

If you manage to bypass all this, VDSM depends on a wide range of
packages, such as python-blivet, that don't work properly on the
Raspberry Pi's Fedora 24 and require additional hacking (arch detection).

I'll probably blog about my adventure some time in the future, but
maybe those findings could be helpful.

mpolednik


Thanks,

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com




___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel




Re: [ovirt-devel] VDSM changes Linux memory dirty ratios - why?

2016-11-30 Thread Martin Polednik

On 30/11/16 13:10 +0100, Sven Kieske wrote:

On 30/11/16 08:48, Martin Polednik wrote:

It's not really irrelevant, the host still uses disk cache. Anyway,
there is BZ[1] with a presentation[2] that (imho reasonably) states:

"Reduce dirty page limits in KVM host to allow
direct I/O writer VMs to compete successfully
with buffered writer processes for storage
access"

I wonder why virtual-host tuned profile doesn't contain these values:

$ grep vm.dirty /usr/lib/tuned/virtual-host/tuned.conf
vm.dirty_background_ratio = 5

[1]https://bugzilla.redhat.com/show_bug.cgi?id=740887
[2]http://perf1.lab.bos.redhat.com/bengland/laptop/rhev/rhev-vm-rsptime.pdf


Could you share [2] with the wider community? This would be awesome!


Sorry, I totally missed the fact that it's an internal link (unfortunately
publicly visible on the BZ). I believe it's slightly outdated, but
let's ask the author.

Ben, is the document[2] somehow still valid and could it be made publicly
available?

Re-referencing for completeness:
[2]http://perf1.lab.bos.redhat.com/bengland/laptop/rhev/rhev-vm-rsptime.pdf


--
Mit freundlichen Grüßen / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +495772 293100
F: +495772 29
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen







___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel




Re: [ovirt-devel] VDSM changes Linux memory dirty ratios - why?

2016-11-29 Thread Martin Polednik

On 29/11/16 22:01 +0200, Yaniv Kaul wrote:

It appears that VDSM changes the following params:
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

Any idea why? Because we use cache=none it's irrelevant anyway?


It's not really irrelevant, the host still uses disk cache. Anyway,
there is BZ[1] with a presentation[2] that (imho reasonably) states:

"Reduce dirty page limits in KVM host to allow
direct I/O writer VMs to compete successfully
with buffered writer processes for storage
access"

I wonder why virtual-host tuned profile doesn't contain these values:

$ grep vm.dirty /usr/lib/tuned/virtual-host/tuned.conf
vm.dirty_background_ratio = 5

[1]https://bugzilla.redhat.com/show_bug.cgi?id=740887
[2]http://perf1.lab.bos.redhat.com/bengland/laptop/rhev/rhev-vm-rsptime.pdf
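
For reference, expressed as a sysctl drop-in the same values would look
like the snippet below (the file path is only illustrative; the two
values are the ones listed above):

# illustrative path, e.g. /etc/sysctl.d/vdsm.conf
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2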


TIA,
Y.



___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel




Re: [ovirt-devel] CentOS Virt SIG - aarch 64 and ppc64le support

2016-09-14 Thread Martin Polednik

On 14/09/16 10:42 +0200, Sandro Bonazzola wrote:

On Mon, Aug 29, 2016 at 11:01 AM, Sandro Bonazzola 
wrote:


Hi,
within the CentOS Virt SIG, interest in having ppc64le and aarch64 as supported
architectures in oVirt has been raised.
Packages from oVirt 4.0.2 have been built for both architectures for the
hypervisor host side.
You can find the rpms in the cbs repositories waiting to be signed and
published to mirrors here:

- QEMU and deps: https://cbs.centos.org/repos/virt7-kvm-common-release/
- oVirt Common deps: https://cbs.centos.org/repos/virt7-ovirt-common-
release/
- oVirt 4.0 specific packages: https://cbs.centos.
org/repos/virt7-ovirt-40-release/

While I'm pretty sure ppc64le will work out of the box I think we're
missing something on engine side in order to fully support aarch64.
Michal, Francesco, can you open relevant BZs in order to get aarch64 fully
supported?



Michal, Francesco, any feedback? News?


The virt stack is still being worked on for aarch64; I believe it doesn't make
sense to try and start the work from oVirt's POV before CentOS 7.3 is
out.

That being said, I believe aarch64 support is a huge feature that'll
take a while to get to a point where we can start submitting bugs. No
news in that regard.






--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com





--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com




___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel




[ovirt-devel] Moving configuration files to separate directory

2016-08-02 Thread Martin Polednik

Hey devels,

last week, I've been working on a patch series that moves most of the
configuration and "static" files away from our source code to a dir
called "static"[1] (based on the previous week's VDSM weekly).

The current version keeps the static dir's layout flat - all files live in
the directory, with a few exceptions (mom.d and systemd). The downside of
this approach is that we still have to rename some of the files in the
makefile, due to possible name clashes between similarly named
files (50_vdsm from sudoers and a 50_vdsm from anything else).
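
To illustrate the renaming, a minimal Makefile.am sketch for the flat
layout could look like the following (target and file names are just an
example, not the actual rules in the series; real recipe lines must be
tab-indented):

# the generic source name (sudoers.vdsm, built from sudoers.vdsm.in)
# has to be rewritten to 50_vdsm at install time to avoid clashes
install-data-local:
        $(MKDIR_P) $(DESTDIR)$(sysconfdir)/sudoers.d
        $(INSTALL_DATA) sudoers.vdsm \
                $(DESTDIR)$(sysconfdir)/sudoers.d/50_vdsm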

There is another possibility - a hierarchy within the folder. Instead of
the current structure -


static
├── Makefile.am
├── limits.conf
├── logger.conf.in
├── mom.conf.in
├── mom.d
│   ├── 00-defines.policy
│   ├── 01-parameters.policy
│   ├── 02-balloon.policy
│   ├── 03-ksm.policy
│   ├── 04-cputune.policy
│   ├── 05-iotune.policy
│   └── Makefile.am
├── sudoers.vdsm.in
├── svdsm.logger.conf.in
├── systemd
│   ├── Makefile.am
│   ├── mom-vdsm.service.in
│   ├── supervdsmd.service.in
│   ├── vdsm-network.service.in
│   └── vdsmd.service.in
├── vdsm-bonding-modprobe.conf
├── vdsm-logrotate.conf
├── vdsm-modules-load.d.conf
├── vdsm-sysctl.conf
└── vdsm.rwtab.in

we could structure the directory to a corresponding subfolders over
the system:

etc
├── modprobe.d
│   └── vdsm-bonding-modprobe.conf
├── modules-load.d
│   └── vdsm.conf
├── rwtab.d
│   └── vdsm
├── security
│   └── limits.d
│   └── 99-vdsm.conf
├── sudoers.d
│   ├── 50_vdsm
├── sysctl.d
│   └── vdsm.conf
└── vdsm
   ├── logger.conf
   ├── logrotate
   │   └── vdsm
   ├── mom.conf
   ├── mom.d
   │   ├── 00-defines.policy
   │   ├── 01-parameters.policy
   │   ├── 02-balloon.policy
   │   ├── 03-ksm.policy
   │   ├── 04-cputune.policy
   │   └── 05-iotune.policy
   ├── svdsm.logger.conf
   ├── vdsm.conf
   └── vdsm.conf.d

There is a small downside to the second approach: more code is
added to VDSM, in the sense that more makefiles will have to exist. On
the other hand, we can drop all the renaming and have the files named
as they would be named at their destination after install.
Opinions?

[1]https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:static-assets
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] new internal stable modules + proposal

2016-03-29 Thread Martin Polednik

On 29/03/16 20:36 +0300, Nir Soffer wrote:

On Tue, Mar 29, 2016 at 7:44 PM, Martin Polednik <mpoled...@redhat.com> wrote:

On 29/03/16 11:12 -0400, Francesco Romani wrote:


Hi,

in the last Vdsm developer call we agreed to promote a few modules in the
common repository.
The common repository provides the additional guarantees over regular
modules in lib/vdsm/

- stable API
- (thus) safe to use across verticals

the planned moves are:

lib/vdsm/schedule.py -> lib/vdsm/common/schedule.py
lib/vdsm/periodic.py -> lib/vdsm/common/periodic.py
lib/vdsm/virt/api.py -> lib/vdsm/common/api.py

Question is if those modules should go under common/ or under another
subdirectory, maybe infra?



Hi.

I agree that some modules should be kept (and advertised) as stable.

The name 'common' is better suited for such a package, as 'infra' is very
oVirt-specific and could hurt readability and navigation of the project. It does
make sense to structure the project as follows:

lib/vdsm/common
lib/vdsm/virt/common
lib/vdsm/storage/common
lib/vdsm/network/common


This creates a dependency between the packages. The goal is to have
packages that do
not depend on each other except the common package, which is library
code used by
other parts.


True, I forgot about that. We fortunately still have the possibility of further
sharding 'common' if that need arises.


Extra care should be taken to avoid a 'common junkyard' situation,
making sure that the scope of common corresponds to its package.

List of modules that I consider public (mostly author/main contributor
of these lately):

lib/vdsm/cpuarch.py
lib/vdsm/cpuinfo.py
lib/vdsm/hostdev.py
lib/vdsm/machinetype.py
lib/vdsm/numa.py
lib/vdsm/osinfo.py
lib/vdsm/udevadm.py


+1


Lastly, i have a proposal about better handling of those modules.

First, the mere fact that a module is placed under lib/vdsm/common provides the
extra guarantees I mentioned.
But should we add more annotations?

for example something like

__API__ = {}

near the top of the module

if this attribute exists, then the module is safe to use across verticals,
has a stable API and so forth
(this is _in addition_ to the common/ package, not a replacement).

Like:

__API__ = {
 "introduced-in": "4.14.0",
 "deprecated-from": "4.18.0",
 "removed-at": "4.20.0",
 "contact": "from...@redhat.com"
}

We could refine this concept further if we like it. The idea is to be as
lightweight as possible while
carrying all the information we need.



I agree about keeping global metadata in that way, except ideally
using docstrings for better readability (e.g. interactive help()). To
better reflect the needs for granular deprecation and based on our
private discussion on this subject, I'd like to see at least
@deprecated(target_removal_version) decorator. On top of that, we need
properly documented public API and deprecation process.


Using standard way such as __author__ is better than docstrings, see for example
the output of help(module)


Partially agreed. Having proper docstrings wouldn't hurt either. I
frequently use the REPL with vdsm now that it is in lib; this would help
me - am I the only one?

I would still like to see some kind of proper deprecation in private
APIs. It might as well serve as a testing ground for deprecation in
public API.
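
For illustration, a minimal sketch of such a @deprecated decorator
(the names and the warnings-based mechanism are assumptions on my part,
not an agreed-upon API):

import functools
import warnings


def deprecated(target_removal_version):
    """Mark a function as deprecated, to be removed in the given version."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                '%s is deprecated and will be removed in %s' % (
                    func.__name__, target_removal_version),
                DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated('4.20.0')
def legacy_call():
    pass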


Nir
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel



Re: [ovirt-devel] new internal stable modules + proposal

2016-03-29 Thread Martin Polednik

On 29/03/16 11:12 -0400, Francesco Romani wrote:

Hi,

in the last Vdsm developer call we agreed to promote a few modules in the 
common repository.
The common repository provides the additional guarantees over regular modules 
in lib/vdsm/

- stable API
- (thus) safe to use across verticals

the planned moves are:

lib/vdsm/schedule.py -> lib/vdsm/common/schedule.py
lib/vdsm/periodic.py -> lib/vdsm/common/periodic.py
lib/vdsm/virt/api.py -> lib/vdsm/common/api.py

Question is if those modules should go under common/ or under another 
subdirectory, maybe infra?


Hi.

I agree that some modules should be kept (and advertised) as stable.

The name 'common' is better suited for such a package, as 'infra' is very
oVirt-specific and could hurt readability and navigation of the project. It does
make sense to structure the project as follows:

lib/vdsm/common
lib/vdsm/virt/common
lib/vdsm/storage/common
lib/vdsm/network/common

Extra care should be taken to avoid a 'common junkyard' situation,
making sure that the scope of common corresponds to its package.

List of modules that I consider public (mostly author/main contributor
of these lately):

lib/vdsm/cpuarch.py
lib/vdsm/cpuinfo.py
lib/vdsm/hostdev.py
lib/vdsm/machinetype.py
lib/vdsm/numa.py
lib/vdsm/osinfo.py
lib/vdsm/udevadm.py


Lastly, i have a proposal about better handling of those modules.

First, the mere fact that a module is placed under lib/vdsm/common provides the
extra guarantees I mentioned.
But should we add more annotations?

for example something like

__API__ = {}

near the top of the module

if this attribute exists, then the module is safe to use across verticals, has
a stable API and so forth
(this is _in addition_ to the common/ package, not a replacement).

Like:

__API__ = {
 "introduced-in": "4.14.0",
 "deprecated-from": "4.18.0",
 "removed-at": "4.20.0",
 "contact": "from...@redhat.com"
}

We could refine this concept further if we like it. The idea is to be as
lightweight as possible while
carrying all the information we need.


I agree about keeping global metadata in that way, except ideally
using docstrings for better readability (e.g. interactive help()). To
better reflect the needs for granular deprecation and based on our
private discussion on this subject, I'd like to see at least
@deprecated(target_removal_version) decorator. On top of that, we need
properly documented public API and deprecation process.
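
As a sketch, the module docstring and the metadata could coexist like
this (module name purely illustrative, values reused from the example
above):

"""osinfo - stable host OS information for the verticals.

Part of the stable lib/vdsm/common API; see __API__ below.
"""

__API__ = {
    "introduced-in": "4.14.0",
    "deprecated-from": "4.18.0",
    "removed-at": "4.20.0",
    "contact": "from...@redhat.com"
}

# help() on the module then shows the docstring interactively, while
# __API__ stays available for tooling.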


Comments welcome as usual

bests,

--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel



[ovirt-devel] logging VDSM-generated libvirt XML properly

2016-03-03 Thread Martin Polednik

Hello developers!

I've been chasing a bug that led me to an idea for improving our XML
logging. Now, to see a VM's generated libvirt XML, we have to rely on
vdsm.log. The issue is that the log rotates and therefore it is
easy to miss the correct XML when dealing with a busy hypervisor.

Since we're built on libvirt, I was thinking of doing something similar
to what libvirt does with the qemu command line. Each running domain (VM) has
its command line logged in /var/log/libvirt/qemu/${vmname}. This is
great for debugging as you can mostly just take the cmdline and
restart the VM.

There is an issue with using the cmdline directly - networking.
Libvirt uses an additional script to create and bring up a bridge. Therefore,
it is easier to use the XML and shape it to one's needs.

I propose that we properly log the generated XML in a similar fashion
to how libvirt logs the cmdline. This means we would have a path like
/var/log/vdsm/libvirt/${vmname}, where the generated XML would be stored.
To minimize the logging requirements, only the last definition of a VM with
that name would be stored. Additionally, exception-level errors
related to that VM could also be stored there.
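
A minimal sketch of what that write-out could look like (the directory
comes from the proposal above; the function name and details are just an
illustration, not an actual VDSM API):

import os

VM_XML_LOG_DIR = '/var/log/vdsm/libvirt'  # path from the proposal above


def log_domain_xml(vm_name, domain_xml):
    """Keep only the last generated libvirt XML for the given VM name."""
    if not os.path.isdir(VM_XML_LOG_DIR):
        os.makedirs(VM_XML_LOG_DIR)
    path = os.path.join(VM_XML_LOG_DIR, vm_name)
    # overwrite on purpose: only the latest definition is kept
    with open(path, 'w') as f:
        f.write(domain_xml)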

What do you think, can we afford the space and additional writes per
VM to help the debugging process?

Regards,
mpolednik
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] sync meeting - Vdsm 2.2

2016-02-04 Thread Martin Polednik

On 04/02/16 12:13 +0200, Nir Soffer wrote:

On Thu, Feb 4, 2016 at 10:54 AM, Martin Polednik <mpoled...@redhat.com> wrote:



- Original Message -

From: "Dan Kenigsberg" <dan...@redhat.com>
To: "devel" <devel@ovirt.org>
Sent: Tuesday, February 2, 2016 5:19:12 PM
Subject: [ovirt-devel] sync meeting - Vdsm 2.2

(nir, piotr, danken)

- splitting supervdsmServer:
  I think that its a good idea, and that https://gerrit.ovirt.org/#/c/52875/
  is a good start.
  It would give a nice separation of responsibility, and may serve as a
  "teaser" for how Vdsm's public API can be broken apart.

  Nir is worried that it would introduce instability for no immediate gain,
  while distracting us from solving the supervdsmServer memory leak, or
  possible security concerns

- schema conversion: Piotr presented his https://gerrit.ovirt.org/#/c/52864/
  which would convert the json-based schema into a cleaner yaml-based one,
  which would be easier to version, validate, and obsolete.

- Nir was unhappy with recent changes to the contrib client:
  
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:jsonrpc-client
  would prefer using standard stomp client

- name discussion is stalling. Some people are worried that a rename may
  turn out to be expensive (release engineering, other packages,
  internal module and function names). Still, it would be fun to forsake
  the non-pronounceable name "vdsm". ovirt-hostd seems like a front
  runner at the moment.

- Nir has advocated trying to use https://trello.com/b/U3lsbVRU/maintenance
to
  maintain the list of our pending tasks. Let's try.


Awesome idea! I propose that we add the board to the community section of
our wiki[1], since the board is public and a great first stop for potential
contributors (or project newcomers).


It is a wiki, you can do it.


No permissions for the community page. That's why it was a proposal.


But note that the wiki is going to be replaced by
https://github.com/oVirt/ovirt-site

I'm not sure if the new site automatically picks up new changes
from the wiki, so you may also want to send a pull request to the
new site.


Good point!


Nir



[1] http://www.ovirt.org/Community


Ciao!
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel




Re: [ovirt-devel] sync meeting - Vdsm 2.2

2016-02-04 Thread Martin Polednik


- Original Message -
> From: "Dan Kenigsberg" 
> To: "devel" 
> Sent: Tuesday, February 2, 2016 5:19:12 PM
> Subject: [ovirt-devel] sync meeting - Vdsm 2.2
> 
> (nir, piotr, danken)
> 
> - splitting supervdsmServer:
>   I think that its a good idea, and that https://gerrit.ovirt.org/#/c/52875/
>   is a good start.
>   It would give a nice separation of responsibility, and may serve as a
>   "teaser" for how Vdsm's public API can be broken apart.
> 
>   Nir is worried that it would introduce instability for no immediate gain,
>   while distracting us from solving the supervdsmServer memory leak, or
>   possible security concerns
> 
> - schema conversion: Piotr presented his https://gerrit.ovirt.org/#/c/52864/
>   which would convert the json-based schema into a cleaner yaml-based one,
>   which would be easier to version, validate, and obsolete.
> 
> - Nir was unhappy with recent changes to the contrib client:
>   
> https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:jsonrpc-client
>   would prefer using standard stomp client
> 
> - name discussion is stalling. Some people are worried that a rename may
>   turn out to be expensive (release engineering, other packages,
>   internal module and function names). Still, it would be fun to forsake
>   the non-pronounceable name "vdsm". ovirt-hostd seems like a front
>   runner at the moment.
> 
> - Nir has advocated trying to use https://trello.com/b/U3lsbVRU/maintenance
> to
>   maintain the list of our pending tasks. Let's try.

Awesome idea! I propose that we add the board to the community section of
our wiki[1], since the board is public and a great first stop for potential
contributors (or project newcomers).

[1] http://www.ovirt.org/Community

> Ciao!
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
> 
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] Changing the name of VDSM in oVirt 4.0.

2016-01-28 Thread Martin Polednik

On 27/01/16 09:53 +0200, Nir Soffer wrote:

On Wed, Jan 27, 2016 at 9:29 AM, Yedidyah Bar David  wrote:

On Tue, Jan 26, 2016 at 7:26 PM, Nir Soffer  wrote:

On Tue, Jan 26, 2016 at 5:29 PM, Yaniv Dary  wrote:

I suggest for ease of use and tracking we change the versioning to align to
the engine (4.0.0 in oVirt 4.0 GA) to make it easy to know which version was
in which release and also change the package naming to something like
ovirt-host-manager\ovirt-host-agent.


When we think about the names, we should consider all the components
installed or running on the host. Here are the current names and future options:

Current names:

vdsmd
supervdsmd
vdsm-tool
vdsClient
(we have also two hosted engine daemons, I don't remember the names)

Here are some options in no particular order to name these components:

Alt 1:
ovirt-hypervisor
ovirt-hypervisor-helper
ovirt-hypervisor-tool
ovirt-hyperviosr-cli

Alt 2:


Not sure it's that important. Still, how about:


ovirt-host


ovirt-hostd


I like this




ovirt-host-helper


ovirt-priv-hostd


How about ovirt-privd?

I like short names.




ovirt-host-tool


ovirt-hostd-tool


ovirt-host-cli


ovirt-hostd-cli


I think we should use the example of systemd:

systemd
systemctl

So ovirt-hostd, ovirt-hostctl ovirt-hostcli


I'd even suggest going simply with ovirtd and ovirtctl (maybe
ovirtdctl to differentiate ovirt, ovirtd and ovirt-engine).

Names like ovirt-host-agent possibly introduce abbreviation
clashes - we would most likely end up abbreviating host-agent to HA
and that could be mistaken for high availability in discussions.



Also, we should get rid of '/rhev/' at the start of mount points IMO. How
about '/var/lib/ovirt-hostd/mounts/' or something like that?


We want to use /run/ovirt-host/storage/ for that, but this is a hard change,
since it breaks migration to/from hosts using different vdsm versions.
New vms expect the disks at /rhev/data-center and old vms at /rhev/data-center/

Maybe we can change the disk paths during migration on the destination, but
migrating vms to older hosts will be impossible, as the vdsm on the older
machine does not support such manipulation.

Nir
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel



Re: [ovirt-devel] CPU sockets, threads, cores and NUMA

2015-12-11 Thread Martin Polednik

On 11/12/15 12:31 +0100, Michal Skrivanek wrote:



On 10 Dec 2015, at 16:36, Yaniv Kaul <yk...@redhat.com> wrote:

On Thu, Dec 10, 2015 at 5:07 PM, Martin Polednik <mpoled...@redhat.com 
<mailto:mpoled...@redhat.com>> wrote:
Hello developers,

tl;dr version:
* deprecate report_host_threads_as_cores
* remove cpuSockets, use sum(numaNodes.keys())
* report threadsPerCore for ppc64le / report total number of threads
 for ppc64le
* work on our naming issues

I've been going over our capabilities reporting code in VDSM due to
specific threading requirements on ppc64le platform and noticed few
issues. Before trying to fix something that "works", I'm sending this
mail to start a discussion regarding current and future state of the
code.

First thing is the terminology. What we consider CPU sockets, cores and threads
are in fact NUMA cells, the sum of cores present in NUMA
nodes, and the same for threads. I'd like to see the code moving in a
direction that is correct in this sense.

Note that I think users are more familiar with sockets-cores-threads than NUMA 
cells, terminology-wise.


we do report numa separately today, and we should keep doing that. I consider 
it another level of detail/complexity which many users do not care about.
So we should keep both


The issue is not removing one or the other, but rather that what we
report as CPU sockets/cores/threads are actually NUMA
sockets/cores/threads. As long as 1 CPU == 1 NUMA cell we're fine, but
the POWER8 CPUs are 1 chip (socket) = 4 cores = 4 NUMA cells -
so we end up reporting 4 as the number of sockets per CPU.




More important are the actual calculations. I believe we should draw
an uncrossable line between cores and threads and not interfere with
it at least on VDSM's side. That would mean deprecating
report_host_threads_as_cores option. The actual algorithm used at
present does calculate the numa cores and numa threads correctly given
that there are no offline CPUs - most likely fine enough. We don't
have to report the actual number of sockets though, as it is reported
in numa* keys.

There is a reason for report_host_threads_as_cores option. I don't remember it 
right now, but it had to do with some limitation of some OS or license or 
something.
I don't think we should deprecate it.


the idea was to remove that option from VDSM conf (as it's cumbersome to use),
and rather report all relevant information so the engine can decide later on
whether to count it this way or another.
Today it's used as a simple "core multiplier": if your workload is running "well
enough" in parallel on 2 threads within one core, we just consider each thread an
additional available "cpu". For some workloads where this assumption does not work
well, and also for licensing or any other reason, you can disable it and see "half"
of the cpus on x86 despite having HT enabled in the BIOS.

On PPC this is more tricky, as Martin says below (threads are not able to run
multiple VMs simultaneously) - so we need to push that decision from vdsm up
the chain.



It does fail to provide us with information that can be used in a
ppc64le environment, where for POWER8 we want to run the host without
SMT while VMs would have multiple CPUs assigned. There are various
configurations of so-called subcores in POWER8, where each CPU core
can contain 1, 2 or 4 subcores. This configuration must be taken into
consideration: given e.g. 160 threads overall, it is possible to run
either 20 VMs in smt8 mode, 40 VMs in smt4 mode or 80 VMs in smt2
mode. We have to report either the total number of threads OR just the
threadsPerCore setting, so the users know how many "CPUs" should be
assigned to machines for optimal performance.
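
To make the arithmetic concrete, a tiny sketch (the helper is made up;
the numbers are the ones from the example above):

def vms_per_host(total_threads, smt_threads):
    """VMs that fit when each VM gets one subcore's worth of threads."""
    return total_threads // smt_threads


# 160 threads overall: smt8 -> 20 VMs, smt4 -> 40 VMs, smt2 -> 80 VMs
for smt in (8, 4, 2):
    print('smt%d: %d VMs' % (smt, vms_per_host(160, smt)))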


x per y sounds best to me
but I think it's even more complicated: if we consider offline CPUs (we don't do that
today), then the default picture on POWER8 currently looks like 20 cores in 4 numa
cells, 8 threads per core. SMT is disabled altogether, so CPUs 1-7, 9-15, … are
offline. So should we report them or not? On x86 I would not do that, as they are
administratively disabled and can't be used; however, on ppc since RHEL 7.2 they are
dynamically enabled on demand (if the guest topology uses threads as well), so they
should be reported as available (or "sort-of-available" :)


If we report threads per core, the offline CPUs can be calculated from
the available online CPUs and given threads per core value.


still, I think we should go with simple "sockets, cores/socket, threads/core” 
numbers,
the rest needs to be computed or chosen from, based on additional detailed 
report of NUMA topology and online/offline CPU status
perhaps with different behavior/capabilities on x86 and on power



YAY... do we have a comparison what libvirt knows / looks at (or they ignore it 
altogether?)
Y.


As always, I welcome any opinions regarding the proposed ideas. Also
note that all of the changes can be done via deprecation to be fully
backwards compatible.

Re: [ovirt-devel] rethinking fake-kvm and faqemu

2015-10-27 Thread Martin Polednik

On 26/10/15 15:24 +0200, Dan Kenigsberg wrote:

On Mon, Oct 26, 2015 at 11:17:10AM +0100, Michal Skrivanek wrote:


> On 26 Oct 2015, at 09:41, Dan Kenigsberg <dan...@redhat.com> wrote:
>
> On Mon, Oct 05, 2015 at 02:53:00PM +0200, Martin Polednik wrote:
>> On 05/10/15 11:31 +0200, Michal Skrivanek wrote:
>>>
>>> On Oct 3, 2015, at 20:48 , Martin Polednik <mpoled...@redhat.com> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I've been reworking the fake_kvm and faqemu VDSM hook to make them
>>>> somewhat more usable and mostly to allow testing of ppc64le on x86_64
>>>> based hosts.
>>>>
>>>> TL;DR version: checkout [1], enable fake_kvm and happy ppc64le hacking :)
>
> I like the initiative and the approach. Dropping faqemu bits out of
> mainline caps.py was an old task of mine. Moving code into lib/vdsm is a
> big task for our next release anyway.

one more unrelated note:
current faqemu is broken on 3.6 as it removes the whole cpu tag, which is needed
when hotplug memory is enabled, which in turn enables numa; libvirt checks for
the cpu-numa mapping and fails to find any.
please fix it along the way :-)


Better fix it on its own, since we'd need to have the fix backported.
(unless you plan to backport the whole cpuinfo branch)


It is fixed in the chain, but the fix can be backported without
cpuinfo with a bit of code redundancy. The bad thing is, fake_kvm
is also broken in 3.6 and fixing that becomes quite ugly (e.g. machine
types).
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] rethinking fake-kvm and faqemu

2015-10-05 Thread Martin Polednik

On 05/10/15 11:31 +0200, Michal Skrivanek wrote:


On Oct 3, 2015, at 20:48 , Martin Polednik <mpoled...@redhat.com> wrote:


Hello everyone,

I've been reworking the fake_kvm and faqemu VDSM hook to make them
somewhat more usable and mostly to allow testing of ppc64le on x86_64
based hosts.

TL;DR version: checkout [1], enable fake_kvm and happy ppc64le hacking :)

The current fake_kvm isn't really a hook and is contained within the 'caps'
module. This is wrong for multiple reasons, the most important one
being mixing optional with mainline code. Another issue appears when
one tries to move the fake_kvm code into a hook: the whole notion of
architectures within VDSM is contained within the 'caps' module.

The patch series, whose git tip is at [1], introduces a new cpuinfo
module that moves architecture-related information to the 'cpuinfo'
module of the VDSM library. An intermediate benefit is that current hooks
and library code can access information related to the host's cpu.

This allows moving the fake_kvm code into a hook that I've called
fakearch. Fakearch is, in my opinion, a more suitable name - there is
barely any KVM faking; rather, the host 'fakes' a selected architecture.

Faqemu, on the other hand, is a hook. Unfortunately it wasn't really
updated and doesn't allow running VMs under a fake architecture. The
series therefore tries to refactor it to allow cross-arch VMs to be
started (the VM actually uses the host architecture, but from the engine's
point of view it's running on the faked arch).


so it will run in full emulation, or?


That is tricky to answer. It is running full emulation in the host's
architecture, hiding the differences between x86_64 and ppc64le by
modifying the XML, e.g. spice->vnc when running the x86_64 fakearch on a
ppc64le host.

The reason for that is that qemu-kvm-rhev doesn't seem to support
emulation of a different arch, so the hook tries to 'fake' it.

The implication is that the underlying (libvirt) VM is quite
different from what we ask for. I wouldn't recommend doing anything
with the VM apart from running it and killing it.





So far the tested cross-arch runs are:
* x86_64->x86_64 (faqemu functionality),
* x86_64->ppc64le (the most important one),
* ppc64le->x86_64,
* ppc64le->ppc64le (faqemu for Power).

I'm interested in your reviews and comments regarding this effort!

[1] https://gerrit.ovirt.org/#/c/46962/

Best regards,
mpolednik



___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] [vdsm] VmDevices rework

2015-05-04 Thread Martin Polednik


- Original Message -
 From: Dan Kenigsberg dan...@redhat.com
 To: Martin Polednik mpoled...@redhat.com
 Cc: devel@ovirt.org
 Sent: Friday, May 1, 2015 5:06:04 PM
 Subject: Re: [ovirt-devel] [vdsm] VmDevices rework
 
 On Tue, Apr 28, 2015 at 04:28:12AM -0400, Martin Polednik wrote:
  Hello everyone,
  
  I have started working on line of patches that deal with current state of
  VM devices
  in VDSM. There is a wiki page at [1] describing the issues, phases and
  final goals
  for this work. Additionally, there is formal naming system for code related
  to
  devices. I would love to hear your opinions and comments about this effort!
  (and please note that this is very long term work)
  
  [1] http://www.ovirt.org/Feature/VmDevices_rework
 
 Hi Martin,
 
 It's a great and just initiative. Renaming is good, but I'd like to hear
 more about your final goal - besides the hope that the code will be
 shorter.
 
 In my opinion you should state your high-level goal explicitly:
 device_object should be self-sufficient regarding initiation, state
 change, stat extraction and stat reporting.
 
 Comments:
 1. We want to drop the legacy conf support. In
https://gerrit.ovirt.org/40104 we have removed support of engine3.3
so production environment does not need it any more.
 
There might be an issue with command-line test scripts, though that
might be abusing the legacy args. At the very least it should be
marked as deprecated.

Therefore, the move still makes sense; as for the deprecation - I fully
agree. I will try to figure out a nice way to mention it (logging and docs, I guess).

 2. dev_map is an unfortunate historical mistake. The fact that it is a
map of device type to device_object lists helps no one. It
should be hidden behind one or two lookup functions. Each object
class should be able to iterate its instances, and add filtering on
top of that.

Interesting point, but you still need to somehow store the device 
instances. Having the implementation hidden is fine (and the _device name
already suggests that) but it still needs to exist.

 3. A definition like Type of the device is dev_type does not help for
someone who knows what is dev_type. Please give examples.
The formal naming system could use some wiki styling.

Will work on that!

 4. I did not understand Libvirt XML parsing and processing is dumb -
missing orchestration object to delegate XML chunks to devices themself

Too terse indeed, I will try to rephrase. The issue is that each dev_type's
parsing iterates over all device elements in the XML, leading to O(n^2) complexity
just for the parsing.
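
For illustration, a single-pass grouping sketch (a hypothetical helper
using xml.etree, not the actual VDSM parsing code): instead of each
device class scanning all <devices> children again, one pass groups the
elements by tag and each class only parses its own chunk:

from collections import defaultdict
from xml.etree import ElementTree


def group_device_elements(domain_xml):
    """Single pass over <devices>: map element tag -> list of elements."""
    devices = ElementTree.fromstring(domain_xml).find('devices')
    grouped = defaultdict(list)
    for element in devices:
        grouped[element.tag].append(element)
    return grouped  # each device class then handles only its own chunk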

 5. It seems that you have further phases planned, but not written.
Could you provide at least the title, or charter of each phase? 1.1
is about renaming. 1.2 is about legacy. Do you have phases 2, 3..?

I'm still in the process of figuring out the naming of the phases and their order;
I'll try to add them when it makes sense (I just can't spit out 10 names off the top
of my head right now that would make sense in the long term).

 Death (or at least diet) to vm.py!

+1 :) Thanks for the review!

 Dan.
 
 
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] [vdsm] VmDevices rework

2015-04-28 Thread Martin Polednik
Hello everyone,

I have started working on a line of patches that deal with the current state of
VM devices in VDSM. There is a wiki page at [1] describing the issues, phases and
final goals for this work. Additionally, there is a formal naming system for code
related to devices. I would love to hear your opinions and comments about this effort!
(and please note that this is very long-term work)

[1] http://www.ovirt.org/Feature/VmDevices_rework

mpolednik
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] VDSM mom dependancy on RHEL7

2014-07-23 Thread Martin Polednik
- Original Message -
 From: Dan Kenigsberg dan...@redhat.com
 To: Martin Polednik mpoled...@redhat.com, dc...@redhat.com
 Cc: devel@ovirt.org
 Sent: Wednesday, July 23, 2014 12:19:42 PM
 Subject: Re: [ovirt-devel] VDSM mom dependancy on RHEL7
 
 On Tue, Jul 22, 2014 at 06:36:10PM -0400, Martin Polednik wrote:
  Hi,
  
  I've gone through installing VDSM on a RHEL7 host
  (Red Hat Enterprise Linux Server release 7.0 (Maipo)) and encountered
  an issue with mom:
  
  Error: Package: mom-0.4.1-2.el6.noarch (ovirt-3.5-epel)
 Requires: python(abi) = 2.6
 Installed: python-2.7.5-16.el7.x86_64 (@rhel7)
 python(abi) = 2.7
 python(abi) = 2.7
  
  Repositories used were master, master-snapshots and 3.5 + dependencies.
  Although it is possible to get it working by getting the mom source and
  rebuilding it on RHEL7, I'd like to know if there is a different RHEL7
  repo or if this is a mistake in the repos.
 
 http://resources.ovirt.org/pub/ovirt-3.5-pre/rpm/el7/ is quite empty,
 I'm afraid.

The problem actually occurs when getting mom from 
http://resources.ovirt.org/pub/ovirt-master-snapshot/rpm/el7/

 Since epel 7 has
 http://dl.fedoraproject.org/pub/epel/beta/7/x86_64/mom-0.4.1-1.el7.noarch.rpm
 you can enable it instead
 
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] VDSM faqemu default memory change

2014-06-10 Thread Martin Polednik
Hello

I'm writing in order to propose a change to the faqemu hook, as indicated in [1],
modifying the hook's default behavior:

There are currently hardcoded x86_64 and PPC memory limits (20480 and 262144);
this change would remove these implicit constraints and keep the memory unchanged.
A new configuration variable, fake_kvm_memory, is introduced, which allows you
to limit the memory for both of these platforms to a given value (or to keep the
memory unchanged by setting it to 0, which is the default).

This means that running faqemu without modification could consume up to 10x more
memory, using the default VM memory size for x86_64 and unchanged memory for PPC.

[1] http://gerrit.ovirt.org/#/c/28320/
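
A minimal sketch of the proposed logic (function and parameter names are
illustrative only; the real hook in [1] reads fake_kvm_memory from the
VDSM configuration):

def adjust_memory(mem_mb, fake_kvm_memory=0):
    """Cap the VM memory only when fake_kvm_memory is set to a non-zero value."""
    if fake_kvm_memory:
        return min(mem_mb, fake_kvm_memory)
    return mem_mb  # 0 (the default) keeps the requested memory unchanged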
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel