Re: [ovirt-devel] Gerrit parallel patch handling and CI (Or, why did my code fail post-merge)

2016-11-20 Thread Dan Kenigsberg
On Sun, Nov 20, 2016 at 06:09:59PM +0200, Nir Soffer wrote:
> On Sun, Nov 20, 2016 at 5:12 PM, Barak Korren  wrote:
> >> With the current setting (in vdsm), submitting a series of patches is
> >> a huge pain. Sometimes refreshing the page and submitting the next
> >> patch in the series works, but sometimes you have to rebase again
> >> the next patches in the series, and in the worst cases, you have to
> >> do several rebases in the same series. This happens even when the
> >> entire series was already rebased properly before the submit.
> >
> > Actually vdsm is configured to "Cherry Pick" ATM, I'm not sure what
> > were the reasons for this, but this should probably be changed to
> > ff-only ASAP b/c as it is, it allows patches to be submitted
> > completely out-of-order.
> >
> >> In vdsm we were bitten by this many times, and both Dan and me agree
> >> now that fast-forward is the only way.
> >>
> >> I don't think we need to agree on all projects for this, the whole point
> >> of having multiple projects is that we don't have to agree on every little
> >> detail, the project maintainers can do whatever they want.
> >
> > Ok, so can we get an agreement between the vdsm maintainers to change
> > to "ff-only"?
> 
> +1
> 
> Dan, can you confirm?

I enjoyed the freedom of cherry-pick, but after 2 broken nightly builds
in the span of 10 days, I give up. Let's try ff-only.

Dan.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] Gerrit parallel patch handling and CI (Or, why did my code fail post-merge)

2016-11-20 Thread Martin Perina
On Sun, Nov 20, 2016 at 9:18 PM, Sandro Bonazzola wrote:

> On 20/Nov/2016 15:08, "Barak Korren" wrote:
> >
> > Hi there,
> >
> > I would like to address a concern that has been raised to us by
> > multiple developers, and reach an agreement on how (and if) to remedy
> > it.
> >
> > Let's assume the following situation:
> > We have a Git repo in Gerrit with top commit C0 in master.
> > On time t0 developers Alice and Bob push patches P1 and P2 respectively
> > to master so that we end up with the following situation in git:
> > C0 <= P1 (this is Alice's patch)
> > C0 <= P2 (this is Bob's patch)
> >
> > On time t1 CI runs for both patches checking the code as it looks for
> > each patch. Let's assume CI is successful for both.
> > On time t2 Alice submits her patch and Gerrit merges it, resulting in
> > the following situation in master:
> > C0 <= P1
> >
> > On time t2 Bob submits his patch. Gerrit, seeing master has changed,
> > re-bases the patch and merges it, the resulting situation (If the
> > rebase is successful) is:
> > C0 <= P1 <= P2
> >
> > This means that the resulting code was never tested in CI. This, in
> > turn, causes various failures to show up post-merge despite having
> > pre-merge CI run successfully.
> >
> > This situation is a result of the way our repos are currently
> > configured. Most repos ATM are configured with the "Rebase If
> > Necessary" submit type. This means that Gerrit tries to automatically
> > rebase patches as mentioned in t2 above.
> >
> > We could, instead, configure the repos to use the "Fast Forward Only"
> > submit type. In that case, when Bob submits on t2, Gerrit refuses to
> > merge and asks Bob to rebase (While offering a convenient button to do
> > it). When he does, a new patch set gets pushed, and subsequently
> > checked by CI.
> >
> > I recommend we switch all projects to use the "Fast Forward Only"
> > submit type.
> >
> > Thoughts? Concerns?
>

​AFAIR this was enabled for ovirt-engine project in the past and it was
nearly impossible to merge any patch with CI+1 when important dates
were near (like feature freeze), because all maintainers were trying to merge
patches and waiting for CI to finish. Personally I'd say that the current status
is OK, because it's the responsibility of a maintainer to check the CI results of
a patch that he/she merged (and if an error is raised, to investigate the
issue and post a fix ASAP if needed).

So "Fast Forward Only" could work well for smaller projects, but I
don't think it will work for big projects like engine or vdsm.
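
To make the trade-off in the quoted scenario concrete, here is a minimal
sketch that models the two submit types. It is a toy model, not Gerrit's
implementation; the Repo class, its method names and the two submit type
strings are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy model of one Gerrit branch (hypothetical, for illustration only)."""
    submit_type: str  # "rebase_if_necessary" or "fast_forward_only"
    master: list = field(default_factory=lambda: ["C0"])
    tested: set = field(default_factory=set)  # histories CI actually ran on

    def run_ci(self, patch, parent_history):
        # CI tests the patch on top of the history it was pushed against.
        self.tested.add(tuple(parent_history) + (patch,))

    def submit(self, patch, parent_history):
        if list(parent_history) == self.master:
            self.master = self.master + [patch]  # plain fast-forward
            return "merged"
        if self.submit_type == "fast_forward_only":
            return "rebase required"  # a new patch set would trigger CI again
        self.master = self.master + [patch]  # silent server-side rebase
        return "merged"

    def master_was_tested(self):
        return tuple(self.master) in self.tested


def scenario(submit_type):
    """Replay the Alice/Bob timeline: both patches are based on C0."""
    repo = Repo(submit_type)
    base = ["C0"]
    repo.run_ci("P1", base)
    repo.run_ci("P2", base)
    results = [repo.submit("P1", base), repo.submit("P2", base)]
    return results, repo.master, repo.master_was_tested()
```

With "rebase_if_necessary" both submits succeed and master ends up as
C0, P1, P2, a history CI never saw. With "fast_forward_only" the second
submit is refused until Bob rebases, which is exactly the waiting cost
described above.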

> +1 for me
>
> >
> > --
> > Barak Korren
> > bkor...@redhat.com
> > RHEV-CI Team

Re: [ovirt-devel] Merge gating in Gerrit

2016-11-20 Thread Sandro Bonazzola
On 20/Nov/2016 17:25, "Nir Soffer" wrote:
>
> On Sun, Nov 20, 2016 at 5:39 PM, Yedidyah Bar David wrote:
> > On Sun, Nov 20, 2016 at 5:06 PM, Barak Korren wrote:
> >> Hi all,
> >>
> >> Perhaps the main purpose of CI is to prevent breaking code from
> >> getting merged into the stable/master branches. Unfortunately our CI
> >> is not there yet, and one of the reasons for that is that we do a large
> >> amount of our CI tests only _after_ the code is merged.
> >>
> >> The reason for that is that when balancing thorough, but time
> >> consuming, tests (e.g. engine build with all permutations) vs. faster
> >> but more basic ones (e.g. "findbugs" and single permutation build), we
> >> typically choose the faster tests to be run per-patch-set and leave
> >> the thorough testing to only be run post-merge.
> >>
> >> We'd like to change that and have the thorough tests also run before
> >> merge. Ideally we would like to just hook stuff to the "submit"
> >> button, but Gerrit doesn't allow one to do that easily. So instead
> >> we'll need to adopt some kind of flag to indicate we want to submit
> >> and have Jenkins "click" the submit button on our behalf if tests pass.
> >>
> >> I see two options here:
> >> 1. Use Code-Review+2 as the indicator to run "heavy" CI and merge.
>
> This is problematic. For example in vdsm we have 5 maintainers with
> +2, and 4 maintainers with commit right, but only 2 are commenting
> regularly.
>
> >> 2. Add an "approve" flag that maintainers can set to +1 (This is
> >>    what OpenStack is doing).
>
> This seems better.
>
> But there is another requirement - maintainer should be able to commit
> even if jenkins fails. Sometimes the CI is broken, or there are flaky
> tests breaking the build, and some jobs are failing regularly
> (check-merged) and I don't want to wait for it.

Either disable the jobs or fix them. Having jobs consistently failing and
just ignoring them is a waste of resources.

>
> Today we can override the CI vote and commit, if we keep it as is I don't
> see any problem with this change.
>
> Nir

Re: [ovirt-devel] Merge gating in Gerrit

2016-11-20 Thread Sandro Bonazzola
On 20/Nov/2016 16:06, "Barak Korren" wrote:
>
> Hi all,
>
> Perhaps the main purpose of CI is to prevent breaking code from
> getting merged into the stable/master branches. Unfortunately our CI
> is not there yet, and one of the reasons for that is that we do a large
> amount of our CI tests only _after_ the code is merged.
>
> The reason for that is that when balancing thorough, but time
> consuming, tests (e.g. engine build with all permutations) vs. faster
> but more basic ones (e.g. "findbugs" and single permutation build), we
> typically choose the faster tests to be run per-patch-set and leave
> the thorough testing to only be run post-merge.
>
> We'd like to change that and have the thorough tests also run before
> merge.

Hopefully not the same tests ☺

> Ideally we would like to just hook stuff to the "submit"
> button, but Gerrit doesn't allow one to do that easily. So instead
> we'll need to adopt some kind of flag to indicate we want to submit
> and have Jenkins "click" the submit button on our behalf if tests pass.
>
> I see two options here:
> 1. Use Code-Review+2 as the indicator to run "heavy" CI and merge.
> 2. Add an "approve" flag that maintainers can set to +1 (This is
>    what OpenStack is doing).
>
> What would you prefer?

I would prefer to follow the OpenStack example. It will help developers to
have the same flow in both projects.
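
For reference, the flow under discussion (a maintainer sets an approve flag,
heavy CI runs, and Jenkins submits on success) could be sketched roughly as
below. This is only an illustration under assumptions: the function name, its
parameters and the returned action strings are hypothetical and do not
correspond to any Gerrit or Jenkins API. The override parameter reflects the
request raised elsewhere in this thread that maintainers keep the ability to
merge when CI itself is broken.

```python
def gate_decision(approved, heavy_ci, maintainer_override=False):
    """Decide what a gating bot would do for one change (hypothetical).

    approved: a maintainer set the "approve" flag to +1.
    heavy_ci: None (not run yet), True (passed) or False (failed).
    maintainer_override: a maintainer chose to bypass a failing CI vote.
    """
    if not approved:
        return "wait"  # only the light per-patch-set CI applies so far
    if maintainer_override:
        return "submit"  # escape hatch for broken CI or flaky tests
    if heavy_ci is None:
        return "run heavy CI"
    return "submit" if heavy_ci else "reject"
```

Keeping the decision in one pure function makes the policy easy to review
and change, independent of how the votes are actually collected.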

>
> --
> Barak Korren
> bkor...@redhat.com
> RHEV-CI Team

Re: [ovirt-devel] Gerrit parallel patch handling and CI (Or, why did my code fail post-merge)

2016-11-20 Thread Sandro Bonazzola
On 20/Nov/2016 15:08, "Barak Korren" wrote:
>
> Hi there,
>
> I would like to address a concern that has been raised to us by
> multiple developers, and reach an agreement on how (and if) to remedy
> it.
>
> Let's assume the following situation:
> We have a Git repo in Gerrit with top commit C0 in master.
> On time t0 developers Alice and Bob push patches P1 and P2 respectively
> to master so that we end up with the following situation in git:
> C0 <= P1 (this is Alice's patch)
> C0 <= P2 (this is Bob's patch)
>
> On time t1 CI runs for both patches checking the code as it looks for
> each patch. Let's assume CI is successful for both.
>
> On time t2 Alice submits her patch and Gerrit merges it, resulting in
> the following situation in master:
> C0 <= P1
>
> On time t2 Bob submits his patch. Gerrit, seeing master has changed,
> re-bases the patch and merges it, the resulting situation (If the
> rebase is successful) is:
> C0 <= P1 <= P2
>
> This means that the resulting code was never tested in CI. This, in
> turn, causes various failures to show up post-merge despite having
> pre-merge CI run successfully.
>
> This situation is a result of the way our repos are currently
> configured. Most repos ATM are configured with the "Rebase If
> Necessary" submit type. This means that Gerrit tries to automatically
> rebase patches as mentioned in t2 above.
>
> We could, instead, configure the repos to use the "Fast Forward Only"
> submit type. In that case, when Bob submits on t2, Gerrit refuses to
> merge and asks Bob to rebase (While offering a convenient button to do
> it). When he does, a new patch set gets pushed, and subsequently
> checked by CI.
>
> I recommend we switch all projects to use the "Fast Forward Only"
> submit type.
>
> Thoughts? Concerns?

+1 for me

>
> --
> Barak Korren
> bkor...@redhat.com
> RHEV-CI Team

Re: [ovirt-devel] system tests failing on template export

2016-11-20 Thread Yaniv Kaul
On Nov 20, 2016 6:33 PM, "Nir Soffer"  wrote:
>
> On Sun, Nov 20, 2016 at 6:25 PM, Eyal Edri  wrote:
> > It happened again in [1]
> >
> > 2016-11-20 10:48:12,106 ERROR (jsonrpc/2) [storage.TaskManager.Task]
> > (Task='6c1ec6e7-fb37-465b-8e30-1613317683b2') Unexpected error (task:870)
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/task.py", line 877, in _run
> >     return fn(*args, **kargs)
> >   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
> >     res = f(*args, **kwargs)
> >   File "/usr/share/vdsm/storage/hsm.py", line 2205, in getAllTasksInfo
> >     allTasksInfo = sp.getAllTasksInfo()
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
> >     raise SecureError("Secured object is not in safe state")
> > SecureError: Secured object is not in safe state
> > 2016-11-20 10:48:12,109 INFO  (jsonrpc/2) [storage.TaskManager.Task]
> > (Task='6c1ec6e7-fb37-465b-8e30-1613317683b2') aborting: Task is aborted:
> > u'Secured object is not in safe state' - code 100 (task:1175)
> > 2016-11-20 10:48:12,110 ERROR (jsonrpc/2) [storage.Dispatcher] Secured
> > object is not in safe state (dispatcher:80)
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/dispatcher.py", line 72, in wrapper
> >     result = ctask.prepare(func, *args, **kwargs)
> >   File "/usr/share/vdsm/storage/task.py", line 105, in wrapper
> >     return m(self, *a, **kw)
> >   File "/usr/share/vdsm/storage/task.py", line 1183, in prepare
> >     raise self.error
> > SecureError: Secured object is not in safe state
>
> This can also mean that the SPM is not started yet. Maybe you are not
> waiting until the SPM is ready before you try to perform an operation?
>
> Who is the owner of this test? This person should debug this test.

The relevant team for the feature.

>
> >
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-006_network_by_label.py/lago-basic-suite-master-host1/_var_log_vdsm/vdsm.log
> >
> > The storage VM is running on the same VM as engine (to save memory) and
> > it's serving both NFS & iSCSI.
> > Do you think running it on the same VM as engine might cause such issues?
>
> I don't think so, but this prevents testing a lot of interesting negative
> flows.

Which don't belong to CI.

>
> For example, when one storage server is down, the system should be
> able to use the other storage domain. Having each storage server in
> its own vm makes this possible.

You have both NFS and iSCSI there. It's trivial to set up multiple of each if
needed, of course.
I do wish to add more IPs and test iSCSI bonding as well as both NFSv3 and
NFSv4.

>
> Also, we may want to test multiple storage servers of the same type.
> The storage servers should be decoupled so we can start any number
> of them as needed for the current test.

Right, but not on this suite.
Again, it's trivial to do so. The main motivation was to conserve resources
so everyone could run the tests.

Y.

>
> > On Mon, Oct 17, 2016 at 11:45 PM, Adam Litke  wrote:
> >>
> >> On 17/10/16 11:51 +0200, Piotr Kliczewski wrote:
> >>>
> >>> Adam,
> >>>
> >>> I see constant failures due to this and found:
> >>>
> >>> 2016-10-17 03:55:21,045 ERROR   (jsonrpc/3) [storage.TaskManager.Task]
> >>> Task=`8989d694-7099-449b-bd66-4d63786be089`::Unexpected error
> >>> (task:870)
> >>> Traceback (most recent call last):
> >>>  File "/usr/share/vdsm/storage/task.py", line 877, in _run
> >>>return fn(*args, **kargs)
> >>>  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in
> >>> wrapper
> >>>res = f(*args, **kwargs)
> >>>  File "/usr/share/vdsm/storage/hsm.py", line 2212, in getAllTasksInfo
> >>>allTasksInfo = sp.getAllTasksInfo()
> >>>  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
> >>> line 77, in wrapper
> >>>raise SecureError("Secured object is not in safe state")
> >>> SecureError: Secured object is not in safe state
> >>
> >>
> >> This usually indicates that the SPM role has been lost which happens
> >> most likely due to connection issues with the storage.  What is the
> >> storage environment being used for the system tests?
> >>
> >>>
> >>> Please take a look not sure whether it is related. You can find latest
> >>> build here [1]
> >>>
> >>> Thanks,
> >>> Piotr
> >>>
> >>> [1] http://jenkins.ovirt.org/job/ovirt_master_system-tests/668/
> >>>
> >>> On Fri, Oct 14, 2016 at 11:22 AM, Evgheni Dereveanchin
> >>>  wrote:
> 
>  Hello,
> 
>  We've got several cases today where system tests failed
>  when attempting to export templates:
> 
> 
> 
http://jenkins.ovirt.org/job/ovirt_master_system-tests/655/testReport/junit/(root)/004_basic_sanity/template_export/
> 
>  Related engine.log looks something like this:
>  https://paste.fedoraproject.org/449936/47643643/raw

Re: [ovirt-devel] Failures in OST (4.0/master) ( was error msg from Jenkins )

2016-11-20 Thread Yaniv Kaul
On Nov 20, 2016 6:30 PM, "Eyal Edri"  wrote:
>
> Renaming title and adding devel.
>
> On Sun, Nov 20, 2016 at 2:36 PM, Piotr Kliczewski wrote:
>>
>> The last failure seems to be storage related.
>>
>> @Nir please take a look.
>>
>> Here is engine side error:
>>
>> 2016-11-20 05:54:59,605 DEBUG
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
>> (default task-5) [59fc0074] Exception:
>> org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException:
>> IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot
>> find master domain: u'spUUID=1ca141f1-b64d-4a52-8861-05c7de2a72b2,
>> msdUUID=7d4bf750-4fb8-463f-bbb0-92156c47306e'
>>
>> and here is vdsm:
>>
>> jsonrpc.Executor/5::ERROR::2016-11-20
>> 05:54:56,331::multipath::95::Storage.Multipath::(resize_devices) Could not
>> resize device 360014052749733c7b8248628637b990f
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/multipath.py", line 93, in resize_devices
>>     _resize_if_needed(guid)
>>   File "/usr/share/vdsm/storage/multipath.py", line 101, in _resize_if_needed
>>     for slave in devicemapper.getSlaves(name)]
>>   File "/usr/share/vdsm/storage/multipath.py", line 158, in getDeviceSize
>>     bs, phyBs = getDeviceBlockSizes(devName)
>>   File "/usr/share/vdsm/storage/multipath.py", line 150, in getDeviceBlockSizes
>>     "queue", "logical_block_size")).read())
>> IOError: [Errno 2] No such file or directory:
>> '/sys/block/sdb/queue/logical_block_size'
>
>
>
> We now see a different error in master [1], which also indicates the hosts
> are in a problematic state: ( failing 'assign_hosts_network_label' test  )
>
> status: 409
> reason: Conflict
> detail: Cannot add Label. Operation can be performed only when Host status
> is  Maintenance, Up, NonOperational.

I believe you are mixing unrelated issues.
I've seen this once and I have an unproven theory:
The previous suite restarts Engine after LDAP configuration and then performs
its test, which is quite short (24 seconds on my poor laptop + a few
additional seconds between suites).
I'm not convinced that is enough time for the hosts' status to be updated in
Engine back to the UP state.

Y.

>  >> begin captured logging << 
>
>
> [1]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/testReport/junit/(root)/006_network_by_label/assign_hosts_network_label/
>
>
>>
>>
>>
>> On Sun, Nov 20, 2016 at 12:50 PM, Eyal Edri  wrote:
>>>
>>>
>>>
>>> On Sun, Nov 20, 2016 at 1:42 PM, Yaniv Kaul  wrote:



 On Sun, Nov 20, 2016 at 1:30 PM, Yaniv Kaul  wrote:
>
>
>
> On Sun, Nov 20, 2016 at 1:18 PM, Eyal Edri  wrote:
>>
>> the test fails to run VM because no hosts are in UP state(?) [1],
not sure it is related to the triggering patch[2]
>>
>> status: 400
>> reason: Bad Request
>> detail: There are no hosts to use. Check that the cluster contains
at least one host in Up state.
>>
>> Thoughts? Shouldn't we fail the test earlier we hosts are not UP?
>
>
> Yes. It's more likely that we are picking the wrong host or so, but
who knows - where are the engine and VDSM logs?


 A simple grep on the engine.log[1] finds serveral unrelated issues I'm
not sure are reported, it's despairing to even begin...
 That being said, I don't see the issue there. We may need better
logging on the API level, to see what is being sent. Is it consistent?
>>>
>>>
>>> Just failed now the first time, I didn't see it before.
>>>

 Y.


 [1]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/artifact/exported-artifacts/basic_suite_4.0.sh-el7/exported-artifacts/test_logs/basic-suite-4.0/post-004_basic_sanity.py/lago-basic-suite-4-0-engine/_var_log_ovirt-engine/engine.log

>
> Y.
>
>>
>>
>>
>> [1]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/testReport/junit/(root)/004_basic_sanity/vm_run/
>> [2]
http://jenkins.ovirt.org/job/ovirt-engine_4.0_build-artifacts-el7-x86_64/1535/changes#detail
>>
>>
>>
>> On Sun, Nov 20, 2016 at 1:00 PM, 
wrote:
>>>
>>> Build:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/,
>>> Build Number: 3015,
>>> Build Status: FAILURE
>>> ___
>>> Infra mailing list
>>> in...@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>
>>
>>
>>
>> --
>> Eyal Edri
>> Associate Manager
>> RHV DevOps
>> EMEA ENG Virtualization R&D
>> Red Hat Israel
>>
>> phone: +972-9-7692018
>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>
>

>>>
>>>
>>>
>>> --
>>> Eyal Edri
>>> Associate Manager
>>> RHV DevOps
>>> EMEA ENG Virtualization R&D
>>> Red Hat Israel
>>>
>>> phone: +972-9-7692018
>>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>>
>>
>
>
>
> --
> Eyal Edri
> Associate Manager
> RHV DevOps
> EMEA ENG Virtualizat

Re: [ovirt-devel] Failures in OST (4.0/master) ( was error msg from Jenkins )

2016-11-20 Thread Nir Soffer
On Sun, Nov 20, 2016 at 6:30 PM, Eyal Edri  wrote:
> Renaming title and adding devel.
>
> On Sun, Nov 20, 2016 at 2:36 PM, Piotr Kliczewski 
> wrote:
>>
>> The last failure seems to be storage related.
>>
>> @Nir please take a look.
>>
>> Here is engine side error:
>>
>> 2016-11-20 05:54:59,605 DEBUG
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
>> (default task-5) [59fc0074] Exception:
>> org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException:
>> IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot
>> find master domain: u'spUUID=1ca141f1-b64d-4a52-8861-05c7de2a72b2,
>> msdUUID=7d4bf750-4fb8-463f-bbb0-92156c47306e'
>>
>> and here is vdsm:
>>
>> jsonrpc.Executor/5::ERROR::2016-11-20
>> 05:54:56,331::multipath::95::Storage.Multipath::(resize_devices) Could not
>> resize device 360014052749733c7b8248628637b990f
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/multipath.py", line 93, in resize_devices
>> _resize_if_needed(guid)
>>   File "/usr/share/vdsm/storage/multipath.py", line 101, in
>> _resize_if_needed
>> for slave in devicemapper.getSlaves(name)]
>>   File "/usr/share/vdsm/storage/multipath.py", line 158, in getDeviceSize
>> bs, phyBs = getDeviceBlockSizes(devName)
>>   File "/usr/share/vdsm/storage/multipath.py", line 150, in
>> getDeviceBlockSizes
>> "queue", "logical_block_size")).read())
>> IOError: [Errno 2] No such file or directory:
>> '/sys/block/sdb/queue/logical_block_size'

Please open a bug for this. This is an expected situation (when a device is
in the middle of a scan), and we should be able to cope with it.
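
As a rough illustration of what coping with it could look like, here is a
minimal sketch of a sysfs read that tolerates the entry disappearing. It is
not the actual vdsm code; the function name and the sysfs_root parameter are
made up for this example, and returning None for a vanished device is just
one possible policy.

```python
import errno
import os


def read_logical_block_size(dev_name, sysfs_root="/sys/block"):
    """Return the logical block size of a device, or None if the sysfs
    entry is missing (for example while the device is being rescanned)."""
    path = os.path.join(sysfs_root, dev_name, "queue", "logical_block_size")
    try:
        with open(path) as f:
            return int(f.read())
    except IOError as e:  # IOError is an alias of OSError on Python 3
        if e.errno == errno.ENOENT:
            return None  # device vanished; the caller may skip or retry
        raise  # any other I/O error is still unexpected
```

A caller resizing several devices could then skip the ones that return None
instead of logging a traceback for each of them.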

Adding Fred who worked on this area.

Nir

> We now see a different error in master [1], which also indicates the hosts
> are in a problematic state: ( failing 'assign_hosts_network_label' test  )
>
> status: 409
> reason: Conflict
> detail: Cannot add Label. Operation can be performed only when Host status
> is  Maintenance, Up, NonOperational.
>  >> begin captured logging << 
>
>
> [1]
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/testReport/junit/(root)/006_network_by_label/assign_hosts_network_label/
>
>
>>
>>
>>
>> On Sun, Nov 20, 2016 at 12:50 PM, Eyal Edri  wrote:
>>>
>>>
>>>
>>> On Sun, Nov 20, 2016 at 1:42 PM, Yaniv Kaul  wrote:



 On Sun, Nov 20, 2016 at 1:30 PM, Yaniv Kaul  wrote:
>
>
>
> On Sun, Nov 20, 2016 at 1:18 PM, Eyal Edri  wrote:
>>
>> the test fails to run VM because no hosts are in UP state(?) [1], not
>> sure it is related to the triggering patch[2]
>>
>> status: 400
>> reason: Bad Request
>> detail: There are no hosts to use. Check that the cluster contains at
>> least one host in Up state.
>>
>> Thoughts? Shouldn't we fail the test earlier we hosts are not UP?
>
>
> Yes. It's more likely that we are picking the wrong host or so, but who
> knows - where are the engine and VDSM logs?


 A simple grep on the engine.log[1] finds serveral unrelated issues I'm
 not sure are reported, it's despairing to even begin...
 That being said, I don't see the issue there. We may need better logging
 on the API level, to see what is being sent. Is it consistent?
>>>
>>>
>>> Just failed now the first time, I didn't see it before.
>>>

 Y.


 [1]
 http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/artifact/exported-artifacts/basic_suite_4.0.sh-el7/exported-artifacts/test_logs/basic-suite-4.0/post-004_basic_sanity.py/lago-basic-suite-4-0-engine/_var_log_ovirt-engine/engine.log
>
> Y.
>
>>
>>
>>
>> [1]
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/testReport/junit/(root)/004_basic_sanity/vm_run/
>> [2]
>> http://jenkins.ovirt.org/job/ovirt-engine_4.0_build-artifacts-el7-x86_64/1535/changes#detail
>>
>>
>>
>> On Sun, Nov 20, 2016 at 1:00 PM, 
>> wrote:
>>>
>>> Build:
>>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/,
>>> Build Number: 3015,
>>> Build Status: FAILURE
>>>
>>
>>
>>
>> --
>> Eyal Edri
>> Associate Manager
>> RHV DevOps
>> EMEA ENG Virtualization R&D
>> Red Hat Israel
>>
>> phone: +972-9-7692018
>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>
>

>>>
>>>
>>>
>>> --
>>> Eyal Edri
>>> Associate Manager
>>> RHV DevOps
>>> EMEA ENG Virtualization R&D
>>> Red Hat Israel
>>>
>>> phone: +972-9-7692018
>>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>>
>>
>
>
>
> --
> Eyal Edri
> Associate Manager
> RHV DevOps
> EMEA ENG Virtualization R&D
> Red Hat Israel
>
> phone: +972-9-7692018
> irc: eedri (on #tlv #rhev-dev #rhev-integ

Re: [ovirt-devel] system tests failing on template export

2016-11-20 Thread Nir Soffer
On Sun, Nov 20, 2016 at 6:25 PM, Eyal Edri  wrote:
> It happened again in [1]
>
> 2016-11-20 10:48:12,106 ERROR (jsonrpc/2) [storage.TaskManager.Task]
> (Task='6c1ec6e7-fb37-465b-8e30-1613317683b2') Unexpected error (task:870)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>     return fn(*args, **kargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 2205, in getAllTasksInfo
>     allTasksInfo = sp.getAllTasksInfo()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
>     raise SecureError("Secured object is not in safe state")
> SecureError: Secured object is not in safe state
> 2016-11-20 10:48:12,109 INFO  (jsonrpc/2) [storage.TaskManager.Task]
> (Task='6c1ec6e7-fb37-465b-8e30-1613317683b2') aborting: Task is aborted:
> u'Secured object is not in safe state' - code 100 (task:1175)
> 2016-11-20 10:48:12,110 ERROR (jsonrpc/2) [storage.Dispatcher] Secured
> object is not in safe state (dispatcher:80)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/dispatcher.py", line 72, in wrapper
>     result = ctask.prepare(func, *args, **kwargs)
>   File "/usr/share/vdsm/storage/task.py", line 105, in wrapper
>     return m(self, *a, **kw)
>   File "/usr/share/vdsm/storage/task.py", line 1183, in prepare
>     raise self.error
> SecureError: Secured object is not in safe state

This can also mean that the SPM is not started yet. Maybe you are not
waiting until the SPM is ready before you try to perform an operation?
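
If that is the case, the test could poll for readiness before acting. A
minimal sketch of such a helper follows, assuming Python 3; wait_for and its
parameters are hypothetical names, and the readiness check itself (shown only
in a comment as spm_is_ready) is left to whoever owns the test.

```python
import time


def wait_for(predicate, timeout=120.0, interval=3.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns a truthy value or timeout expires.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage in a test, where spm_is_ready() is not a real API:
#     assert wait_for(lambda: spm_is_ready(host)), "SPM did not start in time"
```

The predicate is always checked at least once, and a timeout makes the test
fail with a clear message instead of hitting SecureError later.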

Who is the owner of this test? This person should debug this test.

> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-006_network_by_label.py/lago-basic-suite-master-host1/_var_log_vdsm/vdsm.log
>
> The storage VM is running on the same VM as engine (to save memory) and
> it's serving both NFS & iSCSI.
> Do you think running it on the same VM as engine might cause such issues?

I don't think so, but this prevents testing a lot of interesting negative flows.

For example, when one storage server is down, the system should be
able to use the other storage domain. Having each storage server in
its own vm makes this possible.

Also, we may want to test multiple storage servers of the same type.
The storage servers should be decoupled so we can start any number
of them as needed for the current test.

> On Mon, Oct 17, 2016 at 11:45 PM, Adam Litke  wrote:
>>
>> On 17/10/16 11:51 +0200, Piotr Kliczewski wrote:
>>>
>>> Adam,
>>>
>>> I see constant failures due to this and found:
>>>
>>> 2016-10-17 03:55:21,045 ERROR   (jsonrpc/3) [storage.TaskManager.Task]
>>> Task=`8989d694-7099-449b-bd66-4d63786be089`::Unexpected error
>>> (task:870)
>>> Traceback (most recent call last):
>>>  File "/usr/share/vdsm/storage/task.py", line 877, in _run
>>>return fn(*args, **kargs)
>>>  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in
>>> wrapper
>>>res = f(*args, **kwargs)
>>>  File "/usr/share/vdsm/storage/hsm.py", line 2212, in getAllTasksInfo
>>>allTasksInfo = sp.getAllTasksInfo()
>>>  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
>>> line 77, in wrapper
>>>raise SecureError("Secured object is not in safe state")
>>> SecureError: Secured object is not in safe state
>>
>>
>> This usually indicates that the SPM role has been lost which happens
>> most likely due to connection issues with the storage.  What is the
>> storage environment being used for the system tests?
>>
>>>
>>> Please take a look not sure whether it is related. You can find latest
>>> build here [1]
>>>
>>> Thanks,
>>> Piotr
>>>
>>> [1] http://jenkins.ovirt.org/job/ovirt_master_system-tests/668/
>>>
>>> On Fri, Oct 14, 2016 at 11:22 AM, Evgheni Dereveanchin
>>>  wrote:

 Hello,

 We've got several cases today where system tests failed
 when attempting to export templates:


 http://jenkins.ovirt.org/job/ovirt_master_system-tests/655/testReport/junit/(root)/004_basic_sanity/template_export/

 Related engine.log looks something like this:
 https://paste.fedoraproject.org/449936/47643643/raw/

 I could not find any obvious issues in SPM logs, could someone
 please take a look to confirm what may be causing this issue?

 Full logs from the test are available here:
 http://jenkins.ovirt.org/job/ovirt_master_system-tests/655/artifact/

 Regards,
 Evgheni Dereveanchin
>>
>>
>> --
>> Adam Litke
>>

[ovirt-devel] Failures in OST (4.0/master) ( was error msg from Jenkins )

2016-11-20 Thread Eyal Edri
Renaming title and adding devel.

On Sun, Nov 20, 2016 at 2:36 PM, Piotr Kliczewski wrote:

> The last failure seems to be storage related.
>
> @Nir please take a look.
>
> Here is engine side error:
>
> 2016-11-20 05:54:59,605 DEBUG
> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
> (default task-5) [59fc0074] Exception:
> org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException:
> IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot
> find master domain: u'spUUID=1ca141f1-b64d-4a52-8861-05c7de2a72b2,
> msdUUID=7d4bf750-4fb8-463f-bbb0-92156c47306e'
>
> and here is vdsm:
>
> jsonrpc.Executor/5::ERROR::2016-11-20
> 05:54:56,331::multipath::95::Storage.Multipath::(resize_devices) Could not
> resize device 360014052749733c7b8248628637b990f
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/multipath.py", line 93, in resize_devices
>     _resize_if_needed(guid)
>   File "/usr/share/vdsm/storage/multipath.py", line 101, in _resize_if_needed
>     for slave in devicemapper.getSlaves(name)]
>   File "/usr/share/vdsm/storage/multipath.py", line 158, in getDeviceSize
>     bs, phyBs = getDeviceBlockSizes(devName)
>   File "/usr/share/vdsm/storage/multipath.py", line 150, in getDeviceBlockSizes
>     "queue", "logical_block_size")).read())
> IOError: [Errno 2] No such file or directory:
> '/sys/block/sdb/queue/logical_block_size'
>


We now see a different error in master [1], which also indicates the hosts
are in a problematic state: ( failing 'assign_hosts_network_label' test  )

status: 409
reason: Conflict
detail: Cannot add Label. Operation can be performed only when Host status
is  Maintenance, Up, NonOperational.
 >> begin captured logging << 


[1]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/testReport/junit/(root)/006_network_by_label/assign_hosts_network_label/
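
Both this failure and the "no hosts in Up state" failure quoted below come down to a test starting while the hosts are not ready. The following is a minimal sketch of a fail-fast precondition, assuming the suite can obtain a name-to-status mapping for its hosts. `require_up_hosts`, `HostsNotReady` and the status strings are hypothetical names for illustration, not part of OST or the oVirt SDK.

```python
class HostsNotReady(Exception):
    """Raised when a test precondition on host status is not met."""


def require_up_hosts(host_statuses, minimum=1):
    """Return the sorted names of hosts reported as "up".

    host_statuses is a plain dict of host name -> status string; how it is
    collected (SDK call, REST query, ...) is outside this sketch.
    """
    up = sorted(name for name, status in host_statuses.items() if status == "up")
    if len(up) < minimum:
        # Fail here, with the observed statuses, instead of letting a later
        # API call fail with a less direct error.
        raise HostsNotReady(
            "need %d host(s) in Up state, found %d: %r"
            % (minimum, len(up), sorted(host_statuses.items()))
        )
    return up
```

Calling such a check at the start of a test that needs a running host would put the host statuses into the failure message, rather than a 400 or 409 from a later request.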



>
>
> On Sun, Nov 20, 2016 at 12:50 PM, Eyal Edri  wrote:
>
>>
>>
>> On Sun, Nov 20, 2016 at 1:42 PM, Yaniv Kaul  wrote:
>>
>>>
>>>
>>> On Sun, Nov 20, 2016 at 1:30 PM, Yaniv Kaul  wrote:
>>>


>>>> On Sun, Nov 20, 2016 at 1:18 PM, Eyal Edri  wrote:
>>>>
>>>>> the test fails to run VM because no hosts are in UP state(?) [1], not
>>>>> sure it is related to the triggering patch[2]
>>>>>
>>>>> status: 400
>>>>> reason: Bad Request
>>>>> detail: There are no hosts to use. Check that the cluster contains at
>>>>> least one host in Up state.
>>>>>
>>>>> Thoughts? Shouldn't we fail the test earlier when hosts are not UP?
>>>>>
>>>>
>>>> Yes. It's more likely that we are picking the wrong host or so, but who
>>>> knows - where are the engine and VDSM logs?

>>>
>>> A simple grep on the engine.log[1] finds several unrelated issues I'm
>>> not sure are reported, it's despairing to even begin...
>>> That being said, I don't see the issue there. We may need better logging
>>> on the API level, to see what is being sent. Is it consistent?
>>>
>>
>> Just failed now the first time, I didn't see it before.
>>
>>
>>> Y.
>>>
>>>
>>> [1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/artifact/exported-artifacts/basic_suite_4.0.sh-el7/exported-artifacts/test_logs/basic-suite-4.0/post-004_basic_sanity.py/lago-basic-suite-4-0-engine/_var_log_ovirt-engine/engine.log
>>>
>>>> Y.
>>>>
>>>>
>>>>>
>>>>>
>>>>> [1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/testReport/junit/(root)/004_basic_sanity/vm_run/
>>>>> [2] http://jenkins.ovirt.org/job/ovirt-engine_4.0_build-artifacts-el7-x86_64/1535/changes#detail
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Nov 20, 2016 at 1:00 PM, 
>>>>> wrote:
>>>>>
>>>>>> Build: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/,
>>>>>> Build Number: 3015,
>>>>>> Build Status: FAILURE
>>>>>> ___
>>>>>> Infra mailing list
>>>>>> in...@ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>>>>
>>>>>>
>>>>>
>>>>>


>>>
>>
>>
>
>


-- 
Eyal Edri
Associate Manager
RHV DevOps
EMEA ENG Virtualization R&D
Red Hat Israel

phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)

Re: [ovirt-devel] system tests failing on template export

2016-11-20 Thread Eyal Edri
It happened again in [1]

2016-11-20 10:48:12,106 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='6c1ec6e7-fb37-465b-8e30-1613317683b2') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2205, in getAllTasksInfo
    allTasksInfo = sp.getAllTasksInfo()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
2016-11-20 10:48:12,109 INFO  (jsonrpc/2) [storage.TaskManager.Task] (Task='6c1ec6e7-fb37-465b-8e30-1613317683b2') aborting: Task is aborted: u'Secured object is not in safe state' - code 100 (task:1175)
2016-11-20 10:48:12,110 ERROR (jsonrpc/2) [storage.Dispatcher] Secured object is not in safe state (dispatcher:80)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 72, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 105, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1183, in prepare
    raise self.error
SecureError: Secured object is not in safe state


[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-006_network_by_label.py/lago-basic-suite-master-host1/_var_log_vdsm/vdsm.log

The storage is running on the same VM as the engine (to save memory) and
it's serving both NFS and iSCSI.
Do you think running it on the same VM as the engine might cause such issues?



On Mon, Oct 17, 2016 at 11:45 PM, Adam Litke  wrote:

> On 17/10/16 11:51 +0200, Piotr Kliczewski wrote:
>
>> Adam,
>>
>> I see constant failures due to this and found:
>>
>> 2016-10-17 03:55:21,045 ERROR   (jsonrpc/3) [storage.TaskManager.Task] Task=`8989d694-7099-449b-bd66-4d63786be089`::Unexpected error (task:870)
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 2212, in getAllTasksInfo
>>     allTasksInfo = sp.getAllTasksInfo()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
>>     raise SecureError("Secured object is not in safe state")
>> SecureError: Secured object is not in safe state
>>
>
> This usually indicates that the SPM role has been lost, which most
> likely happens because of connection issues with the storage.  What is the
> storage environment being used for the system tests?
>
>
>> Please take a look; I'm not sure whether it is related. You can find the
>> latest build here [1]
>>
>> Thanks,
>> Piotr
>>
>> [1] http://jenkins.ovirt.org/job/ovirt_master_system-tests/668/
>>
>> On Fri, Oct 14, 2016 at 11:22 AM, Evgheni Dereveanchin
>>  wrote:
>>
>>> Hello,
>>>
>>> We've got several cases today where system tests failed
>>> when attempting to export templates:
>>>
>>> http://jenkins.ovirt.org/job/ovirt_master_system-tests/655/testReport/junit/(root)/004_basic_sanity/template_export/
>>>
>>> Related engine.log looks something like this:
>>> https://paste.fedoraproject.org/449936/47643643/raw/
>>>
>>> I could not find any obvious issues in SPM logs, could someone
>>> please take a look to confirm what may be causing this issue?
>>>
>>> Full logs from the test are available here:
>>> http://jenkins.ovirt.org/job/ovirt_master_system-tests/655/artifact/
>>>
>>> Regards,
>>> Evgheni Dereveanchin
>>>
>>
> --
> Adam Litke
>
>
>
>


-- 
Eyal Edri
Associate Manager
RHV DevOps
EMEA ENG Virtualization R&D
Red Hat Israel

phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)

Re: [ovirt-devel] Merge gating in Gerrit

2016-11-20 Thread Nir Soffer
On Sun, Nov 20, 2016 at 5:39 PM, Yedidyah Bar David  wrote:
> On Sun, Nov 20, 2016 at 5:06 PM, Barak Korren  wrote:
>> Hi all,
>>
>> Perhaps the main purpose of CI is to prevent breaking code from
>> getting merged into the stable/master branches. Unfortunately our CI
>> is not there yet, and one of the reasons for that is that we run a large
>> share of our CI tests only _after_ the code is merged.
>>
>> The reason for that is that when balancing thorough but time-consuming
>> tests (e.g. engine build with all permutations) vs. faster
>> but more basic ones (e.g. "findbugs" and single permutation build), we
>> typically choose the faster tests to be run per-patch-set and leave
>> the thorough testing to only be run post-merge.
>>
>> We'd like to change that and have the thorough tests also run before
>> merge. Ideally we would like to just hook stuff to the "submit"
>> button, but Gerrit doesn't allow one to do that easily. So instead
>> we'll need to adopt some kind of flag to indicate we want to submit
>> and have Jenkins "click" the submit button on our behalf if tests pass.
>>
>> I see two options here:
>> 1. Use Code-Review+2 as the indicator to run "heavy" CI and merge.

This is problematic. For example in vdsm we have 5 maintainers with
+2, and 4 maintainers with commit right, but only 2 are commenting
regularly.

>> 2. Add an "approve" flag that maintainers can set to +1 (This is
>>    what OpenStack is doing).

This seems better.

But there is another requirement: a maintainer should be able to commit
even if Jenkins fails. Sometimes the CI is broken, or there are flaky tests
breaking the build, and some jobs are failing regularly (check-merged)
and I don't want to wait for them.

Today we can override the CI vote and commit; if we keep that as is, I don't
see any problem with this change.

Nir


Re: [ovirt-devel] Gerrit parallel patch handling and CI (Or, why did my code fail post-merge)

2016-11-20 Thread Nir Soffer
On Sun, Nov 20, 2016 at 5:12 PM, Barak Korren  wrote:
>> With the current setting (in vdsm), submitting a series of patches is
>> a huge pain. Sometimes refreshing the page and submitting the next
>> patch in the series works, but sometimes you have to rebase again
>> the next patches in the series, and in the worst cases, you have to
>> do several rebases in the same series. This happens even when the
>> entire series was already rebased properly before the submit.
>
> Actually vdsm is configured to "Cherry Pick" ATM, I'm not sure what
> the reasons for this were, but this should probably be changed to
> ff-only ASAP b/c as it is, it allows patches to be submitted
> completely out-of-order.
>
>> In vdsm we were bitten by this many times, and both Dan and I agree
>> now that fast-forward is the only way.
>>
>> I don't think we need to agree on all projects for this, the whole point
>> of having multiple projects is that we don't need to agree on every little
>> detail, the project maintainers can do whatever they want.
>
> Ok, so can we get an agreement between the vdsm maintainers to change
> to "ff-only"?

+1

Dan, can you confirm?


Re: [ovirt-devel] Merge gating in Gerrit

2016-11-20 Thread Yedidyah Bar David
On Sun, Nov 20, 2016 at 5:06 PM, Barak Korren  wrote:
> Hi all,
>
> Perhaps the main purpose of CI is to prevent breaking code from
> getting merged into the stable/master branches. Unfortunately our CI
> is not there yet, and one of the reasons for that is that we run a large
> share of our CI tests only _after_ the code is merged.
>
> The reason for that is that when balancing thorough but time-consuming
> tests (e.g. engine build with all permutations) vs. faster
> but more basic ones (e.g. "findbugs" and single permutation build), we
> typically choose the faster tests to be run per-patch-set and leave
> the thorough testing to only be run post-merge.
>
> We'd like to change that and have the thorough tests also run before
> merge. Ideally we would like to just hook stuff to the "submit"
> button, but Gerrit doesn't allow one to do that easily. So instead
> we'll need to adopt some kind of flag to indicate we want to submit
> and have Jenkins "click" the submit button on our behalf if tests pass.
>
> I see two options here:
> 1. Use Code-Review+2 as the indicator to run "heavy" CI and merge.
> 2. Add an "approve" flag that maintainers can set to +1 (This is
>    what OpenStack is doing).
>
> What would you prefer?

Option 2, but call it "Run heavy CI tests" and have it only run the tests,
not merge, so that one can ask to run these tests prior to merging.
-- 
Didi


Re: [ovirt-devel] Gerrit parallel patch handling and CI (Or, why did my code fail post-merge)

2016-11-20 Thread Barak Korren
> With the current setting (in vdsm), submitting a series of patches is
> a huge pain. Sometimes refreshing the page and submitting the next
> patch in the series works, but sometimes you have to rebase again
> the next patches in the series, and in the worst cases, you have to
> do several rebases in the same series. This happens even when the
> entire series was already rebased properly before the submit.

Actually vdsm is configured to "Cherry Pick" ATM, I'm not sure what
the reasons for this were, but this should probably be changed to
ff-only ASAP b/c as it is, it allows patches to be submitted
completely out-of-order.

> In vdsm we were bitten by this many times, and both Dan and I agree
> now that fast-forward is the only way.
>
> I don't think we need to agree on all projects for this, the whole point
> of having multiple projects is that we don't need to agree on every little
> detail, the project maintainers can do whatever they want.

Ok, so can we get an agreement between the vdsm maintainers to change
to "ff-only"?

-- 
Barak Korren
bkor...@redhat.com
RHEV-CI Team


[ovirt-devel] Merge gating in Gerrit

2016-11-20 Thread Barak Korren
Hi all,

Perhaps the main purpose of CI is to prevent breaking code from
getting merged into the stable/master branches. Unfortunately our CI
is not there yet, and one of the reasons for that is that we run a large
share of our CI tests only _after_ the code is merged.

The reason for that is that when balancing thorough but time-consuming
tests (e.g. engine build with all permutations) vs. faster
but more basic ones (e.g. "findbugs" and single permutation build), we
typically choose the faster tests to be run per-patch-set and leave
the thorough testing to only be run post-merge.

We'd like to change that and have the thorough tests also run before
merge. Ideally we would like to just hook stuff to the "submit"
button, but Gerrit doesn't allow one to do that easily. So instead
we'll need to adopt some kind of flag to indicate we want to submit
and have Jenkins "click" the submit button on our behalf if tests pass.

I see two options here:
1. Use Code-Review+2 as the indicator to run "heavy" CI and merge.
2. Add an "approve" flag that maintainers can set to +1 (This is
   what OpenStack is doing).

What would you prefer?
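
To make the proposed flow concrete, here is a minimal sketch of the decision the CI system would make for a single change under option 2. It is only an illustration: `gate_action`, its parameters and the returned action names are hypothetical, and nothing here is Gerrit or Jenkins API.

```python
def gate_action(approve_vote, heavy_ci_passed):
    """Decide what the CI system should do for one change.

    approve_vote:    value of the hypothetical "approve" label (0 or +1).
    heavy_ci_passed: None if the heavy tests have not run on the current
                     patch set, otherwise True or False.
    """
    if approve_vote < 1:
        return "wait"            # nobody asked to submit yet
    if heavy_ci_passed is None:
        return "run-heavy-ci"    # the flag triggers the thorough tests
    if heavy_ci_passed:
        return "submit"          # Jenkins "clicks" submit on our behalf
    return "report-failure"      # leave the change unmerged


print(gate_action(approve_vote=1, heavy_ci_passed=None))
```

Option 1 would look the same with Code-Review+2 standing in for the approve vote.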

-- 
Barak Korren
bkor...@redhat.com
RHEV-CI Team


Re: [ovirt-devel] Gerrit parallel patch handling and CI (Or, why did my code fail post-merge)

2016-11-20 Thread Nir Soffer
On Sun, Nov 20, 2016 at 4:07 PM, Barak Korren  wrote:
> Hi there,
>
> I would like to address a concern that has been raised to us by
> multiple developers, and reach an agreement on how (and if) to remedy
> it.
>
> Let's assume the following situation:
> We have a Git repo in Gerrit with top commit C0 in master.
> At time t0 developers Alice and Bob push patches P1 and P2 respectively
> to master so that we end up with the following situation in git:
> C0 <= P1 (this is Alice's patch)
> C0 <= P2 (this is Bob's patch)
>
> At time t1 CI runs for both patches, checking the code as it looks for
> each patch. Let's assume CI is successful for both.
>
> At time t2 Alice submits her patch and Gerrit merges it, resulting in
> the following situation in master:
> C0 <= P1
>
> At time t2 Bob submits his patch. Gerrit, seeing master has changed,
> rebases the patch and merges it; the resulting situation (if the
> rebase is successful) is:
> C0 <= P1 <= P2
>
> This means that the resulting code was never tested in CI.

This makes the CI useless.

To know if a patch actually passed the tests, you have to manually
rebase each patch and wait for the CI - this takes up to 20 minutes
on vdsm CI.

> This, in
> turn, causes various failures to show up post-merge despite pre-merge
> CI having run successfully.
>
> This situation is a result of the way our repos are currently
> configured. Most repos ATM are configured with the "Rebase If
> Necessary" submit type. This means that Gerrit tries to automatically
> rebase patches as mentioned in t2 above.
>
> We could, instead, configure the repos to use the "Fast Forward Only"
> submit type. In that case, when Bob submits on t2, Gerrit refuses to
> merge and asks Bob to rebase (while offering a convenient button to do
> it). When he does, a new patch set gets pushed, and subsequently
> checked by CI.
>
> I recommend we switch all projects to use the "Fast Forward Only" submit type.
>
> Thoughts? Concerns?

We have fast-forward in ioprocess and ovirt-imageio, and we are
happy with this setting.

Another advantage of fast-forward only merges is being able to submit
multiple patches with *one click*. If you submit the top patch in a series,
all patches are submitted.

With the current setting (in vdsm), submitting a series of patches is
a huge pain. Sometimes refreshing the page and submitting the next
patch in the series works, but sometimes you have to rebase again
the next patches in the series, and in the worst cases, you have to
do several rebases in the same series. This happens even when the
entire series was already rebased properly before the submit.

In vdsm we were bitten by this many times, and both Dan and I agree
now that fast-forward is the only way.

I don't think we need to agree on all projects for this, the whole point
of having multiple projects is that we don't need to agree on every little
detail, the project maintainers can do whatever they want.

Thanks for raising this issue.

Nir


[ovirt-devel] Gerrit parallel patch handling and CI (Or, why did my code fail post-merge)

2016-11-20 Thread Barak Korren
Hi there,

I would like to address a concern that has been raised to us by
multiple developers, and reach an agreement on how (and if) to remedy
it.

Let's assume the following situation:
We have a Git repo in Gerrit with top commit C0 in master.
At time t0 developers Alice and Bob push patches P1 and P2 respectively
to master so that we end up with the following situation in git:
C0 <= P1 (this is Alice's patch)
C0 <= P2 (this is Bob's patch)

At time t1 CI runs for both patches, checking the code as it looks for
each patch. Let's assume CI is successful for both.

At time t2 Alice submits her patch and Gerrit merges it, resulting in
the following situation in master:
C0 <= P1

At time t2 Bob submits his patch. Gerrit, seeing master has changed,
rebases the patch and merges it; the resulting situation (if the
rebase is successful) is:
C0 <= P1 <= P2

This means that the resulting code was never tested in CI. This, in
turn, causes various failures to show up post-merge despite pre-merge
CI having run successfully.
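
The scenario above can be sketched with a toy model in Python. Everything in it is an assumption made for illustration: a "tree" is just a set of defined functions plus a set of call sites, "CI" only checks that every call targets a defined function, and the function names are made up. It is not Gerrit or vdsm code.

```python
# Toy model: a tree is a set of defined functions plus a set of
# (caller, callee) call sites. All names here are made up.

def apply_patch(tree, patch):
    """Apply a patch to a tree; stands in for a conflict-free rebase/merge."""
    return {
        "functions": (tree["functions"] - patch["del_functions"]) | patch["add_functions"],
        "calls": (tree["calls"] - patch["del_calls"]) | patch["add_calls"],
    }


def ci_passes(tree):
    """'CI' here only checks that every call site targets a defined function."""
    return all(callee in tree["functions"] for _caller, callee in tree["calls"])


C0 = {"functions": {"get_size"}, "calls": {("resize", "get_size")}}

# P1 (Alice): rename get_size to device_size and update the only caller.
P1 = {
    "del_functions": {"get_size"}, "add_functions": {"device_size"},
    "del_calls": {("resize", "get_size")}, "add_calls": {("resize", "device_size")},
}

# P2 (Bob): add a new caller of get_size, in a different file.
P2 = {
    "del_functions": set(), "add_functions": set(),
    "del_calls": set(), "add_calls": {("report", "get_size")},
}

print(ci_passes(apply_patch(C0, P1)))                   # t1: Alice's patch alone
print(ci_passes(apply_patch(C0, P2)))                   # t1: Bob's patch alone
print(ci_passes(apply_patch(apply_patch(C0, P1), P2)))  # what lands in master
```

In this model the first two checks pass and the third fails: the two patches touch different lines, so the rebase is clean, yet the combined tree calls a function that no longer exists.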

This situation is a result of the way our repos are currently
configured. Most repos ATM are configured with the "Rebase If
Necessary" submit type. This means that Gerrit tries to automatically
rebase patches as mentioned in t2 above.

We could, instead, configure the repos to use the "Fast Forward Only"
submit type. In that case, when Bob submits on t2, Gerrit refuses to
merge and asks Bob to rebase (while offering a convenient button to do
it). When he does, a new patch set gets pushed, and subsequently
checked by CI.

I recommend we switch all projects to use the "Fast Forward Only" submit type.
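
The difference between the two submit types can be sketched as follows. This is a simplified model, not Gerrit's implementation: `try_submit` and `SubmitType` are hypothetical names, commits are plain strings, and the rebase is assumed to apply cleanly.

```python
from enum import Enum


class SubmitType(Enum):
    REBASE_IF_NECESSARY = "rebase if necessary"
    FAST_FORWARD_ONLY = "fast forward only"


def try_submit(master_head, change_parent, submit_type):
    """Return (merged, merged_code_was_tested) for one submit attempt.

    CI is assumed to have tested the change on top of change_parent, so its
    result only describes what lands if master has not moved since then.
    """
    up_to_date = change_parent == master_head
    if submit_type is SubmitType.FAST_FORWARD_ONLY:
        # Refuse unless the change sits directly on the current head.
        return (up_to_date, up_to_date)
    # Rebase If Necessary: merge either way (clean rebase assumed).
    return (True, up_to_date)


# Bob's patch was tested on C0, but master is now at P1.
print(try_submit("P1", "C0", SubmitType.REBASE_IF_NECESSARY))
print(try_submit("P1", "C0", SubmitType.FAST_FORWARD_ONLY))
# After Bob rebases onto P1, the new patch set is tested and then submitted.
print(try_submit("P1", "P1", SubmitType.FAST_FORWARD_ONLY))
```

Under these assumptions, "Rebase If Necessary" merges code that CI never saw, while "Fast Forward Only" refuses until the change has been rebased and re-tested.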

Thoughts? Concerns?

-- 
Barak Korren
bkor...@redhat.com
RHEV-CI Team