Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-20 Thread Gal Ben Haim
The failure happened again on "ovirt-srv04".
The suite wasn't run from "/dev/shm" since it was full of stale lago
environments of "hc-basic-suite-4.1" and "he-basic-iscsi-suite-4.2".
The reason for the stale envs is a timeout raised by Jenkins
(the suites were stuck for 6 hours), which meant OST's cleanup was never called.
I'm going to add an internal timeout to OST.
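
For illustration, a minimal sketch of what such an internal timeout could look
like, assuming the suite is launched through a run_suite.sh-style wrapper (the
script name, the 4-hour value and the /dev/shm naming are assumptions, not the
actual OST code):

#!/bin/bash
# Hedged sketch: wrap the suite run in a timeout shorter than the Jenkins job
# timeout, so OST gives up first and its own cleanup still gets a chance to run.
SUITE="$1"
timeout --signal=TERM --kill-after=5m 4h ./run_suite.sh "$SUITE"
RC=$?
if [ "$RC" -eq 124 ]; then
    echo "suite ${SUITE} hit the internal timeout" >&2
fi
# Always remove whatever the run left behind in /dev/shm (assumed naming scheme).
rm -rf /dev/shm/*"${SUITE}"* 2>/dev/null || true
exit "$RC"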


On Tue, Mar 20, 2018 at 11:03 AM, Yedidyah Bar David 
wrote:

> On Tue, Mar 20, 2018 at 10:57 AM, Barak Korren  wrote:
> > On 20 March 2018 at 10:53, Yedidyah Bar David  wrote:
> >> On Tue, Mar 20, 2018 at 10:11 AM, Barak Korren 
> wrote:
> >>> On 20 March 2018 at 09:17, Yedidyah Bar David  wrote:
>  On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler 
> wrote:
> > Thanks Gal, I expect the problem is fixed until something eats
> > all space in /dev/shm.
> > But the usage of /dev/shm is logged in the output, so we would be
> able
> > to detect the problem next time instantly.
> >
> > From my point of view it would be good to know why /dev/shm was full,
> > to prevent this situation in future.
> 
>  Gal already wrote below - it was because some build failed to clean up
>  after itself.
> 
>  I don't know about this specific case, but I was told that I am
>  personally causing such issues by using the 'cancel' button, so I
>  sadly stopped. Sadly, because our CI system is quite loaded and when I
>  know that some build is useless, I wish to kill it and save some
>  load...
> 
>  Back to your point, perhaps we should make jobs check /dev/shm when
>  they _start_, and either alert/fail/whatever if it's not almost free,
>  or, if we know what we are doing, just remove stuff there? That might
>  be much easier than fixing things to clean up in end, and/or debugging
>  why this cleaning failed.
> >>>
> >>> Sure thing, patches to:
> >>>
> >>> [jenkins repo]/jobs/confs/shell-scripts/cleanup_slave.sh
> >>>
> >>> Are welcome, we often find interesting stuff to add there...
> >>>
> >>> If constrained for time, please turn this comment into an orderly RFE
> in Jira...
> >>
> >> Searched for '/dev/shm' and found way too many places to analyze them
> >> all and add something to cleanup_slave to cover all.
> >
> > Where did you search?
>
> ovirt-system-tests, lago, lago-ost-plugin.
> ovirt-system-tests has 83 occurrences. I realize almost all are in
> lago guests, but looking still takes time...
>
> In theory I can patch cleanup_slave.sh as you suggested, removing
> _everything_ there.
> Not sure this is safe.
>
> >
> >>
> >> Pushed this for now:
> >>
> >> https://gerrit.ovirt.org/89215
> >>
> >>>
> >>> --
> >>> Barak Korren
> >>> RHV DevOps team , RHCE, RHCi
> >>> Red Hat EMEA
> >>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
> >>
> >>
> >>
> >> --
> >> Didi
> >
> >
> >
> > --
> > Barak Korren
> > RHV DevOps team , RHCE, RHCi
> > Red Hat EMEA
> > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>
>
>
> --
> Didi
> ___
> Infra mailing list
> Infra@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>



-- 
*GAL bEN HAIM*
RHV DEVOPS
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-20 Thread Yedidyah Bar David
On Tue, Mar 20, 2018 at 10:57 AM, Barak Korren  wrote:
> On 20 March 2018 at 10:53, Yedidyah Bar David  wrote:
>> On Tue, Mar 20, 2018 at 10:11 AM, Barak Korren  wrote:
>>> On 20 March 2018 at 09:17, Yedidyah Bar David  wrote:
 On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler  wrote:
> Thanks Gal, I expect the problem is fixed until something eats
> all space in /dev/shm.
> But the usage of /dev/shm is logged in the output, so we would be able
> to detect the problem next time instantly.
>
> From my point of view it would be good to know why /dev/shm was full,
> to prevent this situation in future.

 Gal already wrote below - it was because some build failed to clean up
 after itself.

 I don't know about this specific case, but I was told that I am
 personally causing such issues by using the 'cancel' button, so I
 sadly stopped. Sadly, because our CI system is quite loaded and when I
 know that some build is useless, I wish to kill it and save some
 load...

 Back to your point, perhaps we should make jobs check /dev/shm when
 they _start_, and either alert/fail/whatever if it's not almost free,
 or, if we know what we are doing, just remove stuff there? That might
 be much easier than fixing things to clean up in end, and/or debugging
 why this cleaning failed.
>>>
>>> Sure thing, patches to:
>>>
>>> [jenkins repo]/jobs/confs/shell-scripts/cleanup_slave.sh
>>>
>>> Are welcome, we often find interesting stuff to add there...
>>>
>>> If constrained for time, please turn this comment into an orderly RFE in 
>>> Jira...
>>
>> Searched for '/dev/shm' and found way too many places to analyze them
>> all and add something to cleanup_slave to cover all.
>
> Where did you search?

ovirt-system-tests, lago, lago-ost-plugin.
ovirt-system-tests has 83 occurrences. I realize almost all are in
lago guests, but looking still takes time...

In theory I can patch cleanup_slave.sh as you suggested, removing
_everything_ there.
Not sure this is safe.
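
As a rough illustration, a more guarded variant (a sketch only; the 6-hour
threshold and the assumption that stale lago environments show up as plain
directories are not verified against the actual slaves) could limit itself to
old entries instead of wiping /dev/shm blindly:

# Sketch for cleanup_slave.sh: drop only /dev/shm directories with no activity
# for 6 hours, assuming anything younger may belong to a still-running job.
for d in /dev/shm/*/; do
    [ -d "$d" ] || continue
    if [ -n "$(find "$d" -mmin -360 -print -quit)" ]; then
        continue   # something inside was modified recently, leave it alone
    fi
    echo "cleanup_slave: removing stale /dev/shm entry: $d"
    rm -rf "$d"
done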

>
>>
>> Pushed this for now:
>>
>> https://gerrit.ovirt.org/89215
>>
>>>
>>> --
>>> Barak Korren
>>> RHV DevOps team , RHCE, RHCi
>>> Red Hat EMEA
>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>
>>
>>
>> --
>> Didi
>
>
>
> --
> Barak Korren
> RHV DevOps team , RHCE, RHCi
> Red Hat EMEA
> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted



-- 
Didi
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-20 Thread Barak Korren
On 20 March 2018 at 10:53, Yedidyah Bar David  wrote:
> On Tue, Mar 20, 2018 at 10:11 AM, Barak Korren  wrote:
>> On 20 March 2018 at 09:17, Yedidyah Bar David  wrote:
>>> On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler  wrote:
 Thanks Gal, I expect the problem is fixed until something eats
 all space in /dev/shm.
 But the usage of /dev/shm is logged in the output, so we would be able
 to detect the problem next time instantly.

 From my point of view it would be good to know why /dev/shm was full,
 to prevent this situation in future.
>>>
>>> Gal already wrote below - it was because some build failed to clean up
>>> after itself.
>>>
>>> I don't know about this specific case, but I was told that I am
>>> personally causing such issues by using the 'cancel' button, so I
>>> sadly stopped. Sadly, because our CI system is quite loaded and when I
>>> know that some build is useless, I wish to kill it and save some
>>> load...
>>>
>>> Back to your point, perhaps we should make jobs check /dev/shm when
>>> they _start_, and either alert/fail/whatever if it's not almost free,
>>> or, if we know what we are doing, just remove stuff there? That might
>>> be much easier than fixing things to clean up in end, and/or debugging
>>> why this cleaning failed.
>>
>> Sure thing, patches to:
>>
>> [jenkins repo]/jobs/confs/shell-scripts/cleanup_slave.sh
>>
>> Are welcome, we often find interesting stuff to add there...
>>
>> If constrained for time, please turn this comment into an orderly RFE in 
>> Jira...
>
> Searched for '/dev/shm' and found way too many places to analyze them
> all and add something to cleanup_slave to cover all.

Where did you search?

>
> Pushed this for now:
>
> https://gerrit.ovirt.org/89215
>
>>
>> --
>> Barak Korren
>> RHV DevOps team , RHCE, RHCi
>> Red Hat EMEA
>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>
>
>
> --
> Didi



-- 
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-20 Thread Yedidyah Bar David
On Tue, Mar 20, 2018 at 10:11 AM, Barak Korren  wrote:
> On 20 March 2018 at 09:17, Yedidyah Bar David  wrote:
>> On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler  wrote:
>>> Thanks Gal, I expect the problem is fixed until something eats
>>> all space in /dev/shm.
>>> But the usage of /dev/shm is logged in the output, so we would be able
>>> to detect the problem next time instantly.
>>>
>>> From my point of view it would be good to know why /dev/shm was full,
>>> to prevent this situation in future.
>>
>> Gal already wrote below - it was because some build failed to clean up
>> after itself.
>>
>> I don't know about this specific case, but I was told that I am
>> personally causing such issues by using the 'cancel' button, so I
>> sadly stopped. Sadly, because our CI system is quite loaded and when I
>> know that some build is useless, I wish to kill it and save some
>> load...
>>
>> Back to your point, perhaps we should make jobs check /dev/shm when
>> they _start_, and either alert/fail/whatever if it's not almost free,
>> or, if we know what we are doing, just remove stuff there? That might
>> be much easier than fixing things to clean up in end, and/or debugging
>> why this cleaning failed.
>
> Sure thing, patches to:
>
> [jenkins repo]/jobs/confs/shell-scripts/cleanup_slave.sh
>
> Are welcome, we often find interesting stuff to add there...
>
> If constrained for time, please turn this comment into an orderly RFE in 
> Jira...

Searched for '/dev/shm' and found way too many places to analyze them
all and add something to cleanup_slave that covers them all.

Pushed this for now:

https://gerrit.ovirt.org/89215

>
> --
> Barak Korren
> RHV DevOps team , RHCE, RHCi
> Red Hat EMEA
> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted



-- 
Didi
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-20 Thread Barak Korren
On 20 March 2018 at 09:17, Yedidyah Bar David  wrote:
> On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler  wrote:
>> Thanks Gal, I expect the problem is fixed until something eats
>> all space in /dev/shm.
>> But the usage of /dev/shm is logged in the output, so we would be able
>> to detect the problem next time instantly.
>>
>> From my point of view it would be good to know why /dev/shm was full,
>> to prevent this situation in future.
>
> Gal already wrote below - it was because some build failed to clean up
> after itself.
>
> I don't know about this specific case, but I was told that I am
> personally causing such issues by using the 'cancel' button, so I
> sadly stopped. Sadly, because our CI system is quite loaded and when I
> know that some build is useless, I wish to kill it and save some
> load...
>
> Back to your point, perhaps we should make jobs check /dev/shm when
> they _start_, and either alert/fail/whatever if it's not almost free,
> or, if we know what we are doing, just remove stuff there? That might
> be much easier than fixing things to clean up in end, and/or debugging
> why this cleaning failed.

Sure thing, patches to:

[jenkins repo]/jobs/confs/shell-scripts/cleanup_slave.sh

are welcome; we often find interesting stuff to add there...

If constrained for time, please turn this comment into an orderly RFE in Jira...

-- 
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-20 Thread Yedidyah Bar David
On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler  wrote:
> Thanks Gal, I expect the problem is fixed until something eats
> all space in /dev/shm.
> But the usage of /dev/shm is logged in the output, so we would be able
> to detect the problem next time instantly.
>
> From my point of view it would be good to know why /dev/shm was full,
> to prevent this situation in future.

Gal already wrote below - it was because some build failed to clean up
after itself.

I don't know about this specific case, but I was told that I am
personally causing such issues by using the 'cancel' button, so I
sadly stopped. Sadly, because our CI system is quite loaded and when I
know that some build is useless, I wish to kill it and save some
load...

Back to your point, perhaps we should make jobs check /dev/shm when
they _start_, and either alert/fail/whatever if it's not almost free,
or, if we know what we are doing, just remove stuff there? That might
be much easier than fixing things to clean up at the end, and/or debugging
why that cleanup failed.
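
A minimal sketch of such a start-of-job check (the 20% threshold and the choice
to warn rather than fail are assumptions; failing hard would be the stricter
variant):

# Sketch: run at job start, before the suite touches /dev/shm.
used=$(df --output=pcent /dev/shm | tail -n1 | tr -dc '0-9')
if [ "${used:-0}" -ge 20 ]; then
    echo "WARNING: /dev/shm is already ${used}% used on $(hostname)" >&2
    ls -lh /dev/shm >&2
    # policy decision: 'exit 1' here, or remove known-stale entries instead
fi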

>
>
>  On Mon, 19 Mar 2018 18:44:54
> +0200 Gal Ben Haim  wrote:
>
>> I see that this failure happens a lot on "ovirt-srv19.phx.ovirt.org
>> ", and by
>> different projects that uses ansible.
>> Not sure it relates, but I've found (and removed) a stale lago
>> environment in "/dev/shm" that were created by
>> ovirt-system-tests_he-basic-iscsi-suite -master
>> 
>> .
>> The stale environment caused the suite to not run in "/dev/shm".
>> The maximum number of semaphore on both  ovirt-srv19.phx.ovirt.org
>>  and
>> ovirt-srv23.phx.ovirt.org
>>  (which
>> run the ansible suite with success) is 128.
>>
>> On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David 
>> wrote:
>>
>> > Failed also here:
>> >
>> > http://jenkins.ovirt.org/job/ovirt-system-tests_master_
>> > check-patch-el7-x86_64/4540/
>> >
>> > The patch trigerring this affects many suites, and the job failed
>> > during ansible-suite-master .
>> >
>> > On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri  wrote:
>> >
>> >> Gal and Daniel are looking into it, strange its not affecting all
>> >> suites.
>> >>
>> >> On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler
>> >>  wrote:
>> >>
>> >>> Looks like /dev/shm is run out of space.
>> >>>
>> >>> On Mon, 19 Mar 2018 13:33:28 +0200
>> >>> Leon Goldberg  wrote:
>> >>>
>> >>> > Hey, any updates?
>> >>> >
>> >>> > On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas 
>> >>> > wrote:
>> >>> >
>> >>> > > We are doing nothing special there, just executing ansible
>> >>> > > through their API.
>> >>> > >
>> >>> > > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
>> >>> > >  wrote:
>> >>> > >
>> >>> > >> It's not a space issue. Other suites ran on that slave after
>> >>> > >> your suite successfully.
>> >>> > >> I think that the problem is the setting for max semaphores,
>> >>> > >> though I don't know what you're doing to reach that limit.
>> >>> > >>
>> >>> > >> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
>> >>> > >>
>> >>> > >> -- Semaphore Limits 
>> >>> > >> max number of arrays = 128
>> >>> > >> max semaphores per array = 250
>> >>> > >> max semaphores system wide = 32000
>> >>> > >> max ops per semop call = 32
>> >>> > >> semaphore max value = 32767
>> >>> > >>
>> >>> > >>
>> >>> > >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas
>> >>> > >>  wrote:
>> >>> > >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suit
>> >>> e-master/
>> >>> > >>>
>> >>> > >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
>> >>> > >>>  wrote:
>> >>> > >>>
>> >>> >  Hi Edi,
>> >>> > 
>> >>> >  Are there any logs? where you're running the suite? may I
>> >>> >  have a link?
>> >>> > 
>> >>> >  On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas
>> >>> >   wrote:
>> >>> > > Good morning,
>> >>> > >
>> >>> > > We are running in the OST network suite a test module with
>> >>> > > Ansible and it started failing during the weekend on
>> >>> > > "OSError: [Errno 28] No space left on device" when
>> >>> > > attempting to take a lock in the mutiprocessing python
>> >>> > > module.
>> >>> > >
>> >>> > > It smells like a slave resource problem, could someone
>> >>> > > help investigate this?
>> >>> > >
>> >>> > > Thanks,
>> >>> > > Edy.
>> >>> > >
>> >>> > > === FAILURES
>> >>> > > === __
>> >>> > > test_ovn_provider_create_scenario ___
>> 

Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-19 Thread Dominik Holler
Thanks Gal, I expect the problem is fixed until something eats
all the space in /dev/shm again.
But the usage of /dev/shm is logged in the output, so we would be able
to detect the problem next time instantly.

From my point of view it would be good to know why /dev/shm was full,
to prevent this situation in the future.
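
For reference, the logging that makes the next occurrence obvious can be as
small as the following (a sketch; where exactly the job emits it is up to the
suite):

# Print overall tmpfs usage and the largest entries at the start of the run,
# so a full /dev/shm is visible right in the job output.
df -h /dev/shm
du -sh /dev/shm/* 2>/dev/null | sort -rh | head -n5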


 On Mon, 19 Mar 2018 18:44:54
+0200 Gal Ben Haim  wrote:

> I see that this failure happens a lot on "ovirt-srv19.phx.ovirt.org
> ", and by
> different projects that uses ansible.
> Not sure it relates, but I've found (and removed) a stale lago
> environment in "/dev/shm" that were created by
> ovirt-system-tests_he-basic-iscsi-suite -master
> 
> .
> The stale environment caused the suite to not run in "/dev/shm".
> The maximum number of semaphore on both  ovirt-srv19.phx.ovirt.org
>  and
> ovirt-srv23.phx.ovirt.org
>  (which
> run the ansible suite with success) is 128.
> 
> On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David 
> wrote:
> 
> > Failed also here:
> >
> > http://jenkins.ovirt.org/job/ovirt-system-tests_master_
> > check-patch-el7-x86_64/4540/
> >
> > The patch trigerring this affects many suites, and the job failed
> > during ansible-suite-master .
> >
> > On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri  wrote:
> >  
> >> Gal and Daniel are looking into it, strange its not affecting all
> >> suites.
> >>
> >> On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler
> >>  wrote:
> >>  
> >>> Looks like /dev/shm is run out of space.
> >>>
> >>> On Mon, 19 Mar 2018 13:33:28 +0200
> >>> Leon Goldberg  wrote:
> >>>  
> >>> > Hey, any updates?
> >>> >
> >>> > On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas 
> >>> > wrote:
> >>> >  
> >>> > > We are doing nothing special there, just executing ansible
> >>> > > through their API.
> >>> > >
> >>> > > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
> >>> > >  wrote:
> >>> > >  
> >>> > >> It's not a space issue. Other suites ran on that slave after
> >>> > >> your suite successfully.
> >>> > >> I think that the problem is the setting for max semaphores,
> >>> > >> though I don't know what you're doing to reach that limit.
> >>> > >>
> >>> > >> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
> >>> > >>
> >>> > >> -- Semaphore Limits 
> >>> > >> max number of arrays = 128
> >>> > >> max semaphores per array = 250
> >>> > >> max semaphores system wide = 32000
> >>> > >> max ops per semop call = 32
> >>> > >> semaphore max value = 32767
> >>> > >>
> >>> > >>
> >>> > >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas
> >>> > >>  wrote:  
> >>> > >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suit  
> >>> e-master/  
> >>> > >>>
> >>> > >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
> >>> > >>>  wrote:
> >>> > >>>  
> >>> >  Hi Edi,
> >>> > 
> >>> >  Are there any logs? where you're running the suite? may I
> >>> >  have a link?
> >>> > 
> >>> >  On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas
> >>> >   wrote:  
> >>> > > Good morning,
> >>> > >
> >>> > > We are running in the OST network suite a test module with
> >>> > > Ansible and it started failing during the weekend on
> >>> > > "OSError: [Errno 28] No space left on device" when
> >>> > > attempting to take a lock in the mutiprocessing python
> >>> > > module.
> >>> > >
> >>> > > It smells like a slave resource problem, could someone
> >>> > > help investigate this?
> >>> > >
> >>> > > Thanks,
> >>> > > Edy.
> >>> > >
> >>> > > === FAILURES
> >>> > > === __
> >>> > > test_ovn_provider_create_scenario ___
> >>> > >
> >>> > > os_client_config = None
> >>> > >
> >>> > > def
> >>> > > test_ovn_provider_create_scenario(os_client_config):  
> >>> > > >   _test_ovn_provider('create_scenario.yml')  
> >>> > >
> >>> > > network-suite-master/tests/test_ovn_provider.py:68:
> >>> > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >>> > > _ _ _ _ _ _ _ _ _ _ _
> >>> > > network-suite-master/tests/test_ovn_provider.py:78: in
> >>> > > _test_ovn_provider playbook.run()
> >>> > > network-suite-master/lib/ansiblelib.py:127: in run
> >>> > > self._run_playbook_executor()
> >>> > > network-suite-master/lib/ansiblelib.py:138: in
> >>> > > _run_playbook_executor pbex =
> >>> > > PlaybookExecutor(**self._pbex_args)  
> >>> 

Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-19 Thread Gal Ben Haim
I see that this failure happens a lot on "ovirt-srv19.phx.ovirt.org", and by
different projects that use ansible.
Not sure it relates, but I've found (and removed) a stale lago environment
in "/dev/shm" that was created by
ovirt-system-tests_he-basic-iscsi-suite-master.
The stale environment caused the suite to not run in "/dev/shm".
The maximum number of semaphore arrays on both ovirt-srv19.phx.ovirt.org and
ovirt-srv23.phx.ovirt.org (which run
the ansible suite with success) is 128.
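
For anyone chasing this again, a sketch of how such leftovers can be spotted on
a slave (the '*suite*' name pattern and the 6-hour age are assumptions based on
the directory names mentioned in this thread):

# List /dev/shm directories that look like OST/lago environments and are old.
find /dev/shm -mindepth 1 -maxdepth 1 -type d -name '*suite*' -mmin +360 \
     -exec ls -ldh {} \;
# Overall usage, to confirm whether they are what fills the tmpfs:
df -h /dev/shm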

On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David  wrote:

> Failed also here:
>
> http://jenkins.ovirt.org/job/ovirt-system-tests_master_
> check-patch-el7-x86_64/4540/
>
> The patch trigerring this affects many suites, and the job failed during
> ansible-suite-master .
>
> On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri  wrote:
>
>> Gal and Daniel are looking into it, strange its not affecting all suites.
>>
>> On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler 
>> wrote:
>>
>>> Looks like /dev/shm is run out of space.
>>>
>>> On Mon, 19 Mar 2018 13:33:28 +0200
>>> Leon Goldberg  wrote:
>>>
>>> > Hey, any updates?
>>> >
>>> > On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas 
>>> > wrote:
>>> >
>>> > > We are doing nothing special there, just executing ansible through
>>> > > their API.
>>> > >
>>> > > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
>>> > >  wrote:
>>> > >
>>> > >> It's not a space issue. Other suites ran on that slave after your
>>> > >> suite successfully.
>>> > >> I think that the problem is the setting for max semaphores, though
>>> > >> I don't know what you're doing to reach that limit.
>>> > >>
>>> > >> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
>>> > >>
>>> > >> -- Semaphore Limits 
>>> > >> max number of arrays = 128
>>> > >> max semaphores per array = 250
>>> > >> max semaphores system wide = 32000
>>> > >> max ops per semop call = 32
>>> > >> semaphore max value = 32767
>>> > >>
>>> > >>
>>> > >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas 
>>> > >> wrote:
>>> > >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suit
>>> e-master/
>>> > >>>
>>> > >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
>>> > >>>  wrote:
>>> > >>>
>>> >  Hi Edi,
>>> > 
>>> >  Are there any logs? where you're running the suite? may I have a
>>> >  link?
>>> > 
>>> >  On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas 
>>> >  wrote:
>>> > > Good morning,
>>> > >
>>> > > We are running in the OST network suite a test module with
>>> > > Ansible and it started failing during the weekend on "OSError:
>>> > > [Errno 28] No space left on device" when attempting to take a
>>> > > lock in the mutiprocessing python module.
>>> > >
>>> > > It smells like a slave resource problem, could someone help
>>> > > investigate this?
>>> > >
>>> > > Thanks,
>>> > > Edy.
>>> > >
>>> > > === FAILURES
>>> > > === __
>>> > > test_ovn_provider_create_scenario ___
>>> > >
>>> > > os_client_config = None
>>> > >
>>> > > def test_ovn_provider_create_scenario(os_client_config):
>>> > > >   _test_ovn_provider('create_scenario.yml')
>>> > >
>>> > > network-suite-master/tests/test_ovn_provider.py:68:
>>> > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> > > _ _ _ _ _ _ _ _
>>> > > network-suite-master/tests/test_ovn_provider.py:78: in
>>> > > _test_ovn_provider playbook.run()
>>> > > network-suite-master/lib/ansiblelib.py:127: in run
>>> > > self._run_playbook_executor()
>>> > > network-suite-master/lib/ansiblelib.py:138: in
>>> > > _run_playbook_executor pbex =
>>> > > PlaybookExecutor(**self._pbex_args)
>>> /usr/lib/python2.7/site-packages/ansible/executor/playbook_e
>>> xecutor.py:60:
>>> > > in __init__ self._tqm = TaskQueueManager(inventory=inventory,
>>> > > variable_manager=variable_manager, loader=loader,
>>> > > options=options,
>>> > > passwords=self.passwords) /usr/lib/python2.7/site-packag
>>> es/ansible/executor/task_queue_manager.py:104:
>>> > > in __init__ self._final_q =
>>> > > multiprocessing.Queue() /usr/lib64/python2.7/multiproc
>>> essing/__init__.py:218:
>>> > > in Queue return
>>> > > Queue(maxsize) /usr/lib64/python2.7/multiproc
>>> essing/queues.py:63:
>>> > > in __init__ self._rlock =
>>> > > Lock() 

Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-19 Thread Yedidyah Bar David
Failed also here:

http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4540/

The patch triggering this affects many suites, and the job failed during
ansible-suite-master.

On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri  wrote:

> Gal and Daniel are looking into it, strange its not affecting all suites.
>
> On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler 
> wrote:
>
>> Looks like /dev/shm is run out of space.
>>
>> On Mon, 19 Mar 2018 13:33:28 +0200
>> Leon Goldberg  wrote:
>>
>> > Hey, any updates?
>> >
>> > On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas 
>> > wrote:
>> >
>> > > We are doing nothing special there, just executing ansible through
>> > > their API.
>> > >
>> > > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
>> > >  wrote:
>> > >
>> > >> It's not a space issue. Other suites ran on that slave after your
>> > >> suite successfully.
>> > >> I think that the problem is the setting for max semaphores, though
>> > >> I don't know what you're doing to reach that limit.
>> > >>
>> > >> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
>> > >>
>> > >> -- Semaphore Limits 
>> > >> max number of arrays = 128
>> > >> max semaphores per array = 250
>> > >> max semaphores system wide = 32000
>> > >> max ops per semop call = 32
>> > >> semaphore max value = 32767
>> > >>
>> > >>
>> > >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas 
>> > >> wrote:
>> > >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suit
>> e-master/
>> > >>>
>> > >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
>> > >>>  wrote:
>> > >>>
>> >  Hi Edi,
>> > 
>> >  Are there any logs? where you're running the suite? may I have a
>> >  link?
>> > 
>> >  On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas 
>> >  wrote:
>> > > Good morning,
>> > >
>> > > We are running in the OST network suite a test module with
>> > > Ansible and it started failing during the weekend on "OSError:
>> > > [Errno 28] No space left on device" when attempting to take a
>> > > lock in the mutiprocessing python module.
>> > >
>> > > It smells like a slave resource problem, could someone help
>> > > investigate this?
>> > >
>> > > Thanks,
>> > > Edy.
>> > >
>> > > === FAILURES
>> > > === __
>> > > test_ovn_provider_create_scenario ___
>> > >
>> > > os_client_config = None
>> > >
>> > > def test_ovn_provider_create_scenario(os_client_config):
>> > > >   _test_ovn_provider('create_scenario.yml')
>> > >
>> > > network-suite-master/tests/test_ovn_provider.py:68:
>> > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>> > > _ _ _ _ _ _ _ _
>> > > network-suite-master/tests/test_ovn_provider.py:78: in
>> > > _test_ovn_provider playbook.run()
>> > > network-suite-master/lib/ansiblelib.py:127: in run
>> > > self._run_playbook_executor()
>> > > network-suite-master/lib/ansiblelib.py:138: in
>> > > _run_playbook_executor pbex =
>> > > PlaybookExecutor(**self._pbex_args)
>> /usr/lib/python2.7/site-packages/ansible/executor/playbook_
>> executor.py:60:
>> > > in __init__ self._tqm = TaskQueueManager(inventory=inventory,
>> > > variable_manager=variable_manager, loader=loader,
>> > > options=options,
>> > > passwords=self.passwords) /usr/lib/python2.7/site-packag
>> es/ansible/executor/task_queue_manager.py:104:
>> > > in __init__ self._final_q =
>> > > multiprocessing.Queue() /usr/lib64/python2.7/multiproc
>> essing/__init__.py:218:
>> > > in Queue return
>> > > Queue(maxsize) /usr/lib64/python2.7/multiprocessing/queues.py:63:
>> > > in __init__ self._rlock =
>> > > Lock() /usr/lib64/python2.7/multiprocessing/synchronize.py:147:
>> > > in __init__ SemLock.__init__(self, SEMAPHORE, 1, 1) _ _ _ _ _ _
>> > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>> > > _ _
>> > >
>> > > self = , kind = 1, value = 1, maxvalue = 1
>> > >
>> > > def __init__(self, kind, value, maxvalue):
>> > > >   sl = self._semlock = _multiprocessing.SemLock(kind,
>> > > > value, maxvalue)
>> > > E   OSError: [Errno 28] No space left on device
>> > >
>> > > /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
>> > >
>> > >
>> > 
>> > 
>> >  --
>> > 
>> >  DANIEL BELENKY
>> > 
>> >  RHV DEVOPS
>> > 
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >> --
>> > >>
>> > >> DANIEL BELENKY
>> > >>
>> > >> RHV DEVOPS
>> > >>
>> > >
>> > >
>>
>> ___
>> Infra mailing list
>> Infra@ovirt.org
>> 

Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-19 Thread Eyal Edri
Gal and Daniel are looking into it; strange it's not affecting all suites.

On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler  wrote:

> Looks like /dev/shm is run out of space.
>
> On Mon, 19 Mar 2018 13:33:28 +0200
> Leon Goldberg  wrote:
>
> > Hey, any updates?
> >
> > On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas 
> > wrote:
> >
> > > We are doing nothing special there, just executing ansible through
> > > their API.
> > >
> > > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
> > >  wrote:
> > >
> > >> It's not a space issue. Other suites ran on that slave after your
> > >> suite successfully.
> > >> I think that the problem is the setting for max semaphores, though
> > >> I don't know what you're doing to reach that limit.
> > >>
> > >> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
> > >>
> > >> -- Semaphore Limits 
> > >> max number of arrays = 128
> > >> max semaphores per array = 250
> > >> max semaphores system wide = 32000
> > >> max ops per semop call = 32
> > >> semaphore max value = 32767
> > >>
> > >>
> > >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas 
> > >> wrote:
> > >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-
> suite-master/
> > >>>
> > >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
> > >>>  wrote:
> > >>>
> >  Hi Edi,
> > 
> >  Are there any logs? where you're running the suite? may I have a
> >  link?
> > 
> >  On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas 
> >  wrote:
> > > Good morning,
> > >
> > > We are running in the OST network suite a test module with
> > > Ansible and it started failing during the weekend on "OSError:
> > > [Errno 28] No space left on device" when attempting to take a
> > > lock in the mutiprocessing python module.
> > >
> > > It smells like a slave resource problem, could someone help
> > > investigate this?
> > >
> > > Thanks,
> > > Edy.
> > >
> > > === FAILURES
> > > === __
> > > test_ovn_provider_create_scenario ___
> > >
> > > os_client_config = None
> > >
> > > def test_ovn_provider_create_scenario(os_client_config):
> > > >   _test_ovn_provider('create_scenario.yml')
> > >
> > > network-suite-master/tests/test_ovn_provider.py:68:
> > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > > _ _ _ _ _ _ _ _
> > > network-suite-master/tests/test_ovn_provider.py:78: in
> > > _test_ovn_provider playbook.run()
> > > network-suite-master/lib/ansiblelib.py:127: in run
> > > self._run_playbook_executor()
> > > network-suite-master/lib/ansiblelib.py:138: in
> > > _run_playbook_executor pbex =
> > > PlaybookExecutor(**self._pbex_args) /usr/lib/python2.7/site-
> packages/ansible/executor/playbook_executor.py:60:
> > > in __init__ self._tqm = TaskQueueManager(inventory=inventory,
> > > variable_manager=variable_manager, loader=loader,
> > > options=options,
> > > passwords=self.passwords) /usr/lib/python2.7/site-
> packages/ansible/executor/task_queue_manager.py:104:
> > > in __init__ self._final_q =
> > > multiprocessing.Queue() /usr/lib64/python2.7/
> multiprocessing/__init__.py:218:
> > > in Queue return
> > > Queue(maxsize) /usr/lib64/python2.7/multiprocessing/queues.py:63:
> > > in __init__ self._rlock =
> > > Lock() /usr/lib64/python2.7/multiprocessing/synchronize.py:147:
> > > in __init__ SemLock.__init__(self, SEMAPHORE, 1, 1) _ _ _ _ _ _
> > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > > _ _
> > >
> > > self = , kind = 1, value = 1, maxvalue = 1
> > >
> > > def __init__(self, kind, value, maxvalue):
> > > >   sl = self._semlock = _multiprocessing.SemLock(kind,
> > > > value, maxvalue)
> > > E   OSError: [Errno 28] No space left on device
> > >
> > > /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
> > >
> > >
> > 
> > 
> >  --
> > 
> >  DANIEL BELENKY
> > 
> >  RHV DEVOPS
> > 
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >>
> > >> DANIEL BELENKY
> > >>
> > >> RHV DEVOPS
> > >>
> > >
> > >
>
> ___
> Infra mailing list
> Infra@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>



-- 

Eyal edri


MANAGER

RHV DevOps

EMEA VIRTUALIZATION R


Red Hat EMEA 
 TRIED. TESTED. TRUSTED. 
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-19 Thread Dominik Holler
Looks like /dev/shm has run out of space.
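
That would explain the traceback: on Linux, Python's multiprocessing backs its
locks and queues with POSIX semaphores, which are created as small sem.* files
under /dev/shm, so a full tmpfs turns Lock()/Queue() creation into ENOSPC even
when the root filesystem has plenty of space. A quick check on the affected
slave could look like this (a sketch):

# Confirm the tmpfs is full...
df -h /dev/shm
# ...and that creating a multiprocessing lock (backed by a sem.* file in
# /dev/shm) indeed fails with OSError: [Errno 28]:
python -c 'import multiprocessing; multiprocessing.Lock(); print "lock created ok"'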

On Mon, 19 Mar 2018 13:33:28 +0200
Leon Goldberg  wrote:

> Hey, any updates?
> 
> On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas 
> wrote:
> 
> > We are doing nothing special there, just executing ansible through
> > their API.
> >
> > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
> >  wrote:
> >  
> >> It's not a space issue. Other suites ran on that slave after your
> >> suite successfully.
> >> I think that the problem is the setting for max semaphores, though
> >> I don't know what you're doing to reach that limit.
> >>
> >> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
> >>
> >> -- Semaphore Limits 
> >> max number of arrays = 128
> >> max semaphores per array = 250
> >> max semaphores system wide = 32000
> >> max ops per semop call = 32
> >> semaphore max value = 32767
> >>
> >>
> >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas 
> >> wrote: 
> >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/
> >>>
> >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
> >>>  wrote:
> >>>  
>  Hi Edi,
> 
>  Are there any logs? where you're running the suite? may I have a
>  link?
> 
>  On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas 
>  wrote: 
> > Good morning,
> >
> > We are running in the OST network suite a test module with
> > Ansible and it started failing during the weekend on "OSError:
> > [Errno 28] No space left on device" when attempting to take a
> > lock in the mutiprocessing python module.
> >
> > It smells like a slave resource problem, could someone help
> > investigate this?
> >
> > Thanks,
> > Edy.
> >
> > === FAILURES
> > === __
> > test_ovn_provider_create_scenario ___
> >
> > os_client_config = None
> >
> > def test_ovn_provider_create_scenario(os_client_config):  
> > >   _test_ovn_provider('create_scenario.yml')  
> >
> > network-suite-master/tests/test_ovn_provider.py:68:
> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > _ _ _ _ _ _ _ _
> > network-suite-master/tests/test_ovn_provider.py:78: in
> > _test_ovn_provider playbook.run()
> > network-suite-master/lib/ansiblelib.py:127: in run
> > self._run_playbook_executor()
> > network-suite-master/lib/ansiblelib.py:138: in
> > _run_playbook_executor pbex =
> > PlaybookExecutor(**self._pbex_args) 
> > /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60:
> > in __init__ self._tqm = TaskQueueManager(inventory=inventory,
> > variable_manager=variable_manager, loader=loader,
> > options=options,
> > passwords=self.passwords) 
> > /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104:
> > in __init__ self._final_q =
> > multiprocessing.Queue() 
> > /usr/lib64/python2.7/multiprocessing/__init__.py:218:
> > in Queue return
> > Queue(maxsize) /usr/lib64/python2.7/multiprocessing/queues.py:63:
> > in __init__ self._rlock =
> > Lock() /usr/lib64/python2.7/multiprocessing/synchronize.py:147:
> > in __init__ SemLock.__init__(self, SEMAPHORE, 1, 1) _ _ _ _ _ _
> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > _ _
> >
> > self = , kind = 1, value = 1, maxvalue = 1
> >
> > def __init__(self, kind, value, maxvalue):  
> > >   sl = self._semlock = _multiprocessing.SemLock(kind,
> > > value, maxvalue)  
> > E   OSError: [Errno 28] No space left on device
> >
> > /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
> >
> >  
> 
> 
>  --
> 
>  DANIEL BELENKY
> 
>  RHV DEVOPS
>   
> >>>
> >>>  
> >>
> >>
> >> --
> >>
> >> DANIEL BELENKY
> >>
> >> RHV DEVOPS
> >>  
> >
> >  

___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-19 Thread Leon Goldberg
Hey, any updates?

On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas  wrote:

> We are doing nothing special there, just executing ansible through their
> API.
>
> On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky 
> wrote:
>
>> It's not a space issue. Other suites ran on that slave after your suite
>> successfully.
>> I think that the problem is the setting for max semaphores, though I
>> don't know what you're doing to reach that limit.
>>
>> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
>>
>> -- Semaphore Limits 
>> max number of arrays = 128
>> max semaphores per array = 250
>> max semaphores system wide = 32000
>> max ops per semop call = 32
>> semaphore max value = 32767
>>
>>
>> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas  wrote:
>>
>>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/
>>>
>>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky 
>>> wrote:
>>>
 Hi Edi,

 Are there any logs? where you're running the suite? may I have a link?

 On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas  wrote:

> Good morning,
>
> We are running in the OST network suite a test module with Ansible and
> it started failing during the weekend on "OSError: [Errno 28] No space 
> left
> on device" when attempting to take a lock in the mutiprocessing python
> module.
>
> It smells like a slave resource problem, could someone help
> investigate this?
>
> Thanks,
> Edy.
>
> === FAILURES 
> ===
> __ test_ovn_provider_create_scenario 
> ___
>
> os_client_config = None
>
> def test_ovn_provider_create_scenario(os_client_config):
> >   _test_ovn_provider('create_scenario.yml')
>
> network-suite-master/tests/test_ovn_provider.py:68:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _
> network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
> playbook.run()
> network-suite-master/lib/ansiblelib.py:127: in run
> self._run_playbook_executor()
> network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
> pbex = PlaybookExecutor(**self._pbex_args)
> /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60:
>  in __init__
> self._tqm = TaskQueueManager(inventory=inventory, 
> variable_manager=variable_manager, loader=loader, options=options, 
> passwords=self.passwords)
> /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104:
>  in __init__
> self._final_q = multiprocessing.Queue()
> /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
> return Queue(maxsize)
> /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
> self._rlock = Lock()
> /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
> SemLock.__init__(self, SEMAPHORE, 1, 1)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _
>
> self = , kind = 1, value = 1, maxvalue = 1
>
> def __init__(self, kind, value, maxvalue):
> >   sl = self._semlock = _multiprocessing.SemLock(kind, value, 
> > maxvalue)
> E   OSError: [Errno 28] No space left on device
>
> /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
>
>


 --

 DANIEL BELENKY

 RHV DEVOPS

>>>
>>>
>>
>>
>> --
>>
>> DANIEL BELENKY
>>
>> RHV DEVOPS
>>
>
>
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-18 Thread Edward Haas
We are doing nothing special there, just executing Ansible through its
API.

On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky 
wrote:

> It's not a space issue. Other suites ran on that slave after your suite
> successfully.
> I think that the problem is the setting for max semaphores, though I don't
> know what you're doing to reach that limit.
>
> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
>
> -- Semaphore Limits 
> max number of arrays = 128
> max semaphores per array = 250
> max semaphores system wide = 32000
> max ops per semop call = 32
> semaphore max value = 32767
>
>
> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas  wrote:
>
>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/
>>
>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky 
>> wrote:
>>
>>> Hi Edi,
>>>
>>> Are there any logs? where you're running the suite? may I have a link?
>>>
>>> On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas  wrote:
>>>
 Good morning,

 We are running in the OST network suite a test module with Ansible and
 it started failing during the weekend on "OSError: [Errno 28] No space left
 on device" when attempting to take a lock in the mutiprocessing python
 module.

 It smells like a slave resource problem, could someone help investigate
 this?

 Thanks,
 Edy.

 === FAILURES 
 ===
 __ test_ovn_provider_create_scenario 
 ___

 os_client_config = None

 def test_ovn_provider_create_scenario(os_client_config):
 >   _test_ovn_provider('create_scenario.yml')

 network-suite-master/tests/test_ovn_provider.py:68:
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 _ _ _
 network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
 playbook.run()
 network-suite-master/lib/ansiblelib.py:127: in run
 self._run_playbook_executor()
 network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
 pbex = PlaybookExecutor(**self._pbex_args)
 /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: 
 in __init__
 self._tqm = TaskQueueManager(inventory=inventory, 
 variable_manager=variable_manager, loader=loader, options=options, 
 passwords=self.passwords)
 /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104:
  in __init__
 self._final_q = multiprocessing.Queue()
 /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
 return Queue(maxsize)
 /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
 self._rlock = Lock()
 /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
 SemLock.__init__(self, SEMAPHORE, 1, 1)
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 _ _ _

 self = , kind = 1, value = 1, maxvalue = 1

 def __init__(self, kind, value, maxvalue):
 >   sl = self._semlock = _multiprocessing.SemLock(kind, value, 
 > maxvalue)
 E   OSError: [Errno 28] No space left on device

 /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError


>>>
>>>
>>> --
>>>
>>> DANIEL BELENKY
>>>
>>> RHV DEVOPS
>>>
>>
>>
>
>
> --
>
> DANIEL BELENKY
>
> RHV DEVOPS
>
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-18 Thread Daniel Belenky
It's not a space issue. Other suites ran successfully on that slave after
your suite.
I think the problem is the max semaphores setting, though I don't
know what you're doing to reach that limit.

[dbelenky@ovirt-srv18 ~]$ ipcs -ls

-- Semaphore Limits 
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
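
If the array limit were really the bottleneck, the current count should be close
to 128; a quick way to check that on the slave is below (a sketch). Note, though,
that the error in the traceback comes from POSIX semaphores, which live as files
in /dev/shm rather than under the SysV limits shown by ipcs.

# Count the SysV semaphore arrays currently allocated (vs. the 128 limit above).
ipcs -s | grep -c '^0x'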


On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas  wrote:

> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/
>
> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky 
> wrote:
>
>> Hi Edi,
>>
>> Are there any logs? where you're running the suite? may I have a link?
>>
>> On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas  wrote:
>>
>>> Good morning,
>>>
>>> We are running in the OST network suite a test module with Ansible and
>>> it started failing during the weekend on "OSError: [Errno 28] No space left
>>> on device" when attempting to take a lock in the mutiprocessing python
>>> module.
>>>
>>> It smells like a slave resource problem, could someone help investigate
>>> this?
>>>
>>> Thanks,
>>> Edy.
>>>
>>> === FAILURES 
>>> ===
>>> __ test_ovn_provider_create_scenario 
>>> ___
>>>
>>> os_client_config = None
>>>
>>> def test_ovn_provider_create_scenario(os_client_config):
>>> >   _test_ovn_provider('create_scenario.yml')
>>>
>>> network-suite-master/tests/test_ovn_provider.py:68:
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>>> _ _
>>> network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
>>> playbook.run()
>>> network-suite-master/lib/ansiblelib.py:127: in run
>>> self._run_playbook_executor()
>>> network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
>>> pbex = PlaybookExecutor(**self._pbex_args)
>>> /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: 
>>> in __init__
>>> self._tqm = TaskQueueManager(inventory=inventory, 
>>> variable_manager=variable_manager, loader=loader, options=options, 
>>> passwords=self.passwords)
>>> /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104:
>>>  in __init__
>>> self._final_q = multiprocessing.Queue()
>>> /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
>>> return Queue(maxsize)
>>> /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
>>> self._rlock = Lock()
>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
>>> SemLock.__init__(self, SEMAPHORE, 1, 1)
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>>> _ _
>>>
>>> self = , kind = 1, value = 1, maxvalue = 1
>>>
>>> def __init__(self, kind, value, maxvalue):
>>> >   sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
>>> E   OSError: [Errno 28] No space left on device
>>>
>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
>>>
>>>
>>
>>
>> --
>>
>> DANIEL BELENKY
>>
>> RHV DEVOPS
>>
>
>


-- 

DANIEL BELENKY

RHV DEVOPS
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-18 Thread Edward Haas
http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/

On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky 
wrote:

> Hi Edi,
>
> Are there any logs? where you're running the suite? may I have a link?
>
> On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas  wrote:
>
>> Good morning,
>>
>> We are running in the OST network suite a test module with Ansible and it
>> started failing during the weekend on "OSError: [Errno 28] No space left on
>> device" when attempting to take a lock in the mutiprocessing python module.
>>
>> It smells like a slave resource problem, could someone help investigate
>> this?
>>
>> Thanks,
>> Edy.
>>
>> === FAILURES 
>> ===
>> __ test_ovn_provider_create_scenario 
>> ___
>>
>> os_client_config = None
>>
>> def test_ovn_provider_create_scenario(os_client_config):
>> >   _test_ovn_provider('create_scenario.yml')
>>
>> network-suite-master/tests/test_ovn_provider.py:68:
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>> _ _
>> network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
>> playbook.run()
>> network-suite-master/lib/ansiblelib.py:127: in run
>> self._run_playbook_executor()
>> network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
>> pbex = PlaybookExecutor(**self._pbex_args)
>> /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: 
>> in __init__
>> self._tqm = TaskQueueManager(inventory=inventory, 
>> variable_manager=variable_manager, loader=loader, options=options, 
>> passwords=self.passwords)
>> /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104: 
>> in __init__
>> self._final_q = multiprocessing.Queue()
>> /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
>> return Queue(maxsize)
>> /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
>> self._rlock = Lock()
>> /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
>> SemLock.__init__(self, SEMAPHORE, 1, 1)
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>> _ _
>>
>> self = , kind = 1, value = 1, maxvalue = 1
>>
>> def __init__(self, kind, value, maxvalue):
>> >   sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
>> E   OSError: [Errno 28] No space left on device
>>
>> /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
>>
>>
>
>
> --
>
> DANIEL BELENKY
>
> RHV DEVOPS
>
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


Re: OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-18 Thread Daniel Belenky
Hi Edi,

Are there any logs? Where are you running the suite? May I have a link?

On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas  wrote:

> Good morning,
>
> We are running in the OST network suite a test module with Ansible and it
> started failing during the weekend on "OSError: [Errno 28] No space left on
> device" when attempting to take a lock in the mutiprocessing python module.
>
> It smells like a slave resource problem, could someone help investigate
> this?
>
> Thanks,
> Edy.
>
> === FAILURES 
> ===
> __ test_ovn_provider_create_scenario 
> ___
>
> os_client_config = None
>
> def test_ovn_provider_create_scenario(os_client_config):
> >   _test_ovn_provider('create_scenario.yml')
>
> network-suite-master/tests/test_ovn_provider.py:68:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
> playbook.run()
> network-suite-master/lib/ansiblelib.py:127: in run
> self._run_playbook_executor()
> network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
> pbex = PlaybookExecutor(**self._pbex_args)
> /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: in 
> __init__
> self._tqm = TaskQueueManager(inventory=inventory, 
> variable_manager=variable_manager, loader=loader, options=options, 
> passwords=self.passwords)
> /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104: 
> in __init__
> self._final_q = multiprocessing.Queue()
> /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
> return Queue(maxsize)
> /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
> self._rlock = Lock()
> /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
> SemLock.__init__(self, SEMAPHORE, 1, 1)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
>
> self = , kind = 1, value = 1, maxvalue = 1
>
> def __init__(self, kind, value, maxvalue):
> >   sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
> E   OSError: [Errno 28] No space left on device
>
> /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
>
>


-- 

DANIEL BELENKY

RHV DEVOPS
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-18 Thread Edward Haas
Good morning,

We are running a test module with Ansible in the OST network suite, and during
the weekend it started failing with "OSError: [Errno 28] No space left on
device" when attempting to take a lock in the multiprocessing Python module.

It smells like a slave resource problem, could someone help investigate
this?

Thanks,
Edy.

=== FAILURES ===
__ test_ovn_provider_create_scenario ___

os_client_config = None

def test_ovn_provider_create_scenario(os_client_config):
>   _test_ovn_provider('create_scenario.yml')

network-suite-master/tests/test_ovn_provider.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
playbook.run()
network-suite-master/lib/ansiblelib.py:127: in run
self._run_playbook_executor()
network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
pbex = PlaybookExecutor(**self._pbex_args)
/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60:
in __init__
self._tqm = TaskQueueManager(inventory=inventory,
variable_manager=variable_manager, loader=loader, options=options,
passwords=self.passwords)
/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104:
in __init__
self._final_q = multiprocessing.Queue()
/usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
return Queue(maxsize)
/usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
self._rlock = Lock()
/usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , kind = 1, value = 1, maxvalue = 1

def __init__(self, kind, value, maxvalue):
>   sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
E   OSError: [Errno 28] No space left on device

/usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


OST Network suite is failing on "OSError: [Errno 28] No space left on device"

2018-03-18 Thread Edward Haas
Good morning,

The network suite is running Ansible in one of its tests, and since the
weekend it has been failing without a good explanation.
It raises "OSError: [Errno 28] No space left on device"
when trying to take a lock in the Python multiprocessing module.

It smells like a problem on the slave, but I am not sure.
Any ideas?

Thanks,
Edy.

=== FAILURES ===
__ test_ovn_provider_create_scenario ___

os_client_config = None

def test_ovn_provider_create_scenario(os_client_config):
>   _test_ovn_provider('create_scenario.yml')

network-suite-master/tests/test_ovn_provider.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
playbook.run()
network-suite-master/lib/ansiblelib.py:127: in run
self._run_playbook_executor()
network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
pbex = PlaybookExecutor(**self._pbex_args)
/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60:
in __init__
self._tqm = TaskQueueManager(inventory=inventory,
variable_manager=variable_manager, loader=loader, options=options,
passwords=self.passwords)
/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104:
in __init__
self._final_q = multiprocessing.Queue()
/usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
return Queue(maxsize)
/usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
self._rlock = Lock()
/usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , kind = 1, value = 1, maxvalue = 1

def __init__(self, kind, value, maxvalue):
>   sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
E   OSError: [Errno 28] No space left on device

/usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra