Re: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate

2017-01-05 Thread Bernard Cafarelli
After some research, I found that this review fixes the tempest failures:
https://review.openstack.org/#/c/416503/1 (the newer patchset adds an
unrelated fix for the functional tests gate)

Multiple local tempest runs and gate rechecks all turned green with
this fix. That is the good news.

The bad news is that I am still not sure of the root cause. The code
that triggers the problems is:
https://github.com/openstack/networking-sfc/blob/f5b52d5304796e44431b3874117aa0be91ed13d8/networking_sfc/services/sfc/drivers/ovs/db.py#L292
_get_port_detail() is just a wrapper around CommonDbMixin._get_by_id()
from neutron, so could the failures be triggered by two _model_query()
calls in a row?

Hoping someone can shed some light here; next time it may not be as
easy a fix as removing an unused line.


On 22 December 2016 at 20:48, Mike Bayer <mba...@redhat.com> wrote:
>
> On 12/20/2016 06:50 PM, Cathy Zhang wrote:
>>
>> Hi Bernard,
>>
>> Thanks for the email. I will take a look at this. Xiaodong has been
>> working on tempest test scripts.
>> I will work with Xiaodong on this issue.
>
>
> I've added a comment to the issue which refers to upstream SQLAlchemy issue
> https://bitbucket.org/zzzeek/sqlalchemy/issues/3803 as a potential
> contributor, though looking at the logs linked from the issue it appears
> that database deadlocks are also occurring, which may be a precursor
> here. There are many improvements in SQLAlchemy 1.1 such that the
> "rollback()" state should not be as susceptible to a corrupted database
> connection as seems to be the case here.
>
>>
>> Cathy
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> __

Re: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate

2016-12-22 Thread Mike Bayer


On 12/20/2016 06:50 PM, Cathy Zhang wrote:

> Hi Bernard,
>
> Thanks for the email. I will take a look at this. Xiaodong has been working on
> tempest test scripts.
> I will work with Xiaodong on this issue.


I've added a comment to the issue which refers to upstream SQLAlchemy
issue https://bitbucket.org/zzzeek/sqlalchemy/issues/3803 as a potential
contributor, though looking at the logs linked from the issue it appears
that database deadlocks are also occurring, which may be a precursor
here. There are many improvements in SQLAlchemy 1.1 such that the
"rollback()" state should not be as susceptible to a corrupted database
connection as seems to be the case here.
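For the deadlocks mentioned above, a common approach in OpenStack code is to roll back and re-run the whole transaction instead of continuing with a failed one; oslo.db offers this as `oslo_db.api.wrap_db_retry(retry_on_deadlock=True)`. The following is only a minimal, self-contained sketch of that idea, not the oslo.db or networking-sfc code: `DeadlockError`, `retry_on_deadlock` and `update_chain` are hypothetical names, and the wrapped function is assumed to open and close (commit or roll back) its own transaction on every attempt.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver deadlock error (oslo.db uses DBDeadlock)."""


def retry_on_deadlock(max_retries=3, interval=0.0):
    """Re-run the decorated function when it raises DeadlockError.

    Assumption: the decorated function owns its transaction (begin, then
    commit or roll back), so each attempt starts from a clean session
    instead of reusing one left in a failed state.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_retries:
                        raise  # out of retries: surface the original error
                    time.sleep(interval)
        return wrapper
    return decorator


calls = []


@retry_on_deadlock(max_retries=3)
def update_chain():
    # Hypothetical operation: deadlocks on the first two attempts, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "ok"


print(update_chain(), len(calls))
```

Here the simulated `update_chain()` succeeds on its third attempt and prints `ok 3`; if every attempt deadlocks, the last `DeadlockError` is re-raised to the caller rather than swallowed.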









Re: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate

2016-12-20 Thread Cathy Zhang
Hi Bernard,

Thanks for the email. I will take a look at this. Xiaodong has been working on 
tempest test scripts. 
I will work with Xiaodong on this issue. 

Cathy




[openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate

2016-12-20 Thread Bernard Cafarelli
Hi everyone,

We have an open bug (thanks Igor for the report) on DB transaction issues:
https://bugs.launchpad.net/networking-sfc/+bug/1630503

The thing is, I am seeing quite a few tempest gate failures that
follow the same pattern: at some point in the test suite, the service
gets warnings/errors from the DB layer (reentrant call, closed
transaction, nested rollback, …), and all following tests fail.

This affects both the master and stable/newton branches (for now there
are not many changes in the DB parts between these branches).

Some examples:
* https://review.openstack.org/#/c/400396/ failed with console log
  http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/console.html#_2016-12-16_12_44_47_564544
  and service log
  http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_12_44_32_301
* https://review.openstack.org/#/c/405391/ failed with console log
  http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/console.html.gz#_2016-12-16_13_05_17_384323
  and service log
  http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_13_04_11_840
* another on the master branch: https://review.openstack.org/#/c/411194/
  failed with console log
  http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/console.html.gz#_2016-12-15_22_36_15_216260
  and service log
  http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-15_22_35_53_310

I took a look at the errors, but only found old, apparently fixed
pymysql bugs, and suggestions like:
* http://docs.sqlalchemy.org/en/latest/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar
* https://review.openstack.org/#/c/230481/

This is not really my forte, so it would be great if someone could take
a look at these logs and fix the problem, especially with the upcoming
multinode tempest gate.
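The SQLAlchemy FAQ entry linked above comes down to one rule: when something fails inside a transaction, the code that opened the transaction should roll it back before the connection or session is used again, otherwise later callers inherit the broken state, which would be consistent with the "all following tests fail" pattern. Below is a minimal sketch of that rule, using Python's stdlib sqlite3 purely as a stand-in for the real MySQL/SQLAlchemy stack; the `port_pair` table and the `create_pairs()` helper are hypothetical and are not the networking-sfc code.

```python
import sqlite3

# In-memory database standing in for the real one; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE port_pair (id TEXT PRIMARY KEY, name TEXT)")


def create_pairs(conn, rows):
    # Used as a context manager, a sqlite3 connection commits the transaction
    # if the block succeeds and rolls it back if the block raises.
    with conn:
        for row in rows:
            conn.execute("INSERT INTO port_pair (id, name) VALUES (?, ?)", row)


create_pairs(conn, [("pp1", "first")])

try:
    # The second row reuses the primary key "pp2", so the batch fails...
    create_pairs(conn, [("pp2", "second"), ("pp2", "duplicate")])
except sqlite3.IntegrityError as exc:
    print("batch failed:", exc)

# ...and is rolled back as a whole: no transaction is left open.
print(conn.in_transaction)
print(conn.execute("SELECT id FROM port_pair ORDER BY id").fetchall())
```

Only `pp1` survives: the first `pp2` row was inserted and then undone along with the rest of the failed batch. Without the rollback, that half-finished row would stay pending in an open transaction and be committed by whichever caller commits next; with SQLAlchemy, the corresponding situation is the "transaction has been rolled back due to a previous exception during flush" error from the FAQ, which is cleared by calling `Session.rollback()`.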

Thanks,
-- 
Bernard Cafarelli
