Hi Vish,

I probably wasn't careful enough with my wording - the API server may not be 
threaded as such, but the use of eventlets gives effectively the same 
concurrency issues that point towards needing to use scoped sessions.

Our basis for concluding that this is some form of concurrency issue is that we 
can easily reproduce the issue by running concurrent requests into an API 
server, and we have seen the problem disappear if we reduce the eventlet pool 
to 1 or change to scoped sessions.   Whilst the symptom is that the session has 
terminated by the time the lazy load is requested, as far as we can see the 
eventlet handling the query hasn't itself terminated the session - although it 
does seem likely that another eventlet using the same shared session could 
have. This seems to be specifically the type of issue that scoped sessions are 
intended to address.

http://www.sqlalchemy.org/docs/orm/session.html#contextual-thread-local-sessions
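
The suspected failure mode can be sketched with a toy model. This is only an 
illustration under stated assumptions: it uses plain Python threads rather than 
eventlet, and ToySession, ScopedRegistry, DetachedError and run_two_requests 
are made-up names, not SQLAlchemy or Nova code. It shows one request closing a 
shared session before another request's lazy load, and a per-thread registry 
(the idea behind scoped sessions) avoiding that.

```python
import threading


class DetachedError(Exception):
    """Stand-in for SQLAlchemy's DetachedInstanceError (toy model only)."""


class ToySession(object):
    """Hypothetical session: lazy loads fail once it has been closed."""

    def __init__(self):
        self.closed = False

    def lazy_load(self, attribute):
        if self.closed:
            raise DetachedError("lazy load of %r cannot proceed" % attribute)
        return "<%s loaded>" % attribute

    def close(self):
        self.closed = True


class ScopedRegistry(object):
    """Hands out one session per thread, using thread-local storage."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        if not hasattr(self._local, "session"):
            self._local.session = self._factory()
        return self._local.session


def run_two_requests(get_session):
    """Request A closes its session before request B attempts a lazy load."""
    results = {}
    a_done = threading.Event()

    def request_a():
        get_session().close()
        a_done.set()

    def request_b():
        session = get_session()
        a_done.wait()  # force the ordering that triggers the problem
        try:
            results["b"] = session.lazy_load("network")
        except DetachedError as exc:
            results["b"] = "error: %s" % exc

    threads = [threading.Thread(target=request_b),
               threading.Thread(target=request_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["b"]


shared = ToySession()
shared_result = run_two_requests(lambda: shared)           # one shared session
scoped_result = run_two_requests(ScopedRegistry(ToySession))  # one per thread
print(shared_result)
print(scoped_result)
```

With the single shared session, request B's lazy load fails because request A 
has already closed it; with the registry, each request gets its own session and 
the load succeeds. Whether this is what actually happens inside the API server 
is still the open question.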

All of this is based on a limited understanding of how sqlalchemy is used in 
Nova - I'd be more than happy to be corrected by others with more experience, 
hence the question to the mailing list.

I fully understand the drive to clean up the database layer, and I'm not 
knocking the fix to 855660 - it's clearly a good template for the way the DB 
needs to go in Essex.   My concern is that as shown by 855660 these changes 
have a pretty wide scope, and by the time that's been expanded to all of the 
current joinedloads it feels like it would be such a large set of changes that 
I'd be concerned about them coming back into Diablo.Stable.

Hence instead we were looking for a much smaller change that can address the 
whole class of problem of joinedloads in Diablo for now ahead of the DB 
refactoring in Essex - and from our testing scoped sessions seem to address 
that.  However, changing to scoped sessions breaks the migrate code in the unit 
tests, and as we don't really understand why that is, or the intricacies of the 
DB unit tests, I wanted to see if we were heading down a path that had already 
been examined and discarded before we spend too much time on it.

I'd be really interested in hearing from anyone with experience of 
scoped_sessions, and/or willing to help us understand the issues we're seeing 
in the Unit Tests.

And of course I'd like to know what the community's feeling is towards a 
simpler approach to fixing the issue in Diablo.Final vs the backport of DB 
simplification changes from Essex - which I'm assuming will take some time yet 
to work through all of the joinedloads.

Phil

From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 31 October 2011 19:50
To: Day, Phil
Cc: openstack@lists.launchpad.net (openstack@lists.launchpad.net); Johnson, 
Andrew Gordon (HP Cloud Services); Hassan, Ahmad; Haynes, David; 
nova-datab...@lists.launchpad.net
Subject: Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in 
sqlalchemy ?

All of the workers are single-threaded, so I'm not sure that scoped sessions 
are really necessary.

We did however decide that objects from the db layer are supposed to be simple 
dictionaries.  We currently allow nested dictionaries to optimize joined 
objects. Unfortunately we never switched to sanitizing data from sqlalchemy, 
and instead we make the sqlalchemy objects provide a dictionary-like interface 
and pass the object itself.

The issue that you're seeing is because network wasn't properly 'joinedload'ed 
in the initial query, and because the data is not sanitized, sqlalchemy tries 
to lazy load it, but the session has been terminated.  If we had sanitized data, 
we would get a more useful error like a key error when network is accessed. The 
current solution is to add the proper joinedload.
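
A rough sketch of the difference sanitizing would make, assuming a toy model 
class rather than the real Nova models: ToyFixedIp, sanitize, describe and 
DetachedError are hypothetical names used only for illustration. The 
dictionary-like object fails with a session error when an unloaded relation is 
touched, while the sanitized plain dictionary fails with an ordinary KeyError.

```python
_UNLOADED = object()  # marker for a relation that was never joined in


class DetachedError(Exception):
    """Stand-in for SQLAlchemy's DetachedInstanceError (toy model only)."""


class ToyFixedIp(object):
    """Hypothetical model exposing a dictionary-like interface."""

    def __init__(self, address, network=_UNLOADED):
        self.address = address
        self._network = network

    @property
    def network(self):
        # Simulates a lazy load attempted after the session has gone away.
        if self._network is _UNLOADED:
            raise DetachedError(
                "lazy load operation of attribute 'network' cannot proceed")
        return self._network

    def __getitem__(self, key):
        return getattr(self, key)

    def loaded_items(self):
        """Yield only the values that were actually loaded by the query."""
        yield "address", self.address
        if self._network is not _UNLOADED:
            yield "network", self._network


def sanitize(model):
    """Copy the loaded values into a plain dict, detached from any session."""
    return dict(model.loaded_items())


def describe(fixed):
    """Access 'network' and report which kind of failure, if any, occurs."""
    try:
        return "network=%s" % (fixed["network"],)
    except KeyError as exc:
        return "KeyError: %s" % exc
    except DetachedError as exc:
        return "DetachedError: %s" % exc


raw = ToyFixedIp("10.0.0.2")
print(describe(raw))            # session-style error from the model object
print(describe(sanitize(raw)))  # plain KeyError from the sanitized dict
```

Either way the underlying fix is the same (load the relation in the original 
query); the sanitized form just makes the missing data obvious at the point of 
access.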

One of the goals of the nova-database team is to do the necessary data 
sanitization and to remove as many of the joinedloads as possible (hopefully 
all of them).

Vish

On Oct 31, 2011, at 12:25 PM, Day, Phil wrote:


Hi Folks,

We've been looking into a problem which looks a lot like:

https://bugs.launchpad.net/nova/+bug/855660



2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7 smokeuser smokeproject] Unexpected error raised: Parent instance <FixedIp at 0x4e74490> is not bound to a Session; lazy load operation of attribute 'network' cannot proceed
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py", line 363, in __call__
(nova.api): TRACE: result = api_request.invoke(context)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py", line 90, in invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py", line 1195, in describe_instances
(nova.api): TRACE: instance_id=instance_id)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py", line 1204, in _format_describe_instances
(nova.api): TRACE: return {'reservationSet': self._format_instances(context, **kwargs)}
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py", line 1309, in _format_instances
(nova.api): TRACE: if fixed['network'] and use_v6:
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/models.py", line 76, in __getitem__
(nova.api): TRACE: return getattr(self, key)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 163, in __get__
(nova.api): TRACE: instance_dict(instance))
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 383, in get
(nova.api): TRACE: value = callable_(passive=passive)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/strategies.py", line 595, in __call__
(nova.api): TRACE: (mapperutil.state_str(state), self.key)
(nova.api): TRACE: DetachedInstanceError: Parent instance <FixedIp at 0x4e74490> is not bound to a Session; lazy load operation of attribute 'network' cannot proceed
(nova.api): TRACE:


As far as we can see the problem seems to be related to some conflict between 
multiple threads in the same API server instance and lazy loading of some part 
of the object.

Looking at the sqlalchemy documentation, it seems to strongly suggest that 
scoped_sessions should be used in multi-threaded WSGI applications (I'm not 
clear on the details, but it seems that this effectively makes lazy load 
operations thread safe).  However, whilst this fixes the problem, it has a bad 
effect on the unit tests - in particular it seems to upset all of the DB 
migration code used in the unit tests.

So does anyone know if there was an explicit decision / reason not to use 
scoped_sessions in Nova ?

Thanks,
Phil

PS:  The other possible fix we've found is to change sqlalchemy/models.py so 
that the associations are explicitly set to use eager load - which also seems 
to fix the problem but feels like a more clumsy way to go about it.  Any 
thoughts on that would also be appreciated.



_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
