subject:"\[sqlalchemy\] Re\: \(Hopefully\) simple problem with backrefs not being loaded when eagerloading."

Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-14 Thread Jon Siddle


 Thank you for such a full elaboration. I still think the end result is
something a little unintuitive (albeit only for those using detached
objects, who will come across it); but I can't argue against your
decision here. Based on the information you've given, keeping
things the way they are is undoubtedly the decision that will benefit
most users. I've worked with ORM internals before, and I know only
too well that architectural decisions made for sound reasons can
nevertheless make some things difficult, which feel like they should
be simple. As always, the best you can do is maximise the value to
the most people.

Perhaps just a note in the eagerload docs to say ...will not load
backreferences eagerly. If you are using detached objects, try the
join...contains_eager pattern. or words to that effect?

Thanks

Jon



On 14/09/10 16:27, Michael Bayer wrote:


On Sep 14, 6:03 am, Jon Siddlej...@corefiling.co.uk  wrote:

I would agree with all of this if I understood why a) it takes an
appreciable number of function calls or b) automatic decisionmaking is
necessary. I don't think there's any ambiguity here, but again; perhaps
I'm missing something fundamental.

Theres a backref with an attribute handler that receives append
events in userland, and emits a corresponding set event on the other
side. The Collection attribute implementation could receive extra
code that would cause it to do something similar to the backref
handler when the collection is populated.This is what you are
looking for here.

The append of the collection ultimately occurs in mapper.py line 2298
in the current tip.   The function calling it, a closure called
_instance() that is generated inside of _instance_process(), is always
the top hit on any profiling run against a load of objects.
Removing just three function calls from this method is a win.   Months
of work have gone into the workings of this method so that our load
speeds remain competitive with, and sometimes even faster than, other
ORMs that do not use identity maps or units of work, ORMs that are now
written almost entirely in C (I leave the identification of these
products as an exercise for the reader).

So the implementation here would probably replace the list being
appended onto with one that is additionally instrumented to catch
these events and populate in the other direction.  We already have the
GenericBackrefExtension which does this.  GBE is designed to work in
userland, not during a load but when you populate things manually.
GBE is handy, but is a primary place for bugs to occur, as it is an
event handler that issues more events.  I just fixed one the other day
involving some rare endless loop where the events could not decide
when to stop back-populating in each direction.   I was pretty amazed
that there were still some of those cases left after all these years.

So GBE as it is wouldn't work here, we'd need to write something just
like GBE but tailored towards loads, when there are no events firing
off.  It would be simpler than GBE, maybe ten lines total all
inlined.  But it would be a little bit redundant (now we have two back
populators ! one for loads, one for userland events).If we OTOH
tried to make this new reverse-populator and GBE somehow work on the
same system, that's probably more complicated.   At the very least it
would add method overhead as GBE would now be delegating down a deeper
chain.   So this is already complexity that really would have to be
worth it in order to dig into.

We don't just now have two ways of backref population, we also have
two different ways that Child.parent may be populated.  It might be
populated by the LoaderStrategy associated with Child.parent, or it
might be populated by the one associated with Parent.children.   If
the query spans across both relationships, now both LoaderStrategy
objects are there, and unless we do something about it, *both* will
populate Child.parent - totally redundant effort, sloppy.   Do we want
to prevent that ?  Sure, now we need more messages and flags running
around trying to prevent that case - the internals become that more
intricate.   Why are they loading this attribute in two different ways
when the first way is totally fine ?   Who wrote this crap ?

I know you don't believe me but I've been writing this thing for five
years - everything gets much more hairy than you'd like.  Even the
immediateloader patch I made you the other day, it immediately failed
when i started testing it further.  I had to hack into that same
_instance_process() method I told you about and modify it so that it
could run certain attribute loaders after all the others.   I cannot
recall ever adding any feature, however small and innocuous, that did
not have some unforeseen side effects that required further testing,
further checks and decisions that we didn't realize would be
needed.

So with the effort of adding new code and possibly refactoring how
backref events work, all the new tests needed

Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-14 Thread Michael Bayer


On Sep 14, 2010, at 12:22 PM, Jon Siddle wrote:

 Thank you for such a full elaboration. I still think the end result is
 something a little unintuitive (albeit only for those using detached
 objects, who will come across it); but I can't argue against your
 decision here. Based on the information you've given, keeping
 things the way they are is undoubtedly the decision that will benefit
 most users. I've worked with ORM internals before, and I know only
 too well that architectural decisions made for sound reasons can
 nevertheless make some things difficult, which feel like they should
 be simple. As always, the best you can do is maximise the value to
 the most people.
 
 Perhaps just a note in the eagerload docs to say ...will not load
 backreferences eagerly. If you are using detached objects, try the
 join...contains_eager pattern. or words to that effect?

yeah definitely.   I am psyched for the immediateload option, I think this 
option was not so easy to implement originally but the current architecture 
allows it to go in nicely with very little effort.



 
 Thanks
 
 Jon
 
 
 
 On 14/09/10 16:27, Michael Bayer wrote:
 
 On Sep 14, 6:03 am, Jon Siddlej...@corefiling.co.uk  wrote:
 I would agree with all of this if I understood why a) it takes an
 appreciable number of function calls or b) automatic decisionmaking is
 necessary. I don't think there's any ambiguity here, but again; perhaps
 I'm missing something fundamental.
 Theres a backref with an attribute handler that receives append
 events in userland, and emits a corresponding set event on the other
 side. The Collection attribute implementation could receive extra
 code that would cause it to do something similar to the backref
 handler when the collection is populated.This is what you are
 looking for here.
 
 The append of the collection ultimately occurs in mapper.py line 2298
 in the current tip.   The function calling it, a closure called
 _instance() that is generated inside of _instance_process(), is always
 the top hit on any profiling run against a load of objects.
 Removing just three function calls from this method is a win.   Months
 of work have gone into the workings of this method so that our load
 speeds remain competitive with, and sometimes even faster than, other
 ORMs that do not use identity maps or units of work, ORMs that are now
 written almost entirely in C (I leave the identification of these
 products as an exercise for the reader).
 
 So the implementation here would probably replace the list being
 appended onto with one that is additionally instrumented to catch
 these events and populate in the other direction.  We already have the
 GenericBackrefExtension which does this.  GBE is designed to work in
 userland, not during a load but when you populate things manually.
 GBE is handy, but is a primary place for bugs to occur, as it is an
 event handler that issues more events.  I just fixed one the other day
 involving some rare endless loop where the events could not decide
 when to stop back-populating in each direction.   I was pretty amazed
 that there were still some of those cases left after all these years.
 
 So GBE as it is wouldn't work here, we'd need to write something just
 like GBE but tailored towards loads, when there are no events firing
 off.  It would be simpler than GBE, maybe ten lines total all
 inlined.  But it would be a little bit redundant (now we have two back
 populators ! one for loads, one for userland events).If we OTOH
 tried to make this new reverse-populator and GBE somehow work on the
 same system, that's probably more complicated.   At the very least it
 would add method overhead as GBE would now be delegating down a deeper
 chain.   So this is already complexity that really would have to be
 worth it in order to dig into.
 
 We don't just now have two ways of backref population, we also have
 two different ways that Child.parent may be populated.  It might be
 populated by the LoaderStrategy associated with Child.parent, or it
 might be populated by the one associated with Parent.children.   If
 the query spans across both relationships, now both LoaderStrategy
 objects are there, and unless we do something about it, *both* will
 populate Child.parent - totally redundant effort, sloppy.   Do we want
 to prevent that ?  Sure, now we need more messages and flags running
 around trying to prevent that case - the internals become that more
 intricate.   Why are they loading this attribute in two different ways
 when the first way is totally fine ?   Who wrote this crap ?
 
 I know you don't believe me but I've been writing this thing for five
 years - everything gets much more hairy than you'd like.  Even the
 immediateloader patch I made you the other day, it immediately failed
 when i started testing it further.  I had to hack into that same
 _instance_process() method I told you about and modify it so that it
 could run certain attribute loaders after all the

Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.

Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2 matches

Site Navigation

Mail list logo

Footer information