Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.
Thank you for such a full elaboration. I still think the end result is something a little unintuitive (albeit only for those using detached objects, who will come across it); but I can't argue against your decision here. Based on the information you've given, keeping things the way they are is undoubtedly the decision that will benefit most users. I've worked with ORM internals before, and I know only too well that architectural decisions made for sound reasons can nevertheless make some things difficult, which feel like they should be simple. As always, the best you can do is maximise the value to the most people. Perhaps just a note in the eagerload docs to say ...will not load backreferences eagerly. If you are using detached objects, try the join...contains_eager pattern. or words to that effect? Thanks Jon On 14/09/10 16:27, Michael Bayer wrote: On Sep 14, 6:03 am, Jon Siddlej...@corefiling.co.uk wrote: I would agree with all of this if I understood why a) it takes an appreciable number of function calls or b) automatic decisionmaking is necessary. I don't think there's any ambiguity here, but again; perhaps I'm missing something fundamental. Theres a backref with an attribute handler that receives append events in userland, and emits a corresponding set event on the other side. The Collection attribute implementation could receive extra code that would cause it to do something similar to the backref handler when the collection is populated.This is what you are looking for here. The append of the collection ultimately occurs in mapper.py line 2298 in the current tip. The function calling it, a closure called _instance() that is generated inside of _instance_process(), is always the top hit on any profiling run against a load of objects. Removing just three function calls from this method is a win. Months of work have gone into the workings of this method so that our load speeds remain competitive with, and sometimes even faster than, other ORMs that do not use identity maps or units of work, ORMs that are now written almost entirely in C (I leave the identification of these products as an exercise for the reader). So the implementation here would probably replace the list being appended onto with one that is additionally instrumented to catch these events and populate in the other direction. We already have the GenericBackrefExtension which does this. GBE is designed to work in userland, not during a load but when you populate things manually. GBE is handy, but is a primary place for bugs to occur, as it is an event handler that issues more events. I just fixed one the other day involving some rare endless loop where the events could not decide when to stop back-populating in each direction. I was pretty amazed that there were still some of those cases left after all these years. So GBE as it is wouldn't work here, we'd need to write something just like GBE but tailored towards loads, when there are no events firing off. It would be simpler than GBE, maybe ten lines total all inlined. But it would be a little bit redundant (now we have two back populators ! one for loads, one for userland events).If we OTOH tried to make this new reverse-populator and GBE somehow work on the same system, that's probably more complicated. At the very least it would add method overhead as GBE would now be delegating down a deeper chain. So this is already complexity that really would have to be worth it in order to dig into. We don't just now have two ways of backref population, we also have two different ways that Child.parent may be populated. It might be populated by the LoaderStrategy associated with Child.parent, or it might be populated by the one associated with Parent.children. If the query spans across both relationships, now both LoaderStrategy objects are there, and unless we do something about it, *both* will populate Child.parent - totally redundant effort, sloppy. Do we want to prevent that ? Sure, now we need more messages and flags running around trying to prevent that case - the internals become that more intricate. Why are they loading this attribute in two different ways when the first way is totally fine ? Who wrote this crap ? I know you don't believe me but I've been writing this thing for five years - everything gets much more hairy than you'd like. Even the immediateloader patch I made you the other day, it immediately failed when i started testing it further. I had to hack into that same _instance_process() method I told you about and modify it so that it could run certain attribute loaders after all the others. I cannot recall ever adding any feature, however small and innocuous, that did not have some unforeseen side effects that required further testing, further checks and decisions that we didn't realize would be needed. So with the effort of adding new code and possibly refactoring how backref events work, all the new tests needed
Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.
On Sep 14, 2010, at 12:22 PM, Jon Siddle wrote: Thank you for such a full elaboration. I still think the end result is something a little unintuitive (albeit only for those using detached objects, who will come across it); but I can't argue against your decision here. Based on the information you've given, keeping things the way they are is undoubtedly the decision that will benefit most users. I've worked with ORM internals before, and I know only too well that architectural decisions made for sound reasons can nevertheless make some things difficult, which feel like they should be simple. As always, the best you can do is maximise the value to the most people. Perhaps just a note in the eagerload docs to say ...will not load backreferences eagerly. If you are using detached objects, try the join...contains_eager pattern. or words to that effect? yeah definitely. I am psyched for the immediateload option, I think this option was not so easy to implement originally but the current architecture allows it to go in nicely with very little effort. Thanks Jon On 14/09/10 16:27, Michael Bayer wrote: On Sep 14, 6:03 am, Jon Siddlej...@corefiling.co.uk wrote: I would agree with all of this if I understood why a) it takes an appreciable number of function calls or b) automatic decisionmaking is necessary. I don't think there's any ambiguity here, but again; perhaps I'm missing something fundamental. Theres a backref with an attribute handler that receives append events in userland, and emits a corresponding set event on the other side. The Collection attribute implementation could receive extra code that would cause it to do something similar to the backref handler when the collection is populated.This is what you are looking for here. The append of the collection ultimately occurs in mapper.py line 2298 in the current tip. The function calling it, a closure called _instance() that is generated inside of _instance_process(), is always the top hit on any profiling run against a load of objects. Removing just three function calls from this method is a win. Months of work have gone into the workings of this method so that our load speeds remain competitive with, and sometimes even faster than, other ORMs that do not use identity maps or units of work, ORMs that are now written almost entirely in C (I leave the identification of these products as an exercise for the reader). So the implementation here would probably replace the list being appended onto with one that is additionally instrumented to catch these events and populate in the other direction. We already have the GenericBackrefExtension which does this. GBE is designed to work in userland, not during a load but when you populate things manually. GBE is handy, but is a primary place for bugs to occur, as it is an event handler that issues more events. I just fixed one the other day involving some rare endless loop where the events could not decide when to stop back-populating in each direction. I was pretty amazed that there were still some of those cases left after all these years. So GBE as it is wouldn't work here, we'd need to write something just like GBE but tailored towards loads, when there are no events firing off. It would be simpler than GBE, maybe ten lines total all inlined. But it would be a little bit redundant (now we have two back populators ! one for loads, one for userland events).If we OTOH tried to make this new reverse-populator and GBE somehow work on the same system, that's probably more complicated. At the very least it would add method overhead as GBE would now be delegating down a deeper chain. So this is already complexity that really would have to be worth it in order to dig into. We don't just now have two ways of backref population, we also have two different ways that Child.parent may be populated. It might be populated by the LoaderStrategy associated with Child.parent, or it might be populated by the one associated with Parent.children. If the query spans across both relationships, now both LoaderStrategy objects are there, and unless we do something about it, *both* will populate Child.parent - totally redundant effort, sloppy. Do we want to prevent that ? Sure, now we need more messages and flags running around trying to prevent that case - the internals become that more intricate. Why are they loading this attribute in two different ways when the first way is totally fine ? Who wrote this crap ? I know you don't believe me but I've been writing this thing for five years - everything gets much more hairy than you'd like. Even the immediateloader patch I made you the other day, it immediately failed when i started testing it further. I had to hack into that same _instance_process() method I told you about and modify it so that it could run certain attribute loaders after all the