Re: [sqlalchemy] (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-14 Thread Jon Siddle

 On 13/09/10 18:21, Michael Bayer wrote:

On Sep 13, 2010, at 12:26 PM, Jon Siddle wrote:


This relationship is satisfied as you request it, and it works by looking in the current 
Session's identity map for the primary key stored by the many-to-one.   The operation 
falls under the realm of lazyloading even though no SQL is emitted.   If you 
consider that Child may have many related many-to-ones, all of which may already be in 
the Session, it would be quite wasteful for the ORM to assume that you're going to be 
working with the object in a detached state and that you need all of them.

I'm not sure I see what you're saying here. I've explicitly asked for all children 
relating to parent and these are correctly queried and loaded. While they are being 
added to the parent.children list, why not also set each child.parent since this is known?

because you didn't specify it, and it takes a palpable amount of additional 
overhead to do so
I don't see why it's more overhead than an assignment child.parent = ... 
at the same time as the list append parent.children.append(...). There's 
obviously something more complex going on behind the scenes.

  as well as a palpable amount of complexity to decide if it should do so based 
on the logic you'd apply here, when in 99% of the cases it is not needed.
I just don't see the complexity of the logic here. I've specified I want 
to join parent to each child, and it's already doing so in one 
direction. I realise this is only a problem for detached objects, but it 
leads to quite confusing behaviour, I think.



I don't see how this is wasteful, but I may be missing something.

Child may have parent, foo, bar, bat attached to it, all many-to-ones.   Which 
ones should it assume the user wants to load ?
parent. Because I have explicitly asked it to using joinedload or 
eagerload.

   If you are loading 10000 rows, and each Child object has three many-to-ones 
on it, and suppose it takes 120 function calls to look at a relationship, 
determine the values to send to query._get(), look in the identity map, etc., 
that is 3 x 10000 x 120 = 3.6 million function calls
But you don't have to look in the identity map at all, since you've just 
set the parent-child association in the other direction and thus have 
both entities to hand, right?

, by default, almost never needed since normally they are all just there in the 
current session, without the user asking to do so. There is nothing special 
about Child.parent just because Parent.children is present up the chain.
While Hibernate may have decided that the complexity and overhead of adding 
this decisionmaking was worth it, they have many millions more function calls 
to burn with the Java VM in any case than we do in Python, and they also have a 
much higher bar to implement lazyloading since their native class 
instrumentation is itself a huge beast.   In our case it is simply unnecessary. 
 Any such automatic decisionmaking you can be sure quickly leads to many 
uncomfortable edge cases and thorny situations, causing needless surprise 
issues for users who don't really need such a behavior in the first place.
I would agree with all of this if I understood why a) it takes an 
appreciable number of function calls or b) automatic decisionmaking is 
necessary. I don't think there's any ambiguity here, but again; perhaps 
I'm missing something fundamental.

As I've mentioned, you will have an option to tell it which many-to-ones you'd like it to 
spend time pre-populating using the upcoming immediateload option.
I still think this can be done with negligible overhead if it's done at 
the same time as the other side of the relation (parent-child). Perhaps 
I'll have to dig around in the code to see why this is such a problem.

The Session's default assumption is that you're going to be leaving it around while you 
work with the objects contained, and in that way you interact with the database for as 
long as you deal with its objects, which represent proxies to the underlying 
transaction.   When objects are detached, for reasons like caching and serialization to 
other places, normally you'd merge() them back when you want to use them again.   So if 
it were me I'd normally be looking to not be closing the session.

I'm closing the session before I forward the objects to the view template in a 
web application. The template has no business doing database operations,

I disagree with this interpretation of abstraction.   That's like saying that 
pushing the button in an elevator means you're now in charge of starting up the elevator 
motors and instructing them how many feet to travel.

Huh? I didn't use the word abstraction.

The template is not doing database operations, it is working with high level 
objects that you've sent it, and knows nothing of a database.   That these objects may be 
doing database calls behind the scenes to lazily fetch additional data is known as the 
proxy pattern.  It is one of the most 

[sqlalchemy] (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-13 Thread Jon Siddle
 I'm sure I'm missing something simple here, and any pointers in the 
right direction would be greatly appreciated.


Take for instance the following code:

session = Session()
parents = session.query(Parent).options(joinedload(Parent.children)).all()
session.close()

print parents[0].children  # This works
print parents[0].children[0].parent  # This gives a lazy loading error

Adding the following loop before closing the session works (and doesn't 
hit the DB):


for p in parents:
    for c in p.children:
        c.parent

As far as I can tell, the mapping is correct since:

* It all works fine if I leave the session open
* If I don't use joinedload, and leave the session open it lazyloads 
correctly


I'm surprised that:

* It doesn't set both sides of the relation, considering it apparently 
knows about them
* It complains that the session is closed despite not actually requiring 
an open session (no SQL is sent to the DB for c.parent)


These apparent do-nothing loops are starting to clutter the code. There 
must be a better way.
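
One way to keep such loops out of the calling code is to wrap them in a small helper. The sketch below is only an illustration: `touch_backrefs` and the `_Node` stand-in class are hypothetical names, not part of SQLAlchemy, and the helper relies on nothing more than plain attribute access.

```python
def touch_backrefs(parents, collection="children", backref="parent"):
    """Read each child's back-reference so it is resolved before the session closes."""
    for p in parents:
        for c in getattr(p, collection):
            getattr(c, backref)  # the attribute read alone is what triggers the load
    return parents


class _Node:
    """Stand-in object for the demonstration; counts reads of .parent."""

    def __init__(self):
        self.children = []
        self.reads = 0

    @property
    def parent(self):
        self.reads += 1
        return None


root = _Node()
root.children = [_Node(), _Node()]
touch_backrefs([root])
reads = [c.reads for c in root.children]  # each child was read exactly once
```

With mapped classes the call would sit between the query and `session.close()`, in place of the nested loops.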


Thanks

Jon

--
Jon Siddle, CoreFiling Limited
Software Tools Developer
http://www.corefiling.com
Phone: +44-1865-203192

--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-13 Thread Michael Bayer

On Sep 13, 2010, at 8:48 AM, Jon Siddle wrote:

 I'm sure I'm missing something simple here, and any pointers in the right 
 direction would be greatly appreciated.
 
 Take for instance the following code:
 
 session = Session()
 parents = session.query(Parent).options(joinedload(Parent.children)).all()
 session.close()
 
 print parents[0].children  # This works
 print parents[0].children[0].parent  # This gives a lazy loading error
 
 Adding the following loop before closing the session works (and doesn't hit 
 the DB):
 
 for p in parents:
     for c in p.children:
         c.parent
 
 As far as I can tell, the mapping is correct since:
 
 * It all works fine if I leave the session open
 * If I don't use joinedload, and leave the session open it lazyloads correctly
 
 I'm surprised that:
 
 * It doesn't set both sides of the relation, considering it apparently knows 
 about them

This relationship is satisfied as you request it, and it works by looking in 
the current Session's identity map for the primary key stored by the 
many-to-one.   The operation falls under the realm of lazyloading even though 
no SQL is emitted.   If you consider that Child may have many related 
many-to-ones, all of which may already be in the Session, it would be quite 
wasteful for the ORM to assume that you're going to be working with the object 
in a detached state and that you need all of them.
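
The behaviour described above can be imitated with a toy model in plain Python. This is a sketch under stated assumptions, not SQLAlchemy's implementation: `ToySession`, `DetachedError` and the two classes are hypothetical, and the "identity map" is just a dict keyed by class name and primary key. It shows why the many-to-one resolves without any query while the session is open, and why an unresolved one fails once the object is detached.

```python
class DetachedError(Exception):
    """Raised when a lazy attribute is read without a live session."""


class ToySession:
    """Hypothetical stand-in for a session; it only holds an identity map."""

    def __init__(self):
        self.identity_map = {}  # (class name, primary key) -> object

    def add(self, obj):
        self.identity_map[(type(obj).__name__, obj.id)] = obj
        obj._session = self

    def close(self):
        # Detach every object; the objects themselves remain usable.
        for obj in self.identity_map.values():
            obj._session = None
        self.identity_map = {}


class Parent:
    def __init__(self, id):
        self.id = id
        self.children = []
        self._session = None


class Child:
    def __init__(self, id, parent_id):
        self.id = id
        self.parent_id = parent_id  # the child only stores the foreign key
        self._session = None

    @property
    def parent(self):
        # Resolve on first access and cache the result on the instance.
        if "_parent" not in self.__dict__:
            if self._session is None:
                raise DetachedError("Child is detached; cannot resolve parent")
            key = ("Parent", self.parent_id)
            self.__dict__["_parent"] = self._session.identity_map[key]
        return self.__dict__["_parent"]


# Simulate an eager load of Parent.children only.
session = ToySession()
parent = Parent(1)
session.add(parent)
for child_id in (10, 11):
    child = Child(child_id, parent_id=1)
    session.add(child)
    parent.children.append(child)

first, second = parent.children
touched = first.parent  # resolved from the identity map, no query involved
session.close()
# first.parent still works (it was cached); second.parent raises DetachedError.
```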

The Session's default assumption is that you're going to be leaving it around 
while you work with the objects contained, and in that way you interact with 
the database for as long as you deal with its objects, which represent 
proxies to the underlying transaction.   When objects are detached, for 
reasons like caching and serialization to other places, normally you'd merge() 
them back when you want to use them again.   So if it were me I'd normally be 
looking to not be closing the session.

However, when working with detached objects is necessary, there are two 
approaches you can use.  One is a general approach that can load anything 
related, which is to load them in a @reconstructor.  This is illustrated at  
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/ImmediateLoading . It won't 
issue any extra SQL for the many-to-ones that are present in the session 
already.

In the specific case you have above, you can also use a trick which is to use 
contains_eager():

parents = session.query(Parent).options(joinedload(Parent.children), 
contains_eager(Parent.children, Child.parent)).all()

the above approach requires that Parent is one of the entities that you're 
requesting explicitly - i.e. if you were saying joinedload(foo, bar, 
bat), it would be kind of impossible to target bat.hohos with 
contains_eager() due to the aliasing.







this will do the get() of the Parent as you run through.




Re: [sqlalchemy] (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-13 Thread Michael Bayer

On Sep 13, 2010, at 11:45 AM, Michael Bayer wrote:

 
 
 In the specific case you have above, you can also use a trick which is to use 
 contains_eager():
 
 parents = session.query(Parent).options(joinedload(Parent.children), 
 contains_eager(Parent.children, Child.parent)).all()
 
 the above approach requires that Parent is one of the entities that you're 
 requesting explicitly - i.e. if you were saying joinedload(foo, bar, 
 bat), it would be kind of impossible to target bat.hohos with 
 contains_eager() due to the aliasing.

so let me also back that up, that we've always planned on adding an 
immediateload option that would just fire off any lazyloader as the query 
fetches results. A really short patch that adds immediateload() is at 
http://www.sqlalchemy.org/trac/ticket/1914 and hopefully will be in 0.6.5 
pending further testing.   






Re: [sqlalchemy] (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-13 Thread Jon Siddle

 On 13/09/10 16:45, Michael Bayer wrote:

On Sep 13, 2010, at 8:48 AM, Jon Siddle wrote:


I'm sure I'm missing something simple here, and any pointers in the right 
direction would be greatly appreciated.

Take for instance the following code:

session = Session()
parents = session.query(Parent).options(joinedload(Parent.children)).all()
session.close()

print parents[0].children  # This works
print parents[0].children[0].parent  # This gives a lazy loading error

Adding the following loop before closing the session works (and doesn't hit the 
DB):

for p in parents:
    for c in p.children:
        c.parent

As far as I can tell, the mapping is correct since:

* It all works fine if I leave the session open
* If I don't use joinedload, and leave the session open it lazyloads correctly

I'm surprised that:

* It doesn't set both sides of the relation, considering it apparently knows 
about them

This relationship is satisfied as you request it, and it works by looking in the current 
Session's identity map for the primary key stored by the many-to-one.   The operation 
falls under the realm of lazyloading even though no SQL is emitted.   If you 
consider that Child may have many related many-to-ones, all of which may already be in 
the Session, it would be quite wasteful for the ORM to assume that you're going to be 
working with the object in a detached state and that you need all of them.
I'm not sure I see what you're saying here. I've explicitly asked for 
all children relating to parent and these are correctly queried and 
loaded. While they are being added to the parent.children list, why not 
also set each child.parent since this is known? I don't see how this is 
wasteful, but I may be missing something. I'm not suggesting it should 
touch relations that I haven't explicitly told it to eagerly load. The 
likes of Hibernate (yes, it's a very different beast) load both sides of 
the relation at once.

The Session's default assumption is that you're going to be leaving it around while you 
work with the objects contained, and in that way you interact with the database for as 
long as you deal with its objects, which represent proxies to the underlying 
transaction.   When objects are detached, for reasons like caching and serialization to 
other places, normally you'd merge() them back when you want to use them again.   So if 
it were me I'd normally be looking to not be closing the session.
I'm closing the session before I forward the objects to the view 
template in a web application. The template has no business doing 
database operations, and the controller *should* make sure all DB work 
has been done. In my case, I know I'll never need to write back to the DB.

However, when working with detached objects is necessary, two approaches here 
you can use.  One is a general approach that can load anything related, which 
is to load them in a @reconstructor.  This is illustrated at  
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/ImmediateLoading . It won't 
issue any extra SQL for the many-to-ones that are present in the session 
already.

In the specific case you have above, you can also use a trick which is to use 
contains_eager():

parents = session.query(Parent).options(joinedload(Parent.children), 
contains_eager(Parent.children, Child.parent)).all()
This seems to address my problem directly. It's still a bit redundant, 
but from my initial tests it seems to solve my problem.

the above approach requires that Parent is one of the entities that you're requesting explicitly - i.e. if you were 
saying joinedload(foo, bar, bat), it would be kind of impossible to target 
bat.hohos with contains_eager() due to the aliasing.
I'm only interested in making sure both sides of the same relation are 
loaded; so this isn't a problem at all.



so let me also back that up, that we've always planned on adding an immediateload 
option that would just fire off any lazyloader as the query fetches results. A really short 
patch that adds immediateload() is at http://www.sqlalchemy.org/trac/ticket/1914 and 
hopefully will be in 0.6.5 pending further testing.
We'll have to support 0.5 for some time, but it's good to know a 
shortcut is coming.


Thanks a lot for your help.

Jon

--
Jon Siddle, CoreFiling Limited
Software Tools Developer
http://www.corefiling.com
Phone: +44-1865-203192




Re: [sqlalchemy] (Hopefully) simple problem with backrefs not being loaded when eagerloading.

2010-09-13 Thread Michael Bayer

On Sep 13, 2010, at 12:26 PM, Jon Siddle wrote:

 This relationship is satisfied as you request it, and it works by looking in 
 the current Session's identity map for the primary key stored by the 
 many-to-one.   The operation falls under the realm of lazyloading even 
 though no SQL is emitted.   If you consider that Child may have many related 
 many-to-ones, all of which may already be in the Session, it would be quite 
 wasteful for the ORM to assume that you're going to be working with the 
 object in a detached state and that you need all of them.
 I'm not sure I see what you're saying here. I've explicitly asked for all 
 children relating to parent and these are correctly queried and loaded. 
 While they are being added to the parent.children list, why not also set each 
 child.parent since this is known?

because you didn't specify it, and it takes a palpable amount of additional 
overhead to do so as well as a palpable amount of complexity to decide if it 
should do so based on the logic you'd apply here, when in 99% of the cases it 
is not needed.


 I don't see how this is wasteful, but I may be missing something.

Child may have parent, foo, bar, bat attached to it, all many-to-ones.   Which 
ones should it assume the user wants to load ?  If you are loading 10000 rows, 
and each Child object has three many-to-ones on it, and suppose it takes 120 
function calls to look at a relationship, determine the values to send to 
query._get(), look in the identity map, etc., that is 3 x 10000 x 120 = 3.6 
million function calls, by default, almost never needed since normally they are 
all just there in the current session, without the user asking to do so.
There is nothing special about Child.parent just because Parent.children is 
present up the chain.  While Hibernate may have decided that the complexity 
and overhead of adding this decisionmaking was worth it, they have many 
millions more function calls to burn with the Java VM in any case than we do in 
Python, and they also have a much higher bar to implement lazyloading since 
their native class instrumentation is itself a huge beast.   In our case it is 
simply unnecessary.  Any such automatic decisionmaking you can be sure quickly 
leads to many uncomfortable edge cases and thorny situations, causing needless 
surprise issues for users who don't really need such a behavior in the first 
place.
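
The cost estimate above is a straight product, and can be checked in a couple of lines. The row count of 10,000 is an assumption here: it is the value consistent with the stated total of 3.6 million for three many-to-ones at 120 calls each. The function name `lookup_calls` is made up for this illustration.

```python
def lookup_calls(rows, many_to_ones, calls_per_lookup):
    """Function calls spent pre-populating every many-to-one on every row."""
    return rows * many_to_ones * calls_per_lookup


total = lookup_calls(rows=10_000, many_to_ones=3, calls_per_lookup=120)
print(f"{total:,} function calls")  # prints "3,600,000 function calls"
```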

As I've mentioned, you will have an option to tell it which many-to-ones you'd 
like it to spend time pre-populating using the upcoming immediateload option.

 The Session's default assumption is that you're going to be leaving it 
 around while you work with the objects contained, and in that way you 
 interact with the database for as long as you deal with its objects, which 
 represent proxies to the underlying transaction.   When objects are 
 detached, for reasons like caching and serialization to other places, 
 normally you'd merge() them back when you want to use them again.   So if it 
 were me I'd normally be looking to not be closing the session.
 I'm closing the session before I forward the objects to the view template in 
 a web application. The template has no business doing database operations,

I disagree with this interpretation of abstraction.   That's like saying that 
pushing the button in an elevator means you're now in charge of starting up the 
elevator motors and instructing them how many feet to travel.

The template is not doing database operations, it is working with high level 
objects that you've sent it, and knows nothing of a database.   That these 
objects may be doing database calls behind the scenes to lazily fetch 
additional data is known as the proxy pattern.  It is one of the most 
fundamental patterns in object oriented software design. Separation of 
concerns is about what kinds of source code and awareness of systems live in 
various places - it has nothing to do with operational timing or initiation.
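
As a rough illustration of the proxy idea described above, here is a minimal sketch in plain Python. Everything in it is hypothetical (`LazyAttribute`, `Article`, `FETCH_LOG`, `render`); it is not how SQLAlchemy instruments classes. The point is only that the "template" function reads ordinary attributes and has no knowledge of where the data comes from.

```python
class LazyAttribute:
    """Non-data descriptor: runs a loader on first access, then caches the value."""

    def __init__(self, loader):
        self.loader = loader
        self.name = loader.__name__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = self.loader(instance)
        # Storing under the same name means later reads bypass the descriptor.
        instance.__dict__[self.name] = value
        return value


FETCH_LOG = []  # records each simulated database call


class Article:
    def __init__(self, title):
        self.title = title

    @LazyAttribute
    def comments(self):
        FETCH_LOG.append(self.title)  # stand-in for a real query
        return ["first!", "nice post"]


def render(article):
    """A 'template': it only reads attributes and knows nothing of a database."""
    return f"{article.title} ({len(article.comments)} comments)"


article = Article("Backrefs")
html = render(article)
html_again = render(article)  # second render reuses the cached value
```

The fetch happens when the template first reads `comments`, yet the template contains no database code, which is the distinction being drawn between where code lives and when work is initiated.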

The pre-load scenario is certainly valid if you're trying to render from an 
object graph that loads from a cache and doesn't want to do any additional 
database calls.  But this is strictly an issue of optimization, not correct 
software design.


