[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-12 Thread Michael Bayer


On Feb 11, 2008, at 10:25 PM, Richard Levasseur wrote:


 Ok, so I tried this.  It works fine; the only catch is that yeah,
 the pid and cid are hardcoded in there.  I couldn't figure out any
 way to determine them programmatically.  I can get the join condition
 clause, but it's just a binary expression object (essentially saying
 pid = cid), and I don't know how to extract the portions reliably
 (from the sounds of things, it doesn't sound possible at all).  It
 looks like cid is always on the right, and pid is always on the
 left, but I'm guessing that's just deterministic chance, and wouldn't
 know what to do when it's a more complicated expression.  This
 worries me because I know there are other tables that will require
 more complicated join conditions (where deleted/archived/hidden == 0
 or IS NULL, or some such thing)

to programmatically determine columns on a table, use the table.c
collection.  the direction of the join clause is not really
important here (don't see how that's related?)  it should be easy enough
to just say:

for c in childtable.c:
    setattr(instance, c.name, None)

i.e. when the row does not contain columns from childtable.

A little bit of experimentation with the Table construct should reveal  
that pretty much anything is possible there.
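If you'd rather not parse the join clause at all, the (pid, cid) pairs can
also be read off the child table's foreign key metadata; a minimal sketch
using throwaway `parents`/`kids` tables (column names mirror the attached
example script, not any fixed requirement):

```python
from sqlalchemy import MetaData, Table, Column, Integer, ForeignKey

metadata = MetaData()
parents = Table("parents", metadata,
                Column("id", Integer, primary_key=True))
kids = Table("kids", metadata,
             Column("cid", Integer, primary_key=True),
             Column("pid", Integer, ForeignKey("parents.id")))

# each ForeignKey knows both sides of the link:
# .parent is the local (child) column, .column is the referenced one
pairs = [(fk.parent.name, fk.column.name) for fk in kids.foreign_keys]
print(pairs)  # [('pid', 'id')]

# defaulting every child attribute when no child row came back:
defaults = dict((c.name, None) for c in kids.c)
```

This sidesteps the left/right ordering question entirely, since the
ForeignKey object is directional regardless of how the ON clause was written.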

 It looks like the nested post_execute def has a reference to  
 `statement` that it uses to figure out the join condition and issue  
 the subquery.  If I could simply call that inside my extension's  
 populate_instance and handle the exception, that'd probably work.

sure, try calling that.  But then, you could also adapt the source  
code of that to do more specifically what you need.

 Ok, let me give a more practical example:

 Let's say we have the following schema:
 Persons(pid, etype, name, is_active)
 Managers(pid, mid, level, full_team)
 Engineers(pid, eid, language)

 Let's say there's a single search on the webpage and users can enter
 in queries like:
 1) engineer.language:Java OR manager.level:5
 2) name:john
 3) engineer.language: python

 the pseudo-SQL for those queries should be something like
 1) select * from persons left join engineers on pid=pid left join
 managers on pid=pid and full_team=0 where engineers.language='Java'
 or managers.level=5
 2) select * from persons where name = 'john'
 3) select * from persons left join engineers using (pid) where
 engineers.language='python'

If in the above example the mapper were configured with select_table,

mapper(Person, people,  
select_table=people.outerjoin(engineers).outerjoin(managers))

you can filter on those columns directly:

sess.query(Person).filter(or_(Engineer.language=='java',  
Manager.level==5))

if you don't want to use select_table, then you can set up the joins  
on a per-query basis:

 
sess.query(Person).select_from(
    people.outerjoin(engineers).outerjoin(managers)
).filter(or_(Engineer.language=='java', Manager.level==5))

there's a query.join() call but that currently is used for joining
along relations between classes, which is not quite what we have here.
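To get the full_team=0 condition from pseudo-query (1) into the join itself,
the extra criterion can be passed as an explicit ON clause to the outerjoin;
a sketch in Core expression terms (the table definitions are assumptions
based on the schema above, and the `select()` calling style is the modern
one):

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, select, and_, or_)

metadata = MetaData()
persons = Table("persons", metadata,
                Column("pid", Integer, primary_key=True),
                Column("name", String(100)))
engineers = Table("engineers", metadata,
                  Column("eid", Integer, primary_key=True),
                  Column("pid", Integer, ForeignKey("persons.pid")),
                  Column("language", String(100)))
managers = Table("managers", metadata,
                 Column("mid", Integer, primary_key=True),
                 Column("pid", Integer, ForeignKey("persons.pid")),
                 Column("level", Integer),
                 Column("full_team", Integer))

# LEFT JOINs, with the extra condition folded into the managers ON clause
j = persons.outerjoin(engineers).outerjoin(
    managers, and_(persons.c.pid == managers.c.pid,
                   managers.c.full_team == 0))
stmt = select(persons).select_from(j).where(
    or_(engineers.c.language == "Java", managers.c.level == 5))
```

Since the join is built per statement, which tables participate (and with
what extra conditions) can vary with each user query.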

As far as the mappers seeing Engineer.language and magically knowing
to add engineers to the base table of people on its own, that seems
a little magical to me (I'm not sure how it could guess the desired
action here; like, how would it know to outerjoin and not join to
engineers?  how would it know that you didn't join to engineers in
some other way already?).  For reference, Hibernate inheritance could
never do that; it would require that you query specifically for
Engineer before specifying Engineer-specific criteria.

 Note that for (1), it has the additional full_team=0 condition as  
 part of the join itself.
 So, depending on the user's query, we need to join to different  
 tables.

yeah in any case, you'd need to specify that.  Even with joins along
relations, we still require that you say query.join('relationname'),
which is less verbose than using the full join condition but still
requires explicitness.  We don't guess joins at the query level.  The
best we can do is some eventual feature like
query(Person).polymorphic_with(Engineer).filter(), something like
that.


 For the api, it'd be nice to do something like:
 session.query(Person).filter(Manager.level==5).filter(Person.is_active==1)
 And the orm uses the join conditions we defined elsewhere (probably
 on the mapper?).

if you said

session.query(Person).filter(Manager.person_id == Person.person_id).filter(Manager.level == 5)

then you've already joined to Managers.  That's what I mean by we
can't guess.


 Right now, it'll join to the tables it needs, but it won't put in
 the join conditions.  I know it's in there somewhere, otherwise it
 couldn't do the subquery (...right?).  Another catch I'm seeing is
 how to define those additional join conditions (use select_table
 with a custom condition?).  It looks like its

[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-11 Thread Richard Levasseur
On Feb 8, 2008 2:27 PM, Michael Bayer [EMAIL PROTECTED] wrote:



 On Feb 8, 2008, at 3:49 PM, Richard Levasseur wrote:

 
  Hm, we could do that, but that means that we have to outerjoin to
  ~15 other tables.  The primary table has ~200+ columns on it, each
  child table has 10-20 columns, there are over a million records, and
  our base filter criteria can be a bit complex.  It's indexed and
  such, but the query needs to run in < 0.7 seconds (about how fast it
  goes on dev).  I have concerns it won't perform as well (we tried
  that and it really killed performance, but that was prior to a lot
  of performance improvements we've made, so it might be feasible
  again).

 you don't have to join to 15 tables.  You can also set select_table to
 the base table only, then write your mapper extension to do the work
 of _post_instance().  This extension calls the main populate_instance
 first, then issues a second query for the child table which is
 conditional.  This example is hardcoded to the example "kids" table
 but can be generalized if needed.  Notice that it populates the kid
 attributes with some default values, which is needed here because
 otherwise hitting them later will trigger a broken load (because
 there's no row in "kids"):


Ok, so I tried this.  It works fine; the only catch is that yeah, the pid
and cid are hardcoded in there.  I couldn't figure out any way to determine
them programmatically.  I can get the join condition clause, but it's just a
binary expression object (essentially saying pid = cid), and I don't know
how to extract the portions reliably (from the sounds of things, it doesn't
sound possible at all).  It looks like cid is always on the right, and pid
is always on the left, but I'm guessing that's just deterministic chance, and
wouldn't know what to do when it's a more complicated expression.  This
worries me because I know there are other tables that will require more
complicated join conditions (where deleted/archived/hidden == 0 or IS NULL,
or some such thing)

It looks like the nested post_execute def has a reference to `statement`
that it uses to figure out the join condition and issue the subquery.  If I
could simply call that inside my extension's populate_instance and handle
the exception, that'd probably work.



 
  Thinking about this more, I essentially want to eagerload certain
  inherited child tables on a case-by-case basis.  I think I've seen
  similar questions about that here already.  I don't know the
  internals, but it seems like a parent knows about its children
  through its polymorphic map, so can't it figure out all the tables
  it would have to join to on its own?

 if by eagerload you mean, issue a second query for, then the above
 approach will get you started.  if you mean that it should construct
 joins from parent to child table, well yes it already does that if you
 query for a Child specifically, i.e. query(Child).all().  you also
 said you dont want to use a join for the base mapper since theres too
 many tables, so i dont think that's where you're referring to.   You
 can put select_table on whichever mappers you want to control who
 loads from what kinds of tables.



Ok, let me give a more practical example:

Let's say we have the following schema:
Persons(pid, etype, name, is_active)
Managers(pid, mid, level, full_team)
Engineers(pid, eid, language)

Let's say there's a single search on the webpage and users can enter in
queries like:
1) engineer.language:Java OR manager.level:5
2) name:john
3) engineer.language: python

the pseudo-SQL for those queries should be something like
1) select * from persons left join engineers on pid=pid left join managers
on pid=pid and full_team=0 where engineers.language='Java' or
managers.level=5
2) select * from persons where name = 'john'
3) select * from persons left join engineers using (pid) where
engineers.language='python'

Note that for (1), it has the additional full_team=0 condition as part of
the join itself.
So, depending on the user's query, we need to join to different tables.
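That per-query decision can start by reducing the search-box syntax to
(table, column, value) terms; here is a hypothetical stdlib-only sketch (the
grammar and function names are my assumptions about the search field, not
anything SQLAlchemy provides):

```python
import re

# matches "table.column:value" or a bare "column:value"
TERM = re.compile(r"(?:(\w+)\.)?(\w+):\s*(\S+)")

def parse_search(query):
    """Return (table, column, value) triples; table is None for
    columns living on the base Persons table."""
    return [(t or None, col, val) for t, col, val in TERM.findall(query)]

def tables_to_join(terms):
    # only the child tables the query actually touches
    return sorted({t for t, _, _ in terms if t})

terms = parse_search("engineer.language:Java OR manager.level:5")
print(terms)  # [('engineer', 'language', 'Java'), ('manager', 'level', '5')]
print(tables_to_join(terms))  # ['engineer', 'manager']
```

The output of `tables_to_join` is then what drives which outerjoins get
added to that particular query.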

What I meant by eagerloading: It may or may not need to select columns from
those tables depending on the user's view (perhaps they have the
'engineer.language' field turned on, so we'd always join to Engineers
regardless, to avoid issuing lots of sub-queries).

For the api, it'd be nice to do something like:
session.query(Person).filter(Manager.level==5).filter(Person.is_active==1)
And the orm uses the join conditions we defined elsewhere (probably on the
mapper?).

Right now, it'll join to the tables it needs, but it won't put in the join
conditions.  I know it's in there somewhere, otherwise it couldn't do the
subquery (...right?).  Another catch I'm seeing is how to define those
additional join conditions (use select_table with a custom condition?).  It
looks like its magically picking up the foreign key references between
Managers/Engineers and Persons... just wish I knew how to tell it that
explicitly (since some things

[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-08 Thread Michael Bayer


On Feb 7, 2008, at 9:28 PM, Richard Levasseur wrote:


 Ok, so I've been looking into getting it to work when the child record
 doesn't exist.  I haven't had much luck.

 I wrote a stub mapper extension for translate_row and
 populate_instance, but it doesn't seem to be called when it goes to
 fetch the information for the child row.
 mapper(Parent, parents, polymorphic.)
 mapper(Child, children, extension=MyExtension(def translate_row, def
 populate_instance))

 query(Parent)
 my translate_row called
 my populate_instance called
 mapper.py:_get_poly_select_loader, the first inline post_execute is
 called to fetch the data:
  row = selectcontext.session.connection(self).execute(statement,
 params).fetchone()
  self.populate_instance(selectcontext, instance, row, isnew=False,
 instancekey=identitykey, ispostselect=True)

 Shouldn't translate_row be called before then?  In the working case
 (child exists), my extension methods aren't called either.  It
 makes sense that they would only be called once per result, but in
 this case a single result is actually 2 records fetched separately.

can you supply a more specific example of how this schema works?
also, if a single result is a 2 record fetch, that's going to be
extremely difficult to do - I'd want to look into perhaps supplying an
alternate mapping for the schema.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-08 Thread Richard Levasseur
On Feb 8, 2008 7:04 AM, Michael Bayer [EMAIL PROTECTED] wrote:



 On Feb 7, 2008, at 9:28 PM, Richard Levasseur wrote:

 
  Ok, so I've been looking into getting it to work when the child record
  doesn't exist.  I haven't had much luck.
 
  I wrote a stub mapper extension for translate_row and
  populate_instance, but it doesn't seem to be called when it goes to
  fetch the information for the child row.
  mapper(Parent, parents, polymorphic.)
  mapper(Child, children, extension=MyExtension(def translate_row, def
  populate_instance))
 
  query(Parent)
  my translate_row called
  my populate_instance called
  mapper.py:_get_poly_select_loader, the first inline post_execute is
  called to fetch the data:
   row = selectcontext.session.connection(self).execute(statement,
  params).fetchone()
   self.populate_instance(selectcontext, instance, row, isnew=False,
  instancekey=identitykey, ispostselect=True)
 
  Shouldn't translate_row be called before then?  In the working case
  (child exists), my extension methods aren't called either.  It
  makes sense that they would only be called once per result, but in
  this case a single result is actually 2 records fetched separately.

 can you supply a more specific example of how this schema works ?
 also, if a single result is a 2 record fetch, thats going to be
 extremely difficult to do - I'd want to look into perhaps supplying an
 alternate mapping for the schema.


Ok, so I'm talking about 2 slightly different things at once:  One is that
sqlalchemy doesn't call the mapper extensions when loading the data from the
inherited tables.  The second is that it can't load instances unless a
record in the inherited table exists.

One:
By "2 record fetch" I meant: a single instance of Child requires one record
to be fetched from `parents` and another record to be fetched from `kids`,
and sqlalchemy is correctly doing this with the inherited tables.  What I'm
pointing out is that, when it fetches the record from the child table, it
doesn't run the mapper extensions (unless I'm doing it wrong).
1) session.query(Parent).get(id)
2) select ... from parents ...
3) translate_row and translate row extensions
4) populate instance and populate instance extensions
5) I'm polymorphic and am a Child, so query kids: select ... from kids ...
6) populate instance (with the data from kids), but extensions aren't run.
(I don't see any calls to translate_row?)
7) return instance

This is my mapper extension:
class CustomMapper(sqlalchemy.orm.MapperExtension):
  def translate_row(self, context, row):
    print "translate row called"
    return row
  def populate_instance(self, mapper, context, row, instance, **flags):
    print "populate instance called"
    return sqlalchemy.orm.EXT_CONTINUE
mapper(Child, kids_table, inherits=Parent, extension=CustomMapper())

***

Two:
I've attached an example, essentially this:

Table("parents", Column("id"), Column("type"))
Table("kids", Column("child_id"), Column("parent_id", FK("parents.id")))
# not inserting a child record:
engine.execute("insert into parents (id, type) values (1, 'child')")
# gives an error, "NoneType is not iterable"; the row is None:
session.query(Parent).get(1)

Note that no record exists in `kids`; this is intentional, since it isn't
always guaranteed to be there because of the way the system behaves (the
workflow engine may change the type outside the application).

It works in our current app (custom PHP) because when we save the record, we
check if the `child_id` is null; if it is, we insert, otherwise we update.
During loading, if the `child_id` is null, we just set everything else to
null.


from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy.orm import mapper, sessionmaker

__metaclass__ = type

metadata = MetaData()
engine = create_engine("sqlite:///:memory:", echo=True)
Session = sessionmaker(bind=engine, autoflush=False, transactional=True)
parents = Table("parents", metadata,
    Column("id", Integer, primary_key=True),
    Column("type", String(100)))

kids = Table("kids", metadata,
     Column("pid", Integer, ForeignKey("parents.id")),
     Column("cid", Integer, primary_key=True),
     )

metadata.create_all(engine)

class Parent:
  def __repr__(self):
    return "%s(%s, %s)" % (self.__class__.__name__, self.id, self.type)

class Child(Parent):
  pass

mapper(Parent, parents, polymorphic_on=parents.c.type,
   polymorphic_identity="parent")
mapper(Child, kids, inherits=Parent, polymorphic_identity="child")
# Insert a Child, 

[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-08 Thread Richard Levasseur
On Feb 8, 2008 11:27 AM, Michael Bayer [EMAIL PROTECTED] wrote:


 On Feb 8, 2008, at 2:01 PM, Richard Levasseur wrote:

 Ok, so I'm talking about 2 slightly different things at once:  One is that
 sqlalchemy doesn't call the mapper extensions when loading the data from the
 inherited tables.  The second is that it can't load instances unless a
 record in the inherited table exists.

 One:
 By "2 record fetch" I meant: a single instance of Child requires one
 record to be fetched from `parents` and another record to be fetched from
 `kids`, and sqlalchemy is correctly doing this with the inherited tables.
 What I'm pointing out is that, when it fetches the record from the child
 table, it doesn't run the mapper extensions (unless I'm doing it wrong).
 1) session.query(Parent).get(id)
 2) select ... from parents ...
 3) translate_row and translate row extensions
 4) populate instance and populate instance extensions
 5) I'm polymorphic and am a Child, so query kids: select ... from kids ...
 6) populate instance (with the data from kids), but extensions aren't
 run.  (I don't see any calls to translate_row?)
 7) return instance


 that's actually correct - the second query to the child table is not a
 top-level ORM query, it's internal to the get() call, so we currently don't
 have any extension hooks there (all of that "call a second query"
 functionality was just introduced in 0.4.)


 However, the whole _post_instance step where that happens is optional.
  In your case, I really think you want to be using a polymorphic join or
 union so that the mapper can load parents and children from one query.  You
 do this by specifying the select_table argument to the mapper(), and
 usually its an outerjoin among all the involved tables or in some cases can
 be a big UNION of all the separate subtables.  But if your select_table only
 includes the base table, that will work too, it just wont populate the
 secondary attributes unless you did something else to make it happen (like
 in your extension).

 If you do that, the whole _post_instance() thing won't happen at all; your
 translate_row() will get the only row dealt with, as will your
 populate_instance().  So you *can* in theory issue a second SQL query within
 your own populate_instance() call that simulates what _post_instance() does,
 if you wanted to.  Or I would think you could just load everything in one
 shot here using a series of outerjoins among all the tables; there's an
 example in the docs and in the examples/polymorphic folder illustrating how
 to load in that way.


 Two:
 I've attached an example, essentially this:

 Table("parents", Column("id"), Column("type"))
 Table("kids", Column("child_id"), Column("parent_id", FK("parents.id")))
 # not inserting a child record:
 engine.execute("insert into parents (id, type) values (1, 'child')")
 # gives an error, "NoneType is not iterable"; the row is None:
 session.query(Parent).get(1)

 Note that no record exists in `kids`, this is intentional, it isn't always
 guaranteed to be there because of the way the system behaves (the workflow
 engine may change the type outside the application)

 It works in our current app (custom php) because when we save the record,
 we check if the `child_id` is null, if it is we insert, otherwise we
 update.  During loading, if the `child_id` is null, we just set everything
 else to null.



 yup, works if you do it like this:

 mapper(Parent, parents, polymorphic_on=parents.c.type,
   polymorphic_identity="parent", select_table=parents.outerjoin(kids),
   primary_key=[parents.c.id])
 mapper(Child, kids, inherits=Parent, polymorphic_identity="child")

 note the primary_key=[parents.c.id] there, which is to override the fact
 that parents.outerjoin(kids) in fact has a primary key of [parents.c.id,
 kids.c.cid].
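For reference, that composite key is visible directly on the join construct
itself; a small Core-only sketch using the same two tables as the attached
script:

```python
from sqlalchemy import MetaData, Table, Column, Integer, String, ForeignKey

metadata = MetaData()
parents = Table("parents", metadata,
                Column("id", Integer, primary_key=True),
                Column("type", String(100)))
kids = Table("kids", metadata,
             Column("cid", Integer, primary_key=True),
             Column("pid", Integer, ForeignKey("parents.id")))

j = parents.outerjoin(kids)
# the join derives its primary key from both sides, which is exactly
# why the mapper needs the explicit primary_key=[parents.c.id] override
names = sorted(c.name for c in j.primary_key)
print(names)  # ['cid', 'id']
```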



Hm, we could do that, but that means that we have to outerjoin to ~15 other
tables.  The primary table has ~200+ columns on it, each child table has
10-20 columns, there are over a million records, and our base filter
criteria can be a bit complex.  It's indexed and such, but the query needs to
run in < 0.7 seconds (about how fast it goes on dev).  I have concerns it
won't perform as well (we tried that and it really killed performance, but
that was prior to a lot of performance improvements we've made, so it might
be feasible again).

I'll see how it goes. Thanks :)

What we do now is figure out what tables to join with based upon the user
and other conditions.  We end up joining to 2 or 3 tables instead of all
tables.  I'm not sure how to do this transparently (so we can still do
session.query(Parent) and add arbitrary pieces) in sqlalchemy, though.  The
table it selects from is defined at mapper time, and if I understand
correctly, we shouldn't be re-mapping classes on-the-fly.

What I'm currently thinking (if joining across them all is too slow) is
writing a mapper extension that does the necessary logic so that the base
session.query(Parent) is actually 

[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-07 Thread Michael Bayer


On Feb 7, 2008, at 1:33 PM, Richard Levasseur wrote:

 For the syntax of it, I'd go for something like
 `child_obj.change_from(parent_obj)` or `child_obj =
 ChildClass.create_from(parent_obj)`, perhaps implicitly invoked when
 you do child_obj.type = parent_obj.  Not sure about that last part,
 doesn't seem pythonic, and feels misleading.  The point is to allow
 the child the ability to perform custom operations if need be
 (__change_type__?).  Is the `session` part really necessary (I thought
 objects knew what session, if any, they were attached to?).  I don't
 recall exactly if this is possible, but perhaps the child can have its
 __dict__ set to the parent's?  That'd preserve any state and wouldn't
 require any loop-copy of attributes.

the "session" part is introduced only in that we try not to attach
lots of methods and accessors to user-defined classes.  we like to
leave user-defined classes as unmodified as possible, which at the very
least helps with the possibility of name collisions (like, if someone's
SA app has a change_from method already, this change would break
their app).   since everything else comes from the session (like
expunge(), refresh(), etc.) this is where we like to put these things
for consistency.

however, thinking further about this, I think the way we'd actually do
this would be the same way we do a "primary key switch" - the mapper
already knows how to detect the delete of one instance and the
insert of another, both with the same primary key attributes, and
turn it into an UPDATE.  So I would see this as an extension to that
functionality, i.e.:

parent = sess.query(Parent).get(1)
child = Child(id=1)
child.someattr = parent.someattr  # copy attributes
sess.delete(parent)
sess.save(child)
sess.flush()








[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-07 Thread Richard Levasseur

On Feb 6, 7:01 pm, Michael Bayer [EMAIL PROTECTED] wrote:
 we don't yet support a seamless change of inheriting type.  your best
 bet is to issue the SQL to the table directly which changes the
 polymorphic value to the new value, then insert/delete from the
 child tables as needed... then reload the object into your session (i.e.
 expunge() the old one first).

Ok, I guess I'll play around with that at work a bit.
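A sketch of that manual workflow in plain Core statements (modern SQLAlchemy
calling style; the `parents`/`kids` schema is borrowed from the example
script elsewhere in the thread):

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, create_engine, select)

metadata = MetaData()
parents = Table("parents", metadata,
                Column("id", Integer, primary_key=True),
                Column("type", String(100)))
kids = Table("kids", metadata,
             Column("cid", Integer, primary_key=True),
             Column("pid", Integer, ForeignKey("parents.id")))

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(parents.insert().values(id=1, type="parent"))

    # the manual type change: flip the polymorphic discriminator
    # directly, then insert the child row the new type requires
    conn.execute(parents.update()
                 .where(parents.c.id == 1)
                 .values(type="child"))
    conn.execute(kids.insert().values(pid=1))

    new_type = conn.execute(
        select(parents.c.type).where(parents.c.id == 1)).scalar()

print(new_type)  # child
```

After this, re-querying the object in a fresh session (having expunged the
old instance) would load it as the new type.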


 the reason it's not yet supported is because the extra steps of
 INSERTing and/or DELETEing in conjunction with what would normally
 just be an UPDATE are a little complex... also we'd have to pick an API
 for this (like, myobject = session.change_type(myobject, FooClass)?
 not sure).  but we'll get this in there sooner or later (sooner if it's
 an urgent need for people).

In our instance, it's fairly important: we have about 30 different
types, half of which have child tables; the other half are within the
same parent table.  I still need to figure out how to load the child
when its record is missing, too.

For the syntax of it, I'd go for something like
`child_obj.change_from(parent_obj)` or `child_obj =
ChildClass.create_from(parent_obj)`, perhaps implicitly invoked when
you do child_obj.type = parent_obj.  Not sure about that last part,
doesn't seem pythonic, and feels misleading.  The point is to allow
the child the ability to perform custom operations if need be
(__change_type__?).  Is the `session` part really necessary (I thought
objects knew what session, if any, they were attached to?).  I don't
recall exactly if this is possible, but perhaps the child can have its
__dict__ set to the parent's?  That'd preserve any state and wouldn't
require any loop-copy of attributes.



[sqlalchemy] Re: Polymorphic and inheritance with missing children

2008-02-07 Thread Richard Levasseur

Ok, so I've been looking into getting it to work when the child record
doesn't exist.  I haven't had much luck.

I wrote a stub mapper extension for translate_row and
populate_instance, but it doesn't seem to be called when it goes to
fetch the information for the child row.
mapper(Parent, parents, polymorphic.)
mapper(Child, children, extension=MyExtension(def translate_row, def
populate_instance))

query(Parent)
my translate_row called
my populate_instance called
mapper.py:_get_poly_select_loader, the first inline post_execute is
called to fetch the data:
  row = selectcontext.session.connection(self).execute(statement,
params).fetchone()
  self.populate_instance(selectcontext, instance, row, isnew=False,
instancekey=identitykey, ispostselect=True)

Shouldn't translate_row be called before then?  In the working case
(child exists), my extension methods aren't called either.  It
makes sense that they would only be called once per result, but in
this case a single result is actually 2 records fetched separately.