[sqlalchemy] Re: Polymorphic and inheritance with missing children
On Feb 11, 2008, at 10:25 PM, Richard Levasseur wrote:

> Ok, so I tried this. It works fine; the only catch is that, yeah, the pid and cid are hardcoded in there. I couldn't figure out any way to determine them programmatically. I can get the join condition clause, but it's just a binary expression object (essentially saying pid = cid), and I don't know how to extract the portions reliably (from the sounds of things, it doesn't sound possible at all). It looks like cid is always on the right and pid is always on the left, but I'm guessing that's just deterministic chance, and I wouldn't know what to do when it's a more complicated expression. This worries me because I know there are other tables that will require more complicated join conditions (where deleted/archived/hidden == 0 or IS NULL, or some such thing).

To programmatically determine the columns on a table, use the table.c collection. The direction of the join clause is not really important here (I don't see how that's related?). It should be easy enough to just say:

    for c in childtable.c:
        setattr(instance, c.name, None)

i.e. when the row does not contain columns from childtable. A little bit of experimentation with the Table construct should reveal that pretty much anything is possible there.

> It looks like the nested post_execute def has a reference to `statement` that it uses to figure out the join condition and issue the subquery. If I could simply call that inside my extension's populate_instance and handle the exception, that'd probably work.

Sure, try calling that. But then, you could also adapt the source code of that to do more specifically what you need.
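Both suggestions above can be tried standalone. The sketch below (table names follow the attached parents/kids example; `DummyInstance` is a hypothetical stand-in for a mapped instance, not part of the original code) shows that the join's on-clause is a binary expression whose two sides carry the columns, so the pid/cid pair could be pulled out programmatically rather than hardcoded, and that the table.c loop defaults the child attributes:

```python
from sqlalchemy import MetaData, Table, Column, Integer, String, ForeignKey

metadata = MetaData()
parents = Table("parents", metadata,
                Column("id", Integer, primary_key=True),
                Column("type", String(100)))
kids = Table("kids", metadata,
             Column("pid", Integer, ForeignKey("parents.id")),
             Column("cid", Integer, primary_key=True))

# the join condition is derived from the foreign key; its two sides can be
# inspected instead of hardcoding the pid/cid column names
onclause = parents.join(kids).onclause
print(onclause.left.name, onclause.right.name)

# defaulting every child-table attribute when no child row exists
class DummyInstance:  # hypothetical stand-in for a mapped instance
    pass

instance = DummyInstance()
for c in kids.c:
    setattr(instance, c.name, None)
```

For more complicated on-clauses (AND-ed conditions and the like) the expression would need to be walked recursively, which is presumably why inspecting table.c directly is the more reliable route.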
> Ok, let me give a more practical example. Let's say we have the following schema:
>
>     Persons(pid, etype, name, is_active)
>     Managers(pid, mid, level, full_team)
>     Engineers(pid, eid, language)
>
> Let's say there's a single search on the webpage, and users can enter queries like:
>
> 1) engineer.language:Java OR manager.level:5
> 2) name:john
> 3) engineer.language:python
>
> The pseudo-SQL for those queries should be something like:
>
> 1) select * from persons left join engineers on pid=pid left join managers on pid=pid and full_team=0 where engineers.language='Java' or managers.level=5
> 2) select * from persons where name='john'
> 3) select * from persons left join engineers using (pid) where engineers.language='python'

If in the above example the mapper were configured with select_table,

    mapper(Person, people, select_table=people.outerjoin(engineers).outerjoin(managers))

you can filter on those columns directly:

    sess.query(Person).filter(or_(Engineer.language=='java', Manager.level==5))

If you don't want to use select_table, then you can set up the joins on a per-query basis:

    sess.query(Person).select_from(people.outerjoin(engineers).outerjoin(managers)).filter(or_(Engineer.language=='java', Manager.level==5))

There's a query.join() call, but that is currently used for joining along relations between classes, which is not quite what we have here. As far as the mappers seeing Engineer.language and magically knowing to add engineers to the base table of people on their own: that seems a little magical to me (I'm not sure how it could guess the desired action here; like, how would it know to outerjoin and not join to engineers? How would it know that you didn't join to engineers in some other way already?). For reference, Hibernate inheritance could never do that; it would require that you query specifically for Engineer before specifying Engineer-specific criteria.

> Note that for (1), it has the additional full_team=0 condition as part of the join itself.
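The per-query join approach can be exercised with just the SQL-expression layer. This is a hedged sketch (table and column names follow the hypothetical Persons/Managers/Engineers schema above, and it uses the modern select() construct rather than the 0.4-era Query API shown in the thread) producing the two LEFT OUTER JOINs plus the OR criterion:

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, select, or_)

metadata = MetaData()
persons = Table("persons", metadata,
                Column("pid", Integer, primary_key=True),
                Column("etype", String(20)),
                Column("name", String(100)),
                Column("is_active", Integer))
engineers = Table("engineers", metadata,
                  Column("pid", Integer, ForeignKey("persons.pid"), primary_key=True),
                  Column("eid", Integer),
                  Column("language", String(50)))
managers = Table("managers", metadata,
                 Column("pid", Integer, ForeignKey("persons.pid"), primary_key=True),
                 Column("mid", Integer),
                 Column("level", Integer),
                 Column("full_team", Integer))

# per-query joins: LEFT OUTER JOIN to both child tables (join conditions are
# inferred from the foreign keys), with the filter spanning both of them
stmt = (select(persons)
        .select_from(persons.outerjoin(engineers).outerjoin(managers))
        .where(or_(engineers.c.language == "java", managers.c.level == 5)))
print(stmt)
```

The printed statement corresponds to pseudo-SQL (1), minus the extra full_team=0 condition, which would need an explicit on-clause.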
> So, depending on the user's query, we need to join to different tables.

Yeah, in any case you'd need to specify that. Even with joins along relations, we still require that you say query.join('relationname'), which is less verbose than using the full join condition but still requires explicitness. We don't guess joins at the query level. The best we can do is some eventual feature like query(Person).polymorphic_with(Engineer).filter(...), something like that.

> For the api, it'd be nice to do something like:
>
>     session.query(Person).filter(Manager.level==5).filter(Person.is_active==1)
>
> And the orm uses the join conditions we defined elsewhere (probably on the mapper?).

If you said session.query(Person).filter(Manager.person_id==Person.person_id).filter(Manager.level==5), then you've already joined to Managers. That's what I mean by "we can't guess".

> Right now, it'll join to the tables it needs, but it won't put in the join conditions. I know it's in there somewhere, otherwise it couldn't do the subquery (...right?). Another catch I'm seeing is how to define those additional join conditions (use select_table with a custom condition?). It looks like its
[sqlalchemy] Re: Polymorphic and inheritance with missing children
On Feb 8, 2008 2:27 PM, Michael Bayer [EMAIL PROTECTED] wrote:

>> On Feb 8, 2008, at 3:49 PM, Richard Levasseur wrote:
>>
>>> Hm, we could do that, but that means that we have to outerjoin to ~15 other tables. The primary table has ~200+ columns on it, each child table has 10-20 columns, there are over a million records, and our base filter criteria can be a bit complex. It's indexed and such, but the query needs to run in 0.7 seconds (about how fast it goes on dev). I have concerns it won't perform as well (we tried that and it really killed performance, but that was prior to a lot of performance improvements we've made, so it might be feasible again).
>>
>> You don't have to join to 15 tables. You can also set select_table to the base table only, then write your mapper extension to do the work of _post_instance(). This extension calls the main populate_instance first, then issues a second query for the child table, which is conditional. This example is hardcoded to the example kids table but can be generalized if needed. Notice that it populates the kid attributes with some default values, which is needed here because otherwise hitting them later will trigger a broken load (because there's no row in kids):

> Ok, so I tried this. It works fine; the only catch is that, yeah, the pid and cid are hardcoded in there. I couldn't figure out any way to determine them programmatically. I can get the join condition clause, but it's just a binary expression object (essentially saying pid = cid), and I don't know how to extract the portions reliably (from the sounds of things, it doesn't sound possible at all). It looks like cid is always on the right and pid is always on the left, but I'm guessing that's just deterministic chance, and I wouldn't know what to do when it's a more complicated expression.
> This worries me because I know there are other tables that will require more complicated join conditions (where deleted/archived/hidden == 0 or IS NULL, or some such thing). It looks like the nested post_execute def has a reference to `statement` that it uses to figure out the join condition and issue the subquery. If I could simply call that inside my extension's populate_instance and handle the exception, that'd probably work.
>
> Thinking about this more, I essentially want to eagerload certain inherited child tables on a case-by-case basis. I think I've seen similar questions about that here already. I don't know the internals, but it seems like a parent knows about its children through its polymorphic map, so can't it figure out all the tables it would have to join to on its own?

If by eagerload you mean "issue a second query for", then the above approach will get you started. If you mean that it should construct joins from parent to child table, well, yes, it already does that if you query for a Child specifically, i.e. query(Child).all(). You also said you don't want to use a join for the base mapper since there are too many tables, so I don't think that's what you're referring to. You can put select_table on whichever mappers you want, to control who loads from which tables.
> Ok, let me give a more practical example. Let's say we have the following schema:
>
>     Persons(pid, etype, name, is_active)
>     Managers(pid, mid, level, full_team)
>     Engineers(pid, eid, language)
>
> Let's say there's a single search on the webpage, and users can enter queries like:
>
> 1) engineer.language:Java OR manager.level:5
> 2) name:john
> 3) engineer.language:python
>
> The pseudo-SQL for those queries should be something like:
>
> 1) select * from persons left join engineers on pid=pid left join managers on pid=pid and full_team=0 where engineers.language='Java' or managers.level=5
> 2) select * from persons where name='john'
> 3) select * from persons left join engineers using (pid) where engineers.language='python'
>
> Note that for (1), it has the additional full_team=0 condition as part of the join itself. So, depending on the user's query, we need to join to different tables. What I meant by eagerloading: it may or may not need to select columns from those tables depending on the user's view (perhaps they have the 'engineer.language' field turned on, so we'd always join to Engineers regardless, to avoid issuing lots of sub-queries). For the api, it'd be nice to do something like:
>
>     session.query(Person).filter(Manager.level==5).filter(Person.is_active==1)
>
> And the orm uses the join conditions we defined elsewhere (probably on the mapper?). Right now, it'll join to the tables it needs, but it won't put in the join conditions. I know it's in there somewhere, otherwise it couldn't do the subquery (...right?). Another catch I'm seeing is how to define those additional join conditions (use select_table with a custom condition?). It looks like its magically picking up the foreign key references between Managers/Engineers and Persons; I just wish I knew how to tell it that explicitly (since some things
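The full_team=0 condition from query (1) can be placed in the join's on-clause itself at the expression level, which is the distinction being asked about (ON clause versus WHERE clause). A hedged sketch, using the same hypothetical schema and the modern select() construct rather than the 0.4-era Query API:

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, select, or_, and_)

metadata = MetaData()
persons = Table("persons", metadata,
                Column("pid", Integer, primary_key=True),
                Column("name", String(100)))
engineers = Table("engineers", metadata,
                  Column("pid", Integer, ForeignKey("persons.pid"), primary_key=True),
                  Column("language", String(50)))
managers = Table("managers", metadata,
                 Column("pid", Integer, ForeignKey("persons.pid"), primary_key=True),
                 Column("level", Integer),
                 Column("full_team", Integer))

# the full_team=0 restriction goes into the ON clause, not the WHERE clause,
# so manager rows with full_team != 0 simply fall out of the outer join as NULLs
j = persons.outerjoin(engineers).outerjoin(
        managers,
        and_(persons.c.pid == managers.c.pid, managers.c.full_team == 0))
stmt = select(persons).select_from(j).where(
        or_(engineers.c.language == "Java", managers.c.level == 5))
print(stmt)
```

Passing the explicit on-clause overrides the foreign-key-derived condition, which is one way to "tell it that explicitly".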
[sqlalchemy] Re: Polymorphic and inheritance with missing children
On Feb 7, 2008, at 9:28 PM, Richard Levasseur wrote:

> Ok, so I've been looking into getting it to work when the child record doesn't exist. I haven't had much luck. I wrote a stub mapper extension for translate_row and populate_instance, but it doesn't seem to be called when it goes to fetch the information for the child row.
>
>     mapper(Parent, parents, polymorphic...)
>     mapper(Child, children, extension=MyExtension(def translate_row, def populate_instance))
>     query(Parent)
>     # "my translate_row called"
>     # "my populate_instance called"
>
> In mapper.py:_get_poly_select_loader, the first inline post_execute is called to fetch the data:
>
>     row = selectcontext.session.connection(self).execute(statement, params).fetchone()
>     self.populate_instance(selectcontext, instance, row, isnew=False, instancekey=identitykey, ispostselect=True)
>
> Shouldn't translate_row be called before then? In the working case (child exists), neither of my extension methods is called either. It makes sense that they would only be called once per result, but in this case a single result is actually 2 records fetched separately.

Can you supply a more specific example of how this schema works? Also, if a single result is a 2-record fetch, that's going to be extremely difficult to do; I'd want to look into perhaps supplying an alternate mapping for the schema.

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
[sqlalchemy] Re: Polymorphic and inheritance with missing children
On Feb 8, 2008 7:04 AM, Michael Bayer [EMAIL PROTECTED] wrote:

> On Feb 7, 2008, at 9:28 PM, Richard Levasseur wrote:
>
>> Ok, so I've been looking into getting it to work when the child record doesn't exist. I haven't had much luck. I wrote a stub mapper extension for translate_row and populate_instance, but it doesn't seem to be called when it goes to fetch the information for the child row. [...] Shouldn't translate_row be called before then? In the working case (child exists), neither of my extension methods is called either. It makes sense that they would only be called once per result, but in this case a single result is actually 2 records fetched separately.
>
> Can you supply a more specific example of how this schema works? Also, if a single result is a 2-record fetch, that's going to be extremely difficult to do; I'd want to look into perhaps supplying an alternate mapping for the schema.

Ok, so I'm talking about 2 slightly different things at once. One is that sqlalchemy doesn't call the mapper extensions when loading the data from the inherited tables. The second is that it can't load instances unless a record in the inherited table exists.

One: By "2 record fetch" I meant: a single instance of Child requires one record to be fetched from `parents` and another record to be fetched from `kids`, and sqlalchemy is correctly doing this with the inherited tables. What I'm pointing out is that, when it fetches the record from the child table, it doesn't run the mapper extensions (unless I'm doing it wrong).
1) session.query(Parent).get(id)
2) select ... from parents ...
3) translate_row and translate_row extensions
4) populate_instance and populate_instance extensions
5) I'm polymorphic and am a Child, so query kids: select ... from kids ...
6) populate_instance (with the data from kids), but extensions aren't run. (I don't see any calls to translate_row?)
7) return instance

This is my mapper extension:

    class CustomMapper(sqlalchemy.orm.MapperExtension):
        def translate_row(self, context, row):
            print "translate row called"
            return row

        def populate_instance(self, mapper, context, row, instance, **flags):
            print "populate instance called"
            return sqlalchemy.orm.EXT_CONTINUE

    mapper(Child, kids_table, inherits=Parent, extension=CustomMapper())

***

Two: I've attached an example, essentially this:

    Table("parents", Column("id"), Column("type"))
    Table("kids", Column("child_id"), Column("parent_id", FK("parents.id")))

    engine.execute("insert into parents (id, type) values (1, 'child')")
    # not inserting child record
    session.query(Parent).get(1)  # gives an error, NoneType not iterable; the row is None

Note that no record exists in `kids`; this is intentional. It isn't always guaranteed to be there because of the way the system behaves (the workflow engine may change the type outside the application). It works in our current app (custom PHP) because when we save the record, we check if the `child_id` is null: if it is, we insert, otherwise we update. During loading, if the `child_id` is null, we just set everything else to null.
    from sqlalchemy import create_engine
    from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
    from sqlalchemy.orm import mapper, sessionmaker

    __metaclass__ = type

    metadata = MetaData()
    engine = create_engine("sqlite:///:memory:", echo=True)
    Session = sessionmaker(bind=engine, autoflush=False, transactional=True)

    parents = Table("parents", metadata,
                    Column("id", Integer, primary_key=True),
                    Column("type", String(100)))

    kids = Table("kids", metadata,
                 Column("pid", Integer, ForeignKey("parents.id")),
                 Column("cid", Integer, primary_key=True),
                 )

    metadata.create_all(engine)

    class Parent:
        def __repr__(self):
            return "%s(%s, %s)" % (self.__class__.__name__, self.id, self.type)

    class Child(Parent):
        pass

    mapper(Parent, parents, polymorphic_on=parents.c.type, polymorphic_identity="parent")
    mapper(Child, kids, inherits=Parent, polymorphic_identity="child")

    # Insert a Child,
[sqlalchemy] Re: Polymorphic and inheritance with missing children
On Feb 8, 2008 11:27 AM, Michael Bayer [EMAIL PROTECTED] wrote:

> On Feb 8, 2008, at 2:01 PM, Richard Levasseur wrote:
>
>> Ok, so I'm talking about 2 slightly different things at once. One is that sqlalchemy doesn't call the mapper extensions when loading the data from the inherited tables. The second is that it can't load instances unless a record in the inherited table exists.
>>
>> One: By "2 record fetch" I meant: a single instance of Child requires one record to be fetched from `parents` and another record to be fetched from `kids`, and sqlalchemy is correctly doing this with the inherited tables. What I'm pointing out is that, when it fetches the record from the child table, it doesn't run the mapper extensions (unless I'm doing it wrong).
>>
>> 1) session.query(Parent).get(id)
>> 2) select ... from parents ...
>> 3) translate_row and translate_row extensions
>> 4) populate_instance and populate_instance extensions
>> 5) I'm polymorphic and am a Child, so query kids: select ... from kids ...
>> 6) populate_instance (with the data from kids), but extensions aren't run. (I don't see any calls to translate_row?)
>> 7) return instance
>
> That's actually correct: the second query to the child table is not a top-level ORM query, it's internal to the get() call, so we currently don't have any extension hooks there (all of that "call a second query" functionality was just introduced in 0.4). However, the whole _post_instance step where that happens is optional. In your case, I really think you want to be using a polymorphic join or union so that the mapper can load parents and children from one query. You do this by specifying the select_table argument to the mapper(), and usually it's an outerjoin among all the involved tables, or in some cases it can be a big UNION of all the separate subtables. But if your select_table only includes the base table, that will work too; it just won't populate the secondary attributes unless you did something else to make it happen (like in your extension).
> If you do that, the whole _post_instance() thing won't happen at all; your translate_row() will get the only row dealt with, as will your populate_instance(). So you *can* in theory issue a second SQL query within your own populate_instance() call that simulates what _post_instance() does, if you wanted to. Or I would think you could just load everything in one shot here using a series of outerjoins among all the tables; there's an example in the docs and in the examples/polymorphic folder illustrating how to load in that way.
>
>> Two: I've attached an example, essentially this:
>>
>>     Table("parents", Column("id"), Column("type"))
>>     Table("kids", Column("child_id"), Column("parent_id", FK("parents.id")))
>>
>>     engine.execute("insert into parents (id, type) values (1, 'child')")
>>     # not inserting child record
>>     session.query(Parent).get(1)  # gives an error, NoneType not iterable; the row is None
>>
>> Note that no record exists in `kids`; this is intentional. It isn't always guaranteed to be there because of the way the system behaves (the workflow engine may change the type outside the application). It works in our current app (custom PHP) because when we save the record, we check if the `child_id` is null: if it is, we insert, otherwise we update. During loading, if the `child_id` is null, we just set everything else to null.
>
> Yup, works if you do it like this:
>
>     mapper(Parent, parents, polymorphic_on=parents.c.type,
>            polymorphic_identity="parent",
>            select_table=parents.outerjoin(kids),
>            primary_key=[parents.c.id])
>     mapper(Child, kids, inherits=Parent, polymorphic_identity="child")
>
> Note the primary_key=[parents.c.id] there, which is to override the fact that parents.outerjoin(kids) in fact has a primary key of [parents.c.id, kids.c.cid].

Hm, we could do that, but that means that we have to outerjoin to ~15 other tables. The primary table has ~200+ columns on it, each child table has 10-20 columns, there are over a million records, and our base filter criteria can be a bit complex.
It's indexed and such, but the query needs to run in 0.7 seconds (about how fast it goes on dev). I have concerns it won't perform as well (we tried that and it really killed performance, but that was prior to a lot of performance improvements we've made, so it might be feasible again). I'll see how it goes. Thanks :)

What we do now is figure out what tables to join with based upon the user and other conditions. We end up joining to 2 or 3 tables instead of all tables. I'm not sure how to do this transparently (so we can still do session.query(Parent) and add arbitrary pieces) in sqlalchemy, though. The table it selects from is defined at mapper time, and if I understand correctly, we shouldn't be re-mapping classes on-the-fly. What I'm currently thinking (if joining across them all is too slow) is writing a mapper extension that does the necessary logic so that the base session.query(Parent) is actually
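The primary_key=[parents.c.id] override mentioned earlier can be seen directly from the expression layer. In this sketch (mirroring the attached parents/kids tables), the outer join's derived primary key is composite, which is exactly what the mapper argument corrects for:

```python
from sqlalchemy import MetaData, Table, Column, Integer, String, ForeignKey

metadata = MetaData()
parents = Table("parents", metadata,
                Column("id", Integer, primary_key=True),
                Column("type", String(100)))
kids = Table("kids", metadata,
             Column("pid", Integer, ForeignKey("parents.id")),
             Column("cid", Integer, primary_key=True))

j = parents.outerjoin(kids)
# the joined selectable collects primary key columns from both sides,
# so the mapper has to be told that parents.id alone identifies a row
print([str(c) for c in j.primary_key])
```

Without the override, the ORM would treat (parents.id, kids.cid) as the identity, and a parent row with no kid row (cid NULL) would not map cleanly.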
[sqlalchemy] Re: Polymorphic and inheritance with missing children
On Feb 7, 2008, at 1:33 PM, Richard Levasseur wrote:

> For the syntax of it, I'd go for something like `child_obj.change_from(parent_obj)` or `child_obj = ChildClass.create_from(parent_obj)`, perhaps implicitly invoked when you do child_obj.type = parent_obj. Not sure about that last part; it doesn't seem pythonic, and feels misleading. The point is to allow the child the ability to perform custom operations if need be (__change_type__?). Is the `session` part really necessary (I thought objects knew what session, if any, they were attached to?). I don't recall exactly if this is possible, but perhaps the child can have its __dict__ set to the parent's? That'd preserve any state and wouldn't require any loop-copy of attributes.

The session part is introduced only in that we try not to attach lots of methods and accessors to user-defined classes. We like to leave user-defined classes as unmodified as possible, which at the very least helps with the possibility of name collisions (like, if someone's SA app has a change_from method already, this change would break their app). Since everything else comes from session (like expunge(), refresh(), etc.), this is where we like to put these things, for consistency.

However, thinking further about this, I think the way we'd actually do this would be the same way we do a primary key switch: the mapper already knows how to detect the delete of one instance and the insert of another, both with the same primary key attributes, and turn it into an update. So I would see this as an extension to that functionality, i.e.:

    parent = sess.query(Parent).get(1)
    child = Child(id=1)
    child.someattr = parent.someattr  # copy attributes
    sess.delete(parent)
    sess.save(child)
    sess.flush()
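The attribute-copy step in that primary key switch can be sketched in plain Python. Here Parent and Child are bare stand-in classes and the attribute list is illustrative (in a real mapping it could come from table.c, as suggested earlier in the thread):

```python
class Parent(object):
    def __init__(self, id=None, name=None):
        self.id = id
        self.name = name

class Child(Parent):
    pass

def switch_type(parent, child_cls, attrs):
    """Build a child carrying over the parent's shared attributes,
    keeping the same primary key so the flush becomes an UPDATE."""
    child = child_cls()
    for name in attrs:
        setattr(child, name, getattr(parent, name))
    return child

parent = Parent(id=1, name="ann")
child = switch_type(parent, Child, ["id", "name"])
print(child.id, child.name)  # prints: 1 ann
```

The delete/save/flush calls from the message above would then surround this copy in the ORM version.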
[sqlalchemy] Re: Polymorphic and inheritance with missing children
On Feb 6, 7:01 pm, Michael Bayer [EMAIL PROTECTED] wrote:

> We don't yet support a seamless change of inheriting type. Your best bet is to issue the SQL to the table directly which changes the polymorphic value to the new value, then inserting/deleting from the child tables as needed... then reload the object into your session (i.e. expunge() the old one first).

Ok, I guess I'll play around with that at work a bit.

> The reason it's not yet supported is because the extra steps of INSERTing and/or DELETEing, in conjunction with what would normally just be an UPDATE, are a little complex... also we'd have to pick an API for this (like, myobject = session.change_type(myobject, FooClass)? not sure). But we'll get this in there sooner or later (sooner if it's an urgent need for people).

In our instance, it's fairly important: we have about 30 different types, half of which have child tables; the other half are within the same parent table. I still need to figure out how to load the child when its record is missing, too.

For the syntax of it, I'd go for something like `child_obj.change_from(parent_obj)` or `child_obj = ChildClass.create_from(parent_obj)`, perhaps implicitly invoked when you do child_obj.type = parent_obj. Not sure about that last part; it doesn't seem pythonic, and feels misleading. The point is to allow the child the ability to perform custom operations if need be (__change_type__?). Is the `session` part really necessary (I thought objects knew what session, if any, they were attached to?). I don't recall exactly if this is possible, but perhaps the child can have its __dict__ set to the parent's? That'd preserve any state and wouldn't require any loop-copy of attributes.
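The "issue the SQL to the table directly" route can be sketched with plain DB-API calls; sqlite3 is used here purely for illustration, and the table names follow the attached parents/kids example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE kids (pid INTEGER REFERENCES parents(id),
                       cid INTEGER PRIMARY KEY);
    INSERT INTO parents (id, type) VALUES (1, 'parent');
""")

# change the polymorphic discriminator and create the now-required child row
# in one transaction; afterwards the stale object would be expunge()d from
# the session and reloaded
with conn:
    conn.execute("UPDATE parents SET type = 'child' WHERE id = 1")
    conn.execute("INSERT INTO kids (pid) VALUES (1)")

print(conn.execute("SELECT type FROM parents WHERE id = 1").fetchone()[0])
# prints: child
```

Changing a child back to a plain parent would be the mirror image: UPDATE the discriminator and DELETE the child row in the same transaction.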
[sqlalchemy] Re: Polymorphic and inheritance with missing children
Ok, so I've been looking into getting it to work when the child record doesn't exist. I haven't had much luck. I wrote a stub mapper extension for translate_row and populate_instance, but it doesn't seem to be called when it goes to fetch the information for the child row.

    mapper(Parent, parents, polymorphic...)
    mapper(Child, children, extension=MyExtension(def translate_row, def populate_instance))
    query(Parent)
    # "my translate_row called"
    # "my populate_instance called"

In mapper.py:_get_poly_select_loader, the first inline post_execute is called to fetch the data:

    row = selectcontext.session.connection(self).execute(statement, params).fetchone()
    self.populate_instance(selectcontext, instance, row, isnew=False, instancekey=identitykey, ispostselect=True)

Shouldn't translate_row be called before then? In the working case (child exists), neither of my extension methods is called either. It makes sense that they would only be called once per result, but in this case a single result is actually 2 records fetched separately.