[sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management
I'm developing an application using the ORM, and I am getting into all sorts of trouble with what I think should be a pretty simple data model. I've tried following the ORM tutorial from the docs, but it seems to get me confused every time I try. So I'm looking for something else that maybe takes a different approach. Or maybe someone can point out what's wrong for me in the following - but even then, pointers to other tutorial material would be useful, as I don't really want to end up pestering the list every time I hit an issue :-) My data model is fairly straightforward. I have a Package class, with a name. I then have a Release class - each Release is associated with a single package, and has a unique version. Releases have a few children - Classifiers, Dependencies, and URLs. All of these are multi-valued with no natural key (at least, not one worth worrying about). There is some extra data associated with Releases and URLs, but that's not too important. (People may recognise this as the PyPI data model). This is a pretty trivial parent/child one-to-many hierarchy, and I didn't expect it to be hard to model. The obvious (to me!) model is basically (trimmed down a bit): class Package(Base): __tablename__ = 'packages' # Use a synthetic key, as package names can change id = Column(Integer, primary_key=True) name = Column(String, unique=True, nullable=False) releases = relationship(Release, backref=package, cascade=all, delete-orphan) class Release(Base): __tablename__ = 'releases' id = Column(Integer, primary_key=True) package_id = Column(Integer, ForeignKey('packages.id'), nullable=False) version = Column(String, nullable=False) # Unique within package classifiers = relationship(Classifier, backref=release, cascade=all, delete-orphan) class Classifier(Base): __tablename__ = 'classifiers' id = Column(Integer, primary_key=True) release_id = Column(Integer, ForeignKey('releases.id'), nullable=False) classifier = Column(String, nullable=False) So far, so good. But if I want to create a new Release, things get messy. This is my basic function: def new_release(package, version, data): r = Release(version) r.package = Package(package) # Populate the data in r, and create child items return r It's that package reference that messes me up. If the release is for a new package, then when I merge the release into the session, the package is created. But if it's for an existing package, a new package is created (which gives a constraint error if the package name is unique, and duplicates if it's not) rather than the session recognising that it's an existing package and linking the release to it. If I was working at the SQL core level, I'd expect to have to query for the package and create it if it wasn't there. But I'm not sure I can do that with a session, as things get cached in memory by the unit of work stuff, and I don't know if a query for the release could miss a package that's pending insertion, or if the query could cause auto-flushing which might cause other issues (either with performance or integrity). Because the session is managing the in-memory caching and the transaction management by magic, I don't want to interfere with its mechanisms any more than I have to. If I have to keep track of what's gone to the database, and query for existing instances and manage the transactions, I probably should just use the SQL layer directly (I have a lot of experiences with databases, but very little with ORMs, so pure DB code isn't too scary for me, but on the other hand I don't know what benefits the ORM should be giving me that I'm not seeing). Is this an application that doesn't actually benefit from the ORM? Or am I using it wrongly, and my problems come from misunderstanding the way it should be used? As I say, I've a lot of database experience but very little with ORMs, so maybe I have an unbalanced view of how much data management the ORM should be able to handle for me. The particular problem here is what's affecting me right now - but I'd be even more interested in a good ORM for experienced SQL developers tutorial that tells me how the ORM differs from the core level (and where its benefits lie). Thanks, Paul -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management
It sounds to me like the problem you're having has to do with how you are getting the reference to the package, which isn't shown in your example. How are you getting it? The session doesn't do anything by magic, even if it seems that way sometimes. It just manages things behind the scenes. If you want a new Package, create a new Package object. If you want an exisiting package, query for it. Just like you would in bare SQL code. If you don't know which you need, try searching and create it if it isn't found. I usually do something like: try: pkg = session.query(Package).filter(condition-goes-here) except sa.orm.exc.NotFound: pkg = Package() # populate attributes and add to session This is off the top of my head and from memory, so beware of possible (probable) errors, but hopefully you get the idea. Unfortunately (fortunately?) the SQLAlchemy docs are good enough that there isn't a huge impetus for people outside the project to write tutorials, blog posts, etc. I'd like to see more of that kind of thing as well, but everyone has limited time, I guess. Good luck! On Fri, Jan 10, 2014 at 7:44 AM, Paul Moore p.f.mo...@gmail.com wrote: I'm developing an application using the ORM, and I am getting into all sorts of trouble with what I think should be a pretty simple data model. I've tried following the ORM tutorial from the docs, but it seems to get me confused every time I try. So I'm looking for something else that maybe takes a different approach. Or maybe someone can point out what's wrong for me in the following - but even then, pointers to other tutorial material would be useful, as I don't really want to end up pestering the list every time I hit an issue :-) My data model is fairly straightforward. I have a Package class, with a name. I then have a Release class - each Release is associated with a single package, and has a unique version. Releases have a few children - Classifiers, Dependencies, and URLs. All of these are multi-valued with no natural key (at least, not one worth worrying about). There is some extra data associated with Releases and URLs, but that's not too important. (People may recognise this as the PyPI data model). This is a pretty trivial parent/child one-to-many hierarchy, and I didn't expect it to be hard to model. The obvious (to me!) model is basically (trimmed down a bit): class Package(Base): __tablename__ = 'packages' # Use a synthetic key, as package names can change id = Column(Integer, primary_key=True) name = Column(String, unique=True, nullable=False) releases = relationship(Release, backref=package, cascade=all, delete-orphan) class Release(Base): __tablename__ = 'releases' id = Column(Integer, primary_key=True) package_id = Column(Integer, ForeignKey('packages.id'), nullable=False) version = Column(String, nullable=False) # Unique within package classifiers = relationship(Classifier, backref=release, cascade=all, delete-orphan) class Classifier(Base): __tablename__ = 'classifiers' id = Column(Integer, primary_key=True) release_id = Column(Integer, ForeignKey('releases.id'), nullable=False) classifier = Column(String, nullable=False) So far, so good. But if I want to create a new Release, things get messy. This is my basic function: def new_release(package, version, data): r = Release(version) r.package = Package(package) # Populate the data in r, and create child items return r It's that package reference that messes me up. If the release is for a new package, then when I merge the release into the session, the package is created. But if it's for an existing package, a new package is created (which gives a constraint error if the package name is unique, and duplicates if it's not) rather than the session recognising that it's an existing package and linking the release to it. If I was working at the SQL core level, I'd expect to have to query for the package and create it if it wasn't there. But I'm not sure I can do that with a session, as things get cached in memory by the unit of work stuff, and I don't know if a query for the release could miss a package that's pending insertion, or if the query could cause auto-flushing which might cause other issues (either with performance or integrity). Because the session is managing the in-memory caching and the transaction management by magic, I don't want to interfere with its mechanisms any more than I have to. If I have to keep track of what's gone to the database, and query for existing instances and manage the transactions, I probably should just use the SQL layer directly (I have a lot of experiences with databases, but very little with ORMs, so pure DB code isn't too scary for me, but on the other hand I don't know what benefits the ORM should be giving me that I'm not seeing). Is this an application that doesn't
Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management
On Friday, 10 January 2014 16:27:12 UTC, Kevin H wrote: It sounds to me like the problem you're having has to do with how you are getting the reference to the package, which isn't shown in your example. How are you getting it? The new_release() function is what I do - I create a new Package object. I was under the impression that when I did a session.merge() that would match up existing objects. I'm probably being naive in that, though... The session doesn't do anything by magic, even if it seems that way sometimes. It just manages things behind the scenes. Hmm - managing things behind the scenes is what I was meaning by magic. The problem is that I can't find any clear reference about what is, and what is not, managed behind the scenes. As I said, it's more about my understanding (and hence about me knowing what code I need to write) than about SQLAlchemy doing anything wrong. If you want a new Package, create a new Package object. If you want an exisiting package, query for it. Just like you would in bare SQL code. If you don't know which you need, try searching and create it if it isn't found. I usually do something like: try: pkg = session.query(Package).filter(condition-goes-here) except sa.orm.exc.NotFound: pkg = Package() # populate attributes and add to session OK. That's essentially what I was hoping to avoid. Largely because of that query - I may not yet have committed the package to the database. For a more extended example, suppose I do the following: p = Package(pip) session.merge(p) # I could do session.add here, but I'm not 100% clear why just doing merge isn't better in case I'm not sure if pip is already present Somewhere a little later, in other code where I haven't committed yet, but I don't have a reference to p available: r = Release(1.5) r.package = Package(pip) Can I query for pip here? There's been no commit yet, and there may not even have been a flush (I've had problems with null foreign keys so I've had to force autoflush off in a few places). Essentially, will a query locate an object that's in session.new but which hasn't been flushed to the database yet? This is the crux of my issue. I really don't understand why I'm getting null foreign keys on autoflush, but switching autoflush off seems to fix it. But maybe that's what's giving me these issues, so maybe I need to turn autoflush back on. But then what do I do about my noll FKs? Unfortunately (fortunately?) the SQLAlchemy docs are good enough that there isn't a huge impetus for people outside the project to write tutorials, blog posts, etc. I'd like to see more of that kind of thing as well, but everyone has limited time, I guess. Without wishing to seem critical, I find the ORM docs pretty difficult to follow. They seem to jump around between schema design (DDL) and usage (select and DML) in a pretty haphazard fashion, and the information about transactional control and session management seems to be spread around the various sections. That's basically just my perspective, and may reflect my experience, but it is frustrating. Agreed entirely about people having limited time, and the docs are certainly far better than a lot that's around. Maybe I'll just have to have a thorough read of the docs before I carry on coding. Feels like I'm making more problems than I'm solving right now. Or maybe stick to what I know and drop back to the SQL core stuff. Thanks for the help, Paul -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management
On Fri, Jan 10, 2014 at 10:55 AM, Paul Moore p.f.mo...@gmail.com wrote: On Friday, 10 January 2014 16:27:12 UTC, Kevin H wrote: It sounds to me like the problem you're having has to do with how you are getting the reference to the package, which isn't shown in your example. How are you getting it? The new_release() function is what I do - I create a new Package object. I was under the impression that when I did a session.merge() that would match up existing objects. I'm probably being naive in that, though... I don't think merge() does what you think it does. (Beware...I don't ever really use it, so I am not an expert in it's use). My understanding is that merge takes an input instance, looks it up or creates a new instance based on the primary key of the input instance, and copies data from the input instance to the lookedup/created one, and then returns that. If your input instance doesn't have a primary key...well, that doesn't seem to make sense and I really don't know what happens. maybe this is the root of your problem? Personally, I stay away from merge() if at all possible, since it can be way more complex than I want to deal with. I would recommend trying to do things in the simplest way you can get away with, and see whether that can help you increase your understanding. Try using add() and see if that helps you grok a bit more. The session doesn't do anything by magic, even if it seems that way sometimes. It just manages things behind the scenes. Hmm - managing things behind the scenes is what I was meaning by magic. The problem is that I can't find any clear reference about what is, and what is not, managed behind the scenes. As I said, it's more about my understanding (and hence about me knowing what code I need to write) than about SQLAlchemy doing anything wrong. If you want a new Package, create a new Package object. If you want an exisiting package, query for it. Just like you would in bare SQL code. If you don't know which you need, try searching and create it if it isn't found. I usually do something like: try: pkg = session.query(Package).filter(condition-goes-here) except sa.orm.exc.NotFound: pkg = Package() # populate attributes and add to session OK. That's essentially what I was hoping to avoid. Largely because of that query - I may not yet have committed the package to the database. I don't think it has to be committed yet, it just has to exist in the session. See below. For a more extended example, suppose I do the following: p = Package(pip) session.merge(p) # I could do session.add here, but I'm not 100% clear why just doing merge isn't better in case I'm not sure if pip is already present Because p doesn't have a primary key yet? I'm kind of guessing...but I really think you need a pkey on the input object to merge(). Somewhere a little later, in other code where I haven't committed yet, but I don't have a reference to p available: r = Release(1.5) r.package = Package(pip) Can I query for pip here? There's been no commit yet, and there may not even have been a flush (I've had problems with null foreign keys so I've had to force autoflush off in a few places). Essentially, will a query locate an object that's in session.new but which hasn't been flushed to the database yet? If you use session.add() then yes, you can query for pip there. The ORM will find it. On Postgres, at least, it seems that when you add it to the session, it grabs a value from the sequence used for the primary key, but doesn't yet add the actual row (in this case...other cases might be more complex) for the package. So the session knows what the primary key will be once you commit the package, and can use that to find the package. At least that's how it looks like it works, and how I think about it. I don't claim to know what it actually does. This is the crux of my issue. I really don't understand why I'm getting null foreign keys on autoflush, but switching autoflush off seems to fix it. But maybe that's what's giving me these issues, so maybe I need to turn autoflush back on. But then what do I do about my noll FKs? I'm not sure what to say about this. Probably we're hitting the limits of my understanding here. Unfortunately (fortunately?) the SQLAlchemy docs are good enough that there isn't a huge impetus for people outside the project to write tutorials, blog posts, etc. I'd like to see more of that kind of thing as well, but everyone has limited time, I guess. Without wishing to seem critical, I find the ORM docs pretty difficult to follow. They seem to jump around between schema design (DDL) and usage (select and DML) in a pretty haphazard fashion, and the information about transactional control and session management seems to be spread around the various sections. That's basically just my perspective, and may reflect my experience, but it is frustrating. Agreed entirely about
Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management
On Jan 10, 2014, at 8:44 AM, Paul Moore p.f.mo...@gmail.com wrote: So far, so good. But if I want to create a new Release, things get messy. This is my basic function: def new_release(package, version, data): r = Release(version) r.package = Package(package) # Populate the data in r, and create child items return r It's that package reference that messes me up. If the release is for a new package, then when I merge the release into the session, the package is created. But if it's for an existing package, a new package is created (which gives a constraint error if the package name is unique, and duplicates if it's not) rather than the session recognising that it's an existing package and linking the release to it. If I was working at the SQL core level, I'd expect to have to query for the package and create it if it wasn't there. But I'm not sure I can do that with a session, as things get cached in memory by the unit of work stuff, and I don't know if a query for the release could miss a package that's pending insertion, or if the query could cause auto-flushing which might cause other issues (either with performance or integrity). Because the session is managing the in-memory caching and the transaction management by magic, I don't want to interfere with its mechanisms any more than I have to. If I have to keep track of what's gone to the database, and query for existing instances and manage the transactions, I probably should just use the SQL layer directly (I have a lot of experiences with databases, but very little with ORMs, so pure DB code isn't too scary for me, but on the other hand I don't know what benefits the ORM should be giving me that I'm not seeing). there’s various patterns for dealing with the very common issue of “create unique object if not exists, else use the current one”. One that I frequently point to is the unique object recipe: http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject uniqueobject is a generalization. You can of course just run the specific query yourself. this is not in any way interfering with the ORM mechanisms - the unit of work is pretty much a simple identity dictionary in this sense, either a key exists for a given identity or it doesn’t (but keep in mind, “identity” to the unit of work means, “primary key identity”. Not any of the other columns that are just part of unique constraints). As mentioned elsewhere, Session.merge() kind of does this as well, but merge() only does so with the primary key of the instance, not for arbitrary fields, and I would agree that merge() is often more trouble than it’s worth - its complexity is that it can cascade the merge operation along relationships, not often needed for simple cases. signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management
On Friday, 10 January 2014 17:52:45 UTC, Michael Bayer wrote: there’s various patterns for dealing with the very common issue of “create unique object if not exists, else use the current one”. One that I frequently point to is the unique object recipe: OK, so looking at that seems to imply that pkg = Package(name=pip) session.add(pkg) is_it_there = session.query(Package).filter(Package.name == 'pip').first() assert is_it_there is pkg will fail the assertion (unless there is a flush). Otherwise, I don't see why the unique cache is needed in the recipe you reference. I think I'm still missing something here... Paul. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management
On Jan 10, 2014, at 1:42 PM, Paul Moore p.f.mo...@gmail.com wrote: On Friday, 10 January 2014 17:52:45 UTC, Michael Bayer wrote: there’s various patterns for dealing with the very common issue of “create unique object if not exists, else use the current one”. One that I frequently point to is the unique object recipe: OK, so looking at that seems to imply that pkg = Package(name=pip) session.add(pkg) is_it_there = session.query(Package).filter(Package.name == 'pip').first() assert is_it_there is pkg will fail the assertion (unless there is a flush). if autoflush is turned on, it will pass the assertion. the Session is flushed before any SQL query when autoflush is left at its default of True. So first pkg goes in via INSERT, the SELECT is emitted, locates the row with the identity of pkg, locates “pkg” itself in the identity map, and returns it. Otherwise, I don't see why the unique cache is needed in the recipe you reference. the cache isn’t needed but is nice so that many objects all doing the same thing don’t have to keep hitting the database for an object that’s already local in memory. keep in mind, the Query always emits SQL, except in the case of the get() method which will look locally in the identity map first. Otherwise, as rows are returned from the SQL statement, the primary key identity of each row is matched to whatever objects are already in the identity map. That’s how the identity map works. You still get the SQL being emitted in most cases so it’s not quite a “cache”. More detail on this here: http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#is-the-session-a-cache signature.asc Description: Message signed with OpenPGP using GPGMail