[sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management

2014-01-10 Thread Paul Moore
I'm developing an application using the ORM, and I am getting into all 
sorts of trouble with what I think should be a pretty simple data model. 
I've tried following the ORM tutorial from the docs, but it seems to get me 
confused every time I try. So I'm looking for something else that maybe 
takes a different approach. Or maybe someone can point out what's wrong for 
me in the following - but even then, pointers to other tutorial material 
would be useful, as I don't really want to end up pestering the list every 
time I hit an issue :-)

My data model is fairly straightforward. I have a Package class, with a 
name. I then have a Release class - each Release is associated with a 
single package, and has a unique version. Releases have a few children - 
Classifiers, Dependencies, and URLs. All of these are multi-valued with no 
natural key (at least, not one worth worrying about). There is some extra 
data associated with Releases and URLs, but that's not too important. 
(People may recognise this as the PyPI data model). This is a pretty 
trivial parent/child one-to-many hierarchy, and I didn't expect it to be 
hard to model.

The obvious (to me!) model is basically (trimmed down a bit):

class Package(Base):
__tablename__ = 'packages'
# Use a synthetic key, as package names can change
id = Column(Integer, primary_key=True)
name = Column(String, unique=True, nullable=False)
releases = relationship(Release, backref=package,
cascade=all, delete-orphan)

class Release(Base):
__tablename__ = 'releases'
id = Column(Integer, primary_key=True)
package_id = Column(Integer, ForeignKey('packages.id'), nullable=False)
version = Column(String, nullable=False) # Unique within package
classifiers = relationship(Classifier, backref=release,
cascade=all, delete-orphan)

class Classifier(Base):
__tablename__ = 'classifiers'
id = Column(Integer, primary_key=True)
release_id = Column(Integer, ForeignKey('releases.id'), nullable=False)
classifier = Column(String, nullable=False)

So far, so good. But if I want to create a new Release, things get messy. 
This is my basic function:

def new_release(package, version, data):
r = Release(version)
r.package = Package(package)
# Populate the data in r, and create child items
return r

It's that package reference that messes me up. If the release is for a new 
package, then when I merge the release into the session, the package is 
created. But if it's for an existing package, a new package is created 
(which gives a constraint error if the package name is unique, and 
duplicates if it's not) rather than the session recognising that it's an 
existing package and linking the release to it.

If I was working at the SQL core level, I'd expect to have to query for the 
package and create it if it wasn't there. But I'm not sure I can do that 
with a session, as things get cached in memory by the unit of work stuff, 
and I don't know if a query for the release could miss a package that's 
pending insertion, or if the query could cause auto-flushing which might 
cause other issues (either with performance or integrity). Because the 
session is managing the in-memory caching and the transaction management 
by magic, I don't want to interfere with its mechanisms any more than I 
have to. If I have to keep track of what's gone to the database, and query 
for existing instances and manage the transactions, I probably should just 
use the SQL layer directly (I have a lot of experiences with databases, but 
very little with ORMs, so pure DB code isn't too scary for me, but on the 
other hand I don't know what benefits the ORM should be giving me that I'm 
not seeing).

Is this an application that doesn't actually benefit from the ORM? Or am I 
using it wrongly, and my problems come from misunderstanding the way it 
should be used? As I say, I've a lot of database experience but very little 
with ORMs, so maybe I have an unbalanced view of how much data management 
the ORM should be able to handle for me.

The particular problem here is what's affecting me right now - but I'd be 
even more interested in a good ORM for experienced SQL developers 
tutorial that tells me how the ORM differs from the core level (and where 
its benefits lie).

Thanks,
Paul

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management

2014-01-10 Thread Kevin Horn
It sounds to me like the problem you're having has to do with how you are
getting the reference to the package, which isn't shown in your example.
 How are you getting it?

The session doesn't do anything by magic, even if it seems that way
sometimes.  It just manages things behind the scenes.

If you want a new Package, create a new Package object.  If you want an
exisiting package, query for it.  Just like you would in bare SQL code.

If you don't know which you need, try searching and create it if it isn't
found.  I usually do something like:

try:
pkg = session.query(Package).filter(condition-goes-here)
except sa.orm.exc.NotFound:
pkg = Package()
# populate attributes and add to session

This is off the top of my head and from memory, so beware of possible
(probable) errors, but hopefully you get the idea.

Unfortunately (fortunately?) the SQLAlchemy docs are good enough that there
isn't a huge impetus for people outside the project to write tutorials,
blog posts, etc.  I'd like to see more of that kind of thing as well, but
everyone has limited time, I guess.

Good luck!



On Fri, Jan 10, 2014 at 7:44 AM, Paul Moore p.f.mo...@gmail.com wrote:

 I'm developing an application using the ORM, and I am getting into all
 sorts of trouble with what I think should be a pretty simple data model.
 I've tried following the ORM tutorial from the docs, but it seems to get me
 confused every time I try. So I'm looking for something else that maybe
 takes a different approach. Or maybe someone can point out what's wrong for
 me in the following - but even then, pointers to other tutorial material
 would be useful, as I don't really want to end up pestering the list every
 time I hit an issue :-)

 My data model is fairly straightforward. I have a Package class, with a
 name. I then have a Release class - each Release is associated with a
 single package, and has a unique version. Releases have a few children -
 Classifiers, Dependencies, and URLs. All of these are multi-valued with no
 natural key (at least, not one worth worrying about). There is some extra
 data associated with Releases and URLs, but that's not too important.
 (People may recognise this as the PyPI data model). This is a pretty
 trivial parent/child one-to-many hierarchy, and I didn't expect it to be
 hard to model.

 The obvious (to me!) model is basically (trimmed down a bit):

 class Package(Base):
 __tablename__ = 'packages'
 # Use a synthetic key, as package names can change
 id = Column(Integer, primary_key=True)
 name = Column(String, unique=True, nullable=False)
 releases = relationship(Release, backref=package,
 cascade=all, delete-orphan)

 class Release(Base):
 __tablename__ = 'releases'
 id = Column(Integer, primary_key=True)
 package_id = Column(Integer, ForeignKey('packages.id'),
 nullable=False)
 version = Column(String, nullable=False) # Unique within package
 classifiers = relationship(Classifier, backref=release,
 cascade=all, delete-orphan)

 class Classifier(Base):
 __tablename__ = 'classifiers'
 id = Column(Integer, primary_key=True)
 release_id = Column(Integer, ForeignKey('releases.id'),
 nullable=False)
 classifier = Column(String, nullable=False)

 So far, so good. But if I want to create a new Release, things get messy.
 This is my basic function:

 def new_release(package, version, data):
 r = Release(version)
 r.package = Package(package)
 # Populate the data in r, and create child items
 return r

 It's that package reference that messes me up. If the release is for a new
 package, then when I merge the release into the session, the package is
 created. But if it's for an existing package, a new package is created
 (which gives a constraint error if the package name is unique, and
 duplicates if it's not) rather than the session recognising that it's an
 existing package and linking the release to it.

 If I was working at the SQL core level, I'd expect to have to query for
 the package and create it if it wasn't there. But I'm not sure I can do
 that with a session, as things get cached in memory by the unit of work
 stuff, and I don't know if a query for the release could miss a package
 that's pending insertion, or if the query could cause auto-flushing which
 might cause other issues (either with performance or integrity). Because
 the session is managing the in-memory caching and the transaction
 management by magic, I don't want to interfere with its mechanisms any
 more than I have to. If I have to keep track of what's gone to the
 database, and query for existing instances and manage the transactions, I
 probably should just use the SQL layer directly (I have a lot of
 experiences with databases, but very little with ORMs, so pure DB code
 isn't too scary for me, but on the other hand I don't know what benefits
 the ORM should be giving me that I'm not seeing).

 Is this an application that doesn't 

Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management

2014-01-10 Thread Paul Moore
On Friday, 10 January 2014 16:27:12 UTC, Kevin H wrote:

 It sounds to me like the problem you're having has to do with how you are 
 getting the reference to the package, which isn't shown in your example. 
  How are you getting it?


The new_release() function is what I do - I create a new Package object. I 
was under the impression that when I did a session.merge() that would match 
up existing objects. I'm probably being naive in that, though...
 

 The session doesn't do anything by magic, even if it seems that way 
 sometimes.  It just manages things behind the scenes.


Hmm - managing things behind the scenes is what I was meaning by magic. 
The problem is that I can't find any clear reference about what is, and 
what is not, managed behind the scenes. As I said, it's more about my 
understanding (and hence about me knowing what code I need to write) than 
about SQLAlchemy doing anything wrong.
 

 If you want a new Package, create a new Package object.  If you want an 
 exisiting package, query for it.  Just like you would in bare SQL code.

 If you don't know which you need, try searching and create it if it isn't 
 found.  I usually do something like:

 try:
 pkg = session.query(Package).filter(condition-goes-here)
 except sa.orm.exc.NotFound:
 pkg = Package()
 # populate attributes and add to session


OK. That's essentially what I was hoping to avoid. Largely because of that 
query - I may not yet have committed the package to the database.

For a more extended example, suppose I do the following:

p = Package(pip)
session.merge(p) # I could do session.add here, but I'm not 100% clear 
why just doing merge isn't better in case I'm not sure if pip is already 
present

Somewhere a little later, in other code where I haven't committed yet, but 
I don't have a reference to p available:

r = Release(1.5)
r.package = Package(pip)

Can I query for pip here? There's been no commit yet, and there may not 
even have been a flush (I've had problems with null foreign keys so I've 
had to force autoflush off in a few places). Essentially, will a query 
locate an object that's in session.new but which hasn't been flushed to the 
database yet?

This is the crux of my issue. I really don't understand why I'm getting 
null foreign keys on autoflush, but switching autoflush off seems to fix 
it. But maybe that's what's giving me these issues, so maybe I need to turn 
autoflush back on. But then what do I do about my noll FKs?

Unfortunately (fortunately?) the SQLAlchemy docs are good enough that there 
 isn't a huge impetus for people outside the project to write tutorials, 
 blog posts, etc.  I'd like to see more of that kind of thing as well, but 
 everyone has limited time, I guess.


Without wishing to seem critical, I find the ORM docs pretty difficult to 
follow. They seem to jump around between schema design (DDL) and usage 
(select and DML) in a pretty haphazard fashion, and the information about 
transactional control and session management seems to be spread around the 
various sections. That's basically just my perspective, and may reflect my 
experience, but it is frustrating. Agreed entirely about people having 
limited time, and the docs are certainly far better than a lot that's 
around.

Maybe I'll just have to have a thorough read of the docs before I carry on 
coding. Feels like I'm making more problems than I'm solving right now. Or 
maybe stick to what I know and drop back to the SQL core stuff.

Thanks for the help,
Paul

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management

2014-01-10 Thread Kevin Horn
On Fri, Jan 10, 2014 at 10:55 AM, Paul Moore p.f.mo...@gmail.com wrote:

 On Friday, 10 January 2014 16:27:12 UTC, Kevin H wrote:

 It sounds to me like the problem you're having has to do with how you are
 getting the reference to the package, which isn't shown in your example.
  How are you getting it?


 The new_release() function is what I do - I create a new Package object. I
 was under the impression that when I did a session.merge() that would match
 up existing objects. I'm probably being naive in that, though...


I don't think merge() does what you think it does. (Beware...I don't ever
really use it, so I am not an expert in it's use).  My understanding is
that merge takes an input instance, looks it up or creates a new instance
based on the primary key of the input instance, and copies data from the
input instance to the lookedup/created one, and then returns that.  If your
input instance doesn't have a primary key...well, that doesn't seem to make
sense and I really don't know what happens.  maybe this is the root of your
problem?

Personally, I stay away from merge() if at all possible, since it can be
way more complex than I want to deal with.

I would recommend trying to do things in the simplest way you can get away
with, and see whether that can help you increase your understanding.  Try
using add() and see if that helps you grok a bit more.


  The session doesn't do anything by magic, even if it seems that way
 sometimes.  It just manages things behind the scenes.

 Hmm - managing things behind the scenes is what I was meaning by magic.
 The problem is that I can't find any clear reference about what is, and
 what is not, managed behind the scenes. As I said, it's more about my
 understanding (and hence about me knowing what code I need to write) than
 about SQLAlchemy doing anything wrong.


 If you want a new Package, create a new Package object.  If you want an
 exisiting package, query for it.  Just like you would in bare SQL code.

 If you don't know which you need, try searching and create it if it isn't
 found.  I usually do something like:

 try:
 pkg = session.query(Package).filter(condition-goes-here)
 except sa.orm.exc.NotFound:
 pkg = Package()
 # populate attributes and add to session


 OK. That's essentially what I was hoping to avoid. Largely because of that
 query - I may not yet have committed the package to the database.


I don't think it has to be committed yet, it just has to exist in the
session.  See below.


 For a more extended example, suppose I do the following:

 p = Package(pip)
 session.merge(p) # I could do session.add here, but I'm not 100% clear
 why just doing merge isn't better in case I'm not sure if pip is already
 present


Because p doesn't have a primary key yet?  I'm kind of guessing...but I
really think you need a pkey on the input object to merge().


 Somewhere a little later, in other code where I haven't committed yet, but
 I don't have a reference to p available:

 r = Release(1.5)
 r.package = Package(pip)

 Can I query for pip here? There's been no commit yet, and there may not
 even have been a flush (I've had problems with null foreign keys so I've
 had to force autoflush off in a few places). Essentially, will a query
 locate an object that's in session.new but which hasn't been flushed to the
 database yet?


If you use session.add() then yes, you can query for pip there.  The ORM
will find it.  On Postgres, at least, it seems that when you add it to the
session, it grabs a value from the sequence used for the primary key, but
doesn't yet add the actual row (in this case...other cases might be more
complex) for the package.  So the session knows what the primary key will
be once you commit the package, and can use that to find the package.

At least that's how it looks like it works, and how I think about it.  I
don't claim to know what it actually does.


 This is the crux of my issue. I really don't understand why I'm getting
 null foreign keys on autoflush, but switching autoflush off seems to fix
 it. But maybe that's what's giving me these issues, so maybe I need to turn
 autoflush back on. But then what do I do about my noll FKs?


I'm not sure what to say about this.  Probably we're hitting the limits of
my understanding here.


 Unfortunately (fortunately?) the SQLAlchemy docs are good enough that
 there isn't a huge impetus for people outside the project to write
 tutorials, blog posts, etc.  I'd like to see more of that kind of thing as
 well, but everyone has limited time, I guess.


 Without wishing to seem critical, I find the ORM docs pretty difficult to
 follow. They seem to jump around between schema design (DDL) and usage
 (select and DML) in a pretty haphazard fashion, and the information about
 transactional control and session management seems to be spread around the
 various sections. That's basically just my perspective, and may reflect my
 experience, but it is frustrating. Agreed entirely about 

Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management

2014-01-10 Thread Michael Bayer

On Jan 10, 2014, at 8:44 AM, Paul Moore p.f.mo...@gmail.com wrote:

 
 So far, so good. But if I want to create a new Release, things get messy. 
 This is my basic function:
 
 def new_release(package, version, data):
 r = Release(version)
 r.package = Package(package)
 # Populate the data in r, and create child items
 return r
 
 It's that package reference that messes me up. If the release is for a new 
 package, then when I merge the release into the session, the package is 
 created. But if it's for an existing package, a new package is created (which 
 gives a constraint error if the package name is unique, and duplicates if 
 it's not) rather than the session recognising that it's an existing package 
 and linking the release to it.
 
 If I was working at the SQL core level, I'd expect to have to query for the 
 package and create it if it wasn't there. But I'm not sure I can do that with 
 a session, as things get cached in memory by the unit of work stuff, and I 
 don't know if a query for the release could miss a package that's pending 
 insertion, or if the query could cause auto-flushing which might cause other 
 issues (either with performance or integrity). Because the session is 
 managing the in-memory caching and the transaction management by magic, I 
 don't want to interfere with its mechanisms any more than I have to. If I 
 have to keep track of what's gone to the database, and query for existing 
 instances and manage the transactions, I probably should just use the SQL 
 layer directly (I have a lot of experiences with databases, but very little 
 with ORMs, so pure DB code isn't too scary for me, but on the other hand I 
 don't know what benefits the ORM should be giving me that I'm not seeing).

there’s various patterns for dealing with the very common issue of “create 
unique object if not exists, else use the current one”.  One that I frequently 
point to is the unique object recipe:

http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject

uniqueobject is a generalization.   You can of course just run the specific 
query yourself.  this is not in any way interfering with the ORM mechanisms - 
the unit of work is pretty much a simple identity dictionary in this sense, 
either a key exists for a given identity or it doesn’t (but keep in mind, 
“identity” to the unit of work means, “primary key identity”.  Not any of the 
other columns that are just part of unique constraints).

As mentioned elsewhere, Session.merge() kind of does this as well, but merge() 
only does so with the primary key of the instance, not for arbitrary fields, 
and I would agree that merge() is often more trouble than it’s worth - its 
complexity is that it can cascade the merge operation along relationships, not 
often needed for simple cases.




signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management

2014-01-10 Thread Paul Moore
On Friday, 10 January 2014 17:52:45 UTC, Michael Bayer wrote:

 there’s various patterns for dealing with the very common issue of “create 
 unique object if not exists, else use the current one”.  One that I 
 frequently point to is the unique object recipe: 


OK, so looking at that seems to imply that

pkg = Package(name=pip)
session.add(pkg)
is_it_there = session.query(Package).filter(Package.name == 'pip').first()
assert is_it_there is pkg

will fail the assertion (unless there is a flush). Otherwise, I don't see 
why the unique cache is needed in the recipe you reference.

I think I'm still missing something here...
Paul.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [sqlalchemy] Is there a good ORM tutorial? And advice on parent/child management

2014-01-10 Thread Michael Bayer

On Jan 10, 2014, at 1:42 PM, Paul Moore p.f.mo...@gmail.com wrote:

 On Friday, 10 January 2014 17:52:45 UTC, Michael Bayer wrote:
 there’s various patterns for dealing with the very common issue of “create 
 unique object if not exists, else use the current one”.  One that I 
 frequently point to is the unique object recipe: 
 
 OK, so looking at that seems to imply that
 
 pkg = Package(name=pip)
 session.add(pkg)
 is_it_there = session.query(Package).filter(Package.name == 'pip').first()
 assert is_it_there is pkg
 
 will fail the assertion (unless there is a flush).

if autoflush is turned on, it will pass the assertion.  the Session is flushed 
before any SQL query when autoflush is left at its default of True. So first 
pkg goes in via INSERT, the SELECT is emitted, locates the row with the 
identity of pkg, locates “pkg” itself in the identity map, and returns it.


 Otherwise, I don't see why the unique cache is needed in the recipe you 
 reference.

the cache isn’t needed but is nice so that many objects all doing the same 
thing don’t have to keep hitting the database for an object that’s already 
local in memory.

keep in mind, the Query always emits SQL, except in the case of the get() 
method which will look locally in the identity map first.  Otherwise, as rows 
are returned from the SQL statement, the primary key identity of each row is 
matched to whatever objects are already in the identity map.   That’s how the 
identity map works.   You still get the SQL being emitted in most cases so it’s 
not quite a “cache”.   More detail on this here: 
http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#is-the-session-a-cache



signature.asc
Description: Message signed with OpenPGP using GPGMail