[sqlalchemy] Declarative and deferred

2012-08-11 Thread David McKeone
Short:
---
Is there a way to backfill multiple deferred columns in a declarative object 
result instance in a dynamic way when groups can't be predicted in the model?


Long:

First, let me just say thanks for SQLAlchemy.  This is my first post to this 
list and after working with it for quite a while I've found it to be an 
excellent tool for working with the database.  My previous work (non-Python) 
was done with a database abstraction layer that was more relational and less 
object-oriented and I've found SQLAlchemy to be amazing for letting me have my 
cake (objects) and eat it too (hand-crafted sql optimizations).

Alright, so a few caveats for background:
1) I'm fairly new to Python (~4 months), but not to programming (~10 years)
2) This is the first time I've used an ORM, so my question may be more about 
the object-relational mismatch handling, rather than SQLAlchemy directly. 
3) I'm using SQLAlchemy with Flask's plug-in flask-sqlalchemy.  That may not 
have much do with my question, but just in case there is some subtle difference 
between declarative's base model and Flask's db.Model
4) The current project is to use Flask and SQLAlchemy to create a web site with 
an existing database in a deployed client-server application (96 Tables, can be 
anywhere between ~200MB and 30GB)
5) Assumptions abound... this is a fairly complicated/specific case (I think) 
so there may be underlying assumptions about how I'm doing things that are 
incorrect.  If I'm wrong in those underlying assumptions, then feel free to 
challenge them.
6) SQLAlchemy 0.7.8

Cool.

So, I see that using declarative objects has a quite a few advantages;  you can 
easily add attributes(columns, relationships, etc...), validators, and methods 
-- all great stuff for keeping things logically grouped. Then when you get to 
performance optimizations there is a significant benefit with larger models to 
not fetch all the columns for every request (this is a web app after all, so 
lower response times are a goal).  Great, so deferred looks like the ticket to 
be able to handle this particular mis-match in a good enough way.  I can defer 
any non-essential columns and if I need one or two other columns down the line 
then they'll be lazy-loaded as required. 

Contrived example:

class User(db.Model, HelperMixin):
__tablename__ =  'user'

id = db.Column(db.Integer, primary_key=True)
password = db.Column(db.String)
type = db.Column(db.Integer)
first_name = db.Column(db.String)
last_name = db.Column(db.String)
title = db.Column(db.String)
birthday = db.Column(db.Date)
height = db.Column(db.Numeric)
width = db.Column(db.Numeric)
# etc...

   def is_valid(self, check_password):
  # check password in a horribly insecure, but easy way
  return True if check_password == self.password else False

So with this model I want to validate a users password on login, but not load 
all the other unnecessary stuff, because login probably doesn't need all the 
rest of those columns.  Because I also want to keep things simple on the model, 
I don't use deferred directly, but rather I created a couple helper methods in 
a mixin.   (Note that other parts of the application may need more columns or 
less columns or different columns, depending on context, so putting deferreds 
directly in the model would also be impractical)

The mixin looks like this:

from sqlalchemy.orm import defer
from sqlalchemy.orm.properties import ColumnProperty
from sqlalchemy.orm.util import class_mapper

class HelperMixin(object):

@classmethod
def itercolumns(cls):
for prop in class_mapper(cls).iterate_properties:
if isinstance(prop, ColumnProperty):
yield prop.key

@classmethod
def get_deferred_except(cls, *attributes):
attribute_set = set(attributes)

ret = list()
for name in cls.itercolumns():
if name not in attribute_set:
ret.append(defer(name))

return ret

so with this helper I can context sensitively build up a result object with 
just the stuff I need (but without losing the benefits of the associated 
methods):

deferred = User.get_deferred_except('id', 'password') # Get list of defer() 
instances for all columns, but those specified
user = User.query.options(*deferred).first()

# SQL Emitted -- SELECT id, password FROM user

if user.is_valid(the_password):
# Valid stuff
else:
# Invalid stuff

Ok, well that worked great, but now I need to get the patrons name for some 
runtime specific reason.  So I do this:

full_name =  .join([user.title, user.first_name, user.last_name])

I now emit:

SELECT title FROM user
SELECT first_name FROM user
SELECT last_name FROM user

When what I really want at this point, and can predictably know in this case, 
is:

SELECT title, first_name, last_name FROM user

So, the question is, what is the best way to back-fill an object in a way that 
you keep the number 

Re: [sqlalchemy] Declarative and deferred

2012-08-11 Thread Michael Bayer

On Aug 11, 2012, at 10:08 AM, David McKeone wrote:

 so with this helper I can context sensitively build up a result object with 
 just the stuff I need (but without losing the benefits of the associated 
 methods):
 
 deferred = User.get_deferred_except('id', 'password') # Get list of defer() 
 instances for all columns, but those specified
 user = User.query.options(*deferred).first()
 
 # SQL Emitted -- SELECT id, password FROM user
 
 if user.is_valid(the_password):
# Valid stuff
 else:
# Invalid stuff
 
 Ok, well that worked great, but now I need to get the patrons name for some 
 runtime specific reason.  So I do this:
 
 full_name =  .join([user.title, user.first_name, user.last_name])
 
 I now emit:
 
 SELECT title FROM user
 SELECT first_name FROM user
 SELECT last_name FROM user
 
 When what I really want at this point, and can predictably know in this case, 
 is:
 
 SELECT title, first_name, last_name FROM user
 
 So, the question is, what is the best way to back-fill an object in a way 
 that you keep the number of SQL queries low, while also getting the 
 advantages of using a declarative instance?  

think about just from a python level how that would necessarily have to work.   
the requirements are:

1. object is loaded with most attributes deferred.

2. code executes against loaded object, which asks for three attributes.the 
three attributes should load at once.   however, the request for those 
attributes are separate.

Right off, your code is going to be more verbose - user.title, user.firstname, 
user.lastname are three separate instructions. 

so no matter what, to follow the requested behavior, it *has* to be:

user = User.query.options(..).first()

load_more_attributes(user, [title, first_name, last_name])

full_name =  .join([user.title, user.first_name, user.last_name])

that is, there must be an explicit instruction of some kind that tells it to 
load these three at once.

So for load_more_attributes, two choices.  One is use the function you have:

query(User).options(User.get_deferred_except(title, first_name, 
last_name)).filter(...).first()

because, the identity map will have the effect that the same User instance is 
the target.

Or use session.refresh(), probably more direct:

session.refresh(user, [title, first_name, last_name])

also, if the columns you're actually using are along these lines, that is, they 
aren't 10K text files, I'd strongly encourage you to do some actual profiling 
to determine if all this complexity is necessary and not actually hurting 
performance much more.To pull over 3K of text over the wire in a single 
result is probably a lot less overhead than to pull 1000K of text in two or 
three queries.


 Additionally, I'd also wonder if it would be possible to make a 'faulting' 
 type of object;  one where requesting any of the deferred columns would cause 
 all of the deferred columns in that instance to be loaded.

well this is what the group feature does, if any columns in the group are 
touched, the whole group is loaded.  My advice would be to do some profiling, 
come up with groups that are tailored close enough to the groups of attributes 
that tend to be called together, and to not overthink it.


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Declarative and deferred

2012-08-11 Thread David McKeone
 

 session.refresh(user, [title, first_name, last_name]) 


This was the part that I was missing.  It's fairly readable and it does 
exactly what I'd need.
 


 also, if the columns you're actually using are along these lines, that is, 
 they aren't 10K text files, I'd strongly encourage you to do some actual 
 profiling to determine if all this complexity is necessary and not actually 
 hurting performance much more.To pull over 3K of text over the wire in 
 a single result is probably a lot less overhead than to pull 1000K of text 
 in two or three queries. 


At this point I'm really just exploring the boundaries of the tool so that 
I can select a flexible design.  I still haven't quite found the sweet spot 
between what can/should be lazy and what cannot/shouldn't be lazy.  In the 
existing application (the non-ORM one) all of this is done with an 
abstracted form of direct SQL (kind of like SQLAlchemy core).  I'd like to 
convert some of those sections to use declarative objects instead, so the 
point of of this is to know that if I do go down that path then I could 
still optimize the columns if I needed to (read: after I profiled it and 
determined that it was necessary) without having to drop all the way down 
to SQLAlchemy core and then change things from passing objects around to 
passing keys in some circumstances.  Although it's very likely that you are 
correct and that the complexity induced from using this kind of system may 
outweigh the over-the-wire savings -- I guess we'll see when I get there.


well this is what the group feature does, if any columns in the group are 
 touched, the whole group is loaded.  My advice would be to do some 
 profiling, come up with groups that are tailored close enough to the groups 
 of attributes that tend to be called together, and to not overthink it. 


I will certainly look into this some more, since there are certainly groups 
of columns that can be naturally grouped.  Plus I imagine that 
session.refresh() would load the entire group if an attribute from a group 
was passed to it.  So that could be an interesting way to chunk it.


Thanks for responding with such a great post.  Its certainly helped me 
think through the issues from different angle.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/sqlalchemy/-/zQNUzzPhzFAJ.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Declarative and deferred

2012-08-11 Thread Michael Bayer

On Aug 11, 2012, at 3:43 PM, David McKeone wrote:

 Plus I imagine that session.refresh() would load the entire group if an 
 attribute from a group was passed to it.  So that could be an interesting way 
 to chunk it.

I think the attributes to session.refresh() trump any deferred rules.  It will 
load just what you send it.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.