[sqlalchemy] Re: Does SQLAlchemy ORM use column indexes to optimize queries?
Jason,

Thanks for your examples - there are lots of useful SQLAlchemy coding hints in there for me...

On Apr 5, 5:30 pm, jason kirtland [EMAIL PROTECTED] wrote:
> GK wrote:
> > Michael,
> >
> > Thank you for your response - it was very helpful for me. It turns out my main problem was that I was importing an order of magnitude or so more data than I realized, but you were also right about using flush(). You were also right about the overhead of creating extra indexes.
> >
> > In the spirit of putting some data in a public space... Starting with an empty database, with a test dataset of 1200 values (about 1150 unique insertions) and flushing after every insertion, I have the following timings:
> >
> >   No extra indexes:    2:00
> >   Three extra indexes: 2:15
> >
> > This is using SQLite with a flat file on a 1.8 GHz laptop. The records are each nearly 1 KB. There's an overhead of about 5 seconds for reading the data, so most of the above time is spent loading the database. I haven't yet had time to judge how the performance varies with larger datasets.
>
> 2:00 seems very high - is that 2 minutes? Below are two similar bulk table loads. The first uses the same insert-or-update methodology and only the relational layer (no ORM) - that clocks in at 1.25 seconds on my laptop. The second is an ORM implementation with a different duplicate-detection methodology - that clocks in at 2.0 seconds.
---
##
## Relational version
##
import os
import time
import random
from sqlalchemy import *
from sqlalchemy.exceptions import IntegrityError

data_cols = ('sendadr', 'fromadr', 'toadr', 'sendnam', 'fromnam', 'tonam',
             'subject', 'received', 'spam', 'folderid')
chunk = lambda: '%x' % random.getrandbits(400)
dataset = [dict((col, chunk()) for col in data_cols) for _ in xrange(1200)]
dupes = random.sample(dataset, 50)

db = '1krows.db'
if os.path.exists(db):
    os.unlink(db)

engine = create_engine('sqlite:///%s' % db)
metadata = MetaData(engine)
table = Table('t', metadata,
              Column('id', Integer, primary_key=True),
              Column('occurs', Integer, default=1),
              *(Column(col, Text) for col in data_cols))
table.append_constraint(UniqueConstraint(*data_cols))
metadata.create_all()

# Pre-load the 50 duplicate rows.
table.insert().execute(dupes)
assert table.select().count().scalar() == 50

start = time.time()
insert = table.insert()
update = (table.update().
          where(and_(*((table.c[col] == bindparam(col))
                       for col in data_cols))).
          values({'occurs': table.c.occurs + 1}))

conn = engine.connect()
tx = conn.begin()
for row in dataset:
    try:
        conn.execute(insert, row)
    except IntegrityError:
        conn.execute(update, row)
tx.commit()
end = time.time()

assert table.select().count().scalar() == 1200
assert select([func.count(table.c.id)], table.c.occurs == 2).scalar() == 50
print 'elapsed: %04f' % (end - start)

##
## ORM version
##
import hashlib
import os
import time
import random
from sqlalchemy import *
from sqlalchemy.orm import *

data_cols = ('sendadr', 'fromadr', 'toadr', 'sendnam', 'fromnam', 'tonam',
             'subject', 'received', 'spam', 'folderid')
chunk = lambda: '%x' % random.getrandbits(400)
dataset = [dict((col, chunk()) for col in data_cols) for _ in xrange(1200)]

def hashrow(row):
    return hashlib.sha1(','.join(row[c] for c in data_cols)).hexdigest()

dupes = []
for row in random.sample(dataset, 50):
    dupe = row.copy()
    dupe['hash'] = hashrow(dupe)
    dupes.append(dupe)

db = '1krows.db'
if os.path.exists(db):
    os.unlink(db)

engine = create_engine('sqlite:///%s' % db)
metadata = MetaData(engine)
table = Table('t', metadata,
              Column('id', Integer, primary_key=True),
              Column('occurs', Integer, default=1),
              Column('hash', String(40), unique=True),
              *(Column(col, Text) for col in data_cols))
metadata.create_all()

table.insert().execute(dupes)
assert table.select().count().scalar() == 50

class Email(object):
    def __init__(self, **kw):
        for key, value in kw.items():
            setattr(self, key, value)
    def hashval(self):
        return hashrow(dict((col, getattr(self, col)) for col in data_cols))

mapper(Email, table)

start = time.time()
session = create_session()
session.begin()

data = [Email(**row) for row in dataset]
chunk, remaining = [], [(e.hashval(), e) for e in data]
while remaining:
    chunk, remaining = remaining[:100], remaining[100:]
    by_hash = dict(chunk)
    dupes = (session.query(Email).
             filter(Email.hash.in_(by_hash.keys())).all())
    for dupe in dupes:
        dupe.occurs += 1
        by_hash.pop(dupe.hash)
    for hashval, email in by_hash.items():
        email.hash = hashval
        session.save(email)

session.flush()
session.commit()
end = time.time()
print 'elapsed: %04f' % (end - start)
[sqlalchemy] Insert Expressions - Executing Multiple Statements
Hello all,

I'd like to insert more than one row into a table. Some rows don't have an entry in every column:

conn = engine.connect()
conn.execute(addresses.insert(), [
    {'name': 'mike', 'email_address': '[EMAIL PROTECTED]'},
    {'name': 'wendy', 'fullname': 'Wendy Williams', 'email_address': '[EMAIL PROTECTED]'},
    {'name': 'jack', 'fullname': 'Jack Jones', 'email_address': '[EMAIL PROTECTED]'},
    {'name': 'julia', 'email_address': '[EMAIL PROTECTED]'},
])

SQLAlchemy generates:

INSERT INTO addresses (name, email_address) VALUES (%s, %s)
[['mike', '[EMAIL PROTECTED]'], ['wendy', '[EMAIL PROTECTED]'], ['jack', '[EMAIL PROTECTED]'], ['julia', '[EMAIL PROTECTED]']]

The inserted rows aren't complete, because the first row doesn't have a 'fullname' entry. Why? Is it not possible to start with an 'incomplete' row? What can I do with big tables (more columns) and only a few entries?

Thanks a lot,
Carla

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---
[sqlalchemy] Creating multiple tables based on the same class
This is another "I'm new to SQLAlchemy" kind of question...

I have two datasets that have exactly the same structure, and I wish to create separate database tables for these that are mapped to/from the same underlying Python class. I looked for discussion of this in the SQLAlchemy manual, but the references I found (e.g. [1]) seem to discuss spreading a class over several tables rather than multiple instances/datasets. It appears that the mapper structure always associates a given Python class with a single table. Is this correct?

The pattern I'm contemplating to support multiple datasets is to define a subclass of the main class for each one, then map each subclass to its own table. Is there a better way?

#g

[1] http://www.sqlalchemy.org/docs/04/mappers.html#advdatamapping_mapper_joins
[sqlalchemy] Re: Insert Expressions - Executing Multiple Statements
You could specify that fullname is null (None) for the rows that don't have a fullname value. In fact, you *need* to do that: the insert statement is compiled based on the first set of values given, and that drives the remainder of the operation.
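One way to act on that (a sketch of my own, not from the thread): pre-fill the missing columns with None so every dict in the executemany list carries an identical key set. The helper name and the example column names are invented for illustration:

```python
def normalize_rows(rows, columns):
    """Return copies of `rows` where every dict has the same keys,
    filling None for columns a row doesn't supply."""
    return [dict((col, row.get(col)) for col in columns) for row in rows]

rows = [
    {'name': 'mike', 'email_address': 'mike@example.com'},
    {'name': 'wendy', 'fullname': 'Wendy Williams',
     'email_address': 'wendy@example.com'},
]
columns = ('name', 'fullname', 'email_address')

# After normalizing, an executemany-style
# conn.execute(addresses.insert(), normalized)
# would include all three columns for every row.
normalized = normalize_rows(rows, columns)
```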
[sqlalchemy] Re: Creating multiple tables based on the same class
Yes, you can have the same class be associated with different tables. SQLAlchemy uses a 'mapper' to associate a class with a table, and you can define multiple mappers for the same class that map it to different tables. One of these will be the primary mapper and will be used by default when you don't specify which mapper to use. To use another mapper, you pass the keyword argument entity_name and specify the mapper by name; most session functions support that keyword arg. Here is the documentation:

http://www.sqlalchemy.org/docs/04/mappers.html#advdatamapping_mapper_multiple

On Mon, Apr 7, 2008 at 9:37 AM, GK [EMAIL PROTECTED] wrote:
> I have two datasets that have exactly the same structure, and I wish to create separate database tables for these that are mapped to/from the same underlying Python class. [...] The pattern I'm contemplating to support multiple datasets is to define a subclass of the main class for each one, then map each subclass to its own table. Is there a better way?
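At the plain-SQL level, the underlying idea - one column layout shared by several identically shaped tables, with a single Python class holding rows from any of them - can be sketched using only the stdlib sqlite3 module. This is not the entity_name mechanism itself, and every table and column name below is made up for illustration:

```python
import sqlite3

# One schema definition shared by every per-dataset table.
SCHEMA = "(id INTEGER PRIMARY KEY, name TEXT, value REAL)"

class Sample(object):
    """A single class whose instances can come from either table."""
    def __init__(self, id, name, value):
        self.id, self.name, self.value = id, name, value

def create_dataset_table(conn, table):
    # Both tables get an identical column layout from the one SCHEMA string.
    conn.execute("CREATE TABLE %s %s" % (table, SCHEMA))

def load_all(conn, table):
    rows = conn.execute("SELECT id, name, value FROM %s" % table)
    return [Sample(*row) for row in rows]

conn = sqlite3.connect(":memory:")
for table in ("dataset_a", "dataset_b"):
    create_dataset_table(conn, table)

conn.execute("INSERT INTO dataset_a (name, value) VALUES ('x', 1.0)")
conn.execute("INSERT INTO dataset_b (name, value) VALUES ('y', 2.0)")

a = load_all(conn, "dataset_a")  # Sample objects from dataset_a
b = load_all(conn, "dataset_b")  # Sample objects from dataset_b
```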
[sqlalchemy] Re: web application too slow
On Sun, Apr 06, 2008 at 05:26:31PM -0700, tux21b wrote:
> It's an Apache 2 with mod_wsgi. The server setup isn't configured for performance yet, and we can probably tune it a bit. But even when I test with the wsgi-ref server (without parallel requests) the application is slow. Beside sqlalchemy and memcached, we are using django (there are still some dependencies left, since it started as a django-app, but we are working on that *g*), werkzeug (a wsgi toolkit) and jinja (template engine).

Are you testing on X11 with Mozilla, and if so, do you have transparent backgrounds? There's a pretty major performance problem with Mozilla and X11 where render operations on pages with transparent divs are excruciatingly slow on many video cards. The problem is compounded by multiple layers of transparency.

In other words - make sure the slowness isn't the client :)

--
Ross Vandegrift
[EMAIL PROTECTED]

"The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell."
  --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
[sqlalchemy] Re: web application too slow
On Apr 7, 4:48 pm, Ross Vandegrift [EMAIL PROTECTED] wrote:
> In other words - make sure the slowness isn't the client :)

I hope this post wasn't serious. Mozilla's CSS rendering engine doesn't show up in our Python profile on the server, and no, we don't use Mozilla for benchmarking (we are currently using ab and siege).

But to ask the other way around: has anybody else experienced similar performance problems when accessing lots of heavily eager-loaded SQLAlchemy models which are all already in memory?

Christoph
[sqlalchemy] Re: web application too slow
> But to ask the other way around: has anybody else experienced similar performance problems when accessing lots of heavily eager-loaded SQLAlchemy models which are all already in memory?

My app does quite a bit of heavy eagerloading in a lazy, as-needed fashion, then clears the session used to load the data and leaves the now-unpersisted data in cache. It works fine and runs quite quickly.

For even faster loads, you can bypass the ORM and use the relational layer directly. The relational layer can in some instances perform much faster than ORM access, albeit with less convenience, and using sqla's relational layer doesn't add much of a performance hit compared to using the DB-API directly.

If your app is accessing the cached data without a follow-on query on each web hit, it may be that you never cleared the instances from the session, so they are still persistent and will reload from the DB in certain circumstances. If on the other hand your design is to incur a DB hit on each and every web hit, then you will probably have better luck tuning the query itself and using the relational layer than looking for the answer in tuning or tweaking the ORM. Your profile looks as though it's spending most of its time processing attributes; using the relational layer and working directly with native Python types would eliminate that.

Rick
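As a rough illustration of what "native Python types" buys you (a stdlib-only sketch with an invented emails table, not the poster's actual schema): rows fetched at the DB-API level arrive as plain tuples, so the read path has no instrumented attributes to process:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT, spam INTEGER)")
# 1000 rows, every other one flagged as spam.
conn.executemany(
    "INSERT INTO emails (subject, spam) VALUES (?, ?)",
    [("hello %d" % i, i % 2) for i in range(1000)])

# Relational-style access: each row is a plain tuple -- no mapper,
# no attribute instrumentation, just native Python ints and strings.
spam_count = sum(
    row[2] for row in conn.execute("SELECT id, subject, spam FROM emails"))
```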
[sqlalchemy] Re: Save instead of update
On Apr 6, 2008, at 11:19 PM, Michael Robellard wrote:
> Michael,
> Thanks for your reply. I have been trying what you suggested, as well as several variants, and I keep getting stuck with something along the lines of:
>
>   FlushError: New instance [EMAIL PROTECTED] with identity key (class 'models.dbmodel.Node', (2L,), None) conflicts with persistent instance [EMAIL PROTECTED]
>
> when I try to commit the new object.

OK, the pattern you're attempting here is that you'd have a Node with ID #1, and then multiple VersionNodes with IDs (1, 1), (1, 2)... But the way joined table inheritance works, a row from the base table corresponds to exactly one instance: in joined table inheritance, the relation from parent to child table is necessarily one-to-one. So using joined inheritance in this manner for this particular schema is not appropriate.

Since you want multiple VersionNode objects which all reference the same row in the nodes table, that is a many-to-one relation. So on the mapping side, you want to use composition to create this setup instead of inheritance. Through the usage of various tricks we can make it look very similar to the single VersionNode/Individual interface you're looking for.

I'm not sure the datamodel you've proposed is actually what you want (for example, what is an Individual? Is it by design that a single Node can change its sex by having multiple Individual records of different sexes?), but anyway, below is a version that composes VersionNode and Node together, and uses a with_polymorphic selectable so that the nodes_table can provide the discriminator field to the VersionNode/Individual hierarchy (though I'm suspecting that you might want Individual to subclass Node, and have VersionNode be on its own... that would be much easier to set up than what I have below):

from sqlalchemy import *
from sqlalchemy.orm import *

db = create_engine('sqlite:///:memory:', echo=True)
metadata = MetaData()
Session = sessionmaker(bind=db, autoflush=True, transactional=True)

nodes_table = Table('nodes', metadata,
    Column('id', Integer, Sequence('node_id_seq'), primary_key=True),
    Column('type', String(50), nullable=False),
)
versionnodes_table = Table('versionnodes', metadata,
    Column('id', Integer, ForeignKey('nodes.id'), primary_key=True),
    Column('vernum', Integer, primary_key=True, default=1),
)
individual_table = Table('individual', metadata,
    Column('id', Integer(), primary_key=True),
    Column('vernum', Integer, primary_key=True),
    Column('sex', String(1)),
    ForeignKeyConstraint(['id', 'vernum'],
                         ['versionnodes.id', 'versionnodes.vernum'],
                         ondelete='CASCADE')
)
node_relation_table = Table('noderelations', metadata,
    Column('id1', Integer, ForeignKey('nodes.id'), primary_key=True),
    Column('id2', Integer, ForeignKey('nodes.id'), primary_key=True)
)

class Node(object):
    pass

class VersionNode(object):
    def __init__(self, node=None):
        self.node = node or Node()
        self.node.type = 'versionnode'

    def childnodes(self):
        return self.node.childnodes
    childnodes = property(childnodes)

    def IncrementVersion(self):
        session = Session()
        maxversion = session.execute(
            select([func.max(versionnodes_table.c.vernum)],
                   versionnodes_table.c.id == self.id)).scalar()
        self.vernum = maxversion + 1

    def __repr__(self):
        return '%s(id=%r, version=%r)' % (
            self.__class__.__name__, self.id, self.vernum)

class Individual(VersionNode):
    def __init__(self, node=None):
        self.node = node or Node()
        self.node.type = 'individual'

    def create_new_version(self):
        v = Individual(self.node)
        v.sex = self.sex
        v.id = self.id
        v.IncrementVersion()
        return v

    def __repr__(self):
        return '%s(id=%r, version=%r, sex=%r)' % (
            self.__class__.__name__, self.id, self.vernum, self.sex)

mapper(Node, nodes_table, properties={
    'childnodes': relation(Node, secondary=node_relation_table,
        primaryjoin=nodes_table.c.id == node_relation_table.c.id1,
        secondaryjoin=nodes_table.c.id == node_relation_table.c.id2,
        backref='parentnodes'),
})

vn_select = select([versionnodes_table, nodes_table.c.type]).\
    select_from(versionnodes_table.join(nodes_table)).alias()
ind_select = select([versionnodes_table, individual_table, nodes_table.c.type]).\
    select_from(versionnodes_table.join(individual_table).join(nodes_table)).alias()

mapper(VersionNode,
[sqlalchemy] has() and multiple tables
I'm having some trouble using the has() operator to avoid a long chain of joins in a select. A stripped-down version of my schema looks like this: a Caller has a (phone) number. A Caller can create Connections (each to a phone number). A Connection can involve one or more Tasks, each of which involves zero or more Actions. Each one-to-many is represented by a bog-standard mapper on an ORM object:

callers = Table('callers', metadata,
    Column('id', Integer, primary_key=True),
    Column('number', String(20)))
connections = Table('connections', metadata,
    Column('id', Integer, primary_key=True),
    Column('phone_number', String(10)),
    Column('caller_id', Integer, ForeignKey('callers.id')))
tasks = Table('tasks', metadata,
    Column('id', Integer, primary_key=True),
    Column('connection_id', Integer, ForeignKey('connections.id')))
actions = Table('actions', metadata,
    Column('id', Integer, primary_key=True),
    Column('task_id', Integer, ForeignKey('tasks.id')),
    Column('description', String(100)))

class Caller(object): pass
class Connection(object): pass
class Task(object): pass
class Action(object): pass

mapper(Action, actions)
mapper(Task, tasks, properties={
    'actions': relation(Action, backref='task')})
mapper(Connection, connections, properties={
    'tasks': relation(Task, backref='connection')})
mapper(Caller, callers, properties={
    'connections': relation(Connection, backref='caller')})

So, to find out what actions have been initiated by a given caller, I'm doing:

session.query(Action).filter(Action.task.has(Connection.caller == caller1))

However, this appears to generate a cartesian join on 'tasks' and 'connections' in the EXISTS() subselect, which leads to extra actions being returned. Am I misusing has() here? The obvious workaround is explicitly joining or explicitly sub-selecting, both of which appear to work, but I was hoping that using has() would produce cleaner code. :-)

A runnable, commented version of the example is at http://pastebin.com/m674fc403.
[sqlalchemy] Re: mssql, Linux, unixODBC - Could not locate column in row for column
> The limitation here I guess is 30 char identifier limit but I will need to test it.

Ah yeah, you're going to have bigger problems than 30-char identifiers with the Sybase TDS settings. MSSQL uses a different set of wire-protocol representations of datetime types, and your dates are going to be off.

> In order to use more characters the FreeTDS should be configured to use TDS protocol 7.0

Even that one is old: MSSQL switched to TDS 7.0 back in MSSQL 7.0 (circa 1997), and they've been on TDS 8.0 since MSSQL 2000. Here's a chart: http://www.freetds.org/tds.html

It beats me why FreeTDS still defaults to Sybase instead of MSSQL; the installed base has to run at least 100 to 1 in favor of MSSQL. Oh well.

When you get all this set up correctly, you may want to update the sqlalchemy wiki with all this stuff.
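For reference, selecting the newer protocol is a one-line setting in a freetds.conf server entry. The server name and host below are entirely hypothetical; "tds version = 8.0" is the level FreeTDS uses for MSSQL 2000 and later:

```ini
; freetds.conf -- hypothetical MSSQL server entry
[myserver]
    host = mssql.example.com
    port = 1433
    tds version = 8.0
```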
[sqlalchemy] Re: has() and multiple tables
On Apr 7, 2008, at 4:54 PM, Tim Lesher wrote:
> I'm having some trouble using the has() operator to avoid a long chain of joins in a select.
>
>   session.query(Action).filter(Action.task.has(Connection.caller == caller1))
>
> However, this appears to generate a cartesian join on 'tasks' and 'connections' in the EXISTS() subselect, which leads to extra actions being returned.

For this kind of thing, you're joining across three tables, so you can put an extra join condition in the has():

    filter(Action.task.has(and_(Connection.caller == caller1,
                                Task.connection_id == Connection.id)))

or do it with join():

    query(Action).join('task').filter(Task.connection.has(Connection.caller == caller1))

I have an intuitive sense that there should be some nice syntax to get has() and any() to join across more tables, but it's not occurring to me at the moment.
[sqlalchemy] Re: sqlalchemy 0.4.5 released
> > i.e. session.query([MappedObject.id, MappedObject.name]).filter(...).all()
> > instead of session.query(MappedObject).filter(...).values(MappedObject.id, MappedObject.name).all()
>
> I actually thought of this mapping as well, only because it seemed consistent with select(). Granted, I don't understand all the reasons why it could or could not work. That said, I'm happy with values() as it is, with just a couple of suggestions which I posted on the dev list.
>
> I actually like _values (and _select_from) because it gives you the option to modify an *existing* query. This is useful because you don't have to know everything at the time of query creation (you can write library functions that operate on user-created queries). It is also consistent with the generative nature of query/select.

Except it's not generative itself. :)

Michael Trier
blog.michaeltrier.com
[sqlalchemy] Re: has() and multiple tables
On 4/7/08, Michael Bayer [EMAIL PROTECTED] wrote:
> For this kind of thing, you're joining across three tables, so you can put an extra join condition in the has():
>
>   filter(Action.task.has(and_(Connection.caller == caller1,
>                               Task.connection_id == Connection.id)))
>
> I have an intuitive sense that there should be some nice syntax to get has() and any() to join across more tables, but it's not occurring to me at the moment.

I'm confused -- this sample does correlate actions with tasks, but not tasks with connections. Intuitively it seems that X.has(Y) should always add a clause to the EXISTS for y.y_id = x.y_id. No?

-Jonathan
[sqlalchemy] Sqlalchemy Stored Procedures
Hi,

I would like to take advantage of stored procedures from SQLAlchemy:

engine = create_engine('postgres://postgres:[EMAIL PROTECTED]:5432/neithal', echo=True)
session = scoped_session(sessionmaker(bind=engine, transactional=True, autoflush=False))

trans = session.begin()
sql = select([func.add_user_f(108, 'kk', 'kk')])
result = session.execute(sql)
result.close()
session.flush()
session.commit()

Thanks,
Madhu Alagu