[sqlalchemy] ShardedQuery bulk delete
Hi,

I have a problem bulk deleting from a sharded session; I searched the previous posts without finding anything. I have the following table:

    usersession_table = Table('kg_usersession', meta,
        Column('session_id', String(32), primary_key=True),
        Column('user_id', BigInteger, index=True, nullable=False),
        Column('create_time', DateTime, index=True),
        Column('expire_time', DateTime, index=True),
        Column('site', String(10)),
        mysql_engine='MyISAM'
    )

I use horizontal sharding by session_id, with the following shard chooser:

    def shard_chooser(mapper, instance, clause=None):
        if instance:
            return shard_value(instance.session_id)

Then, I want to delete all records earlier than a given expire_time, with the following code:

    session.query(UserSession).filter(UserSession.expire_time < expire_time).delete()

It raises an error:

    Traceback (most recent call last):
      File "delete_expire_session.py", line 20, in <module>
        delete_expire_session(expire_time)
      File "delete_expire_session.py", line 13, in delete_expire_session
        session.query(UserSession).filter(UserSession.expire_time < expire_time).delete()
      File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/orm/query.py", line 2142, in delete
        result = session.execute(delete_stmt, params=self._params)
      File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/orm/session.py", line 726, in execute
        engine = self.get_bind(mapper, clause=clause, **kw)
      File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/ext/horizontal_shard.py", line 73, in get_bind
        return self.__binds[shard_id]
    KeyError: None

I guess shard_chooser returns None because instance is None at runtime. I read in the docs that shard_chooser "may be in some round-robin scheme", but I have no idea what that means in my case. I appreciate any advice.

Best regards!
can

PS: you can access partial source code in this gist: https://gist.github.com/930708

-- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
[sqlalchemy] default viewonly=True when lazy='dynamic'
Just a suggestion, but wouldn't we want to always default viewonly=True when lazy='dynamic'? Or are there use cases such that the orm can actually still be expected to understand the relationship correctly even when unknown filter criteria are added?
[sqlalchemy] Best design for commits?
Hey guys,

I have a Pylons back-end running on SQLAlchemy. I have a script that reads a tree of XML files from an HTTP server (it downloads an xml X, then downloads X's children, then iterates over the children, and so on recursively). Each xml file represents an SQLAlchemy model. The problem is that I have thousands of these xml's (sometimes 5000, sometimes 26000). I was able to optimize the download process with HTTP pooling, but I cannot settle on the best approach to committing the models to the DB. Every time an xml file is downloaded, I create an orm object for it and add it to my session.

Problem 1: some xml's will exist multiple times in the tree, so I am checking that there is no duplicate insertion. Is the check in my code optimal, or should I keep an indexed collection on the side and use it to check for duplicates?

Problem 2: my autocommit is set to False because I don't want to commit on every add (not because it's bad design, but because of performance). But I also don't want to iterate the entire tree of thousands of categories without committing at all. Therefore, I created a constant number upon which my code commits the data. Is this a good approach? What would be a good number for that? It might be important to mention that I do not know in advance how many xml's I am looking at.

Here is what my pseudo-code looks like now (ignore syntax errors):

    count = 0
    COMMIT_EVERY = 50

    def recursion(parent):
        global count, COMMIT_EVERY
        pool = get_http_connection_pool(...)
        sub_xmls = get_sub_xmls(pool, parent)
        if sub_xmls == None:
            return
        for sub_xml in sub_xmls:
            orm_obj = MyObj(sub_xml)
            duplicate = Session.query(MyObj).filter(MyObj.id == orm_obj.id).first()
            if not duplicate:
                Session.add(orm_obj)
                count = count + 1
                if count % COMMIT_EVERY == 0:
                    Session.commit()
            recursion(orm_obj.id)

    recursion(0)
[sqlalchemy] Referencing col from a joined table
I have a query:

    query = Session.query(SmartdataEligibilityRecord).order_by(SmartdataEligibilityRecord.term_date)

which creates the following sql statement (the value of query):

    SELECT smartdata_eligibility_records.id AS smartdata_eligibility_records_id,
           smartdata_eligibility_records.hic_number AS smartdata_eligibility_records_hic_number,
           ...
           members_1.id AS members_1_id,
           ...
           members_1.last_name AS members_1_last_name
    FROM smartdata_eligibility_records
    LEFT OUTER JOIN members AS members_1
        ON members_1.id = smartdata_eligibility_records.member_id;

The mapper between these two tables, SmartdataEligibilityRecord and Member, is:

    mapper(SmartdataEligibilityRecord, smartdata_eligibility_records, properties={
        'member': relation(Member, lazy=False, uselist=False),
    })

and I can access the values in the rendered page with something like:

    % for result in c.results:
        <div class="claimrow"><span>${result.hic_number}</span>

All works fine. HOWEVER, when I try to access one of the fields from the members table, I just get errors. How do I access that? I have tried ${result.members.last_name}, the singular form ${result.member.last_name}, and the name as it appears in the sql, ${result.members_1_last_name}: nothing works. I have looked through the docs like crazy and just cannot find what the syntax should be to access the members (the left outer joined) table in this. Thanks, RVince
Re: [sqlalchemy] default viewonly=True when lazy='dynamic'
On Apr 20, 2011, at 7:48 AM, Kent wrote:

    Just a suggestion, but wouldn't we want to always default viewonly=True when lazy='dynamic'? Or are there use cases such that the orm can actually still be expected to understand the relationship correctly even when unknown filter criteria are added?

dynamic relations are completely mutable from the base attribute. You can append() and remove() items from them, and the changes are held in a queue that's released at flush time. There's no case where additional filter criteria come into play here, as the resulting object is a plain Query and has no append()/remove() methods.
[sqlalchemy] Re: Best design for commits?
What I tend to do in cases like this is to break things into commit chunks. For instance, I've got an import script that goes through and processes 10 at a time and performs a commit every 10. This is tweakable via a config setting, but so far 10 works for my needs.

As for the duplicates: if they're exact and you don't need to process changes between one record and another with the same pk, just try/except and catch the pk error on the second insert. With my import script, if I get an exception, I then run through all 10 committing one at a time and finding the 'bad egg', so that I can log out that this one failed because of the exception. Sure, for that batch of 10 I'm doing a bunch of single commits, but more often than not I'm running in my batch mode. I'd just let the db/exceptions tell me a record exists vs trying to query the server for each one to check first. Good ole case of 'better to ask for forgiveness than permission'.

Rick
[sqlalchemy] Re: Referencing col from a joined table
What is the error that you're getting? Is it just that the value is None? You're doing a left outer join, which means you might get rows back that don't have any member record data tied to them. In that case you need to check whether you have a member first, then access the properties on it:

    % if result.member:
        ${result.member.last_name}
    % else:
        No Last Name
    % endif

That type of thing.

Rick
[sqlalchemy] Re: Best design for commits?
Dear Rick,

Thank you for your reply. I understand, but is there not a better way than doing a lot of single commits in case of a commit exception? In other words, is there a way to tell SQLAlchemy to throw an exception on Session.add if there's a duplicate, as opposed to on Session.commit? Or else, is it not better to keep an indexed collection on the side to check whether an id was inserted before?

Thanks!

On Apr 20, 12:05 pm, Richard Harding rhard...@mitechie.com wrote: [quoted text snipped]
[sqlalchemy] Re: Best design for commits?
Not that I'm aware of. When you do a Session.add() it's not touching the database yet. It's part of the performance tradeoff. There's not a good way for it to *know* there's a record in the db with that pk id until it does chat with the db on it. Sure, you can keep a list of ids on the side if you want, but it just seems that you're going to have a try: except block there anyway in case of other issues, db connection fails, bad values, etc, that you might as well just catch the exception for a row already existing as well. Rick
[sqlalchemy] Re: Referencing col from a joined table
You are the man. That was it! Thanks so much. -RVince
[sqlalchemy] Re: Best design for commits?
I agree, but the thing is that committing every 10 entries is a little low for me; I was thinking of around 50, at which point having 50 individual commits is quite costly. In case I choose to implement your method, how would you go about it? How do you keep the objects of the last 50 or so records from the last commits that have not been committed yet (taking into account my recursion)? Thanks again!

On Apr 20, 12:17 pm, Richard Harding rhard...@mitechie.com wrote: [quoted text snipped]
[sqlalchemy] Re: Best design for commits?
I'm not sure, but I'd check the exception and see if you can get the info about which of your 50 were the dupe. I don't recall if it's in the traceback or exception error. If you can identify it then you could store it aside and remove it from the session and retry the other 49 again. Otherwise, it's the case of finding the mole. Maybe run some sort of binary split of the 50 so that you split the list in half, try to commit each half, one works, one fails. Split the fail side again, etc. In this way you should really only get down to what, 7 commits per 50? This is all assuming one dupe/bad record in the group of 50.
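The binary-split idea above can be sketched in plain Python. This is only the search strategy, not SQLAlchemy API: the hypothetical commit_batch callback stands in for committing a batch through a real Session.

```python
def commit_with_bisect(records, commit_batch):
    """Try to commit `records` as one batch; on failure, split in half
    and recurse, isolating each bad record in O(log n) commits.
    Returns the list of records that could not be committed."""
    if not records:
        return []
    try:
        commit_batch(records)
        return []
    except Exception:
        if len(records) == 1:
            return records  # found a 'bad egg'
        mid = len(records) // 2
        return (commit_with_bisect(records[:mid], commit_batch) +
                commit_with_bisect(records[mid:], commit_batch))

# Toy demo: any batch containing the duplicate id 7 fails.
committed = []

def fake_commit(batch):
    if 7 in batch:
        raise ValueError("duplicate key")
    committed.extend(batch)

bad = commit_with_bisect(list(range(10)), fake_commit)
# bad == [7]; everything else ends up in `committed`
```

With one bad record in 50, this does roughly log2(50), i.e. six or seven, commits instead of 50 single ones.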
RE: [sqlalchemy] Best design for commits?
-----Original Message-----
From: sqlalchemy@googlegroups.com [mailto:sqlalchemy@googlegroups.com] On Behalf Of Aviv Giladi
Sent: 20 April 2011 15:53
To: sqlalchemy
Subject: [sqlalchemy] Best design for commits?

[quoted text snipped]

I'm not sure I can comment on the overall approach, but there are a couple of things that might help you.

1. If you use Query.get rather than Query.filter, you won't actually query the database when the object already exists in the session. You'll probably need to clear the session every now and then (I don't think flush() or commit() clear it, but I could be wrong).

2. You may want to distinguish Session.flush() from Session.commit() - you could flush every N new objects, and only commit once at the very end.

3. If you know you are the only person writing to the database, consider setting expire_on_commit=False on your session. Otherwise I think accessing orm_obj.id after Session.commit() will trigger another (possibly unnecessary) query to the database.

Hope that helps,

Simon
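Simon's first point rests on the Session's identity map: Query.get consults it before emitting any SQL. A toy, non-SQLAlchemy sketch of that mechanism (ToySession and its names are made up purely for illustration):

```python
class ToySession:
    """Minimal identity-map sketch: get() consults the in-memory map
    first and only 'queries' the database on a miss."""

    def __init__(self, database):
        self.database = database  # pk -> row; stands in for the DB
        self.identity_map = {}    # pk -> object already loaded
        self.db_hits = 0

    def get(self, pk):
        if pk in self.identity_map:
            return self.identity_map[pk]  # no round trip at all
        self.db_hits += 1                 # a real Session emits SELECT here
        obj = self.database.get(pk)
        if obj is not None:
            self.identity_map[pk] = obj
        return obj

sess = ToySession({1: 'row-one'})
first = sess.get(1)   # miss: one db hit
second = sess.get(1)  # hit: served from the identity map, db_hits stays 1
```

This is why get-by-primary-key dedup checks can be much cheaper than a filter().first() per record, which always issues a query.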
[sqlalchemy] Re: Best design for commits?
Thanks again Rick. The issue is that I have a LOT of duplicates (around 20-30%) - that's just how that tree is structured. Therefore, I think I am going to go with catching DB exceptions regardless, but also use an indexed collection to prevent duplicates. Cheers!

On Apr 20, 12:43 pm, Richard Harding rhard...@mitechie.com wrote: [quoted text snipped]
[sqlalchemy] Best way to insert different string to Unicode columns?
Hey guys, I have a SQLAlchemy model with a Unicode column. I sometimes insert unicode values to it (u'Value'), but also sometimes insert ASCII strings. What is the best way to go about this? When I insert ASCII strings with special characters I get this warning: SAWarning: Unicode type received non-unicode bind param value ... How do I avoid this? What is the proper way to insert my different types of strings?
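One common way to silence that warning is to coerce everything to unicode text before it reaches the ORM. A minimal sketch of such a helper, assuming your byte strings are utf-8 (adjust the encoding to whatever they actually are):

```python
def to_unicode(value, encoding='utf-8'):
    """Coerce byte strings to unicode text so Unicode columns always
    receive unicode values; pass other values through untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# byte input is decoded, text and non-string input pass through
name = to_unicode(b'caf\xc3\xa9')   # becomes the text 'café'
plain = to_unicode(u'Value')        # unchanged
number = to_unicode(5)              # unchanged
```

Alternatively, the String/Unicode types accept a convert_unicode flag so the dialect does the coercion for you; the explicit helper just makes the chosen encoding visible in your own code.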
Re: [sqlalchemy] Re: Support for tuple expressions?
On Apr 20, 2011, at 1:09 AM, bukzor wrote:

    Thanks Michael. I won't be using or supporting SQL Server, so I'm quite fine with that. I guess the way forward would be to install SA in develop mode and make the changes directly?

no, to create your own SQL constructs use the compiler extension: http://www.sqlalchemy.org/docs/core/compiler.html

    Excuse my ignorance, but why would you do it that way? Since I don't want to change any of the functionality of _Tuple, I wouldn't think a subclass would be necessary. Also, since comparisons are already captured by _CompareMixin and represented as _BinaryExpression, I would have thought that intercepting the compilation of _BinaryExpression in the sqlite dialect would be the way to go. --Buck

    On Apr 19, 6:51 am, Michael Bayer mike...@zzzcomputing.com wrote:

        there's a tuple_() operator: http://www.sqlalchemy.org/docs/core/expression_api.html?highlight=tup... The object returned is _Tuple. If you wanted fancy per-dialect expression behavior, mmm, tricky: you'd need to subclass _Tuple (or write a new object), intercept comparisons like __eq__(), then return *another* construct that represents a tuple comparison, and that object would need per-dialect compilation rules. Or you could lobby the sqlite folks to add support for the construct in the first place, though I doubt SQL Server has it either.

        On Apr 18, 2011, at 4:57 PM, bukzor wrote:

            SQL-92 defines a "row value constructor" expression like (1,2,3), which looks and behaves exactly like a Python tuple, as far as I can tell. These are implemented correctly in mysql at least, and I believe in PostgreSQL and Oracle as well, although I don't have access to those systems. What would be the best way to deal with this type of value in SQLAlchemy? Should I create a RowValue class which can be visited by the various dialects? If I wanted to provide emulation for dialects which don't directly support this standard, what would be the way to go? For example, I'd like to be able to expand RowValue((1,2)) <= RowValue(colA, colB) to 1 < colA or (1 = colA and 2 <= colB) under sqlite. --Buck
Re: [sqlalchemy] ShardedQuery bulk delete
On Apr 20, 2011, at 4:37 AM, can xiang wrote:

    Hi, I have a problem bulk deleting from a sharded session. [quoted table and shard_chooser snipped] Then, I want to delete all records earlier than a given expire_time, with the following code:

        session.query(UserSession).filter(UserSession.expire_time < expire_time).delete()

range deletions are not supported by the ShardedSession extension right now. You'd need to implement your own delete() onto ShardedQuery. Note that the horizontal shard extension really should have been an example, not a full extension. It's really just a proof of concept, and real-world horizontal sharding scenarios will usually need to tweak it for specific use cases.
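Implementing your own delete() would usually mean fanning the DELETE out to every shard, since a range criterion on expire_time cannot pin down a single shard by session_id. A toy sketch of that fan-out idea in plain Python (the shard dictionaries stand in for per-shard engines; this is not the ShardedSession API):

```python
def fanout_delete(shards, predicate):
    """Delete matching rows from every shard, for the case where the
    filter criterion cannot identify a single shard."""
    deleted = 0
    for shard_id, rows in shards.items():
        doomed = [pk for pk, row in rows.items() if predicate(row)]
        for pk in doomed:
            del rows[pk]
        deleted += len(doomed)
    return deleted

# Toy shards keyed by session_id; rows carry an integer expire_time.
shards = {
    'shard_0': {'a': {'expire_time': 5}, 'b': {'expire_time': 50}},
    'shard_1': {'c': {'expire_time': 7}},
}
n = fanout_delete(shards, lambda row: row['expire_time'] < 10)
# n == 2: 'a' and 'c' are removed, 'b' survives
```

A real delete() on a ShardedQuery subclass would do the analogous thing: execute the DELETE statement once per shard bind instead of asking shard_chooser for a single shard id.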
Re: [sqlalchemy] Re: Best design for commits?
my practices with this kind of situation are:

1. there's just one commit() at the end. I'd like the whole operation in one transaction.

2. There are flush() calls every 100-1000 or so. 10 is very low.

3. I frequently will disable autoflush, if there are many flushes occurring due to queries for related data as the bulk proceeds.

4. I don't use try/except to find duplicates - this invalidates the transaction (SQLAlchemy does this but many DBs force it anyway). I use a SELECT to get things ahead of time, preferably loading the entire database's worth of keys into a set, or loading the keys that I know we're dealing with, so that individual per-key SELECTs are not needed. Or, if the set of data I'm working with is the whole thing at once, I store the keys in a set as I get them, so I know which ones I've got as I go along.

5. if I really need to do try/except, use savepoints, i.e. begin_nested().
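Points 1, 2 and 4 combine into a loop shaped roughly like the following. This is a plain-Python sketch with a made-up FakeSession that only records calls, so the batching shape is visible; real code would use a SQLAlchemy Session and a SELECT of the existing keys up front.

```python
class FakeSession:
    """Stand-in for a real Session: records adds, flushes, commits."""
    def __init__(self):
        self.pending, self.flushed, self.commits = [], [], 0
    def add(self, obj):
        self.pending.append(obj)
    def flush(self):
        self.flushed.extend(self.pending)
        self.pending = []
    def commit(self):
        self.flush()
        self.commits += 1

FLUSH_EVERY = 100

def load(records, existing_keys, session):
    seen = set(existing_keys)   # one SELECT up front, not one per key
    added = 0
    for key, payload in records:
        if key in seen:
            continue            # duplicate: skip without touching the db
        seen.add(key)
        session.add((key, payload))
        added += 1
        if added % FLUSH_EVERY == 0:
            session.flush()     # push rows out, stay in one transaction
    session.commit()            # single commit at the very end
    return added

sess = FakeSession()
records = [(i % 150, 'x') for i in range(300)]   # lots of duplicates
added = load(records, existing_keys=[0, 1, 2], session=sess)
# added == 147: 150 distinct keys minus the 3 already in the db
```

The set lookup replaces both the per-record SELECT from the original pseudo-code and the try/except, so the transaction is never invalidated by an expected duplicate.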
Re: [sqlalchemy] Self-referencing Table Cannot Have 0 as Primary Index?
On Apr 20, 2011, at 12:33 AM, Aviv Giladi wrote:

    Hey guys, I reproduced the problem within my controllers, but I also ran this test (after fully loading my environment, of course):

        parent = Node()
        parent.id = 1
        parent.parent_id = None
        parent.name = 'parent'
        Session.add(parent)

        child = Node()
        child.id = 20
        child.parent_id = 1
        child.name = 'child'
        Session.add(child)

        Session.commit()

    The problem arises when I change the `parent` node's id to 0 (and the `child`'s parent_id to 0 accordingly). Then, I get the following exception:

        foreign key constraint fails (`db`.`nodes`, CONSTRAINT `nodes_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `nodes` (`id`))') 'INSERT INTO nodes (id, parent_id, name) VALUES (%s, %s, %s)' (20, 0, 'child')

if you update the value of a primary key column that's referenced by foreign keys to a new value, all referencing foreign keys must be updated simultaneously, that is, within the single UPDATE statement. You use ON UPDATE CASCADE with your database to achieve this. That said, a primary key value of 0 is a poor choice - you should stick with 1-based integer primary key values. The actual value in a so-called surrogate primary key should also never be significant within the application.
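The ON UPDATE CASCADE behavior can be seen with a small self-referential table. sqlite is used here only because it ships with Python; the original thread is MySQL, where the same clause goes on the InnoDB foreign key definition instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE nodes (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes(id) ON UPDATE CASCADE,
        name TEXT
    )
""")
conn.execute("INSERT INTO nodes VALUES (1, NULL, 'parent')")
conn.execute("INSERT INTO nodes VALUES (20, 1, 'child')")

# Renumber the parent; the child's parent_id follows automatically,
# so the self-referential constraint is never violated mid-update.
conn.execute("UPDATE nodes SET id = 2 WHERE id = 1")
rows = conn.execute("SELECT id, parent_id FROM nodes ORDER BY id").fetchall()
```

Without the ON UPDATE CASCADE clause, the same UPDATE fails with a foreign key error, which is the situation described in the post.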
[sqlalchemy] How to determine many-to-many relations using object_mapper().iterate_properties?
I'm having a bit of trouble with this snippet from another stackoverflow question: http://stackoverflow.com/questions/1623661/sqlalchemy-shallow-copy-avoiding-lazy-loading

The snippet in question:

    from sqlalchemy.orm import object_mapper, ColumnProperty, RelationProperty

    newobj = type(src)()
    for prop in object_mapper(src).iterate_properties:
        if (isinstance(prop, ColumnProperty) or
                isinstance(prop, RelationProperty) and prop.secondary):
            setattr(newobj, prop.key, getattr(src, prop.key))

It looks like some things might've changed in SQLAlchemy after this snippet was written, as the interesting tidbit from the answer is: "Many-to-many relations can be determined with isinstance(prop, RelationProperty) and prop.secondary test." So, taking that to heart, I have some code that returns all the attributes on a declarative base model in a dictionary; here's a simple snippet of code below:

    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import Column, Integer, String, ForeignKey, create_engine
    from sqlalchemy.orm import Session, relationship
    from sqlalchemy.orm.util import object_mapper
    from sqlalchemy.orm.properties import ColumnProperty, RelationProperty

    Base = declarative_base()

    class Friend(Base):
        __tablename__ = 'friends'
        user_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), primary_key=True)
        friend_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), primary_key=True)

    class User(Base):
        __tablename__ = 'users'
        id = Column(Integer, primary_key=True)
        name = Column(String(60), nullable=False)
        friends = relationship('User',
                               secondary=Friend.__table__,
                               primaryjoin=(id == Friend.__table__.c.user_id),
                               secondaryjoin=(Friend.__table__.c.friend_id == id),
                               cascade='delete',
                               passive_deletes=True)

    e = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(e)

    sess = Session(e)
    me = User(name='me')
    you = User(name='you')
    sess.add_all([me, you])
    sess.commit()
    sess.add(Friend(user_id=me.id, friend_id=you.id))
    sess.commit()

    # using the snippet above
    user = sess.query(User).first()
    dictionary = {}
    for model_property in object_mapper(user).iterate_properties:
        if isinstance(model_property, ColumnProperty):
            key = model_property.key
            dictionary[key] = getattr(user, key)
        elif (isinstance(model_property, RelationProperty)
                and model_property.secondary):
            key = model_property.key
            dictionary[key] = getattr(user, key)
    print dictionary

You'll notice the line that checks for a relationship property and tests whether the property is actually secondary throws an exception:

    TypeError: Boolean value of this clause is not defined

What's the correct way to do what I'm trying to do? Thanks!

Mahmoud
Re: [sqlalchemy] How to determine many-to-many relations using object_mapper().iterate_properties?
On Apr 20, 2011, at 5:09 PM, Mahmoud Abdelkader wrote:

> for model_property in object_mapper(user).iterate_properties:
>     if isinstance(model_property, ColumnProperty):
>         key = model_property.key
>         dictionary[key] = getattr(user, key)
>     elif (isinstance(model_property, RelationProperty)
>             and model_property.secondary):
>         key = model_property.key
>         dictionary[key] = getattr(user, key)
>
> You'll notice the line where it's checking for a relationship property, and the "and model_property.secondary" test throws an exception:
>
>     TypeError: Boolean value of this clause is not defined
>
> What's the correct way to do what I'm trying to do?

.secondary points to a Table object, and you'll find information about testing SQL expression constructs for boolean here: http://www.sqlalchemy.org/trac/wiki/06Migration#AnImportantExpressionLanguageGotcha
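[Editor's note: the gotcha linked above can be illustrated without SQLAlchemy at all. SQL expression constructs deliberately refuse truth-testing, so "and prop.secondary" blows up, while an explicit "is not None" comparison never invokes the boolean protocol. A minimal sketch; the FakeClause class below is invented purely as a stand-in for a Table/ClauseElement:]

```python
class FakeClause(object):
    """Stand-in for a SQLAlchemy Table/ClauseElement, whose boolean
    value is deliberately undefined."""
    def __bool__(self):
        raise TypeError("Boolean value of this clause is not defined")
    __nonzero__ = __bool__  # Python 2 spelling of the same hook

secondary = FakeClause()

# "and secondary" truth-tests the object, which raises:
try:
    if secondary:
        pass
except TypeError as err:
    print(err)  # Boolean value of this clause is not defined

# an identity comparison against None sidesteps __bool__ entirely:
if secondary is not None:
    print("many-to-many")
```

So the fix for the loop above is to test `model_property.secondary is not None` rather than letting `and` evaluate the Table's truth value.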
Re: [sqlalchemy] ShardedQuery bulk delete
Thanks anyway. It's sad the horizontal shard extension is only considered an example. I can hardly believe it, because it works so well in some of my simple use cases. I really hope there will be more work on this extension, or more docs on how to do it.

Best regards!
can

On Thu, Apr 21, 2011 at 3:19 AM, Michael Bayer mike...@zzzcomputing.com wrote:

> On Apr 20, 2011, at 4:37 AM, can xiang wrote:
>
>> Hi, I have a problem bulk deleting from a sharded session; I tried to search the previous posts without finding anything. I have the following table:
>>
>>     usersession_table = Table('kg_usersession', meta,
>>         Column('session_id', String(32), primary_key=True),
>>         Column('user_id', BigInteger, index=True, nullable=False),
>>         Column('create_time', DateTime, index=True),
>>         Column('expire_time', DateTime, index=True),
>>         Column('site', String(10)),
>>         mysql_engine='MyISAM'
>>     )
>>
>> I use horizontal sharding by session_id, with the following shard chooser:
>>
>>     def shard_chooser(mapper, instance, clause=None):
>>         if instance:
>>             return shard_value(instance.session_id)
>>
>> Then, I want to delete all records earlier than a given expire_time, with the following code:
>>
>>     session.query(UserSession).filter(UserSession.expire_time < expire_time).delete()
>
> range deletions are not supported by the ShardedSession extension right now. You'd need to implement your own delete() onto ShardedQuery. Note that the horizontal shard extension really should have been an example, not a full extension. It's really just a proof of concept, and real-world horizontal sharding scenarios will usually need to tweak it for specific use cases.
>> It raises an error:
>>
>>     Traceback (most recent call last):
>>       File "delete_expire_session.py", line 20, in <module>
>>         delete_expire_session(expire_time)
>>       File "delete_expire_session.py", line 13, in delete_expire_session
>>         session.query(UserSession).filter(UserSession.expire_time < expire_time).delete()
>>       File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/orm/query.py", line 2142, in delete
>>         result = session.execute(delete_stmt, params=self._params)
>>       File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/orm/session.py", line 726, in execute
>>         engine = self.get_bind(mapper, clause=clause, **kw)
>>       File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.6.5-py2.6.egg/sqlalchemy/ext/horizontal_shard.py", line 73, in get_bind
>>         return self.__binds[shard_id]
>>     KeyError: None
>>
>> I guess shard_chooser returns None because instance is None at runtime. I read from the docs that shard_chooser "may be in some round-robin scheme", but I don't have any idea what that exactly means in my case. I appreciate any advice.
>>
>> Best regards!
>> can
>>
>> PS: you can access partial source code in the gist: https://gist.github.com/930708
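[Editor's note: the shape of the custom delete() Michael suggests is "run the same range criterion against every shard and sum the counts", since no single shard can be chosen from the WHERE clause alone. A rough sketch with plain dicts standing in for shard databases; the names shards and delete_expired are invented for illustration, not SQLAlchemy API:]

```python
from datetime import datetime

# two "shards", each mapping session_id -> expire_time
shards = {
    'shard_0': {'aaa': datetime(2011, 1, 1), 'bbb': datetime(2011, 6, 1)},
    'shard_1': {'ccc': datetime(2011, 2, 1)},
}

def delete_expired(shards, expire_time):
    """Fan the same range criterion out to every shard and
    accumulate the total number of rows deleted."""
    deleted = 0
    for shard_id, rows in shards.items():
        doomed = [sid for sid, exp in rows.items() if exp < expire_time]
        for sid in doomed:
            del rows[sid]
        deleted += len(doomed)
    return deleted

print(delete_expired(shards, datetime(2011, 3, 1)))  # 2
```

In real code the loop body would execute the DELETE statement once per configured bind (in 0.6's horizontal_shard, ShardedSession.connection() accepts a shard_id argument for exactly this kind of per-shard execution); the exact calls depend on the SQLAlchemy version.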
[sqlalchemy] Re: Best design for commits?
Thank you for your responses, everyone. I have one more question: the really time-heavy task here is retrieving the URLs over HTTP (it takes almost a second per URL). I am using urllib3, which has connection pooling, but other than that, is there any other way to speed this up? Perhaps multi-threading?

On Apr 20, 3:26 pm, Michael Bayer mike...@zzzcomputing.com wrote:

> my practices with this kind of situation are:
>
> 1. there's just one commit() at the end. I'd like the whole operation in one transaction.
> 2. There are flush() calls every 100-1000 or so. 10 is very low.
> 3. I frequently will disable autoflush, if there are many flushes occurring due to queries for related data as the bulk proceeds.
> 4. I don't use try/except to find duplicates - this invalidates the transaction (SQLAlchemy does this, but many DBs force it anyway). I use a SELECT to get things ahead of time, preferably loading the entire database's worth of keys into a set, or loading the keys that I know we're dealing with, so that individual per-key SELECTs are not needed. Or, if the set of data I'm working with is the whole thing at once, I store the keys in a set as I get them, so I know which ones I've got as I go along.
> 5. if I really need to do try/except, use savepoints, i.e. begin_nested().
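[Editor's note: points 1, 2 and 4 of the quoted advice can be sketched with a stand-in session object. The FakeSession class and load_rows function below are invented purely to show the batching shape, not SQLAlchemy API:]

```python
class FakeSession(object):
    """Minimal stand-in for a Session: records flush/commit behavior."""
    def __init__(self):
        self.pending, self.rows, self.flushes = [], [], 0
    def add(self, row):
        self.pending.append(row)
    def flush(self):
        self.rows.extend(self.pending)
        self.pending = []
        self.flushes += 1
    def commit(self):
        self.flush()

def load_rows(session, incoming, existing_keys, batch_size=500):
    """One transaction: dedupe against a preloaded key set,
    flush every batch_size rows, commit once at the end."""
    seen = set(existing_keys)          # point 4: SELECT the keys up front
    for key, value in incoming:
        if key in seen:                # skip duplicates without try/except
            continue
        seen.add(key)
        session.add((key, value))
        if len(session.pending) >= batch_size:
            session.flush()            # point 2: periodic flush
    session.commit()                   # point 1: single commit at the end

sess = FakeSession()
data = [('a', 1), ('b', 2), ('a', 3), ('c', 4)]
load_rows(sess, data, existing_keys={'b'}, batch_size=2)
print(sess.rows)  # [('a', 1), ('c', 4)]
```

With a real Session the tuples become mapped objects and the duplicate check becomes a membership test against keys fetched with one SELECT before the loop.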
Re: [sqlalchemy] ShardedQuery bulk delete
ah well, glad to hear that! It certainly could be more of a core element if it had some more dedicated maintainers. I haven't actually used it in a real project, so I can't vouch strongly for it. It also might be a nice third-party project. Sharding is very tough, and there are lots of complex cases that come up pretty fast.

On Apr 20, 2011, at 8:05 PM, can xiang wrote:

> Thanks anyway. It's sad the horizontal shard extension is only considered an example. I can hardly believe it, because it works so well in some of my simple use cases. I really hope there will be more work on this extension, or more docs on how to do it.
Re: [sqlalchemy] Re: Best design for commits?
if the URL fetch is an IO-bound operation (i.e. the time is spent waiting for IO), it might work if you did a standard consumer/producer model using Queue.Queue. One thread retrieves data from each URL and places the datasets into the Queue; the other thread pulls off items and loads them into the DB. Or the same idea using the multiprocessing module instead of threading, if the GIL is still getting in the way. Or using Celery. Maybe a deferred approach like that of Twisted. There are lots of ways to offload slow IO operations while work continues.

On Apr 20, 2011, at 8:33 PM, Aviv Giladi wrote:

> Thank you for your responses, everyone. I have one more question: the really time-heavy task here is retrieving the URLs over HTTP (it takes almost a second per URL). I am using urllib3, which has connection pooling, but other than that, is there any other way to speed this up? Perhaps multi-threading?
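[Editor's note: the producer/consumer shape described above, sketched with the stdlib queue module (Queue.Queue on Python 2); the fetch_url function just fakes a slow download, and the results list stands in for the DB-writing session:]

```python
import queue
import threading
import time

def fetch_url(url):
    """Stand-in for a slow urllib3 request."""
    time.sleep(0.01)
    return (url, 'payload for %s' % url)

def producer(urls, q):
    for url in urls:
        q.put(fetch_url(url))   # slow IO happens in this thread
    q.put(None)                 # sentinel: no more work

def consumer(q, results):
    while True:
        item = q.get()
        if item is None:        # producer signaled completion
            break
        results.append(item)    # stand-in for session.add()/flush()

urls = ['http://example.com/%d' % i for i in range(5)]
q = queue.Queue(maxsize=10)
results = []
t_prod = threading.Thread(target=producer, args=(urls, q))
t_cons = threading.Thread(target=consumer, args=(q, results))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(len(results))  # 5
```

The bounded queue (maxsize) keeps the fetcher from racing ahead of the DB writer; multiple producer threads would need one sentinel per producer, or a join()-based shutdown instead.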
[sqlalchemy] Re: Oracle column names beginning with a digit
Hi Michael,

On Apr 20, 11:20 am, Michael Bayer mike...@zzzcomputing.com wrote:

> yeah that is pretty awful, Oracle is super picky about names, and there's already a lot of stuff to appease its limitations with bind parameters. The bind name is ultimately derived from the key of the column, so this would change the name:
>
>     Table('my_table', metadata,
>         Column('100K_Name', String, key='hundredkname'),
>         autoload=True)

Thank you for the key tip, for now I'll use something like this:

---8<---
FOUNDDIGITNAME = False
digitcols = []
for c in vals:
    if re.match('^[0-9]', c):
        digitcols.append(c)
        FOUNDDIGITNAME = True

if FOUNDDIGITNAME:
    cols = []
    for dc in digitcols:
        dckey = 'dc_' + dc
        col = Column(dc, site_tbl.c[dc].type, key=dckey)
        cols.append(col)
        vals[dckey] = vals[dc]
        del vals[dc]
    site_tbl = Table('TBL_SITE', metadata, *cols,
                     schema='foo', useexisting=True, autoload=True)
---8<---

> beyond using the key thing, which changes how you reference the column object in Python, we'd have to dig into the core and handle this case.

That would be a great rainy day project, but for now I am happy with that little overhead.

Kind Regards,
Sirko
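[Editor's note: the key= trick in isolation, assuming SQLAlchemy is installed; table and column names here are invented. The database-side name keeps the leading digit, while all Python-side access goes through the cleaned-up key:]

```python
from sqlalchemy import Table, Column, MetaData, String

metadata = MetaData()
# DB column is named "100K_Name"; Python code refers to it as "hundredkname"
t = Table('my_table', metadata,
          Column('100K_Name', String(30), key='hundredkname'))

print(t.c.hundredkname.name)  # 100K_Name  <- what rendered SQL will use
print('hundredkname' in t.c)  # True: the collection is keyed by key
```

Generated SQL and bind parameter names are derived from the key, which is what sidesteps Oracle's complaints about identifiers starting with a digit.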