[sqlalchemy] Bulk update using multi threading and exception handling
I am developing back-end scripts to populate a bunch of tables.

1. dbinterface.py: contains all mapper classes and a DBHandler class which has the required static methods for queries, e.g.:

    class Destination(Base):
        __table__ = Base.metadata.tables['destination']

        @staticmethod
        def _save(rows):
            try:
                DBHandler.session.add_all(rows)
                DBHandler.session.commit()
            except Exception as exp:
                logger.debug("Error saving data: %s", str(exp))

Question 1: Is it a good idea to handle exceptions like this? The save method accepts a list of objects to be saved. If there is a problem with one object, I don't want to prevent the others from getting updated. The alternative would be handling the exception in the calling code, rolling back, and retrying the save of the list.

2. basetables.py: this will populate the base tables in the DB. The data is fetched from a web service using async calls.

    def populate(hosts):
        '''Fetch property value for each host'''
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            future_to_host = {executor.submit(call__ws, host.host_name): host
                              for host in hosts}
            for future in concurrent.futures.as_completed(future_to_host):
                host = future_to_host[future]
                try:
                    property = future.result()
                except Exception as exp:
                    logger.debug("Error fetching hosts from web service: %s", exp)
                else:
                    host.property = property
                    logger.info("host: %s, property: %s", host, property)
                finally:
                    try:
                        colo = find_colo(host.host_name)
                        logger.info("Colo: %s", colo)
                    except Exception as exp:
                        logger.debug("Invalid colo: %s", exp)
                        hosts.remove(host)
                    else:
                        host_colo = DBHandler.fetch_colo(colo)
                        if host_colo is not None:
                            host.emitter_host_colo = host_colo
                            logger.info("Host Colo: %s", host_colo)
                        else:
                            hosts.remove(host)
                            logger.debug("Removing host, error processing colo: %s", host)
        DBHandler._save(hosts)

Question 2: Is this a good model, considering there will be thousands of hosts being updated in bulk and the fact that the session is a static variable in DBHandler? Colo is another table and hence a mapper object. If the colo is not present for any of the hosts, it is fetched by the static method fetch_colo of DBHandler based on the name. There is a one-to-many mapping from colo to host, i.e. one colo can represent many hosts. In that case, would the call to the DB be avoided if the colo was already fetched previously, with only the relationship to the new host needing to be set up?

Pardon my ignorance; I am a newbie to SQLAlchemy and Python in general.

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
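One common way to let individual bad rows fail without aborting the whole batch is to wrap each add in a SAVEPOINT via Session.begin_nested(). The following is a minimal runnable sketch, not the poster's actual code: the Destination model, the unique constraint used to force a failure, and the two pysqlite event listeners (the documented workaround for pysqlite's SAVEPOINT handling) are all illustrative additions.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Destination(Base):
    __tablename__ = 'destination'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)  # hypothetical constraint, just to force a failure

engine = create_engine("sqlite://")

# pysqlite needs these two listeners for SAVEPOINT to work reliably
# (this is the workaround documented in SQLAlchemy's SQLite dialect docs)
@event.listens_for(engine, "connect")
def _sqlite_connect(dbapi_conn, record):
    dbapi_conn.isolation_level = None

@event.listens_for(engine, "begin")
def _sqlite_begin(conn):
    conn.exec_driver_sql("BEGIN")

Base.metadata.create_all(engine)

rows = [
    Destination(id=1, name="nyc"),
    Destination(id=2, name="nyc"),   # duplicate name -> IntegrityError
    Destination(id=3, name="sfo"),
]

with Session(engine) as session:
    for row in rows:
        try:
            with session.begin_nested():   # SAVEPOINT per row
                session.add(row)
                session.flush()            # force the INSERT now
        except IntegrityError:
            pass  # only this row's savepoint is rolled back; the rest survive
    session.commit()
    saved = [d.name for d in session.query(Destination).order_by(Destination.id)]

print(saved)
```

The trade-off is that a flush per row is slower than one bulk flush, so this pattern is worth it only when individual failures are actually expected.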
[sqlalchemy] [SQLAlchemy-migrate] Add a TSVectorType column to 4 existing tables
Hi! Can somebody here tell me something about database migration with SQLAlchemy-migrate on a TurboGears application, please? I want to add a column containing a TSVectorType (http://sqlalchemy-searchable.readthedocs.org/en/latest/) to 4 of my existing tables and I'm not sure how to proceed. E.g.:

    search_vector = Column(TSVectorType('name', 'content'))

I looked at the example given at https://sqlalchemy-migrate.readthedocs.org/en/latest/changeset.html with sqlalchemy-migrate's col.create(), but that example seems to be written for a "standard" column type. E.g.:

    col = Column('col1', String, default='foobar')
    col.create(table, populate_default=True)

So I'm asking here for some advice from you guys. Thanks a lot for your time.
Re: [sqlalchemy] Bulk update using multi threading and exception handling
On Jul 22, 2014, at 7:47 AM, Milind Vaidya kava...@gmail.com wrote:

> Question 1: Is this a good idea to handle exceptions like this? The save
> method accepts a list of objects to be saved. If there is a problem with
> one object I don't want to prevent others from getting updated. [...]

The use case of "if there is a problem with one object I don't want others having an issue" can't be handled generically, for the simple reason that if object X is dependent on object Y, a failure to persist X means Y can't be persisted either. It sounds like what you're really dealing with is inserting unrelated rows. The mechanics of transactions, both at the SQLAlchemy level and in many cases at the DB level, prevent the transaction from proceeding past a failed INSERT *unless* you use a savepoint. So the pattern, when you expect *individual* rows to fail, is:

    from sqlalchemy import exc

    for row in rows:
        try:
            with session.begin_nested():
                session.add(row)
        except exc.IntegrityError:
            logger.error("Error", exc_info=True)
    session.commit()

The above will not be as performant as a mass insert of all the rows at once, as the Session needs to flush each one individually.

> Question 2: is this a good model considering there will be 1000s of hosts
> being updated in a bulk update, and the fact that the session is a static
> variable in DBHandler? [...] would the call to the DB be avoided if the
> colo is already fetched previously, with only the relationship to the new
> host being taken care of?

I'm not very familiar with concurrent.futures, but I will note that the Session isn't thread safe, as it refers to a single DBAPI connection/transaction in progress as well as lots of internal state; all the objects associated with a given Session are in fact proxies to the state of that Session and transaction, so they aren't thread safe either. If you are running multiple concurrent threads, you typically have a Session per thread, and every object that is handled within a thread should originate within that thread. To pass the state of objects between threads, you should use the Session.merge() method. I talk in depth about why the Session works this way in my talk "The SQLAlchemy Session in Depth" (http://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth). There is also discussion of this in
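A session-per-thread version of the populate() idea can be sketched as follows. This is a toy, not the poster's code: the Host model, the file-backed SQLite database, and the update_host() helper are all invented for illustration. Each worker thread opens its own short-lived Session and reconciles a detached object into the database via Session.merge().

```python
import concurrent.futures
import os
import tempfile

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Host(Base):
    __tablename__ = 'host'
    id = Column(Integer, primary_key=True)
    host_name = Column(String)
    property = Column(String)

# a file-backed DB so all threads see the same data (":memory:" is per-connection)
fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)
engine = create_engine(f"sqlite:///{db_path}",
                       connect_args={"check_same_thread": False})
SessionFactory = sessionmaker(bind=engine)

Base.metadata.create_all(engine)

def update_host(host_id, prop):
    # every thread gets its own Session; live objects never cross threads
    with SessionFactory() as session:
        session.merge(Host(id=host_id, property=prop))  # reconcile by primary key
        session.commit()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(update_host, i, f"prop-{i}") for i in range(1, 4)]
    for f in concurrent.futures.as_completed(futures):
        f.result()  # surface any worker exceptions

with SessionFactory() as session:
    count = session.query(Host).count()

os.unlink(db_path)
print(count)
```

The key design point is that no ORM object is shared across threads; only plain values (or detached instances handed to merge()) move between them.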
[sqlalchemy] Re: condition on every query
I had the same problem and decided to look into implementing a custom relationship like you suggested. I'll post what I have here in case anyone else has the same problem. :)

The first thing I noticed was the query_class attribute on the RelationshipProperty class. If you don't want to implement a new relationship, you can just do this, which is a little cleaner than respecifying the primaryjoin:

    children = relationship(Child, query_class=MyPrefilteredQuery)

(Where MyPrefilteredQuery is the PreFilteredQuery class you implemented as per https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery.) Making your own relationship is pretty straightforward from there, though:

    from sqlalchemy.util.langhelpers import public_factory
    from sqlalchemy.orm.relationships import RelationshipProperty

    class MyPrefilteredRelationship(RelationshipProperty):
        def __init__(self, *args, **kwargs):
            kwargs['query_class'] = MyPrefilteredQuery
            super(MyPrefilteredRelationship, self).__init__(*args, **kwargs)

    filtered_relationship = public_factory(MyPrefilteredRelationship, ".mylib.db")

Then you can use it like so:

    from mylib.db import filtered_relationship

    class Parent(Base):
        __table__ = parent_table
        children = filtered_relationship(Child)

On Sunday, February 20, 2011 8:19:58 PM UTC-8, robert wrote:

On Feb 20, 8:59 pm, Michael Bayer mike...@zzzcomputing.com wrote:
> the cleanest and most foolproof way is to join on relationships that have the condition:
>
>     related_non_deleted = relationship(Related,
>         primaryjoin=and_(Related.deleted==False, Related.foo_id==id),
>         viewonly=True)

I agree that the explicit approach is the most foolproof. The main reason I was trying to avoid it is that, in my case, the condition is ubiquitous. Thus all of my relationships that look like this:

    children = relationship(Child)

would have to change to:

    children = relationship(Child,
        primaryjoin=and_(Child.deleted==False, Child.parent_id==id))

Perhaps I could write a custom relationship() to simplify things.
I'll give that a try. I'll also have to add .filter(foo.deleted==False) to all top-level queries. Maybe a custom Session that always answers a query with that filter already added would be cleaner. There may ultimately be many queries and many relationships in this app, and I'm trying to avoid having to remember (or worse, have someone else remember) to always include the deleted condition. Thanks for your help. I've got a bit of digging to do now.
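For reference, the explicit-primaryjoin approach suggested earlier in the thread can be shown end to end. This is a runnable toy version with made-up Parent/Child model names, demonstrating that the relationship only loads non-deleted children:

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    # only children with deleted == False are visible through this relationship
    children = relationship(
        "Child",
        primaryjoin="and_(Child.deleted == False, Child.parent_id == Parent.id)",
        viewonly=True,
    )

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    deleted = Column(Boolean, default=False, nullable=False)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Parent(id=1))
    session.add_all([
        Child(id=1, parent_id=1, deleted=False),
        Child(id=2, parent_id=1, deleted=True),   # soft-deleted: should be hidden
        Child(id=3, parent_id=1, deleted=False),
    ])
    session.commit()

    parent = session.get(Parent, 1)
    visible = sorted(c.id for c in parent.children)

print(visible)
```

Note viewonly=True: since the join condition carries extra criteria, the relationship should not be used to persist changes, only to read.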
Re: [sqlalchemy] Updating children of a one-to-many bi-directional relationship
Hi Michael, thanks for the reply. I thought I had tested the code, but I did not do so properly. The essence of the logic I posted works. Here are the few corrections:

You are right that I should not use strings for partition_id when it is defined as an integer. This is a piece of test code I cooked up; in the actual production code, partition_id comes from another API and is already an integer.

The other bug I had was in these lines:

    for db_part in db_parts:
        dev_part = dev_parts.get(db_part.partition_id, None)
        if dev_part is not None:
            # update db partition with that from device
            db_part.partition_id = dev_part.partition_id
            db_part.partition_name = dev_part.partition_name
            # remove the device partition since we have updated the db
            del dev_part[db_part.partition_id]

The last line should have read:

    del dev_parts[db_part.partition_id]

You also mentioned that the code doesn't seem to show anything that would result in any DELETEs being emitted. I am using a list comprehension to update the list of test_partitions associated with the test_device database entry. The list comprehension updates the test_partition list (the 'partitions' attribute) for the device by keeping only those partitions that are in the dict data structure (simulating data from a collector on the device). The code that updates the 'partitions' attribute is:

    # First delete partitions in database that are not in device
    db_parts[:] = [db_part for db_part in db_parts
                   if dev_parts.get(db_part.partition_id, None) is not None]

db_parts was set in an earlier line of code as follows:

    ndev = sess.query(TestDevice).filter(TestDevice.dns_name == 'testdev3').first()
    print 'ndev:', ndev
    if ndev is not None:
        db_parts = ndev.partitions  # partitions attribute set through backref on TestPartition
After I made the changes, the code works as expected. Before the update, the test_partition table:

    basicinv=> select * from test_partition;
     id | partition_name | partition_id | device_id
    ----+----------------+--------------+-----------
     42 | test456        |          456 |        41
     43 | test345        |          345 |        41
     44 | test123        |          123 |        41
    (3 rows)

After running the code:

    basicinv=> select * from test_partition;
     id | partition_name | partition_id | device_id
    ----+----------------+--------------+-----------
     42 | test456-2      |          456 |        41
     43 | test345-1      |          345 |        41
     45 | test678        |          678 |        41
     46 | test567        |          567 |        41
    (4 rows)

As I expected, the partitions with primary keys 42 and 43 got updated (i.e. not deleted and re-inserted as new primary-key rows). Thanks for your feedback.

On Monday, July 21, 2014 7:13:46 AM UTC-7, Michael Bayer wrote:

Not to mention you're using strings to set integer values; again, a bad idea that can only confuse your database:

    part.partition_id = '345'  # -- this is a string

    partition_id = sa.Column(sa.Integer, nullable=False)  # -- should be an integer

With this program you need to create a short test case that actually runs, and step through it with pdb as well as echo=True, to analyze more accurately what's happening and when.

On Jul 21, 2014, at 10:10 AM, Michael Bayer mik...@zzzcomputing.com wrote:

The code there doesn't seem to show anything that would result in any DELETEs emitted. A DELETE here would only occur if you de-associated a TestDevice and a TestPartition, by removing from the TestDevice.partitions collection or setting a TestPartition.device to None, and I don't see that. All of the manipulations you're doing with part.partition_id have nothing to do with any of that; SQLAlchemy's relationships have no idea what you're doing with those, and overall it's a bad idea to mix the usage of "obj.foreign_key = id" with direct manipulation of the relationship (where you say dev_part.device = ndev). SQLAlchemy's relationship management code knows nothing about any of those foreign key assignments.
See http://docs.sqlalchemy.org/en/rel_0_9/faq.html#i-set-the-foo-id-attribute-on-my-instance-to-7-but-the-foo-attribute-is-still-none-shouldn-t-it-have-loaded-foo-with-id-7 .

On Jul 21, 2014, at 2:55 AM, Bala Ramakrishnan bal...@gmail.com wrote:

I had asked this question on Stack Overflow. The details are at this link: http://stackoverflow.com/questions/24836816/updating-a-few-children-in-one-to-many-relationship-deletes-all-rows-and-adds-ne

The summary is: I have a parent class A and a bidirectional one-to-many relationship with class B. When I update the class B list for an instance of A, the update may involve deleting some class B instances, updating some of them, and adding new ones. However, I find that SQLAlchemy deletes all ROWS of
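The update/delete/insert reconciliation discussed in this thread can be condensed into a runnable sketch. The Device/Partition names below are simplified stand-ins for test_device/test_partition, and the dict of "device data" is invented. The delete-orphan cascade is what turns removal from the collection into a DELETE, while matched rows are updated in place and keep their primary keys:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Device(Base):
    __tablename__ = 'device'
    id = Column(Integer, primary_key=True)
    partitions = relationship("Partition", backref="device",
                              cascade="all, delete-orphan")

class Partition(Base):
    __tablename__ = 'partition'
    id = Column(Integer, primary_key=True)
    partition_id = Column(Integer, nullable=False)
    partition_name = Column(String)
    device_id = Column(Integer, ForeignKey('device.id'))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Device(id=41, partitions=[
        Partition(id=42, partition_id=456, partition_name="test456"),
        Partition(id=43, partition_id=345, partition_name="test345"),
        Partition(id=44, partition_id=123, partition_name="test123"),
    ]))
    session.commit()

    # fresh data "from the device": 456/345 renamed, 123 gone, 678/567 new
    dev_parts = {456: "test456-2", 345: "test345-1", 678: "test678", 567: "test567"}

    device = session.get(Device, 41)
    seen = set()
    for db_part in list(device.partitions):
        if db_part.partition_id in dev_parts:
            db_part.partition_name = dev_parts[db_part.partition_id]  # UPDATE, PK kept
            seen.add(db_part.partition_id)
        else:
            device.partitions.remove(db_part)  # delete-orphan emits a DELETE
    for pid in dev_parts.keys() - seen:
        device.partitions.append(
            Partition(partition_id=pid, partition_name=dev_parts[pid]))  # INSERT
    session.commit()

    rows = {(p.id, p.partition_name) for p in device.partitions}

surviving = {pk for pk, _ in rows}
print(sorted(surviving))
```

All mutations go through the relationship collection, never through raw foreign-key assignments, which is the point made earlier in the thread.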
[sqlalchemy] SQLAlchemy 0.8.7 Released
Hey list - SQLAlchemy release 0.8.7 is now available. Release 0.8.7 contains a selected set of bug fix backports from release 0.9, including a selected set of ORM and extension fixes as well as dialect fixes for MySQL, Postgresql, and SQL Server. Users should carefully review the Changelog (http://docs.sqlalchemy.org/en/latest/changelog/changelog_08.html#change-0.8.7) to note which behaviors and issues are affected. We'd like to thank the many contributors who helped with this release. SQLAlchemy 0.8.7 is available on the Download Page: http://www.sqlalchemy.org/download.html
[sqlalchemy] SQLAlchemy 0.9.7 Released
Hey list - SQLAlchemy release 0.9.7 is now available. The 0.9 series is now entering maintenance mode, where major new features are targeted at the 1.0 series. Release 0.9.7 contains a wide range of bugfixes, many of which are identified as regressions from previous 0.9.x releases, including three introduced in 0.9.5. All relevant fixes are of course forward-ported to the 1.0 series, and a handful are also backported to 0.8.7, released at the same time. A small number of new features include Postgresql JSONB support, improved Postgresql text matching features, and a new event hook which allows database exceptions to be intercepted and re-thrown as new ones. Users should carefully review the Changelog (http://docs.sqlalchemy.org/en/latest/changelog/changelog_09.html#change-0.9.7) to note which behaviors and issues are affected. We'd like to thank the many contributors who helped with this release. SQLAlchemy 0.9.7 is available on the Download Page: http://www.sqlalchemy.org/download.html
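The "new event hook" mentioned above for intercepting database exceptions is handle_error(). A minimal sketch of re-raising a DBAPI error as an application-specific exception (MyAppError is a made-up name; SQLite stands in for any backend):

```python
from sqlalchemy import create_engine, event, text

class MyAppError(Exception):
    """Hypothetical application-level wrapper for database errors."""

engine = create_engine("sqlite://")

@event.listens_for(engine, "handle_error")
def translate_error(context):
    # context.original_exception is the DBAPI exception being handled;
    # raising here replaces the exception that propagates to the caller
    raise MyAppError(str(context.original_exception)) from context.original_exception

caught = None
try:
    with engine.connect() as conn:
        conn.execute(text("SELECT * FROM no_such_table"))
except MyAppError as err:
    caught = err

print(type(caught).__name__)
```

Callers can then catch one application exception type regardless of which DBAPI is underneath.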
[sqlalchemy] flake8 tips?
I recently introduced flake8 testing into our automated builds (an idea inspired by Mike Bayer). I already ran into an issue with flake8 generating errors for `== True` comparisons (which are required under SQLAlchemy). Does anyone have any tips/tricks/gotchas for dealing with flake8 in SQLAlchemy projects?
RE: [sqlalchemy] flake8 tips?
You can try adding "# nopep8" or "# noqa" at the end of any line to suppress pep8/flake8 messages for that line. There may be more in the docs about how to suppress specific errors: https://pypi.python.org/pypi/flake8

--Mike

From: sqlalchemy@googlegroups.com on behalf of Jonathan Vanasco [jonat...@findmeon.com]
Sent: Tuesday, July 22, 2014 5:36 PM
To: sqlalchemy@googlegroups.com
Subject: [sqlalchemy] flake8 tips?
Re: [sqlalchemy] flake8 tips?
Thanks. I already added `# noqa` to those lines. That pep8 check is actually bad; digging into the docs, it's actually recommending the worst-case format. There's a bunch of debate on it in a GitHub ticket. Do you know of any other odd behaviors?
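Besides `# noqa`, the comparison flake8 flags (E712) can also be written with SQLAlchemy's is_() operator, which is lint-clean and renders SQL's IS. A small sketch using a bare column object:

```python
from sqlalchemy import column

flag = column("flag")

# What E712 complains about -- but `flag is True` would be wrong here,
# since `==` is overloaded to build a SQL expression object:
expr_eq = flag == True  # noqa: E712

# Lint-clean alternative; renders with IS rather than =
expr_is = flag.is_(True)

sql_eq = str(expr_eq)
sql_is = str(expr_is)
print(sql_eq)
print(sql_is)
```

IS vs = also matters semantically on backends where boolean columns can be NULL, so this is not purely a linting workaround.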
Re: [sqlalchemy] Multiple tables and foreignkey constraints
If I'm understanding correctly, something like the following:

    # Keep track of manufacturer names
    manufacturers_table = Table(u'manufacturer', metadata,
        Column(u'id', Integer, primary_key=True),
        Column(u'name', String(20))
    )

    # Keep track of model names
    models_table = Table(u'model', metadata,
        Column(u'id', Integer, primary_key=True),
        Column(u'name', String(20))
    )

    # Keep track of inputs available to models
    sources_table = Table(u'source', metadata,
        Column(u'id', Integer, primary_key=True),
        Column(u'pjlink', String(2)),
        Column(u'text', String(20))
    )

    # Foreign key table for mfgr/model/sources
    projector_table = Table(u'projector', metadata,
        Column(u'manufacturer_id', Integer, ForeignKey(u'manufacturer.id'), primary_key=True),
        Column(u'model_id', Integer, ForeignKey(u'model.id'), primary_key=True),
        Column(u'source_id', Integer, ForeignKey(u'source.id'), primary_key=True),
    )

    # Persistent storage for installed projectors
    # model_id would be an index for model_table.id
    installed_table = Table(u'installed', metadata,
        Column(u'id', Integer, primary_key=True),
        Column(u'name', String(20)),
        Column(u'model_id', Integer)
    )

Or am I still off in left field?
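Trimmed to the tables relevant to the question, the layout above can be exercised end to end. In this sketch I've made installed.model_id an actual ForeignKey (an assumption on my part, not in the original post) so the join condition can be inferred; the sample data is invented:

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, select)

metadata = MetaData()

manufacturers = Table('manufacturer', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(20)))

models = Table('model', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(20)))

# association of manufacturers to models
projectors = Table('projector', metadata,
    Column('manufacturer_id', Integer, ForeignKey('manufacturer.id'), primary_key=True),
    Column('model_id', Integer, ForeignKey('model.id'), primary_key=True))

installed = Table('installed', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(20)),
    Column('model_id', Integer, ForeignKey('model.id')))  # FK instead of a bare int

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(manufacturers.insert(), [{"id": 1, "name": "Acme"}])
    conn.execute(models.insert(), [{"id": 1, "name": "X100"}])
    conn.execute(projectors.insert(), [{"manufacturer_id": 1, "model_id": 1}])
    conn.execute(installed.insert(), [{"id": 1, "name": "Hall A", "model_id": 1}])

with engine.connect() as conn:
    # join condition inferred from the ForeignKey on installed.model_id
    row = conn.execute(
        select(installed.c.name, models.c.name)
        .join_from(installed, models)
    ).one()

print(tuple(row))
```

With the FK declared, the database also enforces that every installed projector references a real model, which a plain Integer column would not.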