[sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?
Hello! SQLAlchemy has the (really useful) behavior of batching inserts together when all the primary keys are defined. It looks like it groups similar tables together (while maintaining insert order) to minimize the number of INSERT statements sent to the database. I'm unsure what the expected behavior is for single-table inheritance tables. Here are some example models:

    class Base(db.Model):
        __tablename__ = 'base'
        id = db.Column(db.Integer, primary_key=True)
        base_type = db.Column(db.String)
        __mapper_args__ = {
            'polymorphic_on': base_type,
        }

        def __init__(self, id_):
            self.id = id_

    class SubBase1(Base):
        __mapper_args__ = {
            'polymorphic_identity': '1',
        }
        col = db.Column(db.String)

    class SubBase2(Base):
        __mapper_args__ = {
            'polymorphic_identity': '2',
        }
        col2 = db.Column(db.String)

    class OtherModel(db.Model):
        id = db.Column(db.Integer, primary_key=True)

        def __init__(self, id_):
            self.id = id_

Base, SubBase1, and SubBase2 form a tree, while OtherModel is just another model. When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the INSERTs such that only two INSERT statements are sent to the DB:

(Snippet #1)

    for i in xrange(0, 10, 2):
        db.session.add(SubBase1(i))
        db.session.add(OtherModel(i + 1))
    db.session.flush()  # Performs 2 INSERT statements, each with 5 elements in the VALUES clause

However, when creating alternating SubBase1s and SubBase2s, it actually performs 10 separate INSERT statements:

(Snippet #2)

    for i in xrange(0, 10, 2):
        db.session.add(SubBase1(i))
        db.session.add(SubBase2(i + 1))
    db.session.flush()  # Performs 10 INSERT statements, each with 1 element in the VALUES clause

It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and SubBase2, but also isn't able to do a single INSERT statement with 10 elements in the VALUES clause (or 2 INSERT statements with 5 elements each).

Questions:

1) Is Snippet #2 the desired behavior?
I read through the unit-of-work summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally sure what the desired behavior was for single-table inheritance models.

2) If this is the desired behavior, do you have any suggestions for how to reduce the number of queries here, e.g. in a before_flush hook? Obviously one way would be to rewrite our application so that inserts of like tables happen together, but if there is a way to do this without rewriting the application code, that would be even better. One option I've thought of is:

- Overwrite the insert_order in the before_flush hook to rearrange the models into the order I want. (Not sure if this has side effects that I'm unaware of, though.)

Thanks!

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
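For what it's worth, both snippets are consistent with a simple model of the flush: rows are emitted per table in insert_order, and only consecutive rows that bind the same set of columns can share one multi-VALUES INSERT. The sketch below is my own toy approximation of that reading (it is not SQLAlchemy's actual code from persistence.py; the table and column names mirror the models above):

```python
from collections import defaultdict

def count_insert_statements(records):
    """records: (table_name, insert_order, bound_column_names) tuples.

    Toy model: rows are flushed per table in insert_order, and a run of
    consecutive rows binding the same columns shares one INSERT statement.
    """
    per_table = defaultdict(list)
    for table, order, cols in records:
        per_table[table].append((order, frozenset(cols)))
    total = 0
    for rows in per_table.values():
        rows.sort()  # flush emits each table's rows in insert_order
        previous = None
        for _, cols in rows:
            if cols != previous:  # column set changed: a new statement starts
                total += 1
            previous = cols
    return total

# Snippet #1: SubBase1 and OtherModel target different tables, so within
# each table every row binds the same columns -> 2 statements.
snippet1 = []
for i in range(0, 10, 2):
    snippet1.append(('base', i, {'id', 'base_type', 'col'}))
    snippet1.append(('other_model', i + 1, {'id'}))

# Snippet #2: SubBase1 and SubBase2 share the 'base' table but bind
# different columns (col vs. col2), so no two consecutive rows match
# -> 10 statements.
snippet2 = []
for i in range(0, 10, 2):
    snippet2.append(('base', i, {'id', 'base_type', 'col'}))
    snippet2.append(('base', i + 1, {'id', 'base_type', 'col2'}))
```

Under this model, adding like-typed objects in contiguous runs (or reordering them before the flush) is exactly what collapses the statement count.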
Re: [sqlalchemy] Batching INSERT statements
> if you're using Postgresql, there's a vastly easier technique to use
> which is just to pre-fetch from the sequence:
>
>     identities = [
>         val for val, in session.execute(
>             "select nextval('mytable_seq') from "
>             "generate_series(1,%s)" % len(my_objects))
>     ]
>     for ident, obj in zip(identities, my_objects):
>         obj.pk = ident

Wow, that's a great idea! I got it working for most of our models. I have some questions about how inserts for joined-table inheritance tables are batched together, but I'll ask them in a separate post since they're somewhat unrelated to this.

> So the complexity of adding multi-values insert with sequences would
> benefit an extremely narrow set of use cases, would be very labor
> intensive to implement and maintain, and is unnecessary for the single
> target platform in which this case would work.

That makes sense, thanks for the explanation!

On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote:
> On Mon, Oct 9, 2017 at 4:15 AM, wrote:
> > Hello! I've spent some time looking at SQLAlchemy's ability to batch
> > inserts, and have a few questions about bulk_save_objects (and flushing in
> > general).
> >
> > Two statements that I think are true:
> >
> > Right now, bulk_save_objects does not fetch primary keys for inserted rows
> > (unless return_defaults is passed in, which mitigates the performance gains
> > by performing one insert statement per model).
>
> the point of bulk save is that the objects passed to it are considered
> as throwaway after the operation, to avoid the overhead of maintaining
> the objects' persistence state in memory. Using it with
> return_defaults is nearly pointless, as this will blow away the
> efficiency gains you might get from the bulk operation; it is there
> mostly to allow the joined-table inheritance use case to be usable.
> > When running db.session.flush(), it looks like SQLAlchemy can batch model
> > creation into a single multi-value insert statement *only if all of the
> > primary keys are already defined on the model*. (Verified by looking at
> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)
> >
> > Questions:
> >
> > Would it be possible for bulk_save_objects to fetch primary keys while still
> > batching INSERTs into a single query?
>
> What do you mean "single query"? Do you mean executemany()? The
> answer is no, because DBAPI drivers don't return result sets with
> executemany(). Do you mean a single INSERT..VALUES with all the
> parameters in one statement? The answer is no, because SQLAlchemy
> doesn't include multi-values as a transparent option; the DBAPI
> drivers instead make use of the multi-values syntax within their
> executemany() implementations, where again, they don't return result
> sets.
>
> Right now, only the MySQL drivers do this by default; the psycopg2
> driver does it with a recently added option that SQLAlchemy does not
> directly support, but which you can set up via a connection event. For
> psycopg2 it can vastly speed up inserts, as psycopg2's normal
> executemany() implementation has some performance issues.
>
> > (This
> > would help with e.g. inserting a bunch of rows into a table with an
> > auto-incrementing primary key).
>
> if you're using Postgresql, there's a vastly easier technique to use
> which is just to pre-fetch from the sequence:
>
>     identities = [
>         val for val, in session.execute(
>             "select nextval('mytable_seq') from "
>             "generate_series(1,%s)" % len(my_objects))
>     ]
>     for ident, obj in zip(identities, my_objects):
>         obj.pk = ident
>
> Now you don't need to do RETURNING or anything and the inserts can be
> at their top efficiency.
>
> > At least in Postgres (haven't verified for other databases), it looks like
> > one can use RETURNING for inserts with multi-value VALUES clauses.
> > At the
> > surface, it seems like one could extend the current behavior for a
> > single-row INSERT:
> >
> >     INSERT INTO table ... VALUES (1) RETURNING id
> >
> > to multi-value INSERTs:
> >
> >     INSERT INTO table ... VALUES (1), (2), ... RETURNING id
> >
> > and get all the benefits of the ORM while still batching inserts.
>
> This would be an enormous undertaking to implement, test, and release.
> It would have to exist as an entirely additional series of codepaths
> within persistence.py, as multi-valued INSERT is not available on most
> databases, as well as on earlier versions of the databases that do
> support it. For all of that complexity, the approach would work on
> exactly one platform: Postgresql. SQLite and MySQL don't support RETURNING;
> Oracle, SQL Server and others don't support INSERT..VALUES with
> multiple sets.
>
> If you are using Postgresql, you can instead pre-fetch the sequence up
> front. This can even be integrated into a before_flush() event.
Re: [sqlalchemy] "execute(query)" behaving strangely
You were exactly right. I needed to commit.

On Monday, October 9, 2017 at 4:08:05 PM UTC-7, Mike Bayer wrote:
> On Mon, Oct 9, 2017 at 3:57 PM, Colton Allen wrote:
> > I'm trying to execute a fairly simple UPDATE query.
> >
> >     query = update(Model).where(Model.id.in_(list_of_ids)).values(x=1)
> >
> > I know of two methods to execute it. One using the session and the other
> > using the engine. However, depending on which I use, the results I get are
> > very different.
> >
> >     db.session.execute(query)  # works in test suite but not live.
> >     db.engine.connect().execute(query)  # works live but not in test suite.
> >
> > I'm trying to understand why this would be the case. I believe either my
> > test suite or my implementation of sqlalchemy is broken. I was wondering if
> > you had any tips.
>
> well the session version won't be committed unless you call
> session.commit(). so...it seems like your test suite is probably
> looking at the data uncommitted, which is fine, but for live you'd
> want to make sure data is committed.
>
> I guess in the opposite case, your test suite relies upon the data
> being rolled back for teardown, which doesn't occur when you use
> engine.connect(), because that makes its own transaction that is
> autocommitting.
Re: [sqlalchemy] "execute(query)" behaving strangely
On Mon, Oct 9, 2017 at 3:57 PM, Colton Allen wrote:
> I'm trying to execute a fairly simple UPDATE query.
>
>     query = update(Model).where(Model.id.in_(list_of_ids)).values(x=1)
>
> I know of two methods to execute it. One using the session and the other
> using the engine. However, depending on which I use, the results I get are
> very different.
>
>     db.session.execute(query)  # works in test suite but not live.
>     db.engine.connect().execute(query)  # works live but not in test suite.
>
> I'm trying to understand why this would be the case. I believe either my
> test suite or my implementation of sqlalchemy is broken. I was wondering if
> you had any tips.

well the session version won't be committed unless you call session.commit(). so...it seems like your test suite is probably looking at the data uncommitted, which is fine, but for live you'd want to make sure data is committed.

I guess in the opposite case, your test suite relies upon the data being rolled back for teardown, which doesn't occur when you use engine.connect(), because that makes its own transaction that is autocommitting.
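The visibility point here can be demonstrated with nothing but the stdlib's sqlite3: two connections to the same database stand in for the Session's connection versus a separate live connection (the table and column names below are illustrative, not from the original post):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)   # plays the role of db.session's connection
reader = sqlite3.connect(path)   # plays the role of the "live" observer

writer.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, x INTEGER)")
writer.commit()
writer.execute("INSERT INTO model (id, x) VALUES (1, 0)")
writer.commit()

# The UPDATE runs inside an open transaction, like db.session.execute(query)
writer.execute("UPDATE model SET x = 1 WHERE id = 1")

# Another connection still sees the committed value, not the pending one
before = reader.execute("SELECT x FROM model WHERE id = 1").fetchall()[0][0]

writer.commit()  # the missing session.commit() in the live code

after = reader.execute("SELECT x FROM model WHERE id = 1").fetchall()[0][0]
print(before, after)  # prints: 0 1
```

The test-suite behavior is the mirror image: the suite reads through the same (uncommitted) connection, so it sees the pending UPDATE, while the rollback-based teardown silently discards it.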
[sqlalchemy] Re: "execute(query)" behaving strangely
Both are using postgres. But I did try updating the bind to both the engine (db.engine) and the session (db.session), and it didn't have any effect.

On Monday, October 9, 2017 at 2:09:02 PM UTC-7, Jonathan Vanasco wrote:
> OTOMH (I didn't go through your code), are the two databases the same?
>
> If not, this is possibly related to database-specific compiling (or the
> lack of it) and a common error. Note the `bind` references in the docs:
> http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing
> Depending on how you invoke a select, the statement will either be
> compiled as a generic string, or SQLAlchemy knows that it should be
> customized for the specific database. Depending on the database, the
> generic version will or will not work. This might not be happening -- I
> didn't go through your code -- but many situations of "it works when I ___
> but not when I ___" are related to this.
[sqlalchemy] Re: "execute(query)" behaving strangely
OTOMH (I didn't go through your code), are the two databases the same?

If not, this is possibly related to database-specific compiling (or the lack of it) and a common error. Note the `bind` references in the docs: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing

Depending on how you invoke a select, the statement will either be compiled as a generic string, or SQLAlchemy knows that it should be customized for the specific database. Depending on the database, the generic version will or will not work. This might not be happening -- I didn't go through your code -- but many situations of "it works when I ___ but not when I ___" are related to this.
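The generic-versus-dialect-specific compilation being described can be seen by compiling the same Core statement both ways (a minimal sketch; the `model` table here is made up to mirror the query in the original post):

```python
from sqlalchemy import Column, Integer, MetaData, Table, update
from sqlalchemy.dialects import postgresql

metadata = MetaData()
model = Table(
    'model', metadata,
    Column('id', Integer, primary_key=True),
    Column('x', Integer),
)

query = update(model).where(model.c.id.in_([1, 2])).values(x=1)

# Generic compile: named :param placeholders
generic = str(query.compile())
# PostgreSQL compile: psycopg2's pyformat %(param)s placeholders
pg = str(query.compile(dialect=postgresql.dialect()))

print(generic)
print(pg)
```

When a statement is bound to an engine (or executed through a session bound to one), SQLAlchemy picks the right dialect automatically; stringifying or executing it without a bind falls back to the generic form.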
[sqlalchemy] "execute(query)" behaving strangely
I'm trying to execute a fairly simple UPDATE query.

    query = update(Model).where(Model.id.in_(list_of_ids)).values(x=1)

I know of two methods to execute it. One using the session and the other using the engine. However, depending on which I use, the results I get are very different.

    db.session.execute(query)  # works in test suite but not live.
    db.engine.connect().execute(query)  # works live but not in test suite.

I'm trying to understand why this would be the case. I believe either my test suite or my implementation of sqlalchemy is broken. I was wondering if you had any tips.

Test Suite:

    def setUp(self):
        """Create app test client."""
        self.app = app
        self.app_context = self.app.app_context()
        self.app_context.push()
        self.client = self.app.test_client()

        self.transaction = connection.begin()
        db.session = scoped_session(
            sessionmaker(bind=connection, query_cls=db.query_class))
        db.session.begin_nested()

        @event.listens_for(db.session, "after_transaction_end")
        def restart_savepoint(session, transaction):
            if transaction.nested and not transaction._parent.nested:
                session.expire_all()
                session.begin_nested()

        # this is a cleanup function rather than a teardown function in case
        # the db gets into a bad state and setup fails, in which case we still
        # want the drop_all to be called
        self.addCleanup(self.cleanup)

    def cleanup(self):
        """Tear down database."""
        db.session.close()
        self.transaction.rollback()
        self.app_context.pop()

    @classmethod
    def setUpClass(cls):
        """Create the database."""
        global app, engine, connection

        app = cls._create_app()
        engine = db.engine
        connection = engine.connect()

    @classmethod
    def tearDownClass(cls):
        """Destroy the database."""
        connection.close()
        engine.dispose()

Session construction when live:

    @property
    def engine(self):
        """Return an engine instance."""
        if not self._engine:
            database_uri = self.app.config.get('SQLALCHEMY_DATABASE_URI')
            self._engine = sqlalchemy.create_engine(database_uri)
        return self._engine

    @property
    def session(self):
        """Return database session."""
        if not self._session:
            factory = sqlalchemy.orm.sessionmaker(
                bind=self.engine, query_cls=self.query_class)
            self._session = sqlalchemy.orm.scoped_session(
                factory, scopefunc=_app_ctx_stack.__ident_func__)
        return self._session()
Re: [sqlalchemy] Batching INSERT statements
On Mon, Oct 9, 2017 at 4:15 AM, wrote:
> Hello! I've spent some time looking at SQLAlchemy's ability to batch
> inserts, and have a few questions about bulk_save_objects (and flushing in
> general).
>
> Two statements that I think are true:
>
> Right now, bulk_save_objects does not fetch primary keys for inserted rows
> (unless return_defaults is passed in, which mitigates the performance gains
> by performing one insert statement per model).

the point of bulk save is that the objects passed to it are considered as throwaway after the operation, to avoid the overhead of maintaining the objects' persistence state in memory. Using it with return_defaults is nearly pointless, as this will blow away the efficiency gains you might get from the bulk operation; it is there mostly to allow the joined-table inheritance use case to be usable.

> When running db.session.flush(), it looks like SQLAlchemy can batch model
> creation into a single multi-value insert statement *only if all of the
> primary keys are already defined on the model*. (Verified by looking at
> sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)
>
> Questions:
>
> Would it be possible for bulk_save_objects to fetch primary keys while still
> batching INSERTs into a single query?

What do you mean "single query"? Do you mean executemany()? The answer is no, because DBAPI drivers don't return result sets with executemany(). Do you mean a single INSERT..VALUES with all the parameters in one statement? The answer is no, because SQLAlchemy doesn't include multi-values as a transparent option; the DBAPI drivers instead make use of the multi-values syntax within their executemany() implementations, where again, they don't return result sets.

Right now, only the MySQL drivers do this by default; the psycopg2 driver does it with a recently added option that SQLAlchemy does not directly support, but which you can set up via a connection event.
For psycopg2 it can vastly speed up inserts, as psycopg2's normal executemany() implementation has some performance issues.

> (This
> would help with e.g. inserting a bunch of rows into a table with an
> auto-incrementing primary key).

if you're using Postgresql, there's a vastly easier technique to use which is just to pre-fetch from the sequence:

    identities = [
        val for val, in session.execute(
            "select nextval('mytable_seq') from "
            "generate_series(1,%s)" % len(my_objects))
    ]
    for ident, obj in zip(identities, my_objects):
        obj.pk = ident

Now you don't need to do RETURNING or anything and the inserts can be at their top efficiency.

> At least in Postgres (haven't verified for other databases), it looks like
> one can use RETURNING for inserts with multi-value VALUES clauses. At the
> surface, it seems like one could extend the current behavior for a
> single-row INSERT:
>
>     INSERT INTO table ... VALUES (1) RETURNING id
>
> to multi-value INSERTs:
>
>     INSERT INTO table ... VALUES (1), (2), ... RETURNING id
>
> and get all the benefits of the ORM while still batching inserts.

This would be an enormous undertaking to implement, test, and release. It would have to exist as an entirely additional series of codepaths within persistence.py, as multi-valued INSERT is not available on most databases, as well as on earlier versions of the databases that do support it. For all of that complexity, the approach would work on exactly one platform: Postgresql. SQLite and MySQL don't support RETURNING; Oracle, SQL Server and others don't support INSERT..VALUES with multiple sets.

If you are using Postgresql, you can instead pre-fetch the sequence up front. This can even be integrated into a before_flush() event. So the complexity of adding multi-values insert with sequences would benefit an extremely narrow set of use cases, would be very labor intensive to implement and maintain, and is unnecessary for the single target platform in which this case would work.
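One way the pre-fetch could be packaged is sketched below. The helper names are mine, not SQLAlchemy's, and the sequence name is a placeholder for your table's actual sequence (typically `<table>_<column>_seq`); the session call is shown only in comments since it needs a live Postgres connection:

```python
def nextval_batch_sql(sequence_name, count):
    """SQL drawing `count` values from a Postgres sequence in one round trip."""
    return ("select nextval('%s') from generate_series(1, %d)"
            % (sequence_name, count))

def assign_identities(objects, identities, attr='id'):
    """Pre-assign fetched identities so the flush can batch the INSERTs."""
    for obj, ident in zip(objects, identities):
        setattr(obj, attr, ident)

# Against a real session this would be (not executed here):
#   rows = session.execute(nextval_batch_sql('mytable_id_seq', len(my_objects)))
#   assign_identities(my_objects, [val for val, in rows])
#   session.add_all(my_objects)
#   session.flush()   # primary keys known up front -> batched INSERTs
```

Since every object arrives at the flush with its primary key already set, the flush no longer needs per-row RETURNING and can use the driver's executemany() path.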
> I'm sure this is something that was thought about already, so I'd mostly
> love to hear any thoughts about what makes this hard. We have a lot of
> complex relationships and joined-table inheritance, so working with
> bulk_save_objects has been a bit challenging. From the comments in
> sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like
> SQLAlchemy is already aware that RETURNING for insert statements with
> multi-values VALUES clauses is supported, so it's possible there is a reason
> that this is hard or can't be done.
>
> This is also something that, if it is just a "missing feature", I would love
> to learn more about and see if I could contribute.
Re: [sqlalchemy] Postgres 10 identity keyword
On Sun, Oct 8, 2017 at 10:45 PM, Seth P wrote:
> Apologies if I missed something, but does SQLAlchemy (1.2.0?) support the new
> Postgres 10 identity keyword
> (https://blog.2ndquadrant.com/postgresql-10-identity-columns/)?

not directly, however you can intercept SERIAL and replace it with the new syntax as follows:

    from sqlalchemy.schema import CreateColumn
    from sqlalchemy.ext.compiler import compiles

    @compiles(CreateColumn, 'postgresql')
    def use_identity(element, compiler, **kw):
        text = compiler.visit_create_column(element, **kw)
        text = text.replace("SERIAL", "INT GENERATED BY DEFAULT AS IDENTITY")
        return text

    if __name__ == '__main__':
        from sqlalchemy import MetaData, Table, Column, Integer, String, create_engine

        m = MetaData()
        t = Table(
            't', m,
            Column('id', Integer, primary_key=True),
            Column('data', String)
        )
        e = create_engine("postgresql://scott:tiger@pg10/test", echo=True)
        m.drop_all(e)
        m.create_all(e)

Adding a note to the docs now.
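At the string level, the hook's effect on each column clause is just the substitution below (a pure illustration; the input fragment is roughly what compiler.visit_create_column() emits for a SERIAL primary key, and is an assumption on my part):

```python
def use_identity_text(column_ddl):
    # Same string substitution the @compiles hook applies to each
    # column clause of the CREATE TABLE statement.
    return column_ddl.replace("SERIAL", "INT GENERATED BY DEFAULT AS IDENTITY")

print(use_identity_text("id SERIAL NOT NULL"))
# prints: id INT GENERATED BY DEFAULT AS IDENTITY NOT NULL
```

Being a blind string replace, it would also rewrite a column literally named "SERIAL"; for the common case of auto-increment integer primary keys it does the job.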
[sqlalchemy] Batching INSERT statements
Hello! I've spent some time looking at SQLAlchemy's ability to batch inserts, and have a few questions about bulk_save_objects (and flushing in general).

Two statements that I think are true:

1. Right now, bulk_save_objects does not fetch primary keys for inserted rows (unless return_defaults is passed in, which mitigates the performance gains by performing one insert statement per model).

2. When running db.session.flush(), it looks like SQLAlchemy can batch model creation into a single multi-value insert statement *only if all of the primary keys are already defined on the model*. (Verified by looking at sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)

Questions:

1. Would it be possible for bulk_save_objects to fetch primary keys while still batching INSERTs into a single query?

2. Or even better, would it be possible to batch INSERT statements during a flush even when models don't have their primary keys pre-defined? (This would help with e.g. inserting a bunch of rows into a table with an auto-incrementing primary key.)

At least in Postgres (haven't verified for other databases), it looks like one can use RETURNING for inserts with multi-value VALUES clauses. At the surface, it seems like one could extend the current behavior for a single-row INSERT:

    INSERT INTO table ... VALUES (1) RETURNING id

to multi-value INSERTs:

    INSERT INTO table ... VALUES (1), (2), ... RETURNING id

and get all the benefits of the ORM while still batching inserts. I'm sure this is something that was thought about already, so I'd mostly love to hear any thoughts about what makes this hard. We have a lot of complex relationships and joined-table inheritance, so working with bulk_save_objects has been a bit challenging.
From the comments in sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like SQLAlchemy is already aware that RETURNING for insert statements with multi-values VALUES clauses is supported, so it's possible there is a reason that this is hard or can't be done.

This is also something that, if it is just a "missing feature", I would love to learn more about and see if I could contribute.