[sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-09 Thread vineet
Hello! SQLAlchemy has the (really useful) behavior of batching inserts 
together when all the primary keys are defined. It looks like it groups 
similar tables together (while maintaining insert order) to minimize the 
number of INSERT statements sent to the database.

I'm unsure what the expected behavior is here for single-table inheritance 
tables. Here are some example models:
class Base(db.Model):
    __tablename__ = 'base'
    id = db.Column(db.Integer, primary_key=True)
    base_type = db.Column(db.String)

    __mapper_args__ = {
        'polymorphic_on': base_type,
    }

    def __init__(self, id_):
        self.id = id_


class SubBase1(Base):
    __mapper_args__ = {
        'polymorphic_identity': '1',
    }

    col = db.Column(db.String)


class SubBase2(Base):
    __mapper_args__ = {
        'polymorphic_identity': '2',
    }

    col2 = db.Column(db.String)


class OtherModel(db.Model):
    id = db.Column(db.Integer, primary_key=True)

    def __init__(self, id_):
        self.id = id_


Base, SubBase1, and SubBase2 form an inheritance tree, while OtherModel is 
just another model.

When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the 
INSERTs such that only two INSERT statements are sent to the DB:
(Snippet #1)
for i in xrange(0, 10, 2):
    db.session.add(SubBase1(i))
    db.session.add(OtherModel(i + 1))

db.session.flush()  # Performs 2 INSERT statements, each with 5 elements in the VALUES clause

However, when creating alternating SubBase1s and SubBase2s, it actually 
performs 10 separate insert statements:
(Snippet #2)
for i in xrange(0, 10, 2):
    db.session.add(SubBase1(i))
    db.session.add(SubBase2(i + 1))

db.session.flush()  # Performs 10 INSERT statements, each with 1 element in the VALUES clause


It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and 
SubBase2, but also isn't able to do a single INSERT statement with 10 
elements in the VALUES clause (or 2 INSERT statements with 5 elements each).

Questions:
1) Is Snippet #2 the desired behavior? I read through the unit-of-work 
summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally 
sure what the desired behavior was for single-table inheritance models.
2) If this is the desired behavior, do you have any suggestions for how to 
optimize the number of queries here, e.g. in a before_flush hook? Obviously 
one way would be to rewrite our application such that inserts of like 
tables happen together, but if there is a way to do this without rewriting 
the application code, that would be even better. One option I've thought of 
is:
  - Overwrite the insert_order in the before_flush hook to rearrange the 
models in the order I want, as in the sketch below. (Not sure if this has 
effects that I'm unaware of though.)
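
A very rough sketch of that idea, assuming the Flask-SQLAlchemy db.session 
from the snippets above, that every pending object was added directly via 
session.add(), and that insert_order on the instance state (an internal 
detail of the unit of work that may change between SQLAlchemy versions) is 
safe to rewrite:

from sqlalchemy import event, inspect

@event.listens_for(db.session, "before_flush")
def group_like_models(session, flush_context, instances):
    # Order pending objects by their original insert_order, then group
    # instances of the same class together with a stable sort, and finally
    # rewrite the (private) insert_order counter the flush uses for ordering.
    pending = sorted(session.new, key=lambda obj: inspect(obj).insert_order)
    pending.sort(key=lambda obj: type(obj).__name__)
    for order, obj in enumerate(pending):
        inspect(obj).insert_order = order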

Thanks!



Re: [sqlalchemy] Batching INSERT statements

2017-10-09 Thread vineet

>
> if you're using Postgresql, there's a vastly easier technique to use 
> which is just to pre-fetch from the sequence: 
> identities = [ 
>     val for val, in session.execute( 
>         "select nextval('mytable_seq') from " 
>         "generate_series(1,%s)" % len(my_objects)) 
> ] 
> for ident, obj in zip(identities, my_objects): 
>     obj.pk = ident 

Wow, that's a great idea! I got it working for most of our models. I have 
some questions about how inserts for joined-table inheritance tables are 
batched together, but I'll ask them in a separate post since they're 
somewhat unrelated to this.

So the complexity of adding multi-values insert with sequences would 
> benefit an extremely narrow set of use cases, would be very labor 
> intensive to implement and maintain, and is unnecessary for the single 
> target platform in which this case would work. 
>
That makes sense, thanks for the explanation!

 

On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote:
>
> On Mon, Oct 9, 2017 at 4:15 AM,   
> wrote: 
> > Hello! I've spent some time looking at SQLAlchemy's ability to batch 
> > inserts, and have a few questions about bulk_save_objects (and flushing 
> in 
> > general). 
> > 
> > Two statements that I think are true: 
> > 
> > Right now, bulk_save_objects does not fetch primary keys for inserted 
> rows 
> > (unless return_defaults is passed in, which mitigates the performance 
> gains 
> > by performing one insert statement per model). 
>
> the point of bulk save is that the objects passed to it are considered 
> as throwaway after the operation, to avoid the overhead of maintaining 
> the objects' persistence state in memory.   Using it with 
> return_defaults is nearly pointless as this will blow away the 
> efficiency gains you might get from the bulk operation, and is there 
> mostly to allow the joined-table inheritance use case to be usable. 
>
>
> > When running db.session.flush(), it looks like SQLAlchemy can batch 
> model 
> > creation into a single multi-value insert statement *only if all of the 
> > primary keys are already defined on the model*. (Verified by looking at 
> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements) 
> > 
> > 
> > Questions: 
> > 
> > Would it be possible for bulk_save_objects to fetch primary keys while 
> still 
> > batching INSERTs into a single query? 
>
> What do you mean "single query", do you mean, executemany()?  the 
> answer is no because DBAPI drivers don't return result sets with 
> executemany().  Do you mean, a single INSERT..VALUES with all the 
> parameters in one statement ?  The answer is no because SQLAlchemy 
> doesn't include multi-values as a transparent option; the DBAPI 
> drivers instead make use of the multi-values syntax within their 
> executemany() implementations, where again, they don't return result 
> sets. 
>
> Right now, only the MySQL drivers do this by default, the psycopg2 
> driver does it with a recently added option that SQLAlchemy does not 
> directly support, however you can set up via a connection event.   For 
> psycopg2 it can vastly speed up inserts as psycopg2's normal 
> executemany() implementation has some performance issues. 
>
>
> (This 
> > would help with e.g. inserting a bunch of rows into a table with an 
> > auto-incrementing primary key). 
>
> if you're using Postgresql, there's a vastly easier technique to use 
> which is just to pre-fetch from the sequence: 
>
> identities = [ 
>     val for val, in session.execute( 
>         "select nextval('mytable_seq') from " 
>         "generate_series(1,%s)" % len(my_objects)) 
> ] 
> for ident, obj in zip(identities, my_objects): 
>     obj.pk = ident 
>
>
> Now you don't need to do RETURNING or anything and the inserts can be 
> at their top efficiency. 
>
>
>
> > 
> > At least in Postgres (haven't verified for other databases), it looks 
> like 
> > one can use RETURNING for inserts with multi-value VALUES clauses. At 
> the 
> > surface, it seems like one could extend the current behavior for a 
> single 
> > row INSERT: 
> > 
> > INSERT INTO table ... VALUES (1) RETURNING id 
> > 
> > to multi-value INSERTS: 
> > 
> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id 
> > 
> > and get all the benefits of the ORM while still batching inserts. 
>
>
> This would be an enormous undertaking to implement, test, and release. 
>   It would have to exist as an entirely additional series of codepaths 
> within persistence.py as multi-valued INSERT is not available on most 
> databases as well as on earlier versions of the databases that do 
> support it. For all of that complexity, the approach would work on 
> exactly:  Postgresql only - SQLite and MySQL don't support RETURNING, 
> Oracle, SQL Server and others don't support INSERT..VALUES with 
> multiple sets. 
>
> If you are using Postgresql, you can instead pre-fetch the sequence up 
> front.  This can 

Re: [sqlalchemy] "execute(query)" behaving strangely

2017-10-09 Thread Colton Allen
You were exactly right.  I needed to commit.

On Monday, October 9, 2017 at 4:08:05 PM UTC-7, Mike Bayer wrote:
>
> On Mon, Oct 9, 2017 at 3:57 PM, Colton Allen wrote: 
> > I'm trying to execute a fairly simple UPDATE query. 
> > 
> > query = update(Model).where(Model.id.in_(list_of_ids)).values(x=1) 
> > 
> > I know of two methods to execute it.  One using the session and the other 
> > using the engine.  However, depending on which I use, the results I get are 
> > very different. 
> > 
> > db.session.execute(query)  # works in test suite but not live. 
> > db.engine.connect().execute(query) # works live but not in test suite. 
> > 
> > I'm trying to understand why this would be the case.  I believe either my 
> > test suite or my implementation of sqlalchemy is broken.  I was wondering if 
> > you had any tips. 
>
> well the session version won't be committed unless you call 
> session.commit().  so...it seems like your test suite is probably 
> looking at the data uncommitted, which is fine, but for live you'd 
> want to make sure data is committed. 
>
> I guess in the opposite case, the rollback your test suite relies upon 
> for teardown doesn't occur when you use engine.connect(), because that 
> creates its own transaction that is autocommitting. 
>
>
>
>
> > 
> > Test Suite: 
> > 
> > def setUp(self): 
> > """Create app test client.""" 
> > self.app = app 
> > self.app_context = self.app.app_context() 
> > self.app_context.push() 
> > self.client = self.app.test_client() 
> > 
> > self.transaction = connection.begin() 
> > db.session = scoped_session( 
> > sessionmaker(bind=connection, query_cls=db.query_class)) 
> > db.session.begin_nested() 
> > 
> > @event.listens_for(db.session, "after_transaction_end") 
> > def restart_savepoint(session, transaction): 
> > if transaction.nested and not transaction._parent.nested: 
> > session.expire_all() 
> > session.begin_nested() 
> > 
> > # this is a cleanup function rather than a teardown function in case 
> > # the db gets into a bad state and setup fails, in which case we 
> still 
> > # want the drop_all to be called 
> > self.addCleanup(self.cleanup) 
> > 
> > def cleanup(self): 
> > """Tear down database.""" 
> > db.session.close() 
> > self.transaction.rollback() 
> > self.app_context.pop() 
> > 
> > @classmethod 
> > def setUpClass(cls): 
> > """Create the database.""" 
> > global app, engine, connection 
> > 
> > app = cls._create_app() 
> > engine = db.engine 
> > connection = engine.connect() 
> > 
> > @classmethod 
> > def tearDownClass(cls): 
> > """Destroy the database.""" 
> > connection.close() 
> > engine.dispose() 
> > 
> > 
> > Session construction when live: 
> > 
> > @property 
> > def engine(self): 
> > """Return an engine instance.""" 
> > if not self._engine: 
> > database_uri = self.app.config.get('SQLALCHEMY_DATABASE_URI') 
> > self._engine = sqlalchemy.create_engine(database_uri) 
> > return self._engine 
> > 
> > @property 
> > def session(self): 
> > """Return database session.""" 
> > if not self._session: 
> > factory = sqlalchemy.orm.sessionmaker( 
> > bind=self.engine, query_cls=self.query_class) 
> > self._session = sqlalchemy.orm.scoped_session( 
> > factory, scopefunc=_app_ctx_stack.__ident_func__) 
> > return self._session() 
> > 
> > 


Re: [sqlalchemy] "execute(query)" behaving strangely

2017-10-09 Thread Mike Bayer
On Mon, Oct 9, 2017 at 3:57 PM, Colton Allen  wrote:
> I'm trying to execute a fairly simple UPDATE query.
>
> query = update(Model).where(Model.id.in_(list_of_ids)).values(x=1)
>
> I know of two methods to execute it.  One using the session and the other
> using the engine.  However, depending on which I use, the results I get are
> very different.
>
> db.session.execute(query)  # works in test suite but not live.
> db.engine.connect().execute(query) # works live but not in test suite.
>
> I'm trying to understand why this would be the case.  I believe either my
> test suite or my implementation of sqlalchemy is broken.  I was wondering if
> you had any tips.

well the session version won't be committed unless you call
session.commit().  so...it seems like your test suite is probably
looking at the data uncommitted, which is fine, but for live you'd
want to make sure data is committed.

I guess in the opposite case, the rollback your test suite relies upon
for teardown doesn't occur when you use engine.connect(), because that
creates its own transaction that is autocommitting.
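
A minimal sketch of the fix on the live side, reusing the query and 
db.session from above; the UPDATE executed through the session only becomes 
visible to other connections once the session's transaction is committed:

db.session.execute(query)
db.session.commit()  # without the commit, readers on other connections never see the UPDATE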




>
> Test Suite:
>
> def setUp(self):
> """Create app test client."""
> self.app = app
> self.app_context = self.app.app_context()
> self.app_context.push()
> self.client = self.app.test_client()
>
> self.transaction = connection.begin()
> db.session = scoped_session(
> sessionmaker(bind=connection, query_cls=db.query_class))
> db.session.begin_nested()
>
> @event.listens_for(db.session, "after_transaction_end")
> def restart_savepoint(session, transaction):
> if transaction.nested and not transaction._parent.nested:
> session.expire_all()
> session.begin_nested()
>
> # this is a cleanup function rather than a teardown function in case
> # the db gets into a bad state and setup fails, in which case we still
> # want the drop_all to be called
> self.addCleanup(self.cleanup)
>
> def cleanup(self):
> """Tear down database."""
> db.session.close()
> self.transaction.rollback()
> self.app_context.pop()
>
> @classmethod
> def setUpClass(cls):
> """Create the database."""
> global app, engine, connection
>
> app = cls._create_app()
> engine = db.engine
> connection = engine.connect()
>
> @classmethod
> def tearDownClass(cls):
> """Destroy the database."""
> connection.close()
> engine.dispose()
>
>
> Session construction when live:
>
> @property
> def engine(self):
> """Return an engine instance."""
> if not self._engine:
> database_uri = self.app.config.get('SQLALCHEMY_DATABASE_URI')
> self._engine = sqlalchemy.create_engine(database_uri)
> return self._engine
>
> @property
> def session(self):
> """Return database session."""
> if not self._session:
> factory = sqlalchemy.orm.sessionmaker(
> bind=self.engine, query_cls=self.query_class)
> self._session = sqlalchemy.orm.scoped_session(
> factory, scopefunc=_app_ctx_stack.__ident_func__)
> return self._session()
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] Re: "execute(query)" behaving strangely

2017-10-09 Thread Colton Allen
Both are using postgres.  But I did try updating the bind to both the 
engine (db.engine) and the session (db.session), and it didn't have any 
effect.

On Monday, October 9, 2017 at 2:09:02 PM UTC-7, Jonathan Vanasco wrote:
>
> OTOMH (I didn't go through your code), are the two databases the same?
>
> If not, this is possibly related to database specific compiling (or the 
> lack of) and a common error.  Note the `bind` references in the docs: 
> http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing 
>  Depending on how you invoke a select, the statement will either be 
> compiled as a generic string or sqlalchemy knows that it should be 
> customized for the specific database.  Depending on the database, the 
> generic version will or will not work.  This might not be happening -- I 
> didn't go through your code -- but many situations of "It works when I ___ 
> but not when I " are related to this.
>



[sqlalchemy] Re: "execute(query)" behaving strangely

2017-10-09 Thread Jonathan Vanasco
OTOMH (I didn't go through your code), are the two databases the same?

If not, this is possibly related to database-specific compiling (or the 
lack thereof), a common error.  Note the `bind` references in the 
docs: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing 
Depending on how you invoke a select, the statement will either be 
compiled as a generic string, or SQLAlchemy will know that it should be 
customized for the specific database.  Depending on the database, the 
generic version will or will not work.  This might not be happening -- I 
didn't go through your code -- but many situations of "It works when I ___ 
but not when I ___" are related to this.
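
For what it's worth, a quick way to see the difference, sketched with the 
query construct from the original post and assuming the Postgres-backed 
db.engine:

from sqlalchemy.dialects import postgresql

print(str(query))                                   # generic, dialect-neutral SQL string
print(query.compile(dialect=postgresql.dialect()))  # compiled specifically for Postgres
print(query.compile(bind=db.engine))                # compiled against the bound engine's dialect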



[sqlalchemy] "execute(query)" behaving strangely

2017-10-09 Thread Colton Allen
I'm trying to execute a fairly simple UPDATE query.

query = update(Model).where(Model.id.in_(list_of_ids)).values(x=1)

I know of two methods to execute it.  One using the session and the other 
using the engine.  However, depending on which I use, the results I get are 
very different.

db.session.execute(query)  # works in test suite but not live.
db.engine.connect().execute(query) # works live but not in test suite.

I'm trying to understand why this would be the case.  I believe either my 
test suite or my implementation of sqlalchemy is broken.  I was wondering 
if you had any tips.

Test Suite:

def setUp(self):
    """Create app test client."""
    self.app = app
    self.app_context = self.app.app_context()
    self.app_context.push()
    self.client = self.app.test_client()

    self.transaction = connection.begin()
    db.session = scoped_session(
        sessionmaker(bind=connection, query_cls=db.query_class))
    db.session.begin_nested()

    @event.listens_for(db.session, "after_transaction_end")
    def restart_savepoint(session, transaction):
        if transaction.nested and not transaction._parent.nested:
            session.expire_all()
            session.begin_nested()

    # this is a cleanup function rather than a teardown function in case
    # the db gets into a bad state and setup fails, in which case we still
    # want the drop_all to be called
    self.addCleanup(self.cleanup)

def cleanup(self):
    """Tear down database."""
    db.session.close()
    self.transaction.rollback()
    self.app_context.pop()

@classmethod
def setUpClass(cls):
    """Create the database."""
    global app, engine, connection

    app = cls._create_app()
    engine = db.engine
    connection = engine.connect()

@classmethod
def tearDownClass(cls):
    """Destroy the database."""
    connection.close()
    engine.dispose()


Session construction when live:

@property
def engine(self):
    """Return an engine instance."""
    if not self._engine:
        database_uri = self.app.config.get('SQLALCHEMY_DATABASE_URI')
        self._engine = sqlalchemy.create_engine(database_uri)
    return self._engine

@property
def session(self):
    """Return database session."""
    if not self._session:
        factory = sqlalchemy.orm.sessionmaker(
            bind=self.engine, query_cls=self.query_class)
        self._session = sqlalchemy.orm.scoped_session(
            factory, scopefunc=_app_ctx_stack.__ident_func__)
    return self._session()




Re: [sqlalchemy] Batching INSERT statements

2017-10-09 Thread Mike Bayer
On Mon, Oct 9, 2017 at 4:15 AM,   wrote:
> Hello! I've spent some time looking at SQLAlchemy's ability to batch
> inserts, and have a few questions about bulk_save_objects (and flushing in
> general).
>
> Two statements that I think are true:
>
> Right now, bulk_save_objects does not fetch primary keys for inserted rows
> (unless return_defaults is passed in, which mitigates the performance gains
> by performing one insert statement per model).

the point of bulk save is that the objects passed to it are considered
as throwaway after the operation, to avoid the overhead of maintaining
the objects' persistence state in memory.   Using it with
return_defaults is nearly pointless as this will blow away the
efficiency gains you might get from the bulk operation, and is there
mostly to allow the joined-table inheritance use case to be usable.


> When running db.session.flush(), it looks like SQLAlchemy can batch model
> creation into a single multi-value insert statement *only if all of the
> primary keys are already defined on the model*. (Verified by looking at
> sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)
>
>
> Questions:
>
> Would it be possible for bulk_save_objects to fetch primary keys while still
> batching INSERTs into a single query?

What do you mean "single query", do you mean, executemany()?  the
answer is no because DBAPI drivers don't return result sets with
executemany().  Do you mean, a single INSERT..VALUES with all the
parameters in one statement ?  The answer is no because SQLAlchemy
doesn't include multi-values as a transparent option; the DBAPI
drivers instead make use of the multi-values syntax within their
executemany() implementations, where again, they don't return result
sets.

Right now, only the MySQL drivers do this by default, the psycopg2
driver does it with a recently added option that SQLAlchemy does not
directly support, however you can set up via a connection event.   For
psycopg2 it can vastly speed up inserts as psycopg2's normal
executemany() implementation has some performance issues.
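
As an aside, a hedged sketch of what that psycopg2 fast path looks like 
when driven by hand outside the ORM; the engine variable, table, and column 
names here are placeholders, and execute_values() requires psycopg2 2.7+:

import psycopg2.extras

# Borrow the raw DBAPI connection from the SQLAlchemy engine and let
# psycopg2 rewrite the parameter sets into a single multi-VALUES INSERT.
raw_conn = engine.raw_connection()
try:
    with raw_conn.cursor() as cur:
        psycopg2.extras.execute_values(
            cur,
            "INSERT INTO mytable (id, data) VALUES %s",
            [(1, "a"), (2, "b"), (3, "c")],
        )
    raw_conn.commit()
finally:
    raw_conn.close()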


> (This
> would help with e.g. inserting a bunch of rows into a table with an
> auto-incrementing primary key).

if you're using Postgresql, there's a vastly easier technique to use
which is just to pre-fetch from the sequence:

identities = [
    val for val, in session.execute(
        "select nextval('mytable_seq') from "
        "generate_series(1,%s)" % len(my_objects))
]
for ident, obj in zip(identities, my_objects):
    obj.pk = ident


Now you don't need to do RETURNING or anything and the inserts can be
at their top efficiency.



>
> At least in Postgres (haven't verified for other databases), it looks like
> one can use RETURNING for inserts with multi-value VALUES clauses. At the
> surface, it seems like one could extend the current behavior for a single
> row INSERT:
>
> INSERT INTO table ... VALUES (1) RETURNING id
>
> to multi-value INSERTS:
>
> INSERT INTO table ... VALUES (1), (2), ... RETURNING id
>
> and get all the benefits of the ORM while still batching inserts.


This would be an enormous undertaking to implement, test, and release.
  It would have to exist as an entirely additional series of codepaths
within persistence.py as multi-valued INSERT is not available on most
databases as well as on earlier versions of the databases that do
support it. For all of that complexity, the approach would work on
exactly:  Postgresql only - SQLite and MySQL don't support RETURNING,
Oracle, SQL Server and others don't support INSERT..VALUES with
multiple sets.

If you are using Postgresql, you can instead pre-fetch the sequence up
front.  This can even be integrated into a before_flush() event.
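
A rough sketch of that integration, assuming a made-up mapped class MyModel 
whose integer primary key comes from a sequence named mytable_id_seq, and 
the Flask-SQLAlchemy-style db.session used earlier in the thread; assigning 
ids in before_flush means the flush sees fully-keyed rows it can batch:

from sqlalchemy import event

@event.listens_for(db.session, "before_flush")
def prefetch_identities(session, flush_context, instances):
    # Only pending MyModel rows that don't have a primary key yet.
    targets = [obj for obj in session.new
               if isinstance(obj, MyModel) and obj.id is None]
    if not targets:
        return
    # One round trip fetches one sequence value per pending row...
    ids = [
        val for val, in session.execute(
            "select nextval('mytable_id_seq') "
            "from generate_series(1, %s)" % len(targets))
    ]
    # ...and assign them before the flush plan is built.
    for obj, new_id in zip(targets, ids):
        obj.id = new_id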

So the complexity of adding multi-values insert with sequences would
benefit an extremely narrow set of use cases, would be very labor
intensive to implement and maintain, and is unnecessary for the single
target platform in which this case would work.


>
> I'm sure this is something that was thought about already, so I'd mostly
> love to hear any thoughts about what makes this hard. We have a lot of
> complex relationships and joined-table inheritance, so working with
> bulk_save_objects has been a bit challenging. From the comments in
> sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like
> SQLAlchemy is already aware that RETURNING for insert statements with
> multi-values VALUES clauses is supported, so it's possible there is a reason
> that this is hard or can't be done.
>
> This is also something that, if it just a “missing feature”, I would love to
> learn more and see if I could contribute.
>

Re: [sqlalchemy] Postgres 10 identity keyword

2017-10-09 Thread Mike Bayer
On Sun, Oct 8, 2017 at 10:45 PM, Seth P  wrote:
> Apologies if I missed something, but does SQLAlchemy (1.2.0?) support the new 
> Postgres 10 identity keyword 
> (https://blog.2ndquadrant.com/postgresql-10-identity-columns/)?

not directly, however you can intercept SERIAL and replace it with the
new syntax as follows:

from sqlalchemy.schema import CreateColumn
from sqlalchemy.ext.compiler import compiles


@compiles(CreateColumn, 'postgresql')
def use_identity(element, compiler, **kw):
    text = compiler.visit_create_column(element, **kw)
    text = text.replace("SERIAL", "INT GENERATED BY DEFAULT AS IDENTITY")
    return text


if __name__ == '__main__':

    from sqlalchemy import (
        MetaData, Table, Column, Integer, String, create_engine)

    m = MetaData()

    t = Table(
        't', m,
        Column('id', Integer, primary_key=True),
        Column('data', String)
    )

    e = create_engine("postgresql://scott:tiger@pg10/test", echo=True)

    m.drop_all(e)
    m.create_all(e)

Adding a note to the docs now.




>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] Batching INSERT statements

2017-10-09 Thread vineet
Hello! I've spent some time looking at SQLAlchemy's ability to batch 
inserts, and have a few questions about bulk_save_objects (and flushing in 
general).

Two statements that I think are true:

   1. Right now, bulk_save_objects does not fetch primary keys for inserted 
   rows (unless return_defaults is passed in, which mitigates the performance 
   gains by performing one insert statement per model).
   2. When running db.session.flush(), it looks like SQLAlchemy can batch 
   model creation into a single multi-value insert statement *only if all of 
   the primary keys are already defined on the model*. (Verified by looking at 
   sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)


Questions:

   1. Would it be possible for bulk_save_objects to fetch primary keys 
   while still batching INSERTs into a single query?
   2. Or even better, would it be possible to batch INSERT statements 
   during a flush even when models don't have their primary keys pre-defined? 
   (This would help with e.g. inserting a bunch of rows into a table with an 
   auto-incrementing primary key).

At least in Postgres (haven't verified for other databases), it looks like 
one can use RETURNING for inserts with multi-value VALUES clauses. At the 
surface, it seems like one could extend the current behavior for a single 
row INSERT:

INSERT INTO table ... VALUES (1) RETURNING id

to multi-value INSERTS:

INSERT INTO table ... VALUES (1), (2), ... RETURNING id

and get all the benefits of the ORM while still batching inserts.

I'm sure this is something that was thought about already, so I'd mostly 
love to hear any thoughts about what makes this hard. We have a lot of 
complex relationships and joined-table inheritance, so working with 
bulk_save_objects has been a bit challenging. From the comments in 
sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like 
SQLAlchemy is already aware that RETURNING for insert statements with 
multi-values VALUES clauses is supported, so it's possible there is a 
reason that this is hard or can't be done.

This is also something that, if it is just a “missing feature”, I would love 
to learn more about and see if I could contribute.
