Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-12 Thread Mike Bayer
I'm actually happy with those results.  execute_batch is just a
drop-in, while execute_values is more complicated since we have to
intercept specific kinds of statements (INSERTs that have exactly one
VALUES clause invoked against multiple parameter sets), and it may
still be more error-prone.

A 7% slowdown for the 100,000 rows w/ ORM case (13.9 to 14.9 seconds)
is fairly negligible considering the speedup from 29 seconds.
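
For illustration, the kind of interception involved might look roughly like
this (a sketch only; it assumes plain positional "%s" placeholders and tuple
parameter sets, not necessarily what the dialect actually receives):

import re

from psycopg2 import extras

# Sketch: does this executemany() call look like an INSERT with exactly one
# VALUES clause, run against more than one parameter set?  (The real dialect
# logic would need to be much more careful than a regex.)
SINGLE_VALUES_INSERT = re.compile(
    r"^\s*INSERT\s+INTO\s+\S+\s*\([^)]*\)\s*VALUES\s*\([^)]*\)\s*$",
    re.IGNORECASE)

def maybe_execute_values(cursor, statement, parameters):
    if SINGLE_VALUES_INSERT.match(statement) and len(parameters) > 1:
        # rewrite "VALUES (%s, %s)" into the "VALUES %s" template that
        # execute_values() expects, then hand it all the parameter sets
        template = re.sub(r"VALUES\s*\([^)]*\)\s*$", "VALUES %s",
                          statement, flags=re.IGNORECASE)
        extras.execute_values(cursor, template, parameters)
        return True
    return False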


On Thu, Oct 12, 2017 at 1:07 AM,   wrote:
> I ran some tests using raw psycopg2 on my local computer, and also ran some
> tests with SQLAlchemy. I misunderstood a few things in my tests above, but
> I've explained the tests below and think they are more accurate.
>
> Context:
> - Each test was run 5 times and averaged.
> - Both the python and database were running on the same computer (Macbook
> Pro (2015), using Postgres 9.5 and Docker)
>
> Raw psycopg2:
> # Rough code:
> extras.execute_values(cursor, 'INSERT INTO table_a (field1, field2) VALUES
> %s', [...])
> extras.execute_batch(cursor, 'INSERT INTO table_a (field1, field2) VALUES
> (%s, %s)', [...])
> cursor.executemany('INSERT INTO table_a (field1, field2) VALUES (%s, %s)',
> [...])
>
> # Inserting 1000 rows
> execute_values, elapsed time: 0.023967s
> execute_batch, elapsed time: 0.051530s
> executemany, elapsed time: 0.173563s
>
>
> # Inserting 10,000 rows
> execute_values, elapsed time: 0.268656s
> execute_batch, elapsed time: 0.488736s
> executemany, elapsed time: 2.017565s
>
>
> # Inserting 100,000 rows
> execute_values, elapsed time: 1.858675s
> execute_batch, elapsed time: 4.062823s
> executemany, elapsed time: 19.900875s
>
>
> SQLAlchemy layer:
> for field1, field2 in genome_infos(rows):
>     db.session.add(TableA(field1, field2))
>
> # Time this part:
> db.session.flush()
>
> I used the code you provided above (instead of the flag you recently
> pushed), which allowed me to test both execute_batch and execute_values.
> Here is what I used:
> @event.listens_for(Engine, "do_executemany")
> def do_executemany(cursor, statement, parameters, context):
>     context.dialect.supports_sane_multi_rowcount = False
>
>     # Option: executemany
>     cursor.executemany(statement, parameters)
>
>     # Option: execute_batch
>     extras.execute_batch(cursor, statement, parameters)
>
>     # Option: execute_values
>     statement = re.sub('VALUES.*', 'VALUES %s', statement)
>     parameters = [(info['field1'], info['field2']) for info in parameters]
>     extras.execute_values(cursor, statement, parameters)
>
>     return True
>
> Obviously only one option was used at a time :)
>
> SQLAlchemy Results:
> Inserting 1000 rows:
> execute_values: 0.083958s
> execute_batch: 0.110223s
> executemany: 0.276129s
>
> Inserting 10,000 rows:
> execute_values: 1.243230s
> execute_batch: 1.388278s
> executemany: 3.131808s
>
> Inserting 100,000 rows:
> execute_values: 13.909975s
> execute_batch: 14.942507s
> executemany: 29.671092s
>
>
> Conclusions:
>
> execute_batch is a significant improvement over executemany (10x at the
> psycopg2 layer)
> Subtracting the 11-12 seconds of SQLAlchemy/Python overhead for inserting
> 100,000 rows gives roughly the psycopg2 times for each execute_ option.
> The 5000% improvements stated in
> https://github.com/psycopg/psycopg2/issues/491#issuecomment-276551038 are
> probably exaggerated for most users - that was for "transatlantic network
> connections", whereas I imagine most users have databases much closer to
> their servers. However, it's still a 10x speed up over executemany at the
> psycopg2 layer, and a ~2-3x speed up including SQLAlchemy overhead.
> execute_values is still twice as fast as execute_batch at the psycopg2 layer
> (for this specific table*), so incorporating that would be even better!
> execute_batch only helps with inserts to tables where all primary keys are
> defined (as you noted). Thus, if users want to see improvements for tables
> with auto-incrementing primary keys and relationships, they'll likely need
> to combine this with something like the suggestion in
> https://groups.google.com/forum/#!topic/sqlalchemy/GyAZTThJi2I
>
> Hopefully that was helpful :) Happy to help test in any other ways as well!
>
> On Wednesday, October 11, 2017 at 7:25:06 AM UTC-7, Mike Bayer wrote:
>>
>> On Wed, Oct 11, 2017 at 3:02 AM,   wrote:
>> > Hey Mike,
>> >
>> > Thanks again for the detailed explanations!
>> >
>> > I went ahead and tested the code snippet you gave me - I'm not sure I
>> > totally understand when this will change behavior though.
>> >
>> > I tried the following code snippet:
>> > (Snippet #1)
>> > for i in xrange(10):
>> >     db.session.add(A(id=i))
>> > db.session.flush()
>> >
>> > This calls through to the "do_executemany" handler and only executes 1
>> > insert statement with multiple VALUES. However, this was already the
>> > existing behavior right?
>>
>> There's multiple forms of "one insert statement".
>>
>> There is traditional 

Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-11 Thread vineet
I ran some tests using raw psycopg2 on my local computer, and also ran some 
tests with SQLAlchemy. I misunderstood a few things in my tests above, but 
I've explained the tests below and think they are more accurate.

*Context:*
- Each test was run 5 times and averaged.
- Both the python and database were running on the same computer (Macbook 
Pro (2015), using Postgres 9.5 and Docker)

*Raw psycopg2:*
# Rough code:
extras.execute_values(cursor, 'INSERT INTO table_a (field1, field2) VALUES 
%s', [...])
extras.execute_batch(cursor, 'INSERT INTO table_a (field1, field2) VALUES 
(%s, %s)', [...])
cursor.executemany('INSERT INTO table_a (field1, field2) VALUES (%s, %s)', 
[...])
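
For reference, a self-contained version of how these were timed might look
roughly like this (connection details, table definition and row count are
assumptions, not the exact script used):

import time

import psycopg2
from psycopg2 import extras

# assumed setup: CREATE TABLE table_a (field1 TEXT, field2 TEXT)
conn = psycopg2.connect("dbname=test user=scott password=tiger host=localhost")
rows = [("a%d" % i, "b%d" % i) for i in range(100000)]

def timed(label, fn):
    cursor = conn.cursor()
    start = time.time()
    fn(cursor)
    conn.rollback()  # discard the rows so each run starts from the same state
    print("%s, elapsed time: %fs" % (label, time.time() - start))

timed("execute_values", lambda cur: extras.execute_values(
    cur, "INSERT INTO table_a (field1, field2) VALUES %s", rows))
timed("execute_batch", lambda cur: extras.execute_batch(
    cur, "INSERT INTO table_a (field1, field2) VALUES (%s, %s)", rows))
timed("executemany", lambda cur: cur.executemany(
    "INSERT INTO table_a (field1, field2) VALUES (%s, %s)", rows))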

# Inserting 1000 rows
execute_values, elapsed time: 0.023967s
execute_batch, elapsed time: 0.051530s
executemany, elapsed time: 0.173563s


# Inserting 10,000 rows
execute_values, elapsed time: 0.268656s
execute_batch, elapsed time: 0.488736s
executemany, elapsed time: 2.017565s


# Inserting 100,000 rows
execute_values, elapsed time: 1.858675s
execute_batch, elapsed time: 4.062823s
executemany, elapsed time: 19.900875s


*SQLAlchemy layer:*
for field1, field2 in genome_infos(rows):
    db.session.add(TableA(field1, field2))

# Time this part:
db.session.flush()

I used the code you provided above (instead of the flag you recently 
pushed), which allowed me to test both execute_batch and execute_values. 
Here is what I used:
import re

from psycopg2 import extras
from sqlalchemy import event
from sqlalchemy.engine import Engine

@event.listens_for(Engine, "do_executemany")
def do_executemany(cursor, statement, parameters, context):
    context.dialect.supports_sane_multi_rowcount = False

    # Option: executemany
    cursor.executemany(statement, parameters)

    # Option: execute_batch
    extras.execute_batch(cursor, statement, parameters)

    # Option: execute_values
    statement = re.sub('VALUES.*', 'VALUES %s', statement)
    parameters = [(info['field1'], info['field2']) for info in parameters]
    extras.execute_values(cursor, statement, parameters)

    return True

Obviously only one option was used at a time :)

*SQLAlchemy Results:*
Inserting 1000 rows:
execute_values: 0.083958s
execute_batch: 0.110223s
executemany: 0.276129s

Inserting 10,000 rows:
execute_values: 1.243230s
execute_batch: 1.388278s
executemany: 3.131808s

Inserting 100,000 rows:
execute_values: 13.909975s
execute_batch: 14.942507s
executemany: 29.671092s


*Conclusions:*

   - execute_batch is a significant improvement over executemany (10x at 
   the psycopg2 layer)
   - Subtracting the 11-12 seconds of SQLAlchemy/Python overhead for 
   inserting 100,000 rows gives roughly the psycopg2 times for each execute_ 
   option.
   - The 5000% improvements stated 
   in https://github.com/psycopg/psycopg2/issues/491#issuecomment-276551038 
   are probably exaggerated for most users - that was for "transatlantic 
   network connections", whereas I imagine most users have databases much 
   closer to their servers. However, it's still a 10x speed up over 
   executemany at the psycopg2 layer, and a ~2-3x speed up including 
   SQLAlchemy overhead.
   - execute_values is still twice as fast as execute_batch at the psycopg2 
   layer (for this specific table*), so incorporating that would be even 
   better!
   - execute_batch only helps with inserts to tables where all primary keys 
   are defined (as you noted). Thus, if users want to see improvements for 
   tables with auto-incrementing primary keys and relationships, 
   they'll likely need to combine this with something like the suggestion 
   in https://groups.google.com/forum/#!topic/sqlalchemy/GyAZTThJi2I 

Hopefully that was helpful :) Happy to help test in any other ways as well!

On Wednesday, October 11, 2017 at 7:25:06 AM UTC-7, Mike Bayer wrote:
>
> On Wed, Oct 11, 2017 at 3:02 AM,   
> wrote: 
> > Hey Mike, 
> > 
> > Thanks again for the detailed explanations! 
> > 
> > I went ahead and tested the code snippet you gave me - I'm not sure I 
> > totally understand when this will change behavior though. 
> > 
> > I tried the following code snippet: 
> > (Snippet #1) 
> > for i in xrange(10): 
> >     db.session.add(A(id=i)) 
> > db.session.flush() 
> > 
> > This calls through to the "do_executemany" handler and only executes 1 
> > insert statement with multiple VALUES. However, this was already the 
> > existing behavior right? 
>
> There's multiple forms of "one insert statement". 
>
> There is traditional "executemany", which to the database looks like: 
>
> INSERT INTO table (x, y) VALUES (:x, :y) 
>
> The above statement is invoked for each set of parameters. 
>
> then there is multiple values, which to the database looks like: 
>
> INSERT INTO table (x, y) VALUES (:x1, :y1), (:x2, :y2), (:x3, :y3), ... 
>
> The above statement is invoked once, with all sets of parameters at 
> once (in reality they are usually batched in groups and the statement 
> is invoked once per group).  MySQL drivers do this as does psycopg2 
> execute_values(). 
>
> then there is psycopg2 

Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-11 Thread Mike Bayer
On Wed, Oct 11, 2017 at 3:02 AM,   wrote:
> Hey Mike,
>
> Thanks again for the detailed explanations!
>
> I went ahead and tested the code snippet you gave me - I'm not sure I
> totally understand when this will change behavior though.
>
> I tried the following code snippet:
> (Snippet #1)
> for i in xrange(10):
>     db.session.add(A(id=i))
> db.session.flush()
>
> This calls through to the "do_executemany" handler and only executes 1
> insert statement with multiple VALUES. However, this was already the
> existing behavior right?

There are multiple forms of "one insert statement".

There is traditional "executemany", which to the database looks like:

INSERT INTO table (x, y) VALUES (:x, :y)

The above statement is invoked for each set of parameters.

then there is multiple values, which to the database looks like:

INSERT INTO table (x, y) VALUES (:x1, :y1), (:x2, :y2), (:x3, :y3), ...

The above statement is invoked once, with all sets of parameters at
once (in reality they are usually batched in groups and the statement
is invoked once per group).  MySQL drivers do this as does psycopg2
execute_values().

then there is psycopg2 execute_batch which yes, I forgot, is doing
"insert into table (x,y) values (:x, :y); insert into table(x, y)
values (:x, :y), ...", it's not actually rewriting the values clause.
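
A rough way to see that for yourself, given an open psycopg2 cursor (this is
approximately what execute_batch() builds internally, one page at a time):

sql = "insert into table_a (field1, field2) values (%s, %s)"
params = [("a", "b"), ("c", "d")]
joined = b"; ".join(cursor.mogrify(sql, p) for p in params)
cursor.execute(joined)   # one round trip, two INSERTs separated by ";"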

In any case, SQLAlchemy only knows about the first form within the ORM.




>
> When I don't specify a primary key:
> (Snippet #2)
> for i in xrange(10):
> db.session.add(A()) # don't pre-specify a primary key
> db.session.flush()
>
> This does *not* call through to the do_executemany handler, and executes 10
> insert statements.

the ORM must retrieve newly generated primary key values if they were
not already present, which cannot be done with the
cursor.executemany() method; it must use cursor.execute().
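
For illustration, the difference at the DBAPI level is roughly as follows (a
sketch; it assumes the "base" table from this thread with a server-generated
id):

# executemany() leaves no result set behind to read generated keys from:
cursor.executemany(
    "INSERT INTO base (base_type) VALUES (%s)", [("1",), ("2",)])

# so when the ORM needs the new primary key, it runs one execute() per row
# and reads the key back (RETURNING on PostgreSQL):
cursor.execute(
    "INSERT INTO base (base_type) VALUES (%s) RETURNING id", ("1",))
new_id = cursor.fetchone()[0]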


>
> If I understand
> http://initd.org/psycopg/docs/extras.html#fast-execution-helpers correctly,
> execute_batch does *not* actually combine the insert statements into a
> single statement with multiple VALUES clauses, but instead does something
> like sends multiple INSERT statements to the server at once.

Yes, I forgot: the execute_batch is actually stringing the statements
together on a semicolon, not rewriting values().   However, if you look
at the performance gains at
https://github.com/psycopg/psycopg2/issues/491#issuecomment-276551038,
there is negligible difference between batch and values, compared to
the default, which is 5000% slower.

> We would
> obviously still expect to see performance benefits from the reduced network
> round-trips, but it looks like the above tests do not show any improvement
> there.

I'm not familiar with what testing you are performing.   Are you
testing the new flag?   What happens if you run the tests at
https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
e.g. https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d?
If the feature is working, and the performance gains are being lost
for some reason, then that's a major issue in psycopg2 that we need to
report.

>
> Let me know if I'm testing that incorrectly - I'm also happy to test this on
> our actual production use cases to give you some more accurate speed-up
> times.
>
> On Tuesday, October 10, 2017 at 10:36:33 AM UTC-7, Mike Bayer wrote:
>>
>> I've added
>> https://bitbucket.org/zzzeek/sqlalchemy/issues/4109/support-psycopg2-batch-mode
>> which will have a code review up soon.
>>
>> On Tue, Oct 10, 2017 at 1:06 PM, Mike Bayer 
>> wrote:
>> > On Tue, Oct 10, 2017 at 1:47 AM,   wrote:
>> >> Hello! SQLAlchemy has the (really useful) behavior of batching together
>> >> inserts when all the primary keys are defined. It looks like it groups
>> >> similar tables together (while maintaining insert order ) to minimize
>> >> the
>> >> number of Insert statements sent to the database.
>> >>
>> >> I'm unsure what the expected behavior here is for single-table
>> >> inheritance
>> >> tables. Here's some example models:
>> >> class Base(db.Model):
>> >> __tablename__ = 'base'
>> >> id = db.Column(db.Integer, primary_key=True)
>> >> base_type = db.Column(db.String)
>> >>
>> >> __mapper_args__ = {
>> >> 'polymorphic_on': base_type,
>> >> }
>> >>
>> >> def __init__(self, id_):
>> >> self.id = id_
>> >>
>> >>
>> >> class SubBase1(Base):
>> >> __mapper_args__ = {
>> >> 'polymorphic_identity': '1',
>> >> }
>> >>
>> >> col = db.Column(db.String)
>> >>
>> >>
>> >> class SubBase2(Base):
>> >> __mapper_args__ = {
>> >> 'polymorphic_identity': '2',
>> >> }
>> >>
>> >> col2 = db.Column(db.String)
>> >>
>> >>
>> >> class OtherModel(db.Model):
>> >> id = db.Column(db.Integer, primary_key=True)
>> >>
>> >> def __init__(self, id_):
>> >> self.id = id_
>> >>
>> >>
>> >> Base, SubBase, and SubBase2 form a tree, while 

Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-10 Thread Mike Bayer
I've added 
https://bitbucket.org/zzzeek/sqlalchemy/issues/4109/support-psycopg2-batch-mode
which will have a code review up soon.

On Tue, Oct 10, 2017 at 1:06 PM, Mike Bayer  wrote:
> On Tue, Oct 10, 2017 at 1:47 AM,   wrote:
>> Hello! SQLAlchemy has the (really useful) behavior of batching together
>> inserts when all the primary keys are defined. It looks like it groups
>> similar tables together (while maintaining insert order ) to minimize the
>> number of Insert statements sent to the database.
>>
>> I'm unsure what the expected behavior here is for single-table inheritance
>> tables. Here's some example models:
>> class Base(db.Model):
>> __tablename__ = 'base'
>> id = db.Column(db.Integer, primary_key=True)
>> base_type = db.Column(db.String)
>>
>> __mapper_args__ = {
>> 'polymorphic_on': base_type,
>> }
>>
>> def __init__(self, id_):
>> self.id = id_
>>
>>
>> class SubBase1(Base):
>> __mapper_args__ = {
>> 'polymorphic_identity': '1',
>> }
>>
>> col = db.Column(db.String)
>>
>>
>> class SubBase2(Base):
>> __mapper_args__ = {
>> 'polymorphic_identity': '2',
>> }
>>
>> col2 = db.Column(db.String)
>>
>>
>> class OtherModel(db.Model):
>> id = db.Column(db.Integer, primary_key=True)
>>
>> def __init__(self, id_):
>> self.id = id_
>>
>>
>> Base, SubBase, and SubBase2 form a tree, while OtherModel is just another
>> model.
>>
>> When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the
>> INSERTs such that only two INSERT statements are sent to the DB:
>> (Snippet #1)
>> for i in xrange(0, 10, 2):
>> db.session.add(SubBase1(i))
>> db.session.add(OtherModel(i + 1))
>>
>> db.session.flush()  # Performs 2 insert statements, each with 5 elements in
>> the VALUES clause
>>
>> However, when creating alternating SubBase1s and SubBase2s, it actually
>> performs 10 separate insert statements:
>> (Snippet #2)
>> for i in xrange(0, 10, 2):
>> db.session.add(SubBase1(i))
>> db.session.add(SubBase2(i + 1))
>> db.session.flush()  # Performs 10 insert statements, each with 1 element in
>> the VALUES clause
>>
>>
>> It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and
>> SubBase2, but also isn't able to do a single INSERT statement with 10
>> elements in the VALUES clause (or 2 INSERT statements with 5 elements each).
>>
>> Questions:
>> 1) Is Snippet #2 the desired behavior? I read through the unit-of-work
>> summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally
>> sure what the desired behavior was for single-table inheritance models.
>
> yes because you'll note because sub1 and sub2 have different columns
> mapped, the actual columns in each INSERT statement are different.
> At the same time it is maintaining INSERT order into the table
> overall, so it has no choice but to break things up in this way.
>
>> 2) If this is desired behavior, do you have any suggestions of how to
>> optimize the number of queries here in e.g. a before_flush hook? Obviously
>> one way would be to rewrite our application such that inserts of like tables
>> happen together, but if there is a way to do this without rewriting the
>> application code, that would be even better.
>
> if your INSERTs are very simplistic then you can still consider using
> the bulk_save option, gathering your classes into lists and then doing
> your own bulk operation where you ensure all like-rows are grouped
> together ahead of time.
>
> Otherwise, if you really are trying to thread the needle through "I
> want full ORM flush convenience" plus "I need to optimize INSERTs into
> batches", you have one more option which is to manipulate the
> insert_order on the states like this:
>
> for i in xrange(0, 10, 2):
> s.add(SubBase1(i))
> s.add(SubBase2(i + 1))
>
> from sqlalchemy import inspect
>
> for idx, obj in enumerate(
> sorted(
> s.new,
> key=lambda obj: (obj.__class__, inspect(obj).insert_order)
> )
> ):
> inspect(obj).insert_order = idx
>
>
> I've not documented "insert_order" as an official thing but I can support 
> that.
>
> But also if you're really trying to maximize INSERT performance you
> need to use the psycopg2 executemany() extensions:
> http://initd.org/psycopg/docs/extras.html#fast-execution-helpers.
> These were added specifically from an email thread I started with them
> in response to user complaints about performance, but SQLAlchemy has
> not yet built these in as features.
>
> I've done no testing at all of this, however if you have the
> resources, you can turn on statement logging on your Postgresql
> database which will hopefully show INSERT statements being collapsed
> into a single statement with multiple VALUES:
>
> from psycopg2 import extras
> from sqlalchemy import event
>
> e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
>
> @event.listens_for(e, 

Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-10 Thread Mike Bayer
On Tue, Oct 10, 2017 at 1:47 AM,   wrote:
> Hello! SQLAlchemy has the (really useful) behavior of batching together
> inserts when all the primary keys are defined. It looks like it groups
> similar tables together (while maintaining insert order ) to minimize the
> number of Insert statements sent to the database.
>
> I'm unsure what the expected behavior here is for single-table inheritance
> tables. Here's some example models:
> class Base(db.Model):
> __tablename__ = 'base'
> id = db.Column(db.Integer, primary_key=True)
> base_type = db.Column(db.String)
>
> __mapper_args__ = {
> 'polymorphic_on': base_type,
> }
>
> def __init__(self, id_):
> self.id = id_
>
>
> class SubBase1(Base):
> __mapper_args__ = {
> 'polymorphic_identity': '1',
> }
>
> col = db.Column(db.String)
>
>
> class SubBase2(Base):
> __mapper_args__ = {
> 'polymorphic_identity': '2',
> }
>
> col2 = db.Column(db.String)
>
>
> class OtherModel(db.Model):
> id = db.Column(db.Integer, primary_key=True)
>
> def __init__(self, id_):
> self.id = id_
>
>
> Base, SubBase, and SubBase2 form a tree, while OtherModel is just another
> model.
>
> When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the
> INSERTs such that only two INSERT statements are sent to the DB:
> (Snippet #1)
> for i in xrange(0, 10, 2):
> db.session.add(SubBase1(i))
> db.session.add(OtherModel(i + 1))
>
> db.session.flush()  # Performs 2 insert statements, each with 5 elements in
> the VALUES clause
>
> However, when creating alternating SubBase1s and SubBase2s, it actually
> performs 10 separate insert statements:
> (Snippet #2)
> for i in xrange(0, 10, 2):
> db.session.add(SubBase1(i))
> db.session.add(SubBase2(i + 1))
> db.session.flush()  # Performs 10 insert statements, each with 1 element in
> the VALUES clause
>
>
> It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and
> SubBase2, but also isn't able to do a single INSERT statement with 10
> elements in the VALUES clause (or 2 INSERT statements with 5 elements each).
>
> Questions:
> 1) Is Snippet #2 the desired behavior? I read through the unit-of-work
> summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally
> sure what the desired behavior was for single-table inheritance models.

Yes, because as you'll note, sub1 and sub2 have different columns
mapped, so the actual columns in each INSERT statement are different.
At the same time it is maintaining INSERT order into the table
overall, so it has no choice but to break things up in this way.
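
For example, with the models in this thread a flush of alternating SubBase1 /
SubBase2 rows has to switch back and forth between two differently-shaped
statements, roughly:

INSERT INTO base (id, base_type, col) VALUES (%(id)s, %(base_type)s, %(col)s)
INSERT INTO base (id, base_type, col2) VALUES (%(id)s, %(base_type)s, %(col2)s)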

> 2) If this is desired behavior, do you have any suggestions of how to
> optimize the number of queries here in e.g. a before_flush hook? Obviously
> one way would be to rewrite our application such that inserts of like tables
> happen together, but if there is a way to do this without rewriting the
> application code, that would be even better.

if your INSERTs are very simplistic then you can still consider using
the bulk_save option, gathering your classes into lists and then doing
your own bulk operation where you ensure all like-rows are grouped
together ahead of time.
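
A minimal sketch of that idea, using the models from this thread
(bulk_save_objects skips most of the per-object flush mechanics, so it only
suits simple INSERTs):

from itertools import groupby

objects = ([SubBase1(i) for i in xrange(0, 10, 2)] +
           [SubBase2(i) for i in xrange(1, 10, 2)])

# group like classes together ahead of time so each class INSERTs as one batch
objects.sort(key=lambda obj: type(obj).__name__)
for _, group in groupby(objects, key=lambda obj: type(obj).__name__):
    db.session.bulk_save_objects(list(group))
db.session.commit()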

Otherwise, if you really are trying to thread the needle through "I
want full ORM flush convenience" plus "I need to optimize INSERTs into
batches", you have one more option which is to manipulate the
insert_order on the states like this:

for i in xrange(0, 10, 2):
    s.add(SubBase1(i))
    s.add(SubBase2(i + 1))

from sqlalchemy import inspect

for idx, obj in enumerate(
    sorted(
        s.new,
        key=lambda obj: (obj.__class__, inspect(obj).insert_order)
    )
):
    inspect(obj).insert_order = idx


I've not documented "insert_order" as an official thing but I can support that.

But also if you're really trying to maximize INSERT performance you
need to use the psycopg2 executemany() extensions:
http://initd.org/psycopg/docs/extras.html#fast-execution-helpers.
These were added specifically from an email thread I started with them
in response to user complaints about performance, but SQLAlchemy has
not yet built these in as features.
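
Both helpers live in psycopg2.extras and accept an optional page_size; a
quick sketch (the cursor, table and data below are made up):

from psycopg2 import extras

rows = [(i,) for i in range(1000)]

# execute_batch: joins statements with ";" and sends them in pages
extras.execute_batch(cur, "INSERT INTO t (x) VALUES (%s)", rows, page_size=100)

# execute_values: rewrites the single VALUES clause to carry many rows per statement
extras.execute_values(cur, "INSERT INTO t (x) VALUES %s", rows, page_size=100)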

I've done no testing at all of this, however if you have the
resources, you can turn on statement logging on your Postgresql
database which will hopefully show INSERT statements being collapsed
into a single statement with multiple VALUES:

from psycopg2 import extras
from sqlalchemy import create_engine, event

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)

@event.listens_for(e, "do_executemany")
def do_executemany(cursor, statement, parameters, context):
    context.dialect.supports_sane_multi_rowcount = False
    extras.execute_batch(cursor, statement, parameters)
    return True

if you get the time to test this, please let me know as I do want to
add support for "execute_batch" directly into the psycopg2 dialect.
thanks!






One option I've thought of is:
> 

[sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-09 Thread vineet
Hello! SQLAlchemy has the (really useful) behavior of batching together 
inserts when all the primary keys are defined. It looks like it groups 
similar tables together (while maintaining insert order) to minimize the 
number of Insert statements sent to the database.

I'm unsure what the expected behavior here is for single-table inheritance 
tables. Here's some example models:
class Base(db.Model):
    __tablename__ = 'base'
    id = db.Column(db.Integer, primary_key=True)
    base_type = db.Column(db.String)

    __mapper_args__ = {
        'polymorphic_on': base_type,
    }

    def __init__(self, id_):
        self.id = id_


class SubBase1(Base):
    __mapper_args__ = {
        'polymorphic_identity': '1',
    }

    col = db.Column(db.String)


class SubBase2(Base):
    __mapper_args__ = {
        'polymorphic_identity': '2',
    }

    col2 = db.Column(db.String)


class OtherModel(db.Model):
    id = db.Column(db.Integer, primary_key=True)

    def __init__(self, id_):
        self.id = id_


Base, SubBase1, and SubBase2 form a tree, while OtherModel is just another 
model.

When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the 
INSERTs such that only two INSERT statements are sent to the DB:
(Snippet #1)
for i in xrange(0, 10, 2):
    db.session.add(SubBase1(i))
    db.session.add(OtherModel(i + 1))

db.session.flush()  # Performs 2 insert statements, each with 5 elements in the VALUES clause

However, when creating alternating SubBase1s and SubBase2s, it actually 
performs 10 separate insert statements:
(Snippet #2)
for i in xrange(0, 10, 2):
    db.session.add(SubBase1(i))
    db.session.add(SubBase2(i + 1))
db.session.flush()  # Performs 10 insert statements, each with 1 element in the VALUES clause


It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and 
SubBase2, but also isn't able to do a single INSERT statement with 10 
elements in the VALUES clause (or 2 INSERT statements with 5 elements each).

Questions:
1) Is Snippet #2 the desired behavior? I read through the unit-of-work 
summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally 
sure what the desired behavior was for single-table inheritance models.
2) If this is desired behavior, do you have any suggestions of how to 
optimize the number of queries here in e.g. a before_flush hook? Obviously 
one way would be to rewrite our application such that inserts of like 
tables happen together, but if there is a way to do this without rewriting 
the application code, that would be even better. One option I've thought of 
is:
  - Overwrite the insert_order in the before_flush hook to rearrange the 
models in the order I want. (Not sure if this has effects that I'm unaware 
of though)

Thanks!
