Re: [sqlalchemy] Speed issue with bulk inserts

2016-12-23 Thread Jonathan Vanasco
Does this issue consistently repeat within a transaction block?  Does it 
still happen if you reverse the order of the two tests?  

I've run into similar issues in the past, and the problem was often from 
postgresql checking indexes -- the first test would stall because indexes 
needed to be read into memory, then the second test would fly.  If I waited 
more than 5 seconds (or ran outside of a transaction), postgres would lose 
the indexes from memory.  Triggers are a likely cause too, but index issues 
tend to be more common.

If that's the case, what worked for me:
* increasing the cache allocation for postgres
* using partial/function indexes on some columns, and adjusting the query to 
hint at them (otherwise the planner won't use them)
* upgrading to postgres 9.6 (searching is a lot faster thanks to a few 
improvements and the new parallel query feature)
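A quick sketch of the partial-index point, using stdlib sqlite3 as a stand-in for postgres (both support `CREATE INDEX ... WHERE`); the table mirrors the thread's trial_locations, and the index name is made up:

```python
# Partial-index sketch -- stdlib sqlite3 standing in for Postgres; both
# support partial indexes.  Table/column names echo the thread; the index
# name "ix_active_trials" is made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trial_locations (trial_id INTEGER, active INTEGER)")
# Only rows with active = 1 land in the index, keeping it small and hot.
conn.execute(
    "CREATE INDEX ix_active_trials ON trial_locations (trial_id) "
    "WHERE active = 1"
)

# The query must repeat the index predicate (active = 1); otherwise the
# planner can't prove the partial index covers the rows it needs.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM trial_locations WHERE trial_id = 126625 AND active = 1"
).fetchall()
uses_index = any("ix_active_trials" in str(row) for row in plan)
```

The same caveat carries over to postgres: phrase the WHERE clause so the planner can match the index's predicate.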

Something that also worked for me in debugging this sort of thing was using 
query logging in postgres to grab the actual processed query, then running 
EXPLAIN ANALYZE on it to see where the slowdown happened.  The only way to 
clear out the postgres index cache is to stop/start the server, so it can 
be a pain to get every condition timed right.  Also note that the results 
from EXPLAIN ANALYZE often vary wildly from running the query normally with 
timing on.
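For the query-logging step, the usual postgresql.conf knobs look like this (values are illustrative, not from the thread):

```ini
# postgresql.conf -- capture every processed statement so the slow one can
# be re-run under EXPLAIN ANALYZE
log_statement = 'all'                 # log the text of every statement
log_min_duration_statement = 0        # log the duration of every statement (ms)
```

Both settings take effect on a config reload; the captured SQL can then be pasted into EXPLAIN ANALYZE.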

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Speed issue with bulk inserts

2016-12-23 Thread Mike Bayer
Does this table have triggers of some kind on it?  I've been asking on the
psycopg2 list about this, as this is not the first time it has come up.
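For what it's worth, the trigger question can be answered from psql with a catalog query along these lines (a sketch; trial_locations is the table from the original post):

```sql
-- List user-defined triggers on the table; NOT tgisinternal filters out
-- the triggers Postgres creates internally for foreign-key constraints.
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'trial_locations'::regclass
  AND NOT tgisinternal;
```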

On Dec 23, 2016 8:46 AM, "mike bayer"  wrote:

> those are two different kinds of INSERT statements.   To compare to Core
> you need to run like this:
>
>
>  engine.execute(
>TrialLocations.__table__.insert(),
>trial_location_core_inserts
>  )
>
> that will run executemany() on the psycopg2 side, which internally will
> run 223 INSERT statements.   On a normal setup this should run like
> lightning, 223 is extremely low.   You may have some unusual network
> overhead.   In the old days we'd say the table needs to be vacuumed but I
> had the impression that by the 9.x series this is mostly automatic, though
> maybe try running vacuuming on that table
> (https://www.postgresql.org/docs/9.1/static/sql-vacuum.html)
>
>
>
> On 12/23/2016 03:25 AM, Brian Clark wrote:
>
>> So I'm having an issue with a very slow insert, I'm inserting 223 items
>> and it takes 20+ seconds to execute. Any advice on what I'm doing wrong
>> and why it would be so slow? Using Postgresql 9.4.8
>>
>> The line of code
>>
>> LOG_OUTPUT('==PRE BULK==', True)
>> db_session.bulk_save_objects(trial_location_inserts)
>> LOG_OUTPUT('==POST BULK==', True)
>>
>> And here's the log from sqlalchemy echo
>>
>> 2016-12-23 07:37:52.570: ==PRE BULK==
>> 2016-12-22 23:37:52,572 INFO sqlalchemy.engine.base.Engine INSERT
>> INTO trial_locations (status, trial_id, location_id, active) VALUES
>> (%(status)s, %(trial_id)s, %(location_id)s, %(active)s)
>> 2016-12-22 23:37:52,572 INFO sqlalchemy.engine.base.Engine
>> ({'status': u'Completed', 'active': True, 'location_id': 733,
>> 'trial_id': 126625}, {'status': u'Completed', 'active': True,
>> 'location_id': 716, 'trial_id': 126625}, {'status': u'Completed',
>> 'active': True, 'location_id': 1033, 'trial_id': 126625}, {'status':
>> u'Completed', 'active': True, 'location_id': 1548, 'trial_id': 126625},
>> {'status': u'Completed', 'active': True, 'location_id': 1283,
>> 'trial_id': 126625}, {'status': u'Completed', 'active': True,
>> 'location_id': 1556, 'trial_id': 126625}, {'status': u'Completed',
>> 'active': True, 'location_id': 4271, 'trial_id': 126625}, {'status':
>> u'Completed', 'active': True, 'location_id': 1567, 'trial_id': 126625}
>>  ... displaying 10 of 223 total bound parameter sets ...  {'status':
>> u'Completed', 'active': True, 'location_id': 1528, 'trial_id': 126625},
>> {'status': u'Completed', 'active': True, 'location_id': 1529,
>> 'trial_id': 126625})
>> 2016-12-23 07:38:14.270: ==POST BULK==
>>
>>
>> Also for comparison I rewrote it in Sqlalchemy core
>>
>>
>> LOG_OUTPUT('==PRE BULK==', True)
>> engine.execute(
>>   TrialLocations.__table__.insert().values(
>> trial_location_core_inserts
>>   )
>> )
>> # db_session.bulk_save_objects(trial_location_inserts)
>> LOG_OUTPUT('==POST BULK==', True)
>>
>> and it ran in 0.028 seconds
>>
>> 2016-12-23 08:11:26.097: ==PRE BULK==
>> ...
>> 2016-12-23 08:11:27.025: ==POST BULK==
>>
>> I'd like to keep it in session for the sake of this being the end part
>> of a transaction and if the whole thing fails I want it all to rollback.
>> Am I doing something wrong? I don't need the data afterwards it just
>> needs to be saved
>>
>> Help appreciated!
>>
>


Re: [sqlalchemy] Speed issue with bulk inserts

2016-12-23 Thread mike bayer
those are two different kinds of INSERT statements.  To compare to Core 
you need to run like this:

 engine.execute(
   TrialLocations.__table__.insert(),
   trial_location_core_inserts
 )

that will run executemany() on the psycopg2 side, which internally will 
run 223 INSERT statements.  On a normal setup this should run like 
lightning; 223 is extremely low.  You may have some unusual network 
overhead.  In the old days we'd say the table needs to be vacuumed, but 
I had the impression that by the 9.x series this is mostly automatic; 
still, maybe try running VACUUM on that table 
(https://www.postgresql.org/docs/9.1/static/sql-vacuum.html)
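The two forms can be sketched side by side (a hedged example, assuming SQLAlchemy 1.4+ and an in-memory SQLite database standing in for Postgres; the table is reconstructed from the echoed INSERT in the quoted post, with a guessed surrogate key):

```python
# Hedged sketch: executemany vs. a single multi-row INSERT.
# SQLAlchemy 1.4+; in-memory SQLite stands in for Postgres, and the "id"
# primary key is a guess -- the real table definition isn't in the thread.
from sqlalchemy import (
    Boolean, Column, Integer, MetaData, String, Table, create_engine,
)

engine = create_engine("sqlite://")
metadata = MetaData()
trial_locations = Table(
    "trial_locations", metadata,
    Column("id", Integer, primary_key=True),
    Column("status", String),
    Column("trial_id", Integer),
    Column("location_id", Integer),
    Column("active", Boolean),
)
metadata.create_all(engine)

params = [
    {"status": "Completed", "active": True, "location_id": n, "trial_id": 126625}
    for n in range(223)
]

with engine.begin() as conn:
    # Form 1: statement and parameter list passed separately -> the DBAPI's
    # executemany(), i.e. one prepared INSERT run 223 times.
    conn.execute(trial_locations.insert(), params)
    # Form 2 (the original Core rewrite): .values(params) renders a single
    # multi-row INSERT ... VALUES (...), (...), ... statement.
    conn.execute(trial_locations.insert().values(params))

with engine.connect() as conn:
    rows = conn.execute(trial_locations.select()).fetchall()  # 223 + 223 rows
```

Both forms insert the same data; the difference is purely in how many statements hit the wire, which is why they can time so differently.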




On 12/23/2016 03:25 AM, Brian Clark wrote:

So I'm having an issue with a very slow insert, I'm inserting 223 items
and it takes 20+ seconds to execute. Any advice on what I'm doing wrong
and why it would be so slow? Using Postgresql 9.4.8

The line of code

LOG_OUTPUT('==PRE BULK==', True)
db_session.bulk_save_objects(trial_location_inserts)
LOG_OUTPUT('==POST BULK==', True)

And here's the log from sqlalchemy echo

2016-12-23 07:37:52.570: ==PRE BULK==
2016-12-22 23:37:52,572 INFO sqlalchemy.engine.base.Engine INSERT
INTO trial_locations (status, trial_id, location_id, active) VALUES
(%(status)s, %(trial_id)s, %(location_id)s, %(active)s)
2016-12-22 23:37:52,572 INFO sqlalchemy.engine.base.Engine
({'status': u'Completed', 'active': True, 'location_id': 733,
'trial_id': 126625}, {'status': u'Completed', 'active': True,
'location_id': 716, 'trial_id': 126625}, {'status': u'Completed',
'active': True, 'location_id': 1033, 'trial_id': 126625}, {'status':
u'Completed', 'active': True, 'location_id': 1548, 'trial_id': 126625},
{'status': u'Completed', 'active': True, 'location_id': 1283,
'trial_id': 126625}, {'status': u'Completed', 'active': True,
'location_id': 1556, 'trial_id': 126625}, {'status': u'Completed',
'active': True, 'location_id': 4271, 'trial_id': 126625}, {'status':
u'Completed', 'active': True, 'location_id': 1567, 'trial_id': 126625}
 ... displaying 10 of 223 total bound parameter sets ...  {'status':
u'Completed', 'active': True, 'location_id': 1528, 'trial_id': 126625},
{'status': u'Completed', 'active': True, 'location_id': 1529,
'trial_id': 126625})
2016-12-23 07:38:14.270: ==POST BULK==


Also for comparison I rewrote it in Sqlalchemy core


LOG_OUTPUT('==PRE BULK==', True)
engine.execute(
  TrialLocations.__table__.insert().values(
trial_location_core_inserts
  )
)
# db_session.bulk_save_objects(trial_location_inserts)
LOG_OUTPUT('==POST BULK==', True)

and it ran in 0.028 seconds

2016-12-23 08:11:26.097: ==PRE BULK==
...
2016-12-23 08:11:27.025: ==POST BULK==

I'd like to keep it in session for the sake of this being the end part
of a transaction and if the whole thing fails I want it all to rollback.
Am I doing something wrong? I don't need the data afterwards it just
needs to be saved

Help appreciated!





[sqlalchemy] Speed issue with bulk inserts

2016-12-23 Thread Brian Clark
So I'm having an issue with a very slow insert: I'm inserting 223 items and 
it takes 20+ seconds to execute.  Any advice on what I'm doing wrong and why 
it would be so slow?  Using PostgreSQL 9.4.8.

The line of code

LOG_OUTPUT('==PRE BULK==', True)
db_session.bulk_save_objects(trial_location_inserts)
LOG_OUTPUT('==POST BULK==', True)

And here's the log from sqlalchemy echo

2016-12-23 07:37:52.570: ==PRE BULK==
2016-12-22 23:37:52,572 INFO sqlalchemy.engine.base.Engine INSERT INTO 
trial_locations (status, trial_id, location_id, active) VALUES (%(status)s, 
%(trial_id)s, %(location_id)s, %(active)s)
2016-12-22 23:37:52,572 INFO sqlalchemy.engine.base.Engine ({'status': 
u'Completed', 'active': True, 'location_id': 733, 'trial_id': 126625}, 
{'status': u'Completed', 'active': True, 'location_id': 716, 'trial_id': 
126625}, {'status': u'Completed', 'active': True, 'location_id': 1033, 
'trial_id': 126625}, {'status': u'Completed', 'active': True, 
'location_id': 1548, 'trial_id': 126625}, {'status': u'Completed', 
'active': True, 'location_id': 1283, 'trial_id': 126625}, {'status': 
u'Completed', 'active': True, 'location_id': 1556, 'trial_id': 126625}, 
{'status': u'Completed', 'active': True, 'location_id': 4271, 'trial_id': 
126625}, {'status': u'Completed', 'active': True, 'location_id': 1567, 
'trial_id': 126625}  ... displaying 10 of 223 total bound parameter sets 
...  {'status': u'Completed', 'active': True, 'location_id': 1528, 
'trial_id': 126625}, {'status': u'Completed', 'active': True, 
'location_id': 1529, 'trial_id': 126625})
2016-12-23 07:38:14.270: ==POST BULK==


Also for comparison I rewrote it in SQLAlchemy Core

LOG_OUTPUT('==PRE BULK==', True)
engine.execute(
  TrialLocations.__table__.insert().values(
trial_location_core_inserts
  )
)
# db_session.bulk_save_objects(trial_location_inserts)
LOG_OUTPUT('==POST BULK==', True)

and it ran in 0.028 seconds

2016-12-23 08:11:26.097: ==PRE BULK==
...
2016-12-23 08:11:27.025: ==POST BULK==

I'd like to keep it in the session, since this is the end part of a 
transaction and if the whole thing fails I want it all to roll back.  Am I 
doing something wrong?  I don't need the data afterwards; it just needs to 
be saved.
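One possible way to square that (a sketch assuming SQLAlchemy 1.4+, with SQLite standing in for Postgres and the model reconstructed from the echoed log above): execute a Core insert through the Session, so it runs on the session's own connection and rolls back with everything else.

```python
# Hedged sketch: Core executemany through the ORM Session, so the bulk
# insert participates in the session's transaction.  SQLAlchemy 1.4+;
# SQLite stands in for Postgres; model fields reconstructed from the log.
from sqlalchemy import (
    Boolean, Column, Integer, String, create_engine, func, insert, select,
)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class TrialLocations(Base):
    __tablename__ = "trial_locations"
    id = Column(Integer, primary_key=True)
    status = Column(String)
    trial_id = Column(Integer)
    location_id = Column(Integer)
    active = Column(Boolean)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

params = [
    {"status": "Completed", "active": True, "location_id": n, "trial_id": 126625}
    for n in range(223)
]

session = Session(engine)
# Runs as executemany() on the session's connection, inside its transaction.
session.execute(insert(TrialLocations.__table__), params)
in_txn = session.execute(
    select(func.count()).select_from(TrialLocations)
).scalar()
session.rollback()  # the bulk insert vanishes along with the rest of the work
after_rollback = session.execute(
    select(func.count()).select_from(TrialLocations)
).scalar()
```

This keeps the speed of the Core executemany path while still honoring the surrounding transaction's rollback.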

Help appreciated!
