Re: [sqlalchemy] Speed up bulk inserts

2014-08-19 Thread Michael Bayer
Sure, that feature is here: 
http://docs.sqlalchemy.org/en/rel_0_9/core/dml.html?highlight=insert%20values#sqlalchemy.sql.expression.Insert.values.params.*args
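For reference, a minimal sketch of that feature; the table, columns, and in-memory SQLite engine are all made up for illustration (Anton's actual target was PostgreSQL):

```python
import sqlalchemy as sa

# Stand-in table and engine; the real schema differs.
engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
mytable = sa.Table(
    "mytable", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(50)),
)
metadata.create_all(engine)

raw_list = [{"name": "a"}, {"name": "b"}, {"name": "c"}]

# Passing the list to .values() renders ONE statement with a
# multi-row VALUES clause, rather than a DBAPI executemany().
stmt = mytable.insert().values(raw_list)
print(str(stmt))  # a single INSERT with three parameter sets

with engine.begin() as conn:
    conn.execute(stmt)
    rows = conn.execute(sa.select(mytable)).fetchall()
print(len(rows))  # 3
```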


On Aug 18, 2014, at 7:32 PM, Anton anschat...@gmail.com wrote:

 [...]

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Speed up bulk inserts

2014-08-18 Thread Anton
Hi Michael,

I checked your post on Stack Overflow; could you help me figure something 
out regarding usage of Core?
In the example you provided, the test_sqlite3 version uses a custom, 
user-supplied query, so it issues only one INSERT statement with all rows 
in the VALUES clause. I wonder if this can be achieved using Core. I tried 
this approach:

    session.connection().execute(
        mytable.insert(),
        raw_list
    )

where raw_list is a list of dictionaries to insert, and this issues a 
separate statement for every row, which is probably expected. But then I 
came across this docs page 
http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
where, in the example with a list of dictionaries, a single INSERT 
statement is issued with a multi-row VALUES clause. I wonder if I can 
achieve the same behavior. I am using PostgreSQL 9.3.4 and SQLAlchemy==0.9.4.
Thanks!

Best,
Anton. 
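The two execution forms being compared can be seen side by side in a small sketch (table name, columns, and the in-memory engine are made up for illustration):

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # stand-in; the original setup used PostgreSQL
metadata = sa.MetaData()
mytable = sa.Table(
    "mytable", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(50)),
)
metadata.create_all(engine)

raw_list = [{"name": "a"}, {"name": "b"}]

with engine.begin() as conn:
    # Form 1: statement + list of dicts -> DBAPI executemany():
    # one prepared statement, executed once per parameter set.
    conn.execute(mytable.insert(), raw_list)

    # Form 2: list passed to .values() -> a single INSERT with a
    # multi-row VALUES clause rendered into the SQL itself.
    conn.execute(mytable.insert().values(raw_list))

    total = conn.execute(
        sa.select(sa.func.count()).select_from(mytable)
    ).scalar()
print(total)  # 4
```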

On Wednesday, November 6, 2013 12:58:53 PM UTC-8, Michael Bayer wrote:

 [...]





Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Achim

On Wednesday, November 6, 2013 at 9:58:53 PM UTC+1, Michael Bayer wrote:

 I wrote a full post regarding this topic on stackoverflow at  
 http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768.
   If you start with this, I can answer more specific questions. 


The article was very helpful, thanks. I still want to figure out the best 
balance between convenience and speed for my use case. Does the following 
make sense, and is it possible?

I work only with PostgreSQL, and I'm sure that all involved objects have a 
unique id column called 'id'. So before doing a session.commit(), I could 
check how many objects are in my session. As I'm just bulk inserting, I 
know that all of them are new and don't have their id set yet. Now I ask 
the database for that number of new ids, iterate over the objects in my 
session, and set the ids. Internally, all ids would come from a single 
sequence, so I don't have to care about object types and so on. Afterwards, 
SQLAlchemy should be aware that the ids have already been set, so no 
generated ids have to be returned, and the session.commit() should be much 
simpler and faster.

This sounds like a still quite simple, but hopefully much faster, solution. 
Do you agree?

kind regards,
Achim
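A rough sketch of the scheme described above; the model, columns, and the id allocator are all invented for illustration, with a fake allocator standing in for the PostgreSQL sequence:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Item(Base):  # hypothetical mapped class
    __tablename__ = "item"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(50))

engine = sa.create_engine("sqlite://")  # stand-in; the scheme targets PostgreSQL
Base.metadata.create_all(engine)

def allocate_ids(conn, n):
    # On PostgreSQL this would pull a block from the shared sequence, e.g.
    #   SELECT nextval('my_seq') FROM generate_series(1, :n)
    # Here we fake it for the stand-in database.
    start = (conn.execute(sa.select(sa.func.max(Item.id))).scalar() or 0) + 1
    return list(range(start, start + n))

session = Session(engine)
objects = [Item(name=c) for c in "abc"]
session.add_all(objects)

ids = allocate_ids(session.connection(), len(session.new))
for obj, pk in zip(session.new, ids):
    obj.id = pk  # primary key set up front, so the INSERT needn't fetch it back

session.commit()
print(sorted(o.id for o in objects))  # [1, 2, 3]
```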



Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Michael Bayer

On Nov 13, 2013, at 4:57 AM, Achim do...@procoders.net wrote:

 
 [...]


Sure, that should be fine, if you can pre-calc your PKs. It just won't work 
under any kind of concurrency, as in such a situation there could be 
interleaved INSERTs from different processes.






Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Michael Bayer

On Nov 13, 2013, at 11:52 AM, Claudio Freire klaussfre...@gmail.com wrote:

 On Wed, Nov 13, 2013 at 1:45 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:
 
 [...]
 
 
 
 sure that should be fine, if you can pre-calc your PKs.   It just won’t work
 under any kind of concurrency, as in such a situation there could be
 interleaved INSERTs from different processes.
 
 
 Postgresql sequences already handle that kind of concurrency scenario.


How exactly? If two transactions T1 and T2 both pull a number from a sequence, 
T1 gets 40 and T2 gets 41; then we pre-calc 10 inserts that have not yet occurred 
for each, so T1 has 40-50 and T2 has 41-51. The numbers are totally wrong for both - 
in reality it would be some random distribution of 40-60 between T1 and T2.   No?






Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Achim
According to 
http://www.postgresql.org/docs/9.2/static/functions-sequence.html, 
sequences are non-transactional:

Advance the sequence object to its next value and return that value. This 
is done atomically: even if multiple sessions execute nextval concurrently, 
each will safely receive a distinct sequence value.
...
To avoid blocking concurrent transactions that obtain numbers from the 
same sequence, a nextval operation is never rolled back; that is, once a 
value has been fetched it is considered used, even if the transaction that 
did the nextval later aborts. This means that aborted transactions might 
leave unused holes in the sequence of assigned values.



Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Claudio Freire
On Wed, Nov 13, 2013 at 2:04 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 [...]


 how exactly, if two transactions T1 and T2 both pull a number from a 
 sequence, T1 gets 40, T2 gets 41, then we pre-calc 10 inserts that have not 
 yet occurred for each, T1 has 40-50, T2 has 41-51, the number is totally 
 wrong for both - in reality it would be some random distribution of 40-60 
 between T1 and T2.   No ?


No: you ask the same sequence for 10 ids each, and the sequence
allocates 40-49 to T1 and 50-59 to T2.
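For illustration, the single-round-trip block fetch can be rendered with SQLAlchemy; the sequence name 'my_seq' is made up, and this only compiles the SQL rather than executing it against a real database:

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

# One statement that reserves a block of n ids from a shared sequence.
# Concurrent callers each receive a disjoint block, because every
# individual nextval() call is atomic.
n = 10
gs = sa.func.generate_series(1, n).table_valued("i")
stmt = sa.select(sa.func.nextval("my_seq")).select_from(gs)
sql = str(stmt.compile(dialect=postgresql.dialect(),
                       compile_kwargs={"literal_binds": True}))
print(sql)
```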



Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Michael Bayer

On Nov 13, 2013, at 12:22 PM, Claudio Freire klaussfre...@gmail.com wrote:

 [...]
 
 
 No, you ask for 10 ids to the same sequence, and the sequence
 allocates T1 40-49, and T2 50-59

Oh right, I was thinking of the trick as applied to MySQL. Sure, if you 
actually run the sequences, you're fine.




[sqlalchemy] Speed up bulk inserts

2013-11-06 Thread Achim Domma
Hi,

I want to speed up my SQLAlchemy bulk-inserting code, and yes, I'm aware that 
this is not the main purpose of SQLAlchemy and that all databases have faster 
low-level import tools.

The background is the following: we import data from various sources and apply 
various mappings. Currently, I'm willing to trade raw runtime for much simpler 
code, which is much easier to maintain. But I still want my code to run as fast 
as possible under those assumptions.

There are two scenarios which I want to optimize:

1. Flat inserts without relations, but with unique ids generated inside the 
database: in that case, SQLAlchemy retrieves the unique ids from the database, 
but those ids are never used in my import process. I thought about generating 
an insert statement out of an object. Obviously SQLAlchemy has to do that too, 
so there might be some existing tool for that? The other option would be to 
tell SQLAlchemy to ignore the generated id and to not retrieve it from the 
database. Is that possible?
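One possible sketch of that first scenario: generate a Core insert directly from the mapped class's table and skip id retrieval entirely. The class, columns, and engine here are made up for illustration:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):  # hypothetical mapped class
    __tablename__ = "customer"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(50))

engine = sa.create_engine("sqlite://")  # stand-in engine
Base.metadata.create_all(engine)

# The mapped class already carries its Table, so a Core insert can be
# generated from it directly: no ORM unit of work, no ids fetched back.
objects = [Customer(name="a"), Customer(name="b")]
rows = [{"name": o.name} for o in objects]

with engine.begin() as conn:
    conn.execute(Customer.__table__.insert(), rows)
    n = conn.execute(
        sa.select(sa.func.count()).select_from(Customer.__table__)
    ).scalar()
print(n)  # 2
```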

2. Inserts using relations and unique ids generated in the database: I think 
SQLAlchemy is already quite good at optimizing this and sending objects as 
batches. If there are any tweaks and tricks to speed up inserts having 
relations, I would be happy to hear them.

cheers,
Achim



Re: [sqlalchemy] Speed up bulk inserts

2013-11-06 Thread Michael Bayer
I wrote a full post regarding this topic on stackoverflow at  
http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768
 .  If you start with this, I can answer more specific questions.


On Nov 6, 2013, at 10:28 AM, Achim Domma do...@procoders.net wrote:

 [...]


