Re: [sqlalchemy] Speed up bulk inserts
Sure, that feature is here: http://docs.sqlalchemy.org/en/rel_0_9/core/dml.html?highlight=insert%20values#sqlalchemy.sql.expression.Insert.values.params.*args

On Aug 18, 2014, at 7:32 PM, Anton anschat...@gmail.com wrote:

Hi Michael, I checked your post on Stack Overflow; could you help me figure something out regarding the usage of Core? In the example you provided, the test_sqlite3 version uses a custom user-supplied query, so it issues only one INSERT statement with all rows in the VALUES clause. I wonder if this can be achieved using Core. I tried this approach:

    session.connection().execute(mytable.insert(), raw_list)

where raw_list is a list of dictionaries to insert, and this issues a separate statement for every row, which is probably expected. But then I came across this docs page: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements. In the example there with a list of dictionaries, a single INSERT statement is issued with a multi-row VALUES clause. I wonder if I can achieve the same behavior. I am using PostgreSQL 9.3.4 and SQLAlchemy==0.9.4. Thanks! Best, Anton.
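To make the distinction concrete: passing the list of dictionaries as a separate argument to execute() uses the DBAPI's executemany(), one statement executed per parameter set, while passing the list to Insert.values() renders a single INSERT with a multi-row VALUES clause. A minimal sketch with a made-up table (echo=True shows the difference in emitted SQL):

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

    # Hypothetical table, for illustration only.
    engine = create_engine("sqlite://", echo=True)
    metadata = MetaData()
    mytable = Table(
        "mytable", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )
    metadata.create_all(engine)

    raw_list = [{"name": "a"}, {"name": "b"}, {"name": "c"}]

    with engine.connect() as conn:
        # executemany(): the INSERT is compiled once and executed
        # once per dictionary in raw_list.
        conn.execute(mytable.insert(), raw_list)

        # Multi-row VALUES: one INSERT ... VALUES (...), (...), (...)
        # statement, as described at the docs link above.
        conn.execute(mytable.insert().values(raw_list))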
Re: [sqlalchemy] Speed up bulk inserts
Hi Michael, I checked your post on Stack Overflow; could you help me figure something out regarding the usage of Core? In the example you provided, the test_sqlite3 version uses a custom user-supplied query, so it issues only one INSERT statement with all rows in the VALUES clause. I wonder if this can be achieved using Core. I tried this approach:

    session.connection().execute(mytable.insert(), raw_list)

where raw_list is a list of dictionaries to insert, and this issues a separate statement for every row, which is probably expected. But then I came across this docs page: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements. In the example there with a list of dictionaries, a single INSERT statement is issued with a multi-row VALUES clause. I wonder if I can achieve the same behavior. I am using PostgreSQL 9.3.4 and SQLAlchemy==0.9.4. Thanks! Best, Anton.

On Wednesday, November 6, 2013 12:58:53 PM UTC-8, Michael Bayer wrote:

I wrote a full post regarding this topic on Stack Overflow at http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768 . If you start with this, I can answer more specific questions.
Re: [sqlalchemy] Speed up bulk inserts
On Wednesday, November 6, 2013 at 9:58:53 PM UTC+1, Michael Bayer wrote:

I wrote a full post regarding this topic on Stack Overflow at http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768 . If you start with this, I can answer more specific questions.

The article was very helpful, thanks. I still want to figure out the best balance between convenience and speed for my use case. Does the following make sense, and is it possible? I work only with PostgreSQL, and I'm sure that all involved objects have a unique id column called 'id'. So before doing a session.commit(), I could check how many objects are in my session. As I'm just bulk inserting, I know that all of them are new and don't have their ids set yet. Now I ask the database for that number of new ids, iterate over the objects in my session, and set the ids. Internally all ids would come from a single sequence, so I don't have to care about object types and so on. Afterwards SQLAlchemy should be aware that the ids have already been set, so no generated ids have to be returned, and the session.commit() should be much simpler and faster. It sounds like a still quite simple, but hopefully much faster, solution. Do you agree?

kind regards,
Achim
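A rough sketch of the bookkeeping Achim describes, with all names invented (the shared 'id' column is his stated convention, not a SQLAlchemy requirement); it assumes the block of fresh sequence values has already been fetched in a single round trip, as sketched after Claudio's message further down:

    def preassign_ids(session, id_block):
        # Assign pre-fetched sequence values to every object pending
        # INSERT, so the flush does not need to fetch generated ids
        # back from the database.
        ids = iter(id_block)
        for obj in session.new:          # new objects awaiting flush
            if obj.id is None:
                obj.id = next(ids)

    # Usage sketch: fetch len(session.new) fresh values, assign, commit.
    # preassign_ids(session, fetch_id_block(conn, len(session.new)))
    # session.commit()

With primary keys pre-assigned, the unit of work should be able to skip fetching each new id back, which is the saving being proposed; whether that is actually faster for a given workload is something to measure.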
Re: [sqlalchemy] Speed up bulk inserts
On Nov 13, 2013, at 4:57 AM, Achim do...@procoders.net wrote:

Afterwards SQLAlchemy should be aware that the ids have already been set, so no generated ids have to be returned, and the session.commit() should be much simpler and faster. It sounds like a still quite simple, but hopefully much faster, solution. Do you agree?

Sure, that should be fine, if you can pre-calc your PKs. It just won't work under any kind of concurrency, as in such a situation there could be interleaved INSERTs from different processes.
Re: [sqlalchemy] Speed up bulk inserts
On Nov 13, 2013, at 11:52 AM, Claudio Freire klaussfre...@gmail.com wrote:

PostgreSQL sequences already handle that kind of concurrency scenario.

How exactly? If two transactions T1 and T2 each pull a number from a sequence, T1 gets 40 and T2 gets 41, and then we pre-calc 10 inserts that have not yet occurred for each, T1 has 40-50 and T2 has 41-51; the numbering is totally wrong for both. In reality it would be some random distribution of 40-60 between T1 and T2, no?
Re: [sqlalchemy] Speed up bulk inserts
According to http://www.postgresql.org/docs/9.2/static/functions-sequence.html, sequences are non-transactional:

Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value. ... To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused holes in the sequence of assigned values.
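A small hypothetical demonstration of the quoted semantics (sequence name and connection URL are invented; the same can be shown in plain psql):

    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://scott:tiger@localhost/test")

    with engine.begin() as conn:
        conn.execute(text("CREATE SEQUENCE demo_seq"))

    conn = engine.connect()
    trans = conn.begin()
    conn.execute(text("SELECT nextval('demo_seq')")).scalar()   # -> 1
    trans.rollback()   # aborts the transaction, but not the nextval()

    # The value 1 is never reissued; a hole is left behind.
    print(conn.execute(text("SELECT nextval('demo_seq')")).scalar())  # -> 2
    conn.close()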
Re: [sqlalchemy] Speed up bulk inserts
On Wed, Nov 13, 2013 at 2:04 PM, Michael Bayer mike...@zzzcomputing.com wrote:

How exactly? If two transactions T1 and T2 each pull a number from a sequence ... in reality it would be some random distribution of 40-60 between T1 and T2, no?

No: you ask the same sequence for 10 ids up front, and the sequence allocates 40-49 to T1 and 50-59 to T2.
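A hypothetical sketch of that batch request (sequence name invented). One hedge worth adding: the n values returned by a single such statement are guaranteed distinct across concurrent sessions, but not necessarily one contiguous run, since each nextval() call is individually atomic; for pre-assigning primary keys, distinctness is what matters.

    from sqlalchemy import text

    def fetch_id_block(conn, n, seq_name="my_id_seq"):
        # One round trip: n nextval() calls in a single SELECT.
        result = conn.execute(
            text("SELECT nextval(:seq) FROM generate_series(1, :n)"),
            {"seq": seq_name, "n": n},
        )
        return [row[0] for row in result]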
Re: [sqlalchemy] Speed up bulk inserts
On Nov 13, 2013, at 12:22 PM, Claudio Freire klaussfre...@gmail.com wrote:

No: you ask the same sequence for 10 ids up front, and the sequence allocates 40-49 to T1 and 50-59 to T2.

Oh right, I was thinking of the trick as applied to MySQL. Sure, if you actually run the sequences, you're fine.
[sqlalchemy] Speed up bulk inserts
Hi,

I want to speed up my SQLAlchemy bulk-inserting code, and yes, I'm aware that this is not the main purpose of SQLAlchemy and that all databases have faster low-level import tools. The background is the following: we import data from various sources and apply various mappings. Currently I'm willing to trade raw runtime for much simpler code, which is much easier to maintain. But I still want my code to run as fast as possible under those assumptions. There are two scenarios I want to optimize:

1. Flat inserts without relations, but with unique ids generated inside the database: in that case, SQLAlchemy retrieves the unique ids from the database, but those ids are never used in my import process. I thought about generating an insert statement out of an object; obviously SQLAlchemy has to do that too, so there might be some existing tool for that? The other option would be to tell SQLAlchemy to ignore the generated id and to not retrieve it from the database. Is that possible?

2. Inserts using relations and unique ids generated in the database: I think SQLAlchemy is already quite good at optimizing this and sending objects as batches. If there are any tweaks and tricks to speed up inserts having relations, I would be happy to hear them.

cheers,
Achim
Re: [sqlalchemy] Speed up bulk inserts
I wrote a full post regarding this topic on Stack Overflow at http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768 . If you start with this, I can answer more specific questions.

On Nov 6, 2013, at 10:28 AM, Achim Domma do...@procoders.net wrote:

Hi, I want to speed up my SQLAlchemy bulk-inserting code ...
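The heart of that post is that the ORM's unit of work pays per-object overhead, while a Core insert hands the entire parameter list to the DBAPI in a single executemany() call and never fetches the generated ids back. A minimal sketch in the spirit of the post, using 0.9-era idioms and an invented model:

    import time
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Customer(Base):
        __tablename__ = "customer"
        id = Column(Integer, primary_key=True)
        name = Column(String(255))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    # Core bulk insert: one executemany() over the whole list.
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
        [{"name": "customer %d" % i} for i in range(10000)],
    )
    print("Core insert of 10000 rows: %.3fs" % (time.time() - t0))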