Re: [sqlalchemy] How can I use bulk_save_objects when I know the primary key?

2020-03-12 Thread Mike Bayer


On Thu, Mar 12, 2020, at 9:51 PM, James Fennell wrote:
> Hey all! I'm having a problem using the bulk_save_objects function.
> **
> *Background*: I'm working on an application that updates state in a database 
> from an external feed. The process is: I read the feed, convert the data into 
> SQLAlchemy objects which are detached from the session and then call 
> session.merge on each one. The merge can result in an insert or an update, 
> depending on if the entity has been seen before.
> 
> Given the number of objects I'm working with, this has turned out to be very 
> CPU intensive in production. I've profiled it, and most of the work is in the 
> merge operation. So, I want to use bulk_save_objects to speed things up and 
> in my case the tradeoffs (lack of cascading etc.) are 100% worth it.
> 
> *Problem*: I can't get bulk_save_objects to do updates; no matter what, it 
> tries to insert and this results in primary key constraint failures:
> 
>> sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate 
>> key value violates unique constraint "trip_pkey"
> 
> Having read the documentation, I'm pretty sure the problem relates to this 
> comment in the docs:
> 
>> For each object, whether the object is sent as an INSERT or an UPDATE is 
>> dependent on the same rules used by the `Session` 
>> 
>>  in traditional operation; if the object has the `InstanceState.key` 
>> attribute set, then the object is assumed to be “detached” and will result 
>> in an UPDATE. Otherwise, an INSERT is used.
> 
> In all cases I'm not playing with the instance state. I'm essentially 
> manually stamping primary keys on detached objects, so I'm guessing 
> SQLAlchemy thinks it needs to insert? Any suggestions for how I can proceed?

you could try using bulk_update_mappings 
(https://docs.sqlalchemy.org/en/13/orm/session_api.html?highlight=bulk_update_mappings#sqlalchemy.orm.session.Session.bulk_update_mappings)
 so that you don't have to play with object state at all. Otherwise, use 
make_transient_to_detached 
(https://docs.sqlalchemy.org/en/13/orm/session_api.html?highlight=bulk_update_mappings#sqlalchemy.orm.session.make_transient_to_detached)
 to give them a key. you can see that the key is there by using 
inspect(object).key. However, if you're looking for speed, using the objects 
and making keys and such is all overhead that you wouldn't get with 
bulk_update_mappings.



> 
> Thanks!
> James
> 

> --
>  SQLAlchemy - 
>  The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
>  To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
>  --- 
>  You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
>  To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/CALDF6i2Mpw6EQk-q_%2Br2dm3%3Dxz0Mv5Sb_yCeMktq%3DgBf4uQYsA%40mail.gmail.com
>  
> .

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/142980bd-f809-4baa-85d0-81a9e24ef5f2%40www.fastmail.com.


[sqlalchemy] How can I use bulk_save_objects when I know the primary key?

2020-03-12 Thread James Fennell
Hey all! I'm having a problem using the bulk_save_objects function.

*Background*: I'm working on an application that updates state in a
database from an external feed. The process is: I read the feed, convert
the data into SQLAlchemy objects which are detached from the session and
then call session.merge on each one. The merge can result in an insert or
an update, depending on if the entity has been seen before.

Given the number of objects I'm working with, this has turned out to be
very CPU intensive in production. I've profiled it, and most of the work is
in the merge operation. So, I want to use bulk_save_objects to speed things
up and in my case the tradeoffs (lack of cascading etc.) are 100% worth it.

*Problem*: I can't get bulk_save_objects to do updates; no matter what, it
tries to insert and this results in primary key constraint failures:

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate
key value violates unique constraint "trip_pkey"


Having read the documentation, I'm pretty sure the problem relates to this
comment in the docs:

For each object, whether the object is sent as an INSERT or an UPDATE is
dependent on the same rules used by the Session

in
traditional operation; if the object has the InstanceState.key attribute
set, then the object is assumed to be “detached” and will result in an
UPDATE. Otherwise, an INSERT is used.


In all cases I'm not playing with the instance state. I'm essentially
manually stamping primary keys on detached objects, so I'm guessing
SQLAlchemy thinks it needs to insert? Any suggestions for how I can proceed?

Thanks!
James

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/CALDF6i2Mpw6EQk-q_%2Br2dm3%3Dxz0Mv5Sb_yCeMktq%3DgBf4uQYsA%40mail.gmail.com.