Re: [sqlalchemy] Large RAM usage in bulk_insert_mappings

2019-11-15 Thread Jonathan Vanasco


On Friday, November 15, 2019 at 4:03:44 AM UTC-5, Elmer de Looff wrote: 
>
> That said, I've run into the same problem with a little toy project, which 
> works around this with a 'bulk save' interface. 
>

FWIW, on something related... In my experience, if you need to focus on 
speed for bulk inserts / migrations, one of the fastest methods I've used 
is to have several Python scripts working in parallel. You can use a shared 
resource like a sqlite3 file or Redis to coordinate the workers. Usually 
I'll have workers "claim" a range of 10k items with Redis (if we're doing 
a migration, then I'll use the Windowed Range Query recipe - 
https://github.com/sqlalchemy/sqlalchemy/wiki/WindowedRangeQuery ).

It takes a little bit of extra work to determine the right number of 
workers to spin up, but parallel workers will usually make a migration or 
bulk insert task 5-15x faster for me.
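The claim-a-range idea can be sketched with the stdlib sqlite3 module standing
in for Redis (all names below are hypothetical, and with Redis a single INCRBY
on a counter key would do the same job):

```python
import sqlite3

CHUNK = 10_000  # each worker claims this many ids at a time


def claim_range(db_path, total):
    """Atomically claim the next unclaimed id range, or None when done.

    A single-row counter table in a shared sqlite3 file ensures parallel
    worker processes never grab overlapping ranges.
    """
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS cursor (next_id INTEGER)")
        # Seed the counter row exactly once.
        conn.execute(
            "INSERT INTO cursor SELECT 0 "
            "WHERE NOT EXISTS (SELECT 1 FROM cursor)"
        )
        conn.commit()
        # BEGIN IMMEDIATE takes a write lock, so read + advance is atomic.
        conn.execute("BEGIN IMMEDIATE")
        start = conn.execute("SELECT next_id FROM cursor").fetchone()[0]
        if start >= total:
            conn.commit()
            return None
        end = min(start + CHUNK, total)
        conn.execute("UPDATE cursor SET next_id = ?", (end,))
        conn.commit()
        return start, end
    finally:
        conn.close()
```

Each worker loops on claim_range() until it returns None, processing the rows
whose ids fall in the claimed range.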

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/b6b20b34-eb55-416c-8a36-d1aa94ef82aa%40googlegroups.com.


Re: [sqlalchemy] Large RAM usage in bulk_insert_mappings

2019-11-15 Thread Elmer de Looff
I'm not even sure the problem is with the batch insert function itself;
creating half a million dicts in Python is going to cause you a bit of a
bad time on its own. That said, I've run into the same problem with a
little toy project, which works around this with a 'bulk save' interface.
With a minimal change you get to create the object/mapping to insert at the
time you need it, push it into the bulk saver, and it will flush every time
it hits a threshold. The right threshold will depend a bit on your database
and driver, and I imagine connection latency plays an important part as
well.

Anyway, use what you can:
https://github.com/edelooff/smallville/blob/master/scripts/seed.py#L27
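The flush-at-a-threshold idea can be sketched roughly like this (a simplified
stand-in for the linked script, not its actual code; the flush callable is
where session.bulk_insert_mappings would go):

```python
class BulkSaver:
    """Accumulate mappings and flush them whenever `threshold` is reached.

    `flush` is any callable taking a list of dicts; with SQLAlchemy it
    could be:  lambda rows: session.bulk_insert_mappings(Model, rows)
    """

    def __init__(self, flush, threshold=1000):
        self.flush = flush
        self.threshold = threshold
        self.pending = []

    def add(self, mapping):
        self.pending.append(mapping)
        if len(self.pending) >= self.threshold:
            self.drain()

    def drain(self):
        # Flush whatever is buffered; also called once at the very end
        # to push out the final partial batch.
        if self.pending:
            self.flush(self.pending)
            self.pending = []
```

Remember to call drain() once after the loop so the last partial batch isn't
silently dropped.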


Re: [sqlalchemy] Large RAM usage in bulk_insert_mappings

2019-11-14 Thread James Fennell
Because the memory spike was so bad (the application usually runs at 250 MB
of RAM, and it went up to a GB during this process), I was able to find the
problem by running htop and using print statements to discover where in the
execution the Python code was when the RAM spike happened.

I unfortunately don't have any advice on actual good tools for tracking RAM
usage in Python programs, but would love to hear if others do.






Re: [sqlalchemy] Large RAM usage in bulk_insert_mappings

2019-11-14 Thread Soumaya Mauthoor
What did you use to profile memory usage? I've recently been investigating
memory usage when loading data using memory_profiler, and would be
interested to find out about the best approach.
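One stdlib alternative to memory_profiler is tracemalloc, which can report the
peak Python allocation during a block of code; a minimal sketch (the helper
name is hypothetical):

```python
import tracemalloc


def peak_memory(fn, *args):
    """Run fn(*args) and return (result, peak bytes allocated meanwhile)."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Measure the cost of building a large list of dicts, as in this thread.
rows, peak = peak_memory(
    lambda: [{"id": i, "name": f"row {i}"} for i in range(100_000)]
)
print(f"built {len(rows)} dicts, peak ~{peak / 2**20:.1f} MiB")
```

Unlike htop, this only sees allocations made through Python's allocator, so
memory held by C extensions or database drivers won't show up.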




[sqlalchemy] Large RAM usage in bulk_insert_mappings

2019-11-14 Thread James Fennell
Hi all,

Just sharing some perf insights into the bulk operation function
bulk_insert_mappings.

I was recently debugging a SQLAlchemy-powered web app that was crashing
due to out-of-memory issues on a small Kubernetes node. It turned out to be
"caused" by an over-optimistic invocation of bulk_insert_mappings.
Basically, I'm reading a CSV file with ~500,000 entries into a list of
dictionaries, and then passing it into the bulk_insert_mappings function all
at once. It turned out the SQLAlchemy work was using 750 MB of RAM, which
was enough to OOM the small node the web app was running on.

A simple workaround is to split the list of 500,000 entries into chunks of
1,000 entries each, and then call bulk_insert_mappings on each chunk. When I
do this, the extra memory usage is not even noticeable. But also, it seems
that this chunked approach is actually faster! I might benchmark it to
quantify that.
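The chunked approach can be sketched like this (Model and session are assumed
names, and the helper is illustrative rather than the app's actual code):

```python
def chunks(rows, size=1000):
    """Yield successive slices of `rows`, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# With a real Session and a mapped Model, the insert loop would look like:
#
# for chunk in chunks(mappings, 1000):
#     session.bulk_insert_mappings(Model, chunk)
# session.commit()
```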

Thought it was interesting. I wonder whether it would be worth adding to the
docs on bulk_insert_mappings? Given that the function is motivated by
performance, it seems it might be relevant.

James
