Re: [PERFORM] Large insert and delete batches

2012-03-01 Thread Kääriäinen Anssi
Quoting myself:

So, is there some common wisdom about the batch sizes? Or is it better
to do the inserts and deletes in just one batch? I think the case for
performance problems needs to be strong before default limits are
considered for PostgreSQL.


I did a little test of this, to see whether there is any interesting difference
in performance between doing the queries in small batches and doing them in one go.

The test setup is simple: one table with an integer primary key, containing a
million rows. The queries are SELECT * FROM the_table WHERE id =
ANY(ARRAY[list_of_numbers]), and the corresponding DELETE.
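
For reference, a minimal sketch of that setup (the table and column names are
illustrative, not the exact test script):

    -- one table with an integer primary key, filled with a million rows
    CREATE TABLE the_table (id integer PRIMARY KEY);
    INSERT INTO the_table SELECT generate_series(1, 1000000);

    -- the queries under test: the id list is embedded in the query as an array
    SELECT * FROM the_table WHERE id = ANY(ARRAY[1, 2, 3]);   -- ...up to N ids
    DELETE FROM the_table WHERE id = ANY(ARRAY[1, 2, 3]);     -- ...up to N ids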

For any sane number of items in the list, the result is that doing the queries
in smaller batches might be a little faster, but nothing conclusive was found.
However, once you get into millions of items in the list, the query will OOM my
PostgreSQL server. With a million items in the list the process uses around
700MB of memory, with 2 million items 1.4GB, and beyond that it is an OOM
condition. The problem seems to be the array itself, which takes all the memory:
you can OOM the server simply by doing SELECT
ARRAY[large_enough_list_of_numbers].
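
For what it's worth, the "smaller batches" variant can also be expressed
entirely server-side, as in the sketch below. This is purely illustrative (in
the test the id lists were embedded in the query text), but it shows the shape
of a batched DELETE that avoids building one huge array:

    -- illustrative only: delete a million ids in batches of 10,000
    -- instead of passing one huge array
    DO $$
    DECLARE
        batch integer[];
    BEGIN
        FOR i IN 0..99 LOOP
            batch := ARRAY(SELECT generate_series(i * 10000 + 1, (i + 1) * 10000));
            DELETE FROM the_table WHERE id = ANY(batch);
        END LOOP;
    END;
    $$;
    -- note: a DO block runs in a single transaction, so this only avoids
    -- the huge array; it does not give you per-batch commits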

Conclusion: as long as you are not doing anything really stupid, there don't
seem to be any important performance reasons to split bulk queries into smaller
batches.

For inserts the conclusion is similar. A lot of memory is used if you go into
the millions-of-items range, but otherwise it doesn't seem to matter whether
you do many smaller batches or one larger batch.

 - Anssi


Re: [PERFORM] problems with set_config, work_mem, maintenance_work_mem, and sorting

2012-02-28 Thread Kääriäinen Anssi
Quoting Jon Nelson:

The config shows 128MB for work_mem and 2GB for maintenance_work_mem.
Why does PostgreSQL /sometimes/ use the globally-configured values and
sometimes use the values that come from the connection?
Am I wrong in misunderstanding what 'session' variables are? I thought
that session (versus transaction) config items were set for /all/
transactions in a given backend, until changed or until that backend
terminates. Is that not so?


Could it be that the transaction which does the set_config is rolled back? If
so, the set_config is rolled back, too. However, if the transaction commits,
then the set_config should be in effect for the whole session. It seems this is
not documented at all for set_config, only for the SET SQL command.
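
A quick way to see the behaviour (a hypothetical psql session):

    BEGIN;
    SELECT set_config('work_mem', '1GB', false);  -- false = session scope, not transaction-local
    ROLLBACK;
    SHOW work_mem;   -- back to the configured default: the set_config was rolled back too

    BEGIN;
    SELECT set_config('work_mem', '1GB', false);
    COMMIT;
    SHOW work_mem;   -- 1GB, and it stays in effect for the rest of the session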

I think it would be nice to have a way to force the connection to use the
provided settings even if the transaction in which they are set is rolled back,
in a single statement if possible. Otherwise you may be forced to run a
separate transaction just to be sure the SET is actually in effect for the
connection's lifetime.

Django was bitten by this, for example; it is now fixed by using this:
https://github.com/django/django/blob/master/django/db/backends/postgresql_psycopg2/base.py#L189
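
At the SQL level the workaround amounts to committing the settings on their own
before any application work that might roll back. A rough sketch (not the
actual Django code linked above; the values match the config quoted earlier):

    -- run the settings in their own transaction and commit immediately,
    -- so that a later rollback cannot undo them
    BEGIN;
    SET SESSION work_mem = '128MB';
    SET SESSION maintenance_work_mem = '2GB';
    COMMIT;

    -- application transactions follow and are free to roll back
    BEGIN;
    -- ... actual work ...
    ROLLBACK;
    SHOW work_mem;   -- still 128MB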

 - Anssi