Hi all,

while working on the patch decreasing amount of memory consumed by
array_agg [1], I've ran into some strange OOM issues. Reproducing them
using the attached SQL script is rather simple.

  [1] https://commitfest.postgresql.org/action/patch_view?id=1652

At first I thought there's some rare hidden memory leak, but I'm pretty
sure that's not the case. I've even put some explicit counters into
aset.c counting allocated/freed blocks, and it seems to be working just
fine (and matching the context tree perfectly). So no memory leak.

The only thing I can think of is some interaction (fragmentation?) with
the system allocator (I'm using glibc-2.19 and kernel 3.16.1, btw),
making the RSS values even less useful than I thought. Sadly, it seems
to trigger the OOM killer :-(

It's entirely possible that this is a known behavior of the allocator,
and I've been unaware of it. It's also true that the work_mem values are
really excessive, and the actual values would be much lower, which would
make the issues much less serious.

To demonstrate the problem I'll use the attached SQL script - it's a
rather simple script generating sample tables and then executing trivial
array_agg() queries.

The script sets work_mem to two different values:

  work_mem = '1GB'    ==>  this limits the INSERT (generating data)

  work_mem = '1024GB' ==>  bogus value, forcing a hash aggegate
                           (assuming  the hash table fits into memory)

The size of the sample tables (and the amount of memory needed for the
hash aggregate) is determined by the fist parameter set in the script.
With this value

  \set size 10000

I get a OOM crash on the first execution of the SQL script (on a machine
with 8GB of RAM and 512MB shared buffers), but YMMV.

The problem is, that even with a much smaller dataset (say, using size
7500) you'll get an OOM error after several executions of the script.
How many executions are needed seems to be inversely proportional to the
size of the data set.

The "problem" is that the RSS amount is increasing over time for some
reason. For example with the "size = 5000", the memory stats for the
process look like this over the first few minutes:

     VIRT    RES    SHR  %CPU %MEM     TIME+ COMMAND
  5045508 2,818g 187220  51,9 36,6   0:15.39 postgres: INSERT
  5045508 3,600g 214868  62,8 46,8   3:11.58 postgres: INSERT
  5045508 3,771g 214868  50,9 49,0   3:40.03 postgres: INSERT
  5045508 3,840g 214868  48,5 49,9   4:00.71 postgres: INSERT
  5045508 3,978g 214880  51,5 51,7   4:40.73 postgres: INSERT
  5045508 4,107g 214892  53,2 53,4   5:22.04 postgres: INSERT
  5045508 4,297g 215276  53,9 55,9   6:22.63 postgres: INSERT
  ...

Those are rows for the backend process, captured from "top" over time.
How long the backend was running is in the TIME column. Each iteration
takes ~30 seconds, so those lines represent approximately iterations 1,
6, 7, 8, 11, etc.

Notice how the RSS value grows over time, and also notice that this is
the INSERT, restricted by work_mem=1GB. So the memory consumption should
be ~1.5GB, and MemoryContextStats(TopMemoryContext) collected at this
point is consistent with that (see the mem-ctx.log).

And then the value stabilizes at ~4,430g and stops growing. With size
7500 it however takes only ~20 iterations to reach the OOM issue, with a
crash log like this:

[10560.843547] Killed process 15862 (postgres) total-vm:7198260kB,
anon-rss:6494136kB, file-rss:300436kB

So, any ideas what might be the culprit here?

As I said, this is clearly made worse by inappropriately high work_mem
values, but I'm not sure it's completely harmless. Imagine for example
long-running backends, executing complex queries with inaccurate
estimates. That may easily result in using much more memory than the
work_mem limit, and increasing the RSS value over time.

regards
Tomas

Attachment: array-agg.sql
Description: application/sql

TopMemoryContext: 136614192 total in 16678 blocks; 136005936 free (500017 chunks); 608256 used
  TopTransactionContext: 8192 total in 1 blocks; 7904 free (0 chunks); 288 used
  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
  Attopt cache: 8192 total in 1 blocks; 2704 free (0 chunks); 5488 used
  Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  MessageContext: 32768 total in 3 blocks; 9640 free (0 chunks); 23128 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  smgr relation table: 24576 total in 2 blocks; 9808 free (4 chunks); 14768 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1024 total in 1 blocks; 848 free (0 chunks); 176 used
      ExecutorState: 318758960 total in 40 blocks; 5879664 free (8 chunks); 312879296 used
        Table function arguments: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
  CacheMemoryContext: 516096 total in 6 blocks; 23848 free (9 chunks); 492248 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_shdepend_reference_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_class_tblspc_relfilenode_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_seclabel_object_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_description_o_c_o_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_shdepend_depender_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_depend_reference_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_depend_depender_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    EventTriggerCache: 8192 total in 1 blocks; 8160 free (2 chunks); 32 used
      Event Trigger Cache: 8192 total in 1 blocks; 3744 free (0 chunks); 4448 used
    pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_range_rngtypid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_proc_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_tablespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_database_datname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
  WAL record construction: 8192 total in 1 blocks; 6592 free (0 chunks); 1600 used
  PrivateRefCount: 8192 total in 1 blocks; 3744 free (0 chunks); 4448 used
  MdSmgr: 8192 total in 1 blocks; 7680 free (0 chunks); 512 used
  LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  hba parser context: 3072 total in 2 blocks; 80 free (1 chunks); 2992 used
  Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to