I'm running PostgreSQL 9.5.4 in production on a virtual machine with Ubuntu
16.04.1 LTS 64-bit, 32GB of RAM, 461GB of disk space and 4 logical CPUs.
Postgres handles the following workloads:
- many INSERTs for the ETL
- a lot of read and write operations from the main OLTP application
The ETL job is still under development, so I'm launching several sequential
"tries" to get the whole thing working. The ETL procedure consists of a lot of
inserts packed inside transactions. At the moment each transaction consists of
100k inserts, so for a 90 million row table I get 90 million inserts packed
into 900 transactions. I know it's not ideal, but the JDBC driver combined
with Pentaho doesn't seem to pack more than one insert into a statement, so I
get a lot of overhead: I can see a PARSE and BIND for every single INSERT. I
think it's Pentaho that wraps each INSERT in a parameterized query... I hate
Pentaho, but anyway...
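To make the overhead concrete, Pentaho effectively sends one parameterized
statement per row, while what I'd like it to send is a multi-row INSERT (table
and column names below are made up, just for illustration):

  INSERT INTO data_schema.my_table (id, val) VALUES ($1, $2);
  -- one parse/bind/execute round trip per row

versus something like:

  INSERT INTO data_schema.my_table (id, val) VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');
  -- one statement for many rows (or COPY for a whole batch)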
The ETL procedure does the following:
1) DROP SCHEMA IF EXISTS data_schema CASCADE;
2) creates the "data_schema" schema and populates it with tables and rows
using INSERTs as described above;
3) if an error occurs, drops the schema again.
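In SQL terms, each attempt boils down to something like this (the table
definition is just a placeholder; the real schema has many tables):

  DROP SCHEMA IF EXISTS data_schema CASCADE;
  CREATE SCHEMA data_schema;
  CREATE TABLE data_schema.facts (id bigint, payload text);  -- placeholder table

  BEGIN;
  INSERT INTO data_schema.facts VALUES (1, '...');  -- repeated ~100k times per transaction
  COMMIT;
  -- ...roughly 900 such transactions for a 90 million row table
  -- on any error: DROP SCHEMA data_schema CASCADE; and start over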
I'm repeating these steps many times because of some Pentaho errors the team
is still working through. This stresses the WAL, because every failure aborts
the current transaction and is followed by a DROP SCHEMA ... CASCADE.
A few days after we began debugging the ETL process, the disk filled up and
the last ETL job aborted automatically. Note that the DB data directory is
located on the same root disk, at /var/lib/postgresql/9.5/main.
What shocked me was that the Postgres data directory was just 815MB in size
($ du -h /var/lib/postgresql/9.5/main) and pg_xlog was 705MB, but the entire
disk was full ("df -h" reported a disk usage of 100%).
I looked for any Postgres activity and only noticed the checkpoint writer
process writing at low speed (IO usage was about 5%). Also, "SELECT * FROM
pg_stat_activity" returned nothing, and the most shocking part was that
"du -h /" returned 56GB as the total size of all the files on the whole disk!
The same went for "du -ha /", which reports the apparent size.
The total disk size is 461GB, so how is it possible that "df -h" reported the
disk as 100% full while "du -h /" returned just 56GB?
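If it happens again, I suppose I could check whether some process is still
holding deleted files open (space that "df" counts but "du" can't see) with
something like:

  $ sudo lsof +L1

but I didn't do that this time, so it's just a guess on my part.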
After executing:
$ service postgresql stop
$ service postgresql start
the disk space was freed and "df -h" reported a usage of just 16%!
The other questions are:
- How can I prevent the disk from filling up? I'm using the default WAL
  configuration (max_wal_size = 1GB).
- How can I tune Postgres to speed up the INSERTs?
The current configuration is the following:
listen_addresses = 'localhost'
max_connections = 32
shared_buffers = 16GB
work_mem = 128MB
maintenance_work_mem = 512MB
effective_io_concurrency = 10
checkpoint_completion_target = 0.9
cpu_tuple_cost = 0.02
cpu_index_tuple_cost = 0.01
cpu_operator_cost = 0.005
effective_cache_size = 24GB
default_statistics_target = 1000
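Everything not listed above is at its default value, so as far as I can tell
the remaining WAL/checkpoint settings are effectively the 9.5 defaults:
max_wal_size = 1GB
min_wal_size = 80MB
checkpoint_timeout = 5min
wal_level = minimal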
Could some of these parameters be causing this strange behavior?
checkpoint_completion_target, maybe?
Thanks to everyone for the support.
Best regards,
Pietro Pugni