Hi,
We have an issue on our production systems that we have so far been unable to explain. It appeared after we roughly doubled the amount of data we import daily: on paper the hardware should be comfortably fast enough, but we are seeing odd IO counters while the actual write throughput is very low. The use case is as follows:

- typical DW: relatively constant periodic data loads, i.e. heavy write
- we receive large CSV files (~5-10 GB) every 15 minutes, spread out across 5-7 minutes
- custom ETL scripts process and filter the files in under 30 seconds, down to about 5 GB of CSV ready to load
- 2 loader queues load the files, picking them off one at a time (a sketch of the load pattern follows the list)
- tables are partitioned daily and indexed on a primary key + timestamp
- system is an HP blade: 128 GB RAM, 2x 8-core, 12x 10k RPM RAID 1+0 (database) on the first controller, 2x 15k RPM RAID 1 (xlog) on a different controller
- DB size is ~2.5 TB; a rotating load of 30 days keeps the database size stable
- filesystem: ZFS with lz4 compression
- raw throughput of the database disks is > 700 MB/s sequential and > 150 MB/s random for reads, and roughly half of that for writes, in various benchmarks
- CPU load is minimal while the COPY loads are taking place (i.e. after ETL has finished)
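
For concreteness, the load pattern is roughly the following (table, column and file names are purely illustrative, not our actual schema):

-- illustrative sketch only: one child table per day, constrained on the timestamp,
-- with a primary key plus an index on the timestamp
CREATE TABLE events_20150317 (
    CHECK (event_time >= DATE '2015-03-17' AND event_time < DATE '2015-03-18')
) INHERITS (events);
ALTER TABLE events_20150317 ADD PRIMARY KEY (event_id);
CREATE INDEX ON events_20150317 (event_time);

-- each loader queue then takes the next filtered file and loads it in a single COPY
COPY events_20150317 FROM '/path/to/filtered/20150317_1015.csv' WITH (FORMAT csv);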
The issue is that the system is constantly checkpointing, regardless of the various kernel and Postgres settings we have tried. Having read through most of the history of this list and most of the recommendations on various blogs, we have been unable to find an answer as to why the checkpoints are being written so slowly. Even when we disable all import processes, or drop the index, a checkpoint still takes > 1 hour. The stats point to checkpoint sizes of roughly 7 GB, which should take < 1 min even with fully random I/O. What does not make sense is why, even with imports fully disabled, checkpointing takes well over an hour.
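
To put a number on how slow the writes are, the effective checkpoint write rate can be estimated from pg_stat_bgwriter; a rough sketch, assuming the default 8 kB block size:

select round(buffers_checkpoint * 8 / 1024.0) as checkpoint_mb_written,
       round((checkpoint_write_time / 1000.0)::numeric) as checkpoint_write_sec,
       round((buffers_checkpoint * 8 / 1024.0
              / nullif(checkpoint_write_time / 1000.0, 0))::numeric, 2) as effective_mb_per_sec
from pg_stat_bgwriter;

With the counters pasted below, that works out to roughly 32 GB written in ~26,000 s of checkpoint write phase, i.e. around 1.3 MB/s; the individual checkpoint log entries (e.g. ~6.5 GB in ~2,990 s) tell the same story.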
One other thing we have noticed, though not measured (so mostly anecdotal): for a period of up to an hour after Postgres is restarted, the system performs mostly fine and checkpoints complete in < 5 min. Could it be that after a while some OS or Postgres buffers fill up and cause this issue?
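
A simple way to test that theory would be to watch the kernel writeback backlog between a restart and the point where checkpoints slow down, for example:

# values are in kB; Dirty growing steadily between checkpoints would point at writeback falling behind
watch -n 10 "grep -E '^(Dirty|Writeback):' /proc/meminfo"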
Full iostat/iotop output, configuration, checkpoint stats, etc. are pasted below for completeness. Highlights are:

checkpoint_segments = 512
shared_buffers = 16GB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.1

Regards,
Steve
---Checkpoint stats:
db=# select * from pg_stat_bgwriter;
 checkpoints_timed     | 6
 checkpoints_req       | 3
 checkpoint_write_time | 26346184
 checkpoint_sync_time  | 142
 buffers_checkpoint    | 4227065
 buffers_clean         | 4139841
 maxwritten_clean      | 8261
 buffers_backend       | 9128583
 buffers_backend_fsync | 0
 buffers_alloc         | 9311478
 stats_reset           | 2015-03-17 11:14:21.5649
---postgres log file - checkpoint log entries:
2015-03-17 11:25:25 LOG:  checkpoint complete: wrote 855754 buffers (40.8%); 0 transaction log file(s) added, 0 removed, 500 recycled; write=2988.185 s, sync=0.044 s, total=2988.331 s; sync files=110, longest=0.003 s, average=0.000 s
2015-03-17 11:25:25 LOG:  checkpoint starting: xlog time
2015-03-17 11:59:54 LOG:  parameter "checkpoint_completion_target" changed to "0.9"
2015-03-17 13:30:20 LOG:  checkpoint complete: wrote 1012112 buffers (48.3%); 0 transaction log file(s) added, 0 removed, 512 recycled; write=7494.228 s, sync=0.021 s, total=7494.371 s; sync files=119, longest=0.001 s, average=0.000 s
2015-03-17 13:30:20 LOG:  checkpoint starting: xlog time
2015-03-17 14:21:53 LOG:  parameter "checkpoint_completion_target" changed to "0.1"
2015-03-17 16:00:58 LOG:  checkpoint complete: wrote 1411979 buffers (67.3%); 0 transaction log file(s) added, 696 removed, 900 recycled; write=9036.898 s, sync=0.020 s, total=9038.538 s; sync files=109, longest=0.000 s, average=0.000 s
2015-03-17 16:00:58 LOG:  checkpoint starting: time
2015-03-17 16:28:40 LOG:  checkpoint complete: wrote 345183 buffers (16.5%); 0 transaction log file(s) added, 2001 removed, 0 recycled; write=1660.333 s, sync=0.018 s, total=1661.816 s; sync files=93, longest=0.002 s, average=0.000 s
2015-03-17 17:28:40 LOG:  checkpoint starting: time
2015-03-17 18:54:47 LOG:  checkpoint complete: wrote 602037 buffers (28.7%); 0 transaction log file(s) added, 0 removed, 500 recycled; write=5166.540 s, sync=0.039 s, total=5166.657 s; sync files=122, longest=0.003 s, average=0.000 s
2015-03-17 18:54:47 LOG:  checkpoint starting: xlog time
---iostat -x snapshot:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    2.35   15.09    0.00   82.05

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    5.00     0.00  2056.00   822.40     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00 1055.00  549.00 41166.50 22840.00    79.81     5.28    3.28    4.94    0.10   0.62 100.00
---vmstat 60 output
# vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free    buff  cache     si   so    bi    bo   in   cs us sy id wa
 5  3 877508 1251152  74476 98853728    0    0    87  1891    0    0  1  5 92  2
 6  5 877508  915044  74940 99237840    0    0 46588 41857 6993 41784  8  4 76 12
 2  4 877508 1676008  75292 98577936    0    0 46847 34540 4778 17175  9  3 75 13
---sysctl settings for dirty pages
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 5
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500
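
For context, on a 128 GB box those ratios translate into large absolute thresholds. A rough back-of-the-envelope conversion (the kernel actually applies the ratios to dirtyable memory, so using MemTotal is only an approximation):

# approximate dirty-page thresholds implied by dirty_background_ratio=5 / dirty_ratio=10
awk '/^MemTotal/ {printf "background: %.1f GB, hard limit: %.1f GB\n", $2*0.05/1048576, $2*0.10/1048576}' /proc/meminfo

That is on the order of 6-7 GB of dirty data before background writeback even starts, and roughly double that before writers are throttled, i.e. comparable to the size of a whole checkpoint.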
---# free -m
             total       used       free     shared    buffers     cached
Mem:        128905     126654       2250          0         70      95035
-/+ buffers/cache:      31549      97355
Swap:        15255        856      14399

---postgres settings: 
# cat postgresql.conf | grep check
checkpoint_segments = 512               # in logfile segments, min 1, 16MB each
checkpoint_timeout = 15min              # range 30s-1h
checkpoint_completion_target = 0.1      # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 10min              # 0 disables
log_checkpoints = on

# cat postgresql.conf | egrep -e 'wal|arch|hot|lru|shared'
shared_buffers = 16384MB
bgwriter_lru_maxpages = 500
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multipler on buffers scanned/round
wal_level = 'archive'
archive_mode = on
archive_command = 'cd .'                # we can also use exit 0, anything that
max_wal_senders = 0
wal_keep_segments = 500
hot_standby = off
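
And the same values as seen by the running server, in case anything in the file grep above is misleading:

select name, setting, unit, source
from pg_settings
where name like 'checkpoint%'
   or name like 'bgwriter%'
   or name in ('shared_buffers', 'wal_level', 'archive_mode');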

---iotop snapshot:
Total DISK READ:      41.63 M/s | Total DISK WRITE:      31.43 M/s
  TID  PRIO  USER     DISK READ    DISK WRITE   SWAPIN    IO>     COMMAND
32101 be/4 postgres   10.25 M/s   1085.86 K/s  0.00 %  96.80 %  postgres: checkpointer process
56661 be/4 postgres    6.84 M/s    591.61 K/s  0.00 %  90.91 %  postgres: dbauser db [local] COPY
56751 be/4 postgres    6.97 M/s    838.73 K/s  0.00 %  88.00 %  postgres: dbauser db [local] COPY
56744 be/4 postgres    6.13 M/s    958.55 K/s  0.00 %  85.48 %  postgres: dbauser db [local] COPY
56621 be/4 postgres    6.77 M/s   1288.05 K/s  0.00 %  83.96 %  postgres: dbauser db [local] COPY
32102 be/4 postgres    8.05 M/s   1340.47 K/s  0.00 %  82.47 %  postgres: writer process
 1005 be/0 root        0.00 B/s      0.00 B/s  0.00 %   5.81 %  [txg_sync]
32103 be/4 postgres    0.00 B/s     10.41 M/s  0.00 %   0.52 %  postgres: wal writer process
