For the past few days, we've been seeing unexpected high CPU spikes in our system. We observed the following:
- every single CPU spike was preceded by low 'free' memory even though 'cached' is quite high - as soon as we shut down any of our applications which is occupying some DB connections (e.g., pgpool), the 'free' memory usage goes up and CPU load immediately drops (please see below) - we saw instances when the ‘free’ memory did reach low values but CPU remained OK I understand how running out of memory could cause various issues with the DB, but in this case, we had plenty of memory in the ‘cached’ portion. Why would CPU load go up when there's still plenty of room in the 'cached' memory? Here's the session: 04:58:37 up 31 days, 23:41, 0 users, load average: 2.37, 1.91, 1.68 total used free shared buffers cached Mem: 31720 31188 532 0 90 22852 (…) 05:00:37 up 31 days, 23:43, 1 user, load average: 5.51, 2.66, 1.95 total used free shared buffers cached Mem: 31720 31452 268 0 77 22267 (…) 05:00:58 up 31 days, 23:44, 1 user, load average: 21.44, 6.52, 3.24 total used free shared buffers cached Mem: 31720 31482 237 0 77 21704 (…) 05:01:18 up 31 days, 23:44, 1 user, load average: 42.98, 12.36, 5.22 total used free shared buffers cached Mem: 31720 31477 243 0 77 21061 (…) 05:01:38 up 31 days, 23:44, 1 user, load average: 63.38, 18.99, 7.56 total used free shared buffers cached Mem: 31720 31454 266 0 77 20410 (…) 05:03:20 up 31 days, 23:46, 1 user, load average: 110.10, 47.85, 19.07 total used free shared buffers cached Mem: 31720 31326 394 0 76 19290 At this point, pgpool and apache were shut down: 05:03:40 up 31 days, 23:46, 1 user, load average: 113.51, 52.66, 21.26 total used free shared buffers cached Mem: 31720 29835 1885 0 76 19291 (…) 05:04:00 up 31 days, 23:47, 1 user, load average: 82.49, 49.53, 20.90 total used free shared buffers cached Mem: 31720 26082 5638 0 76 19300 (…) 05:04:20 up 31 days, 23:47, 1 user, load average: 60.37, 46.62, 20.56 total used free shared buffers cached Mem: 31720 24701 7019 0 76 19311 (…) 05:04:40 up 31 days, 23:47, 1 user, load average: 43.63, 43.70, 20.15 total used free shared buffers cached Mem: 31720 24797 6923 0 76 19320 (…) 05:05:00 up 31 days, 23:48, 1 user, load average: 31.70, 40.96, 19.75 total used free shared buffers cached Mem: 31720 24947 6773 0 76 19326 (…) 05:05:20 up 31 days, 23:48, 1 user, load average: 23.12, 38.41, 19.36 total used free shared buffers cached Mem: 31720 25036 6684 0 76 19334 (…) 05:05:40 up 31 days, 23:48, 1 user, load average: 17.12, 36.05, 18.99 total used free shared buffers cached Mem: 31720 25197 6523 0 76 19340 (…) 05:06:00 up 31 days, 23:49, 1 user, load average: 12.84, 33.84, 18.63 total used free shared buffers cached Mem: 31720 25316 6404 0 76 19367 (…) 05:06:20 up 31 days, 23:49, 1 user, load average: 9.85, 31.80, 18.28 total used free shared buffers cached Mem: 31720 24728 6992 0 76 18839 (…) 05:06:40 up 31 days, 23:49, 1 user, load average: 7.61, 29.86, 17.93 total used free shared buffers cached Mem: 31720 24835 6885 0 76 18847 (…) 05:07:00 up 31 days, 23:50, 1 user, load average: 5.74, 27.99, 17.57 total used free shared buffers cached Mem: 31720 24971 6749 0 76 18852 (…) 05:07:20 up 31 days, 23:50, 1 user, load average: 4.48, 26.26, 17.22 total used free shared buffers cached Mem: 31720 25133 6587 0 76 18861 (…) 05:07:40 up 31 days, 23:50, 2 users, load average: 3.83, 24.70, 16.90 total used free shared buffers cached Mem: 31720 25351 6369 0 76 18872 (…) 05:08:00 up 31 days, 23:51, 2 users, load average: 3.10, 23.18, 16.56 total used free shared buffers cached Mem: 31720 25334 6385 0 76 18879 (…) 05:08:20 up 31 days, 23:51, 2 users, load average: 2.52, 21.75, 16.23 total used free shared buffers cached Mem: 31720 25362 6358 0 77 18884 Here are the pertinent machine and OS and Postgres details: PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit Linux ps2db 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux postgres=# SELECT name, current_setting(name), source postgres-# FROM pg_settings postgres-# WHERE source NOT IN ('default', 'override'); name | current_setting | source ------------------------------+-------------------------------+---------------------- application_name | psql | client archive_command | /bin/true | configuration file archive_mode | on | configuration file autovacuum_analyze_threshold | 50 | configuration file autovacuum_freeze_max_age | 800000000 | configuration file autovacuum_naptime | 5min | configuration file autovacuum_vacuum_threshold | 50 | configuration file bytea_output | escape | configuration file checkpoint_completion_target | 0.7 | configuration file checkpoint_segments | 128 | configuration file checkpoint_timeout | 15min | configuration file checkpoint_warning | 30s | configuration file client_encoding | UTF8 | client constraint_exclusion | partition | configuration file cpu_index_tuple_cost | 0.005 | configuration file cpu_operator_cost | 0.0025 | configuration file cpu_tuple_cost | 0.01 | configuration file custom_variable_classes | pg_stat_statements | configuration file DateStyle | ISO, MDY | configuration file default_statistics_target | 100 | configuration file default_text_search_config | pg_catalog.english | configuration file effective_cache_size | 16GB | configuration file effective_io_concurrency | 1 | configuration file enable_material | off | configuration file escape_string_warning | on | configuration file hot_standby | on | configuration file lc_messages | C | configuration file lc_monetary | en_US.UTF-8 | configuration file lc_numeric | en_US.UTF-8 | configuration file lc_time | en_US.UTF-8 | configuration file listen_addresses | * | configuration file log_autovacuum_min_duration | 0 | configuration file log_checkpoints | on | configuration file log_connections | on | configuration file log_destination | csvlog | configuration file log_directory | pg_log | configuration file log_disconnections | on | configuration file log_filename | postgresql.log.ps2db.%H | configuration file log_line_prefix | %t [%d] [%u] [%p]: [%l-1] %h | configuration file log_lock_waits | on | configuration file log_min_duration_statement | 0 | configuration file log_rotation_age | 1h | configuration file log_temp_files | 0 | configuration file log_timezone | Canada/Pacific | environment variable log_truncate_on_rotation | on | configuration file logging_collector | on | configuration file maintenance_work_mem | 1GB | configuration file max_connections | 500 | configuration file max_locks_per_transaction | 512 | configuration file max_stack_depth | 2MB | environment variable max_standby_streaming_delay | 90min | configuration file max_wal_senders | 6 | configuration file pg_stat_statements.max | 10000 | configuration file pg_stat_statements.track | all | configuration file port | 5432 | configuration file random_page_cost | 4 | configuration file shared_buffers | 6GB | configuration file shared_preload_libraries | pg_stat_statements | configuration file standard_conforming_strings | off | configuration file stats_temp_directory | /ram_postgres_stats | configuration file temp_buffers | 16MB | configuration file TimeZone | Canada/Pacific | environment variable wal_keep_segments | 64 | configuration file wal_level | hot_standby | configuration file work_mem | 8MB | configuration file (65 rows) RAM: 32 gigs CPU: 24 cores; Intel(R) Xeon(R) CPU X7460 @ 2.66GHz RAID 10