Probably the result of vacuum freeze manual execution will give some more information: Table size: appdbname3=# \dt+ appschemaname.applications List of relations Schema | Name | Type | Owner | Size | Description ---------------+--------------+-------+---------------+----------+------------- appschemaname | applications | table | appschemaname | 10012 MB |
Each index size: tablename | indexname | index_size | unique | --------------+------------------------------------------------------------+------------+--------+ applications | chk_invariant_1_apps2 | 8192 bytes | Y | applications | applications2_pkey | 1645 MB | Y | applications | idx_applications2_agent_id | 2982 MB | N | applications | idx_applications2_context_id | 2708 MB | N | applications | idx_applications2_context_tenant_id | 1560 MB | N | applications | idx_applications2_context_type | 4963 MB | N | applications | idx_applications2_deleted_at | 3264 MB | N | applications | idx_applications2_enabled | 4975 MB | N | applications | idx_applications2_euc_index | 1684 MB | N | applications | idx_applications2_policy_id | 3131 MB | N | applications | idx_applications2_policy_type | 16 GB | N | applications | idx_applications2_root_policy_id | 1574 MB | N | applications | idx_applications2_status | 5391 MB | N | applications | idx_applications2_status_by_policy_fields_context_tenant_i | 13 MB | N | applications | idx_applications2_status_by_policy_fields_tenant_id | 15 MB | N | applications | idx_applications2_tenant_id | 1670 MB | N | applications | idx_applications_created_at | 2855 MB | N | applications | tmp_idx_applications2_updated_at2 | 3505 MB | N | applications | uq_policy_id_context2 | 1398 MB | Y | And vacuum freeze output: appdbname3=# VACUUM (FREEZE, VERBOSE) appschemaname.applications; INFO: aggressively vacuuming "appschemaname.applications" INFO: scanned index "applications2_pkey" to remove 10488 row versions DETAIL: CPU: user: 2.45 s, system: 1.07 s, elapsed: 5.16 s INFO: scanned index "tmp_idx_applications2_updated_at2" to remove 10488 row versions DETAIL: CPU: user: 2.53 s, system: 2.86 s, elapsed: 11.56 s INFO: scanned index "idx_applications2_agent_id" to remove 10488 row versions DETAIL: CPU: user: 3.16 s, system: 2.36 s, elapsed: 11.04 s INFO: scanned index "idx_applications2_context_id" to remove 10488 row versions DETAIL: CPU: user: 2.22 s, system: 1.95 s, elapsed: 8.06 s INFO: scanned index "idx_applications2_context_tenant_id" to remove 10488 row versions DETAIL: CPU: user: 1.86 s, system: 1.16 s, elapsed: 5.10 s INFO: scanned index "idx_applications2_context_type" to remove 10488 row versions DETAIL: CPU: user: 190.09 s, system: 6.48 s, elapsed: 220.88 s INFO: scanned index "idx_applications2_deleted_at" to remove 10488 row versions DETAIL: CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s INFO: scanned index "idx_applications2_enabled" to remove 10488 row versions DETAIL: CPU: user: 78.22 s, system: 4.74 s, elapsed: 96.96 s INFO: scanned index "idx_applications2_policy_id" to remove 10488 row versions DETAIL: CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s INFO: scanned index "idx_applications2_policy_type" to remove 10488 row versions DETAIL: CPU: user: 300.45 s, system: 16.05 s, elapsed: 365.80 s INFO: scanned index "idx_applications2_root_policy_id" to remove 10488 row versions DETAIL: CPU: user: 2.13 s, system: 1.20 s, elapsed: 6.16 s INFO: scanned index "idx_applications2_status" to remove 10488 row versions DETAIL: CPU: user: 153.79 s, system: 6.57 s, elapsed: 185.76 s INFO: scanned index "idx_applications2_status_by_policy_fields_context_tenant_i" to remove 10488 row versions DETAIL: CPU: user: 0.02 s, system: 0.01 s, elapsed: 0.06 s INFO: scanned index "idx_applications2_status_by_policy_fields_tenant_id" to remove 10488 row versions DETAIL: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.05 s INFO: scanned index "idx_applications2_tenant_id" to remove 10488 row versions DETAIL: CPU: user: 2.11 s, system: 1.23 s, elapsed: 6.19 s INFO: scanned index "idx_applications2_euc_index" to remove 10488 row versions DETAIL: CPU: user: 2.05 s, system: 1.27 s, elapsed: 6.05 s INFO: scanned index "chk_invariant_1_apps2" to remove 10488 row versions DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: scanned index "uq_policy_id_context2" to remove 10488 row versions DETAIL: CPU: user: 1.24 s, system: 1.00 s, elapsed: 4.40 s INFO: scanned index "idx_applications_created_at" to remove 10488 row versions DETAIL: CPU: user: 2.32 s, system: 1.97 s, elapsed: 8.46 s INFO: "applications": removed 10488 row versions in 7580 pages DETAIL: CPU: user: 0.86 s, system: 0.08 s, elapsed: 1.03 s INFO: index "applications2_pkey" now contains 17803283 row versions in 210561 pages DETAIL: 0 index row versions were removed. 59 index pages have been deleted, 59 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "tmp_idx_applications2_updated_at2" now contains 17803283 row versions in 448580 pages DETAIL: 0 index row versions were removed. 19542 index pages have been deleted, 19542 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_agent_id" now contains 17803283 row versions in 381741 pages DETAIL: 0 index row versions were removed. 61237 index pages have been deleted, 28239 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_context_id" now contains 17803283 row versions in 346594 pages DETAIL: 0 index row versions were removed. 57308 index pages have been deleted, 24105 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_context_tenant_id" now contains 17803283 row versions in 199743 pages DETAIL: 0 index row versions were removed. 32830 index pages have been deleted, 14767 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_context_type" now contains 17803283 row versions in 635317 pages DETAIL: 0 index row versions were removed. 43331 index pages have been deleted, 41233 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_deleted_at" now contains 17803283 row versions in 417770 pages DETAIL: 0 index row versions were removed. 15695 index pages have been deleted, 15381 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_enabled" now contains 17803283 row versions in 636750 pages DETAIL: 0 index row versions were removed. 26332 index pages have been deleted, 26037 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_policy_id" now contains 17803283 row versions in 400718 pages DETAIL: 0 index row versions were removed. 55007 index pages have been deleted, 29671 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_policy_type" now contains 17803283 row versions in 2084261 pages DETAIL: 7990 index row versions were removed. 56012 index pages have been deleted, 55365 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_root_policy_id" now contains 5015730 row versions in 201441 pages DETAIL: 5807 index row versions were removed. 22964 index pages have been deleted, 17915 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_status" now contains 17803283 row versions in 690063 pages DETAIL: 10488 index row versions were removed. 31401 index pages have been deleted, 31113 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_status_by_policy_fields_context_tenant_i" now contains 155059 row versions in 1707 pages DETAIL: 223 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_status_by_policy_fields_tenant_id" now contains 155059 row versions in 1883 pages DETAIL: 223 index row versions were removed. 1 index pages have been deleted, 1 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_tenant_id" now contains 17803283 row versions in 213798 pages DETAIL: 10488 index row versions were removed. 33323 index pages have been deleted, 16730 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications2_euc_index" now contains 17803283 row versions in 215570 pages DETAIL: 10488 index row versions were removed. 32261 index pages have been deleted, 16643 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "chk_invariant_1_apps2" now contains 0 row versions in 1 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "uq_policy_id_context2" now contains 5015730 row versions in 178953 pages DETAIL: 5807 index row versions were removed. 23793 index pages have been deleted, 13898 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_applications_created_at" now contains 17803283 row versions in 365479 pages DETAIL: 10488 index row versions were removed. 19020 index pages have been deleted, 18938 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: "applications": found 0 removable, 471321 nonremovable row versions in 30772 out of 1278613 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 4177172255 There were 1490846 unused item pointers. Skipped 0 pages due to buffer pins, 1247841 frozen pages. 0 pages are entirely empty. CPU: user: 816.65 s, system: 55.37 s, elapsed: 1031.96 s. INFO: aggressively vacuuming "pg_toast.pg_toast_35599271" INFO: "pg_toast_35599271": found 0 removable, 2 nonremovable row versions in 1 out of 2328 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 4178183844 There were 2 unused item pointers. Skipped 0 pages due to buffer pins, 2327 frozen pages. 0 pages are entirely empty. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM What catches my eye: scanning indexes smaller than 3.1GB is fast, larger ones are slow. For example: idx_applications2_policy_id is 3131 MB took just 5 seconds (DETAIL: CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s) but idx_applications2_deleted_at with 3264 MB took 1 minute 22 seconds (DETAIL: CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s) In addition to this, I collected statistics during the idx_applications2_enabled scan: # strace -c -p 26792 strace: Process 26792 attached ^Cstrace: Process 26792 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 90.87 2.931694 4 711882 read 4.61 0.148573 3 55598 19503 futex 3.01 0.097138 1 70933 lseek 0.87 0.027926 7 4261 write 0.44 0.014348 23 623 fdatasync 0.20 0.006372 5 1245 kill 0.00 0.000081 5 17 open 0.00 0.000042 4 12 select 0.00 0.000038 3 12 close 0.00 0.000000 0 1 rt_sigreturn 0.00 0.000000 0 1 sendto 0.00 0.000000 0 2 getrusage ------ ----------- ----------- --------- --------- ---------------- -- Mikhael On Mon, 20 Feb 2023 at 13:29, Mikhail Balayan <mv.bala...@gmail.com> wrote: > >> Can you run amcheck's bt_index_check() routine against some of the >> >> indexes you've shown? There is perhaps some chance that index >> >> corruption exists and causes VACUUM to take a very long time to delete >> >> index pages. This is pretty much a wild guess, though. > > > Unfortunately I can't, we haven't enabled this extension yet. And since > this is a production, I'm not ready to turn it on right away. But I can say > that this theory is unlikely, since this problem occurs on different > sites. Here's an example of an output from another site where table size is > 46.5 GB (again, data only), indexes 107GB, toast 62MB: > > automatic aggressive vacuum of table > "appdbname2.appschemaname.applications": index scans: 1 > pages: 0 removed, 6091646 remain, 0 skipped due to pins, 6086395 skipped > frozen > tuples: 2344 removed, 35295654 remain, 0 are dead but not yet removable, > oldest xmin: 213412878 > buffer usage: 251980554 hits, 14462331 misses, 18844 dirtied > avg read rate: 12.018 MB/s, avg write rate: 0.016 MB/s > system usage: CPU: user: 7734.43 s, system: 178.98 s, elapsed: 9401.36 s > > > Here again we see that there are 5251 blocks that need to be cleaned > (6091646 - 6086395), buffer usage is 266461729 blocks or ~ 2 TB and > processing time is 2.5h+. > > >> It's possible that VACUUM had to wait a long time for a cleanup lock >> on one individual heap page here > > If such a scenario is possible, it makes sense to add information about > the blocking waiting time to the output. Something like: > system usage: CPU: user: 7734.43 s, system: 178.98 s, lock_wait: > 1234.56 s, elapsed: 9401.36 s > > > Mikhail > > On Sat, 18 Feb 2023 at 05:35, Peter Geoghegan <p...@bowt.ie> wrote: > >> On Thu, Feb 16, 2023 at 5:40 PM Mikhail Balayan <mv.bala...@gmail.com> >> wrote: >> >> >> Do you have any non-btree indexes on the table? Can you show us the >> details of the >> >> >> table, including all of its indexes? In other words, can you show >> "\d applications" output from psql? >> > >> > Only btree indexes. Please find the full table schema below: >> >> It's possible that VACUUM had to wait a long time for a cleanup lock >> on one individual heap page here, which could have added a long delay. >> But...that doesn't seem particularly likely. >> >> Can you run amcheck's bt_index_check() routine against some of the >> indexes you've shown? There is perhaps some chance that index >> corruption exists and causes VACUUM to take a very long time to delete >> index pages. This is pretty much a wild guess, though. >> >> -- >> Peter Geoghegan >> >