Re: autovac issue with large number of tables

On Wed, Dec 9, 2020 at 12:01 AM Fujii Masao wrote:
[...]
>>> I wonder if we could have table_recheck_autovac do two probes of the
>>> stats data. First probe the existing stats data, and if it shows the
>>> table to be already vacuumed, return immediately. If not, *then* force
>>> a stats re-read, and check a second time.
>>
>> Does the above mean that the second and subsequent calls of
>> table_recheck_autovac() will be improved to check first using the
>> previously refreshed statistics? I think that certainly works.
>>
>> If that's correct, I'll try to create a patch for the PoC.
>
> I still don't know how to reproduce Jim's troubles, but I was able to
> reproduce what was probably a very similar problem.
>
> This problem seems more likely to occur when you have a large number of
> tables, i.e., a large amount of stats data, and many small tables need
> VACUUM at the same time.
>
> So I followed Tom's advice and created a patch for the PoC.
> The patch adds a flag to table_recheck_autovac() so that the existing
> stats are used on the next check if, after the stats were refreshed,
> VACUUM (or ANALYZE) had already been done by another worker. If tables
> still require VACUUM after the refresh, a new refresh is performed
> instead of reusing the existing statistics.
>
> I ran a simple test with HEAD and HEAD + this PoC patch. The tests were
> conducted for two cases. (I changed a few configuration settings; see
> the attached scripts.)
>
> 1. Normal VACUUM case
>    - SET autovacuum = off
>    - CREATE tables with 100 rows each
>    - DELETE 90 rows from each table
>    - SET autovacuum = on and restart PostgreSQL
>    - Measure the time it takes for all tables to be VACUUMed
>
> 2. Anti-wraparound VACUUM case
>    - CREATE blank tables
>    - SELECT from all of these tables (to generate stats)
>    - SET autovacuum_freeze_max_age to a low value and restart PostgreSQL
>    - Consume a lot of XIDs by using txid_current()
>    - Measure the time it takes for all tables to be VACUUMed
>
> For each test case, the following results were obtained by changing
> autovacuum_max_workers to 1, 2, 3 (default), 5, and 10, and the number
> of tables to 1000, 5000, 10000, and 20000.
>
> Due to the poor VM environment (2 vCPU / 4 GB), the results are a
> little unstable, but I think they are enough to show a trend.
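The two-probe recheck described above can be sketched as follows. This is a minimal Python model, not the actual C implementation in autovacuum.c; `refresh_stats`, the stats dict, and the one-element flag list are all hypothetical stand-ins for the real pgstat machinery:

```python
# Minimal model of the PoC's two-probe recheck.  All names are
# illustrative stand-ins, not actual PostgreSQL symbols.

def refresh_stats(stats_cache, relid):
    # Stand-in for the expensive stats-file re-read; here it simply
    # returns the current entry.
    return stats_cache.get(relid)

def table_recheck_autovac(relid, stats_cache, reuse_stats):
    """Return True if relid still needs VACUUM/ANALYZE.

    reuse_stats is a one-element list standing in for the worker-local
    flag the PoC adds.
    """
    if reuse_stats[0]:
        # First probe: answer from the stats we already have.
        entry = stats_cache.get(relid)
        if entry is not None and not entry["needs_vacuum"]:
            return False  # another worker already did the work
        # Existing stats still say "needs vacuum": stop trusting them.
        reuse_stats[0] = False

    # Second probe: force a stats re-read (the expensive step when the
    # stats file is large) and check again.
    entry = refresh_stats(stats_cache, relid)
    if entry is None or not entry["needs_vacuum"]:
        # Fresh stats show the work was already done, so the next
        # recheck may start from the existing stats.
        reuse_stats[0] = True
        return False
    return True
```

With many workers racing over the same worklist, most rechecks are answered by the first probe and skip the re-read entirely.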
> ===
> [1. Normal VACUUM case]
>
> tables:1000
>  autovacuum_max_workers 1:  (HEAD) 20 sec VS (with patch) 20 sec
>  autovacuum_max_workers 2:  (HEAD) 18 sec VS (with patch) 16 sec
>  autovacuum_max_workers 3:  (HEAD) 18 sec VS (with patch) 16 sec
>  autovacuum_max_workers 5:  (HEAD) 19 sec VS (with patch) 17 sec
>  autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec
>
> tables:5000
>  autovacuum_max_workers 1:  (HEAD) 77 sec VS (with patch) 78 sec
>  autovacuum_max_workers 2:  (HEAD) 61 sec VS (with patch) 43 sec
>  autovacuum_max_workers 3:  (HEAD) 38 sec VS (with patch) 38 sec
>  autovacuum_max_workers 5:  (HEAD) 45 sec VS (with patch) 37 sec
>  autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec
>
> tables:10000
>  autovacuum_max_workers 1:  (HEAD) 152 sec VS (with patch) 153 sec
>  autovacuum_max_workers 2:  (HEAD) 119 sec VS (with patch) 98 sec
>  autovacuum_max_workers 3:  (HEAD) 87 sec VS (with patch) 78 sec
>  autovacuum_max_workers 5:  (HEAD) 100 sec VS (with patch) 66 sec
>  autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec
>
> tables:20000
>  autovacuum_max_workers 1:  (HEAD) 338 sec VS (with patch) 339 sec
>  autovacuum_max_workers 2:  (HEAD) 231 sec VS (with patch) 229 sec
>  autovacuum_max_workers 3:  (HEAD) 220 sec VS (with patch) 191 sec
>  autovacuum_max_workers 5:  (HEAD) 234 sec VS (with patch) 147 sec
>  autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec
>
> [2. Anti-wraparound VACUUM case]
>
> tables:1000
>  autovacuum_max_workers 1:  (HEAD) 19 sec VS (with patch) 18 sec
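One way to read the trend in these numbers (no benefit at one worker, growing benefit as workers are added): extra workers mostly recheck tables that another worker has already vacuumed, and on HEAD each such recheck forces a full stats re-read. A deliberately crude cost model, illustrative only and not taken from the thread:

```python
def recheck_cost(n_tables, n_workers, reread_cost, patched):
    # Toy model: each table is vacuumed by one worker; the other
    # workers that recheck it find the work already done.  On HEAD
    # those wasted rechecks each pay a stats re-read; with the patch
    # the first probe answers them from existing stats for free.
    useful = n_tables
    wasted = n_tables * max(n_workers - 1, 0)
    cost = useful * reread_cost
    if not patched:
        cost += wasted * reread_cost
    return cost
```

The model reproduces the shape of the results above: identical cost at one worker, and a patched-vs-HEAD gap that widens with `autovacuum_max_workers`.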
Re: autovac issue with large number of tables
Hi, On Thu, Dec 3, 2020 at 9:09 PM Fujii Masao wrote: > > > > On 2020/12/03 11:46, Kasahara Tatsuhito wrote: > > On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada > > wrote: > >> > >> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao > >> wrote: > >>> > >>> > >>> > >>> On 2020/12/02 12:53, Masahiko Sawada wrote: > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada > wrote: > > > > On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao > > wrote: > >> > >> > >> > >> On 2020/12/01 16:23, Masahiko Sawada wrote: > >>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito > >>> wrote: > > Hi, > > On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao > wrote: > > > > > > > > On 2020/11/30 10:43, Masahiko Sawada wrote: > >> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito > >> wrote: > >>> > >>> Hi, Thanks for you comments. > >>> > >>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao > >>> wrote: > > > > On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > > Hi, > > > > On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao > > wrote: > >> > >> > >> > >> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > >>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada > >>> wrote: > > On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > wrote: > > > > Hi, > > > > On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada > > wrote: > >> > >> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > >> wrote: > >>> > >>> Hi, > >>> > >>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > >>> wrote: > > I wonder if we could have table_recheck_autovac do two > > probes of the stats > > data. First probe the existing stats data, and if it > > shows the table to > > be already vacuumed, return immediately. If not, > > *then* force a stats > > re-read, and check a second time. > Does the above mean that the second and subsequent > table_recheck_autovac() > will be improved to first check using the previous > refreshed statistics? > I think that certainly works. 
> > If that's correct, I'll try to create a patch for the PoC > >>> > >>> I still don't know how to reproduce Jim's troubles, but I > >>> was able to reproduce > >>> what was probably a very similar problem. > >>> > >>> This problem seems to be more likely to occur in cases > >>> where you have > >>> a large number of tables, > >>> i.e., a large amount of stats, and many small tables need > >>> VACUUM at > >>> the same time. > >>> > >>> So I followed Tom's advice and created a patch for the > >>> PoC. > >>> This patch will enable a flag in the > >>> table_recheck_autovac function to use > >>> the existing stats next time if VACUUM (or ANALYZE) has > >>> already been done > >>> by another worker on the check after the stats have been > >>> updated. > >>> If the tables continue to require VACUUM after the > >>> refresh, then a refresh > >>> will be required instead of using the existing statistics. > >>> > >>> I did simple test with HEAD and HEAD + this PoC patch. > >>> The tests were conducted in two cases. > >>> (I changed few configurations. see attached scripts) > >>> > >>> 1. Normal VACUUM case > >>> - SET autovacuum = off > >>> - CREATE tables with 100 rows > >>> - DELETE 90 rows for each tables > >>> - SET autovacuum = on and restart PostgreSQL > >>> - Measure the time it takes for all tables to be > >>> VACUUMed > >>> > >>> 2. Anti wrap round VACUUM case > >>> - CREATE brank tables > >>> - SELECT all of these tables (for generate stats) > >>
Re: autovac issue with large number of tables
On 2020/12/03 11:46, Kasahara Tatsuhito wrote: On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada wrote: On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao wrote: On 2020/12/02 12:53, Masahiko Sawada wrote: On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada wrote: On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao wrote: On 2020/12/01 16:23, Masahiko Sawada wrote: On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito wrote: Hi, On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao wrote: On 2020/11/30 10:43, Masahiko Sawada wrote: On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito wrote: Hi, Thanks for you comments. On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao wrote: On 2020/11/27 18:38, Kasahara Tatsuhito wrote: Hi, On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao wrote: On 2020/11/26 10:41, Kasahara Tatsuhito wrote: On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada wrote: On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito wrote: Hi, On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada wrote: On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito wrote: Hi, On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito wrote: I wonder if we could have table_recheck_autovac do two probes of the stats data. First probe the existing stats data, and if it shows the table to be already vacuumed, return immediately. If not, *then* force a stats re-read, and check a second time. Does the above mean that the second and subsequent table_recheck_autovac() will be improved to first check using the previous refreshed statistics? I think that certainly works. If that's correct, I'll try to create a patch for the PoC I still don't know how to reproduce Jim's troubles, but I was able to reproduce what was probably a very similar problem. This problem seems to be more likely to occur in cases where you have a large number of tables, i.e., a large amount of stats, and many small tables need VACUUM at the same time. So I followed Tom's advice and created a patch for the PoC. 
This patch will enable a flag in the table_recheck_autovac function to use the existing stats next time if VACUUM (or ANALYZE) has already been done by another worker on the check after the stats have been updated. If the tables continue to require VACUUM after the refresh, then a refresh will be required instead of using the existing statistics. I did simple test with HEAD and HEAD + this PoC patch. The tests were conducted in two cases. (I changed few configurations. see attached scripts) 1. Normal VACUUM case - SET autovacuum = off - CREATE tables with 100 rows - DELETE 90 rows for each tables - SET autovacuum = on and restart PostgreSQL - Measure the time it takes for all tables to be VACUUMed 2. Anti wrap round VACUUM case - CREATE brank tables - SELECT all of these tables (for generate stats) - SET autovacuum_freeze_max_age to low values and restart PostgreSQL - Consumes a lot of XIDs by using txid_curent() - Measure the time it takes for all tables to be VACUUMed For each test case, the following results were obtained by changing autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. Also changing num of tables to 1000, 5000, 1 and 2. Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable, but I think it's enough to ask for a trend. 
=== [1.Normal VACUUM case] tables:1000 autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec tables:5000 autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec tables:1 autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec tables:2 autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec autov
Re: autovac issue with large number of tables
On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada wrote: > > On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao > wrote: > > > > > > > > On 2020/12/02 12:53, Masahiko Sawada wrote: > > > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada > > > wrote: > > >> > > >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao > > >> wrote: > > >>> > > >>> > > >>> > > >>> On 2020/12/01 16:23, Masahiko Sawada wrote: > > On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito > > wrote: > > > > > > Hi, > > > > > > On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao > > > wrote: > > >> > > >> > > >> > > >> On 2020/11/30 10:43, Masahiko Sawada wrote: > > >>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito > > >>> wrote: > > > > Hi, Thanks for you comments. > > > > On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao > > wrote: > > > > > > > > > > > > On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > > >> Hi, > > >> > > >> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao > > >> wrote: > > >>> > > >>> > > >>> > > >>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > > On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada > > wrote: > > > > > > On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > > > wrote: > > >> > > >> Hi, > > >> > > >> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada > > >> wrote: > > >>> > > >>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > > >>> wrote: > > > > Hi, > > > > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > > wrote: > > >> I wonder if we could have table_recheck_autovac do two > > >> probes of the stats > > >> data. First probe the existing stats data, and if it > > >> shows the table to > > >> be already vacuumed, return immediately. If not, *then* > > >> force a stats > > >> re-read, and check a second time. > > > Does the above mean that the second and subsequent > > > table_recheck_autovac() > > > will be improved to first check using the previous > > > refreshed statistics? > > > I think that certainly works. 
> > > > > > If that's correct, I'll try to create a patch for the PoC > > > > I still don't know how to reproduce Jim's troubles, but I > > was able to reproduce > > what was probably a very similar problem. > > > > This problem seems to be more likely to occur in cases > > where you have > > a large number of tables, > > i.e., a large amount of stats, and many small tables need > > VACUUM at > > the same time. > > > > So I followed Tom's advice and created a patch for the PoC. > > This patch will enable a flag in the table_recheck_autovac > > function to use > > the existing stats next time if VACUUM (or ANALYZE) has > > already been done > > by another worker on the check after the stats have been > > updated. > > If the tables continue to require VACUUM after the > > refresh, then a refresh > > will be required instead of using the existing statistics. > > > > I did simple test with HEAD and HEAD + this PoC patch. > > The tests were conducted in two cases. > > (I changed few configurations. see attached scripts) > > > > 1. Normal VACUUM case > > - SET autovacuum = off > > - CREATE tables with 100 rows > > - DELETE 90 rows for each tables > > - SET autovacuum = on and restart PostgreSQL > > - Measure the time it takes for all tables to be > > VACUUMed > > > > 2. Anti wrap round VACUUM case > > - CREATE brank tables > > - SELECT all of these tables (for generate stats) > > - SET autovacuum_freeze_max_age to low values and > > restart PostgreSQL > > - Consumes a lot of XIDs by using txid_curent() > > - Measure the time it takes for all tables to be > > VA
Re: autovac issue with large number of tables
On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao wrote: > > > > On 2020/12/02 12:53, Masahiko Sawada wrote: > > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada > > wrote: > >> > >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao > >> wrote: > >>> > >>> > >>> > >>> On 2020/12/01 16:23, Masahiko Sawada wrote: > On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito > wrote: > > > > Hi, > > > > On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao > > wrote: > >> > >> > >> > >> On 2020/11/30 10:43, Masahiko Sawada wrote: > >>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito > >>> wrote: > > Hi, Thanks for you comments. > > On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao > wrote: > > > > > > > > On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > >> Hi, > >> > >> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao > >> wrote: > >>> > >>> > >>> > >>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada > wrote: > > > > On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > > wrote: > >> > >> Hi, > >> > >> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada > >> wrote: > >>> > >>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > >>> wrote: > > Hi, > > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > wrote: > >> I wonder if we could have table_recheck_autovac do two > >> probes of the stats > >> data. First probe the existing stats data, and if it > >> shows the table to > >> be already vacuumed, return immediately. If not, *then* > >> force a stats > >> re-read, and check a second time. > > Does the above mean that the second and subsequent > > table_recheck_autovac() > > will be improved to first check using the previous > > refreshed statistics? > > I think that certainly works. > > > > If that's correct, I'll try to create a patch for the PoC > > I still don't know how to reproduce Jim's troubles, but I > was able to reproduce > what was probably a very similar problem. 
> > This problem seems to be more likely to occur in cases where > you have > a large number of tables, > i.e., a large amount of stats, and many small tables need > VACUUM at > the same time. > > So I followed Tom's advice and created a patch for the PoC. > This patch will enable a flag in the table_recheck_autovac > function to use > the existing stats next time if VACUUM (or ANALYZE) has > already been done > by another worker on the check after the stats have been > updated. > If the tables continue to require VACUUM after the refresh, > then a refresh > will be required instead of using the existing statistics. > > I did simple test with HEAD and HEAD + this PoC patch. > The tests were conducted in two cases. > (I changed few configurations. see attached scripts) > > 1. Normal VACUUM case > - SET autovacuum = off > - CREATE tables with 100 rows > - DELETE 90 rows for each tables > - SET autovacuum = on and restart PostgreSQL > - Measure the time it takes for all tables to be > VACUUMed > > 2. Anti wrap round VACUUM case > - CREATE brank tables > - SELECT all of these tables (for generate stats) > - SET autovacuum_freeze_max_age to low values and > restart PostgreSQL > - Consumes a lot of XIDs by using txid_curent() > - Measure the time it takes for all tables to be > VACUUMed > > For each test case, the following results were obtained by > changing > autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. > Also changing num of tables to 1000, 5000, 1 and 2. > >
Re: autovac issue with large number of tables
Hi On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao wrote: > > > > On 2020/12/02 12:53, Masahiko Sawada wrote: > > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada > > wrote: > >> > >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao > >> wrote: > >>> > >>> > >>> > >>> On 2020/12/01 16:23, Masahiko Sawada wrote: > On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito > wrote: > > > > Hi, > > > > On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao > > wrote: > >> > >> > >> > >> On 2020/11/30 10:43, Masahiko Sawada wrote: > >>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito > >>> wrote: > > Hi, Thanks for you comments. > > On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao > wrote: > > > > > > > > On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > >> Hi, > >> > >> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao > >> wrote: > >>> > >>> > >>> > >>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada > wrote: > > > > On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > > wrote: > >> > >> Hi, > >> > >> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada > >> wrote: > >>> > >>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > >>> wrote: > > Hi, > > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > wrote: > >> I wonder if we could have table_recheck_autovac do two > >> probes of the stats > >> data. First probe the existing stats data, and if it > >> shows the table to > >> be already vacuumed, return immediately. If not, *then* > >> force a stats > >> re-read, and check a second time. > > Does the above mean that the second and subsequent > > table_recheck_autovac() > > will be improved to first check using the previous > > refreshed statistics? > > I think that certainly works. > > > > If that's correct, I'll try to create a patch for the PoC > > I still don't know how to reproduce Jim's troubles, but I > was able to reproduce > what was probably a very similar problem. 
> > This problem seems to be more likely to occur in cases where > you have > a large number of tables, > i.e., a large amount of stats, and many small tables need > VACUUM at > the same time. > > So I followed Tom's advice and created a patch for the PoC. > This patch will enable a flag in the table_recheck_autovac > function to use > the existing stats next time if VACUUM (or ANALYZE) has > already been done > by another worker on the check after the stats have been > updated. > If the tables continue to require VACUUM after the refresh, > then a refresh > will be required instead of using the existing statistics. > > I did simple test with HEAD and HEAD + this PoC patch. > The tests were conducted in two cases. > (I changed few configurations. see attached scripts) > > 1. Normal VACUUM case > - SET autovacuum = off > - CREATE tables with 100 rows > - DELETE 90 rows for each tables > - SET autovacuum = on and restart PostgreSQL > - Measure the time it takes for all tables to be > VACUUMed > > 2. Anti wrap round VACUUM case > - CREATE brank tables > - SELECT all of these tables (for generate stats) > - SET autovacuum_freeze_max_age to low values and > restart PostgreSQL > - Consumes a lot of XIDs by using txid_curent() > - Measure the time it takes for all tables to be > VACUUMed > > For each test case, the following results were obtained by > changing > autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. > Also changing num of tables to 1000, 5000, 1 and 2. > >
Re: autovac issue with large number of tables
On 2020/12/02 12:53, Masahiko Sawada wrote: On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada wrote: On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao wrote: On 2020/12/01 16:23, Masahiko Sawada wrote: On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito wrote: Hi, On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao wrote: On 2020/11/30 10:43, Masahiko Sawada wrote: On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito wrote: Hi, Thanks for you comments. On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao wrote: On 2020/11/27 18:38, Kasahara Tatsuhito wrote: Hi, On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao wrote: On 2020/11/26 10:41, Kasahara Tatsuhito wrote: On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada wrote: On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito wrote: Hi, On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada wrote: On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito wrote: Hi, On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito wrote: I wonder if we could have table_recheck_autovac do two probes of the stats data. First probe the existing stats data, and if it shows the table to be already vacuumed, return immediately. If not, *then* force a stats re-read, and check a second time. Does the above mean that the second and subsequent table_recheck_autovac() will be improved to first check using the previous refreshed statistics? I think that certainly works. If that's correct, I'll try to create a patch for the PoC I still don't know how to reproduce Jim's troubles, but I was able to reproduce what was probably a very similar problem. This problem seems to be more likely to occur in cases where you have a large number of tables, i.e., a large amount of stats, and many small tables need VACUUM at the same time. So I followed Tom's advice and created a patch for the PoC. This patch will enable a flag in the table_recheck_autovac function to use the existing stats next time if VACUUM (or ANALYZE) has already been done by another worker on the check after the stats have been updated. 
If the tables continue to require VACUUM after the refresh, then a refresh will be required instead of using the existing statistics. I did simple test with HEAD and HEAD + this PoC patch. The tests were conducted in two cases. (I changed few configurations. see attached scripts) 1. Normal VACUUM case - SET autovacuum = off - CREATE tables with 100 rows - DELETE 90 rows for each tables - SET autovacuum = on and restart PostgreSQL - Measure the time it takes for all tables to be VACUUMed 2. Anti wrap round VACUUM case - CREATE brank tables - SELECT all of these tables (for generate stats) - SET autovacuum_freeze_max_age to low values and restart PostgreSQL - Consumes a lot of XIDs by using txid_curent() - Measure the time it takes for all tables to be VACUUMed For each test case, the following results were obtained by changing autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. Also changing num of tables to 1000, 5000, 1 and 2. Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable, but I think it's enough to ask for a trend. 
=== [1.Normal VACUUM case] tables:1000 autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec tables:5000 autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec tables:1 autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec tables:2 autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec [2.Anti wrap round VACUUM case] tables:1000 autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
Re: autovac issue with large number of tables
On 2020/11/27 18:38, Kasahara Tatsuhito wrote:

> > I wonder if we could have table_recheck_autovac do two probes of the
> > stats data. First probe the existing stats data, and if it shows the
> > table to be already vacuumed, return immediately. If not, *then* force
> > a stats re-read, and check a second time.
> Does the above mean that the second and subsequent table_recheck_autovac()
> calls will be improved to first check using the previously refreshed
> statistics? I think that certainly works.
>
> If that's correct, I'll try to create a patch for the PoC.

I still don't know how to reproduce Jim's troubles, but I was able to
reproduce what was probably a very similar problem.

This problem seems more likely to occur when you have a large number of
tables, i.e., a large amount of stats, and many small tables need VACUUM
at the same time.

So I followed Tom's advice and created a patch for the PoC.
This patch enables a flag in the table_recheck_autovac function so that,
if VACUUM (or ANALYZE) has already been done by another worker by the time
of the check after the stats have been updated, the existing stats are used
next time. If the tables still require VACUUM after the refresh, then a
refresh is performed instead of using the existing statistics.

I did a simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed a few configurations; see the attached scripts.)

1. Normal VACUUM case
   - SET autovacuum = off
   - CREATE tables with 100 rows
   - DELETE 90 rows from each table
   - SET autovacuum = on and restart PostgreSQL
   - Measure the time it takes for all tables to be VACUUMed

2. Anti-wraparound VACUUM case
   - CREATE blank tables
   - SELECT from all of these tables (to generate stats)
   - SET autovacuum_freeze_max_age to a low value and restart PostgreSQL
   - Consume a lot of XIDs by using txid_current()
   - Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing the
autovacuum_max_workers parameter to 1, 2, 3 (default), 5 and 10, and the
number of tables to 1000, 5000, 1 and 2.

Due to the poor VM environment (2 vCPU / 4 GB), the results are a little
unstable, but I think they are enough to show a trend.

===
[1. Normal VACUUM case]
tables:1000
   autovacuum_max_workers  1: (HEAD)  20 sec VS (with patch)  20 sec
   autovacuum_max_workers  2: (HEAD)  18 sec VS (with patch)  16 sec
   autovacuum_max_workers  3: (HEAD)  18 sec VS (with patch)  16 sec
   autovacuum_max_workers  5: (HEAD)  19 sec VS (with patch)  17 sec
   autovacuum_max_workers 10: (HEAD)  19 sec VS (with patch)  17 sec

tables:5000
   autovacuum_max_workers  1: (HEAD)  77 sec VS (with patch)  78 sec
   autovacuum_max_workers  2: (HEAD)  61 sec VS (with patch)  43 sec
   autovacuum_max_workers  3: (HEAD)  38 sec VS (with patch)  38 sec
   autovacuum_max_workers  5: (HEAD)  45 sec VS (with patch)  37 sec
   autovacuum_max_workers 10: (HEAD)  43 sec VS (with patch)  35 sec

tables:1
   autovacuum_max_workers  1: (HEAD) 152 sec VS (with patch) 153 sec
   autovacuum_max_workers  2: (HEAD) 119 sec VS (with patch)  98 sec
   autovacuum_max_workers  3: (HEAD)  87 sec VS (with patch)  78 sec
   autovacuum_max_workers  5: (HEAD) 100 sec VS (with patch)  66 sec
   autovacuum_max_workers 10: (HEAD)  97 sec VS (with patch)  56 sec

tables:2
   autovacuum_max_workers  1: (HEAD) 338 sec VS (with patch) 339 sec
   autovacuum_max_workers  2: (HEAD) 231 sec VS (with patch) 229 sec
   autovacuum_max_workers  3: (HEAD) 220 sec VS (with patch) 191 sec
   autovacuum_max_workers  5: (HEAD) 234 sec VS (with patch) 147 sec
   autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2. Anti-wraparound VACUUM case]
tables:1000
   autovacuum_max_workers  1: (HEAD)  19 sec VS (with patch)  18 sec
   autovacuum_max_workers  2: (HEAD)  14 sec VS (with patch)  15 sec
   autovacuum_max_workers  3: (HEAD)  14 sec VS (with patch)  14 sec
   autovacuum_max_workers  5: (HEAD)  14 sec VS (with patch)  16 sec
   autovacuum_max_workers 10: (HEAD)  16 sec VS (with patch)  14 sec

tables:5000
   autovacuum_max_workers  1: (HEAD)  69 sec VS (with patch)  69 sec
   autovacuum_max_workers  2: (HEAD)  66 sec VS (with patch)  47 sec
   autovacuum_max_workers  3: (HEAD)  59 sec VS (with patch)  37 sec
   autovacuum_max_workers  5: (HEAD)  39 sec VS (with patch)  28 sec
   autovacuum_max_workers 10: (HEAD)  39 sec VS (with patch)  29 sec

tables:1
   autovacuum_max_workers  1: (HEAD) 139 sec VS (with patch) 138 sec
   autovacuum_max_workers  2: (HEAD) 130 sec VS (with patch)
Re: autovac issue with large number of tables
On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito wrote: > > Hi, > > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > wrote: > > > I wonder if we could have table_recheck_autovac do two probes of the stats > > > data. First probe the existing stats data, and if it shows the table to > > > be already vacuumed, return immediately. If not, *then* force a stats > > > re-read, and check a second time. > > Does the above mean that the second and subsequent table_recheck_autovac() > > will be improved to first check using the previous refreshed statistics? > > I think that certainly works. > > > > If that's correct, I'll try to create a patch for the PoC > > I still don't know how to reproduce Jim's troubles, but I was able to > reproduce > what was probably a very similar problem. > > This problem seems to be more likely to occur in cases where you have > a large number of tables, > i.e., a large amount of stats, and many small tables need VACUUM at > the same time. > > So I followed Tom's advice and created a patch for the PoC. > This patch will enable a flag in the table_recheck_autovac function to use > the existing stats next time if VACUUM (or ANALYZE) has already been done > by another worker on the check after the stats have been updated. > If the tables continue to require VACUUM after the refresh, then a refresh > will be required instead of using the existing statistics. > > I did simple test with HEAD and HEAD + this PoC patch. > The tests were conducted in two cases. > (I changed few configurations. see attached scripts) > > 1. Normal VACUUM case > - SET autovacuum = off > - CREATE tables with 100 rows > - DELETE 90 rows for each tables > - SET autovacuum = on and restart PostgreSQL > - Measure the time it takes for all tables to be VACUUMed > > 2. 
Anti wrap round VACUUM case > - CREATE brank tables > - SELECT all of these tables (for generate stats) > - SET autovacuum_freeze_max_age to low values and restart PostgreSQL > - Consumes a lot of XIDs by using txid_curent() > - Measure the time it takes for all tables to be VACUUMed > > For each test case, the following results were obtained by changing > autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. > Also changing num of tables to 1000, 5000, 1 and 2. > > Due to the poor VM environment (2 VCPU/4 GB), the results are a little > unstable, > but I think it's enough to ask for a trend. > > === > [1.Normal VACUUM case] > tables:1000 > autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec > autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec > autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec > autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec > autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec > > tables:5000 > autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec > autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec > autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec > autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec > autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec > > tables:1 > autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec > autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec > autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec > autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec > autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec > > tables:2 > autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec > autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec > autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec > autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec > autovacuum_max_workers 10: (HEAD) 320 sec VS (with 
patch) 113 sec > > [2.Anti wrap round VACUUM case] > tables:1000 > autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec > autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec > autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec > autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec > autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec > > tables:5000 > autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec > autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec > autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec > autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec > autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec > > tables:1 > autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec > autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec > autovacuum_max_workers 3: (HEAD) 120 sec VS (with patc
Re: autovac issue with large number of tables
Therefore, we expect this patch [1] to be committed for its original purpose, as well as to improve autovacuum from v14 onwards.
Re: autovac issue with large number of tables
Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito wrote:
> > I wonder if we could have table_recheck_autovac do two probes of the stats
> > data. First probe the existing stats data, and if it shows the table to
> > be already vacuumed, return immediately. If not, *then* force a stats
> > re-read, and check a second time.
> Does the above mean that the second and subsequent table_recheck_autovac()
> will be improved to first check using the previously refreshed statistics?
> I think that certainly works.
>
> If that's correct, I'll try to create a patch for the PoC.

I still don't know how to reproduce Jim's troubles, but I was able to
reproduce what was probably a very similar problem.

This problem seems more likely to occur when you have a large number of
tables (i.e., a large amount of stats data) and many small tables need
VACUUM at the same time.

So I followed Tom's advice and created a PoC patch. It sets a flag in
table_recheck_autovac() so that the existing stats are reused on the next
check if VACUUM (or ANALYZE) had already been done by another worker by the
time of the check after a stats refresh. If tables still require VACUUM
after a refresh, another refresh is performed instead of reusing the
existing statistics.

I did a simple test with HEAD and HEAD + this PoC patch. The tests were
conducted in two cases. (I changed a few configurations; see the attached
scripts.)

1. Normal VACUUM case
 - SET autovacuum = off
 - CREATE tables with 100 rows
 - DELETE 90 rows from each table
 - SET autovacuum = on and restart PostgreSQL
 - Measure the time it takes for all tables to be VACUUMed

2. Anti-wraparound VACUUM case
 - CREATE blank tables
 - SELECT from all of these tables (to generate stats)
 - SET autovacuum_freeze_max_age to a low value and restart PostgreSQL
 - Consume a lot of XIDs using txid_current()
 - Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing the
autovacuum_max_workers parameter to 1, 2, 3 (default), 5 and 10, and the
number of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 vCPU / 4 GB), the results are a little
unstable, but I think they are enough to show the trend.

===
[1. Normal VACUUM case]
tables:1000
 autovacuum_max_workers 1:  (HEAD) 20 sec  VS (with patch) 20 sec
 autovacuum_max_workers 2:  (HEAD) 18 sec  VS (with patch) 16 sec
 autovacuum_max_workers 3:  (HEAD) 18 sec  VS (with patch) 16 sec
 autovacuum_max_workers 5:  (HEAD) 19 sec  VS (with patch) 17 sec
 autovacuum_max_workers 10: (HEAD) 19 sec  VS (with patch) 17 sec

tables:5000
 autovacuum_max_workers 1:  (HEAD) 77 sec  VS (with patch) 78 sec
 autovacuum_max_workers 2:  (HEAD) 61 sec  VS (with patch) 43 sec
 autovacuum_max_workers 3:  (HEAD) 38 sec  VS (with patch) 38 sec
 autovacuum_max_workers 5:  (HEAD) 45 sec  VS (with patch) 37 sec
 autovacuum_max_workers 10: (HEAD) 43 sec  VS (with patch) 35 sec

tables:10000
 autovacuum_max_workers 1:  (HEAD) 152 sec VS (with patch) 153 sec
 autovacuum_max_workers 2:  (HEAD) 119 sec VS (with patch) 98 sec
 autovacuum_max_workers 3:  (HEAD) 87 sec  VS (with patch) 78 sec
 autovacuum_max_workers 5:  (HEAD) 100 sec VS (with patch) 66 sec
 autovacuum_max_workers 10: (HEAD) 97 sec  VS (with patch) 56 sec

tables:20000
 autovacuum_max_workers 1:  (HEAD) 338 sec VS (with patch) 339 sec
 autovacuum_max_workers 2:  (HEAD) 231 sec VS (with patch) 229 sec
 autovacuum_max_workers 3:  (HEAD) 220 sec VS (with patch) 191 sec
 autovacuum_max_workers 5:  (HEAD) 234 sec VS (with patch) 147 sec
 autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec

[2. Anti-wraparound VACUUM case]
tables:1000
 autovacuum_max_workers 1:  (HEAD) 19 sec  VS (with patch) 18 sec
 autovacuum_max_workers 2:  (HEAD) 14 sec  VS (with patch) 15 sec
 autovacuum_max_workers 3:  (HEAD) 14 sec  VS (with patch) 14 sec
 autovacuum_max_workers 5:  (HEAD) 14 sec  VS (with patch) 16 sec
 autovacuum_max_workers 10: (HEAD) 16 sec  VS (with patch) 14 sec

tables:5000
 autovacuum_max_workers 1:  (HEAD) 69 sec  VS (with patch) 69 sec
 autovacuum_max_workers 2:  (HEAD) 66 sec  VS (with patch) 47 sec
 autovacuum_max_workers 3:  (HEAD) 59 sec  VS (with patch) 37 sec
 autovacuum_max_workers 5:  (HEAD) 39 sec  VS (with patch) 28 sec
 autovacuum_max_workers 10: (HEAD) 39 sec  VS (with patch) 29 sec

tables:10000
 autovacuum_max_workers 1:  (HEAD) 139 sec VS (with patch) 138 sec
 autovacuum_max_workers 2:  (HEAD) 130 sec VS (with patch) 86 sec
 autovacuum_max_workers 3:  (HEAD) 120 sec VS (with patch) 68 sec
 autovacuum_max_workers 5:  (HEAD) 96 sec  VS (with patch) 41 sec
 autovacuum_max_workers 10: (HEAD) 90 sec  VS (with patch) 39 sec

tables:20000
 autovacuum_max_workers 1:  (HEAD) 313 sec VS (with patch) 331 sec
Re: autovac issue with large number of tables
Hi,

On Wed, Aug 12, 2020 at 2:46 AM Tom Lane wrote:
> So I think Kasahara-san's point is that the shared memory stats collector
> might wipe out those costs, depending on how it's implemented. (I've not
> looked at that patch in a long time either, so I don't know how much it'd
> cut the reader-side costs. But maybe it'd be substantial.)
Thanks for your clarification, that's what I wanted to say. Sorry for the
lack of explanation.

> I think the real issue here is autovac_refresh_stats's insistence that it
> shouldn't throttle pgstats re-reads in workers.
I agree with that.

> I wonder if we could have table_recheck_autovac do two probes of the stats
> data. First probe the existing stats data, and if it shows the table to
> be already vacuumed, return immediately. If not, *then* force a stats
> re-read, and check a second time.
Does the above mean that the second and subsequent table_recheck_autovac()
calls will be improved to first check using the previously refreshed
statistics? I think that certainly works.

If that's correct, I'll try to create a patch for the PoC.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com
Re: autovac issue with large number of tables
Jim Nasby writes:
> Without reading the 100+ emails or the 260k patch, I'm guessing that it
> won't help because the problem I observed was spending most of its time in
> 42.62% postgres [.] hash_search_with_hash_value
> I don't see how moving things to shared memory would help that at all.

So I'm a bit mystified as to why that would show up as the primary cost.
It looks to me like we force a re-read of the pgstats data each time
through table_recheck_autovac(), and it seems like the costs associated
with that would swamp everything else in the case you're worried about.
I suspect that the bulk of the hash_search_with_hash_value costs are
HASH_ENTER calls caused by repopulating the pgstats hash table, rather
than the single read probe that table_recheck_autovac itself will do.
It's still surprising that that would dominate the other costs of reading
the data, but maybe those costs just aren't as well localized in the code.

So I think Kasahara-san's point is that the shared memory stats collector
might wipe out those costs, depending on how it's implemented. (I've not
looked at that patch in a long time either, so I don't know how much it'd
cut the reader-side costs. But maybe it'd be substantial.)

In the meantime, though, do we want to do something else to alleviate the
issue? I realize you only described your patch as a PoC, but I can't say
I like it much:

* Giving up after we've wasted 1000 pgstats re-reads seems like locking
the barn door only after the horse is well across the state line.

* I'm not convinced that the business with skipping N entries at a time
buys anything. You'd have to make pretty strong assumptions about the
workers all processing tables at about the same rate to believe it will
help. In the worst case, it might lead to all the workers ignoring the
same table(s).

I think the real issue here is autovac_refresh_stats's insistence that it
shouldn't throttle pgstats re-reads in workers. I see the point about not
wanting to repeat vacuum work on the basis of stale data, but still ...

I wonder if we could have table_recheck_autovac do two probes of the stats
data. First probe the existing stats data, and if it shows the table to
be already vacuumed, return immediately. If not, *then* force a stats
re-read, and check a second time.

BTW, can you provide a test script that reproduces the problem you're
looking at? The rest of us are kind of guessing at what's happening.

regards, tom lane
Re: autovac issue with large number of tables
Hi,

On Tue, Jul 28, 2020 at 3:49 AM Jim Nasby wrote:
> I'm in favor of trying to improve scheduling (especially allowing users
> to control how things are scheduled), but that's a far more invasive
> patch. I'd like to get something like this patch in without waiting on a
> significantly larger effort.
BTW, have you tried the patch suggested in the thread below?
https://www.postgresql.org/message-id/20180629.173418.190173462.horiguchi.kyotaro%40lab.ntt.co.jp

The above is a suggestion to manage statistics in shared memory rather
than in a file, and I think that feature may mitigate your problem.
I think it has yet another performance challenge, but it might be worth
a try. That patch will also require a great deal of effort to get into
the PostgreSQL core, but I'm curious to see how well it works for this
problem.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com
Re: autovac issue with large number of tables
On Mon, 27 Jul 2020 at 06:43, Nasby, Jim wrote:
>
> A database with a very large number of tables eligible for autovacuum can
> result in autovacuum workers "stuck" in a tight loop of
> table_recheck_autovac() constantly reporting nothing to do on the table.
> This is because a database with a very large number of tables means it
> takes a while to search the statistics hash to verify that the table still
> needs to be processed[1]. If a worker spends some time processing a table,
> when it's done it can spend a significant amount of time rechecking each
> table that it identified at launch (I've seen a worker in this state for
> over an hour). A simple work-around in this scenario is to kill the worker;
> the launcher will quickly fire up a new worker on the same database, and
> that worker will build a new list of tables.
>
> That's not a complete solution though... if the database contains a large
> number of very small tables you can end up in a state where 1 or 2 workers
> are busy chugging through those small tables so quickly that any additional
> workers spend all their time in table_recheck_autovac(), because that takes
> long enough that the additional workers are never able to "leapfrog" the
> workers that are doing useful work.

As another solution, I've been considering adding a queue of table OIDs
that need to be vacuumed/analyzed in shared memory (i.e., on DSA). Since
all autovacuum workers running on the same database would see a consistent
queue, the issue explained above won't happen, and it would probably also
make it easier to implement prioritization of the tables being vacuumed,
which is sometimes discussed on pgsql-hackers. I think this idea might be
worth discussing.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services