Re: [HACKERS] Block level parallel vacuum WIP
On Mon, Oct 23, 2017 at 10:43 AM, Amit Langote wrote: > On 2017/10/22 5:25, Thomas Munro wrote: >> On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas wrote: >>> On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada >>> wrote: > Down at the bottom of the build log in the regression diffs file you can > see: > > ! ERROR: cache lookup failed for relation 32893 > > https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907 Thank you for letting me know. Hmm, it's an interesting failure. I'll investigate it and post the new patch. >>> >>> Did you ever find out what the cause of this problem was? >> >> I wonder if it might have been the same issue that commit >> 19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later. > > Hmm, 19de0ab23ccba seems to prevent the "cache lookup failed for relation > XXX" error in a different code path though (the code path handling manual > vacuum). Not sure if the commit could have prevented that error being > emitted by DROP SCHEMA ... CASCADE, which seems to be what produced it in > this case. Maybe I'm missing something though. > Yeah, I was thinking the commit was relevant to this issue, but as Amit mentioned this error is emitted by DROP SCHEMA ... CASCADE. I haven't found the cause of this issue yet. With the previous version of the patch, autovacuum workers were working with one parallel worker, but they never drop relations. So it's possible that the error was not related to the patch, but anyway I'll continue to work on it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On 2017/10/22 5:25, Thomas Munro wrote: > On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas wrote: >> On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada >> wrote: Down at the bottom of the build log in the regression diffs file you can see: ! ERROR: cache lookup failed for relation 32893 https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907 >>> >>> Thank you for letting me know. >>> >>> Hmm, it's an interesting failure. I'll investigate it and post the new >>> patch. >> >> Did you ever find out what the cause of this problem was? > > I wonder if it might have been the same issue that commit > 19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later. Hmm, 19de0ab23ccba seems to prevent the "cache lookup failed for relation XXX" error in a different code path though (the code path handling manual vacuum). Not sure if the commit could have prevented that error being emitted by DROP SCHEMA ... CASCADE, which seems to be what produced it in this case. Maybe I'm missing something though. Thanks, Amit
Re: [HACKERS] Block level parallel vacuum WIP
On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas wrote: > On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada > wrote: >>> Down at the bottom of the build log in the regression diffs file you can >>> see: >>> >>> ! ERROR: cache lookup failed for relation 32893 >>> >>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907 >> >> Thank you for letting me know. >> >> Hmm, it's an interesting failure. I'll investigate it and post the new patch. > > Did you ever find out what the cause of this problem was? I wonder if it might have been the same issue that commit 19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada wrote: >> Down at the bottom of the build log in the regression diffs file you can see: >> >> ! ERROR: cache lookup failed for relation 32893 >> >> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907 > > Thank you for letting me know. > > Hmm, it's an interesting failure. I'll investigate it and post the new patch. Did you ever find out what the cause of this problem was? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Sep 19, 2017 at 4:31 PM, Masahiko Sawada wrote: > On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro > wrote: >> On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada >> wrote: >>> Since the v4 patch conflicts with current HEAD I attached the latest version >>> patch. >> >> Hi Sawada-san, >> >> Here is an interesting failure with this patch: >> >> test rowsecurity ... FAILED >> test rules ... FAILED >> >> Down at the bottom of the build log in the regression diffs file you can see: >> >> ! ERROR: cache lookup failed for relation 32893 >> >> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907 >> > > Thank you for letting me know. > > Hmm, it's an interesting failure. I'll investigate it and post the new patch. > Since the patch conflicts with current HEAD, I've rebased it and fixed a bug. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center parallel_vacuum_v5.patch Description: Binary data
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro wrote: > On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada > wrote: >> Since the v4 patch conflicts with current HEAD I attached the latest version >> patch. > > Hi Sawada-san, > > Here is an interesting failure with this patch: > > test rowsecurity ... FAILED > test rules ... FAILED > > Down at the bottom of the build log in the regression diffs file you can see: > > ! ERROR: cache lookup failed for relation 32893 > > https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907 > Thank you for letting me know. Hmm, it's an interesting failure. I'll investigate it and post the new patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Block level parallel vacuum WIP
On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada wrote: > Since the v4 patch conflicts with current HEAD I attached the latest version > patch. Hi Sawada-san, Here is an interesting failure with this patch: test rowsecurity ... FAILED test rules ... FAILED Down at the bottom of the build log in the regression diffs file you can see: ! ERROR: cache lookup failed for relation 32893 https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907 -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 15, 2017 at 10:13 AM, Masahiko Sawada wrote: > On Wed, Jul 26, 2017 at 5:38 PM, Masahiko Sawada > wrote: >> On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada >> wrote: >>> On Sun, Mar 5, 2017 at 12:14 PM, David Steele wrote: On 3/4/17 9:08 PM, Masahiko Sawada wrote: > On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas wrote: >> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada >> wrote: >>> Yes, it's taking time to update the logic and measurement, but it's >>> coming along. Also I'm working on changing the deadlock detection. Will >>> post a new patch and measurement results. >> >> I think that we should push this patch out to v11. I think there are >> too many issues here to address in the limited time we have remaining >> this cycle, and I believe that if we try to get them all solved in the >> next few weeks we're likely to end up getting backed into some choices >> by time pressure that we may later regret bitterly. Since I created >> the deadlock issues that this patch is facing, I'm willing to try to >> help solve them, but I think it's going to require considerable and >> delicate surgery, and I don't think doing that under time pressure is >> a good idea. >> >> From a fairness point of view, a patch that's not in reviewable shape >> on March 1st should really be pushed out, and we're several days past >> that. >> > > Agreed. There is surely room to discuss the design further, > and it will take a long time. It's good to push this out to CF 2017-07. > Thank you for the comment. I have marked this patch "Returned with Feedback." Of course you are welcome to submit this patch to the 2017-07 CF, or whenever you feel it is ready. >>> >>> Thank you! >>> >> >> I reconsidered the basic design of parallel lazy vacuum. I didn't >> change the basic concept of this feature or its usage: lazy vacuum >> still executes with some parallel workers. In the current design, dead >> tuple TIDs are shared with all vacuum workers, including the leader process, >> when the table has indexes. 
If we share dead tuple TIDs, we have to introduce two >> synchronization points: before starting vacuum and before clearing >> dead tuple TIDs. Before starting vacuum we have to make sure that no >> more dead tuple TIDs will be added. And before clearing the dead tuple >> TIDs we have to make sure that they are no longer in use. >> >> For index vacuum, each index is assigned to a vacuum worker based >> on ParallelWorkerNumber. For example, if a table has 5 indexes and is >> vacuumed with 2 workers, the leader process and one vacuum worker are >> each assigned 2 indexes, and the other vacuum worker is assigned the >> remaining one. The following steps describe how the parallel vacuum >> proceeds if the table has indexes. >> >> 1. The leader process and workers scan the table in parallel using >> ParallelHeapScanDesc, and collect dead tuple TIDs into shared memory. >> 2. Before vacuuming the table, the leader process sorts the dead tuple TIDs >> in physical order once all workers have completed scanning the table. >> 3. While vacuuming the table, the leader process and workers reclaim garbage >> from the table in block-level parallel fashion. >> 4. While vacuuming the indexes, each index on the table is assigned to a >> particular parallel worker or the leader process. The process assigned to >> an index vacuums that index. >> 5. Before going back to scanning the table, the leader process clears the >> dead tuple TIDs once all workers have completed vacuuming the table and >> indexes. >> >> Attached is the latest patch, but it's still a PoC version and >> contains some debug code. Note that this patch still requires another >> patch which moves the relation extension lock out of the heavy-weight >> lock [1]. The parallel lazy vacuum patch could work even without the [1] >> patch but could fail during vacuum in some cases. >> >> Also, I attached the result of the performance evaluation. The table size >> is approximately 300MB ( > shared_buffers) and I deleted tuples in >> every block before executing vacuum so that vacuum visits every block. 
>> The server spec is >> * Intel Xeon E5620 @ 2.4GHz (8 cores) >> * 32GB RAM >> * ioDrive >> >> According to the results for the table with indexes, the performance of lazy >> vacuum improved up to the point where the number of indexes and the parallel >> degree are the same. If a table has 16 indexes and is vacuumed with 16 >> workers, parallel vacuum is 10x faster than single-process execution. >> Also, according to the results for the table with no indexes, parallel >> vacuum is 5x faster than single-process execution at parallel degree 8. >> Of course we can also vacuum only the indexes. >> >> I'm planning to work on this in PG11 and will register it in the next CF. >> Comments and feedback are very welcome. >> > > Since the previous patch conflicts with current HEAD I
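The index-to-process assignment described in the design above (5 indexes split across a leader and 2 workers) boils down to a round-robin rule. Here is a minimal standalone sketch of that rule; the function name and participant numbering are illustrative assumptions, not the patch's actual code, which bases the assignment on ParallelWorkerNumber.

```c
#include <assert.h>

/*
 * Illustrative sketch (not the patch's code): indexes are distributed
 * round-robin over the participants, where participant 0 is the leader
 * and participants 1..nparticipants-1 are the parallel workers.
 */
int
index_assigned_to(int index_no, int nparticipants)
{
    return index_no % nparticipants;
}
```

With 5 indexes and 3 participants (leader + 2 workers), participants 0 and 1 are each assigned two indexes and participant 2 gets the remaining one, matching the example in the design description.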
Re: [HACKERS] Block level parallel vacuum WIP
On Sun, Mar 5, 2017 at 12:14 PM, David Steele wrote: > On 3/4/17 9:08 PM, Masahiko Sawada wrote: >> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas wrote: >>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada >>> wrote: Yes, it's taking time to update the logic and measurement, but it's coming along. Also I'm working on changing the deadlock detection. Will post a new patch and measurement results. >>> >>> I think that we should push this patch out to v11. I think there are >>> too many issues here to address in the limited time we have remaining >>> this cycle, and I believe that if we try to get them all solved in the >>> next few weeks we're likely to end up getting backed into some choices >>> by time pressure that we may later regret bitterly. Since I created >>> the deadlock issues that this patch is facing, I'm willing to try to >>> help solve them, but I think it's going to require considerable and >>> delicate surgery, and I don't think doing that under time pressure is >>> a good idea. >>> >>> From a fairness point of view, a patch that's not in reviewable shape >>> on March 1st should really be pushed out, and we're several days past >>> that. >>> >> >> Agreed. There is surely room to discuss the design further, >> and it will take a long time. It's good to push this out to CF 2017-07. >> Thank you for the comment. > > I have marked this patch "Returned with Feedback." Of course you are > welcome to submit this patch to the 2017-07 CF, or whenever you feel it > is ready. Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Block level parallel vacuum WIP
On 3/4/17 9:08 PM, Masahiko Sawada wrote: > On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas wrote: >> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada >> wrote: >>> Yes, it's taking time to update the logic and measurement, but it's >>> coming along. Also I'm working on changing the deadlock detection. Will >>> post a new patch and measurement results. >> >> I think that we should push this patch out to v11. I think there are >> too many issues here to address in the limited time we have remaining >> this cycle, and I believe that if we try to get them all solved in the >> next few weeks we're likely to end up getting backed into some choices >> by time pressure that we may later regret bitterly. Since I created >> the deadlock issues that this patch is facing, I'm willing to try to >> help solve them, but I think it's going to require considerable and >> delicate surgery, and I don't think doing that under time pressure is >> a good idea. >> >> From a fairness point of view, a patch that's not in reviewable shape >> on March 1st should really be pushed out, and we're several days past >> that. >> > > Agreed. There is surely room to discuss the design further, > and it will take a long time. It's good to push this out to CF 2017-07. > Thank you for the comment. I have marked this patch "Returned with Feedback." Of course you are welcome to submit this patch to the 2017-07 CF, or whenever you feel it is ready. -- -David da...@pgmasters.net
Re: [HACKERS] Block level parallel vacuum WIP
On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas wrote: > On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada wrote: >> Yes, it's taking time to update the logic and measurement, but it's >> coming along. Also I'm working on changing the deadlock detection. Will >> post a new patch and measurement results. > > I think that we should push this patch out to v11. I think there are > too many issues here to address in the limited time we have remaining > this cycle, and I believe that if we try to get them all solved in the > next few weeks we're likely to end up getting backed into some choices > by time pressure that we may later regret bitterly. Since I created > the deadlock issues that this patch is facing, I'm willing to try to > help solve them, but I think it's going to require considerable and > delicate surgery, and I don't think doing that under time pressure is > a good idea. > > From a fairness point of view, a patch that's not in reviewable shape > on March 1st should really be pushed out, and we're several days past > that. > Agreed. There is surely room to discuss the design further, and it will take a long time. It's good to push this out to CF 2017-07. Thank you for the comment. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Block level parallel vacuum WIP
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada wrote: > Yes, it's taking time to update the logic and measurement, but it's > coming along. Also I'm working on changing the deadlock detection. Will > post a new patch and measurement results. I think that we should push this patch out to v11. I think there are too many issues here to address in the limited time we have remaining this cycle, and I believe that if we try to get them all solved in the next few weeks we're likely to end up getting backed into some choices by time pressure that we may later regret bitterly. Since I created the deadlock issues that this patch is facing, I'm willing to try to help solve them, but I think it's going to require considerable and delicate surgery, and I don't think doing that under time pressure is a good idea. From a fairness point of view, a patch that's not in reviewable shape on March 1st should really be pushed out, and we're several days past that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Block level parallel vacuum WIP
On Fri, Mar 3, 2017 at 11:01 PM, David Steele wrote: > On 1/10/17 11:23 AM, Claudio Freire wrote: >> On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada >> wrote: Does this work negate the other work to allow VACUUM to use > 1GB memory? >>> >>> Partly yes. Because the memory space for dead TIDs needs to be allocated >>> in DSM before the vacuum workers launch, parallel lazy vacuum cannot use >>> such a variable amount of memory as that work does. But in >>> non-parallel lazy vacuum, that work would be effective. We might be >>> able to do a similar thing using DSA but I'm not sure that is better. >> >> I think it would work well with DSA as well. >> >> Just instead of having a single segment list, you'd have one per worker. >> >> Since workers work on disjoint tid sets, that shouldn't pose a problem. >> >> The segment lists can be joined together later rather efficiently >> (simple logical joining of the segment pointer arrays) for the index >> scan phases. > It's been a while since there was any movement on this patch and quite a > few issues have been raised. > > Have you tried the approaches that Claudio suggested? > Yes, it's taking time to update the logic and measurement, but it's coming along. Also I'm working on changing the deadlock detection. Will post a new patch and measurement results. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Block level parallel vacuum WIP
On 1/10/17 11:23 AM, Claudio Freire wrote: > On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada > wrote: >>> Does this work negate the other work to allow VACUUM to use > >>> 1GB memory? >> >> Partly yes. Because the memory space for dead TIDs needs to be allocated >> in DSM before the vacuum workers launch, parallel lazy vacuum cannot use >> such a variable amount of memory as that work does. But in >> non-parallel lazy vacuum, that work would be effective. We might be >> able to do a similar thing using DSA but I'm not sure that is better. > > I think it would work well with DSA as well. > > Just instead of having a single segment list, you'd have one per worker. > > Since workers work on disjoint tid sets, that shouldn't pose a problem. > > The segment lists can be joined together later rather efficiently > (simple logical joining of the segment pointer arrays) for the index > scan phases. It's been a while since there was any movement on this patch and quite a few issues have been raised. Have you tried the approaches that Claudio suggested? -- -David da...@pgmasters.net
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada wrote: >> Does this work negate the other work to allow VACUUM to use > >> 1GB memory? > > Partly yes. Because the memory space for dead TIDs needs to be allocated > in DSM before the vacuum workers launch, parallel lazy vacuum cannot use > such a variable amount of memory as that work does. But in > non-parallel lazy vacuum, that work would be effective. We might be > able to do a similar thing using DSA but I'm not sure that is better. I think it would work well with DSA as well. Just instead of having a single segment list, you'd have one per worker. Since workers work on disjoint tid sets, that shouldn't pose a problem. The segment lists can be joined together later rather efficiently (simple logical joining of the segment pointer arrays) for the index scan phases.
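The per-worker segment lists suggested above, joined by concatenating only the segment pointer arrays, can be sketched as follows. The struct and function names here are assumptions for illustration, not PostgreSQL's actual data structures; the point is that the join copies pointers, never the TID data itself.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* One dead-TID segment list per worker; a "segment" is an opaque chunk
 * of TIDs (illustrative type, not the server's). */
typedef struct
{
    void  **segments;
    int     nsegments;
} SegmentList;

/*
 * Join per-worker segment lists by concatenating their segment pointer
 * arrays -- no TID data is copied, only the pointers, which is why the
 * join is cheap relative to the index scan it feeds.
 */
SegmentList
join_segment_lists(const SegmentList *lists, int nlists)
{
    SegmentList joined = {NULL, 0};
    int         total = 0, i, pos = 0;

    for (i = 0; i < nlists; i++)
        total += lists[i].nsegments;

    joined.segments = malloc(sizeof(void *) * total);
    joined.nsegments = total;

    for (i = 0; i < nlists; i++)
    {
        memcpy(joined.segments + pos, lists[i].segments,
               sizeof(void *) * lists[i].nsegments);
        pos += lists[i].nsegments;
    }
    return joined;
}
```

Because the workers collect disjoint TID sets, concatenation in worker order is enough; no merge or deduplication step is needed before the index scan phase.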
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada wrote: > Attached is the result of the performance test with scale factor = 500 and the > test script I used. I measured each test four times and plotted the > average of the last three execution times to the sf_500.png file. When the table > has indexes, vacuum execution time is smallest when the number of indexes and the > parallel degree are the same. It does seem from those results that parallel heap scans aren't paying off, and in fact are hurting. It could be your I/O that's at odds with the parallel degree settings rather than the approach (i.e. your I/O system can't handle that many parallel scans), but in any case it does warrant a few more tests. I'd suggest you try to: 1. Disable parallel lazy vacuum, leaving parallel index scans enabled 2. Limit the parallel degree to the number of indexes, leaving parallel lazy vacuum enabled 3. Cap the lazy vacuum parallel degree by effective_io_concurrency, and the index scan parallel degree by the number of indexes And compare against your earlier test results. I suspect 1 could be the winner, but 3 has a chance too (if e_i_c is properly set up for your I/O system).
Re: [HACKERS] Block level parallel vacuum WIP
On Mon, Jan 9, 2017 at 6:01 PM, Simon Riggs wrote: > On 9 January 2017 at 08:48, Masahiko Sawada wrote: > >> I had not considered the necessity of deadlock detection support. > > It seems like a big potential win to scan multiple indexes in parallel. > > Does the design for collecting dead TIDs use a variable amount of > memory? No. Collecting dead TIDs and the calculation of max dead tuples are the same as in the current lazy vacuum. That is, the memory space for dead TIDs is allocated with a fixed size. In parallel lazy vacuum that memory space is allocated in dynamic shared memory; otherwise it is allocated in local memory. > Does this work negate the other work to allow VACUUM to use > > 1GB memory? Partly yes. Because the memory space for dead TIDs needs to be allocated in DSM before the vacuum workers launch, parallel lazy vacuum cannot use such a variable amount of memory as that work does. But in non-parallel lazy vacuum, that work would be effective. We might be able to do a similar thing using DSA but I'm not sure that is better. Attached is the result of the performance test with scale factor = 500 and the test script I used. I measured each test four times and plotted the average of the last three execution times to the sf_500.png file. When the table has indexes, vacuum execution time is smallest when the number of indexes and the parallel degree are the same. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center parallel_vacuum.sh Description: Bourne shell script
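The fixed-size dead-TID allocation described above can be sketched as a simple budget computation. The 6-byte TID layout and the kilobyte unit for maintenance_work_mem mirror PostgreSQL conventions, but the struct and function here are illustrative assumptions, not the server's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A TID identifies (block number, line offset).  This illustrative
 * struct mirrors the server's 6-byte layout but is not the real type.
 */
typedef struct
{
    uint16_t bi_hi;   /* block number, high 16 bits */
    uint16_t bi_lo;   /* block number, low 16 bits */
    uint16_t offnum;  /* line pointer offset within the block */
} DeadTupleTID;

/*
 * Number of dead TIDs that fit in the fixed memory budget.  Whether the
 * space lives in DSM (parallel) or local memory (non-parallel), the size
 * is computed once up front, which is why a variable-memory scheme does
 * not fit the parallel case.
 */
long
max_dead_tuples(long maintenance_work_mem_kb)
{
    return (maintenance_work_mem_kb * 1024L) / (long) sizeof(DeadTupleTID);
}
```

For example, a 64kB budget holds 10922 TIDs under this model; the leader must reserve that many slots in shared memory before any worker launches.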
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Jan 10, 2017 at 3:46 PM, Amit Kapila wrote: > On Mon, Jan 9, 2017 at 2:18 PM, Masahiko Sawada wrote: >> On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila wrote: >>> On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada >>> wrote: On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier wrote: > On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada > wrote: >> Yeah, I don't have a good solution for this problem so far. >> We might need to improve the group locking mechanism for the updating >> operation or come up with another approach to resolve this problem. >> For example, one possible idea is that the launcher process allocates >> the vm and fsm large enough in advance in order to avoid extension of the fork relations >> by parallel workers, but that doesn't resolve the fundamental problem. > I got some advice at PGConf.ASIA 2016 and started to work on this again. The biggest problem so far is the group locking. As I mentioned before, parallel vacuum workers could try to extend the same visibility map page at the same time. So we need to make group locking conflict in some cases, or need to eliminate the necessity of acquiring the extension lock. The attached 000 patch uses the former idea, which makes the group locking conflict between parallel workers when a parallel worker tries to acquire the extension lock on the same page. >>> >>> How are you planning to ensure the same in the deadlock detector? Currently, >>> the deadlock detector considers members from the same lock group as >>> non-blocking. If you think we don't need to make any changes in the >>> deadlock detector, then explain why so? >>> >> >> Thank you for the comment. >> I had not considered the necessity of deadlock detection support. But >> because lazy_scan_heap acquires the relation extension lock and >> releases it before acquiring another extension lock, I guess we don't >> need those changes for parallel lazy vacuum. Thoughts? 
>> > Okay, but it is quite possible that lazy_scan_heap is not able to > acquire the required lock as that is already acquired by another > process (which is not part of the group performing vacuum); then all the > processes in a group might need to run the deadlock detector code, wherein in > multiple places it has been assumed that group members won't > conflict. As an example, refer to the code in TopoSort where it is trying to > emit all groupmates together and IIRC, the basis of that part of the > code is that groupmates won't conflict with each other, and this patch will > break that assumption. I have not looked into the parallel vacuum > patch, but the changes in 000_make_group_locking_conflict_extend_lock_v2 > don't appear to be safe. Even if your parallel vacuum patch doesn't > need any change in the deadlock detector, then also as proposed it appears > that the changes in locking will behave the same for any of the operations > performing relation extension. So in future any parallel operation > (say parallel copy/insert) which involves the relation extension lock will > behave similarly. Is that okay, or are you assuming that the next > person developing any such feature should rethink this problem > and extend your solution to match their requirements? Thank you for the explanation. I agree that we should support deadlock detection in this patch as well, even if this feature doesn't actually need it. I'm going to extend the 000 patch to support deadlock detection. > > >> What do we actually gain from having the other parts of VACUUM execute >> in parallel? Does truncation happen faster in parallel? >> > > I think all CPU-intensive operations for the heap (like checking of > dead/live rows, processing of dead tuples, etc.) can be faster. Vacuum on a table with no indexes can be faster as well. 
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
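The locking rule being debated in the message above — members of the same lock group normally never block each other, while the proposed 000 patch makes relation extension locks conflict even within a group — can be sketched as a predicate. Everything here (enum values, function name) is an illustrative assumption, not the server's lock manager API.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative lock tag kinds (assumed names, not the real API). */
typedef enum { TAG_RELATION, TAG_RELATION_EXTEND } LockTagKind;

/*
 * Sketch of the conflict rule under discussion: conflicting lock modes
 * block as usual, except that members of the same lock group do not
 * block each other -- unless the lock is a relation extension lock,
 * which the proposed patch makes conflict even between groupmates.
 */
bool
locks_block(bool same_group, bool modes_conflict, LockTagKind tag)
{
    if (!modes_conflict)
        return false;                       /* compatible modes never block */
    if (same_group && tag != TAG_RELATION_EXTEND)
        return false;                       /* groupmates ordinarily don't block */
    return true;                            /* extension locks conflict anyway */
}
```

Amit's concern is precisely the middle branch: code such as TopoSort assumes groupmates never block, so carving out an exception for one tag kind ripples into the deadlock detector.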
Re: [HACKERS] Block level parallel vacuum WIP
On Mon, Jan 9, 2017 at 2:18 PM, Masahiko Sawada wrote: > On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila wrote: >> On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada >> wrote: >>> On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier >>> wrote: On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada wrote: > Yeah, I don't have a good solution for this problem so far. > We might need to improve the group locking mechanism for the updating > operation or come up with another approach to resolve this problem. > For example, one possible idea is that the launcher process allocates > the vm and fsm large enough in advance in order to avoid extension of the fork relations > by parallel workers, but that doesn't resolve the fundamental problem. >>> >>> I got some advice at PGConf.ASIA 2016 and started to work on this again. >>> >>> The biggest problem so far is the group locking. As I mentioned >>> before, parallel vacuum workers could try to extend the same visibility >>> map page at the same time. So we need to make group locking conflict >>> in some cases, or need to eliminate the necessity of acquiring the >>> extension lock. The attached 000 patch uses the former idea, which makes the >>> group locking conflict between parallel workers when a parallel worker >>> tries to acquire the extension lock on the same page. >>> >> >> How are you planning to ensure the same in the deadlock detector? Currently, >> the deadlock detector considers members from the same lock group as >> non-blocking. If you think we don't need to make any changes in the >> deadlock detector, then explain why so? >> > > Thank you for the comment. > I had not considered the necessity of deadlock detection support. But > because lazy_scan_heap acquires the relation extension lock and > releases it before acquiring another extension lock, I guess we don't > need those changes for parallel lazy vacuum. Thoughts? 
> Okay, but it is quite possible that lazy_scan_heap is not able to acquire the required lock as that is already acquired by another process (which is not part of the group performing vacuum); then all the processes in a group might need to run the deadlock detector code, wherein in multiple places it has been assumed that group members won't conflict. As an example, refer to the code in TopoSort where it is trying to emit all groupmates together and IIRC, the basis of that part of the code is that groupmates won't conflict with each other, and this patch will break that assumption. I have not looked into the parallel vacuum patch, but the changes in 000_make_group_locking_conflict_extend_lock_v2 don't appear to be safe. Even if your parallel vacuum patch doesn't need any change in the deadlock detector, then also as proposed it appears that the changes in locking will behave the same for any of the operations performing relation extension. So in future any parallel operation (say parallel copy/insert) which involves the relation extension lock will behave similarly. Is that okay, or are you assuming that the next person developing any such feature should rethink this problem and extend your solution to match their requirements? > What do we actually gain from having the other parts of VACUUM execute > in parallel? Does truncation happen faster in parallel? > I think all CPU-intensive operations for the heap (like checking of dead/live rows, processing of dead tuples, etc.) can be faster. > Can you give us some timings for performance of the different phases, > with varying levels of parallelism? I feel the timings depend on the kind of test we perform; for example, if there are many dead rows in the heap and there are few indexes on a table, we might see that the gain from doing a parallel heap scan is substantial. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
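[Editor's note] The conflict rule under discussion can be illustrated with a toy sketch (not PostgreSQL source; all names here are hypothetical): members of the same lock group normally do not block each other, but the 000 patch makes relation-extension locks conflict even within a group, so that two parallel vacuum workers cannot extend the same visibility-map page at once.

```python
# Toy conflict check illustrating the rule proposed by the 000 patch.
# Group-mates normally never block each other; relation-extension locks
# are the exception and conflict even inside a lock group.
from dataclasses import dataclass

LOCKTAG_RELATION_EXTEND = "relation_extend"  # hypothetical tag name

@dataclass
class LockRequest:
    group_id: int        # parallel lock group the backend belongs to
    tag_type: str        # kind of object the lock covers
    exclusive: bool      # toy lock modes: exclusive vs shared

def conflicts(held: LockRequest, req: LockRequest) -> bool:
    """Return True if `req` must wait behind `held`."""
    if not (held.exclusive or req.exclusive):
        return False                          # shared/shared never conflicts
    if held.group_id == req.group_id and held.tag_type != LOCKTAG_RELATION_EXTEND:
        return False                          # group-mates normally don't block
    return True                               # extension locks conflict in-group
```

As Amit points out above, relaxing this rule is exactly what breaks the deadlock detector's assumption that group-mates never conflict.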
Re: [HACKERS] Block level parallel vacuum WIP
On Sat, Jan 7, 2017 at 7:18 AM, Claudio Freire wrote: > On Fri, Jan 6, 2017 at 2:38 PM, Masahiko Sawada wrote: >> table_size | indexes | parallel_degree | time >> +-+-+-- >> 6.5GB | 0 | 1 | 00:00:14 >> 6.5GB | 0 | 2 | 00:00:02 >> 6.5GB | 0 | 4 | 00:00:02 > > Those numbers look highly suspect. > > Are you sure you're not experiencing caching effects? (ie: maybe you > ran the second and third vacuums after the first, and didn't flush the > page cache, so the table was cached) > >> 6.5GB | 2 | 1 | 00:02:18 >> 6.5GB | 2 | 2 | 00:00:38 >> 6.5GB | 2 | 4 | 00:00:46 > ... >> 13GB | 0 | 1 | 00:03:52 >> 13GB | 0 | 2 | 00:00:49 >> 13GB | 0 | 4 | 00:00:50 > .. >> 13GB | 2 | 1 | 00:12:42 >> 13GB | 2 | 2 | 00:01:17 >> 13GB | 2 | 4 | 00:02:12 > > These would also be consistent with caching effects Since I ran the vacuum after updating all pages in the table, I assumed all the data was in either shared buffers or the OS cache. But in any case I measured only once, so this result is not reliable. I'll test again and take multiple measurements. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
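[Editor's note] One way to rule out the caching effects Claudio describes is to restart the server and flush the OS page cache between runs, so each vacuum starts cold. A rough sketch of such a benchmarking run (the `bench` database and `test_table` names are placeholders; `drop_caches` is Linux-specific and needs root):

```shell
# Cold-cache benchmarking sketch: restart PostgreSQL (clears shared_buffers),
# then drop the Linux page cache, then time a single vacuum run.
pg_ctl -D "$PGDATA" restart
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
time psql -d bench -c 'VACUUM (PARALLEL 2) test_table;'
```

Repeating this sequence for each parallel_degree setting gives comparable cold-cache numbers.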
Re: [HACKERS] Block level parallel vacuum WIP
On 9 January 2017 at 08:48, Masahiko Sawada wrote: > I had not considered necessity of dead lock detection support. It seems like a big potential win to scan multiple indexes in parallel. What do we actually gain from having the other parts of VACUUM execute in parallel? Does truncation happen faster in parallel? ISTM we might reduce the complexity of this if there is no substantial gain. Can you give us some timings for performance of the different phases, with varying levels of parallelism? Does the design for collecting dead TIDs use a variable amount of memory? Does this work negate the other work to allow VACUUM to use > 1GB memory? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila wrote: > On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada > wrote: >> On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier >> wrote: >>> On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada >>> wrote: Yeah, I don't have a good solution for this problem so far. We might need to improve the group locking mechanism for the updating operation or come up with another approach to resolve this problem. For example, one possible idea is that the launcher process allocates the vm and fsm sufficiently in advance in order to avoid extension of these forks by parallel workers, but that doesn't resolve the fundamental problem. >>> >> >> I got some advice at PGConf.ASIA 2016 and started to work on this again. >> >> The biggest problem so far is the group locking. As I mentioned >> before, parallel vacuum workers could try to extend the same visibility >> map page at the same time. So we need to make group locking conflict >> in some cases, or eliminate the necessity of acquiring the >> extension lock. The attached 000 patch uses the former idea, which makes >> group locking conflict between parallel workers when a parallel worker >> tries to acquire the extension lock on the same page. >> > > How are you planning to ensure the same in the deadlock detector? Currently, > the deadlock detector considers members of the same lock group as > non-blocking. If you think we don't need to make any changes in the > deadlock detector, then explain why so? > Thank you for the comment. I had not considered the necessity of deadlock detection support. But because lazy_scan_heap acquires the relation extension lock and releases it before acquiring another extension lock, I guess we don't need those changes for parallel lazy vacuum. Thoughts? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada wrote: > On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier > wrote: >> On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada >> wrote: >>> Yeah, I don't have a good solution for this problem so far. >>> We might need to improve the group locking mechanism for the updating >>> operation or come up with another approach to resolve this problem. >>> For example, one possible idea is that the launcher process allocates >>> the vm and fsm sufficiently in advance in order to avoid extension of these forks >>> by parallel workers, but that doesn't resolve the fundamental problem. >> > > I got some advice at PGConf.ASIA 2016 and started to work on this again. > > The biggest problem so far is the group locking. As I mentioned > before, parallel vacuum workers could try to extend the same visibility > map page at the same time. So we need to make group locking conflict > in some cases, or eliminate the necessity of acquiring the > extension lock. The attached 000 patch uses the former idea, which makes > group locking conflict between parallel workers when a parallel worker > tries to acquire the extension lock on the same page. > How are you planning to ensure the same in the deadlock detector? Currently, the deadlock detector considers members of the same lock group as non-blocking. If you think we don't need to make any changes in the deadlock detector, then please explain why. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Fri, Jan 6, 2017 at 2:38 PM, Masahiko Sawadawrote: > table_size | indexes | parallel_degree | time > +-+-+-- > 6.5GB | 0 | 1 | 00:00:14 > 6.5GB | 0 | 2 | 00:00:02 > 6.5GB | 0 | 4 | 00:00:02 Those numbers look highly suspect. Are you sure you're not experiencing caching effects? (ie: maybe you ran the second and third vacuums after the first, and didn't flush the page cache, so the table was cached) > 6.5GB | 2 | 1 | 00:02:18 > 6.5GB | 2 | 2 | 00:00:38 > 6.5GB | 2 | 4 | 00:00:46 ... > 13GB | 0 | 1 | 00:03:52 > 13GB | 0 | 2 | 00:00:49 > 13GB | 0 | 4 | 00:00:50 .. > 13GB | 2 | 1 | 00:12:42 > 13GB | 2 | 2 | 00:01:17 > 13GB | 2 | 4 | 00:02:12 These would also be consistent with caching effects -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier wrote: > On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada > wrote: >> Yeah, I don't have a good solution for this problem so far. >> We might need to improve the group locking mechanism for the updating >> operation or come up with another approach to resolve this problem. >> For example, one possible idea is that the launcher process allocates >> the vm and fsm sufficiently in advance in order to avoid extension of these forks >> by parallel workers, but that doesn't resolve the fundamental problem. > I got some advice at PGConf.ASIA 2016 and started to work on this again. The biggest problem so far is the group locking. As I mentioned before, parallel vacuum workers could try to extend the same visibility map page at the same time. So we need to make group locking conflict in some cases, or eliminate the necessity of acquiring the extension lock. The attached 000 patch uses the former idea, which makes group locking conflict between parallel workers when a parallel worker tries to acquire the extension lock on the same page. I'm not sure this is the best idea, but it's very simple and enough to support parallel vacuum. A smarter idea may be needed when we want to support parallel DML and so on. The 001 patch adds a parallel option to the VACUUM command. As Robert suggested before, the parallel option is set with a parallel degree: =# VACUUM (PARALLEL 4) table_name; ...means 4 background processes are launched and each background process executes lazy_scan_heap while the launcher (leader) process waits for all vacuum workers to finish. If N = 1 or the parallel option is omitted, the leader process itself executes lazy_scan_heap.

Internal Design
===============
I changed the parallel vacuum internal design. Garbage collection on the table is processed in block-level parallel. For tables with indexes, each index on the table is assigned to a vacuum worker, and all garbage in an index is processed by its assigned vacuum worker.
All the space for the arrays of dead tuple TIDs used by the vacuum workers is allocated in dynamic shared memory by the launcher process. Each vacuum worker stores dead tuple locations into its own dead tuple array without locking; the TIDs within one array are ordered by TID. Note that the entire dead tuple space, i.e. the collection of per-worker arrays, is not globally ordered. If the table has indexes, all dead tuple TIDs need to be shared with all vacuum workers before reclaiming dead tuples actually starts, and this data should be cleared after all vacuum workers have finished using it. So I put two synchronization points: one before reclaiming dead tuples and one after they have been reclaimed. At these points each parallel vacuum worker waits for all other workers to reach the same point. Once all vacuum workers have reached it, each worker resumes the next operation. For example, if a table has five indexes and we execute parallel lazy vacuum on that table with three vacuum workers, two of the three workers are assigned two indexes each and the remaining worker is assigned one index. Once the dead tuples collected by all vacuum workers reach maintenance_work_mem in size, the workers start to reclaim dead tuples from the table and indexes. A vacuum worker that is assigned one index finishes (probably first) and sleeps until the other two vacuum workers finish vacuuming. If the table has no indexes, each parallel vacuum worker vacuums each page as we go.

Performance
===========
I measured the execution time of vacuum on a dirty table with several parallel degrees in my modest environment.
table_size | indexes | parallel_degree | time
-----------+---------+-----------------+---------
 6.5GB     |       0 |               1 | 00:00:14
 6.5GB     |       0 |               2 | 00:00:02
 6.5GB     |       0 |               4 | 00:00:02
 6.5GB     |       1 |               1 | 00:00:13
 6.5GB     |       1 |               2 | 00:00:15
 6.5GB     |       1 |               4 | 00:00:18
 6.5GB     |       2 |               1 | 00:02:18
 6.5GB     |       2 |               2 | 00:00:38
 6.5GB     |       2 |               4 | 00:00:46
 13GB      |       0 |               1 | 00:03:52
 13GB      |       0 |               2 | 00:00:49
 13GB      |       0 |               4 | 00:00:50
 13GB      |       1 |               1 | 00:01:41
 13GB      |       1 |               2 | 00:01:59
 13GB      |       1 |               4 | 00:01:24
 13GB      |       2 |               1 | 00:12:42
 13GB      |       2 |               2 | 00:01:17
 13GB      |       2 |               4 | 00:02:12

In my measurements, vacuum execution time improved in some cases but did not improve in the cases where indexes = 1. I'll investigate the cause.

ToDo
====
* Vacuum progress support.
* Storage parameter support, perhaps a parallel_vacuum_workers parameter which allows autovacuum to use parallel vacuum on a specified table.

I registered this for the next CF. Regards, -- Masahiko
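[Editor's note] The two synchronization points described above can be sketched with Python threads standing in for PostgreSQL background workers (illustrative only; the real patch would use shared memory and PostgreSQL's own synchronization primitives):

```python
# Workers collect dead TIDs into per-worker arrays (the "DSM" dict below),
# then all wait at a barrier before any of them starts reclaiming, and
# wait again before the shared arrays are cleared for the next round.
import threading

NWORKERS = 3
barrier = threading.Barrier(NWORKERS)
dead_tids = {i: [] for i in range(NWORKERS)}   # per-worker arrays in shared memory
log = []
log_lock = threading.Lock()

def worker(wid, pages):
    for page in pages:                  # phase 1: collect dead tuple TIDs
        dead_tids[wid].append(page)
    barrier.wait()                      # sync point 1: before reclaiming
    with log_lock:
        log.append(("reclaim", wid))    # phase 2: reclaim from heap/indexes
    barrier.wait()                      # sync point 2: before clearing
    dead_tids[wid].clear()

threads = [threading.Thread(target=worker, args=(i, range(i, 9, NWORKERS)))
           for i in range(NWORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The barrier guarantees that no worker begins reclaiming before every worker's TID array is complete, and that no array is cleared while another worker might still read it.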
Re: [HACKERS] Block level parallel vacuum WIP
On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada wrote: > Yeah, I don't have a good solution for this problem so far. > We might need to improve group locking mechanism for the updating > operation or came up with another approach to resolve this problem. > For example, one possible idea is that the launcher process allocates > vm and fsm enough in advance in order to avoid extending fork relation > by parallel workers, but it's not resolve fundamental problem. Marked as returned with feedback because of lack of activity and... Feedback provided. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Thu, Sep 15, 2016 at 11:44 PM, Robert Haas wrote: > On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada > wrote: >> I'm implementing this patch but I need to resolve the problem >> regarding lock for extension by multiple parallel workers. >> In parallel vacuum, multiple workers could try to acquire the >> exclusive lock for extension on same relation. >> Since acquiring the exclusive lock for extension by multiple workers >> is regarded as locking from same locking group, multiple workers >> extend fsm or vm at the same time and end up with error. >> I thought that it might be involved with parallel update operation, so >> I'd like to discuss about this in advance. > > Hmm, yeah. This is one of the reasons why parallel queries currently > need to be entirely read-only. I think there's a decent argument that > the relation extension lock mechanism should be entirely redesigned: > the current system is neither particularly fast nor particularly > elegant, and some of the services that the heavyweight lock manager > provides, such as deadlock detection, are not relevant for relation > extension locks. I'm not sure if we should try to fix that right away > or come up with some special-purpose hack for vacuum, such as having > backends use condition variables to take turns calling > visibilitymap_set(). > Yeah, I don't have a good solution for this problem so far. We might need to improve the group locking mechanism for the updating operation or come up with another approach to resolve this problem. For example, one possible idea is that the launcher process allocates the vm and fsm sufficiently in advance in order to avoid extension of these forks by parallel workers, but that doesn't resolve the fundamental problem. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawadawrote: > I'm implementing this patch but I need to resolve the problem > regarding lock for extension by multiple parallel workers. > In parallel vacuum, multiple workers could try to acquire the > exclusive lock for extension on same relation. > Since acquiring the exclusive lock for extension by multiple workers > is regarded as locking from same locking group, multiple workers > extend fsm or vm at the same time and end up with error. > I thought that it might be involved with parallel update operation, so > I'd like to discuss about this in advance. Hmm, yeah. This is one of the reasons why parallel queries currently need to be entirely read-only. I think there's a decent argument that the relation extension lock mechanism should be entirely redesigned: the current system is neither particularly fast nor particularly elegant, and some of the services that the heavyweight lock manager provides, such as deadlock detection, are not relevant for relation extension locks. I'm not sure if we should try to fix that right away or come up with some special-purpose hack for vacuum, such as having backends use condition variables to take turns calling visibilitymap_set(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
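[Editor's note] Robert's "take turns calling visibilitymap_set()" idea can be sketched with a condition variable, using Python threads in place of backends (illustrative only; PostgreSQL's condition-variable facility has a different API):

```python
# Workers that need to set a visibility-map bit wait their turn on a
# condition variable instead of contending for the extension lock.
import threading

cond = threading.Condition()
turn = [0]                                # index of the worker whose turn it is
applied = []                              # order in which workers ran

def visibilitymap_set_in_turn(wid, nworkers):
    with cond:
        while turn[0] != wid:             # sleep until it's our turn
            cond.wait()
        applied.append(wid)               # the actual vm update would go here
        turn[0] = (wid + 1) % nworkers    # pass the turn to the next worker
        cond.notify_all()

# Start the workers out of order to show that the turn order still holds.
threads = [threading.Thread(target=visibilitymap_set_in_turn, args=(i, 3))
           for i in (2, 0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Even though the threads start in the order 2, 0, 1, the condition variable serializes the updates in worker order.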
Re: [HACKERS] Block level parallel vacuum WIP
On Sat, Sep 10, 2016 at 7:44 PM, Pavan Deolaseewrote: > > > On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier > wrote: >> >> On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila >> wrote: >> > On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier >> > wrote: >> >> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada >> >> wrote: >> >>> As for PoC, I implemented parallel vacuum so that each worker >> >>> processes both 1 and 2 phases for particular block range. >> >>> Suppose we vacuum 1000 blocks table with 4 workers, each worker >> >>> processes 250 consecutive blocks in phase 1 and then reclaims dead >> >>> tuples from heap and indexes (phase 2). >> >> >> >> So each worker is assigned a range of blocks, and processes them in >> >> parallel? This does not sound performance-wise. I recall Robert and >> >> Amit emails on the matter for sequential scan that this would suck >> >> performance out particularly for rotating disks. >> >> >> > >> > The implementation in patch is same as we have initially thought for >> > sequential scan, but turned out that it is not good way to do because >> > it can lead to inappropriate balance of work among workers. Suppose >> > one worker is able to finish it's work, it won't be able to do more. >> >> Ah, so it was the reason. Thanks for confirming my doubts on what is >> proposed. >> -- > > > I believe Sawada-san has got enough feedback on the design to work out the > next steps. It seems natural that the vacuum workers are assigned a portion > of the heap to scan and collect dead tuples (similar to what patch does) and > the same workers to be responsible for the second phase of heap scan. Yeah, thank you for the feedback. > But as far as index scans are concerned, I agree with Tom that the best > strategy is to assign a different index to each worker process and let them > vacuum indexes in parallel. 
> That way the work for each worker process is > clearly cut out and they don't contend for the same resources, which means > the first patch to allow multiple backends to wait for a cleanup buffer is > not required. Later we could extend it further such that multiple workers can > vacuum a single index by splitting the work on physical boundaries, but even > that will ensure clear demarcation of work and hence no contention on index > blocks. I also agree with this idea. Each worker vacuums a different index, and then the leader process should update all index statistics after parallel mode has exited. I'm implementing this patch, but I need to resolve the problem regarding the extension lock taken by multiple parallel workers. In parallel vacuum, multiple workers could try to acquire the exclusive lock for extension on the same relation. Since acquisition of the exclusive extension lock by multiple workers is regarded as locking from the same lock group, multiple workers extend the fsm or vm at the same time and end up with an error. I thought this might also be relevant to parallel update operations, so I'd like to discuss it in advance. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
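[Editor's note] The one-index-per-worker strategy agreed on above amounts to a simple partitioning of whole indexes among workers. A minimal sketch (hypothetical helper, not PostgreSQL code):

```python
# Round-robin assignment of whole indexes to vacuum workers. With five
# indexes and three workers, two workers get two indexes each and one
# worker gets one, matching the example in Sawada-san's design email.
def assign_indexes(index_names, nworkers):
    """Map worker id -> list of indexes that worker will vacuum."""
    assignment = {w: [] for w in range(nworkers)}
    for i, name in enumerate(index_names):
        assignment[i % nworkers].append(name)
    return assignment

plan = assign_indexes(["idx_a", "idx_b", "idx_c", "idx_d", "idx_e"], 3)
```

Since each worker owns its indexes outright, no two workers ever touch the same index blocks, which is what removes the need for the cleanup-buffer-wait patch.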
Re: [HACKERS] Block level parallel vacuum WIP
On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquierwrote: > On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila > wrote: > > On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier > > wrote: > >> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada > wrote: > >>> As for PoC, I implemented parallel vacuum so that each worker > >>> processes both 1 and 2 phases for particular block range. > >>> Suppose we vacuum 1000 blocks table with 4 workers, each worker > >>> processes 250 consecutive blocks in phase 1 and then reclaims dead > >>> tuples from heap and indexes (phase 2). > >> > >> So each worker is assigned a range of blocks, and processes them in > >> parallel? This does not sound performance-wise. I recall Robert and > >> Amit emails on the matter for sequential scan that this would suck > >> performance out particularly for rotating disks. > >> > > > > The implementation in patch is same as we have initially thought for > > sequential scan, but turned out that it is not good way to do because > > it can lead to inappropriate balance of work among workers. Suppose > > one worker is able to finish it's work, it won't be able to do more. > > Ah, so it was the reason. Thanks for confirming my doubts on what is > proposed. > -- > I believe Sawada-san has got enough feedback on the design to work out the next steps. It seems natural that the vacuum workers are assigned a portion of the heap to scan and collect dead tuples (similar to what patch does) and the same workers to be responsible for the second phase of heap scan. But as far as index scans are concerned, I agree with Tom that the best strategy is to assign a different index to each worker process and let them vacuum indexes in parallel. That way the work for each worker process is clearly cut out and they don't contend for the same resources, which means the first patch to allow multiple backends to wait for a cleanup buffer is not required. 
Later we could extend it further such that multiple workers can vacuum a single index by splitting the work on physical boundaries, but even that will ensure clear demarcation of work and hence no contention on index blocks. ISTM this will require further work and it probably makes sense to mark the patch as "Returned with feedback" for now. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapilawrote: > On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier > wrote: >> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada >> wrote: >>> As for PoC, I implemented parallel vacuum so that each worker >>> processes both 1 and 2 phases for particular block range. >>> Suppose we vacuum 1000 blocks table with 4 workers, each worker >>> processes 250 consecutive blocks in phase 1 and then reclaims dead >>> tuples from heap and indexes (phase 2). >> >> So each worker is assigned a range of blocks, and processes them in >> parallel? This does not sound performance-wise. I recall Robert and >> Amit emails on the matter for sequential scan that this would suck >> performance out particularly for rotating disks. >> > > The implementation in patch is same as we have initially thought for > sequential scan, but turned out that it is not good way to do because > it can lead to inappropriate balance of work among workers. Suppose > one worker is able to finish it's work, it won't be able to do more. Ah, so it was the reason. Thanks for confirming my doubts on what is proposed. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On 2016-08-23 12:17:30 -0400, Robert Haas wrote: > On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera >wrote: > > Robert Haas wrote: > >> 2. When you finish the heap scan, or when the array of dead tuple IDs > >> is full (or very nearly full?), perform a cycle of index vacuuming. > >> For now, have each worker process a separate index; extra workers just > >> wait. Perhaps use the condition variable patch that I posted > >> previously to make the workers wait. Then resume the parallel heap > >> scan, if not yet done. > > > > At least btrees should easily be scannable in parallel, given that we > > process them in physical order rather than logically walk the tree. So > > if there are more workers than indexes, it's possible to put more than > > one worker on the same index by carefully indicating each to stop at a > > predetermined index page number. > > Well that's fine if we figure it out, but I wouldn't try to include it > in the first patch. Let's make VACUUM parallel one step at a time. Given that index scan(s) are, in my experience, way more often the bottleneck than the heap-scan(s), I'm not sure that order is the best. The heap-scan benefits from the VM, the index scans don't. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
Robert Haas wrote: > On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera >wrote: > > Robert Haas wrote: > >> 2. When you finish the heap scan, or when the array of dead tuple IDs > >> is full (or very nearly full?), perform a cycle of index vacuuming. > >> For now, have each worker process a separate index; extra workers just > >> wait. Perhaps use the condition variable patch that I posted > >> previously to make the workers wait. Then resume the parallel heap > >> scan, if not yet done. > > > > At least btrees should easily be scannable in parallel, given that we > > process them in physical order rather than logically walk the tree. So > > if there are more workers than indexes, it's possible to put more than > > one worker on the same index by carefully indicating each to stop at a > > predetermined index page number. > > Well that's fine if we figure it out, but I wouldn't try to include it > in the first patch. Let's make VACUUM parallel one step at a time. Sure, just putting the idea out there. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrerawrote: > Robert Haas wrote: >> 2. When you finish the heap scan, or when the array of dead tuple IDs >> is full (or very nearly full?), perform a cycle of index vacuuming. >> For now, have each worker process a separate index; extra workers just >> wait. Perhaps use the condition variable patch that I posted >> previously to make the workers wait. Then resume the parallel heap >> scan, if not yet done. > > At least btrees should easily be scannable in parallel, given that we > process them in physical order rather than logically walk the tree. So > if there are more workers than indexes, it's possible to put more than > one worker on the same index by carefully indicating each to stop at a > predetermined index page number. Well that's fine if we figure it out, but I wouldn't try to include it in the first patch. Let's make VACUUM parallel one step at a time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 10:50 PM, Robert Haas wrote: > On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada > wrote: >> I'd like to propose block level parallel VACUUM. >> This feature makes VACUUM possible to use multiple CPU cores. > > Great. This is something that I have thought about, too. Andres and > Heikki recommended it as a project to me a few PGCons ago. > >> As for PoC, I implemented parallel vacuum so that each worker >> processes both 1 and 2 phases for particular block range. >> Suppose we vacuum 1000 blocks table with 4 workers, each worker >> processes 250 consecutive blocks in phase 1 and then reclaims dead >> tuples from heap and indexes (phase 2). >> To use visibility map efficiency, each worker scan particular block >> range of relation and collect dead tuple locations. >> After each worker finished task, the leader process gathers these >> vacuum statistics information and update relfrozenxid if possible. > > This doesn't seem like a good design, because it adds a lot of extra > index scanning work. What I think you should do is: > > 1. Use a parallel heap scan (heap_beginscan_parallel) to let all > workers scan in parallel. Allocate a DSM segment to store the control > structure for this parallel scan plus an array for the dead tuple IDs > and a lock to protect the array. > > 2. When you finish the heap scan, or when the array of dead tuple IDs > is full (or very nearly full?), perform a cycle of index vacuuming. > For now, have each worker process a separate index; extra workers just > wait. Perhaps use the condition variable patch that I posted > previously to make the workers wait. Then resume the parallel heap > scan, if not yet done. > > Later, we can try to see if there's a way to have multiple workers > work together to vacuum a single index. But the above seems like a > good place to start. Thank you for the advice. That's what I had thought of as an alternative design; I will change the patch to this design. 
>> I also changed the buffer lock infrastructure so that multiple >> processes can wait for cleanup lock on a buffer. > > You won't need this if you proceed as above, which is probably a good thing. Right. > >> And the new GUC parameter vacuum_parallel_workers controls the number >> of vacuum workers. > > I suspect that for autovacuum there is little reason to use parallel > vacuum, since most of the time we are trying to slow vacuum down, not > speed it up. I'd be inclined, for starters, to just add a PARALLEL > option to the VACUUM command, for when people want to speed up > parallel vacuums. Perhaps > > VACUUM (PARALLEL 4) relation; > > ...could mean to vacuum the relation with the given number of workers, and: > > VACUUM (PARALLEL) relation; > > ...could mean to vacuum the relation in parallel with the system > choosing the number of workers - 1 worker per index is probably a good > starting formula, though it might need some refinement. It looks convenient. I was thinking that we could manage the number of parallel workers per table using this parameter for autovacuum, like: ALTER TABLE relation SET (parallel_vacuum_workers = 2) Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block level parallel vacuum WIP
Robert Haas wrote: > 2. When you finish the heap scan, or when the array of dead tuple IDs > is full (or very nearly full?), perform a cycle of index vacuuming. > For now, have each worker process a separate index; extra workers just > wait. Perhaps use the condition variable patch that I posted > previously to make the workers wait. Then resume the parallel heap > scan, if not yet done. At least btrees should easily be scannable in parallel, given that we process them in physical order rather than logically walk the tree. So if there are more workers than indexes, it's possible to put more than one worker on the same index by carefully indicating each to stop at a predetermined index page number. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
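[Editor's note] Álvaro's idea of putting several workers on one btree relies on the vacuum scan running in physical page order, so the index can be cut into contiguous page ranges with predetermined stop points. A minimal sketch of that partitioning (hypothetical helper, not PostgreSQL code):

```python
# Split an index's physical pages [0, npages) into nworkers contiguous
# (start, stop) ranges; each worker scans from its start page and stops
# at its predetermined stop page.
def split_pages(npages, nworkers):
    base, extra = divmod(npages, nworkers)
    ranges, start = [], 0
    for w in range(nworkers):
        size = base + (1 if w < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))   # stop is exclusive
        start += size
    return ranges
```

Each range is independent, so workers never contend on index blocks, which is the property Álvaro highlights.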
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 9:40 PM, Alexander Korotkov wrote: > On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane wrote: >> Claudio Freire writes: >> > Not only that, but from your description (I haven't read the patch, >> > sorry), you'd be scanning the whole index multiple times (one per >> > worker). >> >> What about pointing each worker at a separate index? Obviously the >> degree of concurrency during index cleanup is then limited by the >> number of indexes, but that doesn't seem like a fatal problem. > > > +1 > We could eventually need some effective way of parallelizing vacuum of > single index. > But pointing each worker at separate index seems to be fair enough for > majority of cases. > Alternatively, we can improve vacuum of a single index by changing the data representation of dead tuples to a bitmap. That can reduce the number of whole-index scans during vacuum and make checking index items against the dead tuples faster. This is listed on the Todo list and I've implemented it. Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
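[Editor's note] The bitmap idea above can be sketched as follows (illustrative data structures, not the patch's actual ones). During an index scan, every index tuple's heap TID must be tested against the collected dead tuples; a per-block bitmap gives O(1) membership checks and bounded memory, versus a binary search per probe in a sorted TID array:

```python
# Compare the two dead-tuple representations: a sorted (block, offset)
# TID array probed with binary search, and a per-block bitmap probed
# with a single bit test.
from bisect import bisect_left

def tid_in_sorted_array(dead_tids, tid):
    i = bisect_left(dead_tids, tid)           # O(log n) per probe
    return i < len(dead_tids) and dead_tids[i] == tid

class DeadTupleBitmap:
    def __init__(self):
        self.blocks = {}                      # block number -> int used as bitset

    def add(self, block, offset):
        self.blocks[block] = self.blocks.get(block, 0) | (1 << offset)

    def contains(self, block, offset):        # O(1) per probe
        return bool(self.blocks.get(block, 0) & (1 << offset))
```

Because the bitmap's size depends on the number of heap blocks rather than the number of dead tuples, it also avoids the maintenance_work_mem-driven multiple index scans that a full TID array can force.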
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada wrote:
> I'd like to propose block level parallel VACUUM.
> This feature makes it possible for VACUUM to use multiple CPU cores.

Great. This is something that I have thought about, too. Andres and Heikki recommended it as a project to me a few PGCons ago.

> As a PoC, I implemented parallel vacuum so that each worker
> processes both phases 1 and 2 for a particular block range.
> Suppose we vacuum a 1000-block table with 4 workers: each worker
> processes 250 consecutive blocks in phase 1 and then reclaims dead
> tuples from the heap and indexes (phase 2).
> To use the visibility map efficiently, each worker scans a particular
> block range of the relation and collects dead tuple locations.
> After each worker finishes its task, the leader process gathers the
> vacuum statistics and updates relfrozenxid if possible.

This doesn't seem like a good design, because it adds a lot of extra index scanning work. What I think you should do is:

1. Use a parallel heap scan (heap_beginscan_parallel) to let all workers scan in parallel. Allocate a DSM segment to store the control structure for this parallel scan plus an array for the dead tuple IDs and a lock to protect the array.

2. When you finish the heap scan, or when the array of dead tuple IDs is full (or very nearly full?), perform a cycle of index vacuuming. For now, have each worker process a separate index; extra workers just wait. Perhaps use the condition variable patch that I posted previously to make the workers wait. Then resume the parallel heap scan, if not yet done.

Later, we can try to see if there's a way to have multiple workers work together to vacuum a single index. But the above seems like a good place to start.

> I also changed the buffer lock infrastructure so that multiple
> processes can wait for a cleanup lock on a buffer.

You won't need this if you proceed as above, which is probably a good thing.
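To make the control flow of the two steps above concrete, here is a single-process simulation — my sketch, not code from any patch — of how the heap scan alternates with index-vacuuming cycles when the dead-tuple array fills:

```c
#include <assert.h>

/* Sketch: count how many index-vacuuming cycles the proposed design
 * performs.  In the real design the heap scan is parallel and each
 * cycle assigns one worker per index; here we only model the phases. */
static int
count_index_vacuum_cycles(int nblocks, int dead_per_block, int array_capacity)
{
    int     ndead = 0;          /* dead-tuple IDs accumulated so far */
    int     cycles = 0;
    int     blk;

    for (blk = 0; blk < nblocks; blk++)
    {
        if (ndead + dead_per_block > array_capacity)
        {
            cycles++;           /* all workers switch to index vacuuming */
            ndead = 0;          /* then resume the (parallel) heap scan */
        }
        ndead += dead_per_block;
    }
    if (ndead > 0)
        cycles++;               /* final cycle after the heap scan ends */
    return cycles;
}
```

The point of the design is visible here: each index is scanned once per cycle, so the number of cycles — not the number of workers — determines the total index-scanning work.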
> And the new GUC parameter vacuum_parallel_workers controls the number
> of vacuum workers.

I suspect that for autovacuum there is little reason to use parallel vacuum, since most of the time we are trying to slow vacuum down, not speed it up. I'd be inclined, for starters, to just add a PARALLEL option to the VACUUM command, for when people want to speed up their vacuums. Perhaps:

VACUUM (PARALLEL 4) relation;

...could mean to vacuum the relation with the given number of workers, and:

VACUUM (PARALLEL) relation;

...could mean to vacuum the relation in parallel with the system choosing the number of workers - 1 worker per index is probably a good starting formula, though it might need some refinement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
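The "1 worker per index" starting formula, capped by some system-wide worker limit, might look like the following. This is purely illustrative — the function name, the cap, and the lower bound are all assumptions, not anything from a patch:

```c
#include <assert.h>

/* Hypothetical sketch of a default worker count for VACUUM (PARALLEL):
 * one worker per index, capped by a system-wide limit on background
 * workers.  Names and policy are illustrative assumptions. */
static int
choose_vacuum_workers(int nindexes, int max_workers)
{
    int     nworkers = nindexes;    /* 1 worker per index */

    if (nworkers > max_workers)
        nworkers = max_workers;     /* respect the system-wide cap */
    if (nworkers < 1)
        nworkers = 1;               /* always do the work somewhere */
    return nworkers;
}
```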
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier wrote:
> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada wrote:
>> As a PoC, I implemented parallel vacuum so that each worker
>> processes both phases 1 and 2 for a particular block range.
>> Suppose we vacuum a 1000-block table with 4 workers: each worker
>> processes 250 consecutive blocks in phase 1 and then reclaims dead
>> tuples from the heap and indexes (phase 2).
>
> So each worker is assigned a range of blocks, and processes them in
> parallel? This does not sound good performance-wise. I recall emails
> from Robert and Amit on the matter for sequential scan saying that
> this would suck performance out, particularly for rotating disks.

The implementation in the patch is the same as what we initially considered for parallel sequential scan, but that turned out not to be a good approach, because it can lead to an inappropriate balance of work among workers. If one worker finishes its range early, it won't be able to take on more work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
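The imbalance described above disappears if workers claim blocks dynamically from shared state — the approach the parallel heap scan machinery takes — instead of owning fixed ranges. A simplified single-process sketch (in PostgreSQL the counter would live in the DSM segment and be advanced under a lock or atomically):

```c
#include <assert.h>

/* Sketch of dynamic block assignment: a fast worker simply claims more
 * blocks, so no worker sits idle while others still have work. */
typedef struct
{
    unsigned    next_block;     /* next unclaimed block */
    unsigned    nblocks;        /* total blocks in the relation */
} SharedScanState;

/* returns the claimed block number, or -1 when the scan is done */
static int
claim_next_block(SharedScanState *scan)
{
    if (scan->next_block >= scan->nblocks)
        return -1;
    return (int) scan->next_block++;
}
```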
Re: [HACKERS] Block level parallel vacuum WIP
On 23.08.2016 15:41, Michael Paquier wrote:
> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada wrote:
>> As a PoC, I implemented parallel vacuum so that each worker
>> processes both phases 1 and 2 for a particular block range.
>> Suppose we vacuum a 1000-block table with 4 workers: each worker
>> processes 250 consecutive blocks in phase 1 and then reclaims dead
>> tuples from the heap and indexes (phase 2).
>
> So each worker is assigned a range of blocks, and processes them in
> parallel? This does not sound good performance-wise. I recall emails
> from Robert and Amit on the matter for sequential scan saying that
> this would suck performance out, particularly for rotating disks.

Rotating disks are not a problem - you can always RAID them, etc. The 8k allocation per relation once per half an hour - that is the problem. A seq scan done this way = a random scan...

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada wrote:
> As a PoC, I implemented parallel vacuum so that each worker
> processes both phases 1 and 2 for a particular block range.
> Suppose we vacuum a 1000-block table with 4 workers: each worker
> processes 250 consecutive blocks in phase 1 and then reclaims dead
> tuples from the heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in parallel? This does not sound good performance-wise. I recall emails from Robert and Amit on the matter for sequential scan saying that this would suck performance out, particularly for rotating disks.

--
Michael
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane wrote:
> Claudio Freire writes:
>> Not only that, but from your description (I haven't read the patch,
>> sorry), you'd be scanning the whole index multiple times (one per
>> worker).
>
> What about pointing each worker at a separate index? Obviously the
> degree of concurrency during index cleanup is then limited by the
> number of indexes, but that doesn't seem like a fatal problem.

+1
We may eventually need some effective way of parallelizing the vacuum of a single index. But pointing each worker at a separate index seems fair enough for the majority of cases.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Block level parallel vacuum WIP
Claudio Freire writes:
> Not only that, but from your description (I haven't read the patch,
> sorry), you'd be scanning the whole index multiple times (one per
> worker).

What about pointing each worker at a separate index? Obviously the degree of concurrency during index cleanup is then limited by the number of indexes, but that doesn't seem like a fatal problem.

			regards, tom lane
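With one worker per index, the assignment itself is a trivial partitioning problem — for instance, worker w could take indexes w, w + nworkers, w + 2*nworkers, and so on. An illustrative sketch of that round-robin split (not actual patch code):

```c
#include <assert.h>

/* Sketch: worker 'worker' vacuums every nworkers-th index, so all
 * indexes are covered exactly once and the achievable concurrency is
 * bounded by the number of indexes, as noted above. */
static int
count_indexes_for_worker(int nindexes, int nworkers, int worker)
{
    int     count = 0;
    int     i;

    for (i = worker; i < nindexes; i += nworkers)
        count++;                /* this worker owns index i */
    return count;
}
```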
Re: [HACKERS] Block level parallel vacuum WIP
I repeated your test on a ProLiant DL580 Gen9 with a Xeon E7-8890 v3: pgbench -s 100, then VACUUM of pgbench_accounts after 10,000 transactions.

with: alter system set vacuum_cost_delay to DEFAULT;

parallel_vacuum_workers | time
                      1 | 138.703,263 ms
                      2 |  83.751,064 ms
                      4 |  66.105,861 ms
                      8 |  59.820,171 ms

with: alter system set vacuum_cost_delay to 1;

parallel_vacuum_workers | time
                      1 | 127.210,896 ms
                      2 |  75.300,278 ms
                      4 |  64.253,087 ms
                      8 |  60.130,953 ms

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.ru
The Russian Postgres Company

2016-08-23 14:02 GMT+03:00 Masahiko Sawada:
> Hi all,
>
> I'd like to propose block level parallel VACUUM.
> This feature makes it possible for VACUUM to use multiple CPU cores.
>
> Vacuum Processing Logic
> ===
>
> PostgreSQL VACUUM processing logic consists of 2 phases:
> 1. Collecting dead tuple locations on the heap.
> 2. Reclaiming dead tuples from the heap and indexes.
> Phases 1 and 2 are executed alternately, and once the amount of dead
> tuple locations reaches maintenance_work_mem in phase 1, phase 2 is
> executed.
>
> Basic Design
> ==
>
> As a PoC, I implemented parallel vacuum so that each worker
> processes both phases 1 and 2 for a particular block range.
> Suppose we vacuum a 1000-block table with 4 workers: each worker
> processes 250 consecutive blocks in phase 1 and then reclaims dead
> tuples from the heap and indexes (phase 2).
> To use the visibility map efficiently, each worker scans a particular
> block range of the relation and collects dead tuple locations.
> After each worker finishes its task, the leader process gathers the
> vacuum statistics and updates relfrozenxid if possible.
>
> I also changed the buffer lock infrastructure so that multiple
> processes can wait for a cleanup lock on a buffer.
> And the new GUC parameter vacuum_parallel_workers controls the number
> of vacuum workers.
>
> Performance (PoC)
> =
>
> I ran parallel vacuum on a 13GB table (pgbench scale 1000) with
> several workers (on my poor virtual machine).
> The results:
>
> 1. Vacuum whole table without index (page skipping disabled)
> 1 worker : 33 sec
> 2 workers : 27 sec
> 3 workers : 23 sec
> 4 workers : 22 sec
>
> 2. Vacuum table and index (after 1 transaction executed)
> 1 worker : 12 sec
> 2 workers : 49 sec
> 3 workers : 54 sec
> 4 workers : 53 sec
>
> As a result of my test, since multiple processes could frequently try
> to acquire the cleanup lock on the same index buffer, the execution
> time of parallel vacuum got worse.
> And so far it seems effective only for table vacuum, and even that is
> not improved as much as expected (maybe a disk bottleneck).
>
> Another Design
>
> ISTM that processing index vacuum with multiple processes is not a
> good idea in most cases, because many index items can be stored in a
> page and multiple vacuum workers could try to acquire the cleanup
> lock on the same index buffer.
> It's better that multiple workers each process a particular block
> range of the heap, and then one worker per index performs the index
> vacuum.
>
> Still lots of work to do, but PoC patch attached.
> Feedback and suggestions are very welcome.
>
> Regards,
>
> --
> Masahiko Sawada
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 8:02 AM, Masahiko Sawada wrote:
> 2. Vacuum table and index (after 1 transaction executed)
> 1 worker : 12 sec
> 2 workers : 49 sec
> 3 workers : 54 sec
> 4 workers : 53 sec
>
> As a result of my test, since multiple processes could frequently try
> to acquire the cleanup lock on the same index buffer, the execution
> time of parallel vacuum got worse.
> And so far it seems effective only for table vacuum, and even that is
> not improved as much as expected (maybe a disk bottleneck).

Not only that, but from your description (I haven't read the patch, sorry), you'd be scanning the whole index multiple times (one per worker).
[HACKERS] Block level parallel vacuum WIP
Hi all,

I'd like to propose block level parallel VACUUM. This feature makes it possible for VACUUM to use multiple CPU cores.

Vacuum Processing Logic
===

PostgreSQL VACUUM processing logic consists of 2 phases:
1. Collecting dead tuple locations on the heap.
2. Reclaiming dead tuples from the heap and indexes.
Phases 1 and 2 are executed alternately, and once the amount of dead tuple locations reaches maintenance_work_mem in phase 1, phase 2 is executed.

Basic Design
==

As a PoC, I implemented parallel vacuum so that each worker processes both phases 1 and 2 for a particular block range. Suppose we vacuum a 1000-block table with 4 workers: each worker processes 250 consecutive blocks in phase 1 and then reclaims dead tuples from the heap and indexes (phase 2). To use the visibility map efficiently, each worker scans a particular block range of the relation and collects dead tuple locations. After each worker finishes its task, the leader process gathers the vacuum statistics and updates relfrozenxid if possible.

I also changed the buffer lock infrastructure so that multiple processes can wait for a cleanup lock on a buffer. And the new GUC parameter vacuum_parallel_workers controls the number of vacuum workers.

Performance (PoC)
=

I ran parallel vacuum on a 13GB table (pgbench scale 1000) with several workers (on my poor virtual machine). The results:

1. Vacuum whole table without index (page skipping disabled)
1 worker : 33 sec
2 workers : 27 sec
3 workers : 23 sec
4 workers : 22 sec

2. Vacuum table and index (after 1 transaction executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec

As a result of my test, since multiple processes could frequently try to acquire the cleanup lock on the same index buffer, the execution time of parallel vacuum got worse. And so far it seems effective only for table vacuum, and even that is not improved as much as expected (maybe a disk bottleneck).
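For reference, the phase switch above is driven by a simple capacity computation: the dead-tuple storage is an array of TIDs bounded by maintenance_work_mem, and a TID (ItemPointerData: block number plus offset) is 6 bytes in PostgreSQL. A sketch of that arithmetic — the function name is illustrative, and real VACUUM applies further adjustments to this number:

```c
#include <assert.h>

/* Sketch: how many dead-tuple TIDs fit in maintenance_work_mem.
 * When the array fills during phase 1, phase 2 runs against the heap
 * and indexes, and the array is then reset. */
#define TID_SIZE 6              /* sizeof(ItemPointerData) */

static long
max_dead_tuples(long maintenance_work_mem_kb)
{
    return maintenance_work_mem_kb * 1024L / TID_SIZE;
}
```

So the default 64MB of maintenance_work_mem bounds a phase-1 pass at roughly 11 million dead tuples before a phase-2 pass is forced.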
Another Design

ISTM that processing index vacuum with multiple processes is not a good idea in most cases, because many index items can be stored in a page and multiple vacuum workers could try to acquire the cleanup lock on the same index buffer. It's better that multiple workers each process a particular block range of the heap, and then one worker per index performs the index vacuum.

Still lots of work to do, but PoC patch attached. Feedback and suggestions are very welcome.

Regards,

--
Masahiko Sawada

From b25d491a05a43fb7adf014b2580c71ec7adb75a2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada
Date: Mon, 8 Aug 2016 16:43:35 -0700
Subject: [PATCH 1/2] Allow multiple backends to wait for cleanup lock.

---
 src/backend/storage/buffer/buf_init.c |  3 +-
 src/backend/storage/buffer/bufmgr.c   | 57 +++
 src/include/storage/buf_internals.h   |  4 ++-
 src/include/storage/proc.h            |  2 ++
 4 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index a4163cf..2aad030 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -134,7 +134,8 @@ InitBufferPool(void)
 		CLEAR_BUFFERTAG(buf->tag);
 		pg_atomic_init_u32(&buf->state, 0);
-		buf->wait_backend_pid = 0;
+		dlist_init(&buf->pin_count_waiters);
+		pg_atomic_write_u32(&buf->nwaiters, 0);

 		buf->buf_id = i;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 76ade37..f2f4ab9 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -38,6 +38,7 @@
 #include "catalog/storage.h"
 #include "executor/instrument.h"
 #include "lib/binaryheap.h"
+#include "lib/ilist.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -1730,15 +1731,19 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
 	 */
 	buf_state = LockBufHdr(buf);
-	if ((buf_state & BM_PIN_COUNT_WAITER) &&
-		BUF_STATE_GET_REFCOUNT(buf_state) == 1)
+	if (buf_state & BM_PIN_COUNT_WAITER)
 	{
-		/* we just released the last pin other than the waiter's */
-		int			wait_backend_pid = buf->wait_backend_pid;
+		dlist_mutable_iter iter;

-		buf_state &= ~BM_PIN_COUNT_WAITER;
+		if (pg_atomic_read_u32(&buf->nwaiters) == 1)
+			buf_state &= ~BM_PIN_COUNT_WAITER;
+
+		dlist_foreach_modify(iter, &buf->pin_count_waiters)
+		{
+