Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy wrote: > On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila wrote: >> >> Your test and results look good, what kind of m/c you have used to >> test this. Let me see if I or one of my colleague can do this and >> similar test on some high-end m/c. > > As discussed with Amit, I have tried to run the same tests with below > modification, > 1. Increased the total rows to 10milion. > 2. Set fsync off; > 3. Changed tests as below. Updated all rows at a time. > > VACUUM FULL; > BEGIN; > UPDATE testtab SET col2 = md5(random()::text); > ROLLBACK; > > I have run these tests on IBM power2 which have sufficient ram. I have > set shared_buffer=32GB. > > My results show after this patch there is a slight increase in > response time (psql \timing was used) for the above update statement. > Which is around 5 to 10% delay. > I would like to add here, that the intention of the test was to stress the changes of the patch to see the overhead patch can incur. Now, surely this is a synthetic test prepared to test this patch, but I think it indicates that the changes have some overhead which might or might not be ignorable depending on how important is to get this patch. I think if Warm tuples or indirect indexes need this patch and they can't do without this, then it is worth considering this patch along with those patches. OTOH, if we can reduce the overhead, then it might be okay proceed with this patch on the basis of simplicity it can bring. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy wrote: Sorry Auto plain text setting has disturbed the table indentation. Attaching the spreadsheet for same. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com HeapSatisfiesHOTAndKey_perf_results.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila wrote: > > Your test and results look good, what kind of m/c you have used to > test this. Let me see if I or one of my colleague can do this and > similar test on some high-end m/c. As discussed with Amit, I have tried to run the same tests with below modification, 1. Increased the total rows to 10milion. 2. Set fsync off; 3. Changed tests as below. Updated all rows at a time. VACUUM FULL; BEGIN; UPDATE testtab SET col2 = md5(random()::text); ROLLBACK; I have run these tests on IBM power2 which have sufficient ram. I have set shared_buffer=32GB. My results show after this patch there is a slight increase in response time (psql \timing was used) for the above update statement. Which is around 5 to 10% delay. Runs Response time in ms for update base. Response Time in ms for update new patch. %INC 1 158863.501 167443.767 5.4010304104 2 151061.793 168908.536 11.8142004312 3 153750.577 164071.994 6.7130915548 4 153639.165 168364.225 9.5841838245 5 149607.139 166498.44 11.2904378179 Under the same condition running original tests, that is, updating rows which satisfy a condition col1 = :value1. I did not see any regression. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila wrote: > > Your test and results look good, what kind of m/c you have used to > test this. I ran it on my Macbook Pro, so nothing fancy. The code was compiled with simple ./confgure and with no special flags. The only non-default setting was shared_buffers = 512MB to ensure the table/index fits in memory. > Let me see if I or one of my colleague can do this and > similar test on some high-end m/c. > > Sure. That'll be helpful given a slight unexpected result. May be it's just a noise or we may see different result on a high end m/c. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Wed, Jan 4, 2017 at 11:45 PM, Pavan Deolasee wrote: > > > On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas wrote: >> >> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila >> wrote: >> > Okay, but I think if we know how much is the additional cost in >> > average and worst case, then we can take a better call. >> >> Yeah. We shouldn't just rip out optimizations that are inconvenient >> without doing some test of what the impact is on the cases where those >> optimizations are likely to matter. I don't think it needs to be >> anything incredibly laborious and if there's no discernable impact, >> great. > > > So I performed some tests to measure if this causes any noticeable > regression. > Your test and results look good, what kind of m/c you have used to test this. Let me see if I or one of my colleague can do this and similar test on some high-end m/c. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
Pavan Deolasee wrote: > A transaction then updates the second column in the table. So the > refactored patch will do heap_getattr() on more columns that the master > while checking if HOT update is possible and before giving up. Thanks. > 1-client > Master Refactored > Run1 8774.089935 8979.068604 > Run2 8509.2661 8943.613575 > Run3 8879.484019 8950.994425 > > > 8-clients > Master Refactored > Run1 22520.422448 22672.798871 > Run2 21967.812303 22022.969747 > Run3 22305.073223 21909.945623 Wow, this is very surprising. I was expecting a slight performance decrease, not this. I will try to reproduce these numbers. One thing worth mentioning is that the current implementation is not very good -- I just kept the attribute check loop as it was in the original, which uses heap_getattr. If we used incremental extraction such as what we do in slot_getattr, it could be made a bit faster too. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas wrote: > > On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila wrote: > > Okay, but I think if we know how much is the additional cost in > > average and worst case, then we can take a better call. > > Yeah. We shouldn't just rip out optimizations that are inconvenient > without doing some test of what the impact is on the cases where those > optimizations are likely to matter. I don't think it needs to be > anything incredibly laborious and if there's no discernable impact, > great. So I performed some tests to measure if this causes any noticeable regression. I used the following simple schema: DROP TABLE IF EXISTS testtab; CREATE UNLOGGED TABLE testtab ( col1 integer, col2 text, col3 float, col4 text, col5 text, col6 char(30), col7 text, col8 date, col9 text, col10 text ); INSERT INTO testtab SELECT generate_series(1,10), md5(random()::text), random(), md5(random()::text), md5(random()::text), md5(random()::text)::char(30), md5(random()::text), now(), md5(random()::text), md5(random()::text); CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6, col7, col8, col9); I used a rather wide UNLOGGED table with an index on first 9 columns, as suggested by Amit. Also, the table has reasonable number of rows, but not more than what shared buffers (set to 512MB for these tests) can hold. This should make the test mostly CPU bound. A transaction then updates the second column in the table. So the refactored patch will do heap_getattr() on more columns that the master while checking if HOT update is possible and before giving up. I believe we are probably testing a somewhat worst case with this setup, though may be I could have tuned some other configuration parameters. \set value random(1, 10) UPDATE testtab SET col2 = md5(random()::text) WHERE col1 = :value; I tested with -c1 and -c8 -j4 and the results are: 1-client Master Refactored Run1 8774.089935 8979.068604 Run2 8509.2661 8943.613575 Run3 8879.484019 8950.994425 8-clients Master Refactored Run1 22520.422448 22672.798871 Run2 21967.812303 22022.969747 Run3 22305.073223 21909.945623 So at best there is some improvement with the patch, though I don't see any reason why it should positively affect the performance. The results with more number of clients look almost identical, probably because the bottleneck shifts somewhere else. For all these tests, table was dropped and recreated in every iteration, so I don't think there was any error in testing. It might be a good idea for someone else to repeat the tests to confirm the improvement that I noticed. Apart from this, I also ran "make check" multiple times and couldn't find any significant difference in the average time. I will leave it to Alvaro's judgement to decide whether it's worth to commit the patch now or later when he or other committer looks at committing WARM/indirect indexes because without either of those patches this change probably does not bring up much value, if we ignore the slight improvement we see here. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila wrote: > Okay, but I think if we know how much is the additional cost in > average and worst case, then we can take a better call. Yeah. We shouldn't just rip out optimizations that are inconvenient without doing some test of what the impact is on the cases where those optimizations are likely to matter. I don't think it needs to be anything incredibly laborious and if there's no discernable impact, great. But it doesn't make sense to change things for the benefit of WARM and indirect indexes and then discover later that we regressed other cases we care about and have no plan to fix it. Better to find out any potential problems of that kind now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Mon, Jan 2, 2017 at 10:59 AM, Pavan Deolasee wrote: > > On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila > wrote: >> >> On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee >> wrote: >> > >> > >> > On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila >> > wrote: >> >> >> >> >> >> I think there is some chance that such a change could induce >> >> regression for the cases when there are many index columns or I think >> >> even when index is on multiple columns (consider index is on first and >> >> eight column in a ten column table). >> >> >> > >> > I don't see that as a problem because the routine only checks for >> > columns >> > that are passed as "interesting_cols". >> > >> >> Right, but now it will evaluate for all interesting_cols whereas >> previously it would just stop at first if that column is changed. >> > > Ah ok. I read your comment "consider index is on first and > eight column in a ten column table" as s/eight/eighth. But may be you're > referring to > the case where there is an index on eight or nine columns of a ten column > table. > I am talking about both kinds of cases. The scenario where we can see some performance impact is when there is variable-width column before the index column (in above context before the eighth column) as there will be cached offset optimization won't work for such a column. > You're right. That's an additional cost as Alvaro himself explained in the > original > post. But both indirect indexes and WARM needs to know information about all > modified columns. So assuming either of these patches are going to make it, > we've to bear that cost. > Okay, but I think if we know how much is the additional cost in average and worst case, then we can take a better call. Also, if we agree, then doing an update-intensive test on a unlogged table or with asynchronous commit mode can show us the overhead if there is any. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila wrote: > On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee > wrote: > > > > > > On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila > wrote: > >> > >> > >> I think there is some chance that such a change could induce > >> regression for the cases when there are many index columns or I think > >> even when index is on multiple columns (consider index is on first and > >> eight column in a ten column table). > >> > > > > I don't see that as a problem because the routine only checks for columns > > that are passed as "interesting_cols". > > > > Right, but now it will evaluate for all interesting_cols whereas > previously it would just stop at first if that column is changed. > > Ah ok. I read your comment "consider index is on first and eight column in a ten column table" as s/eight/eighth. But may be you're referring to the case where there is an index on eight or nine columns of a ten column table. You're right. That's an additional cost as Alvaro himself explained in the original post. But both indirect indexes and WARM needs to know information about all modified columns. So assuming either of these patches are going to make it, we've to bear that cost. Having said that, given that attributes are usually cached, the cost may not be significant since for non-HOT updates, the attributes will be later fetched anyways while preparing index tuples. So we're probably doing that work in advance. Obviously, I'm not against doing additional performance tests to ensure that the cost is not significant, especially if neither indirect indexes nor WARM gets committed in 10.0 Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee wrote: > > > On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila wrote: >> >> >> I think there is some chance that such a change could induce >> regression for the cases when there are many index columns or I think >> even when index is on multiple columns (consider index is on first and >> eight column in a ten column table). >> > > I don't see that as a problem because the routine only checks for columns > that are passed as "interesting_cols". > Right, but now it will evaluate for all interesting_cols whereas previously it would just stop at first if that column is changed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila wrote: > > I think there is some chance that such a change could induce > regression for the cases when there are many index columns or I think > even when index is on multiple columns (consider index is on first and > eight column in a ten column table). > > I don't see that as a problem because the routine only checks for columns that are passed as "interesting_cols". Noticed below comment in interesting-attrs-2.patch > + * are considered the "key" of rows in the table, and columns that are > + * part of indirect indexes. > > Is it right to mention about indirect indexes in above comment > considering indirect indexes are still not part of core code? > I agree. We can add details about indirect indexes or WARM later, as and when those patches get committed. > Pavan, please rebase your WARM patch on top of this and let me know how > you like it. I'll post a new version of indirect indexes later this > week. > > I've rebased WARM on top of this patch and the proposed changes look fine from WARM's perspective too. I'll send rebased patches separately. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Thu, Dec 29, 2016 at 4:50 AM, Alvaro Herrera wrote: > Pursuant to my comments at > https://www.postgresql.org/message-id/20161223192245.hx4rbrxbrwtgwj6i@alvherre.pgsql > and because of a stupid bug I found in my indirect indexes patch, I > rewrote (read: removed) HeapSatisfiesHOTAndKey. The replacement > function HeapDetermineModifiedColumns returns a bitmapset with a bit set > for each modified column, for those columns that are listed as > "interesting" -- currently that set is the ID columns, the "key" > columns, and the indexed columns. The new code is much simpler, at the > expense of a few bytes of additional memory used during heap_update(). > > All tests pass. > > Both WARM and indirect indexes should be able to use this infrastructure > in a way that better serves them than the current HeapSatisfiesHOTAndKey > (both patches modify that function in a rather ugly way). I intend to > get this pushed shortly unless objections are raised. > > The new coding prevents stopping the check early as soon as the three > result booleans could be determined. > I think there is some chance that such a change could induce regression for the cases when there are many index columns or I think even when index is on multiple columns (consider index is on first and eight column in a ten column table). The reason for such a suspicion is that heap_getattr() is not so cheap that doing it multiple times is free. Do you think it is worth to do few tests before committing the patch? Noticed below comment in interesting-attrs-2.patch + * are considered the "key" of rows in the table, and columns that are + * part of indirect indexes. Is it right to mention about indirect indexes in above comment considering indirect indexes are still not part of core code? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
Here's a version with fixed comments. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index ea579a0..19edbdf 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -95,11 +95,8 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf, Buffer newbuf, HeapTuple oldtup, HeapTuple newtup, HeapTuple old_key_tup, bool all_visible_cleared, bool new_all_visible_cleared); -static void HeapSatisfiesHOTandKeyUpdate(Relation relation, -Bitmapset *hot_attrs, -Bitmapset *key_attrs, Bitmapset *id_attrs, -bool *satisfies_hot, bool *satisfies_key, -bool *satisfies_id, +static Bitmapset *HeapDetermineModifiedColumns(Relation relation, +Bitmapset *interesting_cols, HeapTuple oldtup, HeapTuple newtup); static bool heap_acquire_tuplock(Relation relation, ItemPointer tid, LockTupleMode mode, LockWaitPolicy wait_policy, @@ -3443,6 +3440,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, Bitmapset *hot_attrs; Bitmapset *key_attrs; Bitmapset *id_attrs; + Bitmapset *interesting_attrs; + Bitmapset *modified_attrs; ItemId lp; HeapTupleData oldtup; HeapTuple heaptup; @@ -3460,9 +3459,6 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, pagefree; boolhave_tuple_lock = false; booliscombo; - boolsatisfies_hot; - boolsatisfies_key; - boolsatisfies_id; booluse_hot_update = false; boolkey_intact; boolall_visible_cleared = false; @@ -3489,21 +3485,30 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, errmsg("cannot update tuples during a parallel operation"))); /* -* Fetch the list of attributes to be checked for HOT update. This is -* wasted effort if we fail to update or have to put the new tuple on a -* different page. But we must compute the list before obtaining buffer -* lock --- in the worst case, if we are doing an update on one of the -* relevant system catalogs, we could deadlock if we try to fetch the list -* later. In any case, the relcache caches the data so this is usually -* pretty cheap. +* Fetch the list of attributes to be checked for various operations. * -* Note that we get a copy here, so we need not worry about relcache flush -* happening midway through. +* For HOT considerations, this is wasted effort if we fail to update or +* have to put the new tuple on a different page. But we must compute the +* list before obtaining buffer lock --- in the worst case, if we are doing +* an update on one of the relevant system catalogs, we could deadlock if +* we try to fetch the list later. In any case, the relcache caches the +* data so this is usually pretty cheap. +* +* We also need columns used by the replica identity, the columns that +* are considered the "key" of rows in the table, and columns that are +* part of indirect indexes. +* +* Note that we get copies of each bitmap, so we need not worry about +* relcache flush happening midway through. */ hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL); key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY); id_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY); + interesting_attrs = bms_add_members(NULL, hot_attrs); + interesting_attrs = bms_add_members(interesting_attrs, key_attrs); + interesting_attrs = bms_add_members(interesting_attrs, id_attrs); + block = ItemPointerGetBlockNumber(otid); buffer = ReadBuffer(relation, block); @@ -3524,7 +3529,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, Assert(ItemIdIsNormal(lp)); /* -* Fill in enough data in oldtup for HeapSatisfiesHOTandKeyUpdate to work +* Fill in enough data in oldtup for HeapDetermineModifiedColumns to