Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-06 Thread Amit Kapila
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy  wrote:
> On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila  wrote:
>>
>> Your test and results look good, what kind of m/c you have used to
>> test this.  Let me see if I or one of my colleague can do this and
>> similar test on some high-end m/c.
>
> As discussed with Amit, I have tried to run the same tests with below
> modification,
> 1. Increased the total rows to 10milion.
> 2. Set fsync off;
> 3. Changed tests as below. Updated all rows at a time.
>
> VACUUM FULL;
> BEGIN;
> UPDATE testtab SET col2 = md5(random()::text);
> ROLLBACK;
>
> I have run these tests on IBM power2 which have sufficient ram. I have
> set shared_buffer=32GB.
>
> My results show after this patch there is a slight increase in
> response time (psql \timing was used) for the above update statement.
> Which is around 5 to 10% delay.
>

I would like to add here, that the intention of the test was to stress
the changes of the patch to see the overhead patch can incur.  Now,
surely this is a synthetic test prepared to test this patch, but I
think it indicates that the changes have some overhead which might or
might not be ignorable depending on how important is to get this
patch.  I think if Warm tuples or indirect indexes need this patch and
they can't do without this, then it is worth considering this patch
along with those patches.  OTOH, if we can reduce the overhead, then
it might be okay proceed with this patch on the basis of simplicity it
can bring.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-06 Thread Mithun Cy
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy  wrote:
Sorry Auto plain text setting has disturbed the table indentation.
Attaching the spreadsheet for same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


HeapSatisfiesHOTAndKey_perf_results.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-06 Thread Mithun Cy
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila  wrote:
>
> Your test and results look good, what kind of m/c you have used to
> test this.  Let me see if I or one of my colleague can do this and
> similar test on some high-end m/c.

As discussed with Amit, I have tried to run the same tests with below
modification,
1. Increased the total rows to 10milion.
2. Set fsync off;
3. Changed tests as below. Updated all rows at a time.

VACUUM FULL;
BEGIN;
UPDATE testtab SET col2 = md5(random()::text);
ROLLBACK;

I have run these tests on IBM power2 which have sufficient ram. I have
set shared_buffer=32GB.

My results show after this patch there is a slight increase in
response time (psql \timing was used) for the above update statement.
Which is around 5 to 10% delay.


Runs Response time in ms for update base. Response
Time in ms for update new patch.  %INC

1 158863.501
  167443.767
  5.4010304104

2 151061.793
  168908.536
 11.8142004312

3 153750.577
  164071.994
 6.7130915548

4 153639.165
  168364.225
 9.5841838245

5 149607.139
  166498.44
  11.2904378179


Under the same condition running original tests, that is, updating
rows which satisfy a condition col1 = :value1. I did not see any
regression.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-05 Thread Pavan Deolasee
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila  wrote:

>
> Your test and results look good, what kind of m/c you have used to
> test this.


I ran it on my Macbook Pro, so nothing fancy. The code was compiled with
simple ./confgure and with no special flags. The only non-default setting
was shared_buffers = 512MB to ensure the table/index fits in memory.


>   Let me see if I or one of my colleague can do this and
> similar test on some high-end m/c.
>
>
Sure. That'll be helpful given a slight unexpected result. May be it's just
a noise or we may see different result on a high end m/c.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-05 Thread Amit Kapila
On Wed, Jan 4, 2017 at 11:45 PM, Pavan Deolasee
 wrote:
>
>
> On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas  wrote:
>>
>> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila 
>> wrote:
>> > Okay, but I think if we know how much is the additional cost in
>> > average and worst case, then we can take a better call.
>>
>> Yeah.  We shouldn't just rip out optimizations that are inconvenient
>> without doing some test of what the impact is on the cases where those
>> optimizations are likely to matter.  I don't think it needs to be
>> anything incredibly laborious and if there's no discernable impact,
>> great.
>
>
> So I performed some tests to measure if this causes any noticeable
> regression.
>

Your test and results look good, what kind of m/c you have used to
test this.  Let me see if I or one of my colleague can do this and
similar test on some high-end m/c.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-04 Thread Alvaro Herrera
Pavan Deolasee wrote:

> A transaction then updates the second column in the table. So the
> refactored patch will do heap_getattr() on more columns that the master
> while checking if HOT update is possible and before giving up.

Thanks.

> 1-client
>   Master Refactored
> Run1 8774.089935 8979.068604
> Run2 8509.2661 8943.613575
> Run3 8879.484019 8950.994425
> 
> 
> 8-clients
>   Master Refactored
> Run1 22520.422448 22672.798871
> Run2 21967.812303 22022.969747
> Run3 22305.073223 21909.945623

Wow, this is very surprising.  I was expecting a slight performance
decrease, not this.  I will try to reproduce these numbers.

One thing worth mentioning is that the current implementation is not
very good -- I just kept the attribute check loop as it was in the
original, which uses heap_getattr.  If we used incremental extraction
such as what we do in slot_getattr, it could be made a bit faster too.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-04 Thread Pavan Deolasee
On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas  wrote:
>
> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila 
wrote:
> > Okay, but I think if we know how much is the additional cost in
> > average and worst case, then we can take a better call.
>
> Yeah.  We shouldn't just rip out optimizations that are inconvenient
> without doing some test of what the impact is on the cases where those
> optimizations are likely to matter.  I don't think it needs to be
> anything incredibly laborious and if there's no discernable impact,
> great.


So I performed some tests to measure if this causes any noticeable
regression. I used the following simple schema:

DROP TABLE IF EXISTS testtab;
CREATE UNLOGGED TABLE testtab (
col1 integer,
col2 text,
col3 float,
col4 text,
col5 text,
col6 char(30),
col7 text,
col8 date,
col9 text,
col10 text
);
INSERT INTO testtab
SELECT generate_series(1,10),
md5(random()::text),
random(),
md5(random()::text),
md5(random()::text),
md5(random()::text)::char(30),
md5(random()::text),
now(),
md5(random()::text),
md5(random()::text);
CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6, col7,
col8, col9);

I used a rather wide UNLOGGED table with an index on first 9 columns, as
suggested by Amit. Also, the table has reasonable number of rows, but not
more than what shared buffers (set to 512MB for these tests) can hold. This
should make the test mostly CPU bound.

A transaction then updates the second column in the table. So the
refactored patch will do heap_getattr() on more columns that the master
while checking if HOT update is possible and before giving up. I believe we
are probably testing a somewhat worst case with this setup, though may be I
could have tuned some other configuration parameters.

\set value random(1, 10)
UPDATE testtab SET col2 = md5(random()::text) WHERE col1 = :value;

I tested with -c1 and -c8 -j4 and the results are:

1-client
  Master Refactored
Run1 8774.089935 8979.068604
Run2 8509.2661 8943.613575
Run3 8879.484019 8950.994425


8-clients
  Master Refactored
Run1 22520.422448 22672.798871
Run2 21967.812303 22022.969747
Run3 22305.073223 21909.945623


So at best there is some improvement with the patch, though I don't see any
reason why it should positively affect the performance. The results with
more number of clients look almost identical, probably because the
bottleneck shifts somewhere else. For all these tests, table was dropped
and recreated in every iteration, so I don't think there was any error in
testing. It might be a good idea for someone else to repeat the tests to
confirm the improvement that I noticed.

Apart from this, I also ran "make check" multiple times and couldn't find
any significant difference in the average time.

I will leave it to Alvaro's judgement to decide whether it's worth to
commit the patch now or later when he or other committer looks at
committing WARM/indirect indexes because without either of those patches
this change probably does not bring up much value, if we ignore the slight
improvement we see here.

Thanks,
Pavan

--
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-03 Thread Robert Haas
On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila  wrote:
> Okay, but I think if we know how much is the additional cost in
> average and worst case, then we can take a better call.

Yeah.  We shouldn't just rip out optimizations that are inconvenient
without doing some test of what the impact is on the cases where those
optimizations are likely to matter.  I don't think it needs to be
anything incredibly laborious and if there's no discernable impact,
great.  But it doesn't make sense to change things for the benefit of
WARM and indirect indexes and then discover later that we regressed
other cases we care about and have no plan to fix it.  Better to find
out any potential problems of that kind now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-01 Thread Amit Kapila
On Mon, Jan 2, 2017 at 10:59 AM, Pavan Deolasee
 wrote:
>
> On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila 
> wrote:
>>
>> On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee 
>> wrote:
>> >
>> >
>> > On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila 
>> > wrote:
>> >>
>> >>
>> >> I think there is some chance that such a change could induce
>> >> regression for the cases when there are many index columns or I think
>> >> even when index is on multiple columns (consider index is on first and
>> >> eight column in a ten column table).
>> >>
>> >
>> > I don't see that as a problem because the routine only checks for
>> > columns
>> > that are passed as "interesting_cols".
>> >
>>
>> Right, but now it will evaluate for all interesting_cols whereas
>> previously it would just stop at first if that column is changed.
>>
>
> Ah ok. I read your comment "consider index is on first and
> eight column in a ten column table" as s/eight/eighth. But may be you're
> referring to
> the case where there is an index on eight or nine columns of a ten column
> table.
>

I am talking about both kinds of cases.  The scenario where we can see
some performance impact is when there is variable-width column before
the index column (in above context before the eighth column) as there
will be cached offset optimization won't work for such a column.

> You're right. That's an additional cost as Alvaro himself explained in the
> original
> post. But both indirect indexes and WARM needs to know information about all
> modified columns. So assuming either of these patches are going to make it,
> we've to bear that cost.
>

Okay, but I think if we know how much is the additional cost in
average and worst case, then we can take a better call.  Also, if we
agree, then doing an update-intensive test on a unlogged table or with
asynchronous commit mode can show us the overhead if there is any.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-01 Thread Pavan Deolasee
On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila 
wrote:

> On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee 
> wrote:
> >
> >
> > On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila 
> wrote:
> >>
> >>
> >> I think there is some chance that such a change could induce
> >> regression for the cases when there are many index columns or I think
> >> even when index is on multiple columns (consider index is on first and
> >> eight column in a ten column table).
> >>
> >
> > I don't see that as a problem because the routine only checks for columns
> > that are passed as "interesting_cols".
> >
>
> Right, but now it will evaluate for all interesting_cols whereas
> previously it would just stop at first if that column is changed.
>
>
Ah ok. I read your comment "consider index is on first and
eight column in a ten column table" as s/eight/eighth. But may be you're
referring to
the case where there is an index on eight or nine columns of a ten column
table.

You're right. That's an additional cost as Alvaro himself explained in the
original
post. But both indirect indexes and WARM needs to know information about all
modified columns. So assuming either of these patches are going to make it,
we've to bear that cost. Having said that, given that attributes are
usually cached,
the cost may not be significant  since for non-HOT updates, the
attributes will be later fetched anyways while preparing index tuples. So
we're
probably doing that work in advance.

Obviously, I'm not against doing additional performance tests to ensure
that the
cost is not significant, especially if neither indirect indexes nor WARM
gets
committed in 10.0

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-01 Thread Amit Kapila
On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee  wrote:
>
>
> On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila  wrote:
>>
>>
>> I think there is some chance that such a change could induce
>> regression for the cases when there are many index columns or I think
>> even when index is on multiple columns (consider index is on first and
>> eight column in a ten column table).
>>
>
> I don't see that as a problem because the routine only checks for columns
> that are passed as "interesting_cols".
>

Right, but now it will evaluate for all interesting_cols whereas
previously it would just stop at first if that column is changed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-01 Thread Pavan Deolasee
On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila  wrote:

>
> I think there is some chance that such a change could induce
> regression for the cases when there are many index columns or I think
> even when index is on multiple columns (consider index is on first and
> eight column in a ten column table).
>
>
I don't see that as a problem because the routine only checks for columns
that are passed as "interesting_cols".

Noticed below comment in interesting-attrs-2.patch
> + * are considered the "key" of rows in the table, and columns that are
> + * part of indirect indexes.
>
> Is it right to mention about indirect indexes in above comment
> considering indirect indexes are still not part of core code?
>

I agree. We can add details about indirect indexes or WARM later, as and
when those patches get committed.


> Pavan, please rebase your WARM patch on top of this and let me know how
> you like it.  I'll post a new version of indirect indexes later this
> week.
>
>
I've rebased WARM on top of this patch and the proposed changes look fine
from WARM's perspective too. I'll send rebased patches separately.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-01 Thread Amit Kapila
On Thu, Dec 29, 2016 at 4:50 AM, Alvaro Herrera
 wrote:
> Pursuant to my comments at
> https://www.postgresql.org/message-id/20161223192245.hx4rbrxbrwtgwj6i@alvherre.pgsql
> and because of a stupid bug I found in my indirect indexes patch, I
> rewrote (read: removed) HeapSatisfiesHOTAndKey.  The replacement
> function HeapDetermineModifiedColumns returns a bitmapset with a bit set
> for each modified column, for those columns that are listed as
> "interesting" -- currently that set is the ID columns, the "key"
> columns, and the indexed columns.  The new code is much simpler, at the
> expense of a few bytes of additional memory used during heap_update().
>
> All tests pass.
>
> Both WARM and indirect indexes should be able to use this infrastructure
> in a way that better serves them than the current HeapSatisfiesHOTAndKey
> (both patches modify that function in a rather ugly way).  I intend to
> get this pushed shortly unless objections are raised.
>
> The new coding prevents stopping the check early as soon as the three
> result booleans could be determined.
>

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).  The reason for such a suspicion
is that heap_getattr() is not so cheap that doing it multiple times is
free.  Do you think it is worth to do few tests before committing the
patch?

Noticed below comment in interesting-attrs-2.patch
+ * are considered the "key" of rows in the table, and columns that are
+ * part of indirect indexes.

Is it right to mention about indirect indexes in above comment
considering indirect indexes are still not part of core code?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2016-12-30 Thread Alvaro Herrera
Here's a version with fixed comments.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ea579a0..19edbdf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -95,11 +95,8 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer 
oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tup,
bool all_visible_cleared, bool 
new_all_visible_cleared);
-static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
-Bitmapset *hot_attrs,
-Bitmapset *key_attrs, 
Bitmapset *id_attrs,
-bool *satisfies_hot, 
bool *satisfies_key,
-bool *satisfies_id,
+static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
+Bitmapset 
*interesting_cols,
 HeapTuple oldtup, 
HeapTuple newtup);
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 LockTupleMode mode, LockWaitPolicy 
wait_policy,
@@ -3443,6 +3440,8 @@ heap_update(Relation relation, ItemPointer otid, 
HeapTuple newtup,
Bitmapset  *hot_attrs;
Bitmapset  *key_attrs;
Bitmapset  *id_attrs;
+   Bitmapset  *interesting_attrs;
+   Bitmapset  *modified_attrs;
ItemId  lp;
HeapTupleData oldtup;
HeapTuple   heaptup;
@@ -3460,9 +3459,6 @@ heap_update(Relation relation, ItemPointer otid, 
HeapTuple newtup,
pagefree;
boolhave_tuple_lock = false;
booliscombo;
-   boolsatisfies_hot;
-   boolsatisfies_key;
-   boolsatisfies_id;
booluse_hot_update = false;
boolkey_intact;
boolall_visible_cleared = false;
@@ -3489,21 +3485,30 @@ heap_update(Relation relation, ItemPointer otid, 
HeapTuple newtup,
 errmsg("cannot update tuples during a parallel 
operation")));
 
/*
-* Fetch the list of attributes to be checked for HOT update.  This is
-* wasted effort if we fail to update or have to put the new tuple on a
-* different page.  But we must compute the list before obtaining buffer
-* lock --- in the worst case, if we are doing an update on one of the
-* relevant system catalogs, we could deadlock if we try to fetch the 
list
-* later.  In any case, the relcache caches the data so this is usually
-* pretty cheap.
+* Fetch the list of attributes to be checked for various operations.
 *
-* Note that we get a copy here, so we need not worry about relcache 
flush
-* happening midway through.
+* For HOT considerations, this is wasted effort if we fail to update or
+* have to put the new tuple on a different page.  But we must compute 
the
+* list before obtaining buffer lock --- in the worst case, if we are 
doing
+* an update on one of the relevant system catalogs, we could deadlock 
if
+* we try to fetch the list later.  In any case, the relcache caches the
+* data so this is usually pretty cheap.
+*
+* We also need columns used by the replica identity, the columns that
+* are considered the "key" of rows in the table, and columns that are
+* part of indirect indexes.
+*
+* Note that we get copies of each bitmap, so we need not worry about
+* relcache flush happening midway through.
 */
hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
id_attrs = RelationGetIndexAttrBitmap(relation,

  INDEX_ATTR_BITMAP_IDENTITY_KEY);
+   interesting_attrs = bms_add_members(NULL, hot_attrs);
+   interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
+   interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+
 
block = ItemPointerGetBlockNumber(otid);
buffer = ReadBuffer(relation, block);
@@ -3524,7 +3529,7 @@ heap_update(Relation relation, ItemPointer otid, 
HeapTuple newtup,
Assert(ItemIdIsNormal(lp));
 
/*
-* Fill in enough data in oldtup for HeapSatisfiesHOTandKeyUpdate to 
work
+* Fill in enough data in oldtup for HeapDetermineModifiedColumns to