Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-10-02 Thread Daniel Gustafsson
> On 28 Jul 2017, at 16:46, Robert Haas wrote:
> 
> On Fri, Jul 28, 2017 at 12:39 AM, Pavan Deolasee wrote:
>> I see your point. But I would like to think this way: does the technology
>> significantly help many common use cases that are currently not addressed
>> by HOT? It probably won't help all workloads, but that's a given. Also, we
>> don't have any credible alternative, while this patch has progressed quite
>> a lot. Maybe Robert will soon present the pluggable storage/UNDO patch and
>> that will cover everything and more that is currently covered by HOT/WARM.
>> That will probably make many other things redundant.
> 
> A lot of work is currently being done on this, by multiple people,
> mostly not including me, and a lot of good progress is being made.
> But it's not exactly ready to ship, nor will it be any time soon.  I
> think we can run a 1-client pgbench without crashing the server at
> this point, if you tweak the configuration a little bit and don't do
> anything fancy like, say, try to roll back a transaction.  :-)

The discussion in this thread implies that there is a bit more work to be
done on this patch, which also hasn't moved in the current commitfest, so I
am marking it Returned with Feedback.  Please re-submit this work in a
future commitfest when it is ready for a new round of reviews.

cheers ./daniel



Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))

2017-07-29 Thread Peter Geoghegan

Claudio Freire wrote:

README.HOT says that that cost is not worth the benefit of
preventing a new index write, but I think that it ought to take into
account that not all index writes are equal. There is an appreciable
difference between inserting a new tuple, and updating one in-place. We
can remove the cost (hurting new snapshots by making them go through old
heap pages) while preserving most of the benefits (no logically
unnecessary index bloat).


It's a neat idea.


Thanks.

I think it's important to both prevent index bloat, and to make sure
that only the latest version is pointed to within indexes. There are
only so many ways that that can be done. I've tried to come up with a
way of doing those two things that breaks as little of heapam.c as
possible. As a bonus, some kind of super-pruning of many linked HOT
chains may be enabled, which is something that an asynchronous process
can do when triggered by a regular prune within a user backend.

This is a kind of micro-vacuum that is actually much closer to VACUUM
than the kill_prior_tuple stuff, or traditional pruning, in that it
potentially kills index entries (just those that were not subsequently
updated in place, because the new values for the index differed), and
then kills heap tuples, all together, without even keeping around a stub
itemId in the heap. And, chaining together HOT chains also lets us chain
together pruning. Retail index tuple deletion from pruning needs to be
crash safe, unlike LP_DEAD setting.


And, well, now that you mention it, you don't need to touch the indexes
at all.

You can create the new chain, and "update" the index to point to it,
without ever touching the index itself, since you can repoint the old
HOT chain's start line pointer to point to the new HOT chain, create a
new line pointer for the old one, and point to it from the new HOT
chain's t_tid.

Existing index tuples thus now point to the right HOT chain without
having to go into the index and make any changes.

You do need the new HOT chain to live in the same page for this,
however.


That seems complicated. The idea that I'm trying to preserve here is the
idea that the beginning of a HOT-chain (a definition that includes a
"potential HOT chain" -- a single heap tuple that could later receive a
HOT UPDATE) unambiguously signals a need for physical changes to indexes
in all cases. The idea that I'm trying to move away from is that those
physical changes need to be new index insertions (new insertions should
only happen when it is logically necessary, because indexed values
changed).

Note that this can preserve the kill_prior_tuple stuff, I think, because
if everything is dead within a single HOT chain (a HOT chain by our
current definition -- not a chain of HOT chains) then nobody can need
the index tuple. This does require adding complexity around aborted
transactions, whose new (potential) HOT chain t_tid "backpointer" is
still needed; we must revise the definition of a HOT chain being
all_dead to accommodate that. But for the most part, we preserve HOT
chains as a thing that garbage collection can independently reason
about, process with single page atomic operations while still being
crash safe, etc.

As far as microvacuum style garbage collection goes, at a high level,
HOT chains seem like a good choke point to do clean-up of both heap
tuples (pruning) and index tuples. The complexity of doing that seems
manageable. And by chaining together HOT chains, you can really
aggressively microvacuum many HOT chains on many pages within an
asynchronous process as soon as the long running transaction goes away.
We lean on temporal locality for garbage collection.

There are numerous complications that I haven't really acknowledged but
am at least aware of. For one, when I say "update in place", I don't
necessarily mean it literally. It's probably possible to literally
update in place with unique indexes. For secondary indexes, which should
still have heap TID as part of their keyspace (once you go implement
that, Claudio), we probably need an index insertion immediately followed
by an index deletion, often within the same leaf page.
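
To make the two paths concrete, here is a minimal, compilable sketch using
made-up stand-in types and hypothetical function names (an illustration of
the idea above, not code from the patch or from PostgreSQL):

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; not PostgreSQL's actual structs. */
typedef struct { uint32_t blkno; uint16_t posid; } TidStub;
typedef struct { TidStub t_tid; /* key columns would follow */ } IndexEntryStub;

/*
 * "Update in place": a literal overwrite for a unique index, an insert
 * followed by a retail delete for a secondary index whose keyspace
 * includes the heap TID.
 */
static void
update_index_pointer(IndexEntryStub *old_entry, TidStub new_root,
                     bool is_unique,
                     void (*leaf_insert)(TidStub),
                     void (*leaf_delete)(IndexEntryStub *))
{
    if (is_unique)
    {
        /* Only one live entry per key: overwrite the TID directly. */
        old_entry->t_tid = new_root;
    }
    else
    {
        /*
         * The TID is part of the sort order, so the entry must move:
         * insert the new TID, then delete the old entry, often within
         * the same leaf page.
         */
        leaf_insert(new_root);
        leaf_delete(old_entry);
    }
}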

I hope that this design, such as it is, will be reviewed as a thought
experiment. What would be good or bad about a design like this in the
real world, particularly as compared to alternatives that we know about?
Is *some* "third way" design desirable and achievable, if not this one?
By "third way" design, I mean a design that is much less invasive than
adopting UNDO for MVCC, that still addresses the issues that we
currently have with certain types of UPDATE-heavy workloads, especially
when there are long running transactions, etc. I doubt that WARM meets
this standard, unfortunately, because it doesn't do anything for cases
that suffer only due to a long running xact.

I don't accept that there is a rigid dichotomy between Postgres style
MVCC, and using UNDO for MVCC, and I most certainly don't accept that
garbage 

Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))

2017-07-28 Thread Claudio Freire
On Fri, Jul 28, 2017 at 8:32 PM, Peter Geoghegan wrote:
> README.HOT says that that cost is not worth the benefit of
> preventing a new index write, but I think that it ought to take into
> account that not all index writes are equal. There is an appreciable
> difference between inserting a new tuple, and updating one in-place. We
> can remove the cost (hurting new snapshots by making them go through old
> heap pages) while preserving most of the benefits (no logically
> unnecessary index bloat).

It's a neat idea.

And, well, now that you mention it, you don't need to touch the indexes
at all.

You can create the new chain, and "update" the index to point to it,
without ever touching the index itself, since you can repoint the old
HOT chain's start line pointer to point to the new HOT chain, create a
new line pointer for the old one, and point to it from the new HOT
chain's t_tid.

Existing index tuples thus now point to the right HOT chain without
having to go into the index and make any changes.

You do need the new HOT chain to live in the same page for this,
however.
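
A minimal sketch of that line-pointer shuffle, with greatly simplified
stand-ins for the page structures (the real code would go through
PageGetItemId() and friends, and would have to WAL-log the change; all
names below are made up):

#include <stdint.h>

typedef uint16_t OffsetNumber;

typedef struct
{
    uint16_t lp_off;            /* tuple's byte offset within the page */
    uint16_t lp_len;
} ItemIdStub;                   /* a heap line pointer, greatly simplified */

typedef struct
{
    OffsetNumber t_tid_posid;   /* simplified t_tid: a same-page offset */
    /* tuple header and data would follow */
} HeapTupleStub;

/*
 * 'root' is the line pointer the index entries reference (the old HOT
 * chain's start).  Move the old chain to a fresh line pointer 'spare',
 * make 'root' lead to the new chain, and let the new chain's t_tid
 * point back at 'spare' so old versions stay reachable.
 */
static void
repoint_hot_chain(ItemIdStub *root, ItemIdStub *spare,
                  OffsetNumber spare_offnum, uint16_t new_chain_off,
                  uint16_t new_chain_len, HeapTupleStub *new_chain_head)
{
    *spare = *root;                     /* old chain now lives at 'spare' */
    root->lp_off = new_chain_off;       /* index entries now see the new chain */
    root->lp_len = new_chain_len;
    new_chain_head->t_tid_posid = spare_offnum; /* link back to the old chain */
}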




In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))

2017-07-28 Thread Peter Geoghegan

Pavan Deolasee wrote:

One good thing is that the patch is ready and fully functional. So that
allows those who are keen to do so to run real performance tests and see
the actual impact of the patch.


Very true.


I see your point. But I would like to think this way: does the technology
significantly help many common use cases that are currently not addressed
by HOT? It probably won't help all workloads, but that's a given. Also, we
don't have any credible alternative, while this patch has progressed quite
a lot. Maybe Robert will soon present the pluggable storage/UNDO patch and
that will cover everything and more that is currently covered by HOT/WARM.
That will probably make many other things redundant.


Well, I don't assume that it will; again, I just don't know. I agree
with your general assessment of things, which is that WARM, EDB's
Z-Heap/UNDO project, and things like IOTs have significant overlap in
terms of the high-level problems that they fix. While it's hard to say
just how much overlap exists, it's clearly more than a little. And, you
are right that we don't have a credible alternative in this general
category right now. The WARM patch is available today.

As you may have noticed, in recent weeks I've been very vocal about the
role of index bloat in cases where bloat has a big impact on production
workloads. I think that it has an under-appreciated role in workloads
that deteriorate over time, as bloat accumulates. Perhaps HOT made such
a big difference to workloads 10 years ago not just because it prevented
creating new index entries. It also reduced fragmentation of the
keyspace in indexes, by never inserting duplicates in the first place.

I have some rough ideas related to this, and to the general questions
you're addressing. I'd like to run these by you.

In-place index updates + HOT
----------------------------

Maybe we could improve things markedly in this general area by "chaining
together HOT chains", and updating index heap pointers in place, to
point to the start of the latest HOT chain in that chain of chains
(provided the index tuple was "logically unchanged" -- otherwise, you'd
need to have both sets of indexed values at once, of course). Index
tuples therefore always point to the latest HOT chain, favoring recent
MVCC snapshots over older ones.
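
A small sketch of the per-index decision just described, with hypothetical
names and simplified types (an idea sketch, not code from any patch):

#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t blkno; uint16_t posid; } TidStub;  /* stand-in */

/*
 * Decision for one index during an UPDATE under this scheme.  Returns
 * the TID the existing index entry should carry afterwards.
 */
static TidStub
index_after_update(bool indexed_values_changed,
                   TidStub old_entry_tid, TidStub new_chain_start,
                   void (*insert_new_entry)(TidStub))
{
    if (indexed_values_changed)
    {
        /* Both key values must stay findable: a real insert is needed. */
        insert_new_entry(new_chain_start);
        return old_entry_tid;           /* existing entry left untouched */
    }

    /* Logically unchanged: repoint the existing entry at the newest chain. */
    return new_chain_start;
}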

Pruning
-------

HOT pruning is great because you can remove heap bloat without worrying
about there being index entries with heap item pointers pointing to what
is removed. But isn't that limitation as much about what is in the index
as it is about what is in the heap?

Under this scheme, you don't even have to keep around the old ItemId
stub when pruning, if it's a sufficiently old HOT chain that no index
points to the corresponding TID. That may not seem like a lot of bloat
to have to keep around, but it accumulates within a page until VACUUM
runs, ultimately limiting the effectiveness of pruning for certain
workloads.

Old snapshots/row versions
--------------------------

Superseding HOT chains have their last heap tuple's t_tid point to the
start of the preceding/superseded HOT chain (not their own TID, as
today, which is redundant), which may or may not be on the same heap
page. That's how old snapshots go backwards to get old versions, without
needing their own "logically redundant" index entries. So with UPDATE
heavy workloads that are essentially HOT-safe today, performance doesn't
tank due to a long running transaction that obstructs pruning within a
heap page, and thus necessitates the insertion of new index tuples.
That's the main justification for this entire design.
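
A compilable sketch of how an old snapshot would walk backwards through a
chain of chains, using crude stand-ins for tuples and visibility (the real
thing would go through HeapTupleSatisfiesMVCC() and friends; all names
below are made up):

#include <stdbool.h>
#include <stddef.h>

typedef struct HeapTupleStub
{
    struct HeapTupleStub *next_in_chain; /* HOT chain link (t_ctid today) */
    struct HeapTupleStub *backpointer;   /* last tuple's t_tid, per the above */
    unsigned xmin, xmax;                 /* creating / deleting transaction */
} HeapTupleStub;

static bool
tuple_visible(const HeapTupleStub *t, unsigned snapshot_xid)
{
    /* crude stand-in for a real MVCC visibility check */
    return t->xmin <= snapshot_xid && (t->xmax == 0 || t->xmax > snapshot_xid);
}

static const HeapTupleStub *
fetch_version(const HeapTupleStub *chain_start, unsigned snapshot_xid)
{
    while (chain_start != NULL)
    {
        const HeapTupleStub *last = chain_start;

        /* scan the newest chain first, following ordinary HOT chain links */
        for (const HeapTupleStub *t = chain_start; t != NULL; t = t->next_in_chain)
        {
            if (tuple_visible(t, snapshot_xid))
                return t;
            last = t;
        }

        /* nothing visible here: hop back to the superseded chain */
        chain_start = last->backpointer;
    }
    return NULL;        /* no version visible to this snapshot */
}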

It's also possible that pruning can be taught that since only one index
update was logically necessary when the to-be-pruned HOT chain was
created, it's worth doing a "retail index tuple deletion" against the
index tuple that was logically necessary, then completely obliterating
the HOT chain, stub item pointer and all.

Bloat and locality
------------------

README.HOT argues against HOT chains that span pages, which this is a
bit like, on the grounds that it's bad news that every recent snapshot
has to go through the old heap page. That makes sense, but only because
the temporal locality there is horrible, which would not be the case
here. README.HOT says that that cost is not worth the benefit of
preventing a new index write, but I think that it ought to take into
account that not all index writes are equal. There is an appreciable
difference between inserting a new tuple, and updating one in-place. We
can remove the cost (hurting new snapshots by making them go through old
heap pages) while preserving most of the benefits (no logically
unnecessary index bloat).

The benefit of HOT is clearly more bloat prevention than not having to
visit indexes at all. InnoDB secondary index updates update the index
twice: The first time, during the update itself, and the second time, by
the purge thread, once the xact commits. Clearly they care about 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-28 Thread Robert Haas
On Fri, Jul 28, 2017 at 12:39 AM, Pavan Deolasee wrote:
> I see your point. But I would like to think this way: does the technology
> significantly help many common use cases that are currently not addressed
> by HOT? It probably won't help all workloads, but that's a given. Also, we
> don't have any credible alternative, while this patch has progressed quite
> a lot. Maybe Robert will soon present the pluggable storage/UNDO patch and
> that will cover everything and more that is currently covered by HOT/WARM.
> That will probably make many other things redundant.

A lot of work is currently being done on this, by multiple people,
mostly not including me, and a lot of good progress is being made.
But it's not exactly ready to ship, nor will it be any time soon.  I
think we can run a 1-client pgbench without crashing the server at
this point, if you tweak the configuration a little bit and don't do
anything fancy like, say, try to roll back a transaction.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-27 Thread Pavan Deolasee
On Fri, Jul 28, 2017 at 5:57 AM, Peter Geoghegan wrote:

> Pavan Deolasee wrote:
> > I'll be happy if someone wants to continue hacking the patch further and
> > get it in a committable shape. I can stay actively involved. But TBH the
> > amount of time I can invest is far less compared to what I could during
> > the last cycle.
>
> That's disappointing.
>
>
Yes, it is, even more so for me. But I was hard-pressed to choose between
Postgres-XL 10 and WARM. Given the ever-increasing interest in XL and my
ability to control the outcome, I thought it made sense to focus on XL for
now.


> I personally find it very difficult to assess something like this.


One good thing is that the patch is ready and fully functional. So that
allows those who are keen to do so to run real performance tests and see
the actual impact of the patch.


> The
> problem is that even if you can demonstrate that the patch is strictly
> better than what we have today, the risk of reaching a local maximum
> exists.  Do we really want to double down on HOT?
>

Well, HOT has served us well for over a decade now. So I won't hesitate to
place my bets on WARM.


>
> If I'm not mistaken, the goal of WARM is, roughly speaking, to make
> updates that would not be HOT-safe today do a "partial HOT update".  My
> concern with that idea is that it doesn't do much for the worst case.
>

I see your point. But I would like to think this way: does the technology
significantly help many common use cases that are currently not addressed
by HOT? It probably won't help all workloads, but that's a given. Also, we
don't have any credible alternative, while this patch has progressed quite
a lot. Maybe Robert will soon present the pluggable storage/UNDO patch and
that will cover everything and more that is currently covered by HOT/WARM.
That will probably make many other things redundant.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-27 Thread Peter Geoghegan
Pavan Deolasee wrote:
> I'll be happy if someone wants to continue hacking the patch further and
> get it in a committable shape. I can stay actively involved. But TBH the
> amount of time I can invest is far less compared to what I could during
> the last cycle.

That's disappointing.

I personally find it very difficult to assess something like this. The
problem is that even if you can demonstrate that the patch is strictly
better than what we have today, the risk of reaching a local maximum
exists.  Do we really want to double down on HOT?

If I'm not mistaken, the goal of WARM is, roughly speaking, to make
updates that would not be HOT-safe today do a "partial HOT update".  My
concern with that idea is that it doesn't do much for the worst case.

-- 
Peter Geoghegan




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-27 Thread Pavan Deolasee
On Wed, Jul 26, 2017 at 6:26 PM, Robert Haas wrote:

> On Tue, Apr 18, 2017 at 4:25 AM, Pavan Deolasee wrote:
> > I'll include the fix in the next set of patches.
>
> I haven't seen a new set of patches.  Are you intending to continue
> working on this?
>
>
Looks like I'll be short on bandwidth to pursue this further, given other
work commitments, including the upcoming Postgres-XL 10 release. While I
haven't worked on the patch since April, I think it was in a pretty good
shape where I left it. But it's going to be incredibly difficult to
estimate the amount of further effort required, especially with testing
and validating all the use cases and finding optimisations to fix
regressions in all those cases. Also, many fundamental concerns around the
patch touching the core of the database engine can only be addressed if
some senior hackers, like you, take serious interest in the patch.

I'll be happy if someone wants to continue hacking the patch further and
get it in a committable shape. I can stay actively involved. But TBH the
amount of time I can invest is far less compared to what I could during
the last cycle.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-26 Thread Robert Haas
On Tue, Apr 18, 2017 at 4:25 AM, Pavan Deolasee wrote:
> I'll include the fix in the next set of patches.

I haven't seen a new set of patches.  Are you intending to continue
working on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-18 Thread Pavan Deolasee
On Fri, Apr 14, 2017 at 9:21 PM, Jaime Casanova wrote:

>
>
> Hi Pavan,
>
> I ran a test on the current WARM patchset: I used pgbench with a scale
> of 20 and a fillfactor of 90, then started the pgbench run with 6
> clients in parallel; I also ran sqlsmith on it.
>
> And I got a core dump after some time of those things running.
>
> The assertion that fails is:
>
> """
> LOG:  statement: UPDATE pgbench_tellers SET tbalance = tbalance + 3519
> WHERE tid = 34;
> TRAP: FailedAssertion("!(((bool) (((const void*)(&(tup)->t_ctid) !=
> ((void *)0)) && (((&(tup)->t_ctid)->ip_posid & ((((uint16) 1) << 13) -
> 1)) != 0))))", File: "../../../../src/include/access/htup_details.h",
> Line: 659)
> """
>

Hi Jaime,

Thanks for doing the tests and reporting the problem. Per our chat, the
assertion failure occurs only after crash recovery. I traced it down to
the point where we were failing to set the root line pointer correctly
during crash recovery. In fact, we were setting it, but only after the
local changes had been copied to the on-disk image, so the change never
made it to storage.

Can you please test with the attached patch and confirm it works? I was
able to reproduce the exact same assertion on my end and the patch seems to
fix it. But an additional check won't harm.

I'll include the fix in the next set of patches.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


warm_crash_recovery_fix.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-14 Thread Jaime Casanova
On 5 April 2017 at 13:32, Pavan Deolasee wrote:
>
> Ok. I've extensively updated the README to match the current state of
> affairs. Updated patch set attached.

Hi Pavan,

I ran a test on the current WARM patchset: I used pgbench with a scale of
20 and a fillfactor of 90, then started the pgbench run with 6 clients in
parallel; I also ran sqlsmith on it.

And I got a core dump after some time of those things running.

The assertion that fails is:

"""
LOG:  statement: UPDATE pgbench_tellers SET tbalance = tbalance + 3519
WHERE tid = 34;
TRAP: FailedAssertion("!(((bool) (((const void*)(&(tup)->t_ctid) !=
((void *)0)) && (((&(tup)->t_ctid)->ip_posid & ((((uint16) 1) << 13) -
1)) != 0))))", File: "../../../../src/include/access/htup_details.h",
Line: 659)
"""

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
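
(For readers squinting at the mangled TRAP condition above: it appears to
be the expansion of the ItemPointerIsValid() macro. The stock definitions
are shown below; the masked variant at the end is deduced from the TRAP
text itself, and is presumably the WARM patch's assertion, with the high
bits of ip_posid reserved as flag bits.)

/* Stock definitions, from src/include/c.h and src/include/storage/itemptr.h: */
#define PointerIsValid(pointer) ((const void *) (pointer) != NULL)

#define ItemPointerIsValid(pointer) \
    ((bool) (PointerIsValid(pointer) && ((pointer)->ip_posid != 0)))

/* The variant implied by the TRAP message (not stock PostgreSQL): */
#define WarmItemPointerIsValid(pointer) \
    ((bool) (PointerIsValid(pointer) && \
             (((pointer)->ip_posid & ((((uint16) 1) << 13) - 1)) != 0)))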
#0  0x7f42d832e067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f42d832f448 in __GI_abort () at abort.c:89
#2  0x00842961 in ExceptionalCondition (conditionName=, errorType=, fileName=, lineNumber=) at assert.c:54
#3  0x004c48e6 in HeapTupleHeaderGetRootOffset (tup=) at ../../../../src/include/access/htup_details.h:659
#4  heap_update (relation=0x7f4294dc8168, otid=0x2660ff0, newtup=0x6, cid=2, crosscheck=0x7f42d91bc700, wait=-102 '\232', hufd=0x7ffd14fcebb0, lockmode=0x7ffd14fceb9c, modified_attrsp=0x7ffd14fceba8, warm_update=0x7ffd14fceb9b "") at heapam.c:4672
#5  0x0062e95d in ExecUpdate (tupleid=0x7ffd14fcec90, oldtuple=0x2324, slot=0x26606f8, planSlot=0x, epqstate=0x2660ff0, estate=0x25db9e0, canSetTag=1 '\001') at nodeModifyTable.c:1012
#6  0x0062f009 in ExecModifyTable (node=0x25dbc48) at nodeModifyTable.c:1609
#7  0x00616198 in ExecProcNode (node=node@entry=0x25dbc48) at execProcnode.c:424
#8  0x006116a6 in ExecutePlan (execute_once=, dest=0x262e498, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, use_parallel_mode=, planstate=0x25dbc48, estate=0x25db9e0) at execMain.c:1651
#9  standard_ExecutorRun (queryDesc=0x2666100, direction=, count=0, execute_once=) at execMain.c:360
#10 0x00746822 in ProcessQuery (plan=, sourceText=0x25ac4e0 "UPDATE pgbench_tellers SET tbalance = tbalance + 3494 WHERE tid = 76;", params=0x0, queryEnv=0x0, dest=0x262e498, completionTag=0x7ffd14fcf010 "") at pquery.c:162
#11 0x00746a93 in PortalRunMulti (portal=portal@entry=0x25462d0, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x262e498, altdest=altdest@entry=0x262e498, completionTag=completionTag@entry=0x7ffd14fcf010 "") at pquery.c:1287
#12 0x007476a2 in PortalRun (portal=0x25462d0, count=9223372036854775807, isTopLevel=, run_once=, dest=0x262e498, altdest=0x262e498, completionTag=0x7ffd14fcf010 "") at pquery.c:800
#13 0x0074361b in exec_simple_query (query_string=0x2324 ) at postgres.c:1105
#14 0x00745254 in PostgresMain (argc=1, argv=0x25ac4e0, dbname=0x2555888 "pgbench", username=0x2528260 "jcasanov") at postgres.c:4075
#15 0x004780f3 in BackendRun (port=0x254dd70) at postmaster.c:4317
#16 BackendStartup (port=0x254dd70) at postmaster.c:3989
#17 ServerLoop () at postmaster.c:1729
#18 0x006d33d0 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x25260c0) at postmaster.c:1337
#19 0x00478f4d in main (argc=3, argv=0x25260c0) at main.c:228




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-12 Thread Pavan Deolasee
On Wed, Apr 12, 2017 at 10:42 PM, Robert Haas wrote:

> On Tue, Apr 11, 2017 at 1:20 PM, Pavan Deolasee wrote:


> > 5. Added code to set a CLEAR pointer to a WARM pointer when we know
> > that the entire chain is WARM. This should address the workload Dilip
> > ran and found regression (I don't think he got a chance to confirm
> > that)
>
> Which is clearly a thing that should happen before commit, and really,
> you ought to be leading the effort to confirm that, not him.  It's
> good for him to verify that your fix worked, but you should test it
> first.
>

Not sure why you think I did not do the tests. I did and reported that it
helps reduce the regression. Last para here:
https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com

I understand it might have got lost in the conversation and I possibly did
a poor job of explaining it. From my perspective, I did not want to say that
everything is hunky-dory based on my own tests because 1. I probably do not
have access to the same kind of machine Dilip has and 2. It's better to get
it confirmed by someone who initially reported it. Again, I fully respect
that he would be busy with other things and I do not expect him or anyone
else to test/review my patch on priority. The only point I am trying to
make is that I did my own tests and made sure that it helps.

(Having said that, I am not sure if changing pointer state from CLEAR to
WARM is indeed a good change. Having thought more about it and after
looking at the page-split code, I now think that this might just confuse
the WARM cleanup code and make the algorithm that much harder to prove.)


> > 6. Enhanced stats collector to collect information about candidate WARM
> > chains and added mechanism to control WARM cleanup at the heap as well as
> > index level, based on configurable parameters. This gives user better
> > control over the additional work that is required for WARM cleanup.
>
> I haven't seen previous discussion of this; therefore I doubt whether
> we have agreement on these parameters.
>

Sure. I will bring these up in a more structured manner for everyone to
comment.


>
> > 7. Added table level option to disable WARM if nothing else works.
>
> -1 from me.
>

Ok. It's kind of a last resort for me too. But at some point, we might want
to make that call if we find an important use case that regresses because
of WARM and we see no way to fix it, or at least not without a whole lot of
complexity.


>
>
> > I may have missed something, but there is no intention to ignore known
> > regressions/reviews. Of course, I don't think that every regression
> > will be solvable, like if you run a CPU-bound workload, setting it up
> > in a way such that you repeatedly exercise the area where WARM is
> > doing additional work, without providing any benefit, maybe you can
> > still find a regression. I am willing to fix them as long as they are
> > fixable and we are comfortable with the additional code complexity.
> > IMHO certain trade-offs are good, but I understand that not everybody
> > will agree with my views and that's ok.
>
> The point here is that we can't make intelligent decisions about
> whether to commit this feature unless we know which situations get
> better and which get worse and by how much.


Sure.


>   I don't accept as a
> general principle the idea that CPU-bound workloads don't matter.
> Obviously, I/O-bound workloads matter too, but we can't throw
> CPU-bound workloads under the bus.


Yeah, definitely not suggesting that.


>   Now, avoiding index bloat does
> also save CPU, so it is easy to imagine that WARM could come out ahead
> even if each update consumes slightly more CPU when actually updating,
> so we might not actually regress.  If we do, I guess I'd want to know
> why.


Well, the kind of tests we did to look for regressions were worst-case
scenarios. For example, in the test where we found a 10-15% regression, we
used a wide index (so the recheck cost is high), WARM updated all rows,
disabled autovacuum (so no chain conversion), and then repeatedly selected
the rows from the index, thus incurring the recheck overhead and, in fact,
measuring only that.

When I measured WARM on tables with a small scale factor, so that
everything fits in memory, I found a modest 20% improvement in tps. So,
you're right, WARM might also help in-memory workloads. But that will show
up only if we measure UPDATEs and SELECTs both. If we measure only SELECTs,
and that too in a state where we are paying the full price for having done
a WARM update, obviously we will only see regression, if any. Not saying we
should ignore that. We should in fact measure all possible loads, and try
to fix as many as we can, especially if they resemble a real-world use
case, but there will be a trade-off to make. So I highly appreciate Amit
and Dilip's help with coming up with additional tests. At least it gives us
the opportunity to think about how to fix them, even if we can't

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-12 Thread Pavan Deolasee
On Thu, Apr 13, 2017 at 2:04 AM, Peter Geoghegan wrote:

> On Wed, Apr 12, 2017 at 10:12 AM, Robert Haas wrote:
> >> I may have missed something, but there is no intention to ignore known
> >> regressions/reviews. Of course, I don't think that every regression
> >> will be solvable, like if you run a CPU-bound workload, setting it up
> >> in a way such that you repeatedly exercise the area where WARM is
> >> doing additional work, without providing any benefit, maybe you can
> >> still find a regression. I am willing to fix them as long as they are
> >> fixable and we are comfortable with the additional code complexity.
> >> IMHO certain trade-offs are good, but I understand that not everybody
> >> will agree with my views and that's ok.
> >
> > The point here is that we can't make intelligent decisions about
> > whether to commit this feature unless we know which situations get
> > better and which get worse and by how much.  I don't accept as a
> > general principle the idea that CPU-bound workloads don't matter.
> > Obviously, I/O-bound workloads matter too, but we can't throw
> > CPU-bound workloads under the bus.  Now, avoiding index bloat does
> > also save CPU, so it is easy to imagine that WARM could come out ahead
> > even if each update consumes slightly more CPU when actually updating,
> > so we might not actually regress.  If we do, I guess I'd want to know
> > why.
>
> I myself wonder if this CPU overhead is at all related to LP_DEAD
> recycling during page splits.


With respect to the tests that I, Dilip, and others did for WARM, I think
we were kind of exercising the worst-case scenario. Like in one case, we
created a table with a 40% fill factor, created an index with a large text
column, WARM updated all rows in the table, turned off autovacuum so that
chain conversion does not take place, and then repeatedly ran a select
query on those rows using the index which did not receive the WARM insert.

IOW we were only measuring the overhead of doing recheck by constructing an
index tuple from the heap tuple and then comparing it against the existing
index tuple. And we did find a regression, which is not entirely surprising
because obviously that code path does extra work when it needs to do
recheck. And we were only measuring that overhead, without taking into
account the benefits of WARM to the system in general. I think the
counter-argument to that is that such a workload may exist somewhere and
might be regressed.
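
A crude stand-in for the recheck just described, reduced to a byte
comparison of pre-extracted keys (the real patch would form the key with
FormIndexDatum() and compare column by column; all names below are made
up):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A pre-extracted index key; a stand-in for real index column values. */
typedef struct
{
    const char *key;
    size_t      keylen;
} IndexKeyStub;

/*
 * Does the heap tuple we arrived at still match the index entry that led
 * us here?  If not, this index entry belongs to some other version in the
 * WARM-updated chain and must be skipped.
 */
static bool
warm_recheck(const IndexKeyStub *index_entry, const IndexKeyStub *from_heap)
{
    return index_entry->keylen == from_heap->keylen &&
           memcmp(index_entry->key, from_heap->key, index_entry->keylen) == 0;
}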

> I have my suspicions that the recycling
> has some relationship to locality, which leads me to want to
> investigate how Claudio Freire's patch to consistently treat heap TID
> as part of the B-Tree sort order could help, both in general, and for
> WARM.
>

It could be, especially if we redesign recheck solely based on the index
pointer state and the heap tuple state. That could be more performant for
selects and could also be more robust, but it will require index inserts to
get hold of the old index pointer (based on the root TID), compare it
against the new index tuple, and either skip the insert (if everything
matches) or set a PREWARM flag on the old pointer and insert the new tuple
with a POSTWARM flag.
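
A minimal sketch of that insert-time decision, with hypothetical state
names taken from the paragraph above (an idea sketch, not code from any
patch):

#include <stdbool.h>

/* Hypothetical index-pointer states. */
typedef enum { PTR_CLEAR, PTR_PREWARM, PTR_POSTWARM } WarmPtrState;

typedef struct
{
    WarmPtrState state;
    /* key and root-TID fields omitted */
} IndexPtrStub;

/*
 * On index insert, after locating the existing pointer for the same root
 * TID: if the keys match, the insert can be skipped entirely; otherwise
 * flag the old pointer PREWARM and tell the caller to insert the new
 * entry as POSTWARM.
 */
static bool                     /* true => caller inserts a POSTWARM entry */
warm_insert_decide(IndexPtrStub *old_ptr, bool keys_match)
{
    if (keys_match)
        return false;           /* pointer already correct: skip the insert */

    old_ptr->state = PTR_PREWARM;
    return true;
}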

Searching for the old index pointer will be a non-starter for non-unique
indexes, unless they are also sorted by TID, something that Claudio's patch
does. What I am not sure about is whether that patch on its own can stand
the performance implications, because it increases the index tuple width
(and probably the index maintenance cost too).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-12 Thread Peter Geoghegan
On Wed, Apr 12, 2017 at 10:12 AM, Robert Haas wrote:
>> I may have missed something, but there is no intention to ignore known
>> regressions/reviews. Of course, I don't think that every regression will be
>> solvable, like if you run a CPU-bound workload, setting it up in a way such
>> that you repeatedly exercise the area where WARM is doing additional work,
>> without providing any benefit, maybe you can still find a regression. I am
>> willing to fix them as long as they are fixable and we are comfortable with
>> the additional code complexity. IMHO certain trade-offs are good, but I
>> understand that not everybody will agree with my views and that's ok.
>
> The point here is that we can't make intelligent decisions about
> whether to commit this feature unless we know which situations get
> better and which get worse and by how much.  I don't accept as a
> general principle the idea that CPU-bound workloads don't matter.
> Obviously, I/O-bound workloads matter too, but we can't throw
> CPU-bound workloads under the bus.  Now, avoiding index bloat does
> also save CPU, so it is easy to imagine that WARM could come out ahead
> even if each update consumes slightly more CPU when actually updating,
> so we might not actually regress.  If we do, I guess I'd want to know
> why.

I myself wonder if this CPU overhead is at all related to LP_DEAD
recycling during page splits. I have my suspicions that the recycling
has some relationship to locality, which leads me to want to
investigate how Claudio Freire's patch to consistently treat heap TID
as part of the B-Tree sort order could help, both in general, and for
WARM.

Bear in mind that the recycling has to happen with an exclusive buffer
lock held on a leaf page, which could hold up rather a lot of scans
that need to visit the same value even if it's on some other,
relatively removed leaf page.

This is just a theory.

-- 
Peter Geoghegan





Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-12 Thread Robert Haas
On Tue, Apr 11, 2017 at 1:20 PM, Pavan Deolasee wrote:
> I don't know why you say that regressions are not addressed. Here are a few
> things I did to address the regressions/reviews/concerns, apart from fixing
> all the bugs discovered, but please let me know if there are things I've not
> addressed.

I'm making statements based on my perception of the discussion on the
thread.  Perhaps you did some work which you either didn't mention or
I missed you mentioning it, but it sure didn't feel like all of the
things reported got addressed.

> 1. Improved the interesting attrs patch that Alvaro wrote to address the
> regression discovered in fetching more heap attributes. The patch that got
> committed in fact improved certain synthetic workloads over then master.

Yep, though it was not clear that all of the regressing cases were
actually addressed, at least not to me.

> 2. Based on Petr and your feedback, disabled WARM on toasted attributes to
> reduce overhead of fetching/decompressing the attributes.

But that's not necessarily the right fix, as per
http://postgr.es/m/ca+tgmoyufxy1lsedzsw8uuulujhh0r8ncd-up-hzmc1fydp...@mail.gmail.com
and subsequent discussion.  It's not clear to me from that discussion
that we've got to a place where the method used to identify whether a
WARM update happened during a scan is exactly identical to the method
used to decide whether to perform one in the first place.

> 3. Added code to avoid doing a second index scan when the index does not
> contain any WARM pointers. This should address the situation Amit brought up
> where only one of the indexes receives WARM inserts
> 4. Added code to kill wrong index pointers to do online cleanup.

Good changes.

> 5. Added code to set a CLEAR pointer to a WARM pointer when we know that the
> entire chain is WARM. This should address the workload Dilip ran and found
> regression (I don't think he got a chance to confirm that)

Which is clearly a thing that should happen before commit, and really,
you ought to be leading the effort to confirm that, not him.  It's
good for him to verify that your fix worked, but you should test it
first.

> 6. Enhanced stats collector to collect information about candidate WARM
> chains and added mechanism to control WARM cleanup at the heap as well as
> index level, based on configurable parameters. This gives user better
> control over the additional work that is required for WARM cleanup.

I haven't seen previous discussion of this; therefore I doubt whether
we have agreement on these parameters.

> 7. Added table level option to disable WARM if nothing else works.

-1 from me.

> 8. Added mechanism to disable WARM when more than 50% of indexes are being
> updated. I ran some benchmarks with different percentages of indexes
> getting updated and thought this was a good threshold.

+1 from me.

> I may have missed something, but there is no intention to ignore known
> regressions/reviews. Of course, I don't think that every regression will be
> solvable, like if you run a CPU-bound workload, setting it up in a way such
> that you repeatedly exercise the area where WARM is doing additional work,
> without providing any benefit, maybe you can still find a regression. I am
> willing to fix them as long as they are fixable and we are comfortable with
> the additional code complexity. IMHO certain trade-offs are good, but I
> understand that not everybody will agree with my views and that's ok.

The point here is that we can't make intelligent decisions about
whether to commit this feature unless we know which situations get
better and which get worse and by how much.  I don't accept as a
general principle the idea that CPU-bound workloads don't matter.
Obviously, I/O-bound workloads matter too, but we can't throw
CPU-bound workloads under the bus.  Now, avoiding index bloat does
also save CPU, so it is easy to imagine that WARM could come out ahead
even if each update consumes slightly more CPU when actually updating,
so we might not actually regress.  If we do, I guess I'd want to know
why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Pavan Deolasee
On Wed, Apr 12, 2017 at 9:23 AM, Amit Kapila wrote:

> On Tue, Apr 11, 2017 at 10:50 PM, Pavan Deolasee wrote:
> >
> > And I fixed them as quickly as humanly possible.
> >
>
> Yes, you have responded to them quickly, but I didn't get a chance to
> re-verify all of those.  However, I think the main point Robert wants
> to make is that somebody needs to dig into the complete patch to see
> if there is any kind of problem with it.
>

There are no two views about that. I don't even claim that more problems
won't be found during in-depth review. I was only responding to his view
that I did not do much to address the regressions reported during the
review/tests.


>
> > 5. Added code to set a CLEAR pointer to a WARM pointer when we know
> > that the entire chain is WARM. This should address the workload Dilip
> > ran and found regression (I don't think he got a chance to confirm
> > that)
> >
>
> Have you by any chance tried to reproduce it at your end?


I did reproduce it and verified that the new technique helps the case [1]
(see the last para). I did not go to extra lengths to check if there are
more cases which can still cause regression, such as when recheck is
applied only once to each tuple (so the new technique does not yield any
benefit), and whether that still causes regression and by how much.
However, I ran a pure pgbench workload (only HOT updates) with a smallish
scale factor, so that everything fits in memory, and did not find any
regression.

Having said that, it's my view (others need not agree with it) that we
need to distinguish between CPU and IO load, since WARM is designed to
address IO problems and not so much CPU problems. We also need to see
things in totality, and probably measure updates and selects both if we
are going to WARM update all tuples once and read them once. That doesn't
mean we shouldn't perform more tests, and I am more than willing to fix
any regression we find in even a remotely real-world use case.

Thanks,
Pavan

[1]
https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Amit Kapila
On Tue, Apr 11, 2017 at 10:50 PM, Pavan Deolasee wrote:
>
>
> On Tue, Apr 11, 2017 at 7:10 PM, Robert Haas wrote:
>>
>>
>>
>> Yes, and as Andres says, you don't help with those, and then you're
>> upset when your own patch doesn't get attention.
>
>
> I am not upset; I was obviously a bit disappointed, which I think is a very
> natural emotion after spending weeks on it. I am not blaming any one
> individual (excluding myself) for that, and neither the community at large
> for the outcome. And I've moved on. I know everyone is busy getting the
> release ready and I see no point discussing this endlessly. We have enough
> on our plates for the next few weeks.
>
>>
>>
>>   Amit and others who have started to
>> dig into this patch a little bit found real problems pretty quickly
>> when they started digging.
>
>
> And I fixed them as quickly as humanly possible.
>

Yes, you have responded to them quickly, but I didn't get a chance to
re-verify all of those.  However, I think the main point Robert wants
to make is that somebody needs to dig into the complete patch to see if
there is any kind of problem with it.

>>
>>   Those problems should be addressed, and
>> review should continue (from whatever source) until no more problems
>> can be found.
>
>
> Absolutely.
>
>>
>>  The patch may
>> or may not have any data-corrupting bugs, but regressions have been
>> found and not addressed.
>
>
> I don't know why you say that regressions are not addressed. Here are a few
> things I did to address the regressions/reviews/concerns, apart from fixing
> all the bugs discovered, but please let me know if there are things I've not
> addressed.
>
> 1. Improved the interesting attrs patch that Alvaro wrote to address the
> regression discovered in fetching more heap attributes. The patch that got
> committed in fact improved certain synthetic workloads over then master.
> 2. Based on Petr and your feedback, disabled WARM on toasted attributes to
> reduce overhead of fetching/decompressing the attributes.
> 3. Added code to avoid doing a second index scan when the index does not
> contain any WARM pointers. This should address the situation Amit brought up
> where only one of the indexes receives WARM inserts.
> 4. Added code to kill wrong index pointers to do online cleanup.
> 5. Added code to set a CLEAR pointer to a WARM pointer when we know that the
> entire chain is WARM. This should address the workload Dilip ran and found
> regression (I don't think he got a chance to confirm that)
>

Have you by any chance tried to reproduce it at your end?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Bruce Momjian
On Mon, Apr 10, 2017 at 04:34:50PM -0700, Andres Freund wrote:
> Hi,
> 
> 
> On 2017-04-08 23:36:13 +0530, Pavan Deolasee wrote:
> > On Wed, Apr 5, 2017 at 11:57 PM, Andres Freund wrote:
> > 
> > > On 2017-04-05 09:36:47 -0400, Robert Haas wrote:
> > > > By the way, the "Converting WARM chains back to HOT chains" section of
> > > > README.WARM seems to be out of date.  Any chance you could update that
> > > > to reflect the current state and thinking of the patch?
> > >
> > > I propose we move this patch to the next CF.  That shouldn't prevent you
> > > working on it, although focusing on review of patches that still might
> > > make it wouldn't hurt either.
> > >
> > >
> > Thank you all for the reviews, feedback, tests, and criticism. And
> > apologies for pushing it till the last minute even though it was clear
> > to me quite some time back that the patch was not going to make it.
> 
> What confuses me about that position is that people were advocating to
> actually commit till literally hours before the CF closed.

Yes, I was surprised by that too and have privately emailed people on
this topic.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Pavan Deolasee
On Tue, Apr 11, 2017 at 7:10 PM, Robert Haas wrote:

>
>
> Yes, and as Andres says, you don't help with those, and then you're
> upset when your own patch doesn't get attention.


I am not upset; I was obviously a bit disappointed, which I think is a very
natural emotion after spending weeks on it. I am not blaming any one
individual (excluding myself) for that, and neither the community at large
for the outcome. And I've moved on. I know everyone is busy getting the
release ready and I see no point discussing this endlessly. We have enough
on our plates for the next few weeks.


>
>   Amit and others who have started to
> dig into this patch a little bit found real problems pretty quickly
> when they started digging.


And I fixed them as quickly as humanly possible.


>   Those problems should be addressed, and
> review should continue (from whatever source) until no more problems
> can be found.


Absolutely.


>  The patch may
> or may not have any data-corrupting bugs, but regressions have been
> found and not addressed.


I don't know why you say that regressions are not addressed. Here are a few
things I did to address the regressions/reviews/concerns, apart from fixing
all the bugs discovered, but please let me know if there are things I've
not addressed.

1. Improved the interesting attrs patch that Alvaro wrote to address the
regression discovered in fetching more heap attributes. The patch that got
committed in fact improved certain synthetic workloads over then master.
2. Based on Petr and your feedback, disabled WARM on toasted attributes to
reduce overhead of fetching/decompressing the attributes.
3. Added code to avoid doing a second index scan when the index does not
contain any WARM pointers. This should address the situation Amit brought
up where only one of the indexes receives WARM inserts.
4. Added code to kill wrong index pointers to do online cleanup.
5. Added code to set a CLEAR pointer to a WARM pointer when we know that
the entire chain is WARM. This should address the workload Dilip ran and
found regression (I don't think he got a chance to confirm that)
6. Enhanced stats collector to collect information about candidate WARM
chains and added mechanism to control WARM cleanup at the heap as well as
index level, based on configurable parameters. This gives user better
control over the additional work that is required for WARM cleanup.
7. Added table level option to disable WARM if nothing else works.
8. Added mechanism to disable WARM when more than 50% of indexes are being
updated. I ran some benchmarks with different percentages of indexes
getting updated and thought this was a good threshold (a minimal sketch of
the check follows this list).
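
A minimal sketch of item 8's threshold check (hypothetical names; the
patch's actual mechanism may differ):

#include <stdbool.h>

/*
 * Fall back to a regular (non-WARM) update when more than half of the
 * table's indexes would need new entries anyway.
 */
static bool
warm_update_allowed(int nindexes, int nindexes_needing_new_entries)
{
    return nindexes > 0 &&
           2 * nindexes_needing_new_entries <= nindexes;   /* <= 50% */
}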

I may have missed something, but there is no intention to ignore known
regressions/reviews. Of course, I don't think that every regression will be
solvable; like if you run a CPU-bound workload, setting it up in a way such
that you repeatedly exercise the area where WARM is doing additional work,
without providing any benefit, maybe you can still find a regression. I am
willing to fix them as long as they are fixable and we are comfortable with
the additional code complexity. IMHO certain trade-offs are good, but I
understand that not everybody will agree with my views and that's ok.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Robert Haas
On Sat, Apr 8, 2017 at 2:06 PM, Pavan Deolasee wrote:
> Thank you all for the reviews, feedback, tests, and criticism. And
> apologies for pushing it till the last minute even though it was clear to
> me quite some time back that the patch was not going to make it. But if
> I'd given up, it would never have received whatever little attention it
> got. The only thing that disappoints me is that the patch was held back on
> no strong technical grounds - at least none that were clear to me. There
> were concerns about on-disk changes etc., but most on-disk changes were
> known for 7 months now. It reminds me of HOT development, which would not
> receive adequate feedback for many months, probably for very similar
> reasons: a complex patch, on-disk format changes, risky, even though the
> performance gains were quite substantial. I was much more hopeful this
> time because we have many more experts now as compared to then, but we
> probably also have proportionally more complex patches to review/commit.

Yes, and as Andres says, you don't help with those, and then you're
upset when your own patch doesn't get attention.  I think there are
two ways that this patch could have gotten the detailed and in-depth
review which it needs.  First, I would have been more than happy to
spend time on WARM in exchange for a comparable amount of your time
spent on parallel bitmap heap scan, or partition-wise join, or
partitioning, but that time was not forthcoming.  Second, there are
numerous senior reviewers at 2ndQuadrant who could have put time
into this patch and didn't.  Yes, Alvaro did some review, but it was
not in a huge degree of depth and didn't arrive until quite late,
unless there was more to it than what was posted on the mailing list
which, as a reminder, is the place where review is supposed to take
place.

If the senior reviewers with whom you share an employer don't have
time to review your patch, and you aren't willing to trade review time
on other patches for a comparable amount of attention on your own,
then it shouldn't surprise you when people object to it being
committed.

If there is an intention to commit this patch soon after v11
development opens, then signs of serious in-depth review, and
responses to criticisms thus-far proffered, really ought to be in
evidence well in advance of that date.  It's slightly better to commit
an inadequately-reviewed patch at the beginning of the cycle than at
the end, but what's even better is thorough review, which I maintain
this patch hasn't really had yet.  Amit and others who have started to
dig into this patch a little bit found real problems pretty quickly
when they started digging.  Those problems should be addressed, and
review should continue (from whatever source) until no more problems
can be found.  Everyone here understands (if they've been paying
attention) that this patch has large benefits in sympathetic cases,
and everyone wants those benefits.  What nobody wants (I assume) is
regressions in unsympathetic cases, or data corruption.  The patch may
or may not have any data-corrupting bugs, but regressions have been
found and not addressed.  Yet, there's still talk of committing this
with as much haste as possible.  I do not think that is a responsible
way to do development.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-10 Thread Andres Freund
Hi,


On 2017-04-08 23:36:13 +0530, Pavan Deolasee wrote:
> On Wed, Apr 5, 2017 at 11:57 PM, Andres Freund wrote:
> 
> > On 2017-04-05 09:36:47 -0400, Robert Haas wrote:
> > > By the way, the "Converting WARM chains back to HOT chains" section of
> > > README.WARM seems to be out of date.  Any chance you could update that
> > > to reflect the current state and thinking of the patch?
> >
> > I propose we move this patch to the next CF.  That shouldn't prevent you
> > working on it, although focusing on review of patches that still might
> > make it wouldn't hurt either.
> >
> >
> Thank you all for the reviews, feedback, tests, and criticism. And
> apologies for pushing it till the last minute even though it was clear
> to me quite some time back that the patch was not going to make it.

What confuses me about that position is that people were advocating to
actually commit till literally hours before the CF closed.


> But if I'd given up, it would never have received whatever little
> attention it got. The only thing that disappoints me is that the patch
> was held back on no strong technical grounds - at least none that were
> clear to me. There were concerns about on-disk changes etc., but most
> on-disk changes were known for 7 months now. It reminds me of HOT
> development, which would not receive adequate feedback for many months,
> probably for very similar reasons: a complex patch, on-disk format
> changes, risky, even though the performance gains were quite
> substantial. I was much more hopeful this time because we have many
> more experts now as compared to then, but we probably also have
> proportionally more complex patches to review/commit.

I don't think it's realistic to expect isolated in-depth review of
on-disk changes, when the rest of the patch isn't in a close-to-ready
shape. The likelihood that further work on the patch invalidates such
in-depth review is significant. It's not like only minor details changed
in the last few months.

I do agree that it's hard to get qualified reviewers on bigger patches.
But I think part of the reaction to that has to be active work on that
front: If your patch needs reviews by committers or other topical
experts, you need to explicitly reach out.  There are a lot of active
threads, and nobody has time to follow all of them in sufficient detail
to know that certain core parts of an actively developed patch are ready
for review.  Offer tit-for-tat reviews.  Announce that your patch is
ready, that you're only waiting for review.  Post a summary of open
questions...


> Finally, my apologies for not spending enough time reviewing other
> patches.  I know it's critical, and I'll try to improve on that.

I do find it more than a bit ironic to lament the early lack of attention
to your patch while also being aware of not having done much review.
This can only scale if everyone reviews each other's patches, not if
there are a few individuals who have to review everyone's patches.

Greetings,

Andres Freund




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-10 Thread Bruce Momjian
On Sat, Apr  8, 2017 at 11:36:13PM +0530, Pavan Deolasee wrote:
> Thank you all for the reviews, feedback, tests, and criticism. And apologies
> for continuing to push it till the last minute even though it was clear to me
> quite some time back that the patch was not going to make it. But if I'd given
> up, it would never have received whatever little attention it got. The only
> thing that disappoints me is that the patch was held back on no strong
> technical grounds - at least none were clear to me. There were concerns about
> on-disk changes etc., but most on-disk changes have been known for 7 months
> now. It reminds me of HOT development, when it would not receive adequate
> feedback for many months, probably for very similar reasons - complex patch,
> changes to the on-disk format, risky - even though the performance gains were
> quite substantial. I was much more hopeful this time because we have many more
> experts now than we did then, but we probably also have many more complex
> patches to review/commit.

I am sad to see WARM didn't make it into Postgres 10, but I agree
deferment was the right decision, as painful as that is.  We now have
something to look forward to in Postgres 11.  :-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-08 Thread Pavan Deolasee
On Wed, Apr 5, 2017 at 11:57 PM, Andres Freund  wrote:

> On 2017-04-05 09:36:47 -0400, Robert Haas wrote:
> > By the way, the "Converting WARM chains back to HOT chains" section of
> > README.WARM seems to be out of date.  Any chance you could update that
> > to reflect the current state and thinking of the patch?
>
> I propose we move this patch to the next CF.  That shouldn't prevent you
> working on it, although focusing on review of patches that still might
> make it wouldn't hurt either.
>
>
Thank you all for the reviews, feedback, tests, and criticism. And apologies
for continuing to push it till the last minute even though it was clear to me
quite some time back that the patch was not going to make it. But if I'd given
up, it would never have received whatever little attention it got. The only
thing that disappoints me is that the patch was held back on no strong
technical grounds - at least none were clear to me. There were concerns about
on-disk changes etc., but most on-disk changes have been known for 7 months
now. It reminds me of HOT development, when it would not receive adequate
feedback for many months, probably for very similar reasons - complex patch,
changes to the on-disk format, risky - even though the performance gains were
quite substantial. I was much more hopeful this time because we have many more
experts now than we did then, but we probably also have many more complex
patches to review/commit.

I understand that we would like this patch to go in very early in the
development cycle. So, as Alvaro mentioned elsewhere, we will continue to
work on it so that we can get it in as soon as the v11 tree opens. We shall
soon submit a revised version, with a list of the critical things, so that
we can discuss them here and get some useful feedback. I hope everyone
understands that a feature of this kind won't happen without on-disk format
changes. So to be able to address any concerns, we will need specific
feedback and workable suggestions, if any.

Finally, my apologies for not spending enough time reviewing other patches.
I know it's critical, and I'll try to improve on that. Congratulations to
all whose work got accepted and many thanks to all reviewers/committers/CF
managers. I know how difficult and thankless that work is.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-06 Thread Pavan Deolasee
On Thu, Apr 6, 2017 at 12:20 AM, Peter Geoghegan  wrote:

> On Wed, Apr 5, 2017 at 11:27 AM, Andres Freund  wrote:
> > I propose we move this patch to the next CF.
>
> I agree. I think it's too late to be working out fine details around
> TOAST like this. This is a patch that touches the storage format in a
> fairly fundamental way.
>
> The idea of turning WARM on or off reminds me a little bit of the way
> it was at one time suggested that HOT not be used against catalog
> tables, a position that Tom pushed against.


I agree. I am grateful that Tom put his foot down and helped me find answers
to all the hard problems, including catalog tables and create index
concurrently. So I was very clear in my mind from the very beginning that
WARM must support all these things too. Obviously it still doesn't support
everything, like other index methods and expression indexes, but IMHO that's
a much smaller problem. Also, making sure that WARM works on system tables
helped me find corner-case bugs which would otherwise have slipped past
regular regression testing.



> I'm not saying that it's
> necessarily a bad idea, but we should exhaust alternatives, and have a
> clear rationale for it.
>

One reason why it's probably a good idea is that we know WARM will not be
effective for all use cases and might actually cause a performance
regression for some of them. Even worse, as Robert fears, it might cause
data loss issues. Though TBH I haven't yet seen any concrete example where
it breaks so badly that it causes data loss, but that may be because the
patch still hasn't received enough eyeballs or outside tests. Having a
table-level option would allow us to improve things incrementally instead
of making the initial patch so large that reviewing it is a complete
nightmare. Maybe it's already a nightmare.

It's not as if HOT would not have caused regressions for some specific use
cases. But I think the general benefit was so strong that we never invested
time in finding and tuning for those specific cases, and thus avoided some
more complexity in the code. WARM's benefits are probably not the same as
HOT's, or our standards may have changed, or we now have the resources to do
much more elaborate tests, which were missing 10 years back. But now that we
are aware of some regressions, the choice is between spending a considerable
amount of time trying to handle every case vs. doing it incrementally and
starting to deliver to the majority of users, while keeping the patch at a
manageable size.

Even if we were to provide a table-level option, my preference would be to
keep it ON by default.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-05 Thread Pavan Deolasee
On Thu, Apr 6, 2017 at 1:06 AM, Robert Haas  wrote:

> On Wed, Apr 5, 2017 at 2:32 PM, Pavan Deolasee 
> wrote:
>
> > The other way is to pass old tuple values along with the new tuple
> values to
> > amwarminsert, build index tuples and then do a comparison. For duplicate
> > index tuples, skip WARM inserts.
>
> This is more what I was thinking.  But maybe one of the other ideas
> you wrote here is better; not sure.
>
>
Ok. I think I suggested this as one of the ideas upthread, to support hash
indexes for example. This might be a good safety-net, but AFAIC what we
have today should work since we pretty much construct index tuples in a
consistent way before doing a comparison.
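
As a sketch of that safety net (not the patch's actual code; amwarminsert is
the patch's proposed AM entry point, while the helpers here are core APIs of
roughly that era): form the index datums for the old and the new heap tuple
and skip the WARM index insert when they match.

#include "postgres.h"

#include "catalog/index.h"
#include "executor/executor.h"
#include "utils/datum.h"
#include "utils/rel.h"

/*
 * Sketch only: return true if the index tuple formed from the new heap
 * tuple would be identical to the one formed from the old heap tuple,
 * in which case the WARM index insert can be skipped.
 */
static bool
warm_index_values_match(Relation idxrel, IndexInfo *indexInfo,
                        EState *estate,
                        TupleTableSlot *oldslot, TupleTableSlot *newslot)
{
    TupleDesc   itupdesc = RelationGetDescr(idxrel);
    Datum       oldvals[INDEX_MAX_KEYS], newvals[INDEX_MAX_KEYS];
    bool        oldnulls[INDEX_MAX_KEYS], newnulls[INDEX_MAX_KEYS];
    int         i;

    /* Build both sets of index values exactly as the inserter would. */
    FormIndexDatum(indexInfo, oldslot, estate, oldvals, oldnulls);
    FormIndexDatum(indexInfo, newslot, estate, newvals, newnulls);

    for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
    {
        Form_pg_attribute att = TupleDescAttr(itupdesc, i);

        if (oldnulls[i] != newnulls[i])
            return false;
        if (!oldnulls[i] &&
            !datumIsEqual(oldvals[i], newvals[i],
                          att->attbyval, att->attlen))
            return false;   /* value differs: do a real index insert */
    }
    return true;            /* duplicate index tuple: skip WARM insert */
}

Since FormIndexDatum is used for both tuples, expression indexes would be
evaluated the same way on each side, which is what makes the comparison
consistent.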

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 2:32 PM, Pavan Deolasee  wrote:
>> The only other idea that I have for a really clean solution here is to
>> support this only for index types that are amcanreturn, and actually
>> compare the value stored in the index tuple with the one stored in the
>> heap tuple, ensuring that new index tuples are inserted whenever they
>> don't match and then using the exact same test to determine the
>> applicability of a given index pointer to a given heap tuple.
>
> Just so that I understand, are you suggesting that while inserting WARM
> index pointers, we check if the new index tuple will look exactly the same
> as the old index tuple and not insert a duplicate pointer at all?

Yes.

> I considered that, but it will require us to do an index lookup during WARM
> index insert and for non-unique keys, that may or may not be exactly cheap.

I don't think it requires that.  You should be able to figure out
based on the tuple being updated and the corresponding new tuple
whether this will be true or not.

> Or we need something like what Claudio wrote to sort all index entries by
> heap TIDs. If we do that, then the recheck can be done just based on the
> index and heap flags (because we can then turn the old index pointer into a
> CLEAR pointer. Index pointer is set to COMMON during initial insert).

Yeah, I think that patch is going to be needed for some of the storage
work I'm interesting in doing, too, so I am tentatively in favor of
it, but I wasn't proposing using it here.

> The other way is to pass old tuple values along with the new tuple values to
> amwarminsert, build index tuples and then do a comparison. For duplicate
> index tuples, skip WARM inserts.

This is more what I was thinking.  But maybe one of the other ideas
you wrote here is better; not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-05 Thread Peter Geoghegan
On Wed, Apr 5, 2017 at 11:27 AM, Andres Freund  wrote:
> I propose we move this patch to the next CF.

I agree. I think it's too late to be working out fine details around
TOAST like this. This is a patch that touches the storage format in a
fairly fundamental way.

The idea of turning WARM on or off reminds me a little bit of the way
it was at one time suggested that HOT not be used against catalog
tables, a position that Tom pushed against. I'm not saying that it's
necessarily a bad idea, but we should exhaust alternatives, and have a
clear rationale for it.

-- 
Peter Geoghegan




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-05 Thread Andres Freund
On 2017-04-05 09:36:47 -0400, Robert Haas wrote:
> By the way, the "Converting WARM chains back to HOT chains" section of
> README.WARM seems to be out of date.  Any chance you could update that
> to reflect the current state and thinking of the patch?

I propose we move this patch to the next CF.  That shouldn't prevent you
working on it, although focusing on review of patches that still might
make it wouldn't hurt either.

- Andres




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-05 Thread Robert Haas
On Tue, Apr 4, 2017 at 11:43 PM, Pavan Deolasee
 wrote:
> Well, better than causing a deadlock ;-)

Yep.

> Let's see if we want to go down the path of blocking WARM when tuples have
> toasted attributes. I submitted a patch yesterday, but having slept over it,
> I think I made mistakes there. It might not be enough to look at the caller
> supplied new tuple because that may not have any toasted values, but the
> final tuple that gets written to the heap may be toasted.

Yes, you have to make whatever decision you're going to make here
after any toast-ing has been done.

> We could look at
> the new tuple's attributes to find if any indexed attributes are toasted,
> but that might suck as well. Or we can simply block WARM if the old or the
> new tuple has external attributes i.e. HeapTupleHasExternal() returns true.
> That could be overly restrictive because irrespective of whether the indexed
> attributes are toasted or just some other attribute is toasted, we will
> block WARM on such updates. May be that's not a problem.

Well, I think that there's some danger of whittling down this
optimization to the point where it still incurs most of the costs --
in bit-space if not in CPU cycles -- but no longer yields much of the
benefit.  Even though the speed-up might still be substantial in the
cases where the optimization kicks in, if a substantial number of
users doing things that are basically pretty normal sometimes fail to
get the optimization, this isn't going to be very exciting outside of
synthetic benchmarks.

Backing up a little bit, it seems like the root of the issue here is
that, at a certain point in what was once a HOT chain, you make a WARM
update, and you make a decision about which indexes to update at that
point.  Now, later on, when you traverse that chain, you need to be
able to figure out what decision you made before; otherwise, you might make
a bad decision about whether an index pointer applies to a particular
tuple.  If the index tuple is WARM, then the answer is "yes" if the
heap tuple is also WARM, and "no" if the heap tuple is CLEAR (which is
an odd antonym to WARM, but leave that aside).  If the index tuple is
CLEAR, then the answer is "yes" if the heap tuple is also CLEAR, and
"maybe" if the heap tuple is WARM.

In that "maybe" case, we are trying to reconstruct the decision that
we made when we did the update.  If, at the time of the update, we
decided to insert a new index entry, then the answer is "no"; if not,
it's "yes".  From an integrity point of view, it doesn't really matter
how we make the decision; what matters is that we're consistent.  More
specifically, if we sometimes insert a new index tuple even when the
value has not changed in any user-visible way, I think that would be
fine, provided that later chain traversals can tell that we did that.
As an extreme example, suppose that the WARM update inserted in some
magical way a bitmap of which attributes had changed into the new
tuple.  Then, when we are walking the chain following a CLEAR index
tuple, we test whether the index columns overlap with that bitmap; if
they do, then that index got a new entry; if not, then it didn't.  It
would actually be fine (apart from efficiency) to set extra bits in
this bitmap; extra indexes would get updated, but chain traversal
would know exactly which ones, so no problem.  This is of course just
a gedankenexperiment, but the point is that as long as the insert
itself and later chain traversals agree on the rule, there's no
integrity problem.  I think.
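
To make that rule concrete, here is a small sketch of the traversal-time
decision (names hypothetical; the patch's actual flags and functions may
differ):

#include <stdbool.h>

typedef enum
{
    POINTER_APPLIES_NO,
    POINTER_APPLIES_YES,
    POINTER_APPLIES_MAYBE   /* must reconstruct the update-time decision */
} PointerApplies;

/* Sketch only: does an index pointer apply to a given heap tuple? */
static PointerApplies
index_pointer_applies(bool index_ptr_is_warm, bool heap_tuple_is_warm)
{
    if (index_ptr_is_warm)
        return heap_tuple_is_warm ? POINTER_APPLIES_YES : POINTER_APPLIES_NO;

    /* CLEAR index pointer */
    if (!heap_tuple_is_warm)
        return POINTER_APPLIES_YES;

    /*
     * CLEAR pointer, WARM tuple: the answer depends on the decision made
     * at update time; resolve it via recheck or, as in the thought
     * experiment above, a changed-attributes bitmap carried by the tuple.
     */
    return POINTER_APPLIES_MAYBE;
}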

The first idea I had for an actual solution to this problem was to
make the decision as to whether to insert new index entries based on
whether the indexed attributes in the final tuple (post-TOAST) are
byte-for-byte identical with the original tuple.  If somebody injects
a new compression algorithm into the system, or just changes the
storage parameters on a column, or we re-insert an identical value
into the TOAST table when we could have reused the old TOAST pointer,
then you might have some potentially-WARM updates that end up being
done as regular updates, but that's OK.  When you are walking the
chain, you will KNOW whether you inserted new index entries or not,
because you can do the exact same comparison that was done before and
be sure of getting the same answer.  But that's actually not really a
solution, because it doesn't work if all of the CLEAR tuples are gone
-- all you have is the index tuple and the new heap tuple; there's no
old heap tuple with which to compare.
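
As a sketch (a hypothetical helper, not patch code), the comparison could
look like the following. Note that datumIsEqual() compares raw bytes, so
values toasted or compressed differently compare as unequal, which, as
argued above, is acceptable: it just forces a regular index insert.

#include "postgres.h"

#include "access/htup_details.h"
#include "utils/datum.h"
#include "utils/rel.h"

/*
 * Sketch only: are all of this index's key attributes byte-for-byte
 * identical, post-TOAST, in the old and new heap tuples?
 */
static bool
indexed_attrs_identical(Relation heaprel, Relation idxrel,
                        HeapTuple oldtup, HeapTuple newtup)
{
    TupleDesc   tupdesc = RelationGetDescr(heaprel);
    int         i;

    for (i = 0; i < idxrel->rd_index->indnatts; i++)
    {
        AttrNumber  attno = idxrel->rd_index->indkey.values[i];
        Form_pg_attribute att;
        Datum       oldval, newval;
        bool        oldnull, newnull;

        if (attno <= 0)
            return false;       /* expression column: be conservative */

        oldval = heap_getattr(oldtup, attno, tupdesc, &oldnull);
        newval = heap_getattr(newtup, attno, tupdesc, &newnull);

        if (oldnull != newnull)
            return false;
        if (oldnull)
            continue;

        att = TupleDescAttr(tupdesc, attno - 1);
        /* Raw binary comparison: toasted/compressed bytes must match too. */
        if (!datumIsEqual(oldval, newval, att->attbyval, att->attlen))
            return false;
    }
    return true;
}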

The only other idea that I have for a really clean solution here is to
support this only for index types that are amcanreturn, and actually
compare the value stored in the index tuple with the one stored in the
heap tuple, ensuring that new index tuples are inserted whenever they
don't match and then using the exact same test to determine the
applicability of a given index pointer to a given heap tuple.  I'm not
sure how viable that is 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Wed, Apr 5, 2017 at 8:42 AM, Robert Haas  wrote:

> On Tue, Apr 4, 2017 at 10:21 PM, Pavan Deolasee
>  wrote:
> > On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas 
> wrote:
> >>  but
>> trying to access the TOAST table would be fatal; that probably would have
> >> deadlock hazards among other problems.
> >
> > Hmm. I think you're right. We could make a copy of the heap tuple, drop
> the
> > lock and then access TOAST to handle that. Would that work?
>
> Yeah, but it might suck.  :-)


Well, better than causing a deadlock ;-)

Let's see if we want to go down the path of blocking WARM when tuples have
toasted attributes. I submitted a patch yesterday, but having slept over
it, I think I made mistakes there. It might not be enough to look at the
caller supplied new tuple because that may not have any toasted values, but
the final tuple that gets written to the heap may be toasted. We could look
at the new tuple's attributes to find if any indexed attributes are
toasted, but that might suck as well. Or we can simply block WARM if the
old or the new tuple has external attributes i.e. HeapTupleHasExternal()
returns true. That could be overly restrictive because irrespective of
whether the indexed attributes are toasted or just some other attribute is
toasted, we will block WARM on such updates. May be that's not a problem.

We will also need to handle the case where some older tuple in the chain
has a toasted value and that tuple is presented to recheck because of a
subsequent WARM update, where the tuples updated by WARM did not have any
toasted values (and hence were allowed). I think we can handle that case
fairly easily, but it's not done in the code yet.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 10:21 PM, Pavan Deolasee
 wrote:
> On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas  wrote:
>>  but
>> trying to access the TOAST table would be fatal; that probably would have
>> deadlock hazards among other problems.
>
> Hmm. I think you're right. We could make a copy of the heap tuple, drop the
> lock and then access TOAST to handle that. Would that work?

Yeah, but it might suck.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas  wrote:

>  but
> trying to access the TOAST table would be fatal; that probably would have
> deadlock hazards among other problems.


Hmm. I think you're right. We could make a copy of the heap tuple, drop the
lock and then access TOAST to handle that. Would that work?
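
A sketch of that workaround, using core APIs of this era but not taken from
the patch:

#include "postgres.h"

#include "access/htup_details.h"
#include "access/tuptoaster.h"
#include "storage/bufmgr.h"

/*
 * Sketch only: copy the tuple while the buffer content lock is still
 * held, release the lock, and only then flatten any TOASTed attributes,
 * so that no TOAST-table access happens under the buffer lock.
 */
static HeapTuple
copy_then_detoast(Buffer buf, HeapTuple tup, TupleDesc tupdesc)
{
    HeapTuple   copy;

    /* Caller holds the content lock; the copy stays valid after unlock. */
    copy = heap_copytuple(tup);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    /* Safe now: toast_flatten_tuple() may read the TOAST table. */
    return toast_flatten_tuple(copy, tupdesc);
}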

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-03 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 11:17 PM, Bruce Momjian  wrote:

> On Tue, Mar 21, 2017 at 04:04:58PM -0400, Bruce Momjian wrote:
> > On Tue, Mar 21, 2017 at 04:56:16PM -0300, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > > On Tue, Mar 21, 2017 at 04:43:58PM -0300, Alvaro Herrera wrote:
> > > > > Bruce Momjian wrote:
> > > > >
> > > > > > I don't think it makes sense to try and save bits and add
> complexity
> > > > > > when we have no idea if we will ever use them,
> > > > >
> > > > > If we find ourselves in dire need of additional bits, there is a
> known
> > > > > mechanism to get back 2 bits from old-style VACUUM FULL.  I assume
> that
> > > > > the reason nobody has bothered to write the code for that is that
> > > > > there's not *that* much interest.
> > > >
> > > > We have no way of tracking if users still have pages that used the
> bits
> > > > via pg_upgrade before they were removed.
> > >
> > > Yes, that's exactly the code that needs to be written.
> >
> > Yes, but once it is written it will take years before those bits can be
> > used on most installations.
>
> Actually, the 2 bits from old-style VACUUM FULL could be reused if
> one of the WARM bits is set when they are checked.  The WARM bits
> will all be zero on pre-9.0 pages.  The check would have to verify both
> the old-style VACUUM FULL bit and that a WARM bit is set.
>
>
We're already doing that in the submitted patch.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-03 Thread Pavan Deolasee
On Fri, Mar 31, 2017 at 12:31 PM, Dilip Kumar  wrote:

> On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila 
> wrote:
> > I am not sure if we can consider it as completely synthetic because we
> > might see some similar cases for json datatypes.  Can we try to
> > see the impact when the same test runs from multiple clients?  For
> > your information, I am also trying to setup some tests along with one
> > of my colleague and we will report the results once the tests are
> > complete.
> We have done some testing and below is the test details and results.
>
> Test:
> I have derived this test from the test given by Pavan [1], with the
> following differences:
>
> - I have reduced the fill factor to 40 to ensure that there is scope in
> the page to store multiple WARM chains.
> - WARM updated all the tuples.
> - Executed a large select to force a lot of tuple rechecks within a
> single query.
> - Used a smaller tuple size (the aid field is around ~100 bytes) to
> ensure tuples have sufficient space on a page to get WARM updated.
>
> Results:
> ---
> * I see more than 15% regression in this case. The regression is
> repeatable.
> * If I increase the fill factor to 90, the regression reduces to 7%;
> maybe fewer tuples are getting WARM updated because there is no space
> left on the page after a few WARM updates.
>

Thanks for doing the tests. The tests show us that if the table gets filled
up with WARM chains that are not cleaned up, and the table is then subjected
to a read-only workload, we will see regression. Obviously, the test is
completely CPU bound, something WARM is not meant to address. I am not yet
certain that recheck is causing the problem. Yesterday I ran the test where
I was seeing regression with recheck completely turned off, and still saw
the regression. So there is something else going on with this kind of
workload. Will check.

Having said that, I think there are some other ways to fix some of the
common problems with repeated rechecks. One thing we can do is rely on the
index pointer flags to decide whether a recheck is necessary or not. For
example, a WARM pointer to a WARM tuple does not require a recheck.
Similarly, a CLEAR pointer to a CLEAR tuple does not require a recheck. A
WARM pointer to a CLEAR tuple can be discarded immediately, because the only
situation where it can occur is an aborted WARM update. The only troublesome
situation is a CLEAR pointer to a WARM tuple; that entirely depends on
whether the index had received a WARM insert or not. What we can do, though,
is this: if the recheck succeeds the first time and the chain has only WARM
tuples, we set the WARM bit on the index pointer. We can use the same hint
mechanism as used for marking index pointers dead to minimise overhead.
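
As a sketch, with hypothetical flag arguments rather than the patch's real
representation, the decision table looks like this:

#include <stdbool.h>

/*
 * Sketch only: returns whether a recheck is required; *discard is set
 * when the index pointer can be dropped outright.
 */
static bool
warm_recheck_needed(bool ptr_is_warm, bool tuple_is_warm, bool *discard)
{
    *discard = false;

    if (ptr_is_warm && tuple_is_warm)
        return false;           /* WARM pointer -> WARM tuple: no recheck */
    if (!ptr_is_warm && !tuple_is_warm)
        return false;           /* CLEAR pointer -> CLEAR tuple: no recheck */
    if (ptr_is_warm && !tuple_is_warm)
    {
        /* only possible after an aborted WARM update */
        *discard = true;
        return false;
    }

    /*
     * CLEAR pointer -> WARM tuple: depends on whether this index received
     * a WARM insert.  After a successful recheck of an all-WARM chain, the
     * pointer's WARM bit could be set as a hint, as with LP_DEAD.
     */
    return true;
}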

Obviously this will only help the case where the same tuple is rechecked
often. But if a tuple is rechecked only once, other overheads will
presumably dominate, making the regression much less significant.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-01 Thread Amit Kapila
On Fri, Mar 31, 2017 at 11:54 PM, Pavan Deolasee
 wrote:
>
> On Fri, Mar 31, 2017 at 11:16 PM, Robert Haas  wrote:
>>
>>   Now, I understand you to be suggesting a flag at
>> table-creation time that would, maybe, be immutable after that, but
>> even then - are we going to run completely unmodified 9.6 code for
>> tables where that's not enabled, and only go through any of the WARM
>> logic when it is enabled?  Doesn't sound likely.  The commits already
>> made from this patch series certainly affect everybody, and I can't
>> see us adding switches that bypass
>> ce96ce60ca2293f75f36c3661e4657a3c79ffd61 for example.
>
>
> I don't think I am going to claim that either. But probably only 5% of the
> new code would then be involved. Which is a lot less and a lot more
> manageable. Having said that, I think if we at all do this, we should only
> do it based on our experiences in the beta cycle, as a last resort. Based on
> my own experience during HOT development, long-running pgbench tests with
> several concurrent clients, subjected to multiple AV cycles and periodic
> consistency checks, usually bring up issues related to heap corruption. So
> my confidence level is relatively high on that part of the code. That's not
> to suggest that there can't be any bugs.
>
> Obviously then there are other things, such as regressions for some workloads
> or additional work required by vacuum etc. And I think we should address
> them, and I'm fairly certain we can do that. It may not happen immediately,
> but if we provide the right knobs, maybe those who are affected can fall back
> to the old behaviour or not use the new code at all while we improve things
> for them.
>

Okay, but even if we want to provide knobs, there should be some
consensus on them.  I am sure introducing an additional pass over the
index has some impact, so either we should have some way to reduce the
impact or some additional design to handle it.  Do you think it makes
sense to have a separate thread to discuss this and get feedback, as I am
not seeing much input on the knobs you are proposing to handle the second
pass over the index?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Jeff Janes
On Thu, Mar 30, 2017 at 4:13 AM, Pavan Deolasee 
wrote:

>
>
> On Thu, Mar 30, 2017 at 3:29 PM, Dilip Kumar 
> wrote:
>
>> On Wed, Mar 29, 2017 at 11:51 AM, Pavan Deolasee
>>  wrote:
>> > Thanks. I think your patch of tracking interesting attributes seems ok
>> too
>> > after the performance issue was addressed. Even though we can still
>> improve
>> > that further, at least Mithun confirmed that there is no significant
>> > regression anymore and in fact for one artificial case, patch does
>> better
>> > than even master.
>>
>> I was trying to compile these patches on latest
>> head(f90d23d0c51895e0d7db7910538e85d3d38691f0) for some testing but I
>> was not able to compile it.
>>
>> make[3]: *** [postgres.bki] Error 1
>>
>
> Looks like OID conflict to me.. Please try rebased set.
>

Broken again on OID conflicts for 3373 to 3375 from the monitoring
permissions commit 25fff40798fc4.

After bumping those, I get these compiler warnings:

heapam.c: In function 'heap_delete':
heapam.c:3298: warning: 'root_offnum' may be used uninitialized in this
function
heapam.c: In function 'heap_update':
heapam.c:4311: warning: 'root_offnum' may be used uninitialized in this
function
heapam.c:4311: note: 'root_offnum' was declared here
heapam.c:3784: warning: 'root_offnum' may be used uninitialized in this
function
heapam.c: In function 'heap_lock_tuple':
heapam.c:5087: warning: 'root_offnum' may be used uninitialized in this
function


And I get a regression test failure, attached.

Cheers,

Jeff





Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Pavan Deolasee
On Fri, Mar 31, 2017 at 11:16 PM, Robert Haas  wrote:

> On Fri, Mar 31, 2017 at 10:24 AM, Pavan Deolasee
>  wrote:
> > Having worked on it for some time now, I can say that WARM uses pretty
> much
> > the same infrastructure that HOT uses for cleanup/pruning tuples from the
> > heap. So the risk of having a bug which can eat your data from the heap
> is
> > very low. Sure, it might mess up with indexes, return duplicate keys, not
> > return a row when it should have. Not saying they are not bad bugs, but
> > probably much less severe than someone removing live rows from the heap.
>
> Yes, that's true.  If there's nothing wrong with the way pruning
> works, then any other problem can be fixed by reindexing, I suppose.
>
>
Yeah, I think so.


> I'm not generally a huge fan of on-off switches for things like this,
> but I know Simon likes them.  I think the question is how much they
> really insulate us from bugs.  For the hash index patch, for example,
> the only way to really get insulation from bugs added in this release
> would be to ship both the old and the new code in separate index AMs
> (hash, hash2).  The code has been restructured so much in the process
> of doing all of this that any other form of on-off switch would be
> pretty hit-or-miss whether it actually provided any protection.
>
> Now, I am less sure about this case, but my guess is that you can't
> really have this be something that can be flipped on and off for a
> table.  Once a table has any WARM updates in it, the code that knows
> how to cope with that has to be enabled, and it'll work as well or
> poorly as it does.


That's correct. Once enabled, we will need to handle the case of two index
pointers pointing to the same root. The only way to get rid of that is
probably to do a complete rewrite/reindex, I suppose. But I was mostly
talking about an immutable flag at table-creation time, as rightly guessed.


>   Now, I understand you to be suggesting a flag at
> table-creation time that would, maybe, be immutable after that, but
> even then - are we going to run completely unmodified 9.6 code for
> tables where that's not enabled, and only go through any of the WARM
> logic when it is enabled?  Doesn't sound likely.  The commits already
> made from this patch series certainly affect everybody, and I can't
> see us adding switches that bypass
> ce96ce60ca2293f75f36c3661e4657a3c79ffd61 for example.


I don't think I am going to claim that either. But probably only 5% of the
new code would then be involved. Which is a lot less and a lot more
manageable. Having said that, I think if we at all do this, we should only
do it based on our experiences in the beta cycle, as a last resort. Based
on my own experience during HOT development, long-running pgbench tests
with several concurrent clients, subjected to multiple AV cycles and
periodic consistency checks, usually bring up issues related to heap
corruption. So my confidence level is relatively high on that part of the
code. That's not to suggest that there can't be any bugs.

Obviously then there are other things, such as regressions for some workloads
or additional work required by vacuum etc. And I think we should address
them, and I'm fairly certain we can do that. It may not happen immediately,
but if we provide the right knobs, maybe those who are affected can fall back
to the old behaviour or not use the new code at all while we improve things
for them. Some of these things I could have implemented already, but
without a clear understanding of whether the feature will get in or not,
it's hard to keep putting endless effort into the patch. All
non-committers go through that dilemma all the time, I'm sure.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 10:24 AM, Pavan Deolasee
 wrote:
> Having worked on it for some time now, I can say that WARM uses pretty much
> the same infrastructure that HOT uses for cleanup/pruning tuples from the
> heap. So the risk of having a bug which can eat your data from the heap is
> very low. Sure, it might mess up with indexes, return duplicate keys, not
> return a row when it should have. Not saying they are not bad bugs, but
> probably much less severe than someone removing live rows from the heap.

Yes, that's true.  If there's nothing wrong with the way pruning
works, then any other problem can be fixed by reindexing, I suppose.

> And we can make it a table-level property, keep it off by default, turn it
> off on system tables in this release, and change the defaults only when we
> get more confidence, assuming people use it by explicitly turning it on. Now
> maybe that's not the right approach, and keeping it off by default will mean
> it receives much less testing than we would like. So we can keep it on in
> the beta cycle and then take a call. I went to great lengths to make it
> work on system tables because, during HOT development, Tom told me that it
> had better work for everything or it doesn't work at all. But with WARM it
> works for system tables and there are no known bugs, but if we don't want to
> risk system tables, we might want to turn it off (just prior to release, maybe).

I'm not generally a huge fan of on-off switches for things like this,
but I know Simon likes them.  I think the question is how much they
really insulate us from bugs.  For the hash index patch, for example,
the only way to really get insulation from bugs added in this release
would be to ship both the old and the new code in separate index AMs
(hash, hash2).  The code has been restructured so much in the process
of doing all of this that any other form of on-off switch would be
pretty hit-or-miss whether it actually provided any protection.

Now, I am less sure about this case, but my guess is that you can't
really have this be something that can be flipped on and off for a
table.  Once a table has any WARM updates in it, the code that knows
how to cope with that has to be enabled, and it'll work as well or
poorly as it does.  Now, I understand you to be suggesting a flag at
table-creation time that would, maybe, be immutable after that, but
even then - are we going to run completely unmodified 9.6 code for
tables where that's not enabled, and only go through any of the WARM
logic when it is enabled?  Doesn't sound likely.  The commits already
made from this patch series certainly affect everybody, and I can't
see us adding switches that bypass
ce96ce60ca2293f75f36c3661e4657a3c79ffd61 for example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Robert Haas
On Thu, Mar 30, 2017 at 10:49 AM, Petr Jelinek
 wrote:
> While reading this thread I have been wondering whether we could just not do
> WARM on TOASTed and compressed values if we know there might be regressions
> there. I mean, I've seen the problem WARM tries to solve mostly on timestamp
> or boolean values and sometimes counters, so it would still be helpful to
> quite a lot of people even if we didn't handle TOAST and compressed values
> in v1. It's not like skipping WARM sometimes is somehow terrible; we'll
> just fall back to the current behavior.

Good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Pavan Deolasee
On Fri, Mar 31, 2017 at 6:47 PM, Robert Haas  wrote:
>
>
> 2. WARM is a non-optional feature which touches the on-disk format.
> There is nothing more dangerous than that.  If hash indexes have bugs,
> people can avoid those bugs by not using them; there are good reasons
> to suppose that hash indexes have very few existing users.  The
> expression evaluation changes, IMHO, are much more dangerous because
> everyone will be exposed to them, but they will not likely corrupt
> your data because they don't touch the on-disk format.  WARM is even a
> little more dangerous than that; everyone is exposed to those bugs,
> and in the worst case they could eat your data.
>
>
Having worked on it for some time now, I can say that WARM uses pretty much
the same infrastructure that HOT uses for cleanup/pruning tuples from the
heap. So the risk of having a bug which can eat your data from the heap is
very low. Sure, it might mess up with indexes, return duplicate keys, not
return a row when it should have. Not saying they are not bad bugs, but
probably much less severe than someone removing live rows from the heap.

And we can make it a table-level property, keep it off by default, turn it
off on system tables in this release, and change the defaults only when we
get more confidence, assuming people use it by explicitly turning it on. Now
maybe that's not the right approach, and keeping it off by default will
mean it receives much less testing than we would like. So we can keep it on
in the beta cycle and then take a call. I went to great lengths to make it
work on system tables because, during HOT development, Tom told me that it
had better work for everything or it doesn't work at all. But with WARM it
works for system tables and there are no known bugs, but if we don't want to
risk system tables, we might want to turn it off (just prior to release,
maybe).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 7:53 AM, Simon Riggs  wrote:
> So Andres says defer this, but Robert says "more review", which is
> more than just deferral.
>
> We have some risky things in this release such as Hash Indexes,
> function changes. I perfectly understand that perception of risk is
> affected significantly by whether you wrote something or not. Andres
> and Robert did not write it and so they see problems.

While that's probably true, I don't think that's the only thing going on here:

1. Hash indexes were reviewed and reworked repeatedly until nobody
could find any more problems, including people like Jesper Pedersen
who do not work for EDB and who did extensive testing.  Similarly with
the expression evaluation stuff, which got some review from Heikki and
even more from Tom.  Now, several people who do not work for
2ndQuadrant have recently started looking at WARM and many of those
reviews have found problems and regressions.  If we're to hold things
to the same standard, those things should be looked into and fixed
before there is any talk of committing anything.  My concern is that
there seems to be (even with the patches already committed) a desire
to minimize the importance of the problems that have been found --
which I think is probably because fixing them would take time, and we
don't have much time left in this release cycle.  We should regard the
time between feature freeze and release as a time to fix the things
that good review missed, not as a substitute for fixing things that
should have (or actually were) found during review prior to commit.

2. WARM is a non-optional feature which touches the on-disk format.
There is nothing more dangerous than that.  If hash indexes have bugs,
people can avoid those bugs by not using them; there are good reasons
to suppose that hash indexes have very few existing users.  The
expression evaluation changes, IMHO, are much more dangerous because
everyone will be exposed to them, but they will not likely corrupt
your data because they don't touch the on-disk format.  WARM is even a
little more dangerous than that; everyone is exposed to those bugs,
and in the worst case they could eat your data.

I agree that WARM could be a pretty great feature, but I think you're
underestimating the negative effects that could result from committing
it too soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Simon Riggs
On 30 March 2017 at 16:50, Robert Haas  wrote:
> On Thu, Mar 30, 2017 at 11:41 AM, Andres Freund  wrote:
>> On 2017-03-30 16:43:41 +0530, Pavan Deolasee wrote:
>>> Looks like OID conflict to me.. Please try rebased set.
>>
>> Pavan, Alvaro, everyone: I know you guys are working very hard on this,
>> but I think at this point it's too late to commit this for v10.  This is
>> patch that's affecting the on-disk format, in quite subtle
>> ways.  Committing this just at the end of the development cyle / shortly
>> before feature freeze, seems too dangerous to me.
>>
>> Let's commit this just at the beginning of the cycle, so we have time to
>> shake out the bugs.
>
> +1, although I think it should also have substantially more review first.

So Andres says defer this, but Robert says "more review", which is
more than just deferral.

We have some risky things in this release such as Hash Indexes,
function changes. I perfectly understand that perception of risk is
affected significantly by whether you wrote something or not. Andres
and Robert did not write it and so they see problems. I confess that
those two mentioned changes make me very scared and I'm wondering
whether we should disable them. Fear is normal.

A risk perspective is a good one to take. What I think we should do is
strip out the areas of complexity, like TOAST, to reduce the footprint
and minimize the risk. There is benefit in WARM, and PostgreSQL has
received public criticism around our performance in this area. This
is more important than just a nice few % points of performance.

The bottom line is that this is written by Pavan, the guy we've
trusted for a decade to write and support HOT. We all know he can and
will fix any problems that emerge because he has shown us many times
he can and does.

We also observe that people from the same company sometimes support
their colleagues when they should not. I see no reason to believe that
is influencing my comments here.

The question is not whether this is ready today, but will it be
trusted and safe to use by Sept. Given the RMT, I would say yes, it
can be.

So I say we should commit WARM in PG10, with some restrictions.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Dilip Kumar
On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila  wrote:
> I am not sure if we can consider it as completely synthetic because we
> might see some similar cases for json datatypes.  Can we try to
> see the impact when the same test runs from multiple clients?  For
> your information, I am also trying to setup some tests along with one
> of my colleague and we will report the results once the tests are
> complete.
We have done some testing and below is the test details and results.

Test:
I have derived this test from the test given by Pavan [1], with the
following differences:

- I have reduced the fill factor to 40 to ensure that there is scope in
the page to store multiple WARM chains.
- WARM updated all the tuples.
- Executed a large select to force a lot of tuple rechecks within a
single query.
- Used a smaller tuple size (the aid field is around ~100 bytes) to
ensure tuples have sufficient space on a page to get WARM updated.

Results:
---
* I see more than 15% regression in this case. The regression is
repeatable.
* If I increase the fill factor to 90, the regression reduces to 7%;
maybe fewer tuples are getting WARM updated because there is no space
left on the page after a few WARM updates.

Test Setup:

Machine Information:

Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
RAM: 64GB

Config Change:
  synchronous_commit=off

---Setup.sql---

DROP TABLE IF EXISTS pgbench_accounts;
CREATE TABLE pgbench_accounts (
aid text,
bid bigint,
abalance bigint,
filler1 text DEFAULT md5(random()::text),
filler2 text DEFAULT md5(random()::text),
filler3 text DEFAULT md5(random()::text),
filler4 text DEFAULT md5(random()::text),
filler5 text DEFAULT md5(random()::text),
filler6 text DEFAULT md5(random()::text),
filler7 text DEFAULT md5(random()::text),
filler8 text DEFAULT md5(random()::text),
filler9 text DEFAULT md5(random()::text),
filler10 text DEFAULT md5(random()::text),
filler11 text DEFAULT md5(random()::text),
filler12 text DEFAULT md5(random()::text)
) WITH (fillfactor=40);

\set scale 10
\set end 0
\set start (:end + 1)
\set end (:start + (:scale * 100))

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end
)::text || repeat('a', 100), (random()::bigint) % :scale, 0;

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);

UPDATE pgbench_accounts SET filler1 = 'X';--WARM update all the tuples

---Test.sql---
set enable_seqscan=off;
set enable_bitmapscan=off;
explain analyze select * FROM pgbench_accounts WHERE aid < '400' ||
repeat('a', 100) ORDER BY aid

---Script.sh---
./psql -d postgres -f setup.sql
./pgbench -c1 -j1 -T300 -M prepared -f test.sql postgres

Patch:
tps = 3554.345313 (including connections establishing)
tps = 3554.880776 (excluding connections establishing)

Head:
tps = 4208.876829 (including connections establishing)
tps = 4209.440321 (excluding connections establishing)

*** After changing fill factor to 90 ---

Patch:
tps = 3794.414770 (including connections establishing)
tps = 3794.919592 (excluding connections establishing)

Head:
tps = 4206.445608 (including connections establishing)
tps = 4207.033559 (excluding connections establishing)

[1]
https://www.postgresql.org/message-id/CABOikdMduu9wOhfvNzqVuNW4YdBgbgwv-A%3DHNFCL7R5Tmbx7JA%40mail.gmail.com


I have done some perf profiling of the patch, and I noticed that the time
spent in the heap_check_warm_chain function has increased.

Top 10 functions in perf results (with patch):
+8.98% 1.04%  postgres  postgres[.] varstr_cmp
+7.24% 0.00%  postgres  [unknown]   [.] 
+6.34% 0.36%  postgres  libc-2.17.so[.] clock_gettime
+6.34% 0.00%  postgres  [unknown]   [.] 0x0003
+6.18% 6.15%  postgres  [vdso]  [.] __vdso_clock_gettime
+5.72% 0.02%  postgres  [kernel.kallsyms]   [k] system_call_fastpath
+4.08% 4.06%  postgres  libc-2.17.so[.] __memcpy_ssse3_back
+4.08% 4.06%  postgres  libc-2.17.so[.] get_next_seq
+3.92% 0.00%  postgres  [unknown]   [.] 0x6161616161616161
+3.07% 3.05%  postgres  postgres[.] heap_check_warm_chain


Thanks to Amit for helping in discussing the test ideas.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 04:04:58PM -0400, Bruce Momjian wrote:
> On Tue, Mar 21, 2017 at 04:56:16PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > On Tue, Mar 21, 2017 at 04:43:58PM -0300, Alvaro Herrera wrote:
> > > > Bruce Momjian wrote:
> > > > 
> > > > > I don't think it makes sense to try and save bits and add complexity
> > > > > when we have no idea if we will ever use them,
> > > > 
> > > > If we find ourselves in dire need of additional bits, there is a known
> > > > mechanism to get back 2 bits from old-style VACUUM FULL.  I assume that
> > > > the reason nobody has bothered to write the code for that is that
> > > > there's not *that* much interest.
> > > 
> > > We have no way of tracking if users still have pages that used the bits
> > > via pg_upgrade before they were removed.
> > 
> > Yes, that's exactly the code that needs to be written.
> 
> Yes, but once it is written it will take years before those bits can be
> used on most installations.

Actually, the 2 bits from old-style VACUUM FULL could be reused if
one of the WARM bits is set when they are checked.  The WARM bits
will all be zero on pre-9.0 pages.  The check would have to verify both
the old-style VACUUM FULL bit and that a WARM bit is set.
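
As a sketch of that check (HEAP_MOVED_OFF/HEAP_MOVED_IN are the real
old-style VACUUM FULL infomask bits; HEAP_WARM_TUPLE and its bit value are
hypothetical stand-ins for whatever the patch defines):

#include "postgres.h"

#include "access/htup_details.h"

/* Hypothetical: the WARM patch's tuple flag, not a core define. */
#define HEAP_WARM_TUPLE 0x0800

/*
 * Sketch only: pre-9.0 pages can have the MOVED bits set but can never
 * have a WARM bit set, so the old bits are safe to reinterpret only when
 * a WARM bit accompanies them.
 */
static inline bool
moved_bits_safe_to_reuse(HeapTupleHeader htup)
{
    return (htup->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN)) != 0 &&
           (htup->t_infomask2 & HEAP_WARM_TUPLE) != 0;
}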

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 11:41 AM, Andres Freund  wrote:
> On 2017-03-30 16:43:41 +0530, Pavan Deolasee wrote:
>> Looks like OID conflict to me.. Please try rebased set.
>
> Pavan, Alvaro, everyone: I know you guys are working very hard on this,
> but I think at this point it's too late to commit this for v10.  This is a
> patch that's affecting the on-disk format, in quite subtle
> ways.  Committing this just at the end of the development cycle / shortly
> before feature freeze seems too dangerous to me.
>
> Let's commit this just at the beginning of the cycle, so we have time to
> shake out the bugs.

+1, although I think it should also have substantially more review first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Andres Freund
Hi,

On 2017-03-30 16:43:41 +0530, Pavan Deolasee wrote:
> Looks like OID conflict to me.. Please try rebased set.

Pavan, Alvaro, everyone: I know you guys are working very hard on this,
but I think at this point it's too late to commit this for v10.  This is a
patch that's affecting the on-disk format, in quite subtle
ways.  Committing this just at the end of the development cycle / shortly
before feature freeze seems too dangerous to me.

Let's commit this just at the beginning of the cycle, so we have time to
shake out the bugs.

- Andres




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Petr Jelinek
On 30/03/17 16:04, Pavan Deolasee wrote:
> 
> 
> On Thu, Mar 30, 2017 at 7:27 PM, Robert Haas  wrote:
> 
> > On Thu, Mar 30, 2017 at 6:37 AM, Pavan Deolasee  wrote:
> > > I think we can fix that by comparing compressed values.  I know you had
> > > raised concerns, but Robert confirmed that (IIUC) it's not a problem
> > > today.
> > 
> > I'm not sure that's an entirely fair interpretation of what I said.
> > My point was that, while it may not be broken today, it might not be a
> > good idea to rely for correctness on it always being true.
> 
> I take that point. We have a choice of fixing it today or whenever we add
> support for multiple compression techniques. We don't even know what that
> will look like, or whether we will be able to look at compressed data
> and tell whether two values were compressed in exactly the same way.
> 

While reading this thread I have been wondering whether we could just not do
WARM on TOASTed and compressed values if we know there might be regressions
there. I mean, I've seen the problem WARM tries to solve mostly on timestamp
or boolean values and sometimes counters, so it would still be helpful to
quite a lot of people even if we didn't handle TOAST and compressed values
in v1. It's not like skipping WARM sometimes is somehow terrible; we'll
just fall back to the current behavior.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 10:08 AM, Amit Kapila  wrote:
> I think we should not consider compression and decompression as
> free at this point in the code, because we hold a buffer lock during
> recheck. Buffer locks are meant to be short-term (it is even
> mentioned in storage/buffer/README); doing all the
> compression/decompression/detoast work under these locks doesn't
> sound advisable to me.  It can block many concurrent operations.

Compression and decompression might cause performance problems, but
trying to access the TOAST table would be fatal; that probably would have
deadlock hazards among other problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Amit Kapila
On Thu, Mar 30, 2017 at 5:55 PM, Pavan Deolasee
 wrote:
>
> On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila 
> wrote:
>>
>>
>>
>> How have you verified that?  Have you checked that in
>> heap_prepare_insert it has called toast_insert_or_update() and then
>> returned a tuple different from what the input tup is?  Basically, I
>> am easily able to see it and even the reason why the heap and index
>> tuples will be different.  Let me try to explain,
>> toast_insert_or_update returns a new tuple which contains compressed
>> data and this tuple is inserted in heap where as slot still refers to
>> original tuple (uncompressed one) which is passed to heap_insert.
>> Now, ExecInsertIndexTuples and the calls under it like FormIndexDatum
>> will refer to the tuple in slot which is uncompressed and form the
>> values[] using uncompressed value.
>
>
> Ah, yes. You're right. Not sure why I saw things differently. That doesn't
> change anything though, because during recheck we'll get the compressed value
> and not do anything with it. In the index we already have the compressed
> value, so we can compare them. Even if we decide to decompress everything and
> do the comparison, that should be possible.
>

I think we should not consider doing compression and decompression as
free at this point in the code, because we hold a buffer lock during
recheck. Buffer locks are meant to be short-term (as is even
mentioned in storage/buffer/README); doing all the
compression/decompression/detoast work under these locks doesn't
sound advisable to me.  It can block many concurrent operations.
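
A sketch of the usual way around that (illustrative only; the variable
names are assumed from a surrounding caller, not taken from the patch):

/*
 * Copy what we need while holding the buffer content lock, release the
 * lock, and only then detoast/decompress.  Detoasting can do TOAST-table
 * access and I/O of its own, which we must not do under the lock.
 */
LockBuffer(buffer, BUFFER_LOCK_SHARE);
tuplecopy = heap_copytuple(&heapTuple);   /* private copy of the tuple */
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

value = heap_getattr(tuplecopy, attnum, tupdesc, &isnull);
if (!isnull)
    value = PointerGetDatum(
        heap_tuple_untoast_attr((struct varlena *) DatumGetPointer(value)));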

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 7:27 PM, Robert Haas  wrote:

> On Thu, Mar 30, 2017 at 6:37 AM, Pavan Deolasee
>  wrote:
> > I think we can fix that by comparing compressed values.  I know you had
> > raised concerns, but Robert confirmed that (IIUC) it's not a problem
> today.
>
> I'm not sure that's an entirely fair interpretation of what I said.
> My point was that, while it may not be broken today, it might not be a
> good idea to rely for correctness on it always being true.
>
>
I take that point. We have a choice of fixing it today or deferring it until
we support multiple compression techniques. We don't even know what that will
look like, or whether we will be able to look at compressed data and tell
whether two values were compressed in exactly the same way.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 6:37 AM, Pavan Deolasee
 wrote:
> I think we can fix that by comparing compressed values.  I know you had
> raised concerns, but Robert confirmed that (IIUC) it's not a problem today.

I'm not sure that's an entirely fair interpretation of what I said.
My point was that, while it may not be broken today, it might not be a
good idea to rely for correctness on it always being true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila 
wrote:

>
>
> How have you verified that?  Have you checked that in
> heap_prepare_insert it has called toast_insert_or_update() and then
> returned a tuple different from what the input tup is?  Basically, I
> am easily able to see it and even the reason why the heap and index
> tuples will be different.  Let me try to explain,
> toast_insert_or_update returns a new tuple which contains compressed
> data and this tuple is inserted in heap where as slot still refers to
> original tuple (uncompressed one) which is passed to heap_insert.
> Now, ExecInsertIndexTuples and the calls under it like FormIndexDatum
> will refer to the tuple in slot which is uncompressed and form the
> values[] using uncompressed value.


Ah, yes. You're right. Not sure why I saw things differently. That doesn't
change anything though, because during recheck we'll get the compressed value
and not do anything further with it. In the index we already have the
compressed value, so we can compare them. Even if we decide to decompress
everything and do the comparison, that should be possible. So I don't see a
problem as far as correctness goes.


>
> So IIUC, in the above test during initialization you have one WARM update
> and then during the actual test all updates are HOT; won't in such a case
> the WARM chain be converted to HOT by vacuum, after which all updates
> from thereon will be HOT, with probably no rechecks?
>

There is no autovacuum running; just 1 tuple out of 100 being HOT updated.
Confirmed by looking at pg_stat_user_tables. I also made sure that the tuple
doesn't get non-HOT updated in between, which would break the WARM chain.


>
>
> >
> > I then also repeated the tests, but this time using compressible values. The
> > regression in this case is much higher, maybe 15% or more.
> >
>
> Sounds on higher side.
>
>
Yes, definitely. If we can't reduce that, we might want to provide a
table-level option to explicitly turn WARM off for such tables.
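
Such a switch could be a plain boolean reloption; a sketch only (the option
name and default are assumptions, not something the patch defines):

/*
 * Hypothetical per-table storage option, usable as e.g.
 *   ALTER TABLE pgbench_accounts SET (enable_warm = off);
 * registered next to the other heap reloptions in reloptions.c.
 */
add_bool_reloption(RELOPT_KIND_HEAP,
                   "enable_warm",
                   "Enables WARM updates on this table",
                   true);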


> IIUC, by the time you are comparing tuple attrs to check for modified
> columns, you don't have the compressed values for the new tuple.
>
>
I think it depends. If the value is not being modified, then we will get
both values in compressed form. At least, I confirmed that with your example
by running an update which only changes c1. I don't know if that holds in
all cases.


> > I know you had raised concerns, but Robert confirmed that (IIUC) it's not
> > a problem today.
> >
>
> Yeah, but I am not sure if we can take Robert's statement as some sort
> of endorsement for what the patch does.
>
>
Sure.


> > We will figure out how to deal with it if we ever add support for different
> > compression algorithms or compression levels. And I also think this is kinda
> > a synthetic use case, and the fact that there is not much regression with
> > indexes as large as 2K bytes seems quite comforting to me.
> >
>
> I am not sure we can consider it completely synthetic, because we
> might see some similar cases for json datatypes.  Can we try to
> see the impact when the same test runs from multiple clients?


Ok. It might become hard to control HOT behaviour though, or we will need to
do a mix of WARM/HOT updates. I will see if this is easily doable by setting
a high fillfactor etc.


>   For
> your information, I am also trying to set up some tests along with one
> of my colleagues and we will report the results once the tests are
> complete.
>
>
That'll be extremely helpful, especially if it's something close to a
real-world scenario. Thanks for doing that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Amit Kapila
On Thu, Mar 30, 2017 at 4:07 PM, Pavan Deolasee
 wrote:
>
> On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila 
> wrote:
>>
>> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
>>  wrote:
>> >
>> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila 
>> > wrote:
>> >>
>> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila 
>> >> wrote:
>> >
>> > Then during recheck, we pass already compressed values to
>> > index_form_tuple(). But my point is, the following code will ensure that
>> > we
>> > don't compress it again. My reading is that the first check for
>> > !VARATT_IS_EXTENDED will return false if the value is already
>> > compressed.
>> >
>>
>> You are right.  I was confused with previous check of VARATT_IS_EXTERNAL.
>>
>
> Ok, thanks.
>
>>
>> >
>> > TBH I couldn't find why the original index insertion code will always
>> > supply
>> > uncompressed values.
>> >
>>
> >> Just try inserting a large value ('aa...bbb') into the text column,
> >> up to 2.5K.  Then have a breakpoint in heap_prepare_insert and
>> index_form_tuple, and debug both the functions, you can find out that
>> even though we compress during insertion in heap, the index will
>> compress the original value again.
>>
>
> Ok, tried that. AFAICS index_form_tuple gets compressed values.
>

How have you verified that?  Have you checked that
heap_prepare_insert has called toast_insert_or_update() and then
returned a tuple different from the input tuple?  Basically, I
can easily see it, and even the reason why the heap and index
tuples will be different.  Let me try to explain:
toast_insert_or_update returns a new tuple which contains compressed
data, and this tuple is inserted in the heap, whereas the slot still refers
to the original (uncompressed) tuple which was passed to heap_insert.
Now, ExecInsertIndexTuples and the calls under it, like FormIndexDatum,
will refer to the tuple in the slot, which is uncompressed, and will form
the values[] using the uncompressed value.  Try a simple case as below:

Create table t_comp(c1 int, c2 text);
Create index idx_t_comp_c2 on t_comp(c2);
Create index idx_t_comp_c1 on t_comp(c1);

Insert into t_comp values (1, repeat('a', 2700));

(Here repeat('a', 2700) stands in for a literal string of 'a' repeated 2700
times or so.)  You should notice what I am explaining above.


>>
>>
>> Yeah, probably you are right, but I am not sure if it is a good idea to
>> compare compressed values.
>>
>
> Again, I don't see a problem there.
>
>>
>> I think with these new changes in btrecheck, it appears to be much
>> costlier compared to what you had a few versions back.  I am afraid
>> that it can impact performance for cases where there are few WARM
>> updates in chain and many HOT updates as it will run recheck for all
>> such updates.
>
>
>
> INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
> <2300 chars string>, (random()::bigint) % :scale, 0;
>
> CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
> CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
> CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
> CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
> CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);
>
> -- Force a WARM update on one row
> UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' ||
> repeat('abcdefghij', 2);
>
> Test:
> -- Fetch the row using the fat index. Since the row contains a
> BEGIN;
> SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
> <2300 chars string> ORDER BY aid;
> UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
> <2300 chars string>;
> END;
>
> I did 4 5-minutes runs with master and WARM and there is probably a 2-3%
> regression.
>

So IIUC, in the above test during initialization you have one WARM update
and then during the actual test all updates are HOT; won't in such a case
the WARM chain be converted to HOT by vacuum, after which all updates
from thereon will be HOT, with probably no rechecks?

> (Results of 5-minute tests; txns is the total for 5 minutes, idx_scan is the
> number of scans on the fat index)
> master:
> txns  idx_scan
> 414117 828233
> 411109 822217
> 411848 823695
> 408424 816847
>
> WARM:
> txns   idx_scan
> 404139 808277
> 398880 797759
> 399949 799897
> 397927 795853
>
> ==
>
> I then also repeated the tests, but this time using compressible values. The
> regression in this case is much higher, maybe 15% or more.
>

Sounds on higher side.

> INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
> repeat('abcdefghij', 2), (random()::bigint) % :scale, 0;
>
> -- Fetch the row using the fat index. Since the row contains a
> BEGIN;
> SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
> repeat('abcdefghij', 2) ORDER BY aid;
> UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
> repeat('abcdefghij', 2);
> END;
>
> (Results with 5 mins 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila 
wrote:

> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
>  wrote:
> >
> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila 
> > wrote:
> >>
> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila 
> >> wrote:
> >
> > Then during recheck, we pass already compressed values to
> > index_form_tuple(). But my point is, the following code will ensure that we
> > don't compress it again. My reading is that the first check for
> > !VARATT_IS_EXTENDED will return false if the value is already compressed.
> >
>
> You are right.  I was confused with previous check of VARATT_IS_EXTERNAL.
>
>
Ok, thanks.


> >
> > TBH I couldn't find why the original index insertion code will always
> supply
> > uncompressed values.
> >
>
> Just try inserting a large value ('aa...bbb') into the text column,
> up to 2.5K.  Then have a breakpoint in heap_prepare_insert and
> index_form_tuple, and debug both the functions, you can find out that
> even though we compress during insertion in heap, the index will
> compress the original value again.
>
>
Ok, tried that. AFAICS index_form_tuple gets compressed values.


>
> Yeah, probably you are right, but I am not sure if it is a good idea to
> compare compressed values.
>
>
Again, I don't see a problem there.


> I think with these new changes in btrecheck, it appears to be much
> costlier compared to what you had a few versions back.  I am afraid
> that it can impact performance for cases where there are few WARM
> updates in chain and many HOT updates as it will run recheck for all
> such updates.


My feeling is that the recheck could be costly for very fat indexes, but
not doing WARM could be costly too for such indexes. We can possibly
construct a worst case where:
1. set up a table with a fat index.
2. do a WARM update to a tuple
3. then do several HOT updates to the same tuple
4. query the row via the fat index.


Initialisation:

-- Adjust parameters to force index scans
-- enable_seqscan to false
-- seq_page_cost = 1

DROP TABLE IF EXISTS pgbench_accounts;

CREATE TABLE pgbench_accounts (
aid text,
bid bigint,
abalance bigint,
filler1 text DEFAULT md5(random()::text),
filler2 text DEFAULT md5(random()::text),
filler3 text DEFAULT md5(random()::text),
filler4 text DEFAULT md5(random()::text),
filler5 text DEFAULT md5(random()::text),
filler6 text DEFAULT md5(random()::text),
filler7 text DEFAULT md5(random()::text),
filler8 text DEFAULT md5(random()::text),
filler9 text DEFAULT md5(random()::text),
filler10 text DEFAULT md5(random()::text),
filler11 text DEFAULT md5(random()::text),
filler12 text DEFAULT md5(random()::text)
) WITH (fillfactor=90);
\set end 0
\set start (:end + 1)
\set end (:start + (:scale * 100))

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
<2300 chars string>, (random()::bigint) % :scale, 0;

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);

-- Force a WARM update on one row
UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' ||
repeat('abcdefghij', 2);

Test:
-- Fetch the row using the fat index. Since the row contains a
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
 <2300 chars string> ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
 <2300 chars string>;
END;

I did four 5-minute runs with master and with WARM, and there is probably a
2-3% regression.

(Results of 5-minute tests; txns is the total for 5 minutes, idx_scan is the
number of scans on the fat index)
master:
txns  idx_scan
414117 828233
411109 822217
411848 823695
408424 816847

WARM:
txns   idx_scan
404139 808277
398880 797759
399949 799897
397927 795853

==

I then also repeated the tests, but this time using compressible values.
The regression in this case is much higher, maybe 15% or more.

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
repeat('abcdefghij', 2), (random()::bigint) % :scale, 0;

-- Fetch the row using the fat index. Since the row contains a
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
repeat('abcdefghij', 2) ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
repeat('abcdefghij', 2);
END;

(Results of 5-minute tests; txns is the total for 5 minutes, idx_scan is the
number of scans on the fat index)
master:
txns   idx_scan
56976 113953
56822 113645
56915 113831
56865 113731

WARM:
txns  idx_scan
49044 98087
49020 98039
49007 98013
49006 98011

But TBH I believe this regression is coming from the changes
to heap_tuple_attr_equals where we are decompressing both old and new
values and then 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Dilip Kumar
On Wed, Mar 29, 2017 at 11:51 AM, Pavan Deolasee
 wrote:
> Thanks. I think your patch of tracking interesting attributes seems ok too
> after the performance issue was addressed. Even though we can still improve
> that further, at least Mithun confirmed that there is no significant
> regression anymore and in fact for one artificial case, patch does better
> than even master.

I was trying to compile these patches on the latest
head (f90d23d0c51895e0d7db7910538e85d3d38691f0) for some testing, but I
was not able to compile them.

make[3]: *** [postgres.bki] Error 1
make[3]: Leaving directory
`/home/dilip/work/pg_codes/pbms_final/postgresql/src/backend/catalog'
make[2]: *** [submake-schemapg] Error 2
make[2]: Leaving directory
`/home/dilip/work/pg_codes/pbms_final/postgresql/src/backend'
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory `/home/dilip/work/pg_codes/pbms_final/postgresql/src'
make: *** [all-src-recurse] Error 2

I tried doing maintainer-clean and deleting postgres.bki, but I still get the same error.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Alvaro Herrera
Pavan Deolasee wrote:
> On Wed, Mar 29, 2017 at 3:42 AM, Alvaro Herrera 
> wrote:
> 
> > I pushed 0002 after some makeup, since it's just cosmetic and not
> > controversial.
> 
> Thanks. I think your patch of tracking interesting attributes seems ok too
> after the performance issue was addressed. Even though we can still improve
> that further, at least Mithun confirmed that there is no significant
> regression anymore and in fact for one artificial case, patch does better
> than even master.

Great, thanks.  I pushed it, too.  One optimization we could try is
using slot deform instead of repeated heap_getattr().  Patch is
attached.  I haven't benchmarked it.

On top of that, though perhaps getting into the realm of excessive
complication, we could check whether the bitmapset is a singleton, and if it
is, do heap_getattr without creating the slots.  That'd require having a
second copy of heap_tuple_attr_equals() that takes a HeapTuple instead
of a TupleTableSlot.
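
Roughly like this (a sketch only; heap_tuple_attr_equals_htup is the
hypothetical HeapTuple-taking copy mentioned above):

/*
 * Hypothetical singleton shortcut at the top of
 * HeapDetermineModifiedColumns: with only one interesting column, skip
 * building TupleTableSlots and compare via heap_getattr directly.
 */
if (bms_membership(interesting_cols) == BMS_SINGLETON)
{
    int attnum = bms_singleton_member(interesting_cols) +
                 FirstLowInvalidHeapAttributeNumber;

    if (!heap_tuple_attr_equals_htup(RelationGetDescr(relation),
                                     attnum, oldtup, newtup))
        modified = bms_add_member(modified,
                                  attnum - FirstLowInvalidHeapAttributeNumber);
    return modified;
}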

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0c3e2b0..976de99 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -56,6 +56,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "executor/tuptable.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -4337,7 +4338,7 @@ l2:
  */
 static bool
 heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
-  HeapTuple tup1, HeapTuple tup2)
+  TupleTableSlot *tup1, TupleTableSlot *tup2)
 {
Datum   value1,
value2;
@@ -4366,13 +4367,10 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
}
 
/*
-* Extract the corresponding values.  XXX this is pretty inefficient if
-* there are many indexed columns.  Should HeapDetermineModifiedColumns do
-* a single heap_deform_tuple call on each tuple, instead?  But that
-* doesn't work for system columns ...
+* Extract the corresponding values.
 */
-   value1 = heap_getattr(tup1, attrnum, tupdesc, &isnull1);
-   value2 = heap_getattr(tup2, attrnum, tupdesc, &isnull2);
+   value1 = slot_getattr(tup1, attrnum, &isnull1);
+   value2 = slot_getattr(tup2, attrnum, &isnull2);
 
/*
 * If one value is NULL and other is not, then they are certainly not
@@ -4424,17 +4422,27 @@ HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
 {
int attnum;
Bitmapset *modified = NULL;
+   TupleTableSlot *oldslot;
+   TupleTableSlot *newslot;
+
+   oldslot = MakeSingleTupleTableSlot(RelationGetDescr(relation));
+   ExecStoreTuple(oldtup, oldslot, InvalidBuffer, false);
+   newslot = MakeSingleTupleTableSlot(RelationGetDescr(relation));
+   ExecStoreTuple(newtup, newslot, InvalidBuffer, false);
 
while ((attnum = bms_first_member(interesting_cols)) >= 0)
{
attnum += FirstLowInvalidHeapAttributeNumber;
 
if (!heap_tuple_attr_equals(RelationGetDescr(relation),
-  attnum, oldtup, newtup))
+   attnum, oldslot, newslot))
modified = bms_add_member(modified,
  
attnum - FirstLowInvalidHeapAttributeNumber);
}
 
+   ExecDropSingleTupleTableSlot(oldslot);
+   ExecDropSingleTupleTableSlot(newslot);
+
return modified;
 }
 



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Robert Haas
On Wed, Mar 29, 2017 at 7:12 AM, Amit Kapila  wrote:
> No as I agreed above, it won't double-compress, but still looks
> slightly risky to rely on different set of values passed to
> index_form_tuple and then compare them.

It assumes that the compressor is completely deterministic, which I'm
fairly sure is true today, but might be false in the future.  For example:

https://groups.google.com/forum/#!topic/snappy-compression/W8v_ydnEPuc

We've talked about using snappy as a compression algorithm before, and
if the above post is correct, an upgrade to the snappy library version
is an example of a change that would break the assumption in question.
I think it's generally true for almost any modern compression
algorithm (including pglz) that there are multiple compressed texts
that would decompress to the same uncompressed text.  Any algorithm is
required to promise that it will always produce one of the compressed
texts that decompress back to the original, but not necessarily that
it will always produce the same one.

As another example of this, consider that zlib (gzip) has a variety of
options to control compression behavior, such as, most obviously, the
compression level (1 .. 9).
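
If we ever wanted a comparison that is robust against that, it would have to
look at the uncompressed bytes; a minimal sketch (not what the patch does,
and the helper name is made up):

/*
 * Compare two varlena datums by their detoasted, uncompressed contents,
 * so the answer does not depend on which compressor (or compression
 * level) produced the stored bytes.
 */
static bool
varlena_contents_equal(Datum d1, Datum d2)
{
    struct varlena *v1 = pg_detoast_datum((struct varlena *) DatumGetPointer(d1));
    struct varlena *v2 = pg_detoast_datum((struct varlena *) DatumGetPointer(d2));
    bool        result;

    result = (VARSIZE_ANY_EXHDR(v1) == VARSIZE_ANY_EXHDR(v2) &&
              memcmp(VARDATA_ANY(v1), VARDATA_ANY(v2),
                     VARSIZE_ANY_EXHDR(v1)) == 0);

    /* free any detoasted copies we made */
    if ((Pointer) v1 != DatumGetPointer(d1))
        pfree(v1);
    if ((Pointer) v2 != DatumGetPointer(d2))
        pfree(v2);
    return result;
}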

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Amit Kapila
On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
 wrote:
>
> On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila 
> wrote:
>>
>> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila 
>> wrote:
>
> Then during recheck, we pass already compressed values to
> index_form_tuple(). But my point is, the following code will ensure that we
> don't compress it again. My reading is that the first check for
> !VARATT_IS_EXTENDED will return false if the value is already compressed.
>

You are right.  I was confused by the previous check of VARATT_IS_EXTERNAL.

>
> TBH I couldn't find why the original index insertion code will always supply
> uncompressed values.
>

Just try inserting a large value ('aa...bbb') into the text column,
up to 2.5K.  Then have a breakpoint in heap_prepare_insert and
index_form_tuple, and debug both the functions; you can find out that
even though we compress during insertion in the heap, the index will
compress the original value again.

> But even if does, and even if the recheck gets it in
> compressed form, I don't see how we will double-compress that.
>

No, as I agreed above, it won't double-compress, but it still looks
slightly risky to rely on a different set of values being passed to
index_form_tuple and then comparing them.

> As far as comparing two compressed values goes, I don't see a problem there.
> The exact same compressed values should decompress to the exact same value. So
> comparing two compressed values and two uncompressed values should give us
> the same result.
>

Yeah, probably you are right, but I am not sure if it is a good idea to
compare compressed values.

I think with these new changes in btrecheck, it appears to be much
costlier compared to what you had a few versions back.  I am afraid
that it can impact performance for cases where there are few WARM
updates in a chain and many HOT updates, as it will run the recheck for all
such updates.  Did we at any point try to measure the performance of cases
like that?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Pavan Deolasee
On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila 
wrote:

> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila 
> wrote:
> > On Tue, Mar 28, 2017 at 10:35 PM, Pavan Deolasee
> >  wrote:
> >>
> >> On Tue, Mar 28, 2017 at 7:04 PM, Amit Kapila 
> >> wrote:
> >>>
> >>>   For such an heap insert, we will pass
> >>> the actual value of column to index_form_tuple during index insert.
> >>> However during recheck when we fetch the value of c2 from heap tuple
> >>> and pass it index tuple, the value is already in compressed form and
> >>> index_form_tuple might again try to compress it because the size will
> >>> still be greater than TOAST_INDEX_TARGET and if it does so, it might
> >>> make recheck fail.
> >>>
> >>
> >> Would it? I thought the "if
> >> (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i]))" check should
> >> prevent that. But I could be reading those macros wrong. They are probably
> >> heavily uncommented and it's not clear what each of those VARATT_* macros do.
> >>
> >
> > That won't handle the case where it is simply compressed.  You need
> > check like VARATT_IS_COMPRESSED to take care of compressed heap
> > tuples, but then also it won't work because heap_tuple_fetch_attr()
> > doesn't handle compressed tuples.  You need to use
> > heap_tuple_untoast_attr() to handle the compressed case.  Also, we
> > probably need to handle other type of var attrs.  Now, If we want to
> > do all of that index_form_tuple()  might not be the right place, we
> > probably want to handle it in caller or provide an alternate API.
> >
>
> Another related point: index_form_tuple() has a check for VARATT_IS_EXTERNAL,
> not VARATT_IS_EXTENDED, so maybe that is the cause of confusion for you;
> but as I mentioned, even if you change the check, heap_tuple_fetch_attr
> won't suffice.
>
>
I am confused :-(

Assuming big-endian machine:

VARATT_IS_4B_U - !toasted && !compressed
VARATT_IS_4B_C - compressed (may or may not be toasted)
VARATT_IS_4B - !toasted (may or may not be compressed)
VARATT_IS_1B_E - toasted

#define VARATT_IS_EXTERNAL(PTR) VARATT_IS_1B_E(PTR)
#define VARATT_IS_EXTENDED(PTR) (!VARATT_IS_4B_U(PTR))

So VARATT_IS_EXTENDED means that the value is (toasted || compressed). If
we are looking at a value stored inline in the heap, it implies in-heap
compression. If we are looking at a value we have just detoasted, it means
the value was compressed in the TOAST storage.

index_form_tuple() first checks if the value is externally toasted and
fetches the untoasted value if so. After that it checks
!VARATT_IS_EXTENDED, i.e. whether the value is (!toasted && !compressed), and
only then tries to apply compression to it.  It can't be a toasted value,
because if it was, we just untoasted it. But it can be compressed, either in
the heap or in the toast, in which case we don't try to compress it again.
That makes sense, because if the value is already compressed there is no
point applying compression again.

Now what you're suggesting (it seems) is that when in-heap compression is
used and ExecInsertIndexTuples calls FormIndexDatum to create index tuple
values, it always passes uncompressed heap values. So when the index tuple
is originally inserted, index_form_tuple() will try to compress it and
see if it fits in the index.

Then during recheck, we pass already compressed values to
index_form_tuple(). But my point is, the following code will ensure that we
don't compress it again. My reading is that the first check for
!VARATT_IS_EXTENDED will return false if the value is already compressed.

/*
 * If value is above size target, and is of a compressible datatype,
 * try to compress it in-line.
 */
if (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i])) &&
    VARSIZE(DatumGetPointer(untoasted_values[i])) > TOAST_INDEX_TARGET &&
    (att->attstorage == 'x' || att->attstorage == 'm'))
{
Datum   cvalue = toast_compress_datum(untoasted_values[i]);

if (DatumGetPointer(cvalue) != NULL)
{
/* successful compression */
if (untoasted_free[i])
pfree(DatumGetPointer(untoasted_values[i]));
untoasted_values[i] = cvalue;
untoasted_free[i] = true;
}
}

TBH I couldn't find why the original index insertion code will always
supply uncompressed values. But even if it does, and even if the recheck gets
the value in compressed form, I don't see how we will double-compress it.

As far as comparing two compressed values goes, I don't see a problem there.
The exact same compressed values should decompress to the exact same value. So
comparing two compressed values and two uncompressed values should give us
the same result.

Would you mind creating a test case to explain the situation? I added a few
more test cases to src/test/regress/sql/warm.sql and it also shows 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Amit Kapila
On Tue, Mar 28, 2017 at 10:31 PM, Pavan Deolasee
 wrote:
>
> On Tue, Mar 28, 2017 at 4:05 PM, Amit Kapila 
> wrote:
>>
>> As asked previously, can you explain me on what basis are you
>> considering it robust?  The comments on top of datumIsEqual() clearly
>> indicates the danger of using it for toasted values (Also, it will
>> probably not give the answer you want if either datum has been
>> "toasted".).
>
>
> Hmm. I don't see why the new code in recheck is unsafe. The index values
> themselves can't be toasted (IIUC), but they can be compressed.
> index_form_tuple() already untoasts any toasted heap attributes and
> compresses them if needed. So once we pass heap values via
> index_form_tuple() we should have exactly the same index values as they were
> inserted. Or am I missing something obvious here?
>

I don't think relying on datum comparison for compressed values from the
heap and index is safe (even after you try to form the index tuple from the
heap value again during recheck), and I have mentioned one of the
hazards of doing so upthread.  Do you see anywhere else that we rely
on comparison of compressed values?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Amit Kapila
On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila  wrote:
> On Tue, Mar 28, 2017 at 10:35 PM, Pavan Deolasee
>  wrote:
>>
>> On Tue, Mar 28, 2017 at 7:04 PM, Amit Kapila 
>> wrote:
>>>
>>>   For such an heap insert, we will pass
>>> the actual value of column to index_form_tuple during index insert.
>>> However during recheck when we fetch the value of c2 from heap tuple
>>> and pass it index tuple, the value is already in compressed form and
>>> index_form_tuple might again try to compress it because the size will
>>> still be greater than TOAST_INDEX_TARGET and if it does so, it might
>>> make recheck fail.
>>>
>>
>> Would it? I thought  "if
>> (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i]))" check should
>> prevent that. But I could be reading those macros wrong. They are probably
>> heavily uncommented and it's not clear what each of those VARATT_* macros do.
>>
>
> That won't handle the case where it is simply compressed.  You need a
> check like VARATT_IS_COMPRESSED to take care of compressed heap
> tuples, but even then it won't work, because heap_tuple_fetch_attr()
> doesn't handle compressed tuples.  You need to use
> heap_tuple_untoast_attr() to handle the compressed case.  Also, we
> probably need to handle other types of var attrs.  Now, if we want to
> do all of that, index_form_tuple() might not be the right place; we
> probably want to handle it in the caller or provide an alternate API.
>

Another related point: index_form_tuple() has a check for VARATT_IS_EXTERNAL,
not VARATT_IS_EXTENDED, so maybe that is the cause of confusion for you;
but as I mentioned, even if you change the check, heap_tuple_fetch_attr
won't suffice.




-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-29 Thread Amit Kapila
On Tue, Mar 28, 2017 at 10:35 PM, Pavan Deolasee
 wrote:
>
> On Tue, Mar 28, 2017 at 7:04 PM, Amit Kapila 
> wrote:
>>
>>
>>
>>   For such an heap insert, we will pass
>> the actual value of column to index_form_tuple during index insert.
>> However during recheck when we fetch the value of c2 from heap tuple
>> and pass it index tuple, the value is already in compressed form and
>> index_form_tuple might again try to compress it because the size will
>> still be greater than TOAST_INDEX_TARGET and if it does so, it might
>> make recheck fail.
>>
>
> Would it? I thought  "if
> (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i]))" check should
> prevent that. But I could be reading those macros wrong. They are probably
> heavily uncommented and it's not clear what each of those VARATT_* macros do.
>

That won't handle the case where it is simply compressed.  You need a
check like VARATT_IS_COMPRESSED to take care of compressed heap
tuples, but even then it won't work, because heap_tuple_fetch_attr()
doesn't handle compressed tuples.  You need to use
heap_tuple_untoast_attr() to handle the compressed case.  Also, we
probably need to handle other types of var attrs.  Now, if we want to
do all of that, index_form_tuple() might not be the right place; we
probably want to handle it in the caller or provide an alternate API.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Alvaro Herrera
I pushed 0002 after some makeup, since it's just cosmetic and not
controversial.  Here's 0003 rebased on top of it.

(Also, I took out the gin and gist changes: it would be wrong to change
that unconditionally, because the 0xffff pattern appears in indexes that
would be pg_upgraded.  We need a different strategy, if we want to
enable WARM on GiST/GIN indexes.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f6ba238dd46416eb29ac43dadac0c69a75f106fe Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 28 Mar 2017 17:39:26 -0300
Subject: [PATCH] Free 3 bits of ItemPointerData.ip_posid

Since we limit block size to 32 kB, the highest offset number used in a
bufpage.h-organized page is 5461, which can be represented with 13 bits.
Therefore, the upper 3 bits in 16-bit ItemPointerData.ip_posid are
unused and this commit reserves them for other purposes.

Author: Pavan Deolasee
---
 src/include/access/htup_details.h |  2 +-
 src/include/storage/itemptr.h | 30 +++---
 src/include/storage/off.h | 11 ++-
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/src/include/access/htup_details.h 
b/src/include/access/htup_details.h
index 7b6285d..d3cc0ad 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -282,7 +282,7 @@ struct HeapTupleHeaderData
  * than MaxOffsetNumber, so that it can be distinguished from a valid
  * offset number in a regular item pointer.
  */
-#define SpecTokenOffsetNumber  0xfffe
+#define SpecTokenOffsetNumber  OffsetNumberPrev(OffsetNumberMask)
 
 /*
  * HeapTupleHeader accessor macros
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index c21d2ad..74eed4e 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -57,7 +57,7 @@ typedef ItemPointerData *ItemPointer;
  * True iff the disk item pointer is not NULL.
  */
 #define ItemPointerIsValid(pointer) \
-   ((bool) (PointerIsValid(pointer) && ((pointer)->ip_posid != 0)))
+   ((bool) (PointerIsValid(pointer) && (((pointer)->ip_posid & OffsetNumberMask) != 0)))
 
 /*
  * ItemPointerGetBlockNumberNoCheck
@@ -84,7 +84,7 @@ typedef ItemPointerData *ItemPointer;
  */
 #define ItemPointerGetOffsetNumberNoCheck(pointer) \
 ( \
-   (pointer)->ip_posid \
+   ((pointer)->ip_posid & OffsetNumberMask) \
 )
 
 /*
@@ -98,6 +98,30 @@ typedef ItemPointerData *ItemPointer;
 )
 
 /*
+ * Get the flags stored in high order bits in the OffsetNumber.
+ */
+#define ItemPointerGetFlags(pointer) \
+( \
+   ((pointer)->ip_posid & ~OffsetNumberMask) >> OffsetNumberBits \
+)
+
+/*
+ * Set the flag bits. We first left-shift since flags are defined starting 0x01
+ */
+#define ItemPointerSetFlags(pointer, flags) \
+( \
+   ((pointer)->ip_posid |= ((flags) << OffsetNumberBits)) \
+)
+
+/*
+ * Clear all flags.
+ */
+#define ItemPointerClearFlags(pointer) \
+( \
+   ((pointer)->ip_posid &= OffsetNumberMask) \
+)
+
+/*
  * ItemPointerSet
  * Sets a disk item pointer to the specified block and offset.
  */
@@ -105,7 +129,7 @@ typedef ItemPointerData *ItemPointer;
 ( \
AssertMacro(PointerIsValid(pointer)), \
BlockIdSet(&((pointer)->ip_blkid), blockNumber), \
-   (pointer)->ip_posid = offNum \
+   (pointer)->ip_posid = (offNum) \
 )
 
 /*
diff --git a/src/include/storage/off.h b/src/include/storage/off.h
index fe8638f..f058fe1 100644
--- a/src/include/storage/off.h
+++ b/src/include/storage/off.h
@@ -26,7 +26,16 @@ typedef uint16 OffsetNumber;
 #define InvalidOffsetNumber((OffsetNumber) 0)
 #define FirstOffsetNumber  ((OffsetNumber) 1)
#define MaxOffsetNumber((OffsetNumber) (BLCKSZ / sizeof(ItemIdData)))
-#define OffsetNumberMask   (0xffff)/* valid uint16 bits */
+
+/*
+ * The biggest BLCKSZ we support is 32kB, and each ItemId takes 6 bytes.
+ * That limits the number of line pointers in a page to 32kB/6B = 5461.
+ * Therefore, 13 bits in OffsetNumber are enough to represent all valid
+ * on-disk line pointers.  Hence, we can reserve the high-order bits in
+ * OffsetNumber for other purposes.
+ */
+#define OffsetNumberBits   13
+#define OffsetNumberMask   ((((uint16) 1) << OffsetNumberBits) - 1)
 
/* ----------------
  * support macros
-- 
2.1.4




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 8:34 PM, Robert Haas  wrote:

> On Mon, Mar 27, 2017 at 10:25 PM, Pavan Deolasee
>  wrote:
> > On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas 
> wrote:
> >> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
> >>  wrote:
> >> > It's quite hard to say that until we see many more benchmarks. As
> author
> >> > of
> >> > the patch, I might have got repetitive with my benchmarks. But I've
> seen
> >> > over 50% improvement in TPS even without chain conversion (6 indexes
> on
> >> > a 12
> >> > column table test).
> >>
> >> This seems quite mystifying.  What can account for such a large
> >> performance difference in such a pessimal scenario?  It seems to me
> >> that without chain conversion, WARM can only apply to each row once
> >> and therefore no sustained performance improvement should be possible
> >> -- unless rows are regularly being moved to new blocks, in which case
> >> those updates would "reset" the ability to again perform an update.
> >> However, one would hope that most updates get done within a single
> >> block, so that the row-moves-to-new-block case wouldn't happen very
> >> often.
> >
> > I think you're confusing between update chains that stay within a block
> vs
> > HOT/WARM chains. Even when the entire update chain stays within a block,
> it
> > can be made up of multiple HOT/WARM chains and each of these chains offer
> > ability to do one WARM update. So even without chain conversion, every
> > alternate update will be a WARM update. So the gains are perpetual.
>
> You're right, I had overlooked that.  But then I'm confused: how does
> the chain conversion stuff help as much as it does?  You said that you
> got a 50% improvement from WARM, because we got to skip half the index
> updates.  But then you said with chain conversion you got an
> improvement of more like 100%.  However, I would think that on this
> workload, chain conversion shouldn't save much.  If you're sweeping
> through the database constantly performing updates, the updates ought
> to be a lot more frequent than the vacuums.
>
> No?


These tests were done on a very large table of 80M rows. The table itself
was wide, with 15 columns and a few indexes. So in an 8-hour test, master
could do only 55M updates whereas WARM did 105M updates. There were 4
autovacuum cycles in both these runs. So while there were many updates, I am
sure autovacuum must have helped to increase the percentage of WARM updates
(from ~50% after steady state to ~67% after steady state). Also, I said more
than 50%, but it was probably closer to 65%.

Unfortunately these tests were done on different hardware, with different
settings and even slightly different scale factors. So they may not be
exactly comparable. But there is no doubt chain conversion will help to
some extent. I'll repeat the benchmark with chain conversion turned off and
report the exact difference.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 7:04 PM, Amit Kapila 
wrote:

>
>
>   For such an heap insert, we will pass
> the actual value of column to index_form_tuple during index insert.
> However during recheck when we fetch the value of c2 from heap tuple
> and pass it index tuple, the value is already in compressed form and
> index_form_tuple might again try to compress it because the size will
> still be greater than TOAST_INDEX_TARGET and if it does so, it might
> make recheck fail.
>
>
Would it? I thought the "if
(!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i]))" check should
prevent that. But I could be reading those macros wrong. They are probably
heavily uncommented and it's not clear what each of those VARATT_* macros do.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 4:07 PM, Amit Kapila 
wrote:

>
> Noted few cosmetic issues in 0005_warm_updates_v21:
>
> 1.
> pruneheap.c(939): warning C4098: 'heap_get_root_tuples' : 'void'
> function returning a value
>

Thanks. Will fix.


>
> 2.
> + *  HCWC_WARM_UPDATED_TUPLE - a tuple with HEAP_WARM_UPDATED is found
> somewhere
> + *in the chain. Note that when a tuple is WARM
> + *updated, both old and new versions are marked
> + *with this flag/
> + *
> + *  HCWC_WARM_TUPLE  - a tuple with HEAP_WARM_TUPLE is found somewhere in
> + *  the chain.
> + *
> + *  HCWC_CLEAR_TUPLE - a tuple without HEAP_WARM_TUPLE is found somewhere
> in
> + *   the chain.
>
> Description of all three flags is same.
>
>
Well the description is different (and correct), but given that it confused
you, I think I should rewrite those comments. Will do.


> 3.
> + *  HCWC_WARM_UPDATED_TUPLE - a tuple with HEAP_WARM_UPDATED is found
> somewhere
> + *in the chain. Note that when a tuple is WARM
> + *updated, both old and new versions are marked
> + *with this flag/
>
> Spurious '/' at end of line.
>
>
Thanks. Will fix.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 4:05 PM, Amit Kapila 
wrote:

>
>
> As asked previously, can you explain me on what basis are you
> considering it robust?  The comments on top of datumIsEqual() clearly
> indicates the danger of using it for toasted values (Also, it will
> probably not give the answer you want if either datum has been
> "toasted".).


Hmm. I don't see why the new code in recheck is unsafe. The index values
themselves can't be toasted (IIUC), but they can be compressed.
index_form_tuple() already untoasts any toasted heap attributes and
compresses them if needed. So once we pass heap values via
index_form_tuple() we should have exactly the same index values as they
were inserted with. Or am I missing something obvious here?
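
A condensed sketch of that recheck idea (the helper name and variables are
illustrative, not the patch's exact code):

/*
 * Re-form an index tuple from the current heap values and compare it,
 * attribute by attribute, against the index tuple being rechecked.  Since
 * index_form_tuple() untoasts and (re)compresses the same way the original
 * index insertion did, the datums should be binary-comparable.  (heapItup
 * is palloc'd and left for the surrounding memory context to clean up.)
 */
static bool
recheck_index_tuple_matches(Relation indexRel, IndexTuple itup,
                            Datum *heapValues, bool *heapIsnull)
{
    TupleDesc   itupdesc = RelationGetDescr(indexRel);
    IndexTuple  heapItup = index_form_tuple(itupdesc, heapValues, heapIsnull);
    int         i;

    for (i = 1; i <= itupdesc->natts; i++)
    {
        bool    null1, null2;
        Datum   d1 = index_getattr(itup, i, itupdesc, &null1);
        Datum   d2 = index_getattr(heapItup, i, itupdesc, &null2);

        if (null1 != null2)
            return false;
        if (!null1 &&
            !datumIsEqual(d1, d2,
                          itupdesc->attrs[i - 1]->attbyval,
                          itupdesc->attrs[i - 1]->attlen))
            return false;
    }
    return true;
}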


  If you think that because we are using it during
> heap_update to find modified columns, then I think that is not right
> comparison, because there we are comparing compressed value (of old
> tuple) with uncompressed value (of new tuple) which should always give
> result as false.
>
>
Hmm, this seems like a problem. While HOT could tolerate occasional false
results (i.e. reporting a heap column as modified even though it is not),
WARM assumes that if the heap has reported different values, then they
had better be different and should result in different index values.
Because that's how recheck later works. Since index expressions are not
supported, I wonder if toasted heap values are the only remaining problem
in this area. So heap_tuple_attr_equals() should first detoast the heap
values and then do the comparison. I already have a test case that fails
for this reason, so let me try this approach.
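
A minimal sketch of that detoast-first comparison (illustrative; att, value1
and value2 are assumed to come from the surrounding function, as in the
heap_tuple_attr_equals() code posted earlier in the thread):

/* Varlena attributes may be toasted and/or compressed; detoast both sides
 * so that a compressed old value and an uncompressed new value with the
 * same contents compare as equal. */
if (att->attlen == -1)
{
    value1 = PointerGetDatum(
        heap_tuple_untoast_attr((struct varlena *) DatumGetPointer(value1)));
    value2 = PointerGetDatum(
        heap_tuple_untoast_attr((struct varlena *) DatumGetPointer(value2)));
}
return datumIsEqual(value1, value2, att->attbyval, att->attlen);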

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread David Steele

Hi Pavan,

On 3/28/17 11:04 AM, Robert Haas wrote:

On Mon, Mar 27, 2017 at 10:25 PM, Pavan Deolasee
 wrote:

On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas  wrote:

On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
 wrote:

It's quite hard to say that until we see many more benchmarks. As author
of
the patch, I might have got repetitive with my benchmarks. But I've seen
over 50% improvement in TPS even without chain conversion (6 indexes on
a 12
column table test).


This seems quite mystifying.  What can account for such a large
performance difference in such a pessimal scenario?  It seems to me
that without chain conversion, WARM can only apply to each row once
and therefore no sustained performance improvement should be possible
-- unless rows are regularly being moved to new blocks, in which case
those updates would "reset" the ability to again perform an update.
However, one would hope that most updates get done within a single
block, so that the row-moves-to-new-block case wouldn't happen very
often.


I think you're confusing between update chains that stay within a block vs
HOT/WARM chains. Even when the entire update chain stays within a block, it
can be made up of multiple HOT/WARM chains and each of these chains offer
ability to do one WARM update. So even without chain conversion, every
alternate update will be a WARM update. So the gains are perpetual.


You're right, I had overlooked that.  But then I'm confused: how does
the chain conversion stuff help as much as it does?  You said that you
got a 50% improvement from WARM, because we got to skip half the index
updates.  But then you said with chain conversion you got an
improvement of more like 100%.  However, I would think that on this
workload, chain conversion shouldn't save much.  If you're sweeping
through the database constantly performing updates, the updates ought
to be a lot more frequent than the vacuums.

No?


It appears that a patch is required to address Amit's review.  I have 
marked this as "Waiting for Author".


Thanks,
--
-David
da...@pgmasters.net




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Robert Haas
On Mon, Mar 27, 2017 at 10:25 PM, Pavan Deolasee
 wrote:
> On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas  wrote:
>> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
>>  wrote:
>> > It's quite hard to say that until we see many more benchmarks. As author
>> > of
>> > the patch, I might have got repetitive with my benchmarks. But I've seen
>> > over 50% improvement in TPS even without chain conversion (6 indexes on
>> > a 12
>> > column table test).
>>
>> This seems quite mystifying.  What can account for such a large
>> performance difference in such a pessimal scenario?  It seems to me
>> that without chain conversion, WARM can only apply to each row once
>> and therefore no sustained performance improvement should be possible
>> -- unless rows are regularly being moved to new blocks, in which case
>> those updates would "reset" the ability to again perform an update.
>> However, one would hope that most updates get done within a single
>> block, so that the row-moves-to-new-block case wouldn't happen very
>> often.
>
> I think you're confusing between update chains that stay within a block vs
> HOT/WARM chains. Even when the entire update chain stays within a block, it
> can be made up of multiple HOT/WARM chains and each of these chains offer
> ability to do one WARM update. So even without chain conversion, every
> alternate update will be a WARM update. So the gains are perpetual.

You're right, I had overlooked that.  But then I'm confused: how does
the chain conversion stuff help as much as it does?  You said that you
got a 50% improvement from WARM, because we got to skip half the index
updates.  But then you said with chain conversion you got an
improvement of more like 100%.  However, I would think that on this
workload, chain conversion shouldn't save much.  If you're sweeping
through the database constantly performing updates, the updates ought
to be a lot more frequent than the vacuums.

No?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Amit Kapila
On Tue, Mar 28, 2017 at 4:05 PM, Amit Kapila  wrote:
> On Mon, Mar 27, 2017 at 2:19 PM, Pavan Deolasee
>  wrote:
>> On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee 
>> wrote:
>>>
>>>
>>>
>>> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
>>> wrote:



 I was worried for the case if the index is created non-default
 collation, will the datumIsEqual() suffice the need.  Now again
 thinking about it, I think it will because in the index tuple we are
 storing the value as in heap tuple.  However today it occurred to me
 how will this work for toasted index values (index value >
 TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
 probably won't work for toasted values.  Have you considered that
 point?

>>>
>>> No, I haven't and thanks for bringing that up. And now that I think more
>>> about it and see the code, I think the naive way of just comparing index
>>> attribute value against heap values is probably wrong. The example of
>>> TOAST_INDEX_TARGET is one such case, but I wonder if there are other varlena
>>> attributes that we might store differently in heap and index. Like
>>> index_form_tuple() ->heap_fill_tuple seem to some churning for varlena. It's
>>> not clear to me if index_get_attr will return the values which are binary
>>> comparable to heap values.. I wonder if calling index_form_tuple on the heap
>>> values, fetching attributes via index_get_attr on both index tuples and then
>>> doing a binary compare is a more robust idea. Or may be that's just
>>> duplicating efforts.
>>>
>>> While looking at this problem, it occurred to me that the assumptions made
>>> for hash indexes are also wrong :-( Hash index has the same problem as
>>> expression indexes have. A change in heap value may not necessarily cause a
>>> change in the hash key. If we don't detect that, we will end up having two
>>> hash identical hash keys with the same TID pointer. This will cause the
>>> duplicate key scans problem since hashrecheck will return true for both the
>>> hash entries. That's a bummer as far as supporting WARM for hash indexes is
>>> concerned, unless we find a way to avoid duplicate index entries.
>>>
>>
>> Revised patches are attached. I've added a few more regression tests which
>> demonstrates the problems with compressed and toasted attributes. I've now
>> implemented the idea of creating index tuple from heap values before doing
>> binary comparison using datumIsEqual. This seems to work ok and I see no
>> reason this should not be robust.
>>
>
> As asked previously, can you explain me on what basis are you
> considering it robust?  The comments on top of datumIsEqual() clearly
> indicates the danger of using it for toasted values (Also, it will
> probably not give the answer you want if either datum has been
> "toasted".).  If you think that because we are using it during
> heap_update to find modified columns, then I think that is not right
> comparison, because there we are comparing compressed value (of old
> tuple) with uncompressed value (of new tuple) which should always give
> result as false.
>


Yet another point to think about for the recheck implementation: will it
work correctly when the heap tuple itself is toasted?  Consider a case
where a table has integer and text columns (t1 (c1 int, c2 text)) and we
have indexes on the c1 and c2 columns.  Now, insert a tuple such that the
text column has a value of more than 2 or 3K, which will make it stored in
compressed form in the heap (with the size of the compressed value still
more than TOAST_INDEX_TARGET).  For such a heap insert, we will pass
the actual value of the column to index_form_tuple during the index insert.
However, during recheck, when we fetch the value of c2 from the heap tuple
and pass it to index_form_tuple, the value is already in compressed form and
index_form_tuple might again try to compress it, because the size will
still be greater than TOAST_INDEX_TARGET; and if it does so, it might
make the recheck fail.
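
One conceivable way out, sketched here only to make the hazard concrete
(values[i], att and isnull are assumed from the caller; whether this belongs
in index_form_tuple() or its caller is exactly the open question):

/* Before re-forming the index tuple for recheck, bring a heap-sourced
 * varlena value back to its plain, uncompressed form so that
 * index_form_tuple() starts from the same bytes as the original index
 * insertion did. */
if (!isnull && att->attlen == -1)
{
    struct varlena *v = (struct varlena *) DatumGetPointer(values[i]);

    if (VARATT_IS_EXTENDED(v))          /* external and/or compressed */
        values[i] = PointerGetDatum(heap_tuple_untoast_attr(v));
}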


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Amit Kapila
On Mon, Mar 27, 2017 at 2:19 PM, Pavan Deolasee
 wrote:
>
>
> On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee 
> wrote:
>>
>>
>>
>> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
>> wrote:
>>>
>>>
>>>
>>> I was worried for the case if the index is created non-default
>>> collation, will the datumIsEqual() suffice the need.  Now again
>>> thinking about it, I think it will because in the index tuple we are
>>> storing the value as in heap tuple.  However today it occurred to me
>>> how will this work for toasted index values (index value >
>>> TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
>>> probably won't work for toasted values.  Have you considered that
>>> point?
>>>
>>
>> No, I haven't and thanks for bringing that up. And now that I think more
>> about it and see the code, I think the naive way of just comparing index
>> attribute value against heap values is probably wrong. The example of
>> TOAST_INDEX_TARGET is one such case, but I wonder if there are other varlena
>> attributes that we might store differently in heap and index. Like
>> index_form_tuple() ->heap_fill_tuple seem to some churning for varlena. It's
>> not clear to me if index_get_attr will return the values which are binary
>> comparable to heap values.. I wonder if calling index_form_tuple on the heap
>> values, fetching attributes via index_get_attr on both index tuples and then
>> doing a binary compare is a more robust idea. Or may be that's just
>> duplicating efforts.
>>
>> While looking at this problem, it occurred to me that the assumptions made
>> for hash indexes are also wrong :-( Hash indexes have the same problem as
>> expression indexes. A change in a heap value may not necessarily cause a
>> change in the hash key. If we don't detect that, we will end up having two
>> identical hash keys with the same TID pointer. This will cause the
>> duplicate key scans problem, since hashrecheck will return true for both
>> hash entries. That's a bummer as far as supporting WARM for hash indexes is
>> concerned, unless we find a way to avoid duplicate index entries.
>>
>
> Revised patches are attached.
>

Noted few cosmetic issues in 0005_warm_updates_v21:

1.
pruneheap.c(939): warning C4098: 'heap_get_root_tuples' : 'void'
function returning a value

2.
+ *  HCWC_WARM_UPDATED_TUPLE - a tuple with HEAP_WARM_UPDATED is found somewhere
+ *                            in the chain. Note that when a tuple is WARM
+ *                            updated, both old and new versions are marked
+ *                            with this flag/
+ *
+ *  HCWC_WARM_TUPLE  - a tuple with HEAP_WARM_TUPLE is found somewhere in
+ *                     the chain.
+ *
+ *  HCWC_CLEAR_TUPLE - a tuple without HEAP_WARM_TUPLE is found somewhere in
+ *                     the chain.

The description of all three flags reads the same.

3.
+ *  HCWC_WARM_UPDATED_TUPLE - a tuple with HEAP_WARM_UPDATED is found somewhere
+ *                            in the chain. Note that when a tuple is WARM
+ *                            updated, both old and new versions are marked
+ *                            with this flag/

Spurious '/' at end of line.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Amit Kapila
On Mon, Mar 27, 2017 at 2:19 PM, Pavan Deolasee
 wrote:
> On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee 
> wrote:
>>
>>
>>
>> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
>> wrote:
>>>
>>>
>>>
>>> I was worried about the case where the index is created with a
>>> non-default collation: will datumIsEqual() suffice?  Now, thinking
>>> about it again, I think it will, because in the index tuple we are
>>> storing the value as in the heap tuple.  However, today it occurred to
>>> me how this will work for toasted index values (index value >
>>> TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
>>> probably won't work for toasted values.  Have you considered that
>>> point?
>>>
>>
>> No, I haven't, and thanks for bringing that up. And now that I think more
>> about it and see the code, I think the naive way of just comparing the
>> index attribute value against heap values is probably wrong. The example of
>> TOAST_INDEX_TARGET is one such case, but I wonder if there are other
>> varlena attributes that we might store differently in heap and index. Like,
>> index_form_tuple() -> heap_fill_tuple() seems to do some churning for
>> varlena. It's not clear to me if index_get_attr will return values which
>> are binary comparable to heap values. I wonder if calling index_form_tuple
>> on the heap values, fetching attributes via index_get_attr on both index
>> tuples and then doing a binary compare is a more robust idea. Or maybe
>> that's just duplicating effort.
>>
>> While looking at this problem, it occurred to me that the assumptions made
>> for hash indexes are also wrong :-( Hash indexes have the same problem as
>> expression indexes. A change in a heap value may not necessarily cause a
>> change in the hash key. If we don't detect that, we will end up having two
>> identical hash keys with the same TID pointer. This will cause the
>> duplicate key scans problem, since hashrecheck will return true for both
>> hash entries. That's a bummer as far as supporting WARM for hash indexes is
>> concerned, unless we find a way to avoid duplicate index entries.
>>
>
> Revised patches are attached. I've added a few more regression tests which
> demonstrate the problems with compressed and toasted attributes. I've now
> implemented the idea of creating an index tuple from the heap values before
> doing a binary comparison using datumIsEqual. This seems to work ok and I
> see no reason this should not be robust.
>

As asked previously, can you explain on what basis you consider it
robust?  The comments on top of datumIsEqual() clearly indicate the
danger of using it for toasted values (Also, it will probably not give
the answer you want if either datum has been "toasted".).  If you
think it is robust because we are using it during heap_update to find
modified columns, then I think that is not the right comparison,
because there we are comparing the compressed value (of the old tuple)
with the uncompressed value (of the new tuple), which should always
return false.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:32 AM, Alvaro Herrera 
wrote:

> Is the WARM tap test suite supposed to work when applied without all the
> other patches?  I just tried applying that one and running "make check -C
> src/test/modules", and it seems to hang after giving "ok 5" for
> t/002_warm_stress.pl.  (I had to add a Makefile too, attached.)
>
>
Yeah, sorry. Looks like I forgot to git add the Makefile.

BTW just tested on Ubuntu, and it works fine on that too. FWIW I'm using
perl v5.22.1 and IPC::Run 0.94 (assuming I got the versions correctly).


$ make -C src/test/modules/warm/ prove-check
make: Entering directory '/home/ubuntu/postgresql/src/test/modules/warm'
rm -rf /home/ubuntu/postgresql/src/test/modules/warm/tmp_check/log
cd . && TESTDIR='/home/ubuntu/postgresql/src/test/modules/warm'
PATH="/home/ubuntu/postgresql/tmp_install/home/ubuntu/pg-master-install/bin:$PATH"
LD_LIBRARY_PATH="/home/ubuntu/postgresql/tmp_install/home/ubuntu/pg-master-install/lib"
PGPORT='65432'
PG_REGRESS='/home/ubuntu/postgresql/src/test/modules/warm/../../../../src/test/regress/pg_regress'
prove -I ../../../../src/test/perl/ -I . --verbose t/*.pl
t/001_recovery.pl .
1..2
ok 1 - balanace matches after recovery
ok 2 - sum(delta) matches after recovery
ok
t/002_warm_stress.pl ..
1..10
ok 1 - dummy test passed
ok 2 - Fine match
ok 3 - psql exited normally
ok 4 - psql exited normally
ok 5 - psql exited normally
ok 6 - psql exited normally
ok 7 - psql exited normally
ok 8 - psql exited normally
ok 9 - psql exited normally
ok 10 - Fine match
ok
All tests successful.
Files=2, Tests=12, 22 wallclock secs ( 0.03 usr  0.00 sys +  7.94 cusr
 2.41 csys = 10.38 CPU)
Result: PASS
make: Leaving directory '/home/ubuntu/postgresql/src/test/modules/warm'

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Mon, Mar 27, 2017 at 4:45 PM, Amit Kapila 
wrote:

> On Sat, Mar 25, 2017 at 1:24 PM, Amit Kapila 
> wrote:
> > On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
> >  wrote:
> >>
> >> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
> >> wrote:
> >>>
> >
> >> While looking at this problem, it occurred to me that the assumptions
> >> made for hash indexes are also wrong :-( Hash indexes have the same
> >> problem as expression indexes. A change in a heap value may not
> >> necessarily cause a change in the hash key. If we don't detect that, we
> >> will end up having two identical hash keys with the same TID pointer.
> >> This will cause the duplicate key scans problem, since hashrecheck will
> >> return true for both hash entries.
>
> Isn't it possible to detect duplicate keys in hashrecheck if we
> compare both hashkey and tid stored in index tuple with the
> corresponding values from heap tuple?
>
>
Hmm, I thought that won't work. For example, say we have a tuple (X, Y, Z)
in the heap with a btree index on X and a hash index on Y. Suppose that is
updated to (X, Y', Z), we do a WARM update, and we insert a new entry in
the hash index. Now if Y and Y' both generate the same hashkey, we will
have exactly similar-looking (hashkey, TID) tuples in the hash index,
leading to duplicate key scans.

I think one way to solve this is to pass both old and new heap values to
amwarminsert and expect each AM to detect duplicates and avoid creating
a WARM pointer if the index keys are exactly the same (we can do that since
there already exists another index tuple with the same keys pointing to the
same root TID).
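
A rough sketch of that idea for hash, assuming an amwarminsert-style
entry point that receives both the old and new heap values (the helper
name and signature here are illustrative, not the patch's actual API;
_hash_datum2hashkey is the existing hash-key helper):

/*
 * Sketch only: skip creating a WARM pointer when the old and new heap
 * values hash to the same key -- an identical index tuple pointing to
 * the same root TID must already exist.
 */
static bool
hash_warm_insert_needed(Relation indexRel, Datum oldval, Datum newval)
{
	uint32		oldkey = _hash_datum2hashkey(indexRel, oldval);
	uint32		newkey = _hash_datum2hashkey(indexRel, newval);

	return oldkey != newkey;	/* equal keys => no new index entry */
}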

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Bruce Momjian
On Tue, Mar 28, 2017 at 08:04:34AM +0530, Pavan Deolasee wrote:
> And I've answered it so many times by now :-) 

LOL

> Just to add more to what I just said in another email, note that HOT/WARM
> chains are created when a new root line pointer is created in the heap (a line
> pointer that has an index pointing to it). And a new root line pointer is
> created when a non-HOT/non-WARM update is performed. As soon as you do a
> non-HOT/non-WARM update, the next update can again be a WARM update even when
> everything fits in a single block. 
> 
> That's why for a workload which doesn't do HOT updates and where not all index
> keys are updated, you'll find every alternate update to a row to be a WARM
> update, even when there is no chain conversion. That itself can avoid lots of
> index bloat and reduce IO on the index and WAL.
> 
> Let me know if it's still not clear and I can draw some diagrams to explain it.

Ah, yes, that does help to explain the 50% because 50% of updates are
now HOT/WARM.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 7:49 AM, Bruce Momjian  wrote:

> On Mon, Mar 27, 2017 at 04:29:56PM -0400, Robert Haas wrote:
> > On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
> >  wrote:
> > > It's quite hard to say that until we see many more benchmarks. As
> author of
> > > the patch, I might have got repetitive with my benchmarks. But I've
> seen
> > > over 50% improvement in TPS even without chain conversion (6 indexes
> on a 12
> > > column table test).
> >
> > This seems quite mystifying.  What can account for such a large
> > performance difference in such a pessimal scenario?  It seems to me
> > that without chain conversion, WARM can only apply to each row once
> > and therefore no sustained performance improvement should be possible
> > -- unless rows are regularly being moved to new blocks, in which case
> > those updates would "reset" the ability to again perform an update.
> > However, one would hope that most updates get done within a single
> > block, so that the row-moves-to-new-block case wouldn't happen very
> > often.
> >
> > I'm perplexed.
>
> Yes, I asked the same question in this email:
>
> https://www.postgresql.org/message-id/2017032119.ge16...@momjian.us
>
>
And I've answered it so many times by now :-)

Just to add more to what I just said in another email, note that HOT/WARM
chains are created when a new root line pointer is created in the heap (a
line pointer that has an index pointing to it). And a new root line pointer
is created when a non-HOT/non-WARM update is performed. As soon as you do a
non-HOT/non-WARM update, the next update can again be a WARM update even
when everything fits in a single block.

That's why for a workload which doesn't do HOT updates and where not all
index keys are updated, you'll find every alternate update to a row to be a
WARM update, even when there is no chain conversion. That itself can avoid
lots of index bloat and reduce IO on the index and WAL.

Let me know if it's still not clear and I can draw some diagrams to explain
it.
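
For instance, the sequence for a single row whose versions all stay in
one block looks like this (illustrative, one updated index column per
update):

  update 1: WARM update    -> new entry in 1 index, chain grows
  update 2: regular update -> new root line pointer, entries in all indexes
  update 3: WARM update    -> new entry in 1 index, chain grows
  update 4: regular update -> new root line pointer, entries in all indexes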

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas  wrote:

> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
>  wrote:
> > It's quite hard to say that until we see many more benchmarks. As author
> of
> > the patch, I might have got repetitive with my benchmarks. But I've seen
> > over 50% improvement in TPS even without chain conversion (6 indexes on
> a 12
> > column table test).
>
> This seems quite mystifying.  What can account for such a large
> performance difference in such a pessimal scenario?  It seems to me
> that without chain conversion, WARM can only apply to each row once
> and therefore no sustained performance improvement should be possible
> -- unless rows are regularly being moved to new blocks, in which case
> those updates would "reset" the ability to again perform an update.
> However, one would hope that most updates get done within a single
> block, so that the row-moves-to-new-block case wouldn't happen very
> often.
>
>
I think you're confusing update chains that stay within a block with
HOT/WARM chains. Even when the entire update chain stays within a block, it
can be made up of multiple HOT/WARM chains, and each of these chains offers
the ability to do one WARM update. So even without chain conversion, every
alternate update will be a WARM update, and the gains are perpetual.

Take the simplistic example of a table with just one tuple and four
indexes, where every update modifies just one of the indexed columns.
Assuming no WARM chain conversion, this is what would happen for each
update:

1. WARM update, new entry in just one index
2. Regular update, new entries in all indexes
3. WARM update, new entry in just one index
4. Regular update, new entries in all indexes

At the end of N updates (assuming all fit in the same block), one index
will have N entries and the rest will have N/2 entries.

Compare that against master:
1. Regular update, new entries in all indexes
2. Regular update, new entries in all indexes
3. Regular update, new entries in all indexes
4. Regular update, new entries in all indexes


At the end of N updates (assuming all fit in the same block), all indexes
will have N entries.  So with WARM we reduce bloat in 3 indexes. And WARM
keeps working in a perpetual way even without chain conversion. If you look
at the graph I shared earlier (attached again), without WARM chain
conversion the rate of WARM updates settles down to 50%, which is not
surprising given what I explained above.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Bruce Momjian
On Mon, Mar 27, 2017 at 04:29:56PM -0400, Robert Haas wrote:
> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
>  wrote:
> > It's quite hard to say that until we see many more benchmarks. As author of
> > the patch, I might have got repetitive with my benchmarks. But I've seen
> > over 50% improvement in TPS even without chain conversion (6 indexes on a 12
> > column table test).
> 
> This seems quite mystifying.  What can account for such a large
> performance difference in such a pessimal scenario?  It seems to me
> that without chain conversion, WARM can only apply to each row once
> and therefore no sustained performance improvement should be possible
> -- unless rows are regularly being moved to new blocks, in which case
> those updates would "reset" the ability to again perform an update.
> However, one would hope that most updates get done within a single
> block, so that the row-moves-to-new-block case wouldn't happen very
> often.
> 
> I'm perplexed.

Yes, I asked the same question in this email:

https://www.postgresql.org/message-id/2017032119.ge16...@momjian.us

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:32 AM, Alvaro Herrera 
wrote:

> Is the WARM tap test suite supposed to work when applied without all the
> other patches?  I just tried applying that one and running "make check -C
> src/test/modules", and it seems to hang after giving "ok 5" for
> t/002_warm_stress.pl.  (I had to add a Makefile too, attached.)
>
>
These tests should run without WARM. I wonder, though, if IPC::Run's
start/pump/finish facility is fully portable. Andrew, in an off-list
conversation, reminded me that there are no (or maybe one) tests currently
using that in Postgres. I've run these tests on OSX; I will try on some
Linux platform too.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Robert Haas
On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
 wrote:
> It's quite hard to say that until we see many more benchmarks. As author of
> the patch, I might have got repetitive with my benchmarks. But I've seen
> over 50% improvement in TPS even without chain conversion (6 indexes on a 12
> column table test).

This seems quite mystifying.  What can account for such a large
performance difference in such a pessimal scenario?  It seems to me
that without chain conversion, WARM can only apply to each row once
and therefore no sustained performance improvement should be possible
-- unless rows are regularly being moved to new blocks, in which case
those updates would "reset" the ability to again perform an update.
However, one would hope that most updates get done within a single
block, so that the row-moves-to-new-block case wouldn't happen very
often.

I'm perplexed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Alvaro Herrera
Is the WARM tap test suite supposed to work when applied without all the
other patches?  I just tried applying that one and running "make check -C
src/test/modules", and it seems to hang after giving "ok 5" for
t/002_warm_stress.pl.  (I had to add a Makefile too, attached.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
# src/test/modules/indexing/Makefile

MODULE = indexing
PGFILEDESC = "indexing - Test indexing interfaces"

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = src/test/modules/indexing
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

check: prove-check

prove-check:
	$(prove_check)



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Amit Kapila
On Sat, Mar 25, 2017 at 1:24 PM, Amit Kapila  wrote:
> On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
>  wrote:
>>
>> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
>> wrote:
>>>
>
>> While looking at this problem, it occurred to me that the assumptions made
>> for hash indexes are also wrong :-( Hash indexes have the same problem as
>> expression indexes. A change in a heap value may not necessarily cause a
>> change in the hash key. If we don't detect that, we will end up having two
>> identical hash keys with the same TID pointer. This will cause the
>> duplicate key scans problem, since hashrecheck will return true for both
>> hash entries.

Isn't it possible to detect duplicate keys in hashrecheck if we
compare both the hashkey and the TID stored in the index tuple with the
corresponding values from the heap tuple?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Amit Kapila
On Sat, Mar 25, 2017 at 11:24 PM, Pavan Deolasee
 wrote:
>
> On Sat, 25 Mar 2017 at 11:03 PM, Peter Geoghegan  wrote:
>>
>> On Sat, Mar 25, 2017 at 12:54 AM, Amit Kapila 
>> wrote:
>> > I am not sure how you want to binary-compare two datums; if you are
>> > thinking of datumIsEqual(), that won't work.  I think you need to use
>> > a datatype-specific compare function, something like what we do in
>> > _bt_compare().
>>
>> How will that interact with types like numeric, that have display
>> scale or similar?
>>
>  I wonder why Amit thinks that datumIsEqual won't work once we convert the
> heap values to an index tuple and then fetch using index_get_attr. After all
> that's how the current index tuple was constructed when it was inserted.

I think for toasted values you need to detoast before comparison, and
it seems datumIsEqual won't do that job.  Am I missing something which
makes you think that datumIsEqual will work in this context?

> In
> fact, we must not rely on _bt_compare because that might return "false
> positive" even for two different heap binary values  (I think).

I am not suggesting that we rely on _bt_compare; what I was trying to
hint at is that I think we might need to use some column-type-specific
information for comparison.  I am not sure at this stage what the best
way is to deal with this problem without incurring a nontrivial cost.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Pavan Deolasee
On Sat, 25 Mar 2017 at 11:03 PM, Peter Geoghegan  wrote:

> On Sat, Mar 25, 2017 at 12:54 AM, Amit Kapila 
> wrote:
> > I am not sure how you want to binary-compare two datums; if you are
> > thinking of datumIsEqual(), that won't work.  I think you need to use
> > a datatype-specific compare function, something like what we do in
> > _bt_compare().
>
> How will that interact with types like numeric, that have display
> scale or similar?
>
I wonder why Amit thinks that datumIsEqual won't work once we convert the
heap values to an index tuple and then fetch using index_get_attr. After
all, that's how the current index tuple was constructed when it was
inserted. In fact, we must not rely on _bt_compare because that might
return a "false positive" even for two different heap binary values (I
think). To decide whether to do a WARM update or not in heap_update, we
rely only on binary comparison. Could it happen that for two different
binary heap values, we still compute the same index attribute, even when
expression indexes are not supported?

Thanks,
Pavan



> --
> Peter Geoghegan
>
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Peter Geoghegan
On Sat, Mar 25, 2017 at 12:54 AM, Amit Kapila  wrote:
> I am not sure how you want to binary-compare two datums; if you are
> thinking of datumIsEqual(), that won't work.  I think you need to use
> a datatype-specific compare function, something like what we do in
> _bt_compare().

How will that interact with types like numeric, that have display
scale or similar?


-- 
Peter Geoghegan




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Amit Kapila
On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
 wrote:
>
>
> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
> wrote:
>>
>>
>>
>> I was worried about the case where the index is created with a
>> non-default collation: will datumIsEqual() suffice?  Now, thinking
>> about it again, I think it will, because in the index tuple we are
>> storing the value as in the heap tuple.  However, today it occurred to
>> me how this will work for toasted index values (index value >
>> TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
>> probably won't work for toasted values.  Have you considered that
>> point?
>>
>
> No, I haven't, and thanks for bringing that up. And now that I think more
> about it and see the code, I think the naive way of just comparing the
> index attribute value against heap values is probably wrong. The example of
> TOAST_INDEX_TARGET is one such case, but I wonder if there are other
> varlena attributes that we might store differently in heap and index. Like,
> index_form_tuple() -> heap_fill_tuple() seems to do some churning for
> varlena. It's not clear to me if index_get_attr will return values which
> are binary comparable to heap values. I wonder if calling index_form_tuple
> on the heap values, fetching attributes via index_get_attr on both index
> tuples and then doing a binary compare is a more robust idea.
>

I am not sure how you want to binary-compare two datums; if you are
thinking of datumIsEqual(), that won't work.  I think you need to use
a datatype-specific compare function, something like what we do in
_bt_compare().

> Or maybe that's just
> duplicating effort.
>

I think if we do something along the lines I mentioned above, we
might not need to duplicate the effort.

> While looking at this problem, it occurred to me that the assumptions made
> for hash indexes are also wrong :-( Hash indexes have the same problem as
> expression indexes. A change in a heap value may not necessarily cause a
> change in the hash key. If we don't detect that, we will end up having two
> identical hash keys with the same TID pointer. This will cause the
> duplicate key scans problem, since hashrecheck will return true for both
> hash entries. That's a bummer as far as supporting WARM for hash indexes is
> concerned,
>

Yeah, I also think so.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Pavan Deolasee
On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
wrote:

>
>
> I was worried about the case where the index is created with a
> non-default collation: will datumIsEqual() suffice?  Now, thinking
> about it again, I think it will, because in the index tuple we are
> storing the value as in the heap tuple.  However, today it occurred to
> me how this will work for toasted index values (index value >
> TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
> probably won't work for toasted values.  Have you considered that
> point?
>
>
No, I haven't, and thanks for bringing that up. And now that I think more
about it and see the code, I think the naive way of just comparing the
index attribute value against heap values is probably wrong. The example of
TOAST_INDEX_TARGET is one such case, but I wonder if there are other
varlena attributes that we might store differently in heap and index. Like,
index_form_tuple() -> heap_fill_tuple() seems to do some churning for
varlena. It's not clear to me if index_get_attr will return values which
are binary comparable to heap values. I wonder if calling index_form_tuple
on the heap values, fetching attributes via index_get_attr on both index
tuples and then doing a binary compare is a more robust idea. Or maybe
that's just duplicating effort.
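
For what it's worth, that normalize-via-index_form_tuple comparison
might look roughly like this (a sketch only; values/isnull, indexTuple,
indexRel and indexInfo are assumed inputs of a btrecheck()-style
routine, and index_form_tuple, index_getattr and datumIsEqual are the
stock helpers; memory cleanup is omitted):

/*
 * Sketch only: form an index tuple from the heap values so that both
 * sides go through the same varlena massaging, then binary-compare
 * the attributes extracted from the two index tuples.
 */
TupleDesc	itupdesc = RelationGetDescr(indexRel);
IndexTuple	heapitup = index_form_tuple(itupdesc, values, isnull);

for (int i = 1; i <= indexInfo->ii_NumIndexAttrs; i++)
{
	Form_pg_attribute att = TupleDescAttr(itupdesc, i - 1);
	bool		null1, null2;
	Datum		v1 = index_getattr(heapitup, i, itupdesc, &null1);
	Datum		v2 = index_getattr(indexTuple, i, itupdesc, &null2);

	if (null1 != null2)
		return false;
	if (!null1 && !datumIsEqual(v1, v2, att->attbyval, att->attlen))
		return false;
}
return true;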

While looking at this problem, it occurred to me that the assumptions made
for hash indexes are also wrong :-( Hash indexes have the same problem as
expression indexes. A change in a heap value may not necessarily cause a
change in the hash key. If we don't detect that, we will end up having two
identical hash keys with the same TID pointer. This will cause the
duplicate key scans problem, since hashrecheck will return true for both
hash entries. That's a bummer as far as supporting WARM for hash indexes is
concerned, unless we find a way to avoid duplicate index entries.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Amit Kapila
On Thu, Mar 23, 2017 at 3:54 PM, Pavan Deolasee
 wrote:
> Thanks Amit. v19 addresses some of the comments below.
>
> On Thu, Mar 23, 2017 at 10:28 AM, Amit Kapila 
> wrote:
>>
>> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila 
>> wrote:
>> > On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
>> >  wrote:
>
>>
>> 5.
>> +btrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
>> + Relation heapRel, HeapTuple heapTuple)
>> {
>> ..
>> + if (!datumIsEqual(values[i - 1], indxvalue, att->attbyval,
>> + att->attlen))
>> ..
>> }
>>
>> Will this work if the index is using non-default collation?
>>
>
> Not sure I understand that. Can you please elaborate?
>

I was worried about the case where the index is created with a
non-default collation: will datumIsEqual() suffice?  Now, thinking
about it again, I think it will, because in the index tuple we are
storing the value as in the heap tuple.  However, today it occurred to
me how this will work for toasted index values (index value >
TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
probably won't work for toasted values.  Have you considered that
point?

>>
>> 6.
>> +++ b/src/backend/access/nbtree/nbtxlog.c
>> @@ -390,83 +390,9 @@ btree_xlog_vacuum(XLogReaderState *record)
>> -#ifdef UNUSED
>>   xl_btree_vacuum *xlrec = (xl_btree_vacuum *) XLogRecGetData(record);
>>
>>   /*
>> - * This section of code is thought to be no longer needed, after analysis
>> - * of the calling paths. It is retained to allow the code to be
>> reinstated
>> - * if a flaw is revealed in that thinking.
>> - *
>> ..
>>
>> Why does this patch need to remove the above code under #ifdef UNUSED
>>
>
> Yeah, it isn't strictly necessary. But that dead code was coming in the way
> and hence I decided to strip it out. I can put it back if it's an issue or
> remove that as a separate commit first.
>

I think it is better to keep unrelated changes out of the patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Pavan Deolasee
On Fri, Mar 24, 2017 at 4:04 PM, Amit Kapila 
wrote:

> On Fri, Mar 24, 2017 at 12:25 AM, Pavan Deolasee
>  wrote:
> >
> >
> > On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila 
> >
> > The general sense I've got
> > here is that we're ok with pushing some work into the background if it
> > helps the real-time queries, and I kinda agree with that.
> >
>
> I don't think we can define this work as "some" work; it can be a lot
> of work depending on the number of indexes.  Also, I think in some
> cases it will generate maintenance work without generating any benefit,
> for example, when there is one index on a table and there are updates
> to that index's column.
>
>
That's a fair point. I think we can address this, though. At the end of the
first index scan we would know how many WARM pointers the index has and
whether it's worth doing a second scan. For the case you mentioned, we will
do a second scan just on that one index, skip all the other indexes, and
still achieve the same result. On the other hand, if one index receives
many updates and the other indexes are rarely updated, then we might leave
a few WARM chains behind and won't be able to do IOS on those pages.
But given the premise that the other indexes receive rare updates, it may
not be a problem. Note: the code is not currently written that way, but it
should be a fairly small change.

The other thing we didn't talk about is that vacuum will need to track
dead tuples and WARM candidate chains separately, which increases memory
overhead. So for very large tables, and for the same amount of
maintenance_work_mem, one round of vacuum will be able to clean fewer
pages. We could work out a more compact representation, but that is not
done currently.
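
To put rough numbers on that (back-of-the-envelope, assuming both lists
keep the existing 6-byte ItemPointerData representation): with
maintenance_work_mem = 1GB, vacuum today can track about 1GB / 6 bytes,
i.e. roughly 178 million dead TIDs, per index-cleaning round. Splitting
the same budget between dead tuples and WARM candidate chains roughly
halves what each list can hold before an extra round is forced.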


>
> > But this is clearly not
> > PG 10 material.
> >
>
> I don't see much discussion about this aspect of the patch, so I am not
> sure if it is acceptable to increase the cost of vacuum.  Now, I don't
> know if your idea of GUCs makes it such that the additional cost will
> occur seldom and the additional pass has a minimal impact, which would
> make it acceptable.


Yeah, I agree. I'm trying to schedule some more benchmarks, but any help is
appreciated.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Amit Kapila
On Fri, Mar 24, 2017 at 12:25 AM, Pavan Deolasee
 wrote:
>
>
> On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila 
> wrote:
>>
>> >
>>
>> I am not sure on what basis a user can set such parameters; it will be
>> quite difficult to tune them.  I think the point is that
>> whatever threshold we keep, once that is crossed, it will perform two
>> scans for all the indexes.
>
>
> Well, that applies even to vacuum parameters, no?
>

I don't know how much we can directly compare the usability of the new
parameters you are proposing here to existing parameters.

> The general sense I've got
> here is that we're ok with pushing some work into the background if it
> helps the real-time queries, and I kinda agree with that.
>

I don't think we can define this work as "some" work; it can be a lot
of work depending on the number of indexes.  Also, I think in some
cases it will generate maintenance work without generating any benefit,
for example, when there is one index on a table and there are updates
to that index's column.

> Having said that, I see many ways we can improve on this later. Like we can
> track somewhere else information about tuples which may have received WARM
> updates (I think it will need to be a per-index bitmap or so) and use that
> to do WARM chain conversion in a single index pass.
>

Sure, if we had some way to do it in a single pass, or to do most of
the work in the foreground process (like the dead-marking idea we have
for indexes), that would be better.

> But this is clearly not
> PG 10 material.
>

I don't see much discussion about this aspect of the patch, so I am not
sure if it is acceptable to increase the cost of vacuum.  Now, I don't
know if your idea of GUCs makes it such that the additional cost will
occur seldom and the additional pass has a minimal impact, which would
make it acceptable.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila 
wrote:

> On Thu, Mar 23, 2017 at 3:44 PM, Pavan Deolasee
>
> >
> > Yes, this is a very fair point. The way I proposed to address this
> upthread
> > is by introducing a set of threshold/scale GUCs specific to WARM. So
> users
> > can control when to invoke WARM cleanup. Only if the WARM cleanup is
> > required, we do 2 index scans. Otherwise vacuum will work the way it
> works
> > today without any additional overhead.
> >
>
> I am not sure on what basis a user can set such parameters; it will be
> quite difficult to tune them.  I think the point is that
> whatever threshold we keep, once that is crossed, it will perform two
> scans for all the indexes.


Well, that applies even to vacuum parameters, no? The general sense I've
got here is that we're ok with pushing some work into the background if it
helps the real-time queries, and I kinda agree with that. If WARM improves
things in a significant manner even with this additional maintenance work,
it's still worth doing.

Having said that, I see many ways we can improve on this later. For
example, we can track information elsewhere about tuples which may have
received WARM updates (I think it will need to be a per-index bitmap or so)
and use that to do WARM chain conversion in a single index pass. But this
is clearly not PG 10 material.


>   IIUC, this conversion of WARM chains is
> required so that future updates can be WARM; or is there any other
> reason?  I see this as a big penalty for future updates.
>

It's also necessary for index-only scans. But I don't see this as a big
penalty for future updates, because if there are indeed significant WARM
updates, then not preparing for future updates will result in
write amplification, which is what we are trying to solve here and where
WARM seems to be showing good gains.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 11:44 PM, Mithun Cy 
wrote:

> Hi Pavan,
> On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
>  wrote:
> > Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> > RAM, attached SSD) and results are shown below. But I think it is
> important
> > to get independent validation from your side too, just to ensure I am not
> > making any mistake in measurement. I've attached naively put together
> > scripts which I used to run the benchmark. If you find them useful,
> please
> > adjust the paths and run on your machine.
>
> I did a similar test, and it appears your v19 is fine; it does not
> cause any regression. On the other hand, I also ran tests reducing the
> table fillfactor to 80; there I can see a small regression of 2-3% on
> average when updating col2, and on updating col9 I again do not see any
> regression.
>
>
Thanks Mithun for repeating the tests and confirming that the v19 patch
looks ok.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Mithun Cy
Hi Pavan,
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.

I did a similar test, and it appears your v19 is fine; it does not
cause any regression. On the other hand, I also ran tests reducing the
table fillfactor to 80; there I can see a small regression of 2-3% on
average when updating col2, and on updating col9 I again do not see any
regression.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


WARM_test_02.ods
Description: application/vnd.oasis.opendocument.spreadsheet



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 3:44 PM, Pavan Deolasee
 wrote:
> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila 
> wrote:
>>
>
>> 3.
>> + /*
>> +  * HASH indexes compute a hash value of the key and store that in the
>> +  * index. So we must first obtain the hash of the value obtained from
>> +  * the heap and then do a comparison.
>> +  */
>> + _hash_convert_tuple(indexRel, values, isnull, values2, isnull2);
>>
>> I think here you need to handle the case where the heap has a NULL value,
>> as the hash index doesn't contain NULL values; otherwise, the code in the
>> function below can return true, which is not right.
>>
>
> I think we can simply conclude that hashrecheck has failed the equality
> check if the heap has a NULL value, because such a tuple should not have
> been reached via the hash index unless a non-NULL hash key was later
> updated to a NULL key, right?
>

Right.
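
Concretely, that could be a guard at the top of hashrecheck() before
any hashing happens (illustrative placement only; indexInfo and the
isnull[] array of extracted heap values are assumed inputs):

/*
 * Sketch only: a NULL heap value can never match a hash index entry,
 * since hash indexes do not store NULLs, so fail the recheck outright
 * instead of trying to hash a NULL.
 */
for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
{
	if (isnull[i])
		return false;		/* not a match for this index tuple */
}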

>>
>>
>> 6.
>> + *stats = index_bulk_delete(&ivinfo, *stats,
>> +                            lazy_indexvac_phase1, (void *) vacrelstats);
>> + ereport(elevel,
>> +(errmsg("scanned index \"%s\" to remove %d row version, found "
>> +"%0.f warm pointers, %0.f clear pointers, removed "
>> +"%0.f warm pointers, removed %0.f clear pointers",
>> +RelationGetRelationName(indrel),
>> + vacrelstats->num_dead_tuples,
>> + (*stats)->num_warm_pointers,
>> +(*stats)->num_clear_pointers,
>> +(*stats)->warm_pointers_removed,
>> + (*stats)->clear_pointers_removed)));
>> +
>> + (*stats)->num_warm_pointers = 0;
>> + (*stats)->num_clear_pointers = 0;
>> + (*stats)->warm_pointers_removed = 0;
>> + (*stats)->clear_pointers_removed = 0;
>> + (*stats)->pointers_cleared = 0;
>> +
>> + *stats = index_bulk_delete(&ivinfo, *stats,
>> +                            lazy_indexvac_phase2, (void *) vacrelstats);
>>
>> To convert WARM chains, we need to do two index passes for all the
>> indexes.  I think that can substantially increase the random I/O. This
>> can help us in doing more WARM updates, but I don't see how
>> the downside of that (increased random I/O) will be acceptable for all
>> kinds of cases.
>>
>
> Yes, this is a very fair point. The way I proposed to address this upthread
> is by introducing a set of threshold/scale GUCs specific to WARM. So users
> can control when to invoke WARM cleanup. Only if the WARM cleanup is
> required, we do 2 index scans. Otherwise vacuum will work the way it works
> today without any additional overhead.
>

I am not sure on what basis a user can set such parameters; it will be
quite difficult to tune them.  I think the point is that
whatever threshold we keep, once that is crossed, it will perform two
scans for all the indexes.  IIUC, this conversion of WARM chains is
required so that future updates can be WARM; or is there any other
reason?  I see this as a big penalty for future updates.

> We already have some intelligence to skip the second index scan if we did
> not find any WARM candidate chains during the first heap scan. This should
> take care of the majority of users who never update their indexed columns.
> For others, we need either a knob or some built-in way to deduce whether to
> do WARM cleanup or not.
>
> Does that seem worthwhile?
>

Is there any consensus on your proposal?  I feel this needs somewhat
broader discussion; you and I can't take a call on this point by
ourselves.  I request others to share their opinions on this point too.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 4:08 PM, Pavan Deolasee 
wrote:

>
>
> On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila 
> wrote:
>
>>
>>
>> That sounds like you are dodging the actual problem.  I mean you can
>> put that same PageIsFull() check in master code as well and then you
>> will most probably again see the same regression.
>
>
> Well I don't see it that way. There was a specific concern about a
> specific workload that WARM might regress. I think this change addresses
> that. Sure if you pick that one piece, put it in master first and then
> compare against rest of the WARM code, you will see a regression.
>

BTW, the PageIsFull() check may not help as much in master as it does with
WARM. In master we bail out early anyway after a couple of column checks.
There it may help to reduce the 10% drop that we see while updating the
last index column, but if we compare master and WARM with the patch
applied, the regression should be quite nominal.
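
For reference, the early bail-out under discussion amounts to something
like this in heap_update (a sketch, not the patch's exact code;
PageIsFull(), PageGetHeapFreeSpace() and bms_overlap() are stock
helpers, while hot_possible, newtupsize, modified_attrs and hot_attrs
are illustrative names):

/*
 * Sketch only: if the page is already marked full, or the new tuple
 * cannot fit on it, this update cannot be HOT/WARM on this page, so
 * the per-column modified-attribute checks can be skipped entirely.
 */
if (PageIsFull(page) || newtupsize > PageGetHeapFreeSpace(page))
	hot_possible = false;	/* don't bother comparing indexed columns */
else
	hot_possible = !bms_overlap(modified_attrs, hot_attrs);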

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila 
wrote:

>
>
> That sounds like you are dodging the actual problem.  I mean you can
> put that same PageIsFull() check in master code as well and then you
> will most probably again see the same regression.


Well, I don't see it that way. There was a specific concern about a
specific workload that WARM might regress. I think this change addresses
that. Sure, if you pick that one piece, put it in master first, and then
compare against the rest of the WARM code, you will see a regression. But I
thought what we were worried about is WARM causing a regression for some
existing user, who might see her workload running 10% slower; this change
mitigates that.


> Also, I think if we
> test at fillfactor 80 or 75 (which is not unrealistic considering an
> update-intensive workload), then we might again see regression.


Yeah, we might, but it will be less than before, maybe 2% instead of
10%. And by doing this we are further narrowing an already narrow test
case. I think we need to see things in totality and weigh the cost-benefit
trade-offs. There are numbers for very common workloads where WARM may
provide 20%, 30%, or even more than 100% improvements.

Andres and Alvaro already have other ideas to address this problem even
further. And as I said, we can pass in index-specific information and make
that routine bail out even earlier. We need to accept that WARM will need
to do more attribute checks than master, especially when there is more than
one index on the table, and sometimes those checks will go to waste. I am
ok with providing a table-specific knob to disable WARM, but I am not sure
if others would like that idea.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
Thanks Amit. v19 addresses some of the comments below.

On Thu, Mar 23, 2017 at 10:28 AM, Amit Kapila 
wrote:

> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila 
> wrote:
> > On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
> >  wrote:
> >>
> >>>
> >>
> >> Please find attached rebased patches.
> >>
> >
> > Few comments on 0005_warm_updates_v18.patch:
> >
>
> Few more comments on 0005_warm_updates_v18.patch:
> 1.
> @@ -234,6 +241,25 @@ index_beginscan(Relation heapRelation,
>   scan->heapRelation = heapRelation;
>   scan->xs_snapshot = snapshot;
>
> + /*
> +  * If the index supports recheck, make sure that index tuple is saved
> +  * during index scans. Also build and cache IndexInfo which is used by
> +  * amrecheck routine.
> +  *
> +  * XXX Ideally, we should look at all indexes on the table and check if
> +  * WARM is at all supported on the base table. If WARM is not supported
> +  * then we don't need to do any recheck. RelationGetIndexAttrBitmap() does
> +  * do that and sets rd_supportswarm after looking at all indexes. But we
> +  * don't know if the function was called earlier in the session when we're
> +  * here. We can't call it now because there exists a risk of causing
> +  * deadlock.
> +  */
> + if (indexRelation->rd_amroutine->amrecheck)
> + {
> +     scan->xs_want_itup = true;
> +     scan->indexInfo = BuildIndexInfo(indexRelation);
> + }
> +
>
> Don't we need to do this rechecking during parallel scans?  Also what
> about bitmap heap scans?
>
>
Yes, we need to handle parallel scans. Bitmap scans are not a problem
because they can never return the same TID twice. I fixed this by
moving the logic inside index_beginscan_internal.


> 2.
> +++ b/src/backend/access/nbtree/nbtinsert.c
> -
>  typedef struct
>
> The above change is not required.
>
>
Sure. Fixed.


> 3.
> +_bt_clear_items(Page page, OffsetNumber *clearitemnos, uint16 nclearitems)
> +void _hash_clear_items(Page page, OffsetNumber *clearitemnos,
> +   uint16 nclearitems)
>
> Both the above functions look exactly the same; isn't it better to have a
> single function like page_clear_items?  If you want separation for
> different index types, then we can have one common function which can
> be called from different index types.
>
>
Yes, makes sense. Moved that to bufpage.c. The reason I originally had
index-specific versions is that I started by putting the WARM flag in the
IndexTuple header. But since hash indexes do not have that bit free, I
moved everything to a TID bit-flag. I still left index-specific wrappers,
but they just call PageIndexClearWarmTuples.
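
The shared helper can then stay tiny; something like the following
sketch (PageIndexClearWarmTuples is the name mentioned above, while
ItemPointerClearFlags stands in for whatever the patch's TID bit-flag
API actually provides -- it is a hypothetical call, not a stock one):

/*
 * Sketch only: clear the WARM marker on the given index tuples.  The
 * marker lives in otherwise-unused TID bits, so no index-AM-specific
 * knowledge is needed here.
 */
void
PageIndexClearWarmTuples(Page page, OffsetNumber *clearitemnos,
						 uint16 nclearitems)
{
	for (int i = 0; i < nclearitems; i++)
	{
		ItemId		itemid = PageGetItemId(page, clearitemnos[i]);
		IndexTuple	itup = (IndexTuple) PageGetItem(page, itemid);

		ItemPointerClearFlags(&itup->t_tid);	/* hypothetical API */
	}
}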


> 4.
> - if (callback(htup, callback_state))
> + flags = ItemPointerGetFlags(&htup->t_tid);
> + is_warm = ((flags & BTREE_INDEX_WARM_POINTER) != 0);
> +
> + if (is_warm)
> + stats->num_warm_pointers++;
> + else
> + stats->num_clear_pointers++;
> +
> + result = callback(htup, is_warm, callback_state);
> + if (result == IBDCR_DELETE)
> + {
> + if (is_warm)
> + stats->warm_pointers_removed++;
> + else
> + stats->clear_pointers_removed++;
>
> The patch looks to be inconsistent in collecting stats for btree and
> hash.  I don't see the above stats getting updated in the hash index code.
>
>
Fixed. The hashbucketcleanup signature is just getting a bit too long. Maybe
we should move some of these counters into a structure and pass that
around. Not done here, though.


> 5.
> +btrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
> + Relation heapRel, HeapTuple heapTuple)
> {
> ..
> + if (!datumIsEqual(values[i - 1], indxvalue, att->attbyval,
> + att->attlen))
> ..
> }
>
> Will this work if the index is using non-default collation?
>
>
Not sure I understand that. Can you please elaborate?


> 6.
> +++ b/src/backend/access/nbtree/nbtxlog.c
> @@ -390,83 +390,9 @@ btree_xlog_vacuum(XLogReaderState *record)
> -#ifdef UNUSED
>   xl_btree_vacuum *xlrec = (xl_btree_vacuum *) XLogRecGetData(record);
>
>   /*
> - * This section of code is thought to be no longer needed, after analysis
> - * of the calling paths. It is retained to allow the code to be reinstated
> - * if a flaw is revealed in that thinking.
> - *
> ..
>
> Why does this patch need to remove the above code under #ifdef UNUSED
>
>
Yeah, it isn't strictly necessary. But that dead code was coming in the way
and hence I decided to strip it out. I can put it back if it's an issue or
remove that as a separate commit first.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
>
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.
>
> I reverted to an UNLOGGED table because with WAL the results looked very
> weird (as posted earlier) even when I was taking a CHECKPOINT before each
> set and had set max_wal_size and checkpoint_timeout high enough to avoid any
> checkpoint during the run. Anyway, that's a matter for separate
> investigation and not related to this patch.
>
> I did two kinds of tests.
> a) update last column of the index
> b) update second column of the index
>
> v19 does considerably better than even master for the last column update
> case and is pretty much in line for the second column update test. The
> reason is clear: v19 determines early in the cycle that the buffer is
> already full and there is very little chance of doing a HOT update on the
> page. In that case, it does not check any columns for modification.
>

That sounds like you are dodging the actual problem.  I mean you can
put that same PageIsFull() check in master code as well and then you
will most probably again see the same regression.  Also, I think if we
test at fillfactor 80 or 75 (which is not unrealistic considering an
update-intensive workload), then we might again see regression.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-22 Thread Amit Kapila
On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila  wrote:
> On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
>  wrote:
>>
>>>
>>
>> Please find attached rebased patches.
>>
>
> Few comments on 0005_warm_updates_v18.patch:
>

Few more comments on 0005_warm_updates_v18.patch:
1.
@@ -234,6 +241,25 @@ index_beginscan(Relation heapRelation,
  scan->heapRelation = heapRelation;
  scan->xs_snapshot = snapshot;

+ /*
+  * If the index supports recheck, make sure that index tuple is saved
+  * during index scans. Also build and cache IndexInfo which is used by
+  * amrecheck routine.
+  *
+  * XXX Ideally, we should look at all indexes on the table and check if
+  * WARM is at all supported on the base table. If WARM is not supported
+  * then we don't need to do any recheck. RelationGetIndexAttrBitmap() does
+  * do that and sets rd_supportswarm after looking at all indexes. But we
+  * don't know if the function was called earlier in the session when we're
+  * here. We can't call it now because there exists a risk of causing
+  * deadlock.
+  */
+ if (indexRelation->rd_amroutine->amrecheck)
+ {
+     scan->xs_want_itup = true;
+     scan->indexInfo = BuildIndexInfo(indexRelation);
+ }
+

Don't we need to do this rechecking during parallel scans?  Also what
about bitmap heap scans?

2.
+++ b/src/backend/access/nbtree/nbtinsert.c
-
 typedef struct

The above change is not required.

3.
+_bt_clear_items(Page page, OffsetNumber *clearitemnos, uint16 nclearitems)
+void _hash_clear_items(Page page, OffsetNumber *clearitemnos,
+   uint16 nclearitems)

Both the above functions look exactly the same; isn't it better to have a
single function like page_clear_items?  If you want separation for
different index types, then we can have one common function which can
be called from different index types.

4.
- if (callback(htup, callback_state))
+ flags = ItemPointerGetFlags(&htup->t_tid);
+ is_warm = ((flags & BTREE_INDEX_WARM_POINTER) != 0);
+
+ if (is_warm)
+ stats->num_warm_pointers++;
+ else
+ stats->num_clear_pointers++;
+
+ result = callback(htup, is_warm, callback_state);
+ if (result == IBDCR_DELETE)
+ {
+ if (is_warm)
+ stats->warm_pointers_removed++;
+ else
+ stats->clear_pointers_removed++;

The patch looks to be inconsistent in collecting stats for btree and
hash.  I don't see the above stats getting updated in the hash index code.

5.
+btrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
+ Relation heapRel, HeapTuple heapTuple)
{
..
+ if (!datumIsEqual(values[i - 1], indxvalue, att->attbyval,
+ att->attlen))
..
}

Will this work if the index is using non-default collation?

6.
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -390,83 +390,9 @@ btree_xlog_vacuum(XLogReaderState *record)
-#ifdef UNUSED
  xl_btree_vacuum *xlrec = (xl_btree_vacuum *) XLogRecGetData(record);

  /*
- * This section of code is thought to be no longer needed, after analysis
- * of the calling paths. It is retained to allow the code to be reinstated
- * if a flaw is revealed in that thinking.
- *
..

Why does this patch need to remove the above code under #ifdef UNUSED

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-22 Thread Mithun Cy
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
>
>
> On Wed, Mar 22, 2017 at 4:53 PM, Mithun Cy 
> wrote:
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.

Looking at your postgresql.conf, JFYI: I have synchronous_commit = off,
but it is on in your run (for logged tables); the rest remains the
same. Once I get the machine, probably tomorrow morning, I will run the
same tests on v19.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com




  1   2   3   >