Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-06 Thread Victor Yegorov
Mon, 7 Mar 2022 at 00:34, Andres Freund :

> One thing that's likely worth doing as part of the cross-version upgrade
> test, even if it wouldn't help in this case, is to run amcheck post
> upgrade. Just dumping data isn't going to touch indices at all.
>
> A sequence of
>   pg_upgrade; amcheck; upgrade all extensions; amcheck;
> would make sense.
>

Is it possible to amcheck GiST indexes?
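As far as I know, core amcheck at this point verifies btree indexes only
(plus heap relations via verify_heapam), with no GiST support, so the
post-upgrade pass Andres describes would be limited to a sketch like this:

    CREATE EXTENSION IF NOT EXISTS amcheck;
    -- check every btree index in the current database:
    SELECT bt_index_check(c.oid::regclass)
      FROM pg_class c
      JOIN pg_am am ON am.oid = c.relam
     WHERE c.relkind = 'i' AND am.amname = 'btree';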


-- 
Victor Yegorov


Re: 64-bit XIDs in deleted nbtree pages

2021-02-12 Thread Victor Yegorov
Sat, 13 Feb 2021 at 05:39, Masahiko Sawada :

> > (BTW, I've been using txid_current() for my own "laptop testing", as a
> > way to work around this issue.)
> >
> > * More generally, if you really can't do recycling of pages that you
> > deleted during the last VACUUM during this VACUUM (perhaps because of
> > the presence of a long-running xact that holds open a snapshot), then
> > you have lots of *huge* problems already, and this is the least of
> > your concerns. Besides, at that point an affected VACUUM will be doing
> > work for an affected index through a btbulkdelete() call, so the
> > behavior of _bt_vacuum_needs_cleanup() becomes irrelevant.
> >
>
> I agree that there already are huge problems in that case. But I think
> we need to consider an append-only case as well; after bulk deletion
> on an append-only table, vacuum deletes heap tuples and index tuples,
> marking some index pages as dead and setting an XID into btpo.xact.
> Since we trigger autovacuums even by insertions, based on
> autovacuum_vacuum_insert_scale_factor/threshold, autovacuum will run on
> the table again. But if there is a long-running query, a "wasted"
> cleanup scan could happen many times depending on the values of
> autovacuum_vacuum_insert_scale_factor/threshold and
> vacuum_cleanup_index_scale_factor. This should not happen with the old
> code. I agree this is a DBA problem, but it also means this could bring
> another new problem in the long-running query case.
>

I'd like to outline one relevant case.

Quite often, bulk deletes are done on time series data (the oldest),
effectively removing a continuous chunk of data at the (physical) beginning
of the table; this is especially true for append-only tables.
After the delete, planning queries takes a long time, because MergeJoin
estimates use IndexScans (see
https://postgr.es/m/17467.1426090...@sss.pgh.pa.us ).
Right now we have to disable MergeJoins via ALTER SYSTEM to mitigate this.
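As a minimal sketch, that mitigation looks like this (assuming a
cluster-wide switch is acceptable):

    ALTER SYSTEM SET enable_mergejoin = off;
    SELECT pg_reload_conf();
    -- and once the deleted range has been vacuumed away:
    ALTER SYSTEM RESET enable_mergejoin;
    SELECT pg_reload_conf();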

So I would, actually, like it very much for VACUUM to kick in sooner in
such cases.

-- 
Victor Yegorov


Re: New IndexAM API controlling index vacuum strategies

2021-02-02 Thread Victor Yegorov
Tue, 2 Feb 2021 at 05:27, Peter Geoghegan :

> And now here is the second thing I thought of, which is much better:
>
> Sometimes 1% of the dead tuples in a heap relation will be spread
> across 90%+ of the pages. With other workloads 1% of dead tuples might
> be highly concentrated, and appear in no more than 1% of all heap
> pages. Obviously the distinction between these two cases/workloads
> matters a lot. And so the triggering criteria must be quantitative
> *and* qualitative. It should not be based on counting dead tuples,
> since that alone won't differentiate these two extreme cases - both of
> which are probably quite common (in the real world extremes are
> actually the normal and common case IME).
>
> I like the idea of basing it on counting *heap blocks*, not dead
> tuples. We can count heap blocks that have *at least* one dead tuple
> (of course it doesn't matter how they're dead, whether it was this
> VACUUM operation or some earlier opportunistic pruning). Note in
> particular that it should not matter if it's a heap block that has
> only one LP_DEAD line pointer or a heap page that is near the
> MaxHeapTuplesPerPage limit for the page -- we count either type of
> page towards the heap-page based limit used to decide if index
> vacuuming goes ahead for all indexes during VACUUM.
>

I really like this idea!

It resembles the approach used in bottom-up index deletion: block-based
accounting provides a better estimate of the usefulness of the operation.

I suppose the 1% threshold should be configurable, as a cluster-wide GUC
and also as a table storage parameter?
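For illustration only, this is how the existing, analogous thresholds are
configured today; a knob for the proposed heap-block threshold could follow
the same shape (the table name and the commented-out parameter name below
are made up):

    -- cluster-wide default:
    ALTER SYSTEM SET autovacuum_vacuum_insert_scale_factor = 0.1;
    -- per-table override:
    ALTER TABLE orders SET (autovacuum_vacuum_insert_scale_factor = 0.02);
    -- hypothetical equivalent for the 1% heap-block threshold:
    -- ALTER TABLE orders SET (vacuum_dead_blocks_scale_factor = 0.01);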


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-18 Thread Victor Yegorov
Mon, 18 Jan 2021 at 21:43, Peter Geoghegan :

> In the end, I couldn't justify imposing this cost on anything other
> than a non-HOT updater, which is what the flag proposal would require
> me to do -- then it's not 100% clear that the relative cost of each
> "bet" placed in heapam.c really is extremely low (we can no longer say
> for sure that we have very little to lose, given the certain
> alternative that is a version churn page split).


I must admit that it's a bit difficult (at least for me) to understand you
here.

I assume that by "bet" you mean a flagged tuple that we marked as such
(should we implement the suggested case).
As heapam will give up early when there are no deletable tuples, why do you
say that the "bet" is no longer low? In the end, we still decide between a
page split (and index bloat) and a beneficial space cleanup.

The fact is that
> "version chains in indexes" still cannot get very long. Plus there are
> other subtle ways in which it's unlikely to be a problem (e.g. the
> LP_DEAD deletion stuff also got much better, deduplication can combine
> with bottom-up deletion so that we'll trigger a useful bottom-up
> deletion pass sooner or later without much downside, and possibly
> other things I haven't even thought of).
>

I agree with this, except for "version chains" not being long. That really
depends on the data distribution. It's perfectly common to find indexes
supporting FK constraints on highly skewed sets, with 80% of the index
belonging to a single value (say, a huge business customer vs. several
thousands of one-time buyers).


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-18 Thread Victor Yegorov
Mon, 18 Jan 2021 at 13:42, Amit Kapila :

> I don't think any of these can happen in what I am actually saying. Do
> you still have the same feeling after reading this email? Off-hand, I
> don't see any downsides as compared to the current approach and it
> will have fewer splits in some other workloads like when there is a
> mix of inserts and non-HOT updates (that doesn't logically change the
> index).
>

If I understand you correctly, you suggest tracking _all_ the hints that
came from the executor for possible non-HOT logical duplicates somewhere on
the page. And when we hit the no-space case, we'll check not only the last
item that was hinted, but all items on the page, which makes it more
probable for the mechanism to kick in and do something.

Sounds interesting. And judging by Peter's tests of extra LP_DEAD tuples
found on the page (almost always more than actually flagged), this can have
some positive effects.

Also, I'm not sure where to put it. We've deprecated the BTP_HAS_GARBAGE
flag, maybe it can be reused for this purpose?


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-18 Thread Victor Yegorov
Mon, 18 Jan 2021 at 07:44, Amit Kapila :

> The summary of the above is that with Case-1 (clean-up based on hint
> received with the last item on the page) it takes fewer operations to
> cause a page split as compared to Case-2 (clean-up based on hint
> received with at least of the items on the page) for a mixed workload.
> How can we say that it doesn't matter?
>

I cannot understand this; perhaps there's a word missing in the brackets?


Thinking of your proposal to undertake bottom-up deletion also on the
last-to-fit tuple insertion, I'd like to start with my understanding of the
design:

* we try to avoid index bloat, but we do it in the “lazy” way (for a
reason, see below)

* it means that if there is enough space on the leaf page to fit the new
index tuple, we just fit it there

* if there's not enough space, we first look for the presence of LP_DEAD
tuples, and if they do exist, we scan the whole (index) page to re-check
all tuples in order to find other, not-yet-marked-but-actually-being-LP_DEAD
tuples and clean all those up

* if there's still not enough space, only now do we try bottom-up deletion.
This is a heavy operation and can cause extra IO (tests do show this),
therefore it is undertaken at the point when we can justify the extra IO
against a leaf page split

* if there's no space even after bottom-up deletion, we perform
deduplication (if possible)

* finally, we split the page.

Should we try a bottom-up deletion pass in a situation where we're
inserting the last possible tuple into the page (enough space for the
*current* insert, but likely no space for the next), then (among others)
the following possibilities exist:

- no new tuple ever comes to this page anymore (happy accident), which
means that we've wasted IO cycles

- we undertake a bottom-up deletion pass without success and are later
asked to insert a new tuple into this fully packed index page. This can
unroll to:
  - due to the delay, we've managed to find some space either via LP_DEAD
or bottom-up cleanup, which means we've wasted IO cycles on the previous
iteration (a too-early attempt);
  - we still failed to find any space and are forced to split the page;
in this case we've wasted IO cycles twice.

In my view, these cases where we generate wasted IO cycles (producing no
benefit) should be avoided.
And this is the main reason for the current approach.

Again, this is my understanding and I hope I got it right.


-- 
Victor Yegorov


Re: cgit view available

2021-01-17 Thread Victor Yegorov
Sun, 17 Jan 2021 at 17:19, Magnus Hagander :

> > First thing I've noted:
> >
> >
> https://git.postgresql.org/cgit/postgresql.git/commit/960869da0803427d14335bba24393f414b476e2c
> >
> > silently shows another commit.
>
> Where did you get that URL from?
>

I've made it up manually, comparing cgit and gitweb links.



> And AFAICT, a URL like that in cgit shows the latest commit in the
> repo, for the path that you entered (which in this case is the hash
> put in the wrong place).
>

Yes, that's what I've noted too.


> I guess we could capture a specific "looks like a hash" and redirect
> that, assuming we would never ever have anything in a path or filename
> in any of our repositories that looks like a hash. That seems like
> maybe it's a bit of a broad assumption?
>

I thought maybe it's possible to rewrite requests of the form

/cgit/*/commit/<hash>

into

/cgit/*/commit/?id=<hash>

?

-- 
Victor Yegorov


Re: cgit view available

2021-01-17 Thread Victor Yegorov
Sun, 17 Jan 2021 at 14:48, Magnus Hagander :

> After a short time (ahem, several years) of badgering of me by a
> certain community member, I've finally gotten around to putting up a
> cgit instance on our git server, to allow for browsing of the git
> repositories. You can find this at:
>
> https://git.postgresql.org/cgit/
>
> or specifically for the postgresql git repo:
>
> https://git.postgresql.org/cgit/postgresql.git/
>

Looks nice!

First thing I've noted:


https://git.postgresql.org/cgit/postgresql.git/commit/960869da0803427d14335bba24393f414b476e2c

silently shows another commit.

Is it possible to make the scheme above work?
Our gitweb (and also GitHub) uses it, so I assume people are quite used
to it.



-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-11 Thread Victor Yegorov
Mon, 11 Jan 2021 at 22:10, Peter Geoghegan :

> I imagine that this happened because you have track_io_timing=on in
> postgresql.conf. Doesn't the same failure take place with the current
> master branch?
>
> I have my own way of running the locally installed Postgres when I
> want "make installcheck" to pass that specifically accounts for this
> (and a few other similar things).
>

Oh, right, I hadn't thought of this. Thanks for pointing that out.

Now everything looks good!


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-11 Thread Victor Yegorov
Mon, 11 Jan 2021 at 01:07, Peter Geoghegan :

>
> Attached is v13, which has this tweak, and other miscellaneous cleanup
> based on review from both Victor and Heikki. I consider this version
> of the patch to be committable. I intend to commit something close to
> it in the next week, probably no later than Thursday. I still haven't
> got to the bottom of the shellsort question raised by Heikki. I intend
> to do further performance validation before committing the patch. I
> will look into the shellsort thing again as part of this final
> performance validation work -- perhaps I can get rid of the
> specialized shellsort implementation entirely, simplifying the state
> structs added to tableam.h. (As I said before, it seems best to
> address this last of all to avoid making performance validation even
> more complicated.)
>

I've checked this version quickly. It applies and compiles without issues.
`make check` and `make check-world` reported no issues.

But `make installcheck-world` failed on:
…
test explain  ... FAILED   22 ms
test event_trigger... ok  178 ms
test fast_default ... ok  262 ms
test stats... ok  586 ms


 1 of 202 tests failed.


(see attached diff). It doesn't look like the fault of this patch, though.

I suppose you plan to send another revision before committing this.
Therefore I didn't perform any tests here and will wait for the next version.


-- 
Victor Yegorov


20210111-v13-regression.diffs
Description: Binary data


Re: Deadlock between backend and recovery may not be detected

2021-01-05 Thread Victor Yegorov
Tue, 5 Jan 2021 at 07:26, Fujii Masao :

> This situation makes me feel that I'm inclined to skip the back-patch to
> v9.5.
> Because the next minor version release is the final one for v9.5. So if we
> unexpectedly introduce the bug to v9.5 by the back-patch, there is no
> chance to fix that. OTOH, of course, if we don't do the back-patch, there
> is
> no chance to fix the deadlock detection bug since the final minor version
> for v9.5 doesn't include that bug fix. But I'm afraid that the back-patch
> to v9.5 may give more risk than gain.
>
> Thought?
>

Honestly, I was thinking that this would not be back-patched at all,
so I really am glad we're getting this fixed in the back branches as well.

Therefore I think it's fine to skip 9.5, though I would've mentioned this
in the commit message.


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-04 Thread Victor Yegorov
Thu, 31 Dec 2020 at 03:55, Peter Geoghegan :

> Attached is v12, which fixed bitrot against the master branch. This
> version has significant comment and documentation revisions. It is
> functionally equivalent to v11, though.
>
> I intend to commit the patch in the next couple of weeks. While it
> certainly would be nice to get a more thorough review, I don't feel
> that it is strictly necessary. The patch provides very significant
> benefits with certain workloads that have traditionally been
> considered an Achilles' heel for Postgres. Even zheap doesn't provide
> a solution to these problems. The only thing that I can think of that
> might reasonably be considered in competition with this design is
> WARM, which hasn't been under active development since 2017 (I assume
> that it has been abandoned by those involved).
>

I've looked through the v12 patch.

I like the new outline:

- _bt_delete_or_dedup_one_page() is the main entry point for the new code
- first we try _bt_simpledel_pass(), which does improved cleanup of LP_DEAD
  entries
- then (if necessary) _bt_bottomupdel_pass() performs bottom-up deletion
- finally, we perform _bt_dedup_pass() for deduplication

We split the leaf page only if all the actions above failed to provide
enough space.

Some comments on the code.

v12-0001


1. For the following comment

+* Only do this for key columns.  A change to a non-key column within an
+* INCLUDE index should not be considered because that's just payload to
+* the index (they're not unlike table TIDs to the index AM).

The last part (in the parentheses) is difficult to grasp due to the
double negation (not unlike). I think it's better to rephrase it.

2. After reading the patch, I also think that the fact that
index_unchanged_by_update() and index_unchanged_by_update_var_walker()
return different bool states (i.e. when the latter returns true, the former
returns false) is a bit misleading.

Although the logic as it is looks fine, maybe
index_unchanged_by_update_var_walker() can be renamed to avoid this
confusion, to something like index_expression_changed_walker()?

v12-0002


1. Thanks for the comments, they're well made and do help to read the code.

2. I'm not sure the bottomup_delete_items parameter is very helpful. In
order to disable bottom-up deletion, a DBA needs to somehow measure its
impact on a particular index, and currently I do not see how to achieve
this (see the sketch after this list). Not sure if this is overly
important, though, as you have a similar parameter for deduplication.

3. It feels like indexUnchanged would be better as indexChanged, with its
usage in the code negated, as !indexChanged reads more naturally than
!indexUnchanged, at least to me.
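For reference, the analogous per-index deduplication knob is toggled like
this (the index name is made up); presumably bottomup_delete_items from the
patch would follow the same shape:

    ALTER INDEX orders_pkey SET (deduplicate_items = off);
    ALTER INDEX orders_pkey RESET (deduplicate_items);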

This is all I have. I agree, that this code is pretty close to being
committed.

Now for the tests.

First, I ran a 2-hour-long case with the same setup as I used in my e-mail
from the 15th of November. I found no difference between patch and master
whatsoever. Which makes me think that current master is quite good at
keeping bloat under control (not sure if this is an effect of commit
4228817449 or of deduplication).

I created another setup (see attached testcases). Basically, I emulated
queue operations (INSERT at the end and DELETE



-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-12-31 Thread Victor Yegorov
Thu, 31 Dec 2020 at 20:01, Zhihong Yu :

> For v12-0001-Pass-down-logically-unchanged-index-hint.patch
>
> +   if (hasexpression)
> +   return false;
> +
> +   return true;
>
> The above can be written as return !hasexpression;
>

To be honest, I prefer the way Peter has it in his patch.
Yes, it's possible to shorten this part, but readability suffers: the
current code I can just read, while for the suggested change I need to
think about what it does.

-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-12-31 Thread Victor Yegorov
Thu, 31 Dec 2020 at 03:55, Peter Geoghegan :

> Attached is v12, which fixed bitrot against the master branch. This
> version has significant comment and documentation revisions. It is
> functionally equivalent to v11, though.
>
> I intend to commit the patch in the next couple of weeks. While it
> certainly would be nice to get a more thorough review, I don't feel
> that it is strictly necessary. The patch provides very significant
> benefits with certain workloads that have traditionally been
> considered an Achilles' heel for Postgres. Even zheap doesn't provide
> a solution to these problems. The only thing that I can think of that
> might reasonably be considered in competition with this design is
> WARM, which hasn't been under active development since 2017 (I assume
> that it has been abandoned by those involved).
>

I am planning to look into this patch in the next few days.

-- 
Victor Yegorov


Re: Deadlock between backend and recovery may not be detected

2020-12-16 Thread Victor Yegorov
Wed, 16 Dec 2020 at 13:49, Fujii Masao :

> After doing this procedure, you can see the startup process and backend
> wait for the table lock each other, i.e., deadlock. But this deadlock
> remains
> even after deadlock_timeout passes.
>
> This seems a bug to me.
>
> > * Deadlocks involving the Startup process and an ordinary backend process
> > * will be detected by the deadlock detector within the ordinary backend.
>
> The cause of this issue seems that ResolveRecoveryConflictWithLock() that
> the startup process calls when recovery conflict on lock happens doesn't
> take care of deadlock case at all. You can see this fact by reading the
> above
> source code comment for ResolveRecoveryConflictWithLock().
>
> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
> timer in ResolveRecoveryConflictWithLock() so that the startup process can
> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
> the backend should check whether the deadlock actually happens or not.
> Attached is the POC patch implementing this.
>

I agree that this is a bug.

Unfortunately, we've been hit by it in production.
Such a deadlock will eventually make all sessions wait on the startup
process, making the streaming replica unusable. In case the replica is used
for balancing out RO queries from the primary, this causes downtime for the
project.

If I understand things right, the session will release its locks when
max_standby_streaming_delay is reached. But it'd be much better if the
conflict were resolved faster, around deadlock_timeout.
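For context, the two timeouts involved, shown here with their default
values (a sketch, to be run on the standby):

    -- the hard cap after which recovery cancels conflicting sessions;
    -- today this is the only thing that eventually breaks such a deadlock:
    ALTER SYSTEM SET max_standby_streaming_delay = '30s';
    -- the timeframe within which we'd like the deadlock detected instead:
    ALTER SYSTEM SET deadlock_timeout = '1s';
    SELECT pg_reload_conf();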

So — huge +1 from me for fixing it.


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-11-25 Thread Victor Yegorov
Wed, 25 Nov 2020 at 19:41, Peter Geoghegan :

> We have promising tuples for bottom-up deletion. Why not have
> "promising heap blocks" for traditional LP_DEAD index tuple deletion?
> Or if you prefer, we can consider index tuples that *don't* have their
> LP_DEAD bits set already but happen to point to the *same heap block*
> as other tuples that *do* have their LP_DEAD bits set promising. (The
> tuples with their LP_DEAD bits set are not just promising -- they're
> already a sure thing.)
>

In _bt_delete_or_dedup_one_page() we start with a simple loop over the
items on the page, and if there are any LP_DEAD tuples, we kick off
_bt_delitems_delete().

So if I understood you right, you plan to make this loop (or a similar one
somewhere around it) track the TIDs of the LP_DEAD tuples and then (perhaps
in a second loop over the page) examine all other, currently-not-LP_DEAD
tuples and mark heap blocks that have at least 2 TIDs pointing at them (one
LP_DEAD and one not) as promising.

Later, should we need to kick off deletion, we'll visit the promising
blocks first.

Is my understanding correct?


> > I am missing a general perspective here.
> >
> > Is it true, that despite the long (vacuum preventing) transaction we can
> re-use space,
> > as after the DELETE statements commits, IndexScans are setting LP_DEAD
> hints after
> > they check the state of the corresponding heap tuple?
>
> The enhancement to traditional LP_DEAD deletion that I just described
> does not affect the current restrictions on setting LP_DEAD bits in
> the presence of a long-running transaction, or anything like that.
> That seems like an unrelated project. The value of this enhancement is
> purely its ability to delete *extra* index tuples that could have had
> their LP_DEAD bits set already (it was possible in principle), but
> didn't. And only when they are nearby to index tuples that really do
> have their LP_DEAD bits set.
>

I wasn't considering improvements here; that was a general question about
how this works (trying to clear up gaps in my understanding).

What I meant to ask: will LP_DEAD bits be set by an IndexScan in the
presence of a long transaction?
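One way to observe this on a test build is with the pageinspect extension
(the index name and page number here are made up); the "dead" column of
bt_page_items() shows which items have their LP_DEAD bit set:

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT itemoffset, htid, dead
      FROM bt_page_items('some_index', 1);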


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-11-25 Thread Victor Yegorov
Wed, 25 Nov 2020 at 05:35, Peter Geoghegan :

> Then I had a much better idea: Make the existing LP_DEAD stuff a
> little more like bottom-up index deletion. We usually have to access
> heap blocks that the index tuples point to today, in order to have a
> latestRemovedXid cutoff (to generate recovery conflicts). It's worth
> scanning the leaf page for index tuples with TIDs whose heap block
> matches the index tuples that actually have their LP_DEAD bits set.
> This only consumes a few more CPU cycles. We don't have to access any
> more heap blocks to try these extra TIDs, so it seems like a very good
> idea to try them out.
>

I don't seem to understand this.

Is it: we scan the leaf page for all LP_DEAD tuples that point to the same
heap block? And which heap block are we talking about here, the one that
holds the entry we're about to add (the one that triggered bottom-up
deletion due to lack of space, I mean)?

> I ran the regression tests with an enhanced version of the patch, with
> this LP_DEAD-deletion-with-extra-TIDs thing. It also had custom
> instrumentation that showed exactly what happens in each case. We
> manage to delete at least a small number of extra index tuples in
> almost all cases -- so we get some benefit in practically all cases.
> And in the majority of cases we can delete significantly more. It's
> not uncommon to increase the number of index tuples deleted. It could
> go from 1 - 10 or so without the enhancement to LP_DEAD deletion, to
> 50 - 250 with the LP_DEAD enhancement. Some individual LP_DEAD
> deletion calls can free more than 50% of the space on the leaf page.
>

I am missing a general perspective here.

Is it true that, despite the long (vacuum-preventing) transaction, we can
re-use space, as after the DELETE statement commits, IndexScans set LP_DEAD
hints once they check the state of the corresponding heap tuple?

If my thinking is correct on both counts, the nature of LP_DEAD hint bits
and the mechanics of the suggested optimization, then I consider this a
very promising improvement!

I haven't done any testing since sending my last e-mail.
If you have a chance to send a new v10 version with the
LP_DEAD-deletion-with-extra-TIDs thing, I will do the tests I have planned.

-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-11-17 Thread Victor Yegorov
Tue, 17 Nov 2020 at 17:24, Peter Geoghegan :

> I prefer to continue to maintain the flag in the same way, regardless
> of which B-Tree version is in use (i.e. if it's heapkeyspace or not).
> Maintaining the flag is not expensive, may have some small value for
> forensic or debugging purposes, and saves callers the trouble of
> telling  _bt_delitems_delete() (and code like it) whether or not this
> is a heapkeyspace index.
>

OK. Can you explain what deprecation means here?
If this functionality is left as is, is it really a deprecation?

-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-11-17 Thread Victor Yegorov
Fri, 13 Nov 2020 at 00:01, Peter Geoghegan :

> Attached is v8, which has the enhancements for low cardinality data
> that I mentioned earlier today. It also simplifies the logic for
> dealing with posting lists that we need to delete some TIDs from.
> These posting list simplifications also make the code a bit more
> efficient, which might be noticeable during benchmarking.
>

I've looked through the code and it looks very good from my end:
- plenty of comments, good description of what's going on
- I found no loose ends in terms of AM integration
- magic constants replaced with defines

Still, it'd be good if somebody with more experience could look into this
patch.

Question: why are you using double spaces after dots in the comments?
Is this a convention of the project?

I am thinking of two more scenarios that require testing:
- a queue in a table, with a high rate of INSERTs+DELETEs and a long
  transaction (see the sketch below). I've seen such conditions yield
  indexes of several GB in size while holding fewer than a thousand live
  records.
- an upgraded cluster with !heapkeyspace indexes.
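A minimal sketch of the queue scenario (all names and values are made up):

    CREATE TABLE queue (id bigserial PRIMARY KEY, payload text);

    -- session 1: hold an old snapshot, preventing cleanup
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM queue;   -- then leave the transaction idle

    -- sessions 2..N: a steady producer/consumer stream
    INSERT INTO queue (payload) VALUES ('work item');
    DELETE FROM queue WHERE id = (SELECT min(id) FROM queue);

With the snapshot held open, deleted index entries cannot be recycled, and
the primary-key index keeps growing even though the live row count stays
tiny.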

-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-11-17 Thread Victor Yegorov
Thu, 12 Nov 2020 at 23:00, Peter Geoghegan :

> Another thing that I'll probably add to v8: Prefetching. This is
> probably necessary just so I can have parity with the existing
> heapam.c function that the new code is based on,
> heap_compute_xid_horizon_for_tuples(). That will probably help here,
> too.
>

I don't quite see this part. Do you mean top_block_groups_favorable() here?


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-11-17 Thread Victor Yegorov
Thu, 12 Nov 2020 at 23:00, Peter Geoghegan :

> It would be helpful if you could take a look at the nbtree patch --
> particularly the changes related to deprecating the page-level
> BTP_HAS_GARBAGE flag. I would like to break those parts out into a
> separate patch, and commit it in the next week or two. It's just
> refactoring, really. (This commit can also make nbtree only use the
> word VACUUM for things that strictly involve VACUUM. For example,
> it'll rename _bt_vacuum_one_page() to _bt_delete_or_dedup_one_page().)
>
> We almost don't care about the flag already, so there is almost no
> behavioral change from deprecated BTP_HAS_GARBAGE in this way.
>
> Indexes that use deduplication already don't rely on BTP_HAS_GARBAGE
> being set ever since deduplication was added to Postgres 13 (the
> deduplication code doesn't want to deal with LP_DEAD bits, and cannot
> trust that no LP_DEAD bits can be set just because BTP_HAS_GARBAGE
> isn't set in the special area). Trusting the BTP_HAS_GARBAGE flag can
> cause us to miss out on deleting items with their LP_DEAD bits set --
> we're better off "assuming that BTP_HAS_GARBAGE is always set", and
> finding out if there really are LP_DEAD bits set for ourselves each
> time.
>
> Missing LP_DEAD bits like this can happen when VACUUM unsets the
> page-level flag without actually deleting the items at the same time,
> which is expected when the items became garbage (and actually had
> their LP_DEAD bits set) after VACUUM began, but before VACUUM reached
> the leaf pages. That's really wasteful, and doesn't actually have any
> upside -- we're scanning all of the line pointers only when we're
> about to split (or dedup) the same page anyway, so the extra work is
> practically free.
>

I've looked over the BTP_HAS_GARBAGE modifications, and they look sane.
I've double-checked that heapkeyspace indexes don't use this flag (don't
rely on it), while pre-v4 ones still do.

I have a question. This flag is raised in _bt_check_unique() and in
_bt_killitems(). If we're deprecating this flag, perhaps it'd be good to
avoid raising it at least in _bt_check_unique(), as it seems to me that the
conditions there deal with postings, and therefore we are speaking of
heapkeyspace indexes.

If we raise this flag conditionally in the functions above, we can get rid
of the blocks that drop it in _bt_delitems_delete(), I think.
-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-10-29 Thread Victor Yegorov
Mon, 26 Oct 2020 at 22:15, Peter Geoghegan :

> Attached is v5, which has changes that are focused on two important
> high level goals:
>

And some more comments after another round of reading the patch.

1. Looks like UNIQUE_CHECK_NO_WITH_UNCHANGED is used for HOT updates;
   should we use UNIQUE_CHECK_NO_HOT here? It would be better understood
   that way.

2. You're modifying the table_tuple_update() function on line 1311 of
   include/access/tableam.h, adding modified_attrs_hint. There's a large
   comment right before it describing the parameters; I think there should
   be a note about the modified_attrs_hint parameter in that comment, 'cos
   it is referenced from other places in tableam.h and also from
   backend/access/heap/heapam.c.

3. Can you elaborate on the scoring model you're using?
   Why do we expect a score of 25, what's the rationale behind this number?
   And should it be #define-d?

4. heap_compute_xid_horizon_for_tuples() contains duplicate logic. Is it
   possible to avoid this?

5. In this comment

+ * heap_index_batch_check() helper function.  Sorts deltids array in the
+ * order needed for useful processing.

   perhaps it is better to replace "useful" with more details? Or point to
   the place where "useful processing" is described.

6. In this snippet in _bt_dedup_delete_pass()

+   else if (_bt_keep_natts_fast(rel, state->base, itup) > nkeyatts &&
+            _bt_dedup_save_htid(state, itup))
+   {
+
+   }

   I would rather add a comment explaining that the empty body of the
   clause is actually expected.

7. In _bt_dedup_delete_finish_pending() you're setting ispromising to
   false for both posting and non-posting tuples. This contradicts the
   comments before the function.




-- 
Victor Yegorov


Re: Autovacuum worker doesn't immediately exit on postmaster death

2020-10-28 Thread Victor Yegorov
Wed, 28 Oct 2020 at 19:44, Alexander Kukushkin :

> I know, nobody in their mind should do that, but, if the postmaster
> process is killed with SIGKILL signal, most backend processes
> correctly notice the fact of the postmaster process absence and exit.
> There is one exception though, when there are autovacuum worker
> processes they are continuing to run until eventually finish and exit.
>
> …
>
> I was able to reproduce it with 13.0 and 12.4, and I believe older
> versions are also affected.
>

Do you get the same behaviour on master as well?
There was some work in this area for 14; see
https://git.postgresql.org/pg/commitdiff/44fc6e259b

-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2020-10-16 Thread Victor Yegorov
Fri, 16 Oct 2020 at 21:12, Peter Geoghegan :

> I ran the script for two hours and 16 clients with the patch, then for
> another two hours with master. After that time, all 3 indexes were
> exactly the same size with the patch, but had almost doubled in size
> on master:
>
> aid_pkey_include_abalance: 784,264 pages (or ~5.983 GiB)
> one: 769,545 pages (or ~5.871 GiB)
> two: 770,295 pages (or ~5.876 GiB)
>
> (With the patch, all three indexes were 100% pristine -- they remained
> precisely 411,289 pages in size by the end, which is ~3.137 GiB.)
>
> …
>
> The TPS/throughput is about what you'd expect for the two hour run:
>
> 18,988.762398 TPS for the patch
> 11,123.551707 TPS for the master branch.
>

I really like these results, great work!

I'm also wondering how the IO numbers changed due to these improvements;
that shouldn't be difficult to look into.
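For instance, a sketch like this (index names taken from your run) would
show buffer reads vs. hits per index:

    SELECT indexrelname, idx_blks_read, idx_blks_hit
      FROM pg_statio_user_indexes
     WHERE indexrelname IN ('aid_pkey_include_abalance', 'one', 'two');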

Peter, according to cfbot the patch no longer compiles.
Can you send an update, please?


-- 
Victor Yegorov


Re: Failure to create GiST on ltree column

2020-05-27 Thread Victor Yegorov
Mon, 25 May 2020 at 18:25, Justin Pryzby :

> I wonder if/how that fails if you create the index before adding data:
>
> CREATE TABLE test_path(path ltree);
> CREATE INDEX ON test_path USING GIST(path);
> INSERT INTO test_path SELECT * FROM comments.mp_comments;
>
> Does that fail on a particular row ?
>
> How many paths do you have and how long?  How big is the table?
>

Yes, it fails.

I got permission and created a partial dump of the data with:

CREATE TABLE lc AS SELECT id, path FROM comments.mp_comments
WHERE length(path::text) >= 500;

Attached. It reproduces the error I get. One needs to create the ltree
extension first.
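The reproduction steps, as a sketch (the database name is made up):

    CREATE EXTENSION ltree;
    -- restore the attached dump from the shell:
    --   pg_restore -d mydb lc.pgdump
    CREATE INDEX ON lc USING GIST (path);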

I understand that the issue most likely comes from the length of the ltree
data stored in the column.
But the error is a bit misleading…


-- 
Victor Yegorov


lc.pgdump
Description: Binary data


Failure to create GiST on ltree column

2020-05-24 Thread Victor Yegorov
Greetings.

I am getting random failures in `CREATE INDEX USING gist` over an ltree
column while performing pg_restore.
I get either
ERROR:  stack depth limit exceeded
or
ERROR:  failed to add item to index page

Thing is, if I retry the index creation manually, it builds successfully
in ~50% of the cases.

I would like to find out what the real cause is here, but I am not sure how
to do it.
If anybody could provide some guidance, I am open to investigating this
case.

I'm on PostgreSQL 11.8 (Debian 11.8-1.pgdg90+1), with debugging symbols
installed.

-- 
Victor Yegorov


Re: Poll: are people okay with function/operator table redesign?

2020-04-20 Thread Victor Yegorov
Sun, 19 Apr 2020 at 20:00, Tom Lane :

> In the meantime I plan to push forward with the markup approach we've
> got.  The editorial content should still work if we find a better
> markup answer, and I'm willing to do the work of replacing the markup
> as long as somebody else figures out what it should be.
>

I am following this thread as a frequent documentation user.

While table 9.5 with functions looks quite nice, I quite dislike table 9.4
with operators.
Previously, I could look up an operator in the leftmost column and read on.
Right now I have to look through the whole table (well, not really, but
still) to find the operator.
-- 
Victor Yegorov


Re: open-source equivalent of golden-gate

2020-02-11 Thread Victor Yegorov
Tue, 11 Feb 2020 at 12:23, ROS Didier :

> In the Oracle world we use the product "golden gate" to execute
> transactions from a source database (Oracle, Mysql) to a PostgreSQL
> instance.
>
> This allows 2 Oracle and PostgreSQL databases to be updated at the same
> time in real time.
>
> I would like to know if there is an equivalent open-source product.
>

There is a SQL/MED standard for exactly this:
https://wiki.postgresql.org/wiki/SQL/MED

It is implemented in PostgreSQL as Foreign Data Wrappers:
https://wiki.postgresql.org/wiki/Fdw
You need to do the following (see the sketch below):
1. Add a wrapper via
https://www.postgresql.org/docs/current/sql-createextension.html
2. Create a remote source via
https://www.postgresql.org/docs/current/sql-createserver.html
3. Create a foreign table via
https://www.postgresql.org/docs/current/sql-createforeigntable.html

Note that PostgreSQL provides only the infrastructure; wrappers for
different remote systems are not supported by the PostgreSQL community,
except for postgres_fdw and file_fdw, which are provided by the project.
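A minimal sketch of those three steps using postgres_fdw (all names, hosts
and credentials are made up):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER src_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'source-host', dbname 'sourcedb');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER src_server
        OPTIONS (user 'remote_user', password 'secret');

    CREATE FOREIGN TABLE remote_orders (
        id   integer,
        note text
    ) SERVER src_server
      OPTIONS (schema_name 'public', table_name 'orders');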


-- 
Victor Yegorov


Re: Alter index rename concurrently to

2018-07-16 Thread Victor Yegorov
Mon, 16 Jul 2018 at 21:58, Andrey Klychkov :

> I made a patch to solve this issue (see the attachment).
> It allows avoiding locks with a query like this:
> “alter index  rename CONCURRENTLY to ”.
>

Please, have a look at previous discussions on the subject:
- 2012
https://www.postgresql.org/message-id/cab7npqtys6juqdxuczbjb0bnw0kprw8wdzuk11kaxqq6o98...@mail.gmail.com
- 2013
https://www.postgresql.org/message-id/cab7npqstfkuc7dzxcdx4hotu63yxhrronv2aouzyd-zz_8z...@mail.gmail.com
- https://commitfest.postgresql.org/16/1276/

-- 
Victor Yegorov