Re: Index bloat and REINDEX/VACUUM optimization for partial index

2023-08-28 Thread Peter Geoghegan
On Mon, Aug 28, 2023 at 5:33 PM jayaprabhakar k  wrote:
> REINDEX requires a full table scan
>
> Roughly create a new index, rename index, drop old index.
> REINDEX is not incremental. running reindex frequently does not reduce the 
> future reindex time.

You didn't say which Postgres version you're on. Note that Postgres 14
can deal with index bloat a lot better than earlier versions could.
This is known to work well with partial indexes. See:

https://www.postgresql.org/message-id/flat/CAL9smLAjt9mZC2%3DqBeJwuNPq7KMAYGTWWQw_hvA-Lfo0b3ycow%40mail.gmail.com

-- 
Peter Geoghegan




Re: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT

2023-06-12 Thread Peter Geoghegan
On Mon, Jun 12, 2023 at 1:17 PM benoit  wrote:
> Is there a misusage of my indexes?
>
> Is there a limitation when using ANY or IN operators and ordered LIMIT behind?

It's complicated. Do you find that you get satisfactory performance if
you force a bitmap index scan? In other words, what is the effect of
"set enable_indexscan = off" on your original query? Does that speed
up execution at all? (I think that this approach ought to produce a
plan that uses a bitmap index scan in place of the index scan, without
changing anything else.)
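
To be concrete, the experiment I have in mind looks something like this
(the table, columns, and query here are only placeholders -- substitute
your real query):

  BEGIN;
  -- Discourage plain index scans for this transaction only; the planner
  -- should then prefer a bitmap index scan where one is possible.
  SET LOCAL enable_indexscan = off;
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT *
  FROM orders
  WHERE customer_id = ANY ('{1,2,3}'::int[])
  ORDER BY created_at DESC
  LIMIT 50;
  ROLLBACK;  -- SET LOCAL does not outlive the transaction anyway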

-- 
Peter Geoghegan




Re: High QPS, random index writes and vacuum

2023-04-17 Thread Peter Geoghegan
On Mon, Apr 17, 2023 at 7:43 PM peter plachta  wrote:
> Version: I sheepishly admit it's 9.6, 10 and 11 (it's Azure Single Server, 
> that's another story).

If you can upgrade to 14, you'll find that there is much improved
management of index updates on that version:

https://www.postgresql.org/docs/current/btree-implementation.html#BTREE-DELETION

But it's not clear what the problem really is here. If the problem is
that you're dependent on vacuum to get acceptable response times by
holding back index bloat, then an upgrade could easily help a lot. But
an upgrade might not make VACUUM take less time, given that you've
already tuned it fairly aggressively. It depends.

An upgrade might make VACUUM go faster if you set
vacuum_cost_page_miss to 2, which is the default on later versions
anyway -- looks like you didn't touch that. And, as Thomas said, later
versions do have parallel VACUUM, though that cannot be used by
autovacuum workers.
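
If you want to try that out, this is roughly the shape of it (the table
name is just a placeholder; PARALLEL requires Postgres 13+ and only
applies to manual VACUUM):

  -- Match the default used by later versions:
  ALTER SYSTEM SET vacuum_cost_page_miss = 2;
  SELECT pg_reload_conf();

  -- On 13+, a manual VACUUM can use parallel workers for index vacuuming
  -- (autovacuum workers cannot):
  VACUUM (PARALLEL 4, VERBOSE) some_big_table;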

-- 
Peter Geoghegan




Re: High QPS, random index writes and vacuum

2023-04-17 Thread Peter Geoghegan
On Mon, Apr 17, 2023 at 5:35 PM peter plachta  wrote:
> My question is: what is the recommended strategy to deal with such cases in 
> Postgres?

You didn't say what version of Postgres you're using...

-- 
Peter Geoghegan




Re: creating hash indexes

2022-12-14 Thread Peter Geoghegan
On Wed, Dec 14, 2022 at 12:03 PM Rick Otten  wrote:
> Assuming I can live with the slower inserts, is there any parameter in 
> particular I can tweak that would make the time it takes to create the hash 
> index closer to the btree index creation time?  In particular if I wanted to 
> try this on a several billion row table in a busy database?

No. B-Tree index builds are parallelized, and are far better optimized
in general.

> -  As long as the index fits in memory, varchar btree isn't really that much 
> slower in postgresql 14 (the way it was a few years ago), so we'll probably 
> just live with that for the foreseeable future given the complexity of 

> changing things at the moment.

The other things to consider are 1.) the index size after retail
inserts, 2.) the index size following some number of updates and
deletes.

Even if you just had plain inserts for your production workload, the
picture will not match your test case (which I gather just looked at
the index size after a CREATE INDEX ran). I think that B-Tree indexes
will still come out ahead if you take this growth into account, and by
quite a bit, but probably not due to any effect that your existing test case
exercises.
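
To sketch what I mean (a made-up table, purely to show the shape of the
test -- measure the index sizes after the inserts, not just after CREATE
INDEX):

  CREATE TABLE growth_test (v varchar);
  CREATE INDEX growth_test_btree ON growth_test USING btree (v);
  CREATE INDEX growth_test_hash  ON growth_test USING hash (v);

  -- Retail inserts go through the index one tuple at a time, unlike the
  -- sort-based build that CREATE INDEX does:
  INSERT INTO growth_test
  SELECT md5(i::text) FROM generate_series(1, 1000000) AS i;

  SELECT indexrelname,
         pg_size_pretty(pg_relation_size(indexrelid)) AS size
  FROM pg_stat_user_indexes
  WHERE relname = 'growth_test';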

B-Tree indexes are good at accommodating unpredictable growth, without
ever getting terrible performance on any metric of interest. So it's
not just that they tend to have better performance on average than
hash indexes (though they do); it's that they have much more
*predictable* performance characteristics as conditions change.

--
Peter Geoghegan




Re: Lock contention high

2021-10-13 Thread Peter Geoghegan
On Wed, Oct 13, 2021 at 6:54 PM Jeremy Schneider
 wrote:
> only a half GB memory for autovac? (it will have a mandatory run as soon
> as you hit 200 mil XIDs, seems like you'd want the full max 1GB for it)

While anti-wraparound vacuums will become a problem for TPC-C (unless
you tune for it), it's not too sensitive to maintenance_work_mem. You
just don't end up accumulating too many TIDs to delete from indexes in
practice, even
though the overhead from VACUUM is a concern. The new autovacuum
instrumentation in Postgres 14 makes this far clearer.

-- 
Peter Geoghegan




Re: Lock contention high

2021-10-13 Thread Peter Geoghegan
On Tue, Oct 12, 2021 at 12:45 AM Ashkil Dighin  wrote:
> Lock contention observed high in PostgreSQLv13.3
> The source code compiled with GNC(GCCv11.x)
> PostgreSQL version: 13.3
> Operating system:   RHEL8.3
> Kernel name:4.18.0-305.10.2.el8_4.x86_64
> RAM Size:512GB
> SSD: 1TB
> The environment used IBM metal and test benchmark environment HammerDbv4.2
> Test case :TPC-C

You didn't say how many TPC-C warehouses you used. In my experience,
people sometimes run TPC-C with relatively few, which will tend to
result in extreme contention on certain B-Tree leaf pages. (My
experiences are with BenchmarkSQL, but I can't imagine HammerDB is too
much different.)

Assuming that's the case here, it's not clear that you have a real
problem. You're really not supposed to run the benchmark
in that way, per the TPC-C spec, which strictly limits the number of
transactions per minute per warehouse -- for better or worse, valid
results generally require that you use lots of warehouses to get a
very large database (think terabytes). If you run the benchmark with
100 warehouses or less, on a big server, then the contention you'll
see will be out of all proportion to what you're ever likely to see in
the real world.

-- 
Peter Geoghegan




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

2021-10-05 Thread Peter Geoghegan
On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt  wrote:
> Now looking closely at postgreSQL, I see an opportunity to more quickly 
> implement Oracle's current feature list.
>
> I've come to this point because I see many roadblocks for users who want to 
> see a detailed "receipt" for their response time.

I have heard of method R. Offhand it seems roughly comparable to
something like the Top-down Microarchitecture Analysis Method that low
level systems programmers sometimes use, along with Intel's pmu-tools
-- at least at a very high level. The point seems to be to provide a
workflow that can plausibly zero in on low-level bottlenecks, by
providing high level context. Many tricky real world problems are in
some sense a high level problem that is disguised as a low level
problem. And so all of the pieces need to be present on the board, so
to speak.

Does that sound accurate?

One obvious issue with much of the Postgres instrumentation is that it
makes it hard to see how things change over time. I think that that is
often *way* more informative than static snapshots.
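
A crude way to get at that today is to snapshot the cumulative
statistics views yourself and look at the deltas (a sketch only; the
snapshot table is made up):

  -- Take a baseline snapshot of a cumulative statistics view:
  CREATE TABLE stats_snapshot AS
  SELECT now() AS captured_at, * FROM pg_stat_database;

  -- Repeat periodically, then compare rows across captured_at values:
  INSERT INTO stats_snapshot
  SELECT now(), * FROM pg_stat_database;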

I can see why you'd emphasize the need for PostgreSQL to more or less
own the end to end experience for something like this. It doesn't
necessarily follow that the underlying implementation cannot make use
of infrastructure like eBPF, though. Fast user space probes provably
have no overhead when disabled, and can be compiled in by distros that
can support it. There hasn't been a consistent effort to make that stuff
available, but I doubt that that tells us much about what is possible.
The probes that we have today are somewhat of a grab bag that isn't
particularly useful -- so it's a chicken-and-egg thing.

It would probably be helpful if you could describe what you feel is
missing in more general terms -- while perhaps giving practical
examples of specific scenarios that give us some sense of what the
strengths of the model are. ISTM that it's not so much a lack
of automation in PostgreSQL. It's more like a lack of a generalized
model, which includes automation, but also some high level top-down
theory.

-- 
Peter Geoghegan




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

2021-10-04 Thread Peter Geoghegan
On Mon, Oct 4, 2021 at 9:04 PM Mladen Gogala  wrote:
> What angered me was the presumptuous tone of voice directed to an Oracle
> legend. I have probably talked to many more Oracle people than you,
> including Tanel, whom I have met personally. I am not on Twitter,
> unfortunately I am older than 20. Before you ask, I am not on Instagram,
> Facebook or Tiktok. I am not on OnlyFans either. I have never understood
> the need to share one's every thought in real time. Being rather private
> person has probably stymied my career of an internet influencer. I'll
> never rival Kim Kardashian.

You do seem shy.

> As for Jeff Holt, I believe that a person of his stature needs to be
> taken seriously and not lectured "how are things done in Postgres
> community".

I haven't met Jeff Holt either, but I believe that he is also older
than 20. I have to imagine that he doesn't particularly need you to
defend his honor.

> I  am rather confused by the thinly veiled hostility toward
> Oracle. In my opinion, Postgres community should be rather welcoming to
> Oracle people like Frits Hoogland, Frank Pachot or Jeff Holt. After all,
> we are using Postgres and telling us "you can't have what you used to
> get from Oracle" is not either encouraging or smart.

I agree with all that. I am also friendly with Frank, as it happens.

I think that Laurenze was just trying to establish common terms of
reference for discussion -- it's easy for two groups of people with
similar but different terminology to talk past each other. For
example, I think that there may be confusion about what is possible
with a tool like eBPF today, and what might be possible in an ideal
world.

-- 
Peter Geoghegan




Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

2021-10-04 Thread Peter Geoghegan
On Mon, Oct 4, 2021 at 6:51 PM Mladen Gogala  wrote:
> Haughty lectures about "Oracle has it" not being good enough could
> hardly be more out of place here. To put it as politely as is possible
> in this case, shut your pie hole. What Jeff is asking for is not
> something that "Oracle has", it's something that customers want. That
> was the case few years ago when I was asking for the optimizer hints. I
> was castigated by the former pastry baker turned Postgres guru and my
> reaction was simple: I threw Postgres out of the company that I was a
> working for as the lead DBA. You see, customer is always right, whether
> the database is open source or not. Needless to say, Postgres has
> optimizer hints these days. It still has them in "we do not want" part
> of the Wiki, which is hilarious.

In all sincerity: Chill out. I don't think that this is worth getting
into an argument over. I think that there is a good chance that you'd
have had a much better experience if the conversation had been in
person. Text has a way of losing a lot of important nuance.

I have personally met and enjoyed talking to quite a few people who
worked on Oracle, in various capacities -- the world of
database internals experts is not huge. I find Tanel Poder very
interesting, too -- never met the man, but we follow each other on
Twitter. Oracle is a system that has some interesting properties in
general (not just as a counterpoint to PostgreSQL), and I definitely
respect it. Same with SQL Server.

There are lots of smart and accomplished people in the world,
including Jeff. I think that it's worth understanding these
differences in perspective. There is likely to be merit in all of the
specific points made by both Laurenze and Jeff. They may not be
irreconcilable, or anything like it.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-22 Thread Peter Geoghegan
On Thu, Jul 22, 2021 at 10:35 AM Tom Lane  wrote:
> Well, what we really ought to be using is size_t (a/k/a Size), at least
> for memory-space-related calculations.  I don't have an opinion right
> now about what logtape.c ought to use.  I do agree that avoiding "long"
> altogether would be a good ultimate goal.

I assume that we often use "long" in contexts where a signed integer
type is required. Maybe this is not true in the case of the work_mem
style calculations. But I know that it works that way in logtape.c,
where -1 is a sentinel value.

We already use int64 (not size_t) in tuplesort.c for roughly the same
reason: LACKMEM() needs to work with negative values, to handle
certain edge cases.

> In the short term though, the question is whether we want to regard this
> hashagg issue as something we need a fix for in v13/v14.  The fact that
> it's Windows-only makes it slightly less pressing in my mind, but it's
> still a regression that some people are going to hit.

True. I worry about the potential for introducing new bugs on Windows
by backpatching a fix for this. Technically this restriction existed
in every other work_mem consumer on Windows. Of course this won't
matter much to users like Laurent.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-22 Thread Peter Geoghegan
On Thu, Jul 22, 2021 at 10:33 AM l...@laurent-hasson.com
 wrote:
> Damn... I know Windows is a lower priority, and this is yet another issue, 
> but in Healthcare, Windows is so prevalent everywhere...

To be clear, I didn't actually say that. I said that it doesn't matter
either way, as far as addressing this long standing "int64 vs long"
issue goes.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-22 Thread Peter Geoghegan
On Thu, Jul 22, 2021 at 10:11 AM Tom Lane  wrote:
> No, he already tried, upthread.  The trouble is that he's on a Windows
> machine, so get_hash_mem is quasi-artificially constraining the product
> to 2GB.  And he needs it to be a bit more than that.  Whether the
> constraint is hitting at the ngroups stage or it's related to actual
> memory consumption isn't that relevant.

Somehow I missed that part.

> What I'm wondering about is whether it's worth putting in a solution
> for this issue in isolation, or whether we ought to embark on the
> long-ignored project of getting rid of use of "long" for any
> memory-size-related computations.  There would be no chance of
> back-patching something like the latter into v13, though.

+1. Even if we assume that Windows is a low priority platform, in the
long run it'll be easier to make it more like every other platform.

The use of "long" is inherently suspect to me. It signals that the
programmer wants something wider than "int", even if the standard
doesn't actually require that "long" be wider. This seems to
contradict what we know to be true for Postgres, which is that in
general it's unsafe to assume that long is int64. It's not just
work_mem related calculations. There is also code like logtape.c,
which uses long for block numbers -- that also exposes us to risk on
Windows.

By requiring int64 be used instead of long, we don't actually increase
risk for non-Windows platforms to any significant degree. I'm pretty
sure that "long" means int64 on non-Windows 64-bit platforms anyway.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-22 Thread Peter Geoghegan
On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan  wrote:
> I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
> correct. It certainly seems like a good starting point.

I also suspect that if Laurent set work_mem and/or hash_mem_multiplier
*extremely* aggressively, then eventually the hash agg would be
in-memory. And without actually using all that much memory.

I'm not suggesting that that is a sensible resolution to Laurent's
complaint. I'm just pointing out that it's probably not fundamentally
impossible to make the hash agg avoid spilling through tuning these
GUCs. At least I see no evidence of that right now.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-22 Thread Peter Geoghegan
On Thu, Jul 22, 2021 at 9:42 AM Tom Lane  wrote:
> Right.  The point here is that before v13, hash aggregation was not
> subject to the work_mem limit, nor any related limit.  If you did an
> aggregation requiring more than 2GB-plus-slop, it would work just fine
> as long as your machine had enough RAM.  Now, the performance sucks and
> there is no knob you can turn to fix it.  That's unacceptable in my book.

Oh! That makes way more sense.

I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
correct. It certainly seems like a good starting point.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-22 Thread Peter Geoghegan
On Thu, Jul 22, 2021 at 9:21 AM Tom Lane  wrote:
> Yeah, I should have said "2GB plus palloc slop".  It doesn't surprise
> me a bit that we seem to be eating another 20% on top of the nominal
> limit.

MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been
for many years. The function get_hash_mem() returns a work_mem-style
int that callers refer to as hash_mem -- the convention is that
callers pretend that there is a work_mem style GUC (called hash_mem)
that they must access by calling get_hash_mem().

I don't see how it's possible for get_hash_mem() to be unable to
return a hash_mem value that could be represented by work_mem
directly. MAX_KILOBYTES is an annoyingly low limit on Windows, where
sizeof(long) is 4. But that's nothing new.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-22 Thread Peter Geoghegan
On Thu, Jul 22, 2021 at 8:45 AM Tom Lane  wrote:
> That is ... weird.  Maybe you have found a bug in the spill-to-disk logic;
> it's quite new after all.  Can you extract a self-contained test case that
> behaves this way?

I wonder if this has something to do with the way that the input data
is clustered. I recall noticing that that could significantly alter
the behavior of HashAggs as of Postgres 13.

-- 
Peter Geoghegan




Re: Big performance slowdown from 11.2 to 13.3

2021-07-21 Thread Peter Geoghegan
On Wed, Jul 21, 2021 at 4:19 PM l...@laurent-hasson.com
 wrote:
> As you can see, the V13.3 execution is about 10x slower.
>
> It may be hard for me to create a whole copy of the database on 11.12 and 
> check that environment by itself. I'd want to do it on the same machine to 
> control variables, and I don't have much extra disk space at the moment.

I imagine that this has something to do with the fact that the hash
aggregate spills to disk in Postgres 13.

You might try increasing hash_mem_multiplier from its default of 1.0,
to 2.0 or even 4.0. That way you'd be able to use 2x or 4x more memory
for executor nodes that are based on hashing (hash join and hash
aggregate), without also affecting other kinds of nodes, which are
typically much less sensitive to memory availability. This is very
similar to increasing work_mem, except that it is better targeted.

It might even make sense to *decrease* work_mem and increase
hash_mem_multiplier even further than 4.0. That approach is more
aggressive, though, so I wouldn't use it until it actually proved
necessary.
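
Trying it out is low risk -- it can be done per session before touching
postgresql.conf at all:

  -- Allow hash-based executor nodes to use 2x work_mem (Postgres 13+):
  SET hash_mem_multiplier = 2.0;
  -- Then re-run the query under EXPLAIN (ANALYZE, BUFFERS) and check
  -- whether the HashAggregate node still reports spilling ("Disk Usage").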

-- 
Peter Geoghegan




Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Peter Geoghegan
On Fri, May 7, 2021 at 2:28 PM Peter Geoghegan  wrote:
> That's a very reasonable interpretation, since the bitmap index scans
> themselves just aren't doing that much I/O -- we see that there is
> much more I/O for the heap scan, which is likely to be what the
> general picture looks like no matter how much bloat there is.
>
> However, I'm not sure if that reasonable interpretation is actually
> correct. The nbtinsert.c code that handles deleting LP_DEAD index
> tuples no longer relies on having a page-level garbage item flag set
> in Postgres 13 -- it just scans the line pointer array for LP_DEAD
> items each time.

BTW, I am pointing all of this out because I've heard informal reports
of big improvements following an upgrade to Postgres 13 that seem
unlikely to be related to the simple fact that indexes are smaller
(most of the time you cannot save that much I/O by shrinking indexes
without affecting when and how TIDs/heap tuples are scanned).

It's necessary to simulate the production workload to have *any* idea
if LP_DEAD index tuple deletion might be a factor. If the OP is just
testing this one query on Postgres 13 in isolation, without anything
bloating up (or cleaning up) indexes, then that doesn't really tell us
anything about how Postgres 13 compares to Postgres 12. As you said,
simply shrinking the indexes is nice, but not enough -- we'd need some
kind of second-order effect to get acceptable performance over time
and under real world conditions.

-- 
Peter Geoghegan




Re: Very slow Query compared to Oracle / SQL - Server

2021-05-07 Thread Peter Geoghegan
On Fri, May 7, 2021 at 9:16 AM Justin Pryzby  wrote:
> In pg13, indexes are de-duplicated by default.
>
> But I suspect the performance is better because data was reload, and the
> smaller indexes are a small, additional benefit.

That's a very reasonable interpretation, since the bitmap index scans
themselves just aren't doing that much I/O -- we see that there is
much more I/O for the heap scan, which is likely to be what the
general picture looks like no matter how much bloat there is.

However, I'm not sure if that reasonable interpretation is actually
correct. The nbtinsert.c code that handles deleting LP_DEAD index
tuples no longer relies on having a page-level garbage item flag set
in Postgres 13 -- it just scans the line pointer array for LP_DEAD
items each time. VACUUM has a rather unhelpful tendency to unset the
flag when it shouldn't, which we're no longer affected by. So that's
one possible explanation.

Another possible explanation is that smaller indexes (due to
deduplication) are more likely to get index scans, which leads to
setting the LP_DEAD bit of known-dead index tuples in passing more
often (bitmap index scans won't do the kill_prior_tuple optimization).
There could even be a virtuous circle over time. (Note that the index
deletion stuff in Postgres 14 pretty much makes sure that this
happens, but it is probably at least much more likely in Postgres 13
compared to 12.)

I could easily be very wrong about all of this in this instance,
though, because the behavior I've described is highly non-linear and
therefore highly unpredictable in general (not to mention highly
sensitive to workload characteristics). I'm sure that I've thought
about this stuff way more than any other individual Postgres
contributor, but it's easy to be wrong in any given instance. The real
explanation might be something else entirely. Though it's hard not to
imagine that what really matters here is avoiding all of that bitmap
heap scan I/O.

-- 
Peter Geoghegan




Re: Consecutive Query Executions with Increasing Execution Time

2019-12-16 Thread Peter Geoghegan
On Mon, Dec 16, 2019 at 9:28 AM Shijia Wei  wrote:
> 1st Query:

>  Buffers: shared hit=17074 read=16388

> 20th Query:

>  Buffers: shared hit=17037 read=16390

Why do the first and the twentieth executions of the query have almost
identical "buffers shared/read" numbers? That seems odd.

-- 
Peter Geoghegan




Re: Huge shared hit for small table

2019-11-04 Thread Peter Geoghegan
On Mon, Nov 4, 2019 at 12:38 PM Scott Rankin  wrote:
> Definitely no long-running transactions on this table; in fact, this table is 
> pretty infrequently updated – on the order of a few tens of rows updated per 
> day.

But a long running transaction will have an impact on all tables --
not just the tables that happen to have been accessed so far in the
long running transaction. This is necessary because nothing stops the
long running transaction from SELECTing data from any table at any
time -- we need to pessimistically keep around the data required to
make that work.

-- 
Peter Geoghegan




Re: Huge shared hit for small table

2019-11-04 Thread Peter Geoghegan
On Mon, Nov 4, 2019 at 12:32 PM Jeff Janes  wrote:
> Could there be a long-open transaction, which is preventing hint-bits from 
> getting on set on the table rows, as well on the index rows?

Contention on a small number of rows may also be a factor.

> A reindex would not by itself fix the problem if it were the long open 
> transaction.  But  if the long open transaction held a sufficient lock on the 
> table, then the reindex would block until the transaction went away on its 
> own, at which point the problem would go away on its own, so it might 
> **appear** to have fixed the problem.

That seems like the simplest and most likely explanation to me, even
though it isn't particularly simple.

-- 
Peter Geoghegan




Re: Huge shared hit for small table

2019-11-04 Thread Peter Geoghegan
On Mon, Nov 4, 2019 at 11:56 AM Justin Pryzby  wrote:
> I think it's because some heap pages are being visited many times, due to the
> index tuples being badly "fragmented".  Note, I'm not talking about
> fragmentation of index *pages*, which is what pgstattuple reports (which
> wouldn't have nearly so detrimental effect).  I could probably say that the
> index tuples are badly "correlated" with the heap.

But this is a unique index, and Scott indicates that the problem seems
to go away for a while following a REINDEX.

> In PG v12 you can use REINDEX CONCURRENTLY (but beware there's a crash
> affecting its progress reporting, fix to be included in v12.1).

PG v12 will store B-Tree duplicates in heap TID order, so if that's
the problem then upgrading to v12 (and REINDEXing if the upgrade was
performed using pg_upgrade) will fix it for good.
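
On v12 the rebuild doesn't have to block writes, either (the index name
is a placeholder):

  REINDEX INDEX CONCURRENTLY the_problem_unique_idx;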

-- 
Peter Geoghegan




Re: UUID v1 optimizations...

2019-07-07 Thread Peter Geoghegan
Please don't top post -- trim your response down so that only the
still-relevant text remains.

On Tue, Jun 11, 2019 at 1:27 PM Ancoron Luciferis
 wrote:
> Primary key indexes after an ANALYZE:
>  table_name        | bloat          | index_mb | table_mb
> -------------------+----------------+----------+----------
>  uuid_v1           | 767 MiB (49 %) | 1571.039 | 1689.195
>  uuid_v1_timestamp | 768 MiB (49 %) | 1571.039 | 1689.195
>  uuid_seq          | 759 MiB (49 %) | 1562.766 | 1689.195
>  uuid_serial       | 700 MiB (47 %) | 1504.047 | 1689.195
>
> OK, sadly no reclaim in any of them.

I don't know how you got these figures, but most likely they don't
take into account the fact that the FSM for the index has free blocks
available. You'll only notice that if you have additional page splits
that can recycle that space. Or, you could use pg_freespacemap to get
some idea.
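
For example, something like this shows how much space the FSM already
considers reusable (the index name is a guess -- use your real primary
key index):

  CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

  SELECT count(*) AS pages_with_free_space
  FROM pg_freespace('uuid_v1_pkey')
  WHERE avail > 0;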

> 5.) REINDEX
> Table: uuid_v1  Time: 21549.860 ms (00:21.550)
> Table: uuid_v1_timestampTime: 27367.817 ms (00:27.368)
> Table: uuid_seq Time: 19142.711 ms (00:19.143)
> Table: uuid_serial  Time: 16889.807 ms (00:16.890)
>
> Even in this case it looks as if my implementation is faster than
> anything else - which I really don't get.

Sorting already-sorted data is faster. CREATE INDEX is mostly a big
sort operation in the case of B-Tree indexes.

> I might implement a different opclass for the standard UUID to enable
> time-wise index sort order. This will naturally be very close to
> physical order but I doubt that this is something I can tell PostgreSQL, or?

PostgreSQL only knows whether or not your page splits occur in the
rightmost page in the index -- it fills the page differently according
to whether or not that is the case.

-- 
Peter Geoghegan




Re: Perplexing, regular decline in performance

2019-06-26 Thread Peter Geoghegan
On Wed, Jun 26, 2019 at 12:02 PM Hugh Ranalli  wrote:
> I'm sorry, but I'm not sure what you mean by the "distribution of values 
> within the columns." Can you clarify or provide a link to an example?

I would mostly just like to see the schema of the table in question,
including indexes, and a high-level description of the nature of the
data in the table. Ideally, you would also include pg_stats.*
information for all columns in the table. That will actually let us
see a summary of the data. Though you should be careful about leaking
sensitive information that happens to be contained in the statistics,
such as the most common values.
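
Concretely, this is the kind of output I'm after (substitute the real
schema and table name):

  SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE schemaname = 'public' AND tablename = 'my_table';

Again, be careful: most_common_vals can contain real data values.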

-- 
Peter Geoghegan




Re: Perplexing, regular decline in performance

2019-06-26 Thread Peter Geoghegan
On Tue, Jun 25, 2019 at 8:49 AM Hugh Ranalli  wrote:
> What we continued to notice was a milder but still definite trend of 
> increased query times, during the course of each week, from the mid to high 
> 200 ms, to the high 300 ms to low 400 ms. Some years ago, someone had noticed 
> that as the number of "raw_page" columns in a particular table grew, 
> performance would decline. They wrote a script that once a week locks the 
> table, deletes the processed large columns (they are not needed after 
> processing), copies the remaining data to a backup table, truncates the 
> original table, then copies it back. When this script runs we see an 
> immediate change in performance, from 380 ms in the hour before the drop, to 
> 250 ms in the hour of the drop. As rows with these populated columns are 
> added during the course of a week, the performance drops, steadily, until the 
> next week's cleaning operation. Each week the performance increase is clear 
> and significant.

Can you show us the definition of the table, including its indexes?
Can you describe the data and distribution of values within the
columns, particularly where they're indexed?

-- 
Peter Geoghegan




Re: Incorrect index used in few cases..

2019-06-18 Thread Peter Geoghegan
On Tue, Jun 18, 2019 at 2:08 PM Tom Lane  wrote:
> Are you perhaps running with non-default values for any planner cost
> parameters?  Or it's not a stock build of Postgres?
>
> If you could find a way to adjust the attached example so that it
> produces the same misbehavior you see with live data, that would be
> very interesting ...

FWIW, if you move the CREATE INDEX statements before the INSERT, and
compare earlier versions of Postgres to 12, you'll see that some of
the indexes are a lot smaller on 12.

v11 (representative of 9.6):

pg@tc:5411 [1067]=# \di+ i_*
                                List of relations
 Schema │          Name           │ Type  │ Owner │ Table │ Size  │ Description
────────┼─────────────────────────┼───────┼───────┼───────┼───────┼─────────────
 public │ i_agi_tc_tcn            │ index │ pg    │ tc    │ 74 MB │
 public │ i_cid_agid_tcn          │ index │ pg    │ tc    │ 82 MB │
 public │ i_tc_adid_tid           │ index │ pg    │ tc    │ 57 MB │
 public │ i_tc_advertiser_id      │ index │ pg    │ tc    │ 27 MB │
 public │ i_tc_campaign_id        │ index │ pg    │ tc    │ 28 MB │
 public │ i_tc_lud_agi            │ index │ pg    │ tc    │ 57 MB │
 public │ i_tc_uniqueness_hash_v2 │ index │ pg    │ tc    │ 21 MB │
(7 rows)

v12/master:

pg@regression:5432 [1022]=# \di+ i_*
                                List of relations
 Schema │          Name           │ Type  │ Owner │ Table │ Size  │ Description
────────┼─────────────────────────┼───────┼───────┼───────┼───────┼─────────────
 public │ i_agi_tc_tcn            │ index │ pg    │ tc    │ 69 MB │
 public │ i_cid_agid_tcn          │ index │ pg    │ tc    │ 78 MB │
 public │ i_tc_adid_tid           │ index │ pg    │ tc    │ 36 MB │
 public │ i_tc_advertiser_id      │ index │ pg    │ tc    │ 20 MB │
 public │ i_tc_campaign_id        │ index │ pg    │ tc    │ 24 MB │
 public │ i_tc_lud_agi            │ index │ pg    │ tc    │ 30 MB │
 public │ i_tc_uniqueness_hash_v2 │ index │ pg    │ tc    │ 21 MB │
(7 rows)

Note that i_tc_lud_agi is 30 MB, not 57MB, and that i_tc_adid_tid is
36 MB, not 57 MB.

I can see that both i_tc_lud_agi and i_tc_adid_tid consistently use
the "split after new tuple" optimization on v12.

-- 
Peter Geoghegan


Re: Temporarily very slow planning time after a big delete

2019-05-21 Thread Peter Geoghegan
On Tue, May 21, 2019 at 11:27 AM Walter Smith  wrote:
> Very low. Probably less than ten over all time. I suspect the only use of the 
> index is to rapidly find the processed=false rows, so the notifiable_type 
> value isn’t important, really. It would probably work just as well on any 
> other column.

This problem has been fixed in Postgres 12, which treats heap TID as a
tiebreaker column within B-Tree indexes. It sounds like you have the
right idea about how to work around the problem.

VACUUM will need to kill tuples in random locations in the low
cardinality index, since the order of tuples is unspecified between
duplicate tuples -- it is more or less random. VACUUM will tend to
dirty far more pages than is truly necessary in this scenario, because
there is no natural temporal locality that concentrates dead tuples in
one or two particular places in the index. This has a far more
noticeable impact on VACUUM duration than you might expect, since
autovacuum is throttled by delays that vary according to how many
pages were dirtied (and other such factors).
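
One common shape for that kind of workaround is a partial index that
only covers the unprocessed rows (the table and index names here are
hypothetical), so that processed rows never get index entries and
VACUUM's work in that index stays small:

  CREATE INDEX notifications_unprocessed_idx
  ON notifications (notifiable_type)
  WHERE processed = false;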

-- 
Peter Geoghegan




Re: Temporarily very slow planning time after a big delete

2019-05-21 Thread Peter Geoghegan
On Tue, May 21, 2019 at 11:16 AM Walter Smith  wrote:
> It occurs to me that is a somewhat unusual index -- it tracks unprocessed 
> notifications so it gets an insert and delete for every row, and is normally 
> almost empty.

Is it a very low cardinality index? In other words, is the total
number of distinct keys rather low? Not just at any given time, but
over time?

-- 
Peter Geoghegan




Re: GCC 8.3.0 vs. 9.0.1

2019-05-07 Thread Peter Geoghegan
On Tue, May 7, 2019 at 10:06 AM Tom Lane  wrote:
> Given the described test setup, I'd put basically no stock in these
> numbers.  It's unlikely that this test case's performance is CPU-bound
> per se; more likely, I/O and lock contention are dominant factors.
> So I'm afraid whatever they're measuring is a more-or-less chance
> effect rather than a real system-wide code improvement.

Or a compiler bug. Link-time optimizations give the compiler a view of
the program as a whole, not just a single TU at a time. This enables
it to perform additional aggressive optimization.

-- 
Peter Geoghegan




Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

2019-02-23 Thread Peter Geoghegan
On Sat, Feb 23, 2019 at 1:06 PM Gunther  wrote:
> I thought to keep my index tight, I would define it like this:
>
> CREATE UNIQUE INDEX Queue_idx_pending ON Queue(jobId) WHERE pending;
>
> so that only pending jobs are in that index.
>
> When a job is done, follow up work is often inserted into the Queue as 
> pending, thus adding to that index.

How many distinct jobIds are there in play, roughly? Would you say
that there are many fewer distinct Jobs than distinct entries in the
index/table? Is the number of jobs fixed at a fairly low number, that
doesn't really grow as the workload needs to scale up?

-- 
Peter Geoghegan



Re: How can sort performance be so different

2019-02-20 Thread Peter Geoghegan
On Wed, Feb 20, 2019 at 2:25 PM Peter Geoghegan  wrote:
> I suspect that the reasons why the Lao locale sorts so much slower may
> also have something to do with the intrinsic cost of supporting more
> complicated rules.

I strongly suspect that it has something to do with the issue
described here specifically:

http://userguide.icu-project.org/collation/concepts#TOC-Thai-Lao-reordering
-- 
Peter Geoghegan



Re: How can sort performance be so different

2019-02-20 Thread Peter Geoghegan
On Wed, Feb 20, 2019 at 1:42 PM Bob Jolliffe  wrote:
> It seems not to be (completely) particular to the installation.
> Testing on different platforms we found variable speed difference
> between 100x and 1000x slower, but always a considerable order of
> magnitiude.  The very slow performance comes from sorting Lao
> characters using en_US.UTF-8 collation.

I knew that some collations were slower, generally for reasons that
make some sense. For example, I was aware that ICU's use of Japanese
standard JIS X 4061 is particularly complicated and expensive, but
produces the most useful possible result from the point of view of a
Japanese speaker. Apparently glibc does not use that algorithm, and so
offers less useful sort order (though it may actually be faster in
that particular case).

I suspect that the reasons why the Lao locale sorts so much slower may
also have something to do with the intrinsic cost of supporting more
complicated rules. However, it's such a ridiculously large difference
that it also seems likely that somebody was disinclined to go to the
effort of optimizing it. The ICU people found that to be a tractable
goal, but they may have had to work at it. I also have a vague notion
that there are special cases that are more or less only useful for
sorting French. These complicate the implementation of UCA style
algorithms.

I am only speculating, based on what I've heard about other cases --
perhaps this explanation is totally wrong. I know a lot more about
this stuff than most people on this mailing list, but I'm still far
from being an expert.

-- 
Peter Geoghegan



Re: blending fast and temp space volumes

2018-02-21 Thread Peter Geoghegan
On Wed, Feb 21, 2018 at 12:07 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Wed, Feb 21, 2018 at 4:50 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>> On Wed, Feb 21, 2018 at 7:53 AM, Rick Otten <rottenwindf...@gmail.com> wrote:
>>> side note:  The disadvantage of local SSD is that it won't survive "hitting
>>> the virtual power button" on an instance, nor can it migrate automatically
>>> to other hardware.  (We have to hit the power button to add memory/cpu to
>>> the system, and sometimes the power button might get hit by accident.)  This
>>> is OK for temp space.  I never have my database come up automatically on
>>> boot, and I have scripted the entire setup of the temp space volume and data
>>> structures.  I can run that script before starting the database.   I've done
>>> some tests and it seems to work great.  I don't mind rolling back any
>>> transaction that might be in play during a power failure.
>>
>> It sounds like you're treating a temp_tablespaces tablespace as
>> ephemeral, which IIRC can have problems that an ephemeral
>> stats_temp_directory does not have.
>
> For instance?
>
> I've been doing that for years without issue. If you're careful to
> restore the skeleton directory structure at server boot up, I haven't
> had any issues.

Then you clearly know what I mean already. That's not documented as
either required or safe anywhere.

-- 
Peter Geoghegan



Re: blending fast and temp space volumes

2018-02-21 Thread Peter Geoghegan
On Wed, Feb 21, 2018 at 7:53 AM, Rick Otten <rottenwindf...@gmail.com> wrote:
> side note:  The disadvantage of local SSD is that it won't survive "hitting
> the virtual power button" on an instance, nor can it migrate automatically
> to other hardware.  (We have to hit the power button to add memory/cpu to
> the system, and sometimes the power button might get hit by accident.)  This
> is OK for temp space.  I never have my database come up automatically on
> boot, and I have scripted the entire setup of the temp space volume and data
> structures.  I can run that script before starting the database.   I've done
> some tests and it seems to work great.  I don't mind rolling back any
> transaction that might be in play during a power failure.

It sounds like you're treating a temp_tablespaces tablespace as
ephemeral, which IIRC can have problems that an ephemeral
stats_temp_directory does not have.

-- 
Peter Geoghegan