Re: Index bloat and REINDEX/VACUUM optimization for partial index
On Mon, Aug 28, 2023 at 5:33 PM jayaprabhakar k wrote:
> REINDEX requires a full table scan
>
> Roughly create a new index, rename index, drop old index.
> REINDEX is not incremental. running reindex frequently does not reduce the
> future reindex time.

You didn't say which Postgres version you're on. Note that Postgres 14
can deal with index bloat a lot better than earlier versions could.
This is known to work well with partial indexes. See:

https://www.postgresql.org/message-id/flat/CAL9smLAjt9mZC2%3DqBeJwuNPq7KMAYGTWWQw_hvA-Lfo0b3ycow%40mail.gmail.com

--
Peter Geoghegan
Re: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT
On Mon, Jun 12, 2023 at 1:17 PM benoit wrote:
> Is there a misusage of my indexes?
>
> Is there a limitation when using ANY or IN operators and ordered LIMIT behind?

It's complicated. Do you find that you get satisfactory performance if
you force a bitmap index scan? In other words, what is the effect of
"set enable_indexscan = off" on your original query? Does that speed up
execution at all?

(I think that this approach ought to produce a plan that uses a bitmap
index scan in place of the index scan, without changing anything else.)

--
Peter Geoghegan
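A minimal way to run that experiment looks something like the following sketch -- the table, columns, and predicates here are hypothetical stand-ins for the original poster's query, not anything from this thread:

```sql
-- Disable plain index scans for this transaction only, so the planner
-- falls back to a bitmap index scan; compare EXPLAIN output and timing
-- against the unmodified query.
BEGIN;
SET LOCAL enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   events
WHERE  kind = ANY ('{click,view}')
AND    user_id = ANY ('{1,2,3}')
ORDER  BY created_at
LIMIT  20;
ROLLBACK;
```

Using SET LOCAL inside a transaction keeps the planner change from leaking into the rest of the session.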
Re: High QPS, random index writes and vacuum
On Mon, Apr 17, 2023 at 7:43 PM peter plachta wrote:
> Version: I sheepishly admit it's 9.6, 10 and 11 (it's Azure Single Server,
> that's another story).

If you can upgrade to 14, you'll find that there is much improved
management of index updates on that version:

https://www.postgresql.org/docs/current/btree-implementation.html#BTREE-DELETION

But it's not clear what the problem really is here. If the problem is
that you're dependent on vacuum to get acceptable response times by
holding back index bloat, then an upgrade could easily help a lot. But
an upgrade might not make VACUUM take less time, given that you've
already tuned it fairly aggressively. It depends.

An upgrade might make VACUUM go faster if you set vacuum_cost_page_miss
to 2, which is the default on later versions anyway -- looks like you
didn't touch that. And, as Thomas said, later versions do have parallel
VACUUM, though that cannot be used by autovacuum workers.

--
Peter Geoghegan
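For reference, the cost parameter mentioned above can be inspected and adjusted like this (a sketch; on Postgres 14 and later the lower default already applies, so this only matters on older versions):

```sql
SHOW vacuum_cost_page_miss;                  -- 10 on older versions, 2 on 14+
ALTER SYSTEM SET vacuum_cost_page_miss = 2;  -- persist the lower cost
SELECT pg_reload_conf();                     -- no restart needed for this GUC
```

Lowering the page-miss cost makes cost-based vacuum delay throttling kick in less often on reads, which tends to shorten VACUUM runs on I/O-heavy tables.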
Re: High QPS, random index writes and vacuum
On Mon, Apr 17, 2023 at 5:35 PM peter plachta wrote:
> My question is: what is the recommended strategy to deal with such cases in
> Postgres?

You didn't say what version of Postgres you're using...

--
Peter Geoghegan
Re: creating hash indexes
On Wed, Dec 14, 2022 at 12:03 PM Rick Otten wrote:
> Assuming I can live with the slower inserts, is there any parameter in
> particular I can tweak that would make the time it takes to create the hash
> index closer to the btree index creation time? In particular if I wanted to
> try this on a several billion row table in a busy database?

No. B-Tree index builds are parallelized, and are far better optimized
in general.

> - As long as the index fits in memory, varchar btree isn't really that much
> slower in postgresql 14 (the way it was a few years ago), so we'll probably
> just live with that for the forseeable future given the complexity of
> changing things at the moment.

The other things to consider are 1.) the index size after retail
inserts, 2.) the index size following some number of updates and
deletes. Even if you just had plain inserts for your production
workload, the picture will not match your test case (which I gather
just looked at the index size after a CREATE INDEX ran).

I think that B-Tree indexes will still come out ahead if you take this
growth into account, and by quite a bit, but probably not due to any
effect that your existing test case exercises. B-Tree indexes are good
at accommodating unpredictable growth, without ever getting terrible
performance on any metric of interest. So it's not just that they tend
to have better performance on average than hash indexes (though they
do); it's that they have much more *predictable* performance
characteristics as conditions change.

--
Peter Geoghegan
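A rough way to compare the two build times yourself would be something like this entirely hypothetical test table (on a busy production system you would want CREATE INDEX CONCURRENTLY instead, which builds more slowly but doesn't block writes):

```sql
-- Throwaway table with a million random-looking text keys
CREATE TABLE idx_test AS
SELECT md5(i::text) AS v
FROM   generate_series(1, 1000000) AS i;

\timing on
CREATE INDEX idx_test_btree ON idx_test USING btree (v);  -- can use parallel workers
CREATE INDEX idx_test_hash  ON idx_test USING hash  (v);  -- serial build only
```

Comparing `\timing` output for the two statements (and `\di+` for the resulting sizes) gives a first-order picture, though as noted above it says nothing about how each index degrades under retail inserts, updates, and deletes.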
Re: Lock contention high
On Wed, Oct 13, 2021 at 6:54 PM Jeremy Schneider wrote:
> only a half GB memory for autovac? (it will have a mandatory run as soon
> as you hit 200 mil XIDs, seems like you'd want the full max 1GB for it)

While anti-wraparound vacuums will become a problem for TPC-C (unless
you tune for it), it's not too sensitive to maintenance_work_mem. You
just don't end up accumulating too many TIDs to delete from indexes in
practice, even though the overhead from VACUUM is a concern. The new
autovacuum instrumentation in Postgres 14 makes this far clearer.

--
Peter Geoghegan
Re: Lock contention high
On Tue, Oct 12, 2021 at 12:45 AM Ashkil Dighin wrote:
> Lock contention observed high in PostgreSQLv13.3
> The source code compiled with GNC(GCCv11.x)
> PostgreSQL version: 13.3
> Operating system: RHEL8.3
> Kernel name: 4.18.0-305.10.2.el8_4.x86_64
> RAM Size: 512GB
> SSD: 1TB
> The environment used IBM metal and test benchmark environment HammerDbv4.2
> Test case: TPC-C

You didn't say how many TPC-C warehouses you used. In my experience,
people sometimes run TPC-C with relatively few, which will tend to
result in extreme contention on certain B-Tree leaf pages. (My
experiences are with BenchmarkSQL, but I can't imagine HammerDB is too
much different.)

Assuming that's the case here, for you, then it's not clear that you
have a real problem. You're really not supposed to run the benchmark in
that way, per the TPC-C spec, which strictly limits the number of
transactions per minute per warehouse -- for better or worse, valid
results generally require that you use lots of warehouses to get a very
large database (think terabytes). If you run the benchmark with 100
warehouses or less, on a big server, then the contention you'll see
will be out of all proportion to what you're ever likely to see in the
real world.

--
Peter Geoghegan
Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle
On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt wrote:
> Now looking closely at postgreSQL, I see an opportunity to more quickly
> implement Oracle's current feature list.
>
> I've come to this point because I see many roadblocks for users who want to
> see a detailed "receipt" for their response time.

I have heard of method R. Offhand it seems roughly comparable to
something like the Top-down Microarchitecture Analysis Method that low
level systems programmers sometimes use, along with Intel's pmu-tools
-- at least at a very high level. The point seems to be to provide a
workflow that can plausibly zero in on low-level bottlenecks, by
providing high level context. Many tricky real world problems are in
some sense a high level problem that is disguised as a low level
problem. And so all of the pieces need to be present on the board, so
to speak. Does that sound accurate?

One obvious issue with much of the Postgres instrumentation is that it
makes it hard to see how things change over time. I think that that is
often *way* more informative than static snapshots.

I can see why you'd emphasize the need for PostgreSQL to more or less
own the end to end experience for something like this. It doesn't
necessarily follow that the underlying implementation cannot make use
of infrastructure like eBPF, though. Fast user space probes have
practically no overhead when not enabled, and can be compiled in by
distros that support it. There hasn't been a consistent effort to make
that stuff available, but I doubt that that tells us much about what is
possible. The probes that we have today are somewhat of a grab-bag, and
aren't particularly useful -- so it's a chicken-and-egg thing.

It would probably be helpful if you could describe what you feel is
missing in more general terms -- while perhaps giving specific
practical examples of scenarios that give us some sense of what the
strengths of the model are. ISTM that it's not so much a lack of
automation in PostgreSQL. It's more like a lack of a generalized model,
which includes automation, but also some high level top-down theory.

--
Peter Geoghegan
Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle
On Mon, Oct 4, 2021 at 9:04 PM Mladen Gogala wrote:
> What angered me was the presumptuous tone of voice directed to an Oracle
> legend. I have probably talked to many more Oracle people than you,
> including Tanel, whom I have met personally. I am not on Twitter,
> unfortunately I am older than 20. Before you ask, I am not on Instagram,
> Facebook or Tiktok. I am not on OnlyFans either. I have never understood
> the need to share one's every thought in real time. Being rather private
> person has probably stymied my career of an internet influencer. I'll
> never rival Kim Kardashian.

You do seem shy.

> As for Jeff Holt, I believe that a person of his stature needs to be
> taken seriously and not lectured "how are things done in Postgres
> community".

I haven't met Jeff Holt either, but I believe that he is also older
than 20. I have to imagine that he doesn't particularly need you to
defend his honor.

> I am rather confused by the thinly veiled hostility toward
> Oracle. In my opinion, Postgres community should be rather welcoming to
> Oracle people like Frits Hoogland, Frank Pachot or Jeff Holt. After all,
> we are using Postgres and telling us "you can't have what you used to
> get from Oracle" is not either encouraging or smart.

I agree with all that. I am also friendly with Frank, as it happens. I
think that Laurenze was just trying to establish common terms of
reference for discussion -- it's easy for two groups of people with
similar but different terminology to talk past each other. For example,
I think that there may be confusion about what is possible with a tool
like eBPF today, and what might be possible in an ideal world.

--
Peter Geoghegan
Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle
On Mon, Oct 4, 2021 at 6:51 PM Mladen Gogala wrote:
> Haughty lectures about "Oracle has it" not being good enough could
> hardly be more out of place here. To put it as politely as is possible
> in this case, shut your pie hole. What Jeff is asking for is not
> something that "Oracle has", it's something that customers want. That
> was the case few years ago when I was asking for the optimizer hints. I
> was castigated by the former pastry baker turned Postgres guru and my
> reaction was simple: I threw Postgres out of the company that I was
> working for as the lead DBA. You see, customer is always right, whether
> the database is open source or not. Needless to say, Postgres has
> optimizer hints these days. It still has them in "we do not want" part
> of the Wiki, which is hilarious.

In all sincerity: Chill out. I don't think that this is worth getting
into an argument over. I think that there is a good chance that you'd
have had a much better experience if the conversation had been in
person. Text has a way of losing a lot of important nuance.

I have personally met and enjoyed talking to quite a few people that
personally worked on Oracle, in various capacities -- the world of
database internals experts is not huge. I find Tanel Poder very
interesting, too -- never met the man, but we follow each other on
Twitter. Oracle is a system that has some interesting properties in
general (not just as a counterpoint to PostgreSQL), and I definitely
respect it. Same with SQL Server. There are lots of smart and
accomplished people in the world, including Jeff.

I think that it's worth understanding these differences in perspective.
There is likely to be merit in all of the specific points made by both
Laurenze and Jeff. They may not be irreconcilable, or anything like it.

--
Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 10:35 AM Tom Lane wrote:
> Well, what we really ought to be using is size_t (a/k/a Size), at least
> for memory-space-related calculations. I don't have an opinion right
> now about what logtape.c ought to use. I do agree that avoiding "long"
> altogether would be a good ultimate goal.

I assume that we often use "long" in contexts where a signed integer
type is required. Maybe this is not true in the case of the work_mem
style calculations. But I know that it works that way in logtape.c,
where -1 is a sentinel value. We already use int64 (not size_t) in
tuplesort.c for roughly the same reason: LACKMEM() needs to work with
negative values, to handle certain edge cases.

> In the short term though, the question is whether we want to regard this
> hashagg issue as something we need a fix for in v13/v14. The fact that
> it's Windows-only makes it slightly less pressing in my mind, but it's
> still a regression that some people are going to hit.

True. I worry about the potential for introducing new bugs on Windows
by backpatching a fix for this. Technically this restriction existed in
every other work_mem consumer on Windows. Of course this won't matter
much to users like Laurent.

--
Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 10:33 AM l...@laurent-hasson.com wrote:
> Damn... I know Windows is a lower priority, and this is yet another issue,
> but in Healthcare, Windows is so prevalent everywhere...

To be clear, I didn't actually say that. I said that it doesn't matter
either way, as far as addressing this long standing "int64 vs long"
issue goes.

--
Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 10:11 AM Tom Lane wrote:
> No, he already tried, upthread. The trouble is that he's on a Windows
> machine, so get_hash_mem is quasi-artificially constraining the product
> to 2GB. And he needs it to be a bit more than that. Whether the
> constraint is hitting at the ngroups stage or it's related to actual
> memory consumption isn't that relevant.

Somehow I missed that part.

> What I'm wondering about is whether it's worth putting in a solution
> for this issue in isolation, or whether we ought to embark on the
> long-ignored project of getting rid of use of "long" for any
> memory-size-related computations. There would be no chance of
> back-patching something like the latter into v13, though.

+1. Even if we assume that Windows is a low priority platform, in the
long run it'll be easier to make it more like every other platform.

The use of "long" is inherently suspect to me. It signals that the
programmer wants something wider than "int", even if the standard
doesn't actually require that "long" be wider. This seems to contradict
what we know to be true for Postgres, which is that in general it's
unsafe to assume that long is int64.

It's not just work_mem related calculations. There is also code like
logtape.c, which uses long for block numbers -- that also exposes us to
risk on Windows. By requiring int64 be used instead of long, we don't
actually increase risk for non-Windows platforms to any significant
degree. I'm pretty sure that "long" means int64 on non-Windows 64-bit
platforms anyway.

--
Peter Geoghegan
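The width difference being discussed is easy to demonstrate from any language that exposes the platform's C types; here is a small Python sketch (not from the thread) using the standard ctypes module:

```python
import ctypes

# On LP64 platforms (64-bit Linux, macOS) a C "long" is 8 bytes; on
# 64-bit Windows (LLP64) it is only 4, which is exactly what caps
# long-based memory-size arithmetic at around 2GB there.
print("sizeof(long)      =", ctypes.sizeof(ctypes.c_long))

# A fixed-width 64-bit type is 8 bytes on every platform, which is why
# int64 is the safer choice for memory-size calculations.
print("sizeof(long long) =", ctypes.sizeof(ctypes.c_longlong))
```

Running this on Windows versus Linux shows the discrepancy directly: `long` and `long long` only agree on LP64 systems.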
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan wrote:
> I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
> correct. It certainly seems like a good starting point.

I also suspect that if Laurent set work_mem and/or hash_mem_multiplier
*extremely* aggressively, then eventually the hash agg would be
in-memory. And without actually using all that much memory.

I'm not suggesting that that is a sensible resolution to Laurent's
complaint. I'm just pointing out that it's probably not fundamentally
impossible to make the hash agg avoid spilling through tuning these
GUCs. At least I see no evidence of that right now.

--
Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 9:42 AM Tom Lane wrote:
> Right. The point here is that before v13, hash aggregation was not
> subject to the work_mem limit, nor any related limit. If you did an
> aggregation requiring more than 2GB-plus-slop, it would work just fine
> as long as your machine had enough RAM. Now, the performance sucks and
> there is no knob you can turn to fix it. That's unacceptable in my book.

Oh! That makes way more sense.

I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
correct. It certainly seems like a good starting point.

--
Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 9:21 AM Tom Lane wrote:
> Yeah, I should have said "2GB plus palloc slop". It doesn't surprise
> me a bit that we seem to be eating another 20% on top of the nominal
> limit.

MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been
for many years. The function get_hash_mem() returns a work_mem-style
int that callers refer to as hash_mem -- the convention is that callers
pretend that there is a work_mem style GUC (called hash_mem) that they
must access by calling get_hash_mem().

I don't see how it's possible for get_hash_mem() to be unable to return
a hash_mem value that could be represented by work_mem directly.
MAX_KILOBYTES is an annoyingly low limit on Windows, where sizeof(long)
is 4. But that's nothing new.

--
Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Thu, Jul 22, 2021 at 8:45 AM Tom Lane wrote:
> That is ... weird. Maybe you have found a bug in the spill-to-disk logic;
> it's quite new after all. Can you extract a self-contained test case that
> behaves this way?

I wonder if this has something to do with the way that the input data
is clustered. I recall noticing that that could significantly alter the
behavior of HashAggs as of Postgres 13.

--
Peter Geoghegan
Re: Big performance slowdown from 11.2 to 13.3
On Wed, Jul 21, 2021 at 4:19 PM l...@laurent-hasson.com wrote:
> As you can see, the V13.3 execution is about 10x slower.
>
> It may be hard for me to create a whole copy of the database on 11.12 and
> check that environment by itself. I'd want to do it on the same machine to
> control variables, and I don't have much extra disk space at the moment.

I imagine that this has something to do with the fact that the hash
aggregate spills to disk in Postgres 13.

You might try increasing hash_mem_multiplier from its default of 1.0,
to 2.0 or even 4.0. That way you'd be able to use 2x or 4x more memory
for executor nodes that are based on hashing (hash join and hash
aggregate), without also affecting other kinds of nodes, which are
typically much less sensitive to memory availability. This is very
similar to increasing work_mem, except that it is better targeted.

It might even make sense to *decrease* work_mem and increase
hash_mem_multiplier even further than 4.0. That approach is more
aggressive, though, so I wouldn't use it until it actually proved
necessary.

--
Peter Geoghegan
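Concretely, the suggested tuning is just a pair of GUC settings; the values below are illustrative, not a recommendation for this workload:

```sql
SET work_mem = '64MB';
SET hash_mem_multiplier = 4.0;  -- hash join/agg may now use up to ~256MB
-- sort-based and other nodes remain limited to the plain 64MB work_mem
```

These can be set per session (as above), per role, or cluster-wide via ALTER SYSTEM, which makes it easy to experiment on the slow query before committing to anything.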
Re: Very slow Query compared to Oracle / SQL - Server
On Fri, May 7, 2021 at 2:28 PM Peter Geoghegan wrote:
> That's a very reasonable interpretation, since the bitmap index scans
> themselves just aren't doing that much I/O -- we see that there is
> much more I/O for the heap scan, which is likely to be what the
> general picture looks like no matter how much bloat there is.
>
> However, I'm not sure if that reasonable interpretation is actually
> correct. The nbtinsert.c code that handles deleting LP_DEAD index
> tuples no longer relies on having a page-level garbage item flag set
> in Postgres 13 -- it just scans the line pointer array for LP_DEAD
> items each time.

BTW, I am pointing all of this out because I've heard informal reports
of big improvements following an upgrade to Postgres 13 that seem
unlikely to be related to the simple fact that indexes are smaller
(most of the time you cannot save that much I/O by shrinking indexes
without affecting when and how TIDs/heap tuples are scanned).

It's necessary to simulate the production workload to have *any* idea
if LP_DEAD index tuple deletion might be a factor. If the OP is just
testing this one query on Postgres 13 in isolation, without anything
bloating up (or cleaning up) indexes, then that doesn't really tell us
anything about how Postgres 13 compares to Postgres 12. As you said,
simply shrinking the indexes is nice, but not enough -- we'd need some
sort of second-order effect to get acceptable performance over time and
under real world conditions.

--
Peter Geoghegan
Re: Very slow Query compared to Oracle / SQL - Server
On Fri, May 7, 2021 at 9:16 AM Justin Pryzby wrote:
> In pg13, indexes are de-duplicated by default.
>
> But I suspect the performance is better because data was reload, and the
> smaller indexes are a small, additional benefit.

That's a very reasonable interpretation, since the bitmap index scans
themselves just aren't doing that much I/O -- we see that there is much
more I/O for the heap scan, which is likely to be what the general
picture looks like no matter how much bloat there is.

However, I'm not sure if that reasonable interpretation is actually
correct. The nbtinsert.c code that handles deleting LP_DEAD index
tuples no longer relies on having a page-level garbage item flag set in
Postgres 13 -- it just scans the line pointer array for LP_DEAD items
each time. VACUUM has a rather unhelpful tendency to unset the flag
when it shouldn't, which we're no longer affected by. So that's one
possible explanation.

Another possible explanation is that smaller indexes (due to
deduplication) are more likely to get index scans, which leads to
setting the LP_DEAD bit of known-dead index tuples in passing more
often (bitmap index scans won't do the kill_prior_tuple optimization).
There could even be a virtuous circle over time. (Note that the index
deletion stuff in Postgres 14 pretty much makes sure that this happens,
but it is probably at least much more likely in Postgres 13 compared to
12.)

I could easily be very wrong about all of this in this instance,
though, because the behavior I've described is highly non-linear and
therefore highly unpredictable in general (not to mention highly
sensitive to workload characteristics). I'm sure that I've thought
about this stuff way more than any other individual Postgres
contributor, but it's easy to be wrong in any given instance. The real
explanation might be something else entirely. Though it's hard not to
imagine that what really matters here is avoiding all of that bitmap
heap scan I/O.

--
Peter Geoghegan
Re: Consecutive Query Executions with Increasing Execution Time
On Mon, Dec 16, 2019 at 9:28 AM Shijia Wei wrote:
> 1st Query:
>   Buffers: shared hit=17074 read=16388
> 20th Query:
>   Buffers: shared hit=17037 read=16390

Why do the first and the twentieth executions of the query have almost
identical "buffers shared/read" numbers? That seems odd.

--
Peter Geoghegan
Re: Huge shared hit for small table
On Mon, Nov 4, 2019 at 12:38 PM Scott Rankin wrote:
> Definitely no long-running transactions on this table; in fact, this table is
> pretty infrequently updated – on the order of a few tens of rows updated per
> day.

But a long running transaction will have an impact on all tables -- not
just the tables that happen to have been accessed so far in the long
running transaction. This is necessary because nothing stops the long
running transaction from SELECTing data from any table at any time --
we need to pessimistically keep around the data required to make that
work.

--
Peter Geoghegan
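One way to check for such a transaction is a standard catalog query along these lines (not from the thread; any session with a very old xact_start is a candidate):

```sql
-- Oldest open transactions in the cluster, regardless of which tables
-- they have touched so far.
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       left(query, 60)    AS current_query
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
ORDER  BY xact_start
LIMIT  5;
```

Sessions that are "idle in transaction" with a large xact_age are the usual culprits for holding back cleanup cluster-wide.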
Re: Huge shared hit for small table
On Mon, Nov 4, 2019 at 12:32 PM Jeff Janes wrote:
> Could there be a long-open transaction, which is preventing hint-bits from
> getting on set on the table rows, as well on the index rows?

Contention on a small number of rows may also be a factor.

> A reindex would not by itself fix the problem if it were the long open
> transaction. But if the long open transaction held a sufficient lock on the
> table, then the reindex would block until the transaction went away on its
> own, at which point the problem would go away on its own, so it might
> **appear** to have fixed the problem.

That seems like the simplest and most likely explanation to me, even
though it isn't particularly simple.

--
Peter Geoghegan
Re: Huge shared hit for small table
On Mon, Nov 4, 2019 at 11:56 AM Justin Pryzby wrote:
> I think it's because some heap pages are being visited many times, due to the
> index tuples being badly "fragmented". Note, I'm not talking about
> fragmentation of index *pages*, which is what pgstattuple reports (which
> wouldn't have nearly so detrimental effect). I could probably say that the
> index tuples are badly "correlated" with the heap.

But this is a unique index, and Scott indicates that the problem seems
to go away for a while following a REINDEX.

> In PG v12 you can use REINDEX CONCURRENTLY (but beware there's a crash
> affecting its progress reporting, fix to be included in v12.1).

PG v12 will store B-Tree duplicates in heap TID order, so if that's the
problem then upgrading to v12 (and REINDEXing if the upgrade was
performed using pg_upgrade) will fix it for good.

--
Peter Geoghegan
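So, assuming a pg_upgrade-based move to v12, the one-time fix would look something like this (the index name is a hypothetical placeholder):

```sql
-- v12+: rebuild without blocking writes. After the rebuild, duplicate
-- key entries are stored in heap TID order, so the periodic-REINDEX
-- workaround should no longer be needed.
REINDEX INDEX CONCURRENTLY some_unique_idx;
```

REINDEX CONCURRENTLY takes roughly twice the resources of a plain REINDEX but only brief locks, which is usually the right trade-off on a live system.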
Re: UUID v1 optimizations...
Please don't top post -- trim your response down so that only
still-relevant text remains.

On Tue, Jun 11, 2019 at 1:27 PM Ancoron Luciferis wrote:
> Primary key indexes after an ANALYZE:
>
>     table_name     |     bloat      | index_mb | table_mb
> -------------------+----------------+----------+----------
>  uuid_v1           | 767 MiB (49 %) | 1571.039 | 1689.195
>  uuid_v1_timestamp | 768 MiB (49 %) | 1571.039 | 1689.195
>  uuid_seq          | 759 MiB (49 %) | 1562.766 | 1689.195
>  uuid_serial       | 700 MiB (47 %) | 1504.047 | 1689.195
>
> OK, sadly no reclaim in any of them.

I don't know how you got these figures, but most likely they don't take
into account the fact that the FSM for the index has free blocks
available. You'll only notice that if you have additional page splits
that can recycle that space. Or, you could use pg_freespacemap to get
some idea.

> 5.) REINDEX
> Table: uuid_v1           Time: 21549.860 ms (00:21.550)
> Table: uuid_v1_timestamp Time: 27367.817 ms (00:27.368)
> Table: uuid_seq          Time: 19142.711 ms (00:19.143)
> Table: uuid_serial       Time: 16889.807 ms (00:16.890)
>
> Even in this case it looks as if my implementation is faster than
> anything else - which I really don't get.

Sorting already-sorted data is faster. CREATE INDEX is mostly a big
sort operation in the case of B-Tree indexes.

> I might implement a different opclass for the standard UUID to enable
> time-wise index sort order. This will naturally be very close to
> physical order but I doubt that this is something I can tell PostgreSQL, or?

PostgreSQL only knows whether or not your page splits occur in the
rightmost page in the index -- it fills the page differently according
to whether or not that is the case.

--
Peter Geoghegan
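Checking the FSM directly might look like this (pg_freespacemap is a contrib extension; the index name here is a hypothetical stand-in for the primary key index in question):

```sql
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- How many of the index's pages have recorded free space that future
-- page splits could recycle?
SELECT count(*) FILTER (WHERE avail > 0) AS pages_with_free_space,
       count(*)                          AS total_pages
FROM   pg_freespacemap('uuid_v1_pkey');
```

A large pages_with_free_space count suggests the "bloat" figure overstates the real problem, since that space is reusable without a REINDEX.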
Re: Perplexing, regular decline in performance
On Wed, Jun 26, 2019 at 12:02 PM Hugh Ranalli wrote:
> I'm sorry, but I'm not sure what you mean by the "distribution of values
> within the columns." Can you clarify or provide an link to an example?

I would mostly just like to see the schema of the table in question,
including indexes, and a high-level description of the nature of the
data in the table.

Ideally, you would also include pg_stats.* information for all columns
in the table. That will actually let us see a summary of the data.
Though you should be careful about leaking sensitive information that
happens to be contained in the statistics, such as the most common
values.

--
Peter Geoghegan
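The statistics in question can be pulled with a query along these lines (the table name is a placeholder; note that most_common_vals can contain sensitive data, per the caveat above):

```sql
SELECT attname, null_frac, n_distinct, correlation,
       most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  schemaname = 'public'
AND    tablename  = 'the_table';
```

n_distinct and correlation alone already say a lot: negative n_distinct values are a fraction of the row count, and correlation near +/-1 means the column is stored in nearly sorted order.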
Re: Perplexing, regular decline in performance
On Tue, Jun 25, 2019 at 8:49 AM Hugh Ranalli wrote:
> What we continued to notice was a milder but still definite trend of
> increased query times, during the course of each week, from the mid to high
> 200 ms, to the high 300 ms to low 400 ms. Some years ago, someone had noticed
> that as the number of "raw_page" columns in a particular table grew,
> performance would decline. They wrote a script that once a week locks the
> table, deletes the processed large columns (they are not needed after
> processing), copies the remaining data to a backup table, truncates the
> original table, then copies it back. When this script runs we see an
> immediate change in performance, from 380 ms in the hour before the drop, to
> 250 ms in the hour of the drop. As rows with these populated columns are
> added during the course of a week, the performance drops, steadily, until the
> next week's cleaning operation. Each week the performance increase is clear
> and significant.

Can you show us the definition of the table, including its indexes? Can
you describe the data and distribution of values within the columns,
particularly where they're indexed?

--
Peter Geoghegan
Re: Incorrect index used in few cases..
On Tue, Jun 18, 2019 at 2:08 PM Tom Lane wrote:
> Are you perhaps running with non-default values for any planner cost
> parameters? Or it's not a stock build of Postgres?
>
> If you could find a way to adjust the attached example so that it
> produces the same misbehavior you see with live data, that would be
> very interesting ...

FWIW, if you move the CREATE INDEX statements before the INSERT, and
compare earlier versions of Postgres to 12, you'll see that the size of
some of the indexes is a lot smaller on 12.

v11 (representative of 9.6):

pg@tc:5411 [1067]=# \di+ i_*
                                List of relations
 Schema │          Name           │ Type  │ Owner │ Table │ Size  │ Description
────────┼─────────────────────────┼───────┼───────┼───────┼───────┼─────────────
 public │ i_agi_tc_tcn            │ index │ pg    │ tc    │ 74 MB │
 public │ i_cid_agid_tcn          │ index │ pg    │ tc    │ 82 MB │
 public │ i_tc_adid_tid           │ index │ pg    │ tc    │ 57 MB │
 public │ i_tc_advertiser_id      │ index │ pg    │ tc    │ 27 MB │
 public │ i_tc_campaign_id        │ index │ pg    │ tc    │ 28 MB │
 public │ i_tc_lud_agi            │ index │ pg    │ tc    │ 57 MB │
 public │ i_tc_uniqueness_hash_v2 │ index │ pg    │ tc    │ 21 MB │
(7 rows)

v12/master:

pg@regression:5432 [1022]=# \di+ i_*
                                List of relations
 Schema │          Name           │ Type  │ Owner │ Table │ Size  │ Description
────────┼─────────────────────────┼───────┼───────┼───────┼───────┼─────────────
 public │ i_agi_tc_tcn            │ index │ pg    │ tc    │ 69 MB │
 public │ i_cid_agid_tcn          │ index │ pg    │ tc    │ 78 MB │
 public │ i_tc_adid_tid           │ index │ pg    │ tc    │ 36 MB │
 public │ i_tc_advertiser_id      │ index │ pg    │ tc    │ 20 MB │
 public │ i_tc_campaign_id        │ index │ pg    │ tc    │ 24 MB │
 public │ i_tc_lud_agi            │ index │ pg    │ tc    │ 30 MB │
 public │ i_tc_uniqueness_hash_v2 │ index │ pg    │ tc    │ 21 MB │
(7 rows)

Note that i_tc_lud_agi is 30 MB, not 57 MB, and that i_tc_adid_tid is
36 MB, not 57 MB. I can see that both i_tc_lud_agi and i_tc_adid_tid
consistently use the "split after new tuple" optimization on v12.

--
Peter Geoghegan
Re: Temporarily very slow planning time after a big delete
On Tue, May 21, 2019 at 11:27 AM Walter Smith wrote:
> Very low. Probably less than ten over all time. I suspect the only use of the
> index is to rapidly find the processed=false rows, so the notifiable_type
> value isn’t important, really. It would probably work just as well on any
> other column.

This problem has been fixed in Postgres 12, which treats heap TID as a
tiebreaker column within B-Tree indexes. It sounds like you have the
right idea about how to work around the problem.

VACUUM will need to kill tuples in random locations in the low
cardinality index, since the order of tuples is unspecified between
duplicate tuples -- it is more or less random. VACUUM will tend to
dirty far more pages than is truly necessary in this scenario, because
there is no natural temporal locality that concentrates dead tuples in
one or two particular places in the index. This has a far more
noticeable impact on VACUUM duration than you might expect, since
autovacuum is throttled by delays that vary according to how many pages
were dirtied (and other such factors).

--
Peter Geoghegan
Re: Temporarily very slow planning time after a big delete
On Tue, May 21, 2019 at 11:16 AM Walter Smith wrote:
> It occurs to me that is a somewhat unusual index -- it tracks unprocessed
> notifications so it gets an insert and delete for every row, and is normally
> almost empty.

Is it a very low cardinality index? In other words, is the total number
of distinct keys rather low? Not just at any given time, but over time?

--
Peter Geoghegan
Re: GCC 8.3.0 vs. 9.0.1
On Tue, May 7, 2019 at 10:06 AM Tom Lane wrote: > Given the described test setup, I'd put basically no stock in these > numbers. It's unlikely that this test case's performance is CPU-bound > per se; more likely, I/O and lock contention are dominant factors. > So I'm afraid whatever they're measuring is a more-or-less chance > effect rather than a real system-wide code improvement. Or a compiler bug. Link-time optimizations give the compiler a view of the program as a whole, not just a single translation unit (TU) at a time. This enables it to perform additional aggressive optimization. -- Peter Geoghegan
Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.
On Sat, Feb 23, 2019 at 1:06 PM Gunther wrote: > I thought to keep my index tight, I would define it like this: > > CREATE UNIQUE INDEX Queue_idx_pending ON Queue(jobId) WHERE pending; > > so that only pending jobs are in that index. > > When a job is done, follow up work is often inserted into the Queue as > pending, thus adding to that index. How many distinct jobIds are there in play, roughly? Would you say that there are many fewer distinct jobs than distinct entries in the index/table? Is the number of jobs fixed at a fairly low number, that doesn't really grow as the workload needs to scale up? -- Peter Geoghegan
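Something along these lines would answer those questions, using the Queue definition quoted in the message (a sketch only; the FILTER clause needs Postgres 9.4 or later):

```sql
-- How many distinct jobs, relative to total and pending rows?
SELECT count(DISTINCT jobId)            AS distinct_jobs,
       count(*)                         AS total_rows,
       count(*) FILTER (WHERE pending)  AS pending_rows
FROM Queue;
```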
Re: How can sort performance be so different
On Wed, Feb 20, 2019 at 2:25 PM Peter Geoghegan wrote: > I suspect that the reasons why the Lao locale sorts so much slower may > also have something to do with the intrinsic cost of supporting more > complicated rules. I strongly suspect that it has something to do with the issue described here specifically: http://userguide.icu-project.org/collation/concepts#TOC-Thai-Lao-reordering -- Peter Geoghegan
Re: How can sort performance be so different
On Wed, Feb 20, 2019 at 1:42 PM Bob Jolliffe wrote: > It seems not to be (completely) particular to the installation. > Testing on different platforms we found variable speed difference > between 100x and 1000x slower, but always a considerable order of > magnitude. The very slow performance comes from sorting Lao > characters using en_US.UTF-8 collation. I knew that some collations were slower, generally for reasons that make some sense. For example, I was aware that ICU's use of Japanese standard JIS X 4061 is particularly complicated and expensive, but produces the most useful possible result from the point of view of a Japanese speaker. Apparently glibc does not use that algorithm, and so offers a less useful sort order (though it may actually be faster in that particular case). I suspect that the reasons why the Lao locale sorts so much slower may also have something to do with the intrinsic cost of supporting more complicated rules. However, it's such a ridiculously large difference that it also seems likely that somebody was disinclined to go to the effort of optimizing it. The ICU people found that to be a tractable goal, but they may have had to work at it. I also have a vague notion that there are special cases that are more or less only useful for sorting French. These complicate the implementation of UCA style algorithms. I am only speculating, based on what I've heard about other cases -- perhaps this explanation is totally wrong. I know a lot more about this stuff than most people on this mailing list, but I'm still far from being an expert. -- Peter Geoghegan
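A quick way to see the gap on any one platform is to time the same sort under two collations from psql (table and column names here are illustrative, and glibc locale spellings such as "en_US.utf8" vary by system):

```sql
\timing on
-- Full sort under the expensive linguistic locale:
SELECT phrase FROM phrases ORDER BY phrase COLLATE "en_US.utf8";
-- The same sort under the simple byte-order "C" collation, for comparison:
SELECT phrase FROM phrases ORDER BY phrase COLLATE "C";
```

The difference in the two `\timing` readouts isolates the collation cost, since everything else about the two queries is identical.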
Re: blending fast and temp space volumes
On Wed, Feb 21, 2018 at 12:07 PM, Claudio Freire <klaussfre...@gmail.com> wrote: > On Wed, Feb 21, 2018 at 4:50 PM, Peter Geoghegan <p...@bowt.ie> wrote: >> On Wed, Feb 21, 2018 at 7:53 AM, Rick Otten <rottenwindf...@gmail.com> wrote: >>> side note: The disadvantage of local SSD is that it won't survive "hitting >>> the virtual power button" on an instance, nor can it migrate automatically >>> to other hardware. (We have to hit the power button to add memory/cpu to >>> the system, and sometimes the power button might get hit by accident.) This >>> is OK for temp space. I never have my database come up automatically on >>> boot, and I have scripted the entire setup of the temp space volume and data >>> structures. I can run that script before starting the database. I've done >>> some tests and it seems to work great. I don't mind rolling back any >>> transaction that might be in play during a power failure. >> >> It sounds like you're treating a temp_tablespaces tablespace as >> ephemeral, which IIRC can have problems that an ephemeral >> stats_temp_directory does not have. > > For instance? > > I've been doing that for years without issue. If you're careful to > restore the skeleton directory structure at server boot up, I haven't > had any issues. Then you clearly know what I mean already. That's not documented as either required or safe anywhere. -- Peter Geoghegan
Re: blending fast and temp space volumes
On Wed, Feb 21, 2018 at 7:53 AM, Rick Otten <rottenwindf...@gmail.com> wrote: > side note: The disadvantage of local SSD is that it won't survive "hitting > the virtual power button" on an instance, nor can it migrate automatically > to other hardware. (We have to hit the power button to add memory/cpu to > the system, and sometimes the power button might get hit by accident.) This > is OK for temp space. I never have my database come up automatically on > boot, and I have scripted the entire setup of the temp space volume and data > structures. I can run that script before starting the database. I've done > some tests and it seems to work great. I don't mind rolling back any > transaction that might be in play during a power failure. It sounds like you're treating a temp_tablespaces tablespace as ephemeral, which IIRC can have problems that an ephemeral stats_temp_directory does not have. -- Peter Geoghegan
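Rick's scripted approach might look something like this minimal sketch. Every path, tablespace name, and setting below is hypothetical (nothing here comes from the original mail), and, as noted above, treating a temp_tablespaces tablespace as ephemeral is not documented as supported:

```shell
# Sketch of an ephemeral temp-tablespace setup script, run before
# starting the database. TS_DIR and "temp_fast" are made-up names.
TS_DIR=${TS_DIR:-./pg_ephemeral_ts}   # in practice: a local-SSD mount
mkdir -p "$TS_DIR"
chmod 700 "$TS_DIR"                   # tablespace dirs must not be world-accessible
# With the skeleton directory in place, recreate the tablespace before
# first use (commented out here, since it needs a running server):
# psql -c "CREATE TABLESPACE temp_fast LOCATION '$(readlink -f "$TS_DIR")';"
# psql -c "ALTER SYSTEM SET temp_tablespaces = 'temp_fast';"
echo "prepared $TS_DIR"
```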