Re: UUID v7
On Thu, Jan 18, 2024 at 5:18 AM Andrey Borodin wrote:
> Current patch version attached. I've addressed all other requests:
> function renames, aliases, multiple functions instead of optional params,
> cleaner catalog definitions, not throwing error when [var,ver,time] value
> is unknown.
> What is left: deal with timezones, improve documentation.

I've done a test of the v10 patch, and ran into an interesting behavior when
passing in a timestamp to the function (which, as a side note, is actually
very useful to have as a feature, to support creating time-based range
partitions on UUIDv7 fields):

postgres=# SELECT uuid_extract_time(uuidv7());
     uuid_extract_time
----------------------------
 2024-01-18 18:49:00.01-08
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7('2024-04-01'));
   uuid_extract_time
------------------------
 2024-04-01 00:00:00-07
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7());
   uuid_extract_time
------------------------
 2024-04-01 00:00:00-07
(1 row)

Note how calling the uuidv7 function again after having called it with a
fixed future timestamp returns the future timestamp, even though it should
return the current time.

I believe this is caused by incorrectly re-using the cached
previous_timestamp. In the second call here (with a fixed future timestamp),
we end up setting ts and tms to 2024-04-01, with increment_counter = false,
which leads us to set previous_timestamp to the passed-in timestamp (else
branch of the second if in uuidv7). When we then call the function again
without an argument, we end up getting a new timestamp from gettimeofday,
but because we try to detect backwards leaps, we set increment_counter to
true, and thus end up reusing the previous (future) timestamp here:

    /* protection from leap backward */
    tms = previous_timestamp;

Not sure how to fix this, but clearly something is amiss here.

Thanks,
Lukas

--
Lukas Fittl
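For context on the range-partitioning use case mentioned above, here is a
rough sketch of how the patch's uuidv7(timestamptz) overload could be used to
define time-based partition bounds on a UUIDv7 key (table and partition names
are made up for illustration, and the function name assumes the v10 patch):

  CREATE TABLE events (
      id      uuid NOT NULL,
      payload text
  ) PARTITION BY RANGE (id);

  -- Bounds generated from fixed timestamps; UUIDv7 values sort by their
  -- embedded millisecond timestamp, so this effectively partitions by
  -- creation time. Note the bits below the timestamp make these bounds
  -- only approximately align with midnight.
  CREATE TABLE events_2024_04 PARTITION OF events
      FOR VALUES FROM (uuidv7('2024-04-01')) TO (uuidv7('2024-05-01'));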
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
On Thu, Jul 6, 2023 at 12:56 AM Daniel Gustafsson wrote:
> Lukas: do you have an updated patch for this commitfest to address David's
> comments?

I have a draft - I should be able to post an updated patch in the next few
days. Thanks for checking!

Thanks,
Lukas

--
Lukas Fittl
Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Hi,

I was debugging a planner problem on Postgres 14.4 the other day - and the
involved "bad" plan was including Memoize - though I don't necessarily think
that Memoize is to blame (and this isn't any of the problems recently fixed
in Memoize costing).

However, what I noticed whilst trying different ways to fix the plan, is that
the Memoize output was a bit hard to reason about - especially since the plan
involving Memoize was expensive to run, and so I was mostly running EXPLAIN
without ANALYZE to look at the costing.

Here is an example of the output I was looking at:

  ->  Nested Loop  (cost=1.00..971672.56 rows=119623 width=0)
        ->  Index Only Scan using table1_idx on table1  (cost=0.43..372676.50 rows=23553966 width=8)
        ->  Memoize  (cost=0.57..0.61 rows=1 width=8)
              Cache Key: table1.table2_id
              Cache Mode: logical
              ->  Index Scan using table2_idx on table2  (cost=0.56..0.60 rows=1 width=8)
                    Index Cond: (id = table1.table2_id)

The other plan I was comparing with (that I wanted the planner to choose
instead) had a total cost of 1,451,807.35 -- and so I was trying to figure
out why the Nested Loop was costed as 971,672.56. Simple math makes me expect
the Nested Loop should roughly have a total cost of 14,740,595.76 here
(372,676.50 + 23,553,966 * 0.61), ignoring a lot of the smaller costs. Thus,
in this example, it appears Memoize made the plan cost significantly cheaper
(roughly 6% of the regular cost).

Essentially this comes down to the "cost reduction" performed by Memoize only
being implicitly visible in the Nested Loop's total cost - with nothing
useful on the Memoize node itself - since the rescan costs are not shown.

I think explicitly adding the estimated cache hit ratio for Memoize nodes
might make this easier to reason about, like this:

  ->  Memoize  (cost=0.57..0.61 rows=1 width=8)
        Cache Key: table1.table2_id
        Cache Mode: logical
        Cache Hit Ratio Estimated: 0.94

Alternatively (or in addition) we could consider showing the "ndistinct"
value that is calculated in cost_memoize_rescan - since that's the most
significant contributor to the cache hit ratio (and you can influence that
directly by improving the ndistinct statistics).

See attached a patch that implements showing the cache hit ratio as a
discussion starter. I'll park this in the July commitfest for now.

Thanks,
Lukas

--
Lukas Fittl

v1-0001-Add-Estimated-Cache-Hit-Ratio-for-Memoize-plan-no.patch
Description: Binary data
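As a side note on the ndistinct point above, here is a sketch of how one can
influence that estimate today (table and column names are taken from the
example plan; the override value is made up and would need to reflect the
actual data):

  -- Override the planner's n_distinct estimate for the cache key column,
  -- then re-analyze so cost_memoize_rescan sees the new value.
  ALTER TABLE table1 ALTER COLUMN table2_id SET (n_distinct = 50000);
  ANALYZE table1;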
Re: Sampling-based timing for EXPLAIN ANALYZE
On Fri, Jan 6, 2023 at 1:19 AM Jelte Fennema wrote:
> Nice addition! And the code looks pretty straight forward.

Thanks for reviewing!

> The current patch triggers warnings:
> https://cirrus-ci.com/task/6016013976731648
> Looks like you need to add void as the argument.

Fixed in v2 attached. This also adds a simple regression test, as well as
fixes the parallel worker handling.

> Do you have some performance comparison between TIMING ON and TIMING
> SAMPLING?

Here are some benchmarks of auto_explain overhead on my ARM-based M1 Macbook
for the following query run with pgbench on a scale factor 100 data set:

  SELECT COUNT(*) FROM pgbench_branches JOIN pgbench_accounts USING (bid)
    JOIN pgbench_tellers USING (bid) WHERE bid = 42;

(the motivation is to use a query that is more complex than the standard
pgbench select-only test query)

avg latency (best of 3), -T 300, -c 4, -s 100, shared_buffers 2GB, fsync off,
max_parallel_workers_per_gather 0:

  master, log_timing = off:                         871 ms (878 / 877 / 871)
  patch,  log_timing = off:                         869 ms (882 / 880 / 869)
  patch,  log_timing = on:                          890 ms (917 / 930 / 890)
  patch,  log_timing = sampling, samplefreq = 1000: 869 ms (887 / 869 / 894)

Additionally, here is Andres' benchmark from [1], with the sampling option
added:

% psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM generate_series(1, 10) g(i);' postgres && pgbench -n -r -t 100 -f <(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") postgres |grep '^ '
DROP TABLE
SELECT 10
         3.507           0  SELECT COUNT(*) FROM t;
         3.476           0  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;
         3.576           0  EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT COUNT(*) FROM t;
         5.096           0  EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;

My pg_test_timing data for reference:

% pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 23.65 ns
Histogram of timing durations:
  < us   % of total      count
     1      97.64472  123876325
     2       2.35421    2986658
     4       0.00022        277
     8       0.00016        202
    16       0.00064        815
    32       0.00005         64

> In InstrStartSampling there's logic to increase/decrease the frequency of
> an already existing timer. It's not clear to me when this can occur. I'd
> expect sampling frequency to remain constant throughout an explain plan. If
> it's indeed needed, I think a code comment would be useful to explain why
> this edge case is necessary.

Clarified in a code comment in v2. This is needed for handling nested
statements which could have different sampling frequencies for each nesting
level, i.e. a function might want to sample its queries at a higher frequency
than its caller.

Thanks,
Lukas

[1] https://postgr.es/m/20230116213913.4oseovlzvc2674z7%40awork3.anarazel.de

--
Lukas Fittl

v2-0001-Add-TIMING-SAMPLING-option-for-EXPLAIN-ANALYZE.patch
Description: Binary data
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Hi David,

Thanks for continuing to work on this patch, and my apologies for silence on
the patch. It's been hard to make time, and especially so because I typically
develop on an ARM-based macOS system where I can't test this directly - hence
my tests with virtualized EC2 instances, where I ran into the timing
oddities.

On Mon, Jan 2, 2023 at 5:28 AM David Geier wrote:
> The INSTR_TIME_GET_MICROSEC() returns a uint64 while the other variants
> return double. This seems error prone. What about renaming the function
> or also have the function return a double and cast where necessary at
> the call site?

Minor note, but in my understanding using a uint64 (where we can) is faster
for any simple arithmetic we do with the values.

> If no one objects I would also re-register this patch in the commit fest.

+1, and feel free to carry this patch forward - I'll try to make an effort to
review my earlier testing issues again, as well as your later improvements to
the patch.

Also, FYI, I just posted an alternate idea for speeding up EXPLAIN ANALYZE
with timing over in [0], using a sampling-based approach to reduce the timing
overhead.

[0]: https://www.postgresql.org/message-id/CAP53PkxXMk0j-%2B0%3DYwRti2pFR5UB2Gu4v2Oyk8hhZS0DRART6g%40mail.gmail.com

Thanks,
Lukas

--
Lukas Fittl
Sampling-based timing for EXPLAIN ANALYZE
Hi,

Since EXPLAIN ANALYZE with TIMING ON still carries noticeable overhead on
modern hardware (despite time sources being faster), I'd like to propose a
new setting for EXPLAIN ANALYZE, called "TIMING SAMPLING", as compared to
TIMING ON.

This new timing mode uses a timer on a fixed recurring frequency (e.g. 100 or
1000 Hz) to gather a sampled timestamp on a predefined schedule, instead of
getting the time on-demand when InstrStartNode/InstrStopNode is called. To
implement the timer, we can use the existing timeout infrastructure, which is
backed by a wall clock timer (ITIMER_REAL).

Conceptually this is inspired by how sampling profilers work (e.g. "perf"),
but it ties into the existing per-plan node instrumentation done by EXPLAIN
ANALYZE, and simply provides a lower accuracy version of the total time for
each plan node.

In EXPLAIN output this is marked as "sampled time", and scaled to the total
wall clock time (to adjust for the sampling undercounting):

=# EXPLAIN (ANALYZE, BUFFERS, TIMING SAMPLING, SAMPLEFREQ 100) SELECT ...;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate  (cost=201747.90..201748.00 rows=10 width=12) (actual sampled time=5490.974 rows=9 loops=1)
   ...
   ->  Hash Join  (cost=0.23..199247.90 rows=49 width=4) (actual sampled time=3738.619 rows=900 loops=1)
         ...
         ->  Seq Scan on large  (cost=0.00..144247.79 rows=979 width=4) (actual sampled time=1004.671 rows=1001 loops=1)
               ...
         ->  Hash  (cost=0.10..0.10 rows=10 width=4) (actual sampled time=0.000 rows=10 loops=1)
               ...
 Execution Time: 5491.475 ms
---

In simple query tests like this on my local machine, this shows a consistent
benefit over TIMING ON (and behaves close to ANALYZE with TIMING OFF), whilst
providing a "good enough" accuracy to identify which part of the query was
problematic.

Attached is a prototype patch for early feedback on the concept, with tests
and documentation to come in a follow-up. Since the January commitfest is
still marked as open I'll register it there, but note that my assumption is
this is *not* Postgres 16 material.

As an open item, note that in the patch the requested sampling frequency is
not yet passed to parallel workers (it always defaults to 1000 Hz when
sampling is enabled). Also, note the timing frequency is limited to a maximum
of 1000 Hz (1ms) due to current limitations of the timeout infrastructure.

With thanks to Andres Freund for help on refining the idea, collaborating on
early code and finding the approach to hook into the timeout API.

Thanks,
Lukas

--
Lukas Fittl

v1-0001-Add-TIMING-SAMPLING-option-for-EXPLAIN-ANALYZE.patch
Description: Binary data
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Sep 27, 2022 at 11:20 AM Melanie Plageman wrote:
> v30 attached
> rebased and pgstat_io_ops.c builds with meson now
> also, I tested with pgstat_report_stat() only flushing when forced and
> tests still pass

First of all, I'm excited about this patch, and I think it will be a big help
to understand better which part of Postgres is producing I/O (and why). I've
paired up with Maciek (CCed) on a review of this patch and had a few
comments, focused on the user experience:

The term "strategy" as an "io_context" is hard to understand, as it's not a
concept an end-user / DBA would be familiar with. Since this comes from
BufferAccessStrategyType (i.e. anything not NULL/BAS_NORMAL is treated as
"strategy"), maybe we could instead split this out into the individual
strategy types? i.e. making "strategy" three different I/O contexts instead:
"shared_bulkread", "shared_bulkwrite" and "shared_vacuum", retaining "shared"
to mean NULL / BAS_NORMAL.

Separately, could we also track buffer hits without incurring extra overhead?
(not just allocs and reads) -- Whilst we already have shared read and hit
counters in a few other places, this would help make the common "What's my
cache hit ratio" question more accurate to answer in the presence of
different shared buffer access strategies. Tracking hits could also help for
local buffers (e.g. to tune temp_buffers based on seeing a low cache hit
ratio).

Additionally, some minor notes:

- Since the stats are counting blocks, it would make sense to prefix the view
  columns with "blks_", and word them in the past tense (to match current
  style), i.e. "blks_written", "blks_read", "blks_extended", "blks_fsynced"
  (realistically one would combine this new view with other data e.g. from
  pg_stat_database or pg_stat_statements, which all use the "blks_" prefix,
  and stop using pg_stat_bgwriter for this, which does not use such a prefix)
- "alloc" as a name doesn't seem intuitive (and it may be confused with
  memory allocations) - whilst this is already named this way in
  pg_stat_bgwriter, it feels like this is an opportunity to eventually
  deprecate the column there and make this easier to understand -
  specifically, maybe we can clarify that this means buffer *acquisitions*?
  (either by renaming the field to "blks_acquired", or clarifying in the
  documentation)
- Assuming we think this view could realistically cover all I/O produced by
  Postgres in the future (thus warranting the name "pg_stat_io"), it may be
  best to have an explicit list of things that are not currently tracked in
  the documentation, to reduce user confusion (i.e. WAL writes are not
  tracked, temporary files are not tracked, and some forms of direct writes
  are not tracked, e.g. when a table moves to a different tablespace)
- In the view documentation, it would be good to explain the different values
  for "io_strategy" (and what they mean)
- Overall it would be helpful if we had a dedicated documentation page on I/O
  statistics that's linked from the pg_stat_io view description, and explains
  how the I/O statistics tie into the various concepts of shared buffers /
  buffer access strategies / etc (and what is not tracked today)

Thanks,
Lukas

--
Lukas Fittl
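For reference on the cache hit ratio point above, this is the kind of
calculation that tracking hits per I/O context would enable - today it can
only be derived globally, e.g. from pg_statio_user_tables (a sketch, not part
of the patch under review):

  SELECT sum(heap_blks_hit)::numeric
           / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
    FROM pg_statio_user_tables;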
Re: pg_get_constraintdef: Schema qualify foreign tables unless pretty printing is enabled
On Tue, Aug 9, 2022 at 5:33 PM Tom Lane wrote:
> I would say that pg_get_indexdef is the one that's out of step.
> I count 11 calls of generate_relation_name in ruleutils.c,
> of which only three have this business of being overridden
> when not-pretty. What is the rationale for that, and why
> would we move pg_get_constraintdef from one category to the
> other?

The overall motivation here is to make it easy to recreate the schema without
having to match the search_path on the importing side to be identical to the
exporting side.

There is a workaround, which is to do a SET search_path before calling these
functions that excludes the referenced schemas (which I guess is what pg_dump
does?).

But I wonder, why do we have an explicit pretty printing flag on these
functions, and PRETTYFLAG_SCHEMA in the code to represent this behavior? If
we don't want pretty printing to affect schema qualification, why does that
flag exist?

Of the other call sites, in terms of using generate_relation_name vs
generate_qualified_relation_name:

* pg_get_triggerdef_worker makes it conditional on pretty=true, but only for
  ON, not the FROM (not clear why that difference exists?)
* pg_get_indexdef_worker makes it conditional on prettyFlags &
  PRETTYFLAG_SCHEMA for the ON
* pg_get_statisticsobj_worker does not handle pretty printing (always uses
  generate_relation_name)
* make_ruledef makes it conditional on prettyFlags & PRETTYFLAG_SCHEMA for
  the TO
* get_insert_query_def does not handle pretty printing (always uses
  generate_relation_name)
* get_update_query_def does not handle pretty printing (always uses
  generate_relation_name)
* get_delete_query_def does not handle pretty printing (always uses
  generate_relation_name)
* get_rule_expr does not handle pretty printing (always uses
  generate_relation_name)
* get_from_clause_item does not handle pretty printing (always uses
  generate_relation_name)

Looking at that, it seems we didn't make the effort for the view related code
with all its complexity, and didn't do it for pg_get_statisticsobjdef since
it doesn't have a pretty flag. Why we didn't do it in
pg_get_triggerdef_worker for FROM isn't clear to me.

If we want to be entirely consistent (and keep supporting PRETTYFLAG_SCHEMA),
that probably means:

* Adding a pretty flag to pg_get_statisticsobjdef
* Teaching get_query_def to pass down prettyFlags to get_*_query_def
  functions
* Updating pg_get_triggerdef_worker to handle pretty for FROM as well

If that seems like a sensible direction I'd be happy to work on a patch.

Thanks,
Lukas

--
Lukas Fittl
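For reference, a sketch of the SET search_path workaround mentioned above,
using the s.foo / s.bar example from my earlier message (SET LOCAL is just
one way to keep the change from leaking into the rest of the session):

  BEGIN;
  -- With the referenced schema not on the search_path,
  -- generate_relation_name schema-qualifies it, so the definition comes
  -- back as "FOREIGN KEY (b) REFERENCES s.foo(a)".
  SET LOCAL search_path = pg_catalog;
  SELECT pg_get_constraintdef(oid, false)
    FROM pg_constraint
   WHERE conrelid = 's.bar'::regclass AND contype = 'f';
  COMMIT;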
pg_get_constraintdef: Schema qualify foreign tables unless pretty printing is enabled
Hi hackers,

Whilst debugging an issue with the output of pg_get_constraintdef, we've
discovered that pg_get_constraintdef doesn't schema qualify foreign tables
mentioned in the REFERENCES clause, even if pretty printing
(PRETTYFLAG_SCHEMA) is turned off.

This is a problem because it means there is no way to get a constraint
definition that can be recreated on another system when multiple schemas are
in use, but a different search_path is set. It's also different from
pg_get_indexdef, where this flag is correctly respected.

I assume this is an oversight, since the fix is pretty straightforward, see
attached patch. I'll register the patch for the next commitfest.

Here is a test case from my colleague Maciek showing this difference:

create schema s;
create table s.foo(a int primary key);
create table s.bar(a int primary key, b int references s.foo(a));

select pg_get_indexdef(indexrelid, 0, false) from pg_index order by indexrelid desc limit 3;
                                           pg_get_indexdef
------------------------------------------------------------------------------------------------------
 CREATE UNIQUE INDEX bar_pkey ON s.bar USING btree (a)
 CREATE UNIQUE INDEX foo_pkey ON s.foo USING btree (a)
 CREATE UNIQUE INDEX pg_toast_13593_index ON pg_toast.pg_toast_13593 USING btree (chunk_id, chunk_seq)
(3 rows)

select pg_get_constraintdef(oid, false) from pg_constraint order by oid desc limit 3;
       pg_get_constraintdef
-----------------------------------
 FOREIGN KEY (b) REFERENCES foo(a)
 PRIMARY KEY (a)
 PRIMARY KEY (a)
(3 rows)

Thanks,
Lukas

--
Lukas Fittl

From 83a1ab6081f70d0eed820b2b13bc3ab6e93af8ec Mon Sep 17 00:00:00 2001
From: Lukas Fittl
Date: Tue, 9 Aug 2022 16:55:35 -0700
Subject: [PATCH v1] pg_get_constraintdef: Schema qualify foreign tables by
 default

This matches pg_get_constraintdef to behave the same way as pg_get_indexdef,
which is to schema qualify referenced objects in the definition, unless
pretty printing is explicitly requested.

For pretty printing the previous behaviour is retained, which is to only
include the schema information if the referenced object is not in the
current search path.
---
 src/backend/utils/adt/ruleutils.c         | 5 +++--
 src/test/regress/expected/foreign_key.out | 8 ++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index d575aa0066..b7a2a356b4 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2249,8 +2249,9 @@ pg_get_constraintdef_worker(Oid constraintId, bool fullCommand,
 
 				/* add foreign relation name */
 				appendStringInfo(&buf, ") REFERENCES %s(",
-								 generate_relation_name(conForm->confrelid,
-														NIL));
+								 (prettyFlags & PRETTYFLAG_SCHEMA) ?
+								 generate_relation_name(conForm->confrelid, NIL) :
+								 generate_qualified_relation_name(conForm->confrelid));
 
 				/* Fetch and build referenced-column list */
 				val = SysCacheGetAttr(CONSTROID, tup,
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index da26f083bc..a70f7c2491 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -775,10 +775,10 @@ CREATE TABLE FKTABLE (
   FOREIGN KEY (tid, fk_id_del_set_default) REFERENCES PKTABLE
     ON DELETE SET DEFAULT (fk_id_del_set_default)
 );
 SELECT pg_get_constraintdef(oid) FROM pg_constraint WHERE conrelid = 'fktable'::regclass::oid ORDER BY oid;
-                                            pg_get_constraintdef
----------------------------------------------------------------------------------------------------------------------
- FOREIGN KEY (tid, fk_id_del_set_null) REFERENCES pktable(tid, id) ON DELETE SET NULL (fk_id_del_set_null)
- FOREIGN KEY (tid, fk_id_del_set_default) REFERENCES pktable(tid, id) ON DELETE SET DEFAULT (fk_id_del_set_default)
+                                                pg_get_constraintdef
+-----------------------------------------------------------------------------------------------------------------------------
+ FOREIGN KEY (tid, fk_id_del_set_null) REFERENCES public.pktable(tid, id) ON DELETE SET NULL (fk_id_del_set_null)
+ FOREIGN KEY (tid, fk_id_del_set_default) REFERENCES public.pktable(tid, id) ON DELETE SET DEFAULT (fk_id_del_set_default)
 (2 rows)
 INSERT INTO PKTABLE VALUES (1, 0), (1, 1), (1, 2);
-- 
2.34.0
Re: log_line_prefix: make it possible to add the search_path
On Mon, Jul 25, 2022 at 12:38 AM Pierre Ducroquet wrote:
> usecase by not showing the schema, one of them being log_line_prefix.
> It is possible to work around this using the application_name, but a mistake
> on the application side would be fatal, while the search_path would still
> indicate the real tables used in a query.

I'm assuming this is mostly referring to STATEMENT log lines and other
situations where the original query is output (e.g. auto_explain).

+1 on the benefit of solving this (I've had this use case before), but I
think we can keep this more specific than a general log_line_prefix option.
The search_path isn't relevant to any log line that doesn't reference a
query, since e.g. autovacuum log output fully qualifies its relation names,
and many other common log lines have nothing to do with tables or queries.

What if we instead had something like this, as an extra CONTEXT (or DETAIL)
log line:

LOG: duration: 4079.697 ms  execute : SELECT * FROM x WHERE y = $1 LIMIT $2
DETAIL: parameters: $1 = 'long string', $2 = '1'
CONTEXT: settings: search_path = 'my_tenant_schema, "$user", public'

That way you could determine that the slow query was affecting the "x" table
in "my_tenant_schema".

This log output would be controlled by a new GUC, e.g.
"log_statement_search_path" with three settings: (1) never, (2) non_default,
(3) always. The default would be "never" (same as today). "non_default"
would output the search path when a SET has modified it in the current
session (and so we couldn't infer it from the config or the role/database
overrides). "always" would always output the search path for
statement-related log lines.

Thanks,
Lukas

--
Lukas Fittl
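To make the proposal a bit more concrete, usage could look roughly like this
(log_statement_search_path is the hypothetical GUC proposed above - it does
not exist today, and the exact log format is illustrative only):

  -- postgresql.conf (hypothetical setting):
  --   log_statement_search_path = 'non_default'

  -- A session that overrides the search_path would then get the extra
  -- CONTEXT line on its statement / slow-query log output:
  SET search_path = my_tenant_schema, "$user", public;
  SELECT * FROM x WHERE y = 'abc' LIMIT 1;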
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
On Fri, Jun 12, 2020 at 4:28 PM Andres Freund wrote:
> The attached patches are really just a prototype. I'm also not really
> planning to work on getting this into a "production ready" patchset
> anytime soon. I developed it primarily because I found it the overhead
> made it too hard to nail down in which part of a query tree performance
> changed. If somebody else wants to continue from here...
>
> I do think it's be a pretty significant improvement if we could reduce
> the timing overhead of EXPLAIN ANALYZE by this much. Even if requires a
> bunch of low-level code.

Based on an off-list conversation with Andres, I decided to dust off this old
patch for using rdtsc directly. The significant EXPLAIN ANALYZE performance
improvements (especially when using rdtsc instead of rdtsc*p*) seem to
warrant giving this a more thorough look.

See attached an updated patch (adding it to the July commitfest), with a few
changes:

- Keep using clock_gettime() as a fallback if we decide to not use rdtsc
- Fall back to /proc/cpuinfo for the clock frequency, if cpuid(0x16) doesn't
  work
- The decision to use rdtsc (or not) is made at runtime, in the new
  INSTR_TIME_INITIALIZE() -- we can't make this decision at compile time
  because this is dependent on the specific CPU in use, amongst other things
- In an abundance of caution, for now I've decided to only enable this if we
  are on Linux/x86, and the current kernel clocksource is TSC (the kernel has
  quite sophisticated logic around making this decision, see [1])

Note that if we implemented the decision logic ourselves (instead of relying
on the Linux kernel), I'd be most worried about older virtualization
technology. In my understanding getting this right is notably more
complicated than just checking cpuid, see [2].

Known WIP problems with this patch version:

* There appears to be a timing discrepancy I haven't yet worked out, where
  the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
  reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms
  higher for \timing than for the EXPLAIN ANALYZE time reported on the
  server side, only when rdtsc measurement is used -- it's likely there is a
  problem somewhere with how we perform the cycles to time conversion
* Possibly related, the floating point handling for the cycles_to_sec
  variable is problematic in terms of precision (see FIXME, taken over from
  Andres' POC)

Open questions from me:

1) Do we need to account for different TSC offsets on different CPUs in SMP
   systems? (the Linux kernel certainly has logic to that extent, but [3]
   suggests this is no longer a problem on Nehalem and newer chips, i.e.
   those having an invariant TSC)
2) Should we have a setting "--with-tsc" for configure? (instead of always
   enabling it when on Linux/x86 with a TSC clocksource)
3) Are there cases where we actually want to use rdtsc*p*? (i.e. wait for
   current instructions to finish -- the prior discussion seemed to suggest
   we don't want it for node instruction measurements, but possibly we do
   want this in other cases?)
4) Should we support using the "mrs" instruction on ARM? (which is similar
   to rdtsc, see [4])

Thanks,
Lukas

[1] https://github.com/torvalds/linux/blob/master/arch/x86/kernel/tsc.c
[2] http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/
[3] https://stackoverflow.com/a/11060619/1652607
[4] https://cpufun.substack.com/p/fun-with-timers-and-cpuid

--
Lukas Fittl

v2-0002-WIP-Use-cpu-reference-cycles-via-rdtsc-to-measure.patch
Description: Binary data

v2-0001-WIP-Change-instr_time-to-just-store-nanoseconds-t.patch
Description: Binary data
Re: shared-memory based stats collector - v70
On Wed, Apr 6, 2022 at 12:34 PM Justin Pryzby wrote:
> On Wed, Apr 06, 2022 at 12:27:34PM -0700, Andres Freund wrote:
> > > > + next use of statistical information will cause a new snapshot to be built
> > > > + or accessed statistics to be cached.
> > >
> > > I believe this should be an "and", not an "or". (next access builds both a
> > > new snapshot and caches accessed statistics)
> >
> > I *think* or is correct? The new snapshot is when stats_fetch_consistency =
> > snapshot, the cached is when stats_fetch_consistency = cache. Not sure how to
> > make that clearer without making it a lot longer. Suggestions?
>
> I think it's correct. Maybe it's clearer to say:
>
> + next use of statistical information will (when in snapshot mode) cause a new snapshot to be built
> + or (when in cache mode) accessed statistics to be cached.

Ah, yes, that does clarify what was meant. +1 to Justin's edit, or something
like it.

Thanks,
Lukas

--
Lukas Fittl
Re: shared-memory based stats collector - v70
On Tue, Apr 5, 2022 at 8:00 PM Andres Freund wrote:
> Here comes v70:

Some small nitpicks on the docs:

> From 13090823fc4c7fb94512110fb4d1b3e86fb312db Mon Sep 17 00:00:00 2001
> From: Andres Freund
> Date: Sat, 2 Apr 2022 19:38:01 -0700
> Subject: [PATCH v70 14/27] pgstat: update docs.
> ...
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml

> - These parameters control server-wide statistics collection features.
> - When statistics collection is enabled, the data that is produced can be
> + These parameters control server-wide cumulative statistics system.
> + When enabled, the data that is collected can be

Missing "the" ("These parameters control the server-wide cumulative
statistics system").

> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

> + any of the accumulated statistics, acessed values are cached until the end

"acessed" => "accessed"

> + stats_fetch_consistency can be set
> + snapshot, at the price of increased memory usage for

Missing "to" ("can be set to snapshot")

> + caching not-needed statistics data.  Conversely, if it's known that statistics

Double space between "data." and "Conversely" (not sure if that matters)

> + current transaction's statistics snapshot or cached values (if any).  The

Double space between "(if any)." and "The" (not sure if that matters)

> + next use of statistical information will cause a new snapshot to be built
> + or accessed statistics to be cached.

I believe this should be an "and", not an "or". (next access builds both a
new snapshot and caches accessed statistics)

Thanks,
Lukas

--
Lukas Fittl
Re: [PATCH] Add extra statistics to explain for Nested Loop
On Sun, Nov 21, 2021 at 8:55 PM Justin Pryzby wrote:
> I'm curious to hear what you and others think of the refactoring.
>
> It'd be nice if there's a good way to add a test case for verbose output
> involving parallel workers, but the output is unstable ...

I've reviewed this patch, and it works as expected - the refactoring changes
by Justin also appear to make sense to me.

I've briefly thought about whether this needs documentation (currently the
patch includes none), but there does not appear to be a good place to add
documentation about this from a quick glance, so it seems acceptable to leave
this out given the lack of more detailed EXPLAIN documentation in general.

The one item that still feels a bit open to me is benchmarking, based on
Andres' comment a while ago:

On Mon, Oct 19, 2020 at 4:20 PM Andres Freund wrote:
> I'm a bit worried that further increasing the size of struct
> Instrumentation will increase the overhead of EXPLAIN ANALYZE further -
> in some workloads we spend a fair bit of time in code handling that. It
> would be good to try to find a few bad cases, and see what the overhead is.

Whilst no specific bad cases were provided, I wonder if even a simple pgbench
with auto_explain (and log_analyze=1) would be a way to test this? The
overhead of the Instrumentation struct size should show regardless of whether
a plan actually includes a Nested Loop.

Thanks,
Lukas

--
Lukas Fittl
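For example, a minimal auto_explain setup along those lines might be (a
sketch only - for an actual pgbench run these would go into postgresql.conf,
together with shared_preload_libraries = 'auto_explain'):

  LOAD 'auto_explain';
  SET auto_explain.log_min_duration = 0;  -- log every statement
  SET auto_explain.log_analyze = on;      -- enable per-node instrumentation
  SET auto_explain.log_timing = on;       -- include timing, the costly part

Then compare pgbench latency/throughput with and without the patch applied.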
Re: PostgreSQL 13 Beta 1 Release Announcement Draft
On Mon, May 18, 2020 at 7:29 PM Jonathan S. Katz wrote:
> Attached is a draft of the release announcement for the PostgreSQL 13
> Beta 1 release this week.

We could call out the additional commits that Tom has done for wait event
renaming, re: compatibility - next to "Rename some recovery-related wait
events". These are the relevant commits, I think:

Rename SLRU structures and associated LWLocks.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5da14938f7bfb96b648ee3c47e7ea2afca5bcc4a

Rename assorted LWLock tranches.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=36ac359d3621578cefc2156a3917024cdd3b1829

Drop the redundant "Lock" suffix from LWLock wait event names.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=14a91010912632cae322b06fce0425faedcf7353

Mop-up for wait event naming issues.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3048898e73c75f54bb259323382e0e7f6368cb6f

Thanks,
Lukas

--
Lukas Fittl
Procedure calls are not tracked in pg_stat_user_functions / track_functions
Hi all,

It seems that currently procedures do not get tracked when track_functions
is enabled, which means one needs to resort to other workarounds in order to
monitor procedure calls/runtime. To illustrate:

=# SHOW track_functions;
┌─────────────────┐
│ track_functions │
├─────────────────┤
│ all             │
└─────────────────┘
(1 row)

=# CALL abc();
CALL

=# SELECT def();
┌─────┐
│ def │
├─────┤
│     │
└─────┘
(1 row)

=# SELECT * FROM pg_stat_user_functions;
┌─[ RECORD 1 ]──┬────────┐
│ funcid        │ 75223  │
│ schemaname    │ public │
│ funcname      │ def    │
│ calls         │ 1      │
│ total_time    │ 3.222  │
│ self_time     │ 3.222  │
└───────────────┴────────┘

Was this intentional, or an oversight? If welcome, I would be happy to work
on a patch. Whilst slightly confusing in terms of naming, we could just track
this together with functions, since one can always join with pg_proc to
determine whether something is a function or a procedure.

Thanks,
Lukas

--
Lukas Fittl
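For reference, the pg_proc join mentioned above could look roughly like this
(a sketch - prokind is 'f' for regular functions and 'p' for procedures):

  SELECT s.schemaname, s.funcname, p.prokind, s.calls, s.total_time, s.self_time
    FROM pg_stat_user_functions s
    JOIN pg_proc p ON p.oid = s.funcid;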
Re: Query is over 2x slower with jit=on
On Tue, Sep 25, 2018 at 1:17 PM, Andres Freund wrote:
> I've pushed the change without that bit - it's just a few additional
> lines if we want to change that.

It seems that since this commit, JIT statistics are now only being printed
for parallel query plans. This is due to ExplainPrintJIT being called before
ExecutorEnd, so in a non-parallel query,
queryDesc->estate->es_jit_combined_instr will never get set.

Attached is a patch that instead introduces a new ExplainPrintJITSummary
method that summarizes the statistics before they get printed.

I've also removed an (I believe) unnecessary "if (estate->es_instrument)"
check that prevented EXPLAIN without ANALYZE from showing whether JIT would
be used or not.

In addition this also updates a missed section in the documentation with the
new stats output format.

Best,
Lukas

--
Lukas Fittl

fix-jit-statistics-in-EXPLAIN-for-non-parallel-queries-v1.patch
Description: Binary data
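A quick way to observe the EXPLAIN-without-ANALYZE part of this (forcing JIT
to trigger by lowering its cost threshold; the query is arbitrary and the
exact plan output will vary):

  SET jit = on;
  SET jit_above_cost = 0;
  EXPLAIN SELECT sum(relpages) FROM pg_class;
  -- With the patch, the plan output includes a "JIT:" section even though
  -- ANALYZE was not requested.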
Re: auto_explain: Include JIT output if applicable
On Mon, Sep 24, 2018 at 1:48 PM, Andres Freund wrote:
> Thanks for noticing - pushed!

Thanks!

Best,
Lukas

--
Lukas Fittl
auto_explain: Include JIT output if applicable
Hi,

Whilst playing around with auto_explain and JIT today, I realized that
auto_explain currently doesn't output JIT information, which is rather
unfortunate when analyzing a larger set of queries in a semi-automated
manner.

Attached is a trivial patch that fixes the issue and adds JIT information to
auto_explain with the same logic as used for regular EXPLAIN.

Thanks,
Lukas

--
Lukas Fittl

auto_explain-include-jit-output-v1.patch
Description: Binary data
Re: [HACKERS] Planning counters in pg_stat_statements
On Tue, Jan 23, 2018 at 3:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Unfortunately I'm not going to have bandwidth to figure this out
> > during this commitfest due to other priorities so I'm going to call
> > this "returned with feedback".
>
> OK. There's still time to get it done in the March 'fest.

I've had an interesting incident earlier this week where seeing planning
time in pg_stat_statements would have been very helpful to determine the
root cause more quickly.

Essentially the situation was a statement that executed in < 1 ms but took
more than 250 ms to plan, running often - once we found the statement (using
pg_stat_activity sampling) we were able to fix it quickly by rewriting the
query, and reduce system load from ~8 to ~1.

Having this patch in pg_stat_statements would have been critical to get the
full picture of what was going on earlier.

Thomas: I'm not as familiar with planner internals as you are, but happy to
try and contribute here. Would it be useful for me to try to adjust the
patch to Tom's proposal?

Best,
Lukas

--
Lukas Fittl
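For context, the pg_stat_activity sampling we fell back on looked roughly
like this, run repeatedly on a short interval (a sketch - planning-heavy
statements only show up briefly as active, which is why sampling was needed):

  SELECT now() AS sample_time, pid, state, wait_event_type, wait_event, query
    FROM pg_stat_activity
   WHERE state = 'active' AND pid <> pg_backend_pid();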