> Any concerns with doing something like this [0] for the back-branches? The
> constant would be 6 instead of 7 on v14 through v16.
As far as backpatching the present inconsistencies in the docs,
[0] looks good to me.
[0]
> postgres -D pgdev-dev -c shared_buffers=16MB -C
> shared_memory_size_in_huge_pages
> 13
> postgres -D pgdev-dev -c shared_buffers=16MB -c huge_page_size=1GB -C
> shared_memory_size_in_huge_pages
> 1
> Which is very useful to be able to actually configure that number of huge
> pages. I don't
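For reference, the reported number is just a ceiling division of the total shared memory size by the huge page size. A minimal sketch, assuming a total shared memory segment of roughly 25MB for the shared_buffers=16MB case above (the exact total is not shown in the output):

```python
import math

def shared_memory_size_in_huge_pages(total_shmem_mb: int, huge_page_size_mb: int) -> int:
    """Number of huge pages needed to back the whole shared memory segment."""
    return math.ceil(total_shmem_mb / huge_page_size_mb)

# Assumed total of ~25MB with shared_buffers=16MB (the rest is other shmem).
print(shared_memory_size_in_huge_pages(25, 2))     # default 2MB huge pages -> 13
print(shared_memory_size_in_huge_pages(25, 1024))  # huge_page_size=1GB -> 1
```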
>>> If the exact values
>>> are important, maybe we could introduce more GUCs like
>>> shared_memory_size_in_huge_pages that can be consulted (instead of
>>> requiring users to break out their calculators).
>>
>> I don't especially like shared_memory_size_in_huge_pages, and I don't
>> want to
> I'm unsure if it needs to call ExecutorStart in the bind code. But if we
> don't change the current logic, would it make more sense to move
> pgstat_report_query_id to the ExecutorRun routine?
I initially thought about that, but for utility statements (CTAS, etc.) being
executed with extended
>>> That's true, but using a hard-coded limit means we no longer need to add a
>>> new GUC. Always allocating, say, 256 slots might require a few additional
>>> kilobytes of shared memory, most of which will go unused, but that seems
>>> unlikely to be a problem for the systems that will run
> Okay, that's precisely what I wanted to understand: queryId doesn't have
> the semantics to show which job is consuming resources right now; it is
> mostly a convenience for knowing that the backend is processing nothing
> except (probably) this query.
It may be a good idea to expose in pg_stat_activity
> I discovered the current state of queryId reporting and found that it
> may be illogical: Postgres resets queryId right before query execution
> in simple protocol and doesn't reset it at all in extended protocol and
> other ways to execute queries.
In exec_parse_message, exec_bind_message and
> This is about how I feel, too. In any case, I +1'd a higher default
> because I think we need to be pretty conservative with these changes, at
> least until we have a better prioritization strategy. While folks may opt
> to set this value super low, I think that's more likely to lead to some
>
> That's true, but using a hard-coded limit means we no longer need to add a
> new GUC. Always allocating, say, 256 slots might require a few additional
> kilobytes of shared memory, most of which will go unused, but that seems
> unlikely to be a problem for the systems that will run Postgres v18.
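The "few additional kilobytes" estimate is easy to sanity-check. With an assumed per-slot bookkeeping struct of 16 bytes (the actual struct size is a guess here), 256 slots cost about 4kB:

```python
SLOTS = 256
PER_SLOT_BYTES = 16  # assumed size of each slot's bookkeeping struct

total_kb = SLOTS * PER_SLOT_BYTES / 1024
print(f"{total_kb:.1f} kB of shared memory for {SLOTS} slots")  # 4.0 kB
```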
> And as far as that goes, I'd like you - and others - to spell out more
> precisely why you think 100 or 200 million tuples is too much. It
> might be, or maybe it is in some cases but not in others. To me,
> that's not a terribly large amount of data. Unless your tuples are
> very wide, it's a
I've been following this discussion and would like to add my
2 cents.
> Unless I'm missing something major, that's completely bonkers. It
> might be true that it would be a good idea to vacuum such a table more
> often than we do at present, but there's no shot that we want to do it
> that much
Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.
[1]
https://www.postgresql.org/message-id/724348C9-8023-41BC-895E-80634E79A538%40amazon.com
Regards,
Sami
>> But simplistic case with a prepared statement shows how the value of
>> queryId can be changed if you don't acquire all the objects needed for
>> the execution:
>> CREATE TABLE test();
>> PREPARE name AS SELECT * FROM test;
>> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
>> DROP
> We chose an arguably more user-friendly option:
> https://www.postgresql.org/docs/current/sql-prepare.html
Thanks for pointing this out!
Regards,
Sami
> But simplistic case with a prepared statement shows how the value of
> queryId can be changed if you don't acquire all the objects needed for
> the execution:
> CREATE TABLE test();
> PREPARE name AS SELECT * FROM test;
> EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
> DROP TABLE test;
>
> I am also a bit surprised with the choice of using the first Query
> available in the list for the ID, FWIW.
IIUC, the query trees returned from QueryRewrite
will all have the same queryId, so it appears valid to
use the queryId from the first tree in the list. Right?
Here is an example I
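The assumption that all rewritten trees carry the same queryId can be verified cheaply before picking the first one. A sketch with a stand-in Query type (the real trees would come from QueryRewrite):

```python
from dataclasses import dataclass

@dataclass
class Query:
    query_id: int  # stand-in for the real Query node's queryId field

def query_id_for_list(query_list):
    """Return the first tree's queryId, verifying that all trees agree."""
    ids = {q.query_id for q in query_list}
    assert len(ids) == 1, "rewritten trees carry different queryIds"
    return query_list[0].query_id

print(query_id_for_list([Query(42), Query(42), Query(42)]))  # -> 42
```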
> FWIW, I'd like to think that we could improve the situation, requiring
> a mix of calling pgstat_report_query_id() while feeding on some query
> IDs retrieved from CachedPlanSource->query_list. I have not in
> details looked at how much could be achieved, TBH.
I was dealing with this today and
> Part of the underlying problem here is that, AFAIK, neither PostgreSQL
> as a piece of software nor we as human beings who operate PostgreSQL
> databases have much understanding of how autovacuum_max_workers should
> be set. It's relatively easy to hose yourself by raising
>
> Here is a first attempt at a proper patch set based on the discussion thus
> far. I've split it up into several small patches for ease of review, which
> is probably a bit excessive. If this ever makes it to commit, they could
> likely be combined.
I looked at the patch set. With the help of
> Another option could be to just remove the restart-only GUC and hard-code
> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
> that would simplify matters, I suspect it would be hard to choose an
> appropriate limit that won't quickly become outdated.
Hardcoded values
> IIRC using GUC hooks to handle dependencies like this is generally frowned
> upon because it tends to not work very well [0]. We could probably get it
> to work for this particular case, but IMHO we should still try to avoid
> this approach.
Thanks for pointing this out. I agree, this could
>> 1/ We should emit a log when autovacuum_workers is set higher than the max.
>> Hm. Maybe the autovacuum launcher could do that.
Would it be better to use a GUC check_hook that compares the
new value with the max allowed value and emits a WARNING?
autovacuum_max_workers already has a
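A check_hook that accepts the value but warns when it exceeds the cap could behave roughly like this sketch (the cap name autovacuum_worker_slots and the message wording are assumptions, not the actual patch):

```python
import warnings

AUTOVACUUM_WORKER_SLOTS = 16  # assumed restart-only cap on worker slots

def check_autovacuum_max_workers(new_value: int) -> bool:
    """GUC-style check hook: accept the setting, but warn if it exceeds the cap."""
    if new_value > AUTOVACUUM_WORKER_SLOTS:
        warnings.warn(
            f"autovacuum_max_workers ({new_value}) exceeds "
            f"autovacuum_worker_slots ({AUTOVACUUM_WORKER_SLOTS}); "
            f"only {AUTOVACUUM_WORKER_SLOTS} workers can run"
        )
    return True  # the value itself remains valid

check_autovacuum_max_workers(32)  # warns; the setting still takes effect
```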
I spent some time reviewing/testing the POC. It is relatively simple with a lot
of obvious value.
I tested with 16 tables that constantly reach the autovacuum threshold, and the
patch did the right thing. I observed concurrent autovacuum workers matching
the setting as I was adjusting it.
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting
That's a good point; the current settings do not guarantee that you
get a worker for a given purpose if none are available,
e.g. max_parallel_workers_per_gather,
> I frequently hear about scenarios where users with thousands upon thousands
> of tables realize that autovacuum is struggling to keep up. When they
> inevitably go to bump up autovacuum_max_workers, they discover that it
> requires a server restart (i.e., downtime) to take effect, causing
> The original \conninfo was designed to report values from the libpq API
> about what libpq connected to. And the convention in psql is that "+"
> provide more or less the same information but a bit more. So I think it
> is wrong to make "\conninfo+" something fundamentally different than
>
> The point about application_name is a valid one. I guess it's there
> because it's commonly given from the client side rather than being set
> server-side, even though it's still a GUC. Arguably we could remove it
> from \conninfo+, and claim that nothing that shows up in \dconfig should
> also
Building the docs fails for v26. The error is:
ref/psql-ref.sgml:1042: element member: validity error : Element term is not
declared in member list of possible children
^
I am able to build up to v24 before the was replaced with
I tested building with a
> Here I considered your suggestion (Sami and Álvaro's). However, I haven't yet
> added the links for the functions system_user(), current_user(), and
> session_user().
> I'm not sure how to do it. Any suggestion on how to create/add the link?
Here is an example [1] where the session information
> That minor point aside, I disagree with Sami about repeating the docs
> for system_user() here. I would just say "The authentication data
> provided for this connection; see the function system_user() for more
> details."
+1. FWIW, up the thread [1] I did mention we should link to the
> However,
> I didn't complete item 4. I'm not sure, but I believe that linking it to the
> documentation
> could confuse the user a bit. I chose to keep the descriptions as they were.
> However, if
> you have any ideas on how we could outline it, let me know and perhaps we can
> implement it.
Thank you for the updated patch.
First and foremost, thank you very much for the review.
> The initial and central idea was always to keep the metacommand
> "\conninfo" in its original state, that is, to preserve the string as it is.
> The idea of "\conninfo+" is to expand this to include more
Hi,
Thanks for working on this.
I have a few comments about the current patch.
1/ I looked through other psql meta-commands, and the “+” adds details but
does not change the output format. In this patch, \conninfo and \conninfo+
have completely different output formats. The former is a string with all the
> What is the actual
> use case for such a setting?
I don't have exact details on the use-case, but this is not a common one.
> Doesn't it risk security problems?
I cannot see how setting it on the database is more problematic than
setting it at the session level.
> I'm rather
Hi,
A recent case in the field: a database's session_authorization was
altered to a non-superuser, non-owner of tables via ALTER DATABASE .. SET
session_authorization .., which caused autovacuum to skip tables.
The issue was discovered on 13.10, and the logs show such messages:
warning:
> So, I've spent more time on that and applied the simplification today,
> doing as you have suggested to use the head page rather than the tail
> page when the tail XID is ahead of the head XID, but without disabling
> the whole. I've simplified a bit the code and the comments, though,
> while on
> I believe both cumulative statistics and logs are needed. Logs excel in
> pinpointing specific queries at precise times, while statistics provide
> a broader overview of the situation. Additionally, I often encounter
> situations where clients lack pg_stat_statements and can't restart their
Sorry for the delay in response.
> Back then, we were pretty much OK with the amount of space that could
> be wasted even in this case. Actually, how much space are we talking
> about here when a failed truncation happens?
It is a transient waste of space, as it will eventually be cleaned up.
> As
>> Currently explain ( analyze ) will give you the "Workers Planned"
>> and "Workers launched". Logging this via auto_explain is possible, so I am
>> not sure we need additional GUCs or debug levels for this info.
>>
>> -> Gather (cost=10430.00..10430.01 rows=2 width=8) (actual tim
>>
Hi,
This thread has been quiet for a while, but I'd like to share some
thoughts.
+1 to the idea of improving visibility into parallel worker saturation.
But overall, we should improve parallel processing visibility so DBAs can
detect trends in parallel usage (is the workload doing more
On ... 2023, at 6:29 PM, Imseih (AWS), Sami wrote:
>
> I added an additional condition to make sure that the tailPage precedes the
> headPage
> as well.
> I think the smallest fix here would be to change CheckPointPredicate()
> so that if tailPage > headPage, pass headPage to SimpleLruTruncate()
> instead of tailPage. Or perhaps it should go into the "The SLRU is no
> longer needed" codepath in that case. If tailPage > headPage, the SLRU
> isn't
> I don't really understand what exactly the problem is, or how this fixes
> it. But this doesn't feel right:
As the repro shows, false reports of the “pg_serial: apparent wraparound”
message are possible on a very busy system that checkpoints frequently
and makes heavy use of serializable isolation,
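The fix discussed here amounts to clamping the truncation cutoff. A sketch of the intent, with plain integers standing in for SLRU page numbers:

```python
def truncation_cutoff(tail_page: int, head_page: int) -> int:
    """If the tail has apparently moved past the head (e.g. after a failed
    truncation), truncate up to the head instead, so the false 'apparent
    wraparound' report is avoided."""
    return head_page if tail_page > head_page else tail_page

assert truncation_cutoff(5, 10) == 5    # normal: tail behind head
assert truncation_cutoff(12, 10) == 10  # tail ahead: clamp to head
```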
> Still this grouping is much better than having thousands of entries
> with different values. I am not sure if we should bother improving
> that more than what you suggest that, especially as FuncExpr->args can
> itself include Const nodes as far as I recall.
I agree.
> As far as the SET
Hi,
The proposal by Bertrand (in CC) to jumble CALL and SET in [1] was
rejected at the time in favor of a more robust solution for jumbling DDL.
Michael (also in CC) made this possible with commit 3db72ebcbe.
The attached patch takes advantage of the jumbling infrastructure
added in the above-mentioned commit
> This looks mostly fine to me modulo "sort or hash". I do see many
> instances of "and/or" in the docs. Maybe that would work better.
"sort or hash operations at the same time" is a clear explanation IMO.
This latest version of the patch looks good to me.
Regards,
Sami
> Here is a new version of the patch that avoids changing the names of the
> existing functions. I'm not thrilled about the name
> (pgstat_fetch_stat_local_beentry_by_backend_id), so I am open to
> suggestions. In any case, I'd like to rename all three of the
> pgstat_fetch_stat_* functions in
I tested the patch and it does the correct thing.
I have a few comments:
1/ cast the return of bsearch. This was done previously and is the common
convention in the code.
So
+ return bsearch(, localBackendStatusTable, localNumBackends,
+
Attached a patch with a new CF entry: https://commitfest.postgresql.org/44/4516/
Regards,
Sami Imseih
Amazon Web Services (AWS)
0001-v1-Fix-false-report-of-wraparound-in-pg_serial.patch
blob/master/src/backend/access/transam/slru.c#L306
Regards,
Sami
From: "Imseih (AWS), Sami"
Date: Tuesday, August 22, 2023 at 7:56 PM
To: "pgsql-hack...@postgresql.org"
Subject: False "pg_serial": apparent wraparound” in logs
Hi,
I recently encountered a situation in the field in which the message
“could not truncate directory "pg_serial": apparent wraparound”
was logged even though there was no danger of wraparound. This
was on a brand new cluster, and it only took a few minutes to see
the message in the logs.
Reading
Hi,
Sorry for the delay in response and thanks for the feedback!
> I've reviewed and built the documentation for the updated patch. As it stands
> right now I think the documentation for this section is quite clear.
Sorry, I am not following. What is clear? The current documentation, or
Hi,
While recently looking into partition maintenance, I found a case in which
DETACH PARTITION FINALIZE could cause deadlocks. This occurs when an
ALTER TABLE DETACH CONCURRENTLY is followed by a cancel, then followed by
an ALTER TABLE DETACH FINALIZE, and this sequence of steps occurs in the
middle
> I'm sorry for not having read (and not reading) the other thread yet,
> but what was the reason we couldn't store that oid in a column in the
> pg_s_p_vacuum-view?
> Could you summarize the other solutions that were considered for this issue?
Thanks for your feedback!
The reason we cannot
Hi,
[1] is a ready-for-committer enhancement to pg_stat_progress_vacuum which
exposes
the total number of indexes to vacuum and how many indexes have been vacuumed in
the current vacuum cycle.
To even further improve visibility into index vacuuming, it would be beneficial
to have a
function
> So it's lacking a rule to tell it what to do in this case, and the
> default is the wrong way around. I think we need to fix it in
> about the same way as the equivalent case for matviews, which
> leads to the attached barely-tested patch.
Thanks for the patch! A test on the initially reported
Hi,
What appears to be a pg_dump/pg_restore bug was observed with the new
BEGIN ATOMIC function body syntax introduced in Postgres 14.
Dependencies inside a BEGIN ATOMIC function cannot be resolved
if those dependencies are dumped after the function body. The repro
case is when a primary key
Based on the feedback, here is a v1 of the suggested doc changes.
I modified Gurjeet's suggestion slightly to make it clear that a specific
query execution could have operations simultaneously using up to
work_mem.
I also added the small hash table memory limit clarification.
Regards,
Sami
> > especially since the next sentence uses "concurrently" to describe the
> > other case. I think we need a more thorough rewording, perhaps like
> >
> > - Note that for a complex query, several sort or hash operations
> > might be
> > - running in parallel; each operation will
Hi,
I recently noticed the following in the work_mem [1] documentation:
“Note that for a complex query, several sort or hash operations might be
running in parallel;”
The use of “parallel” here is misleading, as this has nothing to do with
parallel query but rather with several operations in a
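The point is about concurrency within one query plan, not parallel workers: each sort or hash node may use up to work_mem at the same time. A back-of-the-envelope sketch:

```python
def worst_case_query_memory_mb(work_mem_mb: int, concurrent_sort_hash_ops: int) -> int:
    """Each sort/hash operation may use up to work_mem before spilling to disk,
    so a single query's worst case is roughly the product."""
    return work_mem_mb * concurrent_sort_hash_ops

# e.g. work_mem = 64MB and a plan with 4 simultaneous sort/hash nodes
print(worst_case_query_memory_mb(64, 4))  # -> 256 (MB)
```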
> This should be OK (also checked the code paths where the reports are
> added). Note that the patch needed a few adjustments for its
> indentation.
Thanks for the formatting corrections! This looks good to me.
--
Sami
> + case 'P': /* Parallel progress reporting */
I kept this comment as-is, but inside the case code block I added
more comments. This avoids cluttering the one-line comment.
> + * Increase and report the number of index scans. Also, we reset the progress
> + * counters.
> The counters
> The arguments of pgstat_progress_update_param() would be given by the
> worker directly as components of the 'P' message. It seems to me that
> this approach would have the simplicity to not require the setup of a
> shmem area for the extra counters, and there would be no need for a
> callback.
> As one thing,
> for example, it introduces a dependency to parallel.h to do progress
> reporting without touching at backend_progress.h.
Containing the logic in backend_progress.h is a reasonable point
from a maintenance standpoint.
We can create a new function in backend_progress.h called
> Makes sense to me. I'll look at that again today, potentially apply
> the fix on HEAD.
Here is v6. That was my mistake not to zero out the es_total_processed.
I had it in the first version.
--
Regards,
Sami Imseih
Amazon Web Services (AWS)
> > The key point of the patch is here. From what I understand based on
> > the information of the thread, this is used as a way to make the
> > progress reporting done by the leader more responsive so as we'd
> > update the index counters each time the leader is poked at with a 'P'
> > message by
> Doing nothing for calls now is fine by me, though I
> agree that this could be improved at some point, as seeing only 1
> rather than N for each fetch depending on the size is a bit confusing.
I think we will need to clearly define what "calls" is. Perhaps as mentioned
above, we may need
> I was looking back at this thread, and the suggestion to use one field
> in EState sounds fine to me. Sami, would you like to send a new
> version of the patch (simplified version based on v1)?
Here is v4.
The "calls" tracking is removed from EState. Unlike v1, however,
I added a check for the
> Maybe, but is there any field demand for that?
I don't think there is.
> We clearly do need to fix the
> reported rowcount for cases where ExecutorRun is invoked more than
> once per ExecutorEnd call; but I think that's sufficient.
Sure, the original proposed fix, but with tracking the
> Why should that be the definition? Partial execution of a portal
> might be something that is happening at the driver level, behind the
> user's back. You can't make rational calculations of, say, plan
> time versus execution time if that's how "calls" is measured.
Correct, and there are also
> * Yeah, it'd be nice to have an in-core test, but it's folly to insist
> on one that works via libpq and psql. That requires a whole new set
> of features that you're apparently designing on-the-fly with no other
> use cases in mind. I don't think that will accomplish much except to
> ensure
> So... The idea here is to set a custom fetch size so as the number of
> calls can be deterministic in the tests, still more than 1 for the
> tests we'd have. And your point is that libpq enforces always 0 when
> sending the EXECUTE message causing it to always return all the rows
> for any
> I wonder that this patch changes the meaning of "calls" in the
> pg_stat_statement
> view a bit; previously it was "Number of times the statement was executed" as
> described in the documentation, but currently this means "Number of times the
> portal was executed". I'm worried that this makes
> How does JDBC test that? Does it have a dependency on
> pg_stat_statements?
No, at the start of the thread a sample JDBC script was attached.
But I agree, we need to add test coverage. See below.
>> But, I'm tempted to say that adding new tests could be addressed
>> separately though (as
> What about using an uint64 for calls? That seems more appropriate to me (even
> if
> queryDesc->totaltime->calls will be passed (which is int64), but that's
> already
> also the case for the "rows" argument and
> queryDesc->totaltime->rows_processed)
That's fair
> I'm not sure it's worth
> This indeed feels a bit more natural seen from here, after looking at
> the code paths using an Instrumentation in the executor and explain,
> for example. At least, this stresses me much less than adding 16
> bytes to EState for something restricted to the extended protocol when
> it comes to
Sorry about the delay in responding to this.
I was thinking about this, and it seems to me we can avoid
adding new fields to EState. I think a better place to track
rows and calls is in the Instrumentation struct.
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@
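The shape of the idea, accumulating rows across partial ExecutorRun calls and counting one "call" per ExecutorEnd, can be illustrated like this (field and function names are illustrative, not the actual executor API):

```python
class Instrumentation:
    """Stand-in for the executor's Instrumentation struct, extended with
    portal-level counters (illustrative field names)."""
    def __init__(self):
        self.rows_processed = 0
        self.calls = 0

def executor_run(instr: Instrumentation, rows_this_fetch: int) -> None:
    # Each Execute message fetches up to max_rows; accumulate, don't overwrite.
    instr.rows_processed += rows_this_fetch

def executor_end(instr: Instrumentation) -> None:
    # One logical execution of the portal, however many fetches it took.
    instr.calls += 1

instr = Instrumentation()
for fetched in (100, 100, 50):  # three Execute messages on the same portal
    executor_run(instr, fetched)
executor_end(instr)
print(instr.calls, instr.rows_processed)  # -> 1 250
```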
> If I remove this patch and recompile again, then "initdb -D $PGDATA" works.
It appears you must run "make clean; make install" to rebuild correctly
after applying the patch.
Regards,
Sami Imseih
Amazon Web Services (AWS)
> >>> Now I've a second thought: what do you think about resetting the related
> >>> number
> >>> of operations and *_time fields when enabling/disabling track_io_timing?
> >>> (And mention it in the doc).
> >>>
> >>> That way it'd prevent bad interpretation (at least as far the time per
> >>>
> > It's a bit annoying that the info is missing since pg 14, but we
> > probably can't
> > backpatch this as it might break log parser tools.
> What do you think?
That's a good point about log parsing tools, e.g. pgbadger.
Backpatching does not sound too appealing to me after
giving this a
I am wondering if this patch should be backpatched.
The reason is that the auto_explain documentation [1]
claims equivalence between the auto_explain.log_verbose
option and EXPLAIN (VERBOSE):
"... it's equivalent to the VERBOSE option of EXPLAIN."
This can be quite confusing for users
> Well, it is one of these areas where it seems to me we have never been
> able to put a definition on what should be the correct behavior when
> it comes to pg_stat_statements.
What needs to be defined here is how pgss should account for # of rows
processed when A) a select goes through
> +
> + Queries on which normalization can be applied may be observed with constant
> + values in pg_stat_statements, especially when there
> + is a high rate of entry deallocations. To reduce the likelihood of this
> + happening, consider increasing pg_stat_statements.max.
> + The
> I am OK with an addition to the documentation to warn that one may
> have to increase the maximum number of entries that can be stored if
> seeing a non-normalized entry that should have been normalized.
I agree. We introduce the concept of a plannable statement in a
previous section and we
> On Sat, Feb 25, 2023 at 01:59:04PM +0000, Imseih (AWS), Sami wrote:
>> The overhead of storing this additional private data for the life of the
>> query execution may not be desirable.
> Okay, but why?
Additional memory to maintain the JumbleSta
>> Could things be done in a more stable way? For example, imagine that
>> we have an extra Query field called void *private_data that extensions
>> can use to store custom data associated to a query ID, then we could
>> do something like that:
>> - In the post-analyze hook,
Replacing constants in pg_stat_statements is done on a best-effort basis.
It is not unlikely that, on a busy workload with heavy entry deallocation,
the user may observe a query with its constants in pg_stat_statements.
From what I can see, this is because the only time an entry is normalized is
Thanks!
> I think PROGRESS_VACUUM_INDEXES_TOTAL and
> PROGRESS_VACUUM_INDEXES_PROCESSED are better for consistency. The rest
> looks good to me.
Took care of that in v25.
Regards
--
Sami Imseih
Amazon Web Services
v25-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Thanks for the review!
>+
>+ ParallelVacuumFinish
>+ Waiting for parallel vacuum workers to finish index
>vacuum.
>+
>This change is out-of-date.
That was an oversight. Thanks for catching.
>Total number of indexes that will be vacuumed or
Hi,
Thanks for your reply!
I addressed the latest comments in v23.
1/ cleaned up the asserts as discussed.
2/ used pq_putmessage to send the message on index scan completion.
Thanks
--
Sami Imseih
Amazon Web Services (AWS)
v23-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Doing some work with extended query protocol, I encountered the same
issue that was discussed in [1]. It appears when a client is using
extended query protocol and sends an Execute message to a portal with
max_rows, and a portal is executed multiple times,
pg_stat_statements does not correctly
>Number of indexes that will be vacuumed or cleaned up. This counter only
>advances when the phase is vacuuming indexes or cleaning up indexes.
I agree, this reads better.
---
-/* Report that we are now vacuuming indexes */
-
Thanks for the feedback and I apologize for the delay in response.
>I think the problem here is that you're basically trying to work around the
>lack of an asynchronous state update mechanism between leader and workers.
> The
>workaround is to add a lot of different places that poll
> My point is whether we should show
> indexes_total throughout the vacuum execution (even also in not
> relevant phases such as heap scanning/vacuum/truncation).
That is a good point. We should show indexes_total
and indexes_completed only during the relevant phases.
V21 addresses this
>Similar to above three cases, vacuum can bypass index vacuuming if
>there are almost zero TIDs. Should we set indexes_total to 0 in this
>case too? If so, I think we can set both indexes_total and
>indexes_completed at the beginning of the index vacuuming/cleanup and
>reset
Thanks for the review!
Addressed the comments.
> "Increment the indexes completed." (dot at the end) instead?
Used the commenting format used in other places in this
file, with the inclusion of a double dash, i.e.
/* Wraparound emergency -- end current index scan */
> It seems to me that
> cirrus-ci.com/task/4557389261701120
I earlier compiled without --enable-cassert,
which is why the compilation errors did not show up in my
build.
Fixed in v19.
Thanks
--
Sami Imseih
Amazon Web Services: https://aws.amazon.com
>cfbot is complaining that this patch no longer applies. Sami, would you
>mind rebasing it?
Rebased patch attached.
--
Sami Imseih
Amazon Web Services: https://aws.amazon.com
v18-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
>I adjusted the FAILSAFE_EVERY_PAGES comments, which now point out that
>FAILSAFE_EVERY_PAGES is a power-of-two. The implication is that the
>compiler is all but guaranteed to be able to reduce the modulo
>division into a shift in the lazy_scan_heap loop, at the point of the
>
Attached is a patch to check scanned pages rather
than blockno.
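Checking against scanned pages keeps the failsafe cadence meaningful when blocks are skipped, and the power-of-two interval means the modulo reduces to a bit-mask. A sketch (the interval value is assumed for illustration):

```python
FAILSAFE_EVERY_PAGES = 4096  # assumed power-of-two interval

def should_check_failsafe(scanned_pages: int) -> bool:
    """Run the wraparound failsafe check every FAILSAFE_EVERY_PAGES *scanned*
    pages rather than by block number, so skipped blocks don't distort the
    cadence. For a power of two, x % N == x & (N - 1)."""
    return scanned_pages > 0 and (scanned_pages & (FAILSAFE_EVERY_PAGES - 1)) == 0

assert should_check_failsafe(4096)
assert not should_check_failsafe(4095)
assert should_check_failsafe(8192) == (8192 % FAILSAFE_EVERY_PAGES == 0)
```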
Regards,
Sami Imseih
Amazon Web Services (AWS)
v1-0001-fixed-when-wraparound-failsafe-is-checked.patch