Re: Outer cost higher than the inner cost
Stanisław Skonieczny writes:
> What bothers me is that the inner plan cost (7430940) is higher than the
> outer plan cost (1131699).

I think it is estimating (based on knowledge of the ranges of join keys in the two relations) that that input subplan won't need to be run to completion. See initial_cost_mergejoin in costsize.c.

regards, tom lane
Re: Query performance issue
Dheeraj Sonawane writes:
> While executing the join query on the postgres database we have observed
> sometimes randomly below query is being fired which is affecting our response
> time.
> Query randomly fired in the background:-
> SELECT p.proname,p.oid FROM pg_catalog.pg_proc p, pg_catalog.pg_namespace n
> WHERE p.pronamespace=n.oid AND n.nspname='pg_catalog' AND ( proname =
> 'lo_open' or proname = 'lo_close' or proname = 'lo_creat' or proname =
> 'lo_unlink' or proname = 'lo_lseek' or proname = 'lo_lseek64' or proname =
> 'lo_tell' or proname = 'lo_tell64' or proname = 'loread' or proname =
> 'lowrite' or proname = 'lo_truncate' or proname = 'lo_truncate64')

That looks very similar to libpq's preparatory lookup before executing large object accesses (cf lo_initialize in fe-lobj.c). The details aren't identical, so it's not from libpq itself, but I'd guess this is some other client library's version of the same thing.

> Query intended to be executed:-
> SELECT a.* FROM tablename1 a INNER JOIN users u ON u.id = a.user_id INNER
> JOIN tablename2 c ON u.client_id = c.id WHERE u.external_id = ? AND c.name =
> ? AND (c.namespace = ? OR (c.namespace IS NULL AND ? IS NULL))

It is *really* hard to believe that that lookup query would make any noticeable difference in response time for some other session, unless you are running the server on seriously underpowered hardware.

It could be that you've misinterpreted your data, and what is actually happening is that that other session has completed its lookup query and is now doing fast-path large object reads and writes using the results. Fast-path requests might not show up as queries in your monitoring, but if the large object I/O is sufficiently fast and voluminous, maybe that'd account for visible performance impact.

> 2. Is there any way we can suppress this query?

Stop using large objects? But the alternatives won't be better in terms of performance impact. Really, if this is a problem for you, you need a beefier server. Or split the work across more than one server.

regards, tom lane
Re: performance of sql and plpgsql functions
Julius Tuskenis writes:
> EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
> SELECT
>     COALESCE(sum(mok_nepadengta), 0)
> FROM
>     public.b_pardavimai
>     JOIN public.b_mokejimai ON (mok_pardavimas = pard_id)
> WHERE
>     (pard_tipas = ANY('{1, 2, 6, 7}'))
>     AND (mok_saskaita = 7141968)

I believe that the SQL-language function executor always uses generic plans for parameterized queries (which is bad, but nobody's gotten round to improving it). So the above is a poor way of investigating what will happen, because it corresponds to a custom plan for the value 7141968. You should try something like

    PREPARE p(integer) AS SELECT COALESCE ... ... AND (mok_saskaita = $1);
    SET plan_cache_mode TO force_generic_plan;
    EXPLAIN ANALYZE EXECUTE p(7141968);

What I suspect is that the statistics for mok_saskaita are highly skewed, and so with a generic plan the planner will not risk using a plan that depends on the parameter value being infrequent, as the one you're showing does.

regards, tom lane
Re: Extremely slow to establish connection when user has a high number of roles
Michal Charemza writes:
> Tom Lane writes:
>> I'm fairly sure that if that exists it's always noticed first,
>> bypassing the need for any role membership tests. So please
>> confirm whether your production database has revoked PUBLIC
>> connect privilege.
> I realised that in fact we hadn't revoked this. So it sounds like whatever
> the issue, it's not about checking if the user has the CONNECT privilege?

Yeah. I double-checked the code (see aclmask()), and it will detect holding a privilege via PUBLIC before it performs any role membership searches. So whatever is happening, it's not that lookup.

>> It could be that the problem is not associated with the
>> database's connect privilege, but with role membership lookups
>> triggered by pg_hba.conf entries. Do you have any entries there
>> that require testing membership (i.e. the role column is not
>> "all")?
> Running `select * from pg_hba_file_rules` it looks like the user column is
> always {all} or {rdsadmin}

You'll need to look closer and figure out which of the HBA rules is being used for the slow connection attempts. If it's {rdsadmin} then that would definitely involve a role membership search. If it's {all} then we're back to square one.

A different line of thought could be that the slow connections are slow because they are waiting on a lock that some other process has got and is in no hurry to release. It would be worth trying to capture the contents of the pg_locks view (and I guess also pg_stat_activity) while one of these sessions is stuck, if you can reproduce it often enough to make that feasible.

regards, tom lane
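The capture Tom suggests can be scripted ahead of time; a minimal sketch, to be run from a separate session while a connection attempt is stuck (the column choice is just one reasonable selection, not the only one):

```sql
-- Snapshot pg_stat_activity joined to pg_locks in one go,
-- so granted and ungranted locks line up with session state.
SELECT now() AS captured_at,
       a.pid, a.usename, a.state,
       a.wait_event_type, a.wait_event,
       l.locktype, l.mode, l.granted,
       left(a.query, 80) AS query
FROM pg_stat_activity a
LEFT JOIN pg_locks l ON l.pid = a.pid
ORDER BY a.pid, l.granted;
```

Saving the output of a few such snapshots taken seconds apart makes it much easier to see whether the stuck session is parked on an ungranted lock.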
Re: Extremely slow to establish connection when user has a high number of roles
Michal Charemza writes:
> Tom Lane writes:
>> It's not very clear what you mean by "sometimes". Is the slowness
>> reproducible for a particular user and role configuration, or does
>> it seem to come and go by itself?
> Ah it's more come and go by itself - as in one connection takes 30 seconds,
> then the next say 0.06s. It's happened for every user we've tried. Even
> more anecdotally, I would say it happens more when the database is busy in
> terms of tables being dropped/created and permissions changing.

OK, that pretty much eliminates the idea that it's a new manifestation of the catcache-inefficiency problem. Vijaykumar may well have the right idea, that it's a form of catalog bloat. Do you do bulk permissions updates that might affect thousands of role memberships at once?

> Also: realise we did have one user that was directly a member of
> several thousand roles, but indirectly several million. It would sometimes
> take 10 minutes for that user to connect. We've since changed that to one
> role, and that user connects fine now.

Interesting --- but even for that user, it was sometimes fast to connect?

> I am wondering - what happens on connection? What catalogue tables does
> PostgreSQL check and how? What's allowed to happen concurrently and what
> isn't? If I knew, maybe I could come up with a reproduction script that
> does reproduce the issue?

Well, it's going to be looking to see that the user has CONNECT privileges on the target database. If that db doesn't have public connect privileges, but only grants CONNECT to certain roles, then we'll have to test whether the connecting user is a member of those roles --- which involves looking into pg_auth_members and possibly even doing recursive searches there. For the sort of setup you're describing, with thousands of role grants (pg_auth_members entries), it's not hard to imagine that search being rather expensive. What remains to be explained is how come it's only expensive sometimes.

The catalog-bloat idea comes from thinking about how Postgres handles row updates. There will be multiple physical copies (row versions) of any recently-updated row, and this is much more expensive to scan than a static situation with only one live row version: first because we have to look at more than one copy; second because testing whether a copy is the live version is noticeably more expensive if it's recent than once it's older than the xmin horizon; and third because if we are the first process to scan it since it became dead-to-everybody, then it's our responsibility to mark it as dead-to-everybody, so we have to incur additional I/O to do that.

A plausible idea for particular connection attempts being slow is that they came in just as a whole lot of pg_auth_members entries became dead-to-everybody, and hence they were unlucky enough to get saddled with a whole lot of that hint-bit-updating work. (This also nicely explains why the next attempt isn't slow: the work's been done.) But this is only plausible if you regularly do actions that cause a lot of pg_auth_members entries to be updated at the same time. So we still don't have good insight into that, and your test script isn't shedding any light.

A couple of other thoughts:

* I don't think your test script would show a connection-slowness problem even if there was one to be shown, because you forgot to revoke the PUBLIC connect privilege on the postgres database. I'm fairly sure that if that exists it's always noticed first, bypassing the need for any role membership tests. So please confirm whether your production database has revoked PUBLIC connect privilege.

* It could be that the problem is not associated with the database's connect privilege, but with role membership lookups triggered by pg_hba.conf entries. Do you have any entries there that require testing membership (i.e. the role column is not "all")?

regards, tom lane
Re: Extremely slow to establish connection when user has a high number of roles
Michal Charemza writes:
> The issue is that we're hitting a strange performance problem on
> connection. Sometimes it can take ~25 to 40 seconds just to connect,
> although it's often way quicker. There seems to be no middle ground - never
> have I seen a connection take between 0.5 and 25 seconds for example. We
> suspect it's related to the number of roles the connecting user has
> (including via other roles), because if we remove all roles but one from
> the connecting user (the one that grants connection permissions),
> connecting is always virtually instantaneous.

It's not very clear what you mean by "sometimes". Is the slowness reproducible for a particular user and role configuration, or does it seem to come and go by itself? As Tomas said, a self-contained reproduction script would be very helpful for looking into this.

> The closest issue that I can find that's similar is
> https://www.postgresql.org/message-id/flat/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com,
> which reports that GRANT role is slow with a high number of roles - but in
> our case, it's connecting that's the problem, before (as far as we can
> tell) even one query is run.

That specific problem is (we think) new in v16, but the root cause is an inefficient lookup mechanism that has been there a long time. Maybe you have found a usage pattern that exposes its weakness in older branches. If so, we could consider back-patching 14e991db8 further than v16 ... but I don't plan to take any risk there without concrete evidence that it'd improve things.

regards, tom lane
Re: Simple JOIN on heavy table not using expected index
kimaidou writes:
> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text
> field), even if the corresponding number of lines for this WHERE clause is
> a small subset of the entire data:
> approx 6M against 80M in total

6M out of 80M rows is not a "small subset". Typically I'd expect the planner to use an index-based scan for up to 1 or 2 percent of the table. Beyond that, you're going to be touching most pages of the table anyway. You can try reducing random_page_cost to favor indexscans, but you might not find that the query gets any faster.

regards, tom lane
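A sketch of the experiment Tom describes. The table name is inferred from the index name in the thread and the cost value is just an illustration for SSD-class storage, so adapt both:

```sql
-- Baseline plan with the default random_page_cost (4.0):
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM parcelles WHERE dep = '01';   -- names assumed

-- Tell the planner random page reads are nearly as cheap as
-- sequential ones, then compare the chosen plan and runtime:
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM parcelles WHERE dep = '01';

RESET random_page_cost;
```

If the indexscan plan is not actually faster at the lower setting, the planner's original choice was right for this selectivity.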
Re: I don't understand that EXPLAIN PLAN timings
David Rowley writes:
> I do wonder now if it was a bad idea to make Memoize build the hash
> table on plan startup rather than delaying that until we fetch the
> first tuple. I see Hash Join only builds its table during executor
> run.

Ouch! If it really does that, yes it's a bad idea.

regards, tom lane
Re: Why is a sort required for this query? (IS NULL predicate on leading key column)
Jerry Brenner writes:
> I'm wondering why a sort is required for this query, as the index should be
> providing the required ordering to satisfy the ORDER BY clause. Does it
> have to do with the IS NULL predicate on the leading key column in the
> index?

IS NULL is not seen as an equality condition, no. It's pretty much a hack that makes it an indexable condition at all, and we don't really do any advanced optimization with it.

regards, tom lane
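The distinction can be seen with a self-contained sketch (hypothetical table; the plan shapes are what one would typically expect, not guaranteed on every version):

```sql
CREATE TABLE t (a int, b int);
CREATE INDEX t_a_b_idx ON t (a, b);

-- Equality on the leading column: rows come out of the index
-- already ordered by b, so no Sort node should be needed.
EXPLAIN SELECT * FROM t WHERE a = 1 ORDER BY b;

-- IS NULL is indexable but not treated as equality, so the planner
-- does not conclude the output is sorted by b, and adds a Sort.
EXPLAIN SELECT * FROM t WHERE a IS NULL ORDER BY b;
```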
Re: Selection not "pushed down into" CTE
Clemens Eisserer writes:
> running postgresql 15.5 I was recently surprised postgresql didn't
> perform an optimization which I thought would be easy to apply.

It is not.

> running the following query results in a full sort (caused by lead
> over order by) as the ts > '2024-01-04' selection doesn't seem to be
> applied to the CTE but only later:
> with cte as (select ts, lead(ts, 1) over (order by ts) as ts2 from smartmeter)
> select ts, ts2 from cte where ts > '2024-01-04' and extract(epoch
> from ts2) - extract(epoch from ts) > 9;

The ts restriction is not pushed down because of the rules in allpaths.c:

 * 4. If the subquery has any window functions, we must not push down quals
 * that reference any output columns that are not listed in all the subquery's
 * window PARTITION BY clauses. We can push down quals that use only
 * partitioning columns because they should succeed or fail identically for
 * every row of any one window partition, and totally excluding some
 * partitions will not change a window function's results for remaining
 * partitions. (Again, this also requires nonvolatile quals, but
 * subquery_is_pushdown_safe handles that.)

To conclude that it'd be safe with this particular window function requires deep knowledge of that function's semantics, which the planner has not got.

regards, tom lane

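In this particular case the human can supply the semantic knowledge the planner lacks: lead(ts) over (order by ts) only looks at later rows, and every later row also satisfies ts > '2024-01-04', so moving the restriction inside the CTE by hand does not change the surviving rows' results. A sketch, assuming that reasoning holds for your query:

```sql
-- Manual pushdown: safe here only because the predicate is on the
-- window ORDER BY column and lead() looks strictly forward, so the
-- rows it peeks at all satisfy the same predicate.
WITH cte AS (
  SELECT ts, lead(ts, 1) OVER (ORDER BY ts) AS ts2
  FROM smartmeter
  WHERE ts > '2024-01-04'
)
SELECT ts, ts2
FROM cte
WHERE extract(epoch FROM ts2) - extract(epoch FROM ts) > 9;
```

With an index on ts this lets the restriction be applied before the window sort rather than after it.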
Re: Questions about "Output" in EXPLAIN ANALYZE VERBOSE
Jeff Janes writes:
> On Tue, Jan 2, 2024 at 1:29 PM Jerry Brenner wrote:
>> - It looks like "Output" includes more than just the columns with
>> predicates and/or being accessed or returned in other nodes.
> Not in my hands. For SELECTs it just lists the columns that are needed.

It depends. The planner may choose to tell a non-top-level scan node to return all columns, in hopes of saving a tuple projection step at runtime. That's heuristic and depends on a number of factors, so you shouldn't count on it happening or not happening.

regards, tom lane
Re: Question about semantics of $ variables in json explain plans in 13
Jerry Brenner writes:
> Is there any documentation on the semantics of $ variables in json explain
> plans for both InitPlans and SubPlans in 13?

I don't think there's anything much in the user-facing docs, which is somewhat unfortunate because it's confusing: the notation is overloaded. $N could be a parameter supplied from outside the query (as in your $1, $2 and $3 in the source text), but it could also be a parameter supplied from an outer query level to a subplan, or it could be the result value of an InitPlan. The numbering of outside-the-query parameters is disjoint from that of the other kind.

> - It looks like $0 represents the value from the outer query block when
>   the correlated subquery is evaluated
> - It looks like $1 represents the result of the subquery evaluation

Yeah, I think you're right here. $0 evidently corresponds to qRoot.contactID from the outer plan, and the plan label itself shows that $1 carries the sub-select's value back out. This $1 is unrelated to the $1 you wrote in the query text. (It looks like this is a custom plan in which "67" was explicitly substituted for your $3. Presumably $1 and $2 were replaced as well; we don't do half-custom plans.)

regards, tom lane
Re: 2 json explain plans for the same query/plan - why does one have constants while the other has parameter markers?
Jerry Brenner writes:
> We are currently on 13. We are capturing the explain plans for query
> executions taking 1 second or longer and storing the json files. We are
> most of the way through implementing a home grown solution to generate a
> consistent hash value for a query plan, so we can find queries with
> multiple plans. I've attached 2 query plans that we've captured that
> differ in a seemingly strange way. (All executions are from the same exact
> code path.) One of the plans has parameter markers in the predicates in
> the values for "Recheck Cond" and "Index Cond", while the other does not.
> Any insight into why we are seeing parameter markers in the body of the
> query plan?

The one with parameter markers is a "generic" plan for a parameterized query. When you get a plan without parameter markers for the same input query, that's a "custom" plan in which concrete values of the parameters have been substituted, possibly allowing const-simplification and more accurate rowcount estimates. The backend will generally try custom plans a few times and then try a generic plan to see if that's meaningfully slower -- if not, replanning each time is deemed to be wasteful.

regards, tom lane
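The two plan flavors can be forced explicitly for a side-by-side comparison; a sketch with a hypothetical prepared statement (plan_cache_mode is available in the v13 discussed here):

```sql
PREPARE q(int) AS SELECT * FROM some_table WHERE id = $1;  -- names assumed

SET plan_cache_mode TO force_custom_plan;
EXPLAIN EXECUTE q(42);   -- conditions show the substituted constant, e.g. (id = 42)

SET plan_cache_mode TO force_generic_plan;
EXPLAIN EXECUTE q(42);   -- conditions show parameter markers, e.g. (id = $1)

RESET plan_cache_mode;
```

This also means any plan-hashing scheme has to decide whether the generic and custom flavors of the same query count as "the same plan".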
Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?
Michael Paquier writes:
> On Mon, Dec 04, 2023 at 09:57:24AM -0500, Tom Lane wrote:
>> Jerry Brenner writes:
>>> Both Oracle and SQL Server have
>>> consistent hash values for query plans and that makes it easy to identify
>>> when there are multiple plans for the same query. Does that concept exist
>>> in later releases of Postgres (and is the value stored in the json explain
>>> plan)?
>> No, there's no support currently for obtaining a hash value that's
>> associated with a plan rather than an input query tree.
> PlannerGlobal includes a no_query_jumble that gets inherited by all
> its lower-level nodes, so adding support for hashes compiled from
> these node structures would not be that complicated. My point is that
> the basic infrastructure is in place in the tree to be able to do
> that, and it should not be a problem to even publish the compiled
> hashes in EXPLAIN outputs, behind an option of course.

Well, yeah, we could fairly easily activate that infrastructure for plans, but we haven't. More to the point, it's not clear to me that that would satisfy the OP's request for "consistent" hash values. The hashes would vary depending on object OID values, system version, possibly endianness, etc.

I'm also wondering exactly what the OP thinks qualifies as different plans. Remembering the amount of fooling-around that's gone on with querytree hashes to satisfy various people's ill-defined desires for pg_stat_statements aggregation behavior, I'm not really eager to buy into the same definitional morass at the plan level.

regards, tom lane
Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?
Jerry Brenner writes:
> Both Oracle and SQL Server have
> consistent hash values for query plans and that makes it easy to identify
> when there are multiple plans for the same query. Does that concept exist
> in later releases of Postgres (and is the value stored in the json explain
> plan)?

No, there's no support currently for obtaining a hash value that's associated with a plan rather than an input query tree.

regards, tom lane
Re: Performance degradation with CTEs, switching from PG 11 to PG 15
Jean-Christophe Boggio writes:
> I did upgrade :-) But we have many users for which we don't decide on
> when they do upgrade so we have to keep compatibility with most versions
> of PG and in that particular case (non-existence of the materialized
> keyword for PG 11 and before) it is a real problem.

PG 11 is out of support as of earlier this month, so your users really need to be prioritizing getting onto more modern versions.

regards, tom lane
Re: Postgres Locking
"Dirschel, Steve" writes:
> Above I can see PID 3740 is blocking PID 3707. The PK on table
> wln_mart.ee_fact is ee_fact_id. I assume PID 3740 has updated a row
> (but not committed it yet) that PID 3707 is also trying to update.

Hmm. We can see that 3707 is waiting for 3740 to commit, because it's trying to take ShareLock on 3740's transactionid:

> transactionid |       |       |   |   ||
> 251189986     |       |       |   |   | 54/196626 | 3707 |
> ShareLock     | f | f | 2023-10-31 14:40:21.837507-05

251189986 is indeed 3740's, because it has ExclusiveLock on that:

> transactionid |       |       |   |   ||
> 251189986     |       |       |   |   | 60/259887 | 3740 |
> ExclusiveLock | t | f |

There are many reasons why one xact might be waiting on another to commit, not only that they tried to update the same tuple. However, in this case I suspect that that is the problem, because we can also see that 3707 has an exclusive tuple-level lock:

> tuple | 91999 | 93050 | 0 | 1 ||
>       |       |       |   |   | 54/196626 | 3707 | ExclusiveLock
>       | t | f |

That kind of lock would only be held while queueing to modify a tuple. (Basically, it establishes that 3707 is next in line, in case some other transaction comes along and also wants to modify the same tuple.) It should be released as soon as the tuple update is made, so 3707 is definitely stuck waiting to modify a tuple, and AFAICS it must be stuck because of 3740's uncommitted earlier update.

> But I am being told those 2 sessions should not be trying to process the
> same PK rows.

Perhaps "should not" is wrong. Or it could be some indirect update (caused by a foreign key with CASCADE, or the like). You have here the relation OID (try "SELECT 93050::regclass" to decode it) and the tuple ID, so it should work to do

    SELECT * FROM that_table WHERE ctid = '(0,1)';

to see the previous state of the problematic tuple. Might help to decipher the problem.

regards, tom lane
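The manual pg_locks analysis above can be shortcut with pg_blocking_pids() (available since 9.6); a sketch, with the OID and ctid taken from this thread:

```sql
-- Who blocks whom, without decoding transactionid locks by hand:
SELECT pid, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- Decode the relation OID from the tuple lock, then inspect the
-- contested tuple (substitute the table name the first query returns):
SELECT 93050::regclass;
SELECT * FROM wln_mart.ee_fact WHERE ctid = '(0,1)';
```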
Re: Postgres 15 SELECT query doesn't use index under RLS
Alexander Okulovich writes:
> I've attempted to reproduce this on my PC in Docker from the stage
> database dump, but no luck. The first query execution on Postgres 15
> behaves like on the real stage, but subsequent ones use the index.

Can you force it in either direction with "set enable_seqscan = off" (resp. "set enable_indexscan = off")? If so, how do the estimated costs compare for the two plan shapes?

> Also, they execute much faster. Looks like the hardware and(or) the data
> structure on disk matters.

Maybe your prod installation has a bloated index, and that's driving up the estimated cost enough to steer the planner away from it.

regards, tom lane
Re: Postgres 15 SELECT query doesn't use index under RLS
Alexander Okulovich writes:
> We used the product with the default settings. The requests are simple.
> We didn't change the hardware (actually, we use even more performant
> hardware because of that issue) and DDL. I've checked the request on old
> and new databases. Requests that rely on this index execute more than 10
> times longer. The planner indeed used an Index Scan before, but now it
> doesn't. So, from my perspective, the only reason we experience that is a
> database logic change.

[ shrug... ] Maybe, but it's still not clear whether it's a bug, an intentional change, or just a cost estimate that was on the hairy edge before and your luck ran out. If you could provide a self-contained test case that performs 10x worse under v15 than v12, we'd surely take a look at it. But with the information you've given so far, little is possible beyond speculation.

regards, tom lane
Re: Postgres 15 SELECT query doesn't use index under RLS
Alexander Okulovich writes:
> Recently, we upgraded the AWS RDS instance from Postgres 12.14 to 15.4
> and noticed extremely high disk consumption on the following query
> execution:
> select (exists (select 1 as "one" from "public"."indexed_commit" where
> "public"."indexed_commit"."repo_id" in (964992,964994,964999, ...);
> For some reason, the query planner starts using Seq Scan instead of the
> index on the "repo_id" column when requesting under a user limited with
> RLS. On prod, it happens when there are more than 316 IDs in the IN part
> of the query, on stage - 3. If we execute the request as superuser,
> the planner always uses the "repo_id" index.

The superuser bypasses the RLS policy. When that's enforced, the query can no longer use an index-only scan (because it needs to fetch tenant_id too). Moreover, it may be that only a small fraction of the rows fetched via the index will satisfy the RLS condition. So the estimated cost of an indexscan query could be high enough to persuade the planner that a seqscan is a better idea.

> Luckily, we can easily reproduce this on our stage database (which is
> smaller). If we add a multicolumn "repo_id, tenant_id" index, the
> planner uses it (Index Only Scan) with any IN params count under RLS.

Yeah, that would be the obvious way to ameliorate both problems.

If in fact you were getting decent performance from an indexscan plan before, the only explanation I can think of is that the repo_ids you are querying for are correlated with the tenant_id, so that the RLS filter doesn't eliminate very many rows from the index result. The planner wouldn't realize that by default, but if you create extended statistics on repo_id and tenant_id then it might do better. Still, you probably want the extra index.

> Could you please clarify if this is a Postgres bug or not?

You haven't shown any evidence suggesting that.

> Should we include the "tenant_id" column in all our indexes to make them
> work under RLS?

Adding tenant_id is going to bloat your indexes quite a bit, so I wouldn't do that except in cases where you've demonstrated it's important.

regards, tom lane
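A sketch of the two remedies discussed in this thread (table and column names are from the thread; the statistics kinds and index DDL are assumptions to adapt):

```sql
-- Let the planner see the repo_id / tenant_id correlation:
CREATE STATISTICS indexed_commit_repo_tenant_stx (dependencies, ndistinct)
  ON repo_id, tenant_id FROM indexed_commit;
ANALYZE indexed_commit;

-- And/or restore index-only scans under RLS for this query shape
-- (CONCURRENTLY avoids blocking writes during the build):
CREATE INDEX CONCURRENTLY indexed_commit_repo_tenant_idx
  ON indexed_commit (repo_id, tenant_id);
```

Per Tom's last point, the multicolumn index is worth adding only where this query shape is demonstrably hot, not across every index in the schema.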
Re: Unexpected termination looping over table.
Matt Gibbins writes:
> Have encountered an intriguing issue processing a table with a large
> number of rows. The process is to loop over the table, processing each
> row by executing the Apache AGE cypher function over each row
> individually. The looping is necessary because of encountering a limit on
> the number of rows that the cypher function would process.
> The process terminates unexpectedly with the following message. Notably,
> it runs for quite some time before termination:
> SQL Error [42703]: ERROR: could not find rte for
> a01a724103fbb3d059b8387bf043dbc8
> Where: PL/pgSQL function
> analysis.create_trips(text,text,text,text,text,text,integer,text,integer)
> line 5 at EXECUTE

There is no occurrence of "could not find rte" anywhere in the core Postgres source code. I surmise that you're using some extension that isn't happy.

regards, tom lane
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Achilleas Mantzios writes:
> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
> lc_ctype,

Doesn't necessarily mean they interpret that the same way, though :-(

> the below seems ok
> FreeBSD :
> postgres@[local]/dynacom=# select * from (values
> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
>  column1
> ---------
>  _
>  .
>  0
>  a
>  Z
> (5 rows)

Sadly, this proves very little about Linux's behavior. glibc's idea of en_US involves some very complicated multi-pass sort rules. AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US as "same as C except case-insensitive", whereas I'm pretty sure that underscores and other punctuation are nearly ignored in glibc's interpretation; they'll only be taken into account if the alphanumeric parts of the strings sort equal.

regards, tom lane
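The same five-value test can be run under several collations on a single system to see how differently they order punctuation (collation names vary by platform and ICU availability; the spellings below are typical glibc ones):

```sql
-- Byte order: punctuation sorts before digits and letters.
SELECT column1 FROM (VALUES ('a'),('Z'),('_'),('.'),('0')) AS v
ORDER BY column1 COLLATE "C";

-- glibc en_US: punctuation is nearly ignored, considered only
-- when the alphanumeric parts compare equal.
SELECT column1 FROM (VALUES ('a'),('Z'),('_'),('.'),('0')) AS v
ORDER BY column1 COLLATE "en_US.utf8";
```

Running this on both machines with their respective default collations would confirm or refute the interpretation difference directly.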
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Achilleas Mantzios - cloud writes:
> *FreeBSD*
>
>    -> Index Only Scan using mail_vessel_addressbook_address_regex_idx
> on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32)
> (actual time=96.704..96.705 rows=1 loops=1)
>          Filter: ('f...@bar.com'::text ~* address_regex)
>          Rows Removed by Filter: 14738
>          Heap Fetches: 0
>          Buffers: shared hit=71
>
> *Linux*
>
>    -> Index Only Scan using mail_vessel_addressbook_address_regex_idx
> on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32)
> (actual time=1768.724..1768.725 rows=1 loops=1)
>          Filter: ('f...@bar.com'::text ~* address_regex)
>          Rows Removed by Filter: 97781
>          Heap Fetches: 0
>          Buffers: shared hit=530
>
> The file in FreeBSD came by pg_dump from the linux system, I am puzzled
> why this huge difference in Buffers: shared hit.

The "rows removed" value is also quite a bit different, so it's not just a matter of buffer touches --- there's evidently some real difference in how much of the index is being scanned. I speculate that you are using different collations on the two systems, and FreeBSD's collation happens to place the first matching row earlier in the index.

regards, tom lane
Re: Planning time is time-consuming
David Rowley writes:
> I'm not sure if you're asking for help here because you need planning
> to be faster than it currently is, or if it's because you believe that
> planning should always be faster than execution. If you think the
> latter, then you're mistaken.

Yeah. I don't see anything particularly troubling here. Taking circa three-quarters of a millisecond (on typical current hardware) to plan a four-way join on large tables is not unreasonable. In most cases one could expect the execution of such a query to take a good bit longer than that. I think the OP is putting too much emphasis on an edge case where execution finishes quickly because there are in fact zero rows matching the uuid restriction.

BTW, in addition to the duplicative indexes, I wonder why the uuid columns being joined on aren't all of "uuid" type. While I doubt fixing that would move the needle greatly, it still seems sloppy.

regards, tom lane
Re: Slow query, possibly not using index
Les writes:
>>> If I try to select a single unused block this way:
>>> explain analyze select id from media.block b where nrefs =0 limit 1
>>> then it runs for more than 10 minutes (I'm not sure how long, I cancelled
>>> the query after 10 minutes).
>> You might think that even so, it shouldn't take that long ... but
>> indexes on UUID columns are a well known performance antipattern.
> I'm aware of the problems with random UUID values. I was using this
> function to create ulids from the beginning:

Oh, well that would have been useful information to provide at the outset. Now that we know the index order is correlated with creation time, I wonder if it is also correlated with nrefs, in such a way that scanning in index order is disastrous because all the desired rows are at the end of the index.

Also, you deny deleting any rows, but that wasn't the point. Do you ever update nrefs from zero to nonzero? That'd also leave dead entries behind in this index. If that is a routine event that is correlated with creation time, it gets easier to believe that your index could have lots of dead entries at the front. We'd still have to figure out why autovacuum is failing to clean out those entries in a timely fashion, but that seems like a plausible way for the performance problem to exist.

> I can try to do VACUUM on this table, but I'm limited on resources. I think
> it will take days to do this. Maybe I can try to dump the whole database
> and restore it on another machine.

Pretty hard to believe that dump-and-restore would be faster than VACUUM.

> (Is there a way to check the number of dead rows?)

I think contrib/pgstattuple might help.

regards, tom lane
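The dead-row check Tom points at, sketched (the extension must be installed by a sufficiently privileged role, and the index name is an assumption):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Dead-tuple counts for the table (reads the whole table):
SELECT dead_tuple_count, dead_tuple_percent, free_percent
FROM pgstattuple('media.block');

-- Leaf density / fragmentation for a btree index:
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('media.block_nrefs_idx');   -- hypothetical index name
```

A very low avg_leaf_density or a large dead_tuple_count would support the bloated-index theory without waiting days for a VACUUM to finish.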
Re: Slow query, possibly not using index
Les writes:
> If I try to select a single unused block this way:
> explain analyze select id from media.block b where nrefs =0 limit 1
> then it runs for more than 10 minutes (I'm not sure how long, I cancelled
> the query after 10 minutes).

Are you sure it isn't blocked on a lock?

Another theory is that the index contains many thousands of references to now-dead rows, and the query is vainly searching for a live entry. Given that EXPLAIN thinks there are only about 2300 live entries, and yet you say the index is 400MB, this seems pretty plausible. Have you disabled autovacuum, or something like that? (REINDEX could help here, at least till the index gets bloated again.)

You might think that even so, it shouldn't take that long ... but indexes on UUID columns are a well known performance antipattern. The index entry order is likely to have precisely zip to do with the table's physical order, resulting in exceedingly-random access to the table, which'll be horribly expensive when the table is so much bigger than RAM. Can you replace the UUID column with a simple serial (identity) column?

> I believe it is not actually using the index, because reading a single
> (random?) entry from an index should not run for >10 minutes.

You should believe what EXPLAIN tells you about the plan shape. (Its rowcount estimates are only estimates, though.)

regards, tom lane
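Two quick follow-ups to the advice above, sketched with assumed names:

```sql
-- From another session: is the slow query actually waiting on a lock
-- rather than doing work?
SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE state = 'active' AND wait_event_type = 'Lock';

-- If the index is bloated, rebuild it without blocking writers
-- (REINDEX ... CONCURRENTLY requires PG 12 or later):
REINDEX INDEX CONCURRENTLY media.block_nrefs_idx;  -- hypothetical index name
```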
Re: slow delete
Les writes: > It seems that two foreign key constraints use 10.395 seconds out of the > total 11.24 seconds. But I don't see why it takes that much? Probably because you don't have an index on the referencing column. You can get away with that, if you don't care about the speed of deletes from the PK table ... regards, tom lane
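The fix that follows from that diagnosis looks like this (table and column names are invented for illustration; PostgreSQL indexes primary-key and unique columns automatically, but not the referencing column of a foreign key):

```sql
-- Without this index, every DELETE on the parent table forces a
-- sequential scan of the child table to check the FK constraint.
-- CONCURRENTLY avoids blocking writes while the index builds.
CREATE INDEX CONCURRENTLY child_parent_id_idx
    ON child_table (parent_id);
```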
Re: Plan weirdness. A sort produces more rows than the node beneath it
Dane Foster writes: >> If the sort is the inner input to a merge join, this could reflect >> mark-and-restore rescanning of the sort's output. Are there a >> whole lot of duplicate keys on the merge's other side? > Yes. The course_id column's values repeat a LOT on the merge side. Hmm. The planner should avoid using a merge join if it knows that to be true. Maybe analyze'ing that table would prompt it to use some other join method? regards, tom lane
Re: Plan weirdness. A sort produces more rows than the node beneath it
Dane Foster writes: > I'm trying to understand a bit of weirdness in a plan output. There is a > sort node above a sequential scan node where the scan node produces 26,026 > rows yet the sort node above it produces 42,995,408. How is it possible to > sort more data than you received? If the sort is the inner input to a merge join, this could reflect mark-and-restore rescanning of the sort's output. Are there a whole lot of duplicate keys on the merge's other side? regards, tom lane
Re: Postgresql equal join on function with columns not use index
"James Pang (chaolpan)" writes: > Looks like it's the function "regexp_replace" volatile and restrict=false > make the difference, we have our application role with default > search_path=oracle,$user,public,pg_catalog. > =#select > oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from > pg_proc where proname='regexp_replace' order by oid; > oid |proname | pronamespace | prosecdef | proisstrict | provolatile > -------+----------------+--------------+-----------+-------------+------------- > 2284 | regexp_replace | pg_catalog | f | t | i > 2285 | regexp_replace | pg_catalog | f | t | i > 17095 | regexp_replace | oracle | f | f | v > 17096 | regexp_replace | oracle | f | f | v > 17097 | regexp_replace | oracle | f | f | v > 17098 | regexp_replace | oracle | f | f | v Why in the world are the oracle ones marked volatile? That's what's preventing them from being used in index quals. regards, tom lane
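Two workarounds consistent with that diagnosis (untested sketches; the argument signature of the oracle-schema function is a guess):

```sql
-- Option 1: schema-qualify the call so the immutable pg_catalog version
-- is resolved, instead of the volatile oracle-schema one that the
-- search_path (oracle,$user,public,pg_catalog) finds first.
SELECT ...
FROM tablea JOIN tableb
  ON tablea.column1 = pg_catalog.regexp_replace(tableb.column1, 'pat', 'rep');

-- Option 2: correct the volatility marking, but only if the oracle-schema
-- wrapper really is deterministic for fixed inputs and has no side effects.
ALTER FUNCTION oracle.regexp_replace(text, text, text) IMMUTABLE;
```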
Re: Postgresql equal join on function with columns not use index
"James Pang (chaolpan)" writes: > We migrated from Oracle to Postgresql 14.8, one SQL has a regression in > Postgres: it runs in 5800 milliseconds in Postgresql v14.8, but the same SQL got > done in several hundred milliseconds in Oracle database. > With multiple table JOINs, if the join condition is > tablea.column1=tableb.column1, the optimizer will use the index to filter data in > nest loops, but if tablea.column1=regexp_replace(tableb.column1), the > optimizer will not be able to use the index on tablea.column1, so it does a > table scan and nestloop to produce a lot of rows, then uses > tablea.column1=regexp_replace(tableb.column1) as a filter. As a > workaround we create a view then use tablea.column1=view.column1, which works. > Is it expected? Details as below. It's impossible to comment on this usefully with such a fragmentary description of the problem. Please send a complete, self-contained test case if you want anybody to look at it carefully. https://wiki.postgresql.org/wiki/Slow_Query_Questions regards, tom lane
Re: Weird behavior of INSERT QUERY
Satalabaha Postgres writes: > The select query in the insert returns about 499 rows. However, this insert > statement when executed from application user i.e. schema1_u takes close to > 8 minutes. When the same insert statement gets executed as postgres user > it takes less than 280 ms. Both the executions use the same execution plan > with only difference that when schema1_u executes the SQL, we observe > "Trigger for constraint fk_con_tablea: time=426499.314 calls=499" taking > more time. So you need to find out what's happening in the trigger. Perhaps auto_explain with auto_explain.log_nested_statements enabled would give some insight. I suspect there might be a permissions problem causing schema1_u to not be allowed to "see" the statistics for table_b, resulting in a bad plan choice for the FK enforcement query; but that's just a guess. regards, tom lane
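A minimal auto_explain setup for that kind of investigation might look like this (session-level; the parameter names are real, the values are illustrative):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log every statement's plan
SET auto_explain.log_nested_statements = on;  -- include FK-trigger queries
SET auto_explain.log_analyze = on;            -- actual times, not just estimates

-- Now run the slow INSERT as schema1_u and check the server log for the
-- plan of the FK enforcement query against table_b.
```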
Re: PostgreSQL performance on ARM i.MX6
"Druckenmueller, Marc" writes: > I am investigating possible throughput with PostgreSQL 14.4 on an ARM i.MX6 > Quad CPU (NXP sabre board). > Testing with a simple python script (running on the same CPU), I get ~1000 > request/s. That does seem pretty awful for modern hardware, but it's hard to tease apart the various potential causes. How beefy is that CPU really? Maybe the overhead is all down to client/server network round trips? Maybe psycopg is doing something unnecessarily inefficient? For comparison, on my development workstation I get [ create the procedure manually in db test ] $ cat bench.sql call dummy_call(1,2,3,array[1,2,3]::float8[]); $ pgbench -f bench.sql -n -T 10 test pgbench (16beta1) transaction type: bench.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 353891 number of failed transactions: 0 (0.000%) latency average = 0.028 ms initial connection time = 7.686 ms tps = 35416.189844 (without initial connection time) and it'd be more if I weren't using an assertions-enabled debug build. It would be interesting to see what you get from exactly that test case on your ARM board. BTW, one thing I see that's definitely an avoidable inefficiency in your test is that you're forcing the array parameter to real[] (i.e. float4) when the procedure takes double precision[] (i.e. float8). That forces an extra run-time conversion. Swapping between float4 and float8 in my pgbench test doesn't move the needle a lot, but it's noticeable. Another thing to think about is that psycopg might be defaulting to a TCP rather than Unix-socket connection, and that might add overhead depending on what kernel you're using. Although, rather than try to micro-optimize that, you probably ought to be thinking of how to remove network round trips altogether. I can get upwards of 300K calls/second if I push the loop to the server side: test=# \timing Timing is on. 
test=# do $$ declare x int := 1; a float8[] := array[1,2,3]; begin for i in 1..1000000 loop call dummy_call (x,x,x,a); end loop; end $$; DO Time: 3256.023 ms (00:03.256) test=# select 1000000/3.256023; ?column? -------------------- 307123.137643683721 (1 row) Again, it would be interesting to compare exactly that test case on your ARM board. regards, tom lane
Re: Performance issues in query with multiple joins
Παρασκευη Πασσαρη writes: > We are facing a performance issue with the following query. Executing this > query takes about 20 seconds. Raising join_collapse_limit (to more than the number of joins in the query) might help. But I think really if performance is a problem you should think about ditching the star schema design. regards, tom lane
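Concretely, the suggested experiment looks like this (the default for both settings is 8; the value 16 is illustrative, just make it larger than the number of joins):

```sql
-- Let the planner consider reordering all the joins in the query,
-- at the cost of extra planning time.
SET join_collapse_limit = 16;
SET from_collapse_limit = 16;  -- the analogous limit for subquery flattening

EXPLAIN ANALYZE SELECT ...;    -- re-run the problem query in this session
```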
Re: High QPS, random index writes and vacuum
peter plachta writes: > The company I work for has a large (50+ instances, 2-4 TB each) Postgres > install. One of the key problems we are facing in vanilla Postgres is > vacuum behavior on high QPS (20K writes/s), random index access on UUIDs. Indexing on a UUID column is an antipattern, because you're pretty much guaranteed the worst-case random access patterns for both lookups and insert/delete/maintenance cases. Can you switch to timestamps or the like? There are proposals out there for more database-friendly ways of generating UUIDs than the traditional ones, but nobody's gotten around to implementing that in Postgres AFAIK. regards, tom lane
Re: Query unable to utilize index without typecast to fixed length character
ahi writes: > CREATE TABLE public.marketplace_sale ( > log_index integer NOT NULL, > created_at timestamp with time zone DEFAULT now() NOT NULL, > updated_at timestamp with time zone DEFAULT now() NOT NULL, > block_timestamp timestamp with time zone NOT NULL, > block bigint NOT NULL, > contract_address character(42) NOT NULL, > buyer_address character(42) NOT NULL, > seller_address character(42) NOT NULL, > transaction_hash character(66) NOT NULL, > quantity numeric NOT NULL, > token_id numeric NOT NULL, ... Type character(N) is a hangover from the days of punched cards. Don't use it. It has weird semantics concerning trailing spaces, which are almost never the behavior you actually want, and cause interoperability issues with type text. (Text is Postgres' native string type, meaning that unlabeled string constants will tend to get resolved to that.) regards, tom lane
Re: Why are commits consuming most of the database time?
Tim Slechta writes: > The test data below is from a non-virtualized (client system and database > server) Postgres 14 environment, with no replication, no high availability, > and with no load balancing. This environment has older and slower disk > drives, and the test is driven by a single client process. > In this case 24% of the round trips (client to database and back) are for > commit processing. However, commit processing is consuming 89% of the > total database time. (All times are measured from within the client.) You didn't say how big the transactions are, but if they're not writing a lot of data apiece, this result seems totally non-surprising. The commits have to push WAL log data down to disk before they can promise that the transaction's results are durable, while the statements within the transactions probably are not waiting for any disk writes at all. If you don't need strict ACID compliance, you could turn off synchronous_commit so that commits don't wait for WAL flush. (This doesn't risk the consistency of your database, but it does mean that a crash might lose the last few transactions that clients were told got committed.) If you do need strict ACID compliance, get a better disk subsystem. Or, perhaps, just a better OS ... Windows is generally not thought of as the best-performing platform for Postgres. regards, tom lane
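The relaxed-durability option can be applied per-session or cluster-wide, so only the latency-sensitive work needs to run with it:

```sql
-- Commits return before WAL reaches disk; a crash can lose the last few
-- transactions that were reported committed, but cannot corrupt the database.
SET synchronous_commit = off;

-- Or cluster-wide (takes effect on reload, no restart needed):
ALTER SYSTEM SET synchronous_commit = off;
SELECT pg_reload_conf();
```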
Re: Connection forcibly closed remote server error.
aditya desai writes: > We are getting this error when transferring data using COPY command or > running workflow for huge data. We are using Password Authentication(LDAP) > "Connection forcibly closed remote server" Network timeout perhaps? If so, setting more-aggressive TCP keepalive parameters might help. regards, tom lane
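The server-side knobs for that are the tcp_keepalives settings (the values below are illustrative; zero means "use the operating system's default"):

```sql
ALTER SYSTEM SET tcp_keepalives_idle = 60;      -- seconds of idle before first probe
ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- seconds between probes
ALTER SYSTEM SET tcp_keepalives_count = 5;      -- lost probes before dropping
SELECT pg_reload_conf();
```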
Re: Domain check taking place unnecessarily?
Mark Hills writes: > On Wed, 8 Feb 2023, Laurenz Albe wrote: >> It takes 30 seconds to scan the table and determine that all existing >> rows satisfy the constraint. > But there's no existing data (note this is adding column, not constraint) > Existing rows are guaranteed to satisfy the domain check, because the > domain check is guaranteed to be immutable (per [1] in my original mail) immutable != "will accept null". There could be some more optimizations here, perhaps, but there aren't. regards, tom lane
Re: ALTER STATEMENT getting blocked
aditya desai writes: > We have a Postgres 11.16 DB which is continuously connected to informatica > and data gets read from it continuously. > When we have to ALTER TABLE.. ADD COLUMN.. it gets blocked by the SELECTs > on the table mentioned by process above. > Is there any way to ALTER the table concurrently without getting blocked? > Any parameter or option? Can someone give a specific command? ALTER TABLE requires exclusive lock to do that, so it will queue up behind any existing table locks --- but then new lock requests will queue up behind its request. So this'd only happen if your existing reading transactions don't terminate. Very long-running transactions are unfriendly to other transactions for lots of reasons including this one; see if you can fix your application to avoid that. Or manually cancel the blocking transaction(s) after the ALTER begins waiting. regards, tom lane
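A common mitigation, not spelled out in the thread but consistent with the advice above, is to bound how long the ALTER will sit in the lock queue and retry it, so it never holds up new readers for long (table and column names are invented):

```sql
BEGIN;
SET LOCAL lock_timeout = '5s';  -- give up rather than queue indefinitely
ALTER TABLE mytable ADD COLUMN newcol integer;
COMMIT;
-- If this fails with "canceling statement due to lock timeout",
-- retry later; readers were only ever delayed by at most 5 seconds.
```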
Re: change the default value of enable_bitmapscan to off
"hehaoc...@hotmail.com" writes: > The default is enable_bitmapscan on. However, TPC-H.query17 get slower > running on my NVMe SSD (WD SN850) after switching on the parameter: latency > drop from 9secs to 16secs. During a B-tree Index Scan, bitmapscan > optimization converts random I/O into sequential. However, many users use > SSDs rather than HDDs. But they may not know the trick. Is there a > possibility that can change the default value to off? Use ALTER SYSTEM SET, or edit postgresql.conf: https://www.postgresql.org/docs/current/config-setting.html Note that changing planner parameters on the basis of a single query getting slower is a classic beginner error. You need to think about the totality of the installation's workload. regards, tom lane
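Concretely (ALTER SYSTEM requires superuser; pg_reload_conf makes the change take effect without a restart):

```sql
-- Cluster-wide, per the heading above -- but note the warning that
-- this affects every query, not just the one that got slower.
ALTER SYSTEM SET enable_bitmapscan = off;
SELECT pg_reload_conf();

-- Safer for a single problem query: disable it only in that session.
SET enable_bitmapscan = off;
```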
Re: When you really want to force a certain join type?
Gunther Schadow writes: > Also, why even use the RECURSIVE keyword, DB2 didn't need it, and the > query analyzer should immediately see the recursion, so no need to have > that keyword. Our reading of the SQL spec is that it's required. The scope of visibility of CTE names is different depending on whether you write RECURSIVE or not, so it's not a question of "the query analyzer should see it": the analyzer is required NOT to see it. DB2 generally has a reputation for agreeing with the spec, so I'm surprised to hear that they're not doing this per spec. regards, tom lane
Re: Fwd: temp_file_limit?
Frits Jalvingh writes: > Just to be clear: my real question is: why is temp_file_limit not working > at the specified size? I've not looked at that code lately, but I strongly suspect that it's implemented in such a way that it's a per-process limit, not a per-session limit. So each parallel worker could use up that much space. It's also possible that you've found an actual bug, but without a reproducer case nobody's going to take that possibility too seriously. We're unlikely to accept "the limit should work across multiple processes" as a valid bug though. That would require a vastly more complicated implementation. regards, tom lane
Re: JSON down performacen when id:1
"Render Comunicacion S.L." writes: > The issue: > When we search our locator with section_id: 1 (or any number < 4), PostgreSQL > takes around 4000, 5000, 8000ms or more. > When we search our locator with section_id: 4 (or any other bigger number), > PostgreSQL takes around 100 ms. ( ~ expected time) Your index is providing pretty awful performance: > -> Bitmap Heap Scan on matrix (cost=92.21..199.36 rows=27 > width=1144) (actual time=415.708..8325.296 rows=11 loops=1) > Recheck Cond: ((datos #> '{relations}'::text[]) @> > '[{"section_id": "1", "section_tipo": "numisdata3"}]'::jsonb) > Rows Removed by Index Recheck: 272037 > Heap Blocks: exact=34164 lossy=33104 > -> Bitmap Index Scan on matrix_relations_idx > (cost=0.00..92.20 rows=27 width=0) (actual time=61.462..61.462 rows=155031 > loops=1) > Index Cond: ((datos #> '{relations}'::text[]) @> > '[{"section_id": "1", "section_tipo": "numisdata3"}]'::jsonb) I read that as 155K hits delivered by the index, of which only 11 were real matches. To make matters worse, with so many hits the bitmap was allowed to become "lossy" (ie track some hits at page-level not tuple-level) to conserve memory, so that the executor actually had to check even more than 155K rows. You need a better index. It might be that switching to a jsonb_path_ops index would be enough to fix it, or you might need to build an expression index matched specifically to this type of query. See https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING Also, if any of the terminology there doesn't make sense, read https://www.postgresql.org/docs/current/indexes.html regards, tom lane
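A sketch of the suggested index change, using the expression that appears in the posted plan (the new index name is invented):

```sql
-- jsonb_path_ops GIN indexes support only the @> operator, but they
-- hash whole paths, so containment queries like this one produce far
-- fewer false index hits than the default jsonb_ops opclass.
CREATE INDEX matrix_relations_pathops_idx
    ON matrix USING gin ((datos #> '{relations}') jsonb_path_ops);

-- The query shape stays the same and can use the new index:
SELECT ... FROM matrix
WHERE (datos #> '{relations}') @>
      '[{"section_id": "1", "section_tipo": "numisdata3"}]'::jsonb;
```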
Re: Increased iowait and blk_read_time with higher shared_buffers
Jordan Hurwich writes: > I'm familiar with the article you linked to, and part of my surprise is > that with these 32GB RAM machines we're seeing better performance at 12.5% > (4GB) than the commonly recommended 25% (8GB) of system memory for > shared_buffers. Your notes about disk read stats from Postgres potentially > actually representing blocks read from the OS cache make sense, I just > imagined that Postgres would be better at managing the memory when it was > dedicated to it via shared_buffers than the OS (obviously with some point > of diminishing returns); and I'm still hoping there's some Postgres > configuration change we can make that enables better performance through > improved utilization of shared_buffers at the commonly recommended 25% of > system memory. Keep in mind that 25% was never some kind of golden number. It is a rough rule of thumb that was invented for far smaller machines than what you're talking about here. regards, tom lane
Re: Odd Choice of seq scan
Ronuk Raval writes: > We've been working around the problem by rewriting queries to use UNION > instead. Yeah, that. The real issue here is that the seqscan and indexscan plans both suck, because they will both run that sub-select for every row in the table. The index-only plan might fetch fewer blocks along the way, because it only has to read the index not the table proper ... but that's only true if the table's pages are mostly marked all-visible. (My bet about the plan instability is that the planner might choose differently depending on how much of the table it believes is all-visible.) That only helps a bit, though. What you really want to have happen, assuming there are not too many interesting orderid values, is to do a point indexscan for each one of them. Currently the planner won't think of that by itself when faced with OR'd conditions in WHERE. You have to help it along with UNION or some similar locution. regards, tom lane
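The OR-to-UNION rewrite alluded to above looks roughly like this (table and column names are invented for illustration):

```sql
-- Original shape: OR'd conditions, one of them a sub-select, so the
-- planner runs the sub-select's result as a filter over every row.
-- SELECT * FROM t WHERE col = (SELECT x FROM other WHERE id = 1) OR col = 42;

-- Rewritten: each arm can become a point index scan on col, and
-- UNION removes any duplicates between the two result sets.
SELECT * FROM t WHERE col = (SELECT x FROM other WHERE id = 1)
UNION
SELECT * FROM t WHERE col = 42;
```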
Re: Geometric types row estimation
Igor ALBUQUERQUE SILVA writes: > Thanks a lot for the explanation, I thought the built-in types were more > standard, so I didn't mention that I was having the same thing using > postgis. Hm --- you'd have to take that up with the PostGIS people. But they at least would be likely to have motivation to improve things. regards, tom lane
Re: Geometric types row estimation
Igor ALBUQUERQUE SILVA writes: > I'm having a problem regarding the point type/gist indexes. Here's a > minimal reproduction of it: > ... > What I was expecting is the first query to estimate 4 rows and the second > to estimate 1, like what I get If I try the same thing using integers. Unfortunately, the selectivity estimation functions for PG's geometric types are mostly just stubs. The estimation function for point <@ box in particular is contsel [1]: /* * contsel -- How likely is a box to contain (be contained by) a given box? * * This is a tighter constraint than "overlap", so produce a smaller * estimate than areasel does. */ Datum contsel(PG_FUNCTION_ARGS) { PG_RETURN_FLOAT8(0.001); } It's been like that (excepting notational changes) since Berkeley days, because nobody has bothered to make it better. In general, PG's built-in geometric types have never gotten much beyond their origins as an academic proof-of-concept. I think people who are doing serious work that requires such operations mostly use PostGIS, and I'd suggest looking into that. Or, if you feel like doing a lot of work to make these estimators better, have at it. regards, tom lane [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/utils/adt/geo_selfuncs.c;hb=HEAD
Re: Catching up with performance & PostgreSQL 15
Andres Freund writes: > On November 30, 2022 3:47:32 AM PST, Andrew Dunstan > wrote: >> I think Alvaro's point is that it would have been better to work out >> these wrinkles before turning on JIT by default. Based on anecdotal >> reports from the field I'm inclined to agree. > The problem is that back when it was introduced these problems didn't exist > to a significant degree. JIT was developed when partitioning was very > minimal- and the problems we're seeing are almost exclusively with queries > with many partitions. The problems really only started much more recently. It > also wasn't enabled in the first release.. Well, wherever you want to pin the blame, it seems clear that we have a problem now. And I don't think flipping back to off-by-default is the answer -- surely there is some population of users who will not be happy with that. We really need to prioritize fixing the cost-estimation problems, and/or tweaking the default thresholds. regards, tom lane
Re: Catching up with performance & PostgreSQL 15
Alvaro Herrera writes: > IMO it was a mistake to turn JIT on in the default config, so that's one > thing you'll likely want to change. I wouldn't necessarily go quite that far, but I do think that the default cost thresholds for invoking it are enormously too low, or else there are serious bugs in the cost-estimation algorithms for deciding when to use it. A nearby example[1] of a sub-1-sec partitioned query that took 30sec after JIT was enabled makes me wonder if we're accounting correctly for per-partition JIT costs. regards, tom lane [1] https://www.postgresql.org/message-id/B6025887-D73F-4B5B-9925-4DA4B675F7E5%40elevated-dev.com
Re: why choosing an hash index instead of the btree version even if the cost is lower?
Tomas Vondra writes: > On 11/18/22 13:15, Luca Ferrari wrote: >> I've a table with a numeric column (integer), and I've created two >> indexes on such column, one btree and one hash. The hash results much >> larger as the btree, but what puzzles me is that executing an equality >> simple query, the system chooses the hash index (that has a final cost >> of 8984.08 while the btree index would have a final cost a little >> lower (8901.94). >> >> The only difference I can spot in the EXPLAIN plans is that the btree >> index has an initial cost, but I don't think this is the reason, since >> it should be the final cost what matters, right? > My guess is this is due to STD_FUZZ_FACTOR, see [1] and [2]. > That is, when comparing costs, we require the cost to be at least 1%, > because we have a cheapest path, and we're checking if it's worth > building another one (which is not free - we have to allocate stuff > etc.). And if the difference is tiny, it's not worth it. Even more to the point: if the total costs are fuzzily the same, then the next point of comparison will be the startup costs, which is where the hash index wins. I'm not sure if it's quite fair to give hash a zero startup cost; but it doesn't have to descend a search tree, so it is fair that its startup cost is less than btree's. regards, tom lane
Re: When can joins be avoided?
Stefan Fehrenbach writes: > I'm interested in the rules that Postgres uses to avoid joins. Are > these documented somewhere? If I had to look at the source code, where > would I start? src/backend/optimizer/plan/analyzejoins.c > They don't seem to match my intuition about which joins > could be avoided. I believe only left joins to single tables can be elided ATM. It's too hard to prove uniqueness of the join key in more- complicated cases. regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Thomas Munro writes: > Oh, one comment there is actually obsolete now AFAIK. Unless there is > some reason to think personality(ADDR_NO_RANDOMIZE) might not work in > some case where sysctl -w kernel.randomize_va_space=0 will, I think we > can just remove that. AFAICS, f3e78069db7 silently does nothing on platforms lacking ADDR_NO_RANDOMIZE and PROC_ASLR_FORCE_DISABLE. Are you asserting there are no such platforms? (I'm happy to lose the comment if it's really useless now, but I think we have little evidence of that.) regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Thomas Munro writes: > On Tue, Aug 23, 2022 at 4:57 AM Tom Lane wrote: > +service the requests, with those clients receiving unhelpful > +connection failure errors such as Resource temporarily > +unavailable. > LGTM but I guess I would add "... or Connection refused"? Is that the spelling that appears on FreeBSD? Happy to add it. regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Junwang Zhao writes: > Just curious, *backlog* defines the maximum pending connections, > why do we need to double the MaxConnections as the queue size? The postmaster allows up to twice MaxConnections child processes to exist, per the comment in canAcceptConnections: * We allow more connections here than we can have backends because some * might still be authenticating; they might fail auth, or some existing * backend might exit before the auth cycle is completed. The exact * MaxBackends limit is enforced when a new backend tries to join the * shared-inval backend array. You can argue that 2X might not be the right multiplier, and you can argue that the optimal listen queue length might be more or less than the limit on number of child processes, but that's how we've historically done it. I'm not especially interested in changing that without somebody making a well-reasoned case for some other number. regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
OK, here's some proposed patches. 0001 adds a para about how to raise the listen queue length. 0002 isn't quite related, but while writing 0001 I noticed a nearby use of /proc/sys/... which I thought should be converted to sysctl. IMO /proc/sys pretty much sucks, at least for documentation purposes, for multiple reasons: * It's unlike the way you do things on other platforms. * "man sysctl" will lead you to useful documentation about how to use that command. There's no obvious way to find documentation about /proc/sys. * It's not at all sudo-friendly. Compare sudo sh -c 'echo 0 >/proc/sys/kernel/randomize_va_space' sudo sysctl -w kernel.randomize_va_space=0 The former is a lot longer and it's far from obvious why you have to do it that way. * You have to think in sysctl terms anyway if you want to make the setting persist across reboots, which you almost always do. * Everywhere else in runtime.sgml, we use sysctl not /proc/sys. 0003 removes PG_SOMAXCONN. While doing that I noticed that this computation hadn't been touched throughout all the various changes fooling with exactly what gets counted in MaxBackends. I think the most appropriate definition for the listen queue length is now MaxConnections * 2, not MaxBackends * 2, because the other processes counted in MaxBackends don't correspond to incoming connections. I propose 0003 for HEAD only, but the docs changes could be back-patched. regards, tom lane diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 963b18ed85..1192faa6ae 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -1298,6 +1298,22 @@ default:\ linkend="guc-max-files-per-process"/> configuration parameter to limit the consumption of open files. + + +Another kernel limit that may be of concern when supporting large +numbers of client connections is the maximum socket connection queue +length. 
If more than that many connection requests arrive within a +very short period, some may get rejected before the postmaster can +service the requests, with those clients receiving unhelpful +connection failure errors such as Resource temporarily +unavailable. The default queue length limit is 128 on many +platforms. To raise it, adjust the appropriate kernel parameter +via sysctl, then restart the postmaster. +The parameter is variously named net.core.somaxconn +on Linux, kern.ipc.soacceptqueue on newer FreeBSD, +and kern.ipc.somaxconn on macOS and other BSD +variants. + diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 963b18ed85..1192faa6ae 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -1258,11 +1258,12 @@ default:\ - On Linux - /proc/sys/fs/file-max determines the - maximum number of open files that the kernel will support. It can - be changed by writing a different number into the file or by - adding an assignment in /etc/sysctl.conf. + On Linux the kernel parameter + fs.file-max determines the maximum number of open + files that the kernel will support. It can be changed with + sysctl -w fs.file-max=N. + To make the setting persist across reboots, add an assignment + in /etc/sysctl.conf. 
The maximum limit of files per process is fixed at the time the kernel is compiled; see /usr/src/linux/Documentation/proc.txt for diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 8a038d1b2a..1664fcee2a 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4891,7 +4891,7 @@ SubPostmasterMain(int argc, char *argv[]) * If testing EXEC_BACKEND on Linux, you should run this as root before * starting the postmaster: * - * echo 0 >/proc/sys/kernel/randomize_va_space + * sysctl -w kernel.randomize_va_space=0 * * This prevents using randomized stack and code addresses that cause the * child process's memory map to be different from the parent's, making it diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 8ff3be611d..7112e9751b 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -537,13 +537,11 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber, } /* - * Select appropriate accept-queue length limit. PG_SOMAXCONN is only - * intended to provide a clamp on the request on platforms where an - * overly large request provokes a kernel error (are there any?). + * Select appropriate accept-queue length limit. It seems reasonable + * to use a value similar to the maximum number of child processes + * that the postmaster will permit. */ - maxconn = MaxBackends * 2; - if (maxconn > PG_SOMAXCONN) - maxconn = PG_SOMAXCONN; + maxconn = MaxConnections * 2; err = lis
Re: pgbench: could not connect to server: Resource temporarily unavailable
Thomas Munro writes: > Cool. BTW small correction to something I said about FreeBSD: it'd be > better to document the new name kern.ipc.soacceptqueue (see listen(2) > HISTORY) even though the old name still works and matches OpenBSD and > macOS. Thanks. Sounds like we get to document at least three different sysctl names for this setting :-( regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Thomas Munro writes: > On Mon, Aug 22, 2022 at 12:20 PM Tom Lane wrote: >> Hmm. It'll be awhile till the 128 default disappears entirely >> though, especially if assorted BSDen use that too. Probably >> worth the trouble to document. > I could try to write a doc patch if you aren't already on it. I haven't done anything about it yet, but could do so tomorrow or so. (BTW, I just finished discovering that NetBSD has the same 128 limit. It looks like they intended to make that settable via sysctl, because it's a variable not a constant; but they haven't actually wired up the variable to sysctl yet.) > Oh, right. Looks like that was just paranoia in commit 153f4006763, > back when you got away from using the (very conservative) SOMAXCONN > macro. Looks like that was 5 on ancient systems going back to the > original sockets stuff, and later 128 was a popular number. Yeah I'd > say +1 for removing our cap. I'm pretty sure every system will > internally cap whatever value we pass in if it doesn't like it, as > POSIX explicitly says it can freely do with this "hint". Yeah. I hadn't thought to check the POSIX text, but their listen(2) page is pretty clear that implementations should *silently* reduce the value to what they can handle, not fail. Also, SUSv2 says the same thing in different words, so the requirement's been that way for a very long time. I think we could drop this ancient bit of paranoia. > ... Hmm, maybe we could add a hint to the error, > though? libpq doesn't really have a notion of hints --- perhaps we ought to fix that sometime. But this doesn't seem like a very exciting place to start, given the paucity of prior complaints. (And anyway people using other client libraries wouldn't be helped.) I think some documentation in the "Managing Kernel Resources" section should be plenty for this. regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Thomas Munro writes: > Yeah retrying doesn't seem that nice. +1 for a bit of documentation, > which I guess belongs in the server tuning part where we talk about > sysctls, perhaps with a link somewhere near max_connections? More > recent Linux kernels bumped it to 4096 by default so I doubt it'll > come up much in the future, though. Hmm. It'll be awhile till the 128 default disappears entirely though, especially if assorted BSDen use that too. Probably worth the trouble to document. > Note that we also call listen() > with a backlog value capped to our own PG_SOMAXCONN which is 1000. I > doubt many people benchmark with higher numbers of connections but > it'd be nicer if it worked when you do... Actually it's 10000. Still, I wonder if we couldn't just remove that limit now that we've desupported a bunch of stone-age kernels. It's hard to believe any modern kernel can't defend itself against silly listen-queue requests. regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Thomas Munro writes: > If it's something like that, maybe increasing > /proc/sys/net/core/somaxconn would help? I think older kernels only > had 128 here. Bingo! I see $ cat /proc/sys/net/core/somaxconn 128 by default, which is right about where the problem starts. After $ sudo sh -c 'echo 1000 >/proc/sys/net/core/somaxconn' *and restarting the PG server*, I can do a lot more threads without a problem. Evidently, the server's socket's listen queue length is fixed at creation and adjusting the kernel limit won't immediately change it. So what we've got is that EAGAIN from connect() on a Unix socket can mean "listen queue overflow" and the kernel won't treat that as a nonblock-waitable condition. Still seems like a kernel bug perhaps, or at least a misfeature. Not sure what I think at this point about making libpq retry after EAGAIN. It would make sense for this particular undocumented use of EAGAIN, but I'm worried about others, especially the documented reason. On the whole I'm inclined to leave the code alone; but is there sufficient reason to add something about adjusting somaxconn to our documentation? regards, tom lane
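For reference, the adjustment tested above can be made persistent; a sketch of the config change (assumes Linux with sysctl, and note the PG server must be restarted since the listen queue length is fixed when the socket is created):

```
# Raise the accept-queue limit now:
sysctl -w net.core.somaxconn=1024

# And make it survive reboots:
echo 'net.core.somaxconn = 1024' >> /etc/sysctl.d/postgres.conf
```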
Re: pgbench: could not connect to server: Resource temporarily unavailable
Andrew Dunstan writes: > On 2022-08-21 Su 17:15, Tom Lane wrote: >> On the whole this is smelling more like a Linux kernel bug than >> anything else. > *nod* Conceivably we could work around this in libpq: on EAGAIN, just retry the failed connect(), or maybe better to close the socket and take it from the top with the same target server address. On the one hand, reporting EAGAIN certainly sounds like an invitation to do just that. On the other hand, if the failure is persistent then libpq is locked up in a tight loop --- and "Insufficient entries in the routing cache" doesn't seem like a condition that would clear immediately. It's also pretty unclear why the kernel would want to return EAGAIN instead of letting the nonblock connection path do the waiting, which is why I'm suspecting a bug rather than designed behavior. I think I'm disinclined to install such a workaround unless we get confirmation from some kernel hacker that it's operating as designed and application-level retry is intended. regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Andrew Dunstan writes: > On 2022-08-20 Sa 23:20, Tom Lane wrote: >> Kevin McKibbin writes: >>> What's limiting my DB from allowing more connections? > The first question in my mind from the above is where this postgres > instance is actually listening. Is it really /var/run/postgresql? Its > postmaster.pid will tell you. I have often seen client programs pick up > a system libpq which is compiled with a different default socket directory. I wouldn't think that'd explain a symptom of some connections succeeding and others not within the same pgbench run. I tried to duplicate this behavior locally (on RHEL8) and got something interesting. After increasing the server's max_connections to 1000, I can do $ pgbench -S -c 200 -j 100 -t 100 bench and it goes through fine. But: $ pgbench -S -c 200 -j 200 -t 100 bench pgbench (16devel) starting vacuum...end. pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5440" failed: Resource temporarily unavailable Is the server running locally and accepting connections on that socket? pgbench: error: could not create connection for client 154 So whatever is triggering this has nothing to do with the server, but with how many threads are created inside pgbench. I notice also that sometimes it works, making it seem like possibly a race condition. Either that or there's some limitation on how fast threads within a process can open sockets. Also, I determined that libpq's connect() call is failing synchronously (we get EAGAIN directly from the connect() call, not later). I wondered if libpq should accept EAGAIN as a synonym for EINPROGRESS, but no: that just makes it fail on the next touch of the socket. The only documented reason for connect(2) to fail with EAGAIN is EAGAIN Insufficient entries in the routing cache. which seems pretty unlikely to be the issue here, since all these connections are being made to the same local address. On the whole this is smelling more like a Linux kernel bug than anything else. 
regards, tom lane
Re: pgbench: could not connect to server: Resource temporarily unavailable
Kevin McKibbin writes: > What's limiting my DB from allowing more connections? > This is a sample of the output I'm getting, which repeats the error 52 > times (one for each failed connection) > -bash-4.2$ pgbench -c 200 -j 200 -t 100 benchy > ... > connection to database "benchy" failed: > could not connect to server: Resource temporarily unavailable > Is the server running locally and accepting > connections on Unix domain socket > "/var/run/postgresql/.s.PGSQL.5432"? This is apparently a client-side failure not a server-side failure (you could confirm that by seeing whether any corresponding failure shows up in the postmaster log). That means that the kernel wouldn't honor pgbench's attempt to open a connection, which implies you haven't provisioned enough networking resources to support the number of connections you want. Since you haven't mentioned what platform this is on, it's impossible to say more than that --- but it doesn't look like Postgres configuration settings are at issue at all. regards, tom lane
Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries
bruno da silva writes: > *Question: *Besides the gc issue that you mentioned, having a large ( 700MB > or 1GB ) pgss_query_texts.stat could cause slowness in pg_stat_statement > processing > than leading to slower query responses with a 32bit PG? I'm thinking in > reducing pg_stat_statements.max from 10k to 3k Whether or not we've fully identified the problem, I think cutting pg_stat_statements.max is a good idea. Especially as long as you're stuck on an unsupported PG version. regards, tom lane
Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries
I wrote: > bruno da silva writes: >> Do you have a lot of especially long statements being tracked >> in the pg_stat_statements view?* well, the view was showing the query >> column null.* >> * but looking on pgss_query_texts.stat there are very large sql >> statements, of around ~ 400kb, multiple thousands. * I see one possible piece of the puzzle here: since you're using a 32-bit build, overflowing size_t is a reachable hazard. Specifically, in this test to see if we need to garbage-collect the query text file: if (extent < pgss->mean_query_len * pgss_max * 2) return false; You said earlier that pg_stat_statements.max = 10000, so a mean_query_len exceeding about 2^32 / 10000 / 2 = 214748.3648 would be enough to overflow size_t and break this comparison. Now, a mean SQL query length in excess of 200kB sounds mighty improbable, but it's really the mean length of the query texts in the view. If your "normal" queries fall into just a few patterns they might be represented by a relatively small number of view entries. And if the "big" queries are sufficiently not alike, they might each get their own view entry, which could potentially drive the mean high enough to cause trouble. It'd be interesting to track what "SELECT avg(length(query)) FROM pg_stat_statements" gives. However, even if we grant that mean_query_len is that big, overflow here would make garbage collection of the query text file more likely not less so. What I'm speculating is that overflow is occurring and causing all processes to decide they need to run gc_qtexts() every time they insert a new query entry, even though the query texts file isn't actually bloated. That could possibly explain your performance issues: a garbage collection pass over a multi-gig file will take awhile, and what's worse is that it's done under an exclusive lock, meaning that all the backends stack up waiting their turn to perform a useless GC pass.
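The wraparound hazard described here is easy to model with plain integer arithmetic. A sketch (mine, not the pg_stat_statements source), treating 32-bit size_t as arithmetic modulo 2**32:

```python
# Model of the gc test on a 32-bit build:
#   if (extent < pgss->mean_query_len * pgss_max * 2) return false;
# where the right-hand product silently wraps at 2**32.
SIZE_T_WRAP = 2**32
pgss_max = 10000

# Mean query length at which the product first wraps:
threshold = SIZE_T_WRAP // (pgss_max * 2)
print(threshold)                      # 214748

# Just past the threshold, the product wraps to a tiny number, so
# "extent < product" is false for almost any extent and every new
# entry decides it must garbage-collect:
mean_query_len = threshold + 1
product = (mean_query_len * pgss_max * 2) % SIZE_T_WRAP
print(product)                        # 12704
extent = 2_200_000_000                # the reported 2.2GB text file
print(extent < product)               # False -> gc_qtexts() runs
```

So a mean text length just over ~210kB flips the comparison from "almost never collect" to "collect on every insert," matching the speculation above.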
What this doesn't explain is why the condition doesn't clear once you observe one of those "out of memory" complaints, because that should lead to truncating the texts file. Maybe it does get truncated, but then the cycle repeats after awhile? If you have a steady stream of incoming new 400kB queries, you could build back up to 2.2GB of text after five thousand or so of those. I'm also curious whether this installation is in the habit of doing pg_stat_statements_reset() a lot. It looks like that fails to reset mean_query_len, which might be intentional but perhaps it could play into getting a silly result here later on. regards, tom lane
Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries
bruno da silva writes: > Do you have a lot of especially long statements being tracked > in the pg_stat_statements view?* well, the view was showing the query > column null.* > * but looking on pgss_query_texts.stat there are very large sql > statements, of around ~ 400kb, multiple thousands. * Hm. We try to recover from such failures by (a) resetting all the view's query text fields to null and (b) truncating the file --- well, unlinking it and creating it as empty. It seems like (a) happened and (b) didn't. It's pretty hard to explain that from the code though. Are you quite sure this is a 9.5.21 version of the pg_stat_statements extension? Is it possible that the pg_stat_tmp directory has been made non-writable? regards, tom lane > Are there any other signs of distress > in the postmaster log, like complaints about being unable to write > pgss_query_texts.stat? *no, just complaints for reading it. * > Thanks > On Tue, Aug 2, 2022 at 11:59 AM Tom Lane wrote: >> bruno da silva writes: > After more investigation, we found that pgss_query_texts.stat of a size >> of > 2.2GB. and this deployment has a 32bit pg. >> >> Hm ... we've heard one previous report of pg_stat_statements' query text >> file getting unreasonably large, but it's not clear how that can come >> to be. Do you have a lot of especially long statements being tracked >> in the pg_stat_statements view? Are there any other signs of distress >> in the postmaster log, like complaints about being unable to write >> pgss_query_texts.stat? >> >> regards, tom lane >> > -- > Bruno da Silva
Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries
bruno da silva writes: > After more investigation, we found that pgss_query_texts.stat of a size of > 2.2GB. and this deployment has a 32bit pg. Hm ... we've heard one previous report of pg_stat_statements' query text file getting unreasonably large, but it's not clear how that can come to be. Do you have a lot of especially long statements being tracked in the pg_stat_statements view? Are there any other signs of distress in the postmaster log, like complaints about being unable to write pgss_query_texts.stat? regards, tom lane
Re: alter table xxx set unlogged take long time
Joe Conway writes: > Then (completely untested) I *think* you could create the "partition" > initially as a free standing unlogged table, load it, index it, switch > to logged, and then attach it to the partitioned table. I'm still of the opinion that this plan to load the data unlogged and switch to logged later is a loser. Sooner or later you have got to write the data to WAL, and this approach doesn't eliminate that cost. What it does do is create one whole extra cycle of writing the data to disk and reading it back. I don't think it's an oversight that no such thing is suggested in our standard tips for bulk-loading data: https://www.postgresql.org/docs/current/populate.html What perhaps *is* an oversight is that we don't suggest use of COPY FREEZE there. AFAIK that doesn't reduce the initial data loading cost directly, but it would save overhead later. regards, tom lane
Re: alter table xxx set unlogged take long time
"James Pang (chaolpan)" writes: > How to make it fast ? These are our steps about copy large data from Oracle > to Postgres > 1. Create table in Postgres 2. Extract data from Oracle to CSV 3. Alter > table set xxx unlogged, 4. Run copy command into Postgres db 5. Alter > table set xxx logged 6. Create index … The easy answer is to skip steps 3 and 5. regards, tom lane
Re: partition pruning only works for select but update
Justin Pryzby writes: > On Fri, Jul 01, 2022 at 08:30:40AM +0000, James Pang (chaolpan) wrote: >> We have other application depend on V13, possible to backport code changes >> to V13 as >> https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=86dc90056dfdbd9d1b891718d2e5614e3e432f35 > In any case, I'm sure the feature won't be backpatched to v13 - it's an > improvement but not a bugfix. Even more to the point, it was an extremely major change and would take a huge amount of QA effort to ensure that dropping it into v13 wouldn't cause fresh problems. The PG community has exactly no interest in making such effort. Besides which, what do you imagine "depends on v13" actually means? If you have an app that works on v13 but not v14, maybe it's because it depends on the old behavior in some way. Spend your effort on updating your app, instead. regards, tom lane
Re: partition pruning only works for select but update
"James Pang (chaolpan)" writes: > But when > Explain update table set .. where partitionkey between to_timestamp() and > to_timestamp(); > It still show all of partitions with update ... In releases before v14, partition pruning is far stupider for UPDATE (and DELETE) than it is for SELECT. regards, tom lane
Re: Adding non-selective key to jsonb query @> reduces performance?
Marcin Krupowicz writes: > However this one, is slow: > Q2 > select count(*) from tbl where row @> '{"SELECTIVE_COL": > "SearchValue", "DATE": "20220606", "NON_SELECTIVE_COL": "Abc"}'::jsonb > It takes 17ms > Note that the only difference is adding one more - not very unique - > key. If in Q2 I replaced NON_SELECTIVE_COL with another selective > column, it's becoming fast again. This doesn't surprise me a whole lot based on what I know of GIN. It's going to store sets of TIDs associated with each key or value mentioned in the data, and then a query will have to AND the sets of TIDs for keys/values mentioned in the query. That will take longer when some of those sets are big. It might be worth experimenting with an index built using the non-default jsonb_path_ops opclass [1]. I'm not sure if that'd be faster for this scenario, but it seems worth trying. regards, tom lane [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
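The suggested experiment might look like the following sketch (table and column names taken from the quoted query; whether it helps this workload is exactly what would need testing):

```sql
-- jsonb_path_ops indexes only hashes of complete paths, which tends to
-- produce smaller GIN indexes and can speed up @> containment queries,
-- at the cost of not supporting key-existence operators (?, ?|, ?&).
CREATE INDEX tbl_row_path_idx ON tbl USING gin ("row" jsonb_path_ops);
```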
Re: Query is taking too long i intermittent
Justin Pryzby writes: > On Mon, Jun 06, 2022 at 03:28:43PM +0530, Mayank Kandari wrote: >> SELECT event_id FROM event WHERE (event_sec > time.seconds) OR >> ((event_sec=time.seconds) AND (event_usec>=time.useconds) ORDER BY >> event_sec, event_usec LIMIT 1 > I think it'd be better if the column was a float storing the fractional number > of seconds. Currently, it may be hard for the planner to estimate rowcounts > if > the conditions are not independent. I don't know if it's related to this > problem, though. Also, even if you can't change the data representation, there's a more idiomatic way to do that in SQL: use a row comparison. SELECT ... WHERE row(event_sec, event_usec) >= row(time.seconds, time.useconds) ... I doubt this is notably more execution-efficient, but if you're getting a bad rowcount estimate it should help with that. It's easier to read too. regards, tom lane
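SQL row comparison is lexicographic, the same rule Python applies to tuples, which makes the semantics easy to illustrate outside the database (a sketch of mine, not from the thread):

```python
# row(event_sec, event_usec) >= row(seconds, useconds) compares the
# first fields, falling through to the second only on a tie --
# exactly Python's tuple comparison.
def row_ge(event, threshold):
    """Mirror of SQL: row(event_sec, event_usec) >= row(sec, usec)."""
    return event >= threshold

print(row_ge((100, 5), (100, 5)))   # equal sec and usec    -> True
print(row_ge((100, 4), (100, 5)))   # same sec, earlier usec -> False
print(row_ge((101, 0), (100, 5)))   # later sec wins outright -> True
```

The single row comparison also gives the planner one composite condition to estimate instead of an OR of two correlated ones.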
Re: postgres backend process hang on " D " state
"James Pang (chaolpan)" writes: > 1. extensions > shared_preload_libraries = > 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user' Can you still reproduce this if you remove all of those? regards, tom lane
Re: rows selectivity overestimate for @> operator for arrays
Alexey Ermakov writes: > so if we store mostly one element in array and they're almost all > distinct then in tables with more then stats_target/0.0063 (~1.58M for > maximum stats target 10000) rows we'll get 0.005 constant for selectivity. Yeah. There's a comment in array_selfuncs.c about * TODO: this estimate probably could be improved by using the distinct * elements count histogram. For example, excepting the special case of * "column @> '{}'", we can multiply the calculated selectivity by the * fraction of nonempty arrays in the column. but I'm not sure whether that's relevant here. One thought is that if there is a pg_statistic row but it contains no MCE list, we could assume that the column elements are all distinct and see what sort of estimate that leads us to. regards, tom lane
Re: Why is there a Sort after an Index Only Scan?
=?UTF-8?Q?Andr=C3=A9_H=C3=A4nsel?= writes: > Limit (cost=0.43..296.63 rows=50 width=4) (actual time=1052.692..1052.737 > rows=50 loops=1) > -> Index Scan using orders_test_pkey on orders_test (cost=0.43..71149.43 > rows=12010 width=4) (actual time=1052.690..1052.728 rows=50 loops=1) > Filter: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= > '2022-05-01'::date)) > Rows Removed by Filter: 1998734 > Is it overestimating the cost of the sorting? No, but it's guessing it will hit 50 rows that satisfy the filter before it's gone very far in this index. If the shipping date and pkey are correlated in the wrong direction, that could be a very overoptimistic guess. I don't think we have adequate stats yet to detect this sort of problem. regards, tom lane
Re: LISTEN NOTIFY sometimes huge delay
"Peter Eser HEUFT [Germany]" writes: > I have a table with time series data and on this table a trigger for > notifies: > containers_notify AFTER INSERT ON containers FOR EACH ROW EXECUTE > PROCEDURE containers_notify('containers_notify_collector') > and the function does: > PERFORM pg_notify(CAST(TG_ARGV[0] AS text), row_to_json(NEW)::text); > This works as a charm but occasionally (I think with more load on the > system) the notifications are received much time (up to hours!) after > the INSERTs. > Nevertheless no notifications become lost, they are only very late! The > delay grows, seems as a queue grows, but the java process tries to fetch > the notifications fairly fast, Hm. We've not previously had reports of late notifications. One idea that comes to mind is that the server won't deliver notifications as long as the client has an open transaction, so is it possible your listening process sometimes forgets to close its transaction? > Versions: > PostgreSQL 10.12 on x86_64-pc-linux-gnu, compiled by > x86_64-pc-linux-gnu-gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit > JDBC 42.2.23 That's pretty old. We've made a number of changes to the LISTEN/NOTIFY code since then; although in reading the commit log entries about them, nothing is said about long-delayed notifications. regards, tom lane
Re: Unworkable plan above certain row count
=?iso-8859-1?Q?Andr=E9_H=E4nsel?= writes: > Now if you change > INSERT INTO a(a_id) SELECT generate_series(1, 200000); > to > INSERT INTO a(a_id) SELECT generate_series(1, 2000000); > i.e. add a zero, the plan becomes [ not a hashed subplan ] Yeah, it won't hash the subplan if the estimated size of the hash table exceeds work_mem. In this case, boosting work_mem would be a mighty good idea. regards, tom lane
Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
Emil Iggland writes: > The query that is giving us issues is the following, channel 752433 has > NO values, 752431 has values. > (Channel 752433 only has valueseries 752434) > select * from datavalue > where dataview in ( select id from valueseries where channel = > %channel_idx%) > ORDER BY VALUETIMESTAMP DESC > FETCH FIRST ROW only; > Running explain analyze shows strange numbers, 52'000 rows are being > returned but there are no rows there. > For channel 752433 > -> Index Scan Backward using > datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 > datavalue_6 (cost=0.42..7166.19 rows=119673 width=226) (actual > time=0.008..32.831 rows=119601 loops=1) You've got the wrong column order (for this query anyway) in that index. It'd work a lot better if dataview were the first column; or at least, it wouldn't tempt the planner to try this unstably- performing plan. It's trying to use the index ordering to satisfy the ORDER BY, which works great as long as it finds a dataview match in some reasonably recent index entry. Otherwise, it's going to crawl the whole index to discover that there's no match. regards, tom lane
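The fix suggested above might be sketched like this (index and table names taken from the quoted plan; adapt to the real partitioned parent as needed):

```sql
-- With dataview leading, the backward scan can seek straight to the
-- matching channel's rows and still return them newest-first, instead
-- of crawling the whole valuetimestamp ordering hunting for a match.
CREATE INDEX ON datavalue_2022_03 (dataview, valuetimestamp DESC);
```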
Re: How to find the final transformed query in postgresql
Goti writes: > I would like to know if postgres performs any transformations when it does > the parsing? This might be helpful reading: https://www.postgresql.org/docs/current/overview.html > If yes, is there a way we can get the final transformed query? See debug_print_parse and friends [1]. Depending on what you mean by "final transformed query", you might instead want debug_print_rewritten, or maybe you want the plan, in which case EXPLAIN is a much friendlier way to look at it than debug_print_plan. regards, tom lane [1] https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
Re: Query Planner not taking advantage of HASH PARTITION
Benjamin Tingle writes: > Interesting. Why is it impossible to prune hash partitions? Maybe prune > isn’t the best word, more so use to advantage. At the very least, it should > be possible to utilize a parallel insert against a table partitioned by > hash. (Partition query rows, then distribute these rows to parallel workers) Your plan-shape complaint had nothing to do with insertions; it had to do with joining the partitioned table to another table. That join can't be optimized. regards, tom lane
Re: Query Planner not taking advantage of HASH PARTITION
Benjamin Tingle writes: > I've recently started taking advantage of the PARTITION BY HASH feature for > my database system. It's a really great fit since my tables can get quite > large (900M+ rows for some) and splitting them up into manageable chunks > should let me upload to them without having to update an enormous index > every time. What's more, since each partition has a write lock independent > of the parent table, it should theoretically be possible to perform a > parallelized insert operation, provided the data to be added is partitioned > beforehand. > What has been disappointing is that the query planner doesn't seem to > recognize this potential. That's because there isn't any. The hash partitioning rule has basically nothing to do with any plausible WHERE condition. If you're hoping to see partition pruning happen, you need to be using list or range partitions, with operators compatible with your likely WHERE conditions. (I'm of the opinion that the hash partitioning option is more in the category of a dangerous nuisance than a useful feature. There are some around here who will argue otherwise, but they're wrong for exactly the reason that it's impossible to prune hash partitions.) regards, tom lane
Re: Slow Planning Times
"David G. Johnston" writes: > On Wed, Apr 6, 2022 at 5:27 PM Saurabh Sehgal wrote: >> I have the following query: >> I don't think it is super complex. But when I run explain analyze on this >> I get the following: >> Planning Time: 578.068 ms >> Execution Time: 0.113 ms > The fundamental issue here is that you have basically 12 conditions across > 5 tables that need to be evaluated to determine which one of the 1,680 > possible join orders is the most efficient. A 5-way join doesn't seem particularly outrageous. But I'm wondering if these are all plain tables or if some of them are actually complex views. Another possibility is that the statistics target has been cranked to the moon and the planner is spending all its time sifting through huge statistics arrays. It'd be interesting to see the actual schemas for the tables, as well as EXPLAIN's output for this query. I'm wondering exactly which PG version this is, too. regards, tom lane
Re: XA transactions much slower on 14.2 than on 13.5
Mladen Gogala writes: > My boss asked me to upgrade one of the development databases from 13.5 > --> 14.2. One thing that we've noticed right away is that XA > transactions (2-phase commit) are much slower on 14.2 than on 13.5. Were > there any significant changes to the XA protocol in the version 14? Did > anybody else encountered this problem? There were a bunch of changes around the 2PC code to support logical replication of 2PC transactions, but I don't think they should have made for any particular performance difference in non-replicated servers. Can you put together a self-contained test case that demonstrates what you're seeing? regards, tom lane
Re: Any way to speed up INSERT INTO
aditya desai writes: > One of the service layer app is inserting Millions of records in a table > but one row at a time. Although COPY is the fastest way to import a file in > a table. Application has a requirement of processing a row and inserting it > into a table. Is there any way this INSERT can be tuned by increasing > parameters? It is taking almost 10 hours for just 2.2 million rows in a > table. Table does not have any indexes or triggers. Using a prepared statement for the INSERT would help a little bit. What would help more, if you don't expect any insertion failures, is to group multiple inserts per transaction (ie put BEGIN ... COMMIT around each batch of 100 or 1000 or so insertions). There's not going to be any magic bullet that lets you get away without changing the app, though. It's quite possible that network round trip costs are a big chunk of your problem, in which case physically grouping multiple rows into each INSERT command (... or COPY ...) is the only way to fix it. But I'd start with trying to reduce the transaction commit overhead. regards, tom lane
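The batching advice can be sketched as follows. The helper is mine and the driver calls in the comment are stand-ins for the app's real database layer, not a specific API:

```python
# Group rows so each transaction commits once per batch instead of
# once per row -- the commit overhead is what dominates at one
# row per transaction.
def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# With a typical driver, usage would look roughly like:
#   for batch in batches(rows, 1000):
#       with connection:                 # one BEGIN/COMMIT per batch
#           for row in batch:
#               cursor.execute(insert_sql, row)

print([len(b) for b in batches(range(2500), 1000)])   # [1000, 1000, 500]
```

Combining this with a prepared statement, or folding each batch into one multi-row INSERT, additionally cuts the per-statement and network round-trip costs mentioned above.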
Re: OOM killer while pg_restore
=?UTF-8?Q?Marc_Recht=c3=a9?= writes: > Le 03/03/2022 à 16:31, Tom Lane a écrit : >> Does memory consumption hold steady if you drop the FK constraints? > Actually the number of rows is 232735712. > Accordingly the RAM consumption would be x12 x3 = 7.8 GiB. > This is close to the 8,1g I reported earlier (actually it was closer to > 7.8 GB, due to GiB vs. GB confusion). > So there is no memory leak. > It took 16 hours on my box to reach that RAM consumption, and then the > COPY failed when checking the first FK (as the referenced table was empty). I'm guessing it was swapping like mad :-( We've long recommended dropping FK constraints during bulk data loads, and then re-establishing them later. That's a lot cheaper than retail validity checks, even without the memory-consumption angle. Ideally that sort of behavior would be automated, but nobody's gotten that done yet. (pg_restore does do it like that during a full restore, but not for a data-only restore, so I guess you were doing the latter.) regards, tom lane
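The arithmetic behind that figure is easy to check (a sketch using the numbers quoted in the thread: ~12 bytes of queued trigger event per FK per row, 3 FK constraints, 232,735,712 rows):

```python
# Queued-trigger-event memory estimate for the FK checks:
rows = 232_735_712
bytes_per_fk_per_row = 12
fk_count = 3

bytes_total = rows * bytes_per_fk_per_row * fk_count
print(bytes_total)             # 8378485632
print(bytes_total / 2**30)     # ~7.8 GiB, matching the observed RSS
```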
Re: OOM killer while pg_restore
=?UTF-8?Q?Marc_Recht=c3=a9?= writes: > We have a pg_restore which fails due to RAM over-consumption of the > corresponding PG backend, which ends-up with OOM killer. > The table has one PK, one index, and 3 FK constraints, active while > restoring. > The dump contains over 200M rows for that table and is in custom format, > which corresponds to 37 GB of total relation size in the original DB. The FKs would result in queueing row trigger events, which would occupy some memory. But those should only need ~12 bytes per FK per row, which works out to less than 10GB for this number of rows, so it may be that you've hit something else that we would consider a leak. Does memory consumption hold steady if you drop the FK constraints? If not, as others have noted, we'd need more info to investigate this. The leak is probably independent of the specific data in the table, so maybe you could make a small self-contained example using a script to generate dummy data. regards, tom lane
Re: Advice needed: query performance deteriorates by 2000% within 1 minute
Peter Adlersburg writes: > Limit (cost=0.00..804.97 rows=10 width=22) (actual > time=23970.845..25588.432 rows=1 loops=1) >-> Seq Scan on "order" (cost=0.00..3863.86 rows=48 width=22) (actual > time=23970.843..25588.429 rows=1 loops=1) > Filter: (jsonb_to_tsvector('english'::regconfig, content, > '["all"]'::jsonb) @@ '''1.20709841'''::tsquery) > Rows Removed by Filter: 9652 > Planning Time: 0.430 ms > Execution Time: 25588.448 ms I think the expense here comes from re-executing jsonb_to_tsvector a lot of times. By default that's estimated as 100 times more expensive than a simple function (such as addition), but these results make it seem like that's an understatement. You might try something like alter function jsonb_to_tsvector(regconfig, jsonb, jsonb) cost 1000; to further discourage the planner from picking this plan shape. Possibly the cost estimate for ts_match_tq (the function underlying this variant of @@) needs to be bumped up as well. (Bear in mind that pg_dump will not propagate such hacks on system-defined objects, so you'll need a note to reapply any such changes after dump/reload or pg_upgrade.) regards, tom lane
Re: slow "select count(*) from information_schema.tables;" in some cases
Lars Aksel Opsahl writes: >> SELECT relname, pg_size_pretty(pg_relation_size(C.oid)) FROM pg_class C LEFT >> JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname = 'pg_catalog' >> ORDER BY 2 DESC LIMIT 20; can you show the output of this query "ORDER BY 2" is giving you a textual sort of the sizes, which is entirely unhelpful. Try SELECT relname, pg_size_pretty(pg_relation_size(C.oid)) FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname = 'pg_catalog' ORDER BY pg_relation_size(C.oid) DESC LIMIT 20; regards, tom lane
Re: 9.6 write time
Marc Millas writes: > checkpoint completion target is 0.9 checkpoint_timeout is the more interesting number here. regards, tom lane
Re: 9.6 write time
Marc Millas writes: > I am looking at a postgres 9.6 on rh7 > I see that checkpoint_write_time is huge which looks quite strange as the > average amount of data written is not that big. checkpoint_write_time is not the amount of time spent doing I/O; it's the elapsed wall-clock time in the write phase. If the I/O is being throttled because of an un-aggressive checkpoint completion target, it could be a lot more than the actual I/O time. What have you got your checkpoint parameters set to? regards, tom lane
Re: WAL files keep piling up
Zbigniew Kostrzewa writes: > Thanks for responding. On current master it looks like so: > ls pg_xlog/archive_status/ | grep ready | wc -l > 0 > ls pg_xlog/archive_status/ | grep done | wc -l > 501 Hmm, if you've got archiving turned off, I wonder why you have any .done files at all. Perhaps they are leftover from a time when you did have archiving on, and for some reason they are confusing the non-archive-mode cleanup logic. Anyway, you could certainly manually remove the .done files and the corresponding WAL segment files, and then see what happens. BTW, I'm sure you realize that 9.6.15 is not exactly current. regards, tom lane
Re: Query is slower with a large proportion of NULLs in several columns
Lars Bergeson writes:
> I'm running PostgreSQL under AWS Aurora, and I didn't set it up or install
> it, so I'm not sure about the OS version.

Oh! Aurora is not Postgres. My admittedly-not-well-informed understanding is that they stuck a Postgres front end on their existing storage engine, so it's not surprising if storage-level behaviors are quite different from stock Postgres.

regards, tom lane
Re: Query is slower with a large proportion of NULLs in several columns
"David G. Johnston" writes:
> On Monday, December 20, 2021, Tom Lane wrote:
>> It would help if somebody had labeled the units of I/O Time
>> ... but I'm guessing those are microsec vs. the millisec
>> of the other times, because otherwise it's completely wrong.
> Related to my preceding observation, from the explain (buffers) docs:
> "…and the time spent reading and writing data file blocks (in milliseconds)
> if track_io_timing
> <https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-IO-TIMING>
> is enabled."

Hmm ... the code sure looks like it's supposed to be millisec:

    appendStringInfoString(es->str, "I/O Timings:");
    if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
        appendStringInfo(es->str, " read=%0.3f",
                         INSTR_TIME_GET_MILLISEC(usage->blk_read_time));
    if (!INSTR_TIME_IS_ZERO(usage->blk_write_time))
        appendStringInfo(es->str, " write=%0.3f",
                         INSTR_TIME_GET_MILLISEC(usage->blk_write_time));

And when I try some cases here, I get I/O timing numbers that are consistent with the overall time reported by EXPLAIN, for example

    Seq Scan on foo (cost=0.00..843334.10 rows=1110 width=508) (actual time=0.015..1897.492 rows=1100 loops=1)
      Buffers: shared hit=15874 read=717460
      I/O Timings: read=1184.638
    Planning:
      Buffers: shared hit=5 read=2
      I/O Timings: read=0.025
    Planning Time: 0.229 ms
    Execution Time: 2151.529 ms

So now we have a real mystery about what is happening on Lars' system. Those numbers can't be right.

regards, tom lane
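A minimal way to reproduce a reading like the one above on any instance (the table name is a stand-in; I/O Timings lines only appear when track_io_timing is on and some blocks actually come from disk):

```sql
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM foo;  -- "foo" is illustrative
```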
Re: Query is slower with a large proportion of NULLs in several columns
Justin Pryzby writes:
> On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
>> Still taking 10X more I/O to read the smaller table. Very odd.
> If I'm not wrong, it's even worse than that?
> It takes 20 or 30sec to run the query - but it says the associated I/O times
> are ~500sec or ~6000sec?

It would help if somebody had labeled the units of I/O Time ... but I'm guessing those are microsec vs. the millisec of the other times, because otherwise it's completely wrong.

regards, tom lane
Re: Query is slower with a large proportion of NULLs in several columns
Lars Bergeson writes:
> What is it about null values in the table that slows down the full table
> scan?

If a row has any nulls, then it contains a "nulls bitmap" [1] that says which columns are null, and that bitmap has to be consulted while walking through the row contents. So the most obvious theory here is that that adds overhead that's significant in your case. But there are some holes in that theory, mainly that the I/O timings you are showing don't seem very consistent:

no nulls:
> I/O Timings: read=1884365.335
> Execution Time: 11135.368 ms

with nulls:
> I/O Timings: read=17141420.771
> Execution Time: 25407.318 ms

Regardless of CPU time required, it should not take 10X less I/O time to read a physically larger table. So there's something fairly bogus going on there. One thing you might try is disabling parallelism (set max_parallel_workers_per_gather = 0) to see if that's confusing the numbers somehow.

regards, tom lane

[1] https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-TUPLE-LAYOUT
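The suggested experiment, plus a small standalone illustration of the nulls-bitmap effect on row width (exact sizes vary with column types and alignment, so no particular numbers are claimed):

```sql
-- Re-run the problem query with parallelism disabled:
SET max_parallel_workers_per_gather = 0;

-- Rows containing a NULL carry a null bitmap but omit the null field's data;
-- pg_column_size() on whole-row values shows the net per-row effect:
SELECT pg_column_size(ROW(1, 2, 3))    AS all_non_null,
       pg_column_size(ROW(1, NULL, 3)) AS with_null;
```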
Re: pg_trgm word_similarity query does not use index for input strings longer than 8 characters
Laurenz Albe writes:
> On Tue, 2021-11-30 at 22:38 +0100, pgsql-performa...@jhacker.de wrote:
>> INSERT INTO song (artist, title)
>> SELECT 'artist','title'
>> FROM generate_series(1,1);
>>
>> \set query '12345678'
>>
>> -- This query is slow
>> EXPLAIN ANALYZE
>> SELECT song.artist, song.title
>> FROM song
>> WHERE (song.artist %> :'query' OR song.title %> :'query')
>> ;
> The table is quite small; with a bigger table, the test would be more
> meaningful.

Yeah, this test case seems very unrealistic, both as to table size and as to the lack of variability of the table entries. I think the latter is causing the indexscans to take less time than they otherwise might, because none of the extracted trigrams find any matches.

> Since you have SSDs, you should tune "random_page_cost = 1.1".

Right. Poking at gincostestimate a bit, I see that for this operator the indexscan cost estimate is basically driven by the number of trigrams extracted from the query string (nine in this test case) and the index size; those lead to a predicted number of index page fetches that's then scaled by random_page_cost. That's coming out to make it look more expensive than the seqscan. It's actually not more expensive, but that's partially because page fetch costs are really zero in this test case (everything will stay in shared buffers the whole time), and partially because the unrealistic data pattern is leading to not having to look at as much of the index as gincostestimate expected.

In general, it appears correct that longer query strings lead to a higher index cost estimate, because they produce more trigrams so there's more work for the index match to do. (At some level, a longer query means more work in the seqscan case too; but our cost models are inadequate to predict that.)

regards, tom lane
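The trigram count can be checked directly with show_trgm() from pg_trgm, which pads the string with blanks before extracting trigrams; this is where the "nine in this test case" figure comes from:

```sql
-- Requires the pg_trgm extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT show_trgm('12345678');                         -- the trigrams the index must probe
SELECT array_length(show_trgm('12345678'), 1) AS n_trigrams;
```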
Re: LwLockRelease performance
Ashkil Dighin writes:
> The performance bottleneck in the LWLockRelease() method is that it goes
> through an array one by one to see which lock was released, with O(N).
> As soon as the lock is found it performs an array move to remove the lock.

Typically, such locks are released in LIFO order. Do you have any actual evidence of a performance problem here?

regards, tom lane
Re: Out of memory error
aditya desai writes:
> In a trigger function I am creating a temp table. When an update on a
> table is executed for say 10k rows, I get the below error.
> ERROR: out of shared memory
> HINT: You might need to increase max_locks_per_transaction
> CONTEXT: SQL statement "create temp table changedinfo(colName
> varchar(100), oldValue varchar(4000), newValue varchar(4000)"

[ raised eyebrow ... ] If you are concerned about performance, I'd start by not creating a temp table per row of the outer update. That's costing probably 100x to 1000x as much as the row update itself.

regards, tom lane
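A sketch of the kind of restructuring meant here, assuming a trigger function along the lines of the one in the error context (the function name and body are hypothetical): creating the temp table at most once per session, rather than once per fired row, keeps the lock count bounded.

```sql
-- Hypothetical reworked trigger function: only create the temp table
-- if it does not already exist in this session.
CREATE OR REPLACE FUNCTION log_changes() RETURNS trigger AS $$
BEGIN
    CREATE TEMP TABLE IF NOT EXISTS changedinfo(
        colname  varchar(100),
        oldvalue varchar(4000),
        newvalue varchar(4000)
    ) ON COMMIT DELETE ROWS;
    -- populate changedinfo from OLD/NEW here
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

A statement-level trigger with transition tables (REFERENCING OLD TABLE / NEW TABLE), available since v10, would avoid the per-row temp table entirely.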
Re: Need help identifying a periodic performance issue.
Thomas Munro writes:
> I'm wondering if the thing that changed between 9.6 and 13 might be
> the heuristics for when auto vacuum's background ANALYZE is triggered,
> creating the unlucky timing required to get your system to this state
> occasionally.
> For a while now I have been wondering how we could teach the
> planner/stats system about "volatile" tables (as DB2 calls them), that
> is, ones that are frequently empty, which often come up in job queue
> workloads. I've seen problems like this with user queries (I used to
> work on big job queue systems across different relational database
> vendors, which is why I finished up writing the SKIP LOCKED patch for
> 9.5), but this is the first time I've contemplated FK check queries
> being negatively affected by this kind of stats problem. I don't have
> a good concrete idea, though (various dumb ideas: don't let auto
> analyze run on an empty table if it's marked VOLATILE, or ignore
> apparently empty stats on tables marked VOLATILE (and use what?),
> ...).

Hmm. If this complaint were about v14 rather than v13, I'd be wondering whether 3d351d916 was what made things worse. But in v13, if the table does go to empty (zero length) and ANALYZE happens to see that state, we should end up back at the planner's "minimum ten pages" heuristic, which likely would be enough to prevent choice of a seqscan. OTOH, if the analyzed state is "empty but has a couple of pages", it looks like that could provoke a seqscan.

This is all guesswork though, since we don't know quite what's happening on Robert's system. It might be worth setting "log_autovacuum_min_duration = 0" (either globally, or as a reloption on the relevant tables), and seeing if there seems to be any correlation between autovacuum/autoanalyze activity and the occurrences of poor plan choices.

regards, tom lane
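The logging suggestion in concrete form (the table name is a placeholder for whichever queue tables are under suspicion):

```sql
-- Globally (takes effect after a config reload):
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();

-- Or narrowly, as a reloption on the relevant tables:
ALTER TABLE job_queue SET (log_autovacuum_min_duration = 0);
```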