Re: PostgreSQL Server Hang

2023-06-21 Thread Andreas Kretschmer



On 22 June 2023 07:09:26 CEST, KK CHN  wrote:
>*Description of System: *
>1. We are running a Postgres Server (version 12, on CentOS 6) for an
>emergency call handling and vehicle tracking system; the vehicles are
>fitted with mobile devices running navigation apps for emergency service.
>
>2. Every 30 seconds each vehicle sends its location coordinates (lat/long),
>which are stored in the DB server at the emergency call center cum control
>room.
>
>*Issue: *
>We are facing an issue where the database hangs and becomes unresponsive
>to applications that try to connect to it.
>
>*Question: *
>How can we optimize and tune the database to address this performance
>issue? Definitely, pouring in resources as above is not a solution.
>
>What should we check to find the root cause of the performance bottleneck?
>

Have you disabled THP (transparent huge pages)?
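For reference, the active THP mode can be read from sysfs; a minimal Python sketch (assuming the usual /sys/kernel/mm/transparent_hugepage/enabled path, which may differ per kernel):

```python
# Sketch: report the active Transparent Huge Pages (THP) mode on Linux.
# The kernel marks the active mode with brackets, e.g. "always madvise [never]".

def active_thp_mode(status_line: str) -> str:
    """Extract the bracketed (active) mode from a THP status line."""
    start = status_line.index("[") + 1
    end = status_line.index("]")
    return status_line[start:end]

if __name__ == "__main__":
    try:
        with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
            print("THP mode:", active_thp_mode(f.read()))
    except OSError:
        print("THP status file not readable; kernel may not expose THP here")
```

For PostgreSQL workloads the usual recommendation is `never` (or at least `madvise`).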

Andreas




PostgreSQL Server Hang

2023-06-21 Thread KK CHN
*Description of System: *
1. We are running a Postgres Server (version 12, on CentOS 6) for an
emergency call handling and vehicle tracking system; the vehicles are
fitted with mobile devices running navigation apps for emergency service.

2. Every 30 seconds each vehicle sends its location coordinates (lat/long),
which are stored in the DB server at the emergency call center cum control
room.

*Issue: *
We are facing an issue where the database hangs and becomes unresponsive
to applications that try to connect to it. So eventually the applications
are also crawling on their knees.


*Mitigation done so far: *So far we have increased the resources:
CPU (vCPUs) from 32 to 64. (Not sure if this is the right approach, and it
may be a dumb idea, but the hanging issue was rectified for the time
being.)

RAM was increased from 32 GB to 48 GB, but we observed that RAM usage
always stayed below 32 GB (so increasing the RAM was pointless!).

*Question: *
How can we optimize and tune the database to address this performance
issue? Definitely, pouring in resources as above is not a solution.

What should we check to find the root cause of the performance bottleneck?

Thank you,
Krishane


*Additional Inputs If required: *

*##*
The DB machine is running on CentOS 6. Only a single database instance is
running, as a virtual machine.

The database server also stores call-center call data (call arrival and
dispatch timestamps and short messages to and from around 300 desktop
application operators) and data from the mobile tablets fitted on vehicles
with the VTS app installed. The vehicle locations are continuously stored
into the database every 30 seconds.

Voice calls from emergency callers, each around 3 MB in size, are not
stored in the database; they are stored as files in an NFS-mounted folder,
and the database stores only references to those voice-call files for
future use (call volumes are around 1 lakh, i.e. 100,000, per day). Only
metadata related to the calls is stored in the DB: caller name, caller
number, caller lat/long, and a short description of the caller's situation
(less than 200 characters, times 3 messages per call).

This database is also used for daily reports on the actions taken by call
takers/dispatchers, vehicle tracking reports, etc. Around 2000 vehicles in
the fleet are fitted with mobile tablets running the emergency navigation
apps.

The database grows roughly 1 GB/day.
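A back-of-the-envelope check of the write volume implied by the figures above (2000 vehicles, one update every 30 seconds):

```python
# Back-of-the-envelope write volume from the figures in the post:
# 2000 vehicles, one position update every 30 seconds.

vehicles = 2000
updates_per_day = 24 * 60 * 60 // 30     # 2880 updates per vehicle per day

rows_per_day = vehicles * updates_per_day
print(rows_per_day)                      # 5760000 rows/day
print(round(rows_per_day / 86400, 1))    # ~66.7 inserts/second on average
```

Roughly 67 inserts per second on average is modest for PostgreSQL, which suggests the hangs are more likely caused by locking, connection exhaustion, or I/O stalls than by raw insert volume.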





Re: pb with join plan

2023-06-21 Thread Marc Millas




On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra 
wrote:

> On 6/21/23 00:26, Marc Millas wrote:
> >
> >
> > On Tue, Jun 20, 2023 at 11:19 PM David Rowley wrote:
> >
> > On Wed, 21 Jun 2023 at 08:34, Marc Millas wrote:
> > >
> > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley wrote:
> > >>
> > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas wrote:
> > >> > But if I do the same with clause one OR clause 2, I have to
> > >> > kill the request after an hour, seeing the filesystem showing more
> > >> > than 140 MB of increased usage.
>
> It's a bit weird the "victor" table is joined seemingly without any join
> conditions, leading to a cross join (which massively inflates the cost
> for joins above it). Maybe the anonymized plan mangles it somehow.
>

So I tried to simplify my problem.
I created a table with the result of the first 3 joins.
That table has 15M rows. All tables have been VACUUM ANALYZEd.

Now if I do an explain analyze of a simple join between that table and my
original table 4, using a simple = clause, I get a result in about one
second, and the planner's row estimates seem in line with the observed
values.
If I use substr(table1.a) = table2.b, the explain analyze gets a result in
21 seconds, and the planner estimates a 65M-row result set while the
observed result is 330k rows.
So here it's 20 times slower, and the discrepancy between the planner's row
estimate and reality is a factor of 200.

Now, if I try an explain analyze with a join on a=b or substr(c)=d or e=f,
then... I kill the query after a quarter of an hour without any answer.
If I just explain the query, the planner's row estimate becomes more than
2 billion.
The extremely simple query and plan are here, without automatic obfuscation:
https://explain.depesz.com/s/b8Ll




>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Marc MILLAS


Re: libpq: What can and cannot be bound? How to know?

2023-06-21 Thread Dominique Devienne
On Wed, Jun 21, 2023 at 4:20 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> The planner is the thing that handles binds.  [...]
>

Depends what you mean by "handles", since when I asked about "bind peeking"
during planning, I think the answer was that it was not supported. So I
don't see the direct link between planning per se and binds, which seem
more related to the executor once a plan is chosen, from my naive
perspective.

But of course, I'm reasoning purely on my limited, abstract understanding
of what that machinery could be...


> Reworking that core design choice doesn't seem like a great use of time.
> Especially when alternatives exist.
>

Are you saying conn.exec("NOTIFY {}, {}", conn.escapeName(channel),
conn.escapeLiteral(payload)) is somehow worse than
conn.exec(bind(channel, payload), "SELECT pg_notify($1, $2)")?
I'm not asking in the abstract, as I'll be wrapping these in typesafe and
SQL-injection-safe wrappers soon.

And since channel is a name, does the string bound to the pg_notify() call
need to be escaped or not?
I've had "loads of fun" troubleshooting why ::regrole casts failed for
names that need escaping, so it's not a rhetorical question either...
Whether a function taking a name requires the name to be escaped on the
"outside", or will escape it "inside", is not really specified, at least
that I can see.


> Specifically, the pg_notify function that can be parameterized and handles
> the SQL-injection stuff for you.
>

Sure, for that one example, there's a function equivalent that allows
wrapping.
Is that a native function? Or a SQL or PL/pgSQL function that basically
just does the escape*() calls I showed above? Not that performance matters
much between the two, but I'm curious.

Still, not all SQL commands taking names and/or literals are similarly
wrapped in functions.
So my point remains. That you think there's low to zero value in it, sure,
I get it. --DD


Re: pb with join plan

2023-06-21 Thread Marc Millas
On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra 
wrote:

> On 6/21/23 00:26, Marc Millas wrote:
> >
> >
> > On Tue, Jun 20, 2023 at 11:19 PM David Rowley wrote:
> >
> > On Wed, 21 Jun 2023 at 08:34, Marc Millas wrote:
> > >
> > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley wrote:
> > >>
> > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas wrote:
> > >> > But if I do the same with clause one OR clause 2, I have to
> > >> > kill the request after an hour, seeing the filesystem showing more
> > >> > than 140 MB of increased usage.
> > >>
> > >>
> > > link to the anonymized plan of the req with one clause:
> > > https://explain.depesz.com/s/TWp4
> > >
> > > link to the plan with the second clause alone:
> > > https://explain.depesz.com/s/byW5
> > >
> > > link to the plan with both clauses ORed (the one not finishing):
> > > https://explain.depesz.com/s/jHO2
> >
> >
> >
> > It's quite difficult to know what the problem is you want to fix
> here.
> > Your initial post indicated it was the query with the OR condition
> > that was causing you the problems, but the plan you've posted has no
> > OR condition?!
> >
> > You're more likely to get help here if you take time to properly
> > explain the situation and post the information that's actually
> > relevant to the problem you're having, or state the problem more
> > clearly, as there's a mismatch somewhere.
> >
> > It might also be worth having a look at
> > https://wiki.postgresql.org/wiki/Slow_Query_Questions . EXPLAIN is not
> > going to tell us what part of the query is slow. I'll let the wiki
> > page guide you into what to do instead.
> >
> >
> > I know that page. Obviously, as I have to kill the request, I cannot
> > provide an explain analyze...
> >
>
> It's a bit weird the "victor" table is joined seemingly without any join
> conditions, leading to a cross join (which massively inflates the cost
> for joins above it). Maybe the anonymized plan mangles it somehow.
>

The query does:
 select blabla from table1 join table2 on (a list of 9 fields ANDed,
corresponding to the index of both table1 and table2)
join table3 on table1.a=table3.a and table1.b=table3.b
join table4 on (two clauses like table2.d=table4.e and one clause
substr(table2.f)=table4.g, all ORed)
table1 and table2 are big (15M and 60M rows); table3 and table4 are small
(30k rows).

Basically, if I rewrite the query beginning with the join between table2
and table4, then join table1 and then table3, postgres generates the same
plan, which doesn't end.

If, instead of the 3 clauses of the last join, I keep one equality clause,
the explain plan looks the same but executes in 45 seconds.
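The usual workaround for an ORed join condition like this is to rewrite it as a UNION of separate equi-joins, each of which the planner can execute as a hash join. A small Python sketch of the idea (table contents and key names are invented for illustration):

```python
# Sketch: a join condition "a = e OR substr(c) = g" can be evaluated as the
# UNION (deduplicated) of two equi-joins, each usable as a hash join.
# Table contents and names are invented for illustration.

t_big   = [("k1", "x1"), ("k2", "x2"), ("k3", "x3")]   # (a, payload)
t_small = [("k1", "k3")]                               # (e, g)

def hash_join(rows, key_idx, probe_keys):
    """Build a hash table on rows[key_idx], then probe it with probe_keys."""
    index = {}
    for r in rows:
        index.setdefault(r[key_idx], []).append(r)
    return [r for k in probe_keys for r in index.get(k, [])]

cond1 = hash_join(t_big, 0, [e for e, g in t_small])   # matches a = e
cond2 = hash_join(t_big, 0, [g for e, g in t_small])   # matches a = g
result = sorted(set(cond1) | set(cond2))               # UNION removes duplicates
print(result)   # [('k1', 'x1'), ('k3', 'x3')]
```

Rewriting the slow query as two or three UNIONed queries, one per ORed clause, often sidesteps both the bad row estimate and the non-hashable OR condition.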


>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

 Marc MILLAS


Re: libpq: What can and cannot be bound? How to know?

2023-06-21 Thread David G. Johnston
On Wed, Jun 21, 2023 at 6:09 AM Dominique Devienne 
wrote:

>
> I'm sure there are good technical reason. But from the outside, it is
> surprising and a bit inconsistent.
>
>
The planner is the thing that handles binds.  The only things that are
planned are queries - i.e., SQL commands that are capable of producing
result sets from data within tables.  I agree this seems like it should be
documented in places besides PREPARE.

Reworking that core design choice doesn't seem like a great use of time.
Especially when alternatives exist.  Specifically, the pg_notify function
that can be parameterized and handles the SQL-injection stuff for you.

David J.


Re: libpq: What can and cannot be bound? How to know?

2023-06-21 Thread Dominique Devienne
On Wed, Jun 21, 2023 at 1:52 PM Laurenz Albe 
wrote:

> On Wed, 2023-06-21 at 11:00 +0200, Dominique Devienne wrote:
> > [...] obvious way to know what can and cannot be bound, just from the
> doc?
>
> This is not adequately documented.


Bummer.


> The documentation for PREPARE says:
>

Note that I'm not even preparing in this case, but using PQexecParams()
instead.
So I'm unlikely to look at the PREPARE doc, but rather at the one for the
command I'm actually using, i.e. NOTIFY in my example.


> so NOTIFY is not supported.


Bummer again.

So I must use PQescapeIdentifier() and PQescapeLiteral() to have an
iron-clad NOTIFY, as the second pseudo-code example above showed.
Thanks for confirming, Laurenz.
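For illustration, the core quoting rule behind those two libpq functions is simply "wrap in the quote character and double any embedded occurrence"; a rough Python sketch (the real functions also validate the client encoding and reject NUL bytes, which this ignores):

```python
# Sketch of the quoting rule PQescapeIdentifier() and PQescapeLiteral()
# implement at their core: wrap in the quote character and double any
# embedded occurrence of it. The real libpq functions also validate the
# client encoding and reject NUL bytes; this sketch does not.

def quote_ident(name: str) -> str:
    return '"' + name.replace('"', '""') + '"'

def quote_literal(text: str) -> str:
    return "'" + text.replace("'", "''") + "'"

# Hand-built, injection-safe NOTIFY (channel/payload values are made up):
channel = 'my"channel'
payload = "it's urgent"
stmt = f"NOTIFY {quote_ident(channel)}, {quote_literal(payload)}"
print(stmt)  # NOTIFY "my""channel", 'it''s urgent'
```

In real code, prefer the libpq functions themselves, since they are encoding-aware and also handle backslashes safely.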


> However, you need some inside knowledge to know
> that what you are running is an "unnamed prepared statement" and that the
> limitation
> stated in PREPARE applies.
>

I don't know in what shape or form this could be specified in the doc.
Maybe a specific and consistent typography for each "argument" kind,
like names, non-bindable literals, and bindable literals? Just thinking
aloud. With a link in all commands to a "central" page about SQL injection
and binding and format(), etc... maybe?

I do find it strange, from the user perspective, not to be able to bind
NOTIFY's payload text message, or to bind the password in CREATE USER.
(Yes, I know about PQencryptPasswordConn().)
I'm sure there are good technical reasons. But from the outside, it is
surprising and a bit inconsistent.

My $0.02. --DD


Re: libpq: What can and cannot be bound? How to know?

2023-06-21 Thread Laurenz Albe
On Wed, 2023-06-21 at 11:00 +0200, Dominique Devienne wrote:
> For example with [NOTIFY][1]. The doc states:
> 
> > Payload: This must be specified as a simple string literal
> 
> Does that mean we cannot bind the payload?
> I.e. the pseudo code:
> ```
> conn.exec(bind(msg), "NOTIFY {} $1", conn.escapeName(channel));
> ```
>  is invalid? And I must use instead
> ```
> conn.exec("NOTIFY {} {}", conn.escapeName(channel), conn.escapeLiteral(msg))?
> ```
> I can try, of course, but could there be an obvious way to know what can
> and cannot be bound, just from the doc?
> 
> Being able to bind would make it easier to deal with SQL injection, for
> example. And knowing what can be bound would be useful.

This is not adequately documented.

The documentation for PREPARE says:

  Any SELECT, INSERT, UPDATE, DELETE, MERGE, or VALUES statement.

so NOTIFY is not supported.  However, you need some inside knowledge to know
that what you are running is an "unnamed prepared statement" and that the
limitation stated in PREPARE applies.

Yours,
Laurenz Albe




Re: pb with join plan

2023-06-21 Thread Tomas Vondra
On 6/21/23 00:26, Marc Millas wrote:
> 
> 
> On Tue, Jun 20, 2023 at 11:19 PM David Rowley wrote:
> 
> On Wed, 21 Jun 2023 at 08:34, Marc Millas wrote:
> >
> > On Tue, Jun 20, 2023 at 10:14 PM David Rowley wrote:
> >>
> >> On Wed, 21 Jun 2023 at 07:42, Marc Millas wrote:
> >> > But if I do the same with clause one OR clause 2, I have to
> >> > kill the request after an hour, seeing the filesystem showing more
> >> > than 140 MB of increased usage.
> >>
> >>
> link to the anonymized plan of the req with one clause:
> https://explain.depesz.com/s/TWp4
> 
> link to the plan with the second clause alone:
> https://explain.depesz.com/s/byW5
> 
> link to the plan with both clauses ORed (the one not finishing):
> https://explain.depesz.com/s/jHO2
> 
> 
> 
> It's quite difficult to know what the problem is you want to fix here.
> Your initial post indicated it was the query with the OR condition
> that was causing you the problems, but the plan you've posted has no
> OR condition?!
> 
> You're more likely to get help here if you take time to properly
> explain the situation and post the information that's actually
> relevant to the problem you're having, or state the problem more
> clearly, as there's a mismatch somewhere.
> 
> It might also be worth having a look at
> https://wiki.postgresql.org/wiki/Slow_Query_Questions . EXPLAIN is not
> going to tell us what part of the query is slow. I'll let the wiki
> page guide you into what to do instead.
> 
>  
> I know that page. Obviously, as I have to kill the request, I cannot
> provide an explain analyze...
> 

It's a bit weird the "victor" table is joined seemingly without any join
conditions, leading to a cross join (which massively inflates the cost
for joins above it). Maybe the anonymized plan mangles it somehow.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




libpq: What can and cannot be bound? How to know?

2023-06-21 Thread Dominique Devienne
For example with [NOTIFY][1]. The doc states:

> Payload: This must be specified as a simple string literal

Does that mean we cannot bind the payload?
I.e. the pseudo code:
```
conn.exec(bind(msg), "NOTIFY {} $1", conn.escapeName(channel));
```
 is invalid? And I must use instead
```
conn.exec("NOTIFY {} {}", conn.escapeName(channel), conn.escapeLiteral(msg))?
```
I can try, of course, but could there be an obvious way to know what can
and cannot be bound, just from the doc?

Being able to bind would make it easier to deal with SQL injection, for
example. And knowing what can be bound would be useful.

My $0.02. Thanks, --DD

[1]: https://www.postgresql.org/docs/current/sql-notify.html


Re: strange behavior of .pgpass file

2023-06-21 Thread Laurenz Albe
On Wed, 2023-06-21 at 00:29 +0530, Atul Kumar wrote:
> I found some strange behaviour of postgres superuser password in my existing 
> cluster, below is the basic outline of the setup.
> 1. version - postgres 12
> 2. replication - streaming replication async
> 3. OS- centos7
> 4. One Master, One Standby
> 
> I have identical .pgpass files in the postgres home directory on both servers.
> 
> So when I execute below command on slave node:
> psql -h slave_hostname/ip -U postgres -d postgres -p 5432 -c "select 
> pg_is_in_recovery()"
> 
> I get error of password authentication:
> 
> psql: error: FATAL:  password authentication failed for user "postgres"
> password retrieved from file "/homedirectorypath/.pgpass"
> 
> But when I run the same command on master node:
> psql -h slave_hostname/ip -U postgres -d postgres -p 5432 -c "select 
> pg_is_in_recovery()"
> 
> I don't get any errors and get the expected output as "t".
> 
> Note: the passwords in and path of both pgpass files are identical.
> 
> then why am I getting errors on the slave node for the same command even 
> after having everything the same ?

pg_hba.conf could be different on both servers.
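As an aside, the lookup rule libpq applies to ~/.pgpass is easy to sketch: fields are hostname:port:database:username:password, '*' matches anything, and the first matching line wins. A rough Python model (hostnames and passwords are made up; backslash-escaping of ':' in fields is omitted):

```python
# Rough model of libpq's ~/.pgpass lookup: lines of
# hostname:port:database:username:password, '*' matches anything,
# first matching line wins. Backslash-escaping of ':' is omitted.

def pgpass_lookup(lines, host, port, db, user):
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 5:
            continue
        if all(pat == "*" or pat == val
               for pat, val in zip(fields[:4], (host, str(port), db, user))):
            return fields[4]          # first matching entry wins
    return None

entries = [
    "slave.example.com:5432:*:postgres:secret",   # made-up entries
    "*:*:*:replicator:otherpw",
]
print(pgpass_lookup(entries, "slave.example.com", 5432, "postgres", "postgres"))
# prints: secret
```

Note that the host field is matched against the exact -h value, so connecting to the same machine by a different name or IP can select a different line (or none). But since the error above says the password was retrieved from the file and then rejected, a pg_hba.conf difference is indeed the more likely culprit.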

Yours,
Laurenz Albe




Re: foreign keys on multiple parent table

2023-06-21 Thread Dominique Devienne
On Tue, Jun 20, 2023 at 10:47 PM Lorusso Domenico 
wrote:

> Could work, but is there a way to set a reference key over the uuid of all
> the tables?
>

Yes, it's possible. We do it. There are several ways to emulate what I call
"polymorphic" FKs.

All approaches have pros and cons. The one we use relies on CHECK
constraints and virtual/generated columns, and assumes all mutually
exclusive FKs are of the same type. For ON DELETE CASCADE FKs, you have
the primary "fk" concrete column, plus a secondary "fk$t" type column
telling you which FK is active, then N "fk$N" virtual columns whose
expressions automatically turn them ON (= "fk") or OFF (NULL) based on
"fk$t"'s value. A CHECK constraint ensures only 0 or 1 "fk$N" columns are
ON, depending on "fk"'s NULLability. For ON DELETE SET NULL, you need to
reverse the concrete and virtual columns, so the constraint can *write*
the "fk$N" columns, with more CHECK constraints.

The technique works because FKs on virtual columns work fine. As with all
FKs with ON DELETE CASCADE, you want to index your FKs to avoid full
scans. With partial indexes (since the FKs are mutually exclusive and full
of NULLs), the storage overhead from multiplying (virtual) columns and
indexes can be limited (i.e. not as bad as N times the single index).
Of course, this is tricky to pull off correctly without automatic schema
generation from a logical model. We have dozens of these PFKs, of various
cardinalities; maintaining them manually would be a nightmare. And when
the polymorphism is too much, we give up on referential integrity on a
case-by-case basis, to avoid bloating the tables and schema. It's a
tradeoff, as always.
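To make the scheme concrete, here is a small Python model of the column logic described above; column names follow the post's fk / fk$t / fk$N convention (spelled fk_t in Python), and this is an illustration of the invariant, not the actual DDL:

```python
# Illustrative model of the polymorphic-FK scheme described above.
# "fk" holds the target uuid, "fk_t" (the post's fk$t) says which parent
# table it references, and each generated "fk$N" column equals fk when
# active and NULL (None) otherwise.

N_TARGETS = 3   # number of candidate parent tables (arbitrary here)

def generated_columns(fk, fk_t):
    """The expressions of the N virtual fk$N columns."""
    return [fk if fk_t == n else None for n in range(N_TARGETS)]

def check_constraint(fk, cols):
    """The invariant the CHECK constraint enforces:
    exactly one active fk$N when fk is set, none when fk is NULL."""
    active = sum(c is not None for c in cols)
    return active == (0 if fk is None else 1)

cols = generated_columns("a-uuid", 1)
print(cols)                                   # [None, 'a-uuid', None]
print(check_constraint("a-uuid", cols))       # True
print(check_constraint(None, generated_columns(None, 2)))  # True
```

Each real FK constraint then targets one fk$N virtual column, so at most one parent-table reference is enforced at a time.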

I'm sure I didn't invent this technique. But it sure isn't very common and
it has been our "secret sauce" for a few years.
On Oracle first, now on PostgreSQL. A Dalibo consultant once told me I
should present it at a PGCon conference :).

Good luck if you try that. FWIW, --DD