Re: Parameter value in RDS

2024-01-16 Thread Philip Semanchuk



> On Jan 16, 2024, at 4:19 PM, David G. Johnston  
> wrote:
> 
> On Tuesday, January 16, 2024, Atul Kumar  wrote:
> Hi,
> 
> I am new to RDS postgres, I have version 14 running on it with m7g.large
> 
> I found that lots of parameters has DBInstanceClassMemory written, so what 
> exactly is the value of this variable ?
> 
> How should I calculate it?
> 
>  IIRC it’s the amount of RAM on your instance.  You look it up in a table 
> usually.  Or check the web console.


Yes, I’m pretty sure that it’s the amount of RAM *in bytes*. 

Although I like the flexibility of having that expressed in a variable, it 
wasn’t always as easy to use as I wanted it to be. Specifically, I found that 
differences in units made it a little difficult to figure out how AWS was 
configuring things. 

For example, on an old instance we had, shared_buffers was defined as 
{DBInstanceClassMemory/10922}. It took me a while to figure out that that 
translates to “calculate 75% of available memory, and express that value in 8kB 
blocks”. How? Well, 10922 ≈ 8 * 1024 * 4/3. shared_buffers is expressed in 
8kB blocks, so converting from units of bytes (DBInstanceClassMemory) to 8kB 
blocks (shared_buffers) requires dividing by 8 * 1024. And dividing by 4/3 
is the same as multiplying by 3/4, which is 75%. 
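
To make that concrete, here's the arithmetic worked through in SQL for a 
hypothetical instance with 8 GiB of RAM:

-- DBInstanceClassMemory for a hypothetical 8 GiB instance = 8 * 1024^3 = 8589934592 bytes
SELECT 8589934592 / 10922;               -- 786480, i.e. shared_buffers in 8kB blocks
SELECT (786480 * 8192) / 8589934592.0;   -- ≈ 0.75, i.e. 75% of RAM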

This may have been explained in AWS documentation but I couldn’t find it at the 
time and it took some work on my part to figure out the logic behind 10922 and 
some other config magic numbers. Maybe this will save you some time.

Cheers
Philip



Re: Local postgres manual

2023-11-03 Thread Philip Semanchuk



> On Nov 3, 2023, at 9:45 AM, Bruce Momjian  wrote:
> 
> On Fri, Nov  3, 2023 at 09:39:46AM -0400, Philip Semanchuk wrote:
>> 
>> In addition to Bruce Momjian’s suggestion, I’ll add that you can make an 
>> HTML version of the manual from the source code. This is what I use and it 
>> works great for me.
> 
> Yes, we can do HTML too as a single file, postgres.html.  What I don't
> see is man page output for anything but the references pages, and as
> separate files.  It might be possible to convert the HTML to man format
> using something like html2man.

Yes, ISTR there’s an all-in-one-page option, but the HTML that I built is in 
individual pages that mirror the organization on postgresql.org which works 
great for me. Sorry I don’t remember the exact command I used but it was very 
straightforward, nothing tricky at all. Maybe as simple as `make install-docs`?
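
If memory serves, it was roughly this, run from a configured source tree with the 
DocBook toolchain (xmllint/xsltproc) installed; treat the exact target as my guess 
rather than gospel:

./configure
make -C doc/src/sgml html    # the HTML pages land in doc/src/sgml/html/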


Cheers
Philip



Re: Local postgres manual

2023-11-03 Thread Philip Semanchuk



> On Nov 3, 2023, at 9:18 AM, Ben Hancock  wrote:
> 
> Hi all:
> 
> Does Postgres come with a local, full version of the manual installed
> by default anywhere (i.e. akin to what is available on the website, but
> in man, info, or plain-text format)? When I invoke `man postgres`, I do
> get a very useful - but limited - manual page, which has references to 
> chapters. For example:
> 
> OPTIONS
>   postgres accepts the following command-line arguments. For a
>   detailed discussion of the options consult Chapter 20. ...
> 
> Of course, I can pull up a browser and find the manual, or consult a
> local PDF if I have a graphical display. But sometimes it may be convenient 
> to view the the manual for the version of Postgres that is on the system, 
> right there. Does one exist?

In addition to Bruce Momjian’s suggestion, I’ll add that you can make an HTML 
version of the manual from the source code. This is what I use and it works 
great for me.

Cheers
Philip



Re: Uppercase version of ß desired

2023-03-14 Thread Philip Semanchuk



> On Mar 13, 2023, at 5:38 PM, Celia McInnis  wrote:
> 
> HI:
> 
> I would be really happy if postgresql  had an upper case version of the ß 
> german character. The wiki page 
> https://en.wikipedia.org/wiki/%C3%9F
> 
> indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was 
> encoded by ISO 10646 in 2008.
> 
> BTW the reason that I'd like upper('ß') to give something different than 'ß'  
> is because I have written a simple substitution puzzle for a large number of 
> languages where I show the encrypted lower case words in upper case and the 
> successful letter substitution submissions in lower case - so I need the 
> upper and lower case versions of each letter to be different!
> 
> Thanks for any assistance! Maybe I can hack what I want in python (which is 
> what I am using for the puzzle).

Hi Celia,
I ran into this too back when we were transitioning from Python 2 to 3 (2 
behaved differently from 3). While researching it I discovered this Python 
issue which maybe sheds some additional light on the subject: 
https://github.com/python/cpython/issues/74993

We ultimately found 90 characters that (under Python 3) grew longer when 
uppercased. 

python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) != len(chr(c).upper())])"
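
If the goal is just to make the uppercase form differ from 'ß', one possible 
Python-side workaround for your puzzle (a sketch I haven't used beyond quick 
testing) is to map ß to the capital form before calling upper():

# Map ß (U+00DF) to capital ẞ (U+1E9E) before the generic upper() call.
word = 'straße'
print(word.translate({0x00DF: '\u1E9E'}).upper())   # STRAẞE rather than STRASSE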


I hope this is at least interesting. :-)

Cheers
Philip






Re: ALTER COLUMN to change GENERATED ALWAYS AS expression?

2023-02-07 Thread Philip Semanchuk



> On Feb 7, 2023, at 3:30 AM, Laurenz Albe  wrote:
> 
> On Mon, 2023-02-06 at 12:04 -0500, Philip Semanchuk wrote:
>> I have a column defined GENERATED ALWAYS AS {my_expression} STORED. I’d like 
>> to change the
>> {my_expression} part. After reading the documentation for ALTER TABLE
>> (https://www.postgresql.org/docs/current/sql-altertable.html) and trying a 
>> few things that
>> resulted in syntax errors, there doesn’t seem to be a way to alter the 
>> column’s GENERATED
>> expression in place. It seems like my only option is to drop and re-add the 
>> column.
>> Is that correct?
> 
> I think that is correct.  But changing the expression would mean rewriting 
> the column
> anyway.  The only downside is that a dropped column remains in the table, and 
> not even
> a VACUUM (FULL) will get rid of it.

Thanks for the confirmation. I hadn’t realized that the column would remain in 
the table even after a DROP + VACUUM FULL. I’m curious — its presence as a 
deleted column doesn't  affect performance in any meaningful way, does it? 

In this case we have the option of dropping and re-creating the table entirely, 
and that's probably what I'll do.

Cheers
Philip



ALTER COLUMN to change GENERATED ALWAYS AS expression?

2023-02-06 Thread Philip Semanchuk
Hi all,
I have a column defined GENERATED ALWAYS AS {my_expression} STORED. I’d like to 
change the {my_expression} part. After reading the documentation for ALTER 
TABLE (https://www.postgresql.org/docs/current/sql-altertable.html) and trying 
a few things that resulted in syntax errors, there doesn’t seem to be a way to 
alter the column’s GENERATED expression in place. It seems like my only option 
is to drop and re-add the column. Is that correct? 

Thanks
Philip



Re: IMMUTABLE function to cast enum to/from text?

2022-11-10 Thread Philip Semanchuk



> On Nov 10, 2022, at 3:39 PM, Tom Lane  wrote:
> 
> Joe Conway  writes:
>> 
>> CREATE OR REPLACE FUNCTION mood2text(mood)
>> RETURNS text AS
>> $$
>>  select $1
>> $$ STRICT IMMUTABLE LANGUAGE sql;
> 
> Of course, what this is doing is using a SQL-function wrapper to
> lie about the mutability of the expression.  Whether you consider
> that elegant is up to you ;-) ... but it should work, as long as
> you don't break things by renaming the enum's values.


Thanks Joe and Tom,
I’m comfortable lying to Postgres occasionally — never for evil, only for good 
of course. :-)
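
For my original use case, I believe a wrapper like that then makes something along 
these lines possible (the table and column names here are hypothetical):

-- mood2text() is declared IMMUTABLE, so unlike a bare mood::text cast it is
-- allowed in a generated column.
ALTER TABLE person
    ADD COLUMN mood_text text GENERATED ALWAYS AS (mood2text(mood)) STORED;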


Cheers
Philip



IMMUTABLE function to cast enum to/from text?

2022-11-10 Thread Philip Semanchuk
Hi all,
I know that Postgres' enum_in()/enum_out() functions have a volatility class of 
STABLE, and STABLE is required because enum elements can be renamed. We have an 
enum in our database used in a number of custom functions, all of which require 
casting the enum to/from text. Since enum_in() and enum_out() are STABLE, that 
means our functions that rely on those casts must also be STABLE, and as a 
result we can't use them in generated columns. 

I have seen conversations that suggest creating a custom IMMUTABLE function to 
perform the cast, but I can't figure out how to do that except with a CASE 
statement that enumerates every possible value. Is there a more elegant 
approach?

Thanks 
Philip



Custom function ROWS hint ignored due to inlining?

2022-10-19 Thread Philip Semanchuk
Hi,
I have a custom function where the ROWS hint is getting ignored. I think it’s 
because the function is getting inlined, but I’d like a second opinion.

Here’s my working (contrived) example.

CREATE TABLE my_table (
id int primary key GENERATED ALWAYS AS IDENTITY,
base_value int NOT NULL
);

INSERT INTO my_table (base_value) VALUES (42);

CREATE OR REPLACE FUNCTION fn_get_deltas(base_value int)
RETURNS TABLE (delta int, total int) AS $$
SELECT 
generate_series(1, CASE base_value WHEN 42 THEN 4 ELSE 20 END), 
base_value + generate_series(1, CASE base_value WHEN 42 THEN 4 ELSE 20 
END)
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE ROWS 10;

EXPLAIN
SELECT base_value, delta, total
FROM my_table
CROSS JOIN LATERAL (SELECT delta, total FROM fn_get_deltas(base_value)) AS foo
+--+
| QUERY PLAN   |
|--|
| Nested Loop  (cost=0.00..107427.80 rows=226 width=12)|
|   ->  Seq Scan on my_table  (cost=0.00..32.60 rows=2260 width=4) |
|   ->  Result  (cost=0.00..27.52 rows=1000 width=8)   |
| ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4)  |
|   ->  Result  (cost=0.00..0.01 rows=1 width=0)   |
+--+

The plan estimates 1000 rows from the CROSS JOIN despite the “ROWS 10” hint on 
my function. I think this is because the planner never sees fn_get_deltas() — 
it has been inlined by the query preprocessor because fn_get_deltas() meets the 
criteria for inlining 
(https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions). Instead of 10 
rows, the planner uses its default assumption of 1000 rows.

If I change the function to VOLATILE to prevent inlining, I get this plan.
+-+
| QUERY PLAN  |
|-|
| Nested Loop  (cost=0.25..484.85 rows=22600 width=12)|
|   ->  Seq Scan on my_table  (cost=0.00..32.60 rows=2260 width=4)|
|   ->  Function Scan on fn_get_deltas  (cost=0.25..0.35 rows=10 width=8) |
+-+

I would prefer to have the function inlined for better performance, but I can 
declare it VOLATILE if that’s necessary to give decent estimates to the 
planner. Am I correctly reading the situation? If so, is there another solution 
that allows inlining *and* making the ROWS hint visible to the planner?

Thanks a bunch
Philip



Re: Max sane value for join_collapse_limit?

2022-06-03 Thread Philip Semanchuk



> On Jun 3, 2022, at 4:19 AM, Andreas Joseph Krogh  wrote:
> 
> Hi, I have set join_collapse_limit = 12 in production, but I'm thinking about 
> raising it to 16.
> On modern HW is there a “sane maximum” for this value?
> I can easily spare 10ms for extra planning per query on our workload, is 16 
> too high?

I set ours to 24 (from_collapse_limit=24 and geqo_threshold=25). Most of 
our queries that involve 10+ relations have a slow execution time 
(20-30 minutes or more), so reducing planning time isn’t a major concern for us. 
If the planner takes an extra 20-30 seconds to find a plan that reduces 
execution time by 5%, we still come out ahead. 

That said, in our environment the planner can make pretty bad choices once the 
number of relations gets into the mid teens, because we have some 
difficult-to-estimate join conditions, so we write our canned queries with this 
in mind, breaking them into two parts if necessary to avoid throwing too much 
at the planner at once. IOW, we generally don’t come anywhere near 24 relations 
in a query. Our very high join_collapse_limit might still come into play if a 
user writes a very complicated ad hoc query.

So (IMHO) as is often the case, the answer is “it depends”. :-)

Cheers
Philip



Could Postgres warn about incorrect volatility class?

2022-04-28 Thread Philip Semanchuk
Hi all,
I recently discovered that a custom function that I thought was being inlined 
was not being inlined because I had declared it IMMUTABLE, but the function 
body cast an enum value to text which is a STABLE operator. Once I corrected my 
function's definition to declare it STABLE, Postgres inlined it. 
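
To illustrate, here's a minimal sketch of the kind of function I mean (my_enum and 
the function name are made up for this example):

CREATE FUNCTION my_enum_to_text(v my_enum) RETURNS text AS $$
    SELECT v::text  -- the enum-to-text cast is only STABLE
$$ LANGUAGE sql IMMUTABLE;  -- declared IMMUTABLE, so Postgres won't inline it; STABLE fixes that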

Since Postgres can apparently determine during query parsing that my function 
has a volatility class mismatch, is there a reason that Postgres can't make the 
same determination and inform me about it when I define the function? In this 
case a helpful message would have been "I sure hope you know what you're 
doing..." :-)

Thanks 
Philip



Re: Are stored procedures/triggers common in your industry

2022-04-20 Thread Philip Semanchuk



> On Apr 20, 2022, at 3:18 PM, Guyren Howe  wrote:
> 
> I’ve really only ever worked in web development. 90+% of web developers 
> regard doing anything at all clever in the database with suspicion.
> 
> I’m considering working on a book about implementing business logic in 
> Postgres, and I’m curious about how common that actually is.
> 


We have some business logic in Postgres functions, particularly triggers. Our 
apps are written in Python, and we use pytest to exercise our SQL functions to 
ensure they're doing what we think they’re doing. It works well for us.

FWIW, we’re not a Web dev shop.

Cheers
Philip





Re: Why is my function inlined only when STABLE?

2022-03-29 Thread Philip Semanchuk



> On Mar 29, 2022, at 2:24 PM, Tom Lane  wrote:
> 
> Philip Semanchuk  writes:
>> I have a function that isn't being inlined, and I would appreciate help to 
>> understand why that's the case. 
> 
> I think the test methodology you used is faulty, because it does not
> distinguish between "inline-able" and "foldable to a constant".
> Given an immutable function applied to constant(s), the planner prefers
> to fold to a constant by just executing the function.  The inline-ing
> transformation is considered only when that case doesn't apply.

Excellent point, thank you. Now I understand. I was trying to write an inlining 
demo for my colleagues, and I simplified my example one step too far by using a 
constant.

I really appreciate the help!

Cheers
Philip



Why is my function inlined only when STABLE?

2022-03-29 Thread Philip Semanchuk
Hi all,
I have a function that isn't being inlined, and I would appreciate help to 
understand why that's the case. 

I'm using PG 11.15. I know that if I declare my function IMMUTABLE and it calls 
a non-IMMUTABLE function, Postgres won't inline my function. But even when my 
function calls only substring() (which I understand to be IMMUTABLE based on 
'\df+ substring'), I still can't get Postgres to inline it. If I re-declare my 
function as STABLE, then Postgres inlines it. According to the rules I 
understand 
(https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions#Inlining_conditions_for_scalar_functions),
 the IMMUTABLE version of my function should be inlined too. What am I missing?

Here's a log of a CLI session showing that the IMMUTABLE version is not 
inlined, but the STABLE one is.


show track_functions
+---+
| track_functions   |
|---|
| all   |
+---+
SHOW
me@/tmp:wylan#
SELECT * FROM pg_stat_user_functions
+--+--++-+--+-+
| funcid   | schemaname   | funcname   | calls   | total_time   | self_time   |
|--+--++-+--+-|
+--+--++-+--+-+
SELECT 0
Time: 0.021s
me@/tmp:wylan#
CREATE OR REPLACE FUNCTION f(foo text)
 RETURNS text
 AS $$
 SELECT substring(foo FROM 1 FOR 2)
 $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
CREATE FUNCTION
Time: 0.003s
me@/tmp:wylan#
select f('4242')
+-+
| f   |
|-|
| 42  |
+-+
SELECT 1
Time: 0.008s
me@/tmp:wylan#
SELECT * FROM pg_stat_user_functions
+--+--++-+--+-+
| funcid   | schemaname   | funcname   | calls   | total_time   | self_time   |
|--+--++-+--+-|
| 14472085 | public   | f  | 1   | 0.05 | 0.05|
+--+--++-+--+-+
SELECT 1
Time: 0.022s
me@/tmp:wylan#
DROP FUNCTION f(text)
DROP FUNCTION
Time: 0.001s
me@/tmp:wylan#
CREATE OR REPLACE FUNCTION f(foo text)
 RETURNS text
 AS $$
 SELECT substring(foo FROM 1 FOR 2)
 $$ LANGUAGE sql STABLE PARALLEL SAFE;
CREATE FUNCTION
Time: 0.003s
me@/tmp:wylan#
select pg_stat_reset()
+-+
| pg_stat_reset   |
|-|
| |
+-+
SELECT 1
Time: 0.008s
me@/tmp:wylan#
SELECT * FROM pg_stat_user_functions
+--+--++-+--+-+
| funcid   | schemaname   | funcname   | calls   | total_time   | self_time   |
|--+--++-+--+-|
+--+--++-+--+-+
SELECT 0
Time: 0.022s
me@/tmp:wylan#
select f('4242')
+-+
| f   |
|-|
| 42  |
+-+
SELECT 1
Time: 0.008s
me@/tmp:wylan#
SELECT * FROM pg_stat_user_functions
+--+--++-+--+-+
| funcid   | schemaname   | funcname   | calls   | total_time   | self_time   |
|--+--++-+--+-|
+--+--++-+--+-+
SELECT 0
Time: 0.019s
me@/tmp:wylan#




Thanks
Philip



Re: Leading comments and client applications

2022-03-28 Thread Philip Semanchuk



> On Mar 28, 2022, at 5:42 AM, Philippe Doussot  
> wrote:
> 
> >Something about the way TextClause changes the raw SQL string causes the 
> >behavior I’m seeing, although we didn’t notice it at the time of the 
> >changeover.
> >I don’t know what exactly it’s doing yet, but when I switch back to passing 
> >a DDLElement to execute(), my SQL function is created as I expected. 
> 
> 
> Alternate option if you want continue to use  TextClause:
> 
> use /* comment */ for first prefix comment.
> 
> Comment is logged and query executed (tested on Java ( not on SQLAlchemy )).
> We use it to track back the request id executed like that 
> 
> query = em.createNativeQuery("/*requete_enregistree_num_" + requete.getId() + 
> "*/ " + requete.getReqRequete().trim());

Thanks for the suggestion! In my testing, both single line and multiline 
comment blocks cause the same problem for me. I *was* able to resolve this with 
a simple change. I was calling SQLAlchemy’s engine.execute(). When I call 
connection.execute() instead, the problem resolves. This also solves a future 
deprecation problem for us. engine.execute() is deprecated in SQLAlchemy 1.4, 
but connection.execute() is not.

I didn’t expect this to fix the problem. There’s no difference in the Postgres 
log that I can see, so I think the SQL that SQLAlchemy sends to postgres is the 
same. If it’s a commit/transaction problem, it should affect all of our 
functions equally, not just the ones that start with comments. 

I clearly don’t understand this problem fully. Although I'm curious about it, 
I’m eager to move on to other things. I plan to proceed with this fix and not 
investigate any more. 

Thanks everyone for all the help and suggestions.

Cheers
Philip



> 
> On 25/03/2022 19:05, Philip Semanchuk wrote:
>> 
>>> On Mar 25, 2022, at 11:59 AM, Tom Lane 
>>>  wrote:
>>> 
>>> Philip Semanchuk 
>>> 
>>>  writes:
>>> 
>>>> I'm trying to understand a behavior where, with our Postgres client, a 
>>>> leading comment in a SQL script causes the CREATE FUNCTION statement 
>>>> following it to be not executed. I can't figure out if this is a bug 
>>>> somewhere or just a misunderstanding on my part. I would appreciate some 
>>>> help understanding.
>>>> 
>>> Are you certain there's actually a newline after the comment?
>>> The easiest explanation for this would be if something in the
>>> SQLAlchemy code path were munging the newline.
>>> 
>> I verified that there is a newline after the comment. But yes, thanks to 
>> your suggestion and others, I was able to narrow this down to something in 
>> SQLAlchemy behavior. In case anyone else comes across this and is wondering 
>> --
>> 
>> In addition to accepting a plain string, execute() accepts a number of 
>> different SQLAlchemy data types, including TextClause and DDLElement. We 
>> used to pass a DDLElement to execute(), but a few months ago we switched to 
>> passing a TextClause because DDLElement interprets % signs anywhere in SQL 
>> scripts as Python string interpolation markers and that was causing us 
>> headaches in some scripts. Something about the way TextClause changes the 
>> raw SQL string causes the behavior I’m seeing, although we didn’t notice it 
>> at the time of the changeover. I don’t know what exactly it’s doing yet, but 
>> when I switch back to passing a DDLElement to execute(), my SQL function is 
>> created as I expected. 
>> 
>> 
>> https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Connection.execute
>> 
>> 
>> As David J pointed out, execute() is deprecated as of version 1.4. We’re 
>> still on 1.3 but we’ll have to move away from this code eventually so maybe 
>> this is a good inspiration to move away from execute() now and reduce the 
>> number of deprecation warnings we have to deal with in the future.
>> 
>> 
>> 
>>> As far as the comparison behavior goes, psql's parser strips
>>> comments that start with double dashes, for $obscure_reasons.
>>> The server is perfectly capable of ignoring those by itself,
>>> though.  (Awhile back I tried to remove that psql behavior,
>>> but it caused too much churn in our regression tests.)
>>> 
>> 
>> Thanks, this is most helpful. I used psql to double-check because I thought 
>> SQLAlchemy was doing something odd. It’s good to know that psql's behavior in 
>> this case is a choice and not required behavior for clients. Peter J. Holzer’s 
>> psycopg2 example could have shown me the same; I wish I had thought of that.
>> 
>> 
>> I appreciate all the help!
>> 
>> Cheers
>> Philip
>> 
>> 
>> 
>> 
>> 
> 
> 





Re: Leading comments and client applications

2022-03-25 Thread Philip Semanchuk



> On Mar 25, 2022, at 11:59 AM, Tom Lane  wrote:
> 
> Philip Semanchuk  writes:
>> I'm trying to understand a behavior where, with our Postgres client, a 
>> leading comment in a SQL script causes the CREATE FUNCTION statement 
>> following it to be not executed. I can't figure out if this is a bug 
>> somewhere or just a misunderstanding on my part. I would appreciate some 
>> help understanding.
> 
> Are you certain there's actually a newline after the comment?
> The easiest explanation for this would be if something in the
> SQLAlchemy code path were munging the newline.

I verified that there is a newline after the comment. But yes, thanks to your 
suggestion and others, I was able to narrow this down to something in 
SQLAlchemy behavior. In case anyone else comes across this and is wondering --

In addition to accepting a plain string, execute() accepts a number of 
different SQLAlchemy data types, including TextClause and DDLElement. We used 
to pass a DDLElement to execute(), but a few months ago we switched to passing 
a TextClause because DDLElement interprets % signs anywhere in SQL scripts as 
Python string interpolation markers and that was causing us headaches in some 
scripts. Something about the way TextClause changes the raw SQL string causes 
the behavior I’m seeing, although we didn’t notice it at the time of the 
changeover. I don’t know what exactly it’s doing yet, but when I switch back to 
passing a DDLElement to execute(), my SQL function is created as I expected. 

https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Connection.execute

As David J pointed out, execute() is deprecated as of version 1.4. We’re still 
on 1.3 but we’ll have to move away from this code eventually so maybe this is a 
good inspiration to move away from execute() now and reduce the number of 
deprecation warnings we have to deal with in the future.


> As far as the comparison behavior goes, psql's parser strips
> comments that start with double dashes, for $obscure_reasons.
> The server is perfectly capable of ignoring those by itself,
> though.  (Awhile back I tried to remove that psql behavior,
> but it caused too much churn in our regression tests.)


Thanks, this is most helpful. I used psql to double-check because I thought 
SQLAlchemy was doing something odd. It’s good to know that psql's behavior in this 
case is a choice and not required behavior for clients. Peter J. Holzer’s psycopg2 
example could have shown me the same; I wish I had thought of that.


I appreciate all the help!

Cheers
Philip






Leading comments and client applications

2022-03-25 Thread Philip Semanchuk
Hi,
I'm trying to understand a behavior where, with our Postgres client, a leading 
comment in a SQL script causes the CREATE FUNCTION statement following it to be 
not executed. I can't figure out if this is a bug somewhere or just a 
misunderstanding on my part. I would appreciate some help understanding.

Here's the contents of foo.sql --

-- this is a comment
CREATE FUNCTION foo(bar text) RETURNS text AS $$
SELECT bar
$$
LANGUAGE sql IMMUTABLE PARALLEL SAFE
;


When I feed that to 'psql -f foo.sql', the function is created as I expect. In 
the Postgres log, the leading comment *doesn't* appear. I see the same behavior 
if I just copy/paste the function into psql.

Our test system uses Python 3.8, SQLAlchemy 1.3.6, and psycopg 2.8.5, and when 
our test harness reads foo.sql and passes it to SQLAlchemy's execute(), I can 
see in the Postgres log that the leading comment is *not* stripped, and the 
function isn't created.

The server is Postgres 11. 

My naive interpretation is that one of the client layers (SQLAlchemy or 
psycopg2) should be stripping the leading comment but isn't, but that seems 
like a lot of responsibility to push onto a client application. I figured that 
would be the responsibility of the Postgres parser. 

I'd be grateful for any insights about what I'm missing.

Thanks
Philip





Re: psql syntax for array of strings in a variable?

2021-10-29 Thread Philip Semanchuk



> On Oct 29, 2021, at 2:05 PM, Tom Lane  wrote:
> 
> "David G. Johnston"  writes:
>> On Friday, October 29, 2021, Philip Semanchuk 
>> wrote:
>>> I would appreciate help with the syntax for querying an array of strings
>>> declared as a psql variable. Here's an example.
>>> 
>>> \set important_days ARRAY['monday', 'friday']
> 
>> Not sure why the single quotes are getting stripped out but that is the
>> issue.  Maybe double them up to escape them like in a normal text literal?
> 
> Yeah, that's just the way that \set works (and most other psql backslash
> commands, I believe).  You've likely got an issue with whitespace
> disappearing, too, though that might be harmless in this specific example.
> 
> regression=# \set foo 'bar baz'
> regression=# \echo :foo
> bar baz
> regression=# \set foo 'bar ''baz'
> regression=# \echo :foo
> bar 'baz
> regression=# \set foo bar ''baz 
> regression=# \echo :foo
> barbaz
> 
> Not sure offhand how well-documented this is.

Thanks, all. Glad to know I wasn’t missing something obvious. 

> On Oct 29, 2021, at 1:52 PM, Pavel Stehule  wrote:

> psql variables can hold only text. There is not any type - all is just text.


^^^ This was especially helpful; I’d never considered that before.
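
For the archives, based on Tom's doubled-quote demo above, I believe something like 
this would have worked for my original example (untested sketch):

\set important_days '''monday'', ''friday'''
-- :important_days now expands to:  'monday', 'friday'
SELECT 1 WHERE 'monday' = ANY(ARRAY[:important_days]);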

Cheers
Philip







psql syntax for array of strings in a variable?

2021-10-29 Thread Philip Semanchuk
Hi,
I would appreciate help with the syntax for querying an array of strings 
declared as a psql variable. Here's an example.

\set important_days ARRAY['monday', 'friday']

select 1 where 'monday' = ANY(:important_days);
ERROR:  42703: column "monday" does not exist
LINE 1: select 1 where 'monday' = ANY(ARRAY[monday,friday]);

select 1 where 'monday' = ANY(:"important_days");
ERROR:  42703: column "ARRAY[monday,friday]" does not exist
LINE 1: select 1 where 'monday' = ANY("ARRAY[monday,friday]");

I'm doing something wrong but I can't figure out what.

My real-world use case is that I have a psql script that will execute several 
queries on a long list of strings, and rather than repeat those strings over 
and over in the script, I'd like to declare them once at the top of the script 
and then refer to the variable after that. Bonus points if there's a way to do 
a multiline declaration like --

\set important_days ARRAY['monday', 
  'friday']

Thanks for reading
Philip



Re: Help with my MacOS PostgreSQL 12 installation

2021-08-18 Thread Philip Semanchuk



> On Aug 17, 2021, at 12:21 PM, Michael White  wrote:
> 
> This is my first post so if I’m in the wrong place please help me get to the 
> correct list.
> 
> Issue:
> 
> I installed PostgreSQL on my Mac over a year ago and since upgraded my disk 
> drive and OS to Big Sur.  I am a retired Software Engineer and still like to 
> program and am learning PostgreSQL and have it working on Linux and FreeBSD.  
> In any case I had put it on my Mac laptop and want to use it again but it 
> seems that the upgrades I did might have rearranged file locations or they 
> may have been lost.  I have been trying to understand what is where and come 
> to a point where I need help getting PostgreSQL running again.  I have a file 
> with notes on the installation and here it is:

…

> 3. Don’t remember how I installed PostgreSQL but looks like with EnterpriseDB 
> because of the installation directory.  (I got this from postgresapp.com)


Hi Michael,
I use PostgresApp on my Mac, and it has worked very nicely for me. It’s a very 
no-fuss installation. You should have an app in your Applications folder called 
Postgres. Start that, and the Postgres server will be running. There’s a little 
elephant in my menu bar that I can click on that shows me what databases are 
available. I can set it to start the server automatically when I log in. If you 
don’t have an app called Postgres or it doesn’t run when you click it, I’d say 
your installation is damaged. If you can nuke it and start over, that’d be what 
I’d try. Start by dragging the app to the trash and reinstalling. That might 
fix things.

FYI I modified my .zshrc to include this line to add Postgres’ client apps to 
my path:
export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/11/bin

That way I have easy access to psql, pg_config, etc.


Hope this helps,
Philip



Re: Low cache hit ratio

2021-07-29 Thread Philip Semanchuk



> On Jul 29, 2021, at 3:09 AM, Lucas  wrote:
> 
> Hello,
> 
> I have recently deployed a new Slave (streaming replication) and have been 
> monitoring its cache hit ratio.
> 
> At the moment, read-only queries are being sent to this slave but only 10% of 
> the traffic.
> The cache hit ratio is now at 82%. This database is around 1.4TB and is 
> running on EC2 instances in AWS. PostgreSQL version is 9.2.24.
> 
> The instance is a r4.2xlarge, so it's got 8 vCPUs and 61GB of RAM. 


Hi Lucas,
Have you tried the pg_buffercache extension? It gives you some visibility into 
what’s in the cache so you can understand what’s staying resident and not 
leaving room for other things. I wrote a view atop pg_buffercache that I use 
for this purpose. It’s pasted below; I hope you find it helpful. My only caveat 
is that I run this under Postgres 11. I *think* I’ve used it under Postgres 9.6 
but I’m not sure. It definitely hasn’t been tested on 9.2. 

Hope this helps,
Philip


/* A view of pg_buffercache which shows what's in the Postgres cache.
Access to pg_buffercache requires membership in the group pg_monitor.

It's OK to query this ad hoc, but don't query it aggressively (e.g. in a polling loop). The
Postgres doc says --

> When the pg_buffercache view is accessed, internal buffer manager locks are taken for
> long enough to copy all the buffer state data that the view will display. This ensures
> that the view produces a consistent set of results, while not blocking normal buffer
> activity longer than necessary. Nonetheless there could be some impact on database
> performance if this view is read often.

https://www.postgresql.org/docs/11/pgbuffercache.html

*/

CREATE OR REPLACE VIEW
vw_postgres_cache
AS
SELECT
    c.relname,
    sum(usagecount) AS usage_count,
    /* count(*) is the number of buffers in use. Each buffer is 8kB. (8kB is the page size
    a.k.a. block size configured at compile time, confirm in psql with the command
    `show block_size`).

    I cast the count to float to introduce a float into calculations that are otherwise
    all int which would produce a result rounded to the nearest int.
    */
    -- cache_% shows the portion of the cache that this entity occupies
    ((count(*)::float / pg_settings.setting::int) * 100)::numeric(3, 1) AS "cache_%",
    -- entity_% shows the portion of this entity that's in cache
    -- pg_table_size() is in bytes, multiply buffer count by (8 * 1024) so units match
    (((count(*)::float * 8192) / pg_table_size(c.relname::text)) * 100)::numeric(4, 1) AS "entity_%",
    (count(*)::float * 8 / (1024 * 1024))::numeric(20, 1) AS Gb,
    (count(*)::float * 8 / 1024)::numeric(20, 1) AS Mb
FROM
    pg_buffercache b
    CROSS JOIN pg_settings
    INNER JOIN pg_class c
        ON b.relfilenode = pg_relation_filenode(c.oid)
        AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
WHERE
    pg_settings.name = 'shared_buffers'
    -- If this is run on a system where shared_buffers is expressed in something other than 8kB
    -- blocks, it will calculate incorrect kB/Mb/Gb values. Asserting the value of the unit here
    -- ensures no results at all will be returned in that case.
    AND pg_settings.unit = '8kB'
GROUP BY
    c.relname, pg_settings.setting::int
HAVING
    -- Only include entries that occupy at least 0.1% of the cache
    ((count(*)::float / pg_settings.setting::int) * 100) >= 0.1
ORDER BY 6 DESC
;








Re: Database issues when adding GUI

2021-06-07 Thread Philip Semanchuk



> On Jun 7, 2021, at 11:07 AM, Rob Sargent  wrote:
> 
> On 6/7/21 9:02 AM, Rich Shepard wrote:
>> This is an unusual message and I hope someone(s) here can offer insights
>> into the cause of the problem I've been wrestling with for the past couple
>> of weeks.
>> 
>> Context: For the past decade I've run my business tracking database from the
>> psql shell (currently with postgresql-12.2) and decided it's time to add a
>> frontend so it's a stand-alone desktop application. I'm learning to use
>> PyQt5 as the widget set and application framework.
>> 
>> The database contains three lookup tables: activitytypes, industrytypes, and
>> statustypes, all have a single column and few rows. So I've started with
>> these to get python3 and PyQt5-5.13.2 to run the SELECT query and display
>> the results in a QTableView.
>> 
>> The problem with all three is that my code produces an empty window and
>> hangs. There's no python error displayed and the application reports finding
>> the database but not the tables. For example, the activitytypes debugging
>> log contains:
>> INFO:root:found database
>> DEBUG:root:Defining model/view
>> DEBUG:root:model error:  Unable to find table activitytypes
>> DEBUG:root:about to execute select query
>> DEBUG:root:End of Program
>> 
>> I, and others on the python and pyqt mail lists and stackoverflow, can find
>> nothing wrong with the python code. This suggests it's something with the
>> database itself. But I have no idea where to look. The database structure
>> is:
>> bustrac=# \d
>> List of relations
>>  Schema |           Name            |   Type   |  Owner
>> --------+---------------------------+----------+----------
>>  public | activities                | table    | rshepard
>>  public | activitytypes             | table    | rshepard
>>  public | industrytypes             | table    | rshepard
>>  public | locations                 | table    | rshepard
>>  public | organizations             | table    | rshepard
>>  public | organizations_org_nbr_seq | sequence | rshepard
>>  public | people                    | table    | rshepard
>>  public | people_person_nbr_seq     | sequence | rshepard
>>  public | projects                  | table    | rshepard
>>  public | statustypes               | table    | rshepard
>> (10 rows)
>> 
>> What might stop a front-end application from finding a table that has been
>> readily accessed from the psql shell?
>> 
>> All suggestions and recommendations are needed.
>> 
>> TIA,
>> 
>> Rich
>> 
>> \
> This looks like a permissions problem, as though you are connecting as a 
> role/user without permission to select from the tables.

I agree with Rob. Permissions, or you’re connecting as the wrong user. You 
might find it helpful to simplify and not use a GUI application until you get 
your connection issues sorted out. A simple command line Python app that 
connects to the database and prints the result of “select * from activitytypes 
limit 1” will enable quick(er) debugging of your connection issues.
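
For example, something roughly like this (the connection parameters are 
placeholders, use whatever values the PyQt app uses):

import psycopg2

conn = psycopg2.connect(host='localhost', dbname='bustrac', user='rshepard')
with conn.cursor() as cur:
    # First confirm which database/user/schema this connection actually sees...
    cur.execute('SELECT current_database(), current_user, current_schema()')
    print(cur.fetchone())
    # ...then try the table the GUI can't find.
    cur.execute('SELECT * FROM activitytypes LIMIT 1')
    print(cur.fetchone())
conn.close()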

Cheers
Philip





Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

2021-06-02 Thread Philip Semanchuk



> On Jun 1, 2021, at 3:23 PM, Thomas Munro  wrote:
> 
> On Wed, Jun 2, 2021 at 7:15 AM Vijaykumar Jain
>  wrote:
>> i only get workers to create mv, but refresh mv plan does not use workers 
>> for the same conf params.
> 
> Yeah, this changed in v14:
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9e7ccd9ef64d05e87ceb1985d459bef9031205c0


Thanks, all! It’s great to have a clear explanation. I looked at the change 
notes for 12 & 13 before I posted. It didn’t occur to me to look at 14. :-) 

Cheers
Philip



Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

2021-06-01 Thread Philip Semanchuk



> On Jun 1, 2021, at 2:20 PM, Vijaykumar Jain  
> wrote:
> 
> if you are not using it concurrently, can you confirm the there are *no 
> active* queries on the mv.
> refresh requires AccessExclusiveLock and will wait, till it gets one.
> just asking if you can rule out the extended time is not due to waiting for 
> lock.

I can confirm that it’s not waiting on a lock. In addition, through the AWS CPU 
utilization monitor I can see that the REFRESH uses one CPU/worker whereas the 
CREATE uses four. This is consistent with the EXPLAIN ANALYZE for the CREATE 
which says it uses four workers.


> also, can you share the plans  where you see the diff.

Unless I misunderstand, there is no plan for a REFRESH.


EXPLAIN (ANALYZE, BUFFERS) refresh materialized view my_mat_view
+---+
| QUERY PLAN|
|---|
| Utility statements have no plan structure |
+---+

Cheers
Philip




> 
> On Tue, 1 Jun 2021 at 23:30, Philip Semanchuk  
> wrote:
> Hi all,
> Should I expect a planner difference between CREATE MATERIALIZED VIEW and 
> REFRESH MATERIALIZED VIEW? We have a materialized view that uses 4 workers 
> during CREATE but only one worker during REFRESH, and as a result the refresh 
> takes much longer (~90 minutes vs. 30 minutes for the CREATE). So far this 
> behavior has been 100% consistent.
> 
> I'm running both the CREATE and REFRESH on the same server (Postgres 11.9 on 
> AWS Aurora). I don't think the refresh is using one worker in response to 
> other things happening on the server because we’ve observed this happening 
> when the server is not busy. We're not using the CONCURRENTLY option for 
> REFRESH.
> 
> THanks
> Philip
> 
> 
> 
> -- 
> Thanks,
> Vijay
> Mumbai, India





CREATE/REFRESH MATERIALIZED VIEW planner difference?

2021-06-01 Thread Philip Semanchuk
Hi all,
Should I expect a planner difference between CREATE MATERIALIZED VIEW and 
REFRESH MATERIALIZED VIEW? We have a materialized view that uses 4 workers 
during CREATE but only one worker during REFRESH, and as a result the refresh 
takes much longer (~90 minutes vs. 30 minutes for the CREATE). So far this 
behavior has been 100% consistent.

I'm running both the CREATE and REFRESH on the same server (Postgres 11.9 on 
AWS Aurora). I don't think the refresh is using one worker in response to other 
things happening on the server because we’ve observed this happening when the 
server is not busy. We're not using the CONCURRENTLY option for REFRESH.

THanks
Philip



Re: How different is AWS-RDS postgres?

2021-05-27 Thread Philip Semanchuk


> On May 26, 2021, at 10:04 PM, Rob Sargent  wrote:
> 
> 
> 
>> On May 26, 2021, at 4:37 PM, Ian Harding  wrote:
>> 
>> 
>> There is an option to send the logs to cloudwatch which makes it less awful 
>> to look at them. 
> I have that but precious little of interest there. Lots of autovac, a 
> smattering of hints to increase wal size!?  I have yet to spot anything which 
> corresponds to the “I/O failure” which the middle ware gets. 
> 
> I don’t have query logging on, but I do see reports from my psql session 
> fat-fingering.
> 
> As to the logs UI, the search is pretty feeble; I don’t understand why there 
> are four  channels of logs; the graphs are wearing the same rose-coloured as 
> the logs.  
> And 24 hours without a peep from AWS support. (I don’t call mailing me what I 
> sent them “contact”.)
> 
> My guess right now is that the entire tomcat connection pool is in a single 
> transaction? That’s the only way the tables could disappear.  I am making 
> separate calls to JDBC getConnection () for each doPost. 

We used Aurora (AWS hosted Postgres) and I agree that Cloudwatch search is 
pretty limited. I wrote a Python script to download cloudwatch logs to my 
laptop where I can use proper tools like grep to search them. It’s attached to 
this email. It’s hacky but not too terrible. I hope you find it useful. 

Cheers
Philip


import pathlib
import operator
import logging
from collections import namedtuple
import subprocess
import datetime
import json

import boto3

DB_IDENTIFIER = 'your-db-name-here'

PATH = './logs'

Config = namedtuple('Config', ['access_key', 'secret_key', 'region', 'db_identifier', 'rds_client'])

boto_session = boto3.session.Session()

config = Config(
    access_key=boto_session._session.get_credentials().access_key,
    secret_key=boto_session._session.get_credentials().secret_key,
    region=boto_session._session.get_config_variable('region'),
    db_identifier=DB_IDENTIFIER,
    rds_client=boto_session.client('rds'),
)


class LogFile:
    def __init__(self, aws_name, timestamp, size):
        self.aws_name = aws_name
        self.last_written = datetime.datetime.fromtimestamp(timestamp / 1000)
        self.size = int(size)

        # typical aws_name = error/postgresql.log.2019-06-21-16
        self.local_path = pathlib.Path(PATH, pathlib.Path(aws_name).name + '.txt')

    def download(self, config):
        # aws rds download-db-log-file-portion \
        #   --db-instance-identifier wylan-sql \
        #   --log-file-name error/postgresql.log.2019-06-24-14 \
        #   --no-paginate  --output text
        cmd = [
            'aws',
            'rds',
            'download-db-log-file-portion',
            '--db-instance-identifier',
            config.db_identifier,
            '--log-file-name',
            self.aws_name,
            '--no-paginate',
        ]

        with open(self.local_path, 'wb') as f:
            self._proc = subprocess.Popen(cmd, stdout=f)
            return_code = self._proc.wait()

        if return_code == 0:
            # Great, the data were written. It's actually in JSON format. All of the interesting
            # info is in the LogFileData element. Grab that and replace the file contents with it.
            with open(self.local_path) as f:
                d = json.load(f, encoding='utf-8')
                log_text = d['LogFileData']

            with open(self.local_path, 'w') as f:
                f.write(log_text)
        else:
            # FIXME provide a more helpful exception
            raise ValueError


def _get_log_files(config, root_directory):
    result = config.rds_client.describe_db_log_files(DBInstanceIdentifier=config.db_identifier)

    # FIXME filter out logs where 'Size' == 0?
    rds_logs = [LogFile(d['LogFileName'], d['LastWritten'], d['Size'])
                for d in result['DescribeDBLogFiles']]

    # rds_logs[0].download(config)

    # import pprint; pp=pprint.pprint
    # import pdb; pdb.set_trace()

    rds_logs.sort(key=lambda rds_log: rds_log.aws_name)

    if not rds_logs:
        print('No RDS logs found')
    else:
        for rds_log in rds_logs:
            print(f'downloading {rds_log.aws_name}...')
            rds_log.download(config)


if __name__ == '__main__':
    _get_log_files(config, None)


Re: pg_stat_user_tables.n_mod_since_analyze persistence?

2021-02-16 Thread Philip Semanchuk



> On Feb 15, 2021, at 3:55 PM, Tom Lane  wrote:
> 
> Philip Semanchuk  writes:
>> I saw some unexpected behavior that I'm trying to understand. I suspect it 
>> might be a quirk specific to AWS Aurora and I'd like to confirm that.
> 
>> When I restart my local Postgres instance (on my Mac), the values in 
>> pg_stat_user_tables.n_mod_since_analyze are preserved. In other words, if 
>> table foo had n_mod_since_analyze=33 before the reboot, it still has 
>> n_mod_since_analyze=33 after the restart. 
> 
>> When I restart an AWS Aurora instance, the values in 
>> pg_stat_user_tables.n_mod_since_analyze all seem to be reset to 0. 
> 
>> Can anyone confirm (or refute) that the behavior I see on my Mac 
>> (preservation of these values through a restart) is common & expected 
>> behavior?
> 
> Yeah, in PG those stats would be preserved, at least as long as it's
> a clean shutdown.


Thanks, Tom. A colleague pointed me to a blog post by Michael Vitale that 
confirms this bug on AWS and contains more detail:
https://elephas.io/685-2/


Hope this helps someone else
Philip



pg_stat_user_tables.n_mod_since_analyze persistence?

2021-02-15 Thread Philip Semanchuk
Hi all,
I saw some unexpected behavior that I'm trying to understand. I suspect it 
might be a quirk specific to AWS Aurora and I'd like to confirm that.

When I restart my local Postgres instance (on my Mac), the values in 
pg_stat_user_tables.n_mod_since_analyze are preserved. In other words, if table 
foo had n_mod_since_analyze=33 before the reboot, it still has 
n_mod_since_analyze=33 after the restart. 

When I restart an AWS Aurora instance, the values in 
pg_stat_user_tables.n_mod_since_analyze all seem to be reset to 0. 

Can anyone confirm (or refute) that the behavior I see on my Mac (preservation 
of these values through a restart) is common & expected behavior?

Thanks
Philip



Re: How does Postgres decide if to use additional workers?

2021-02-10 Thread Philip Semanchuk



> On Feb 9, 2021, at 10:52 AM, Thorsten Schöning  wrote:
> 
> So, based on which facts does Postgres decide if to use aadditional
> workers or not? Can I see those decisions explained somewhere? I don't
> see anything in the query plan. Thanks!

Hi Thorsten,
This is an interesting topic for me too. Here’s a formula for the max number of 
workers that Postgres will consider for a table. Of course, most queries use 
more than just one table, and I don’t yet understand how Postgres handles the 
situation where the formula suggests multiple workers for some tables and a 
single worker for others.

There was some conversation about this on the performance mailing list in June. 
Here’s a link to the message that contains the formula; there’s more items of 
interest in the whole thread:
https://www.postgresql.org/message-id/89423FD3-0F13-447D-8D9E-EB1722150F94%40americanefficient.com

Hope this helps
Philip 



Re: How to check if a materialised view is being updated?

2021-01-19 Thread Philip Semanchuk



> On Jan 19, 2021, at 6:33 AM, Jayadevan M  wrote:
> 
> 
> 
> So I’m looking for a way to identify if the refresh process is finished or if 
> it’s still running  - preferably without having to wait for timeout by 
> querying a locked materialized view.  But by e.g. using the system tables or 
> otherwise.
> 
>  
> 
> Can anybody suggest some pointers on how to do this?
> 
>  
> Maybe pg_stat_activity will have the refresh query?

Yes, pg_stat_activity has a query column that could be searched with the 
regular string matching tools, including regex if necessary. pg_stat_activity 
also has some other useful columns, like query_start, which can tell you 
how long the query has been running.
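
For example, something along these lines (the view name is a placeholder):

SELECT pid, state, query_start, now() - query_start AS elapsed
FROM pg_stat_activity
WHERE query ILIKE '%refresh materialized view%my_mat_view%';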

Cheers
Philip



Re: plpgsql unit testing and code coverage

2020-12-31 Thread Philip Semanchuk



> On Dec 31, 2020, at 7:20 AM, Joao Miguel Ferreira 
>  wrote:
> 
> Hello,
> 
> I'm a fan of unit testing and related matters but have used it only on client 
> applications, not on database implemented logic. I recently joined a project 
> whit dozens of PL functions and procedures.
> 
> So, it would be great for me to find a way to execute unit tests on those 
> functions and procedures and, even better, if I can get code coverage results 
> from it. I really enjoy approaching sw development with these tools.
> 
> I'dd appreciate your feedback

Hi Joao,
Are you familiar with pgTAP? (https://pgtap.org) It gives you some of what you 
want, but I don’t think it handles coverage.

Cheers
Philip



Re: Avoid excessive inlining?

2020-12-22 Thread Philip Semanchuk



> On Dec 22, 2020, at 8:40 AM, Laurenz Albe  wrote:
> 
> On Mon, 2020-12-21 at 11:45 -0500, Philip Semanchuk wrote:
>>> On Dec 19, 2020, at 12:59 AM, Joel Jacobson  wrote:
>>> Is there a way to avoid excessive inlining when writing pure SQL functions, 
>>> without having to use PL/pgSQL?
>> 
>> The rules for inlining are here:
>> https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions
>> 
>> According to those rules, if you declared your SQL function as VOLATILE, 
>> then Postgres wouldn’t
>> inline it. From your question, I’m not sure if you want to have the same 
>> function inlined
>> sometimes and not others. I can’t think of a way to do that offhand.
> 
> Where do you see that?  As far as I know, VOLATILE is the best choice if you
> want the function to be inlined.

Ugh, you’re absolutely right, and I’m sorry for spreading misinformation. 
That’s what I get from quoting from memory rather than reading the link that I 
posted. 


> 
> I would say that the simplest way to prevent a function from being inlined
> is to set a parameter on it:
> 
>  ALTER FUNCTION f() SET enable_seqscan = on;

I appreciate the correction and education. 

Cheers
Philip



Re: Avoid excessive inlining?

2020-12-21 Thread Philip Semanchuk



> On Dec 19, 2020, at 12:59 AM, Joel Jacobson  wrote:
> 
> Is there a way to avoid excessive inlining when writing pure SQL functions, 
> without having to use PL/pgSQL?

Hi Joel,
The rules for inlining are here:
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions

According to those rules, if you declared your SQL function as VOLATILE, then 
Postgres wouldn’t inline it. From your question, I’m not sure if you want to 
have the same function inlined sometimes and not others. I can’t think of a way 
to do that offhand.

Hope this helps,
Philip





Re: maintenance_work_mem

2020-11-19 Thread Philip Semanchuk



> On Nov 19, 2020, at 3:05 AM, Atul Kumar  wrote:
> 
> Hi,
> 
> I have below queries:
> 
> 1. How do i check the maintenance_work_mem for current session, before
> setting some other value for this parameter for the same session.
> 
> 2. and How do I set maintenance_work_mem for a session only, and how
> will it be "rollback" once my maintainance work is done, Do I need to
> execute any command for that or just closing the session will rollback
> what I set for the session.

In addition to Andreas’ helpful references to SHOW and SET, there’s also 
pg_settings --

https://www.postgresql.org/docs/13/view-pg-settings.html
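
To answer the two questions directly, the pattern is roughly this (the 1GB value is 
just an example):

SHOW maintenance_work_mem;          -- current value for this session
SET maintenance_work_mem = '1GB';   -- affects only this session
-- ... do the maintenance work ...
RESET maintenance_work_mem;         -- back to the default; closing the session has the same effect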


Cheers
Philip



Re: precautions/prerequisites to take for specific table

2020-11-05 Thread Philip Semanchuk



> On Nov 5, 2020, at 8:49 AM, Vasu Madhineni  wrote:
> 
> Hi All,
> 
> In my organisation a newly built project application team requirement on 
> tables like have a column (text type), with size can reach around 3 MB, and 
> 45 million records annually. 
> 
> Are there any specific precautions/prerequisites we have to take from DBA end 
> to handle this type of table.


Hi Vasu,
Postgres can handle that just fine. We have a table with two text columns 
that’s 18Gb and almost 400 million rows, and that’s not a big table by some 
people’s standards.

If you have specific concerns, you’ll need to tell us more about your situation 
and why you think you won’t be satisfied.

Cheers
Philip



Re: postgres materialized view refresh performance

2020-10-26 Thread Philip Semanchuk



> On Oct 26, 2020, at 10:45 AM, Ayub M  wrote:
> 
> It's a simple sequential scan plan of one line, just reading the base table 
> sequentially. 

Well, unless I have misunderstood you, the materialized view is basically just 
"select * from some_other_table”, the number of records in the source table is 
~6m and doesn’t change much, there are no locking delays and no resource 
shortages, but sometimes the refresh takes minutes, and sometimes hours. 
There’s something missing from the story here.

Some things to try or check on —
 - activity (CPU, disk, memory) during the period when the mat view is 
refreshing 
 - each time after you refresh the mat view, vacuum it
 - even better, if you can afford a brief lock on reads, run a vacuum full 
instead of just regular vacuum
 - if possible, at the same time as you create the problematic mat view, run a 
similar process that writes to a different mat view (tmp_throwaway_mat_view) 
without the CONCURRENTLY keyword and see if it behaves similarly. 



> 
> On Mon, Oct 26, 2020, 9:21 AM Philip Semanchuk  
> wrote:
> 
> 
> > On Oct 25, 2020, at 10:52 PM, Ayub M  wrote:
> > 
> > Thank you both.
> > 
> > As for the mview refresh taking long --
> >   • The mview gets refreshed in a couple of mins sometimes and sometimes it 
> > takes hours. When it runs for longer, there are no locks and no resource 
> > shortage, the number of recs in the base table is 6m (7.5gb) which is not 
> > huge so why does it take so long to refresh the mview?
> > 
> > Does the run time correlate with the number of changes being made?  
> > 
> > -- Almost the same number of records are present in the base table (6 
> > million records). The base table gets truncated and reloaded everytime with 
> > almost the same number of records. 
> > 
> > And the mview is a simple select from this one base table. 
> > 
> > The mview has around 10 indexes, 1 unique and 9 non-unique indexes. 
> > 
> > Population of the base tables takes about 2 mins, using "insert into select 
> > from table", but when the mview is created for the first time it takes 16 
> > minutes. Even when I remove all but one unique index it takes about 7 
> > minutes. Any clue as to why it is taking longer than the create of the base 
> > table (which is 2 mins).
> 
> Do you know if it’s executing a different plan when it takes a long time? 
> auto_explain can help with that.
> 
> 
> 
> > 
> > On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk 
> >  wrote:
> > 
> > 
> > > On Oct 23, 2020, at 9:52 AM, Ravi Krishna  wrote:
> > > 
> > >> My understanding is that when CONCURRENTLY is specified, Postgres 
> > >> implements the refresh as a series of INSERT, UPDATE,
> > >> and DELETE statements on the existing view. So the answer to your 
> > >> question is no, Postgres doesn’t create another table and
> > >> then swap it.
> > > 
> > > The INSERTS/UPDATE/DELETE happens only for the difference.  PG first 
> > > creates a new temp table and then compares it with
> > > the MV and detects the difference.  That is why for CONCURRENTLY, a 
> > > unique index is required on the MV.
> > 
> > Yes, thank you, that’s what I understand too but I expressed it very 
> > poorly. 
> > 
> > 
> > 
> > -- 
> > Regards,
> > Ayub
> 





Re: postgres materialized view refresh performance

2020-10-26 Thread Philip Semanchuk



> On Oct 25, 2020, at 10:52 PM, Ayub M  wrote:
> 
> Thank you both.
> 
> As for the mview refresh taking long --
>   • The mview gets refreshed in a couple of mins sometimes and sometimes it 
> takes hours. When it runs for longer, there are no locks and no resource 
> shortage, the number of recs in the base table is 6m (7.5gb) which is not 
> huge so why does it take so long to refresh the mview?
> 
> Does the run time correlate with the number of changes being made?  
> 
> -- Almost the same number of records are present in the base table (6 million 
> records). The base table gets truncated and reloaded everytime with almost 
> the same number of records. 
> 
> And the mview is a simple select from this one base table. 
> 
> The mview has around 10 indexes, 1 unique and 9 non-unique indexes. 
> 
> Population of the base tables takes about 2 mins, using "insert into select 
> from table", but when the mview is created for the first time it takes 16 
> minutes. Even when I remove all but one unique index it takes about 7 
> minutes. Any clue as to why it is taking longer than the create of the base 
> table (which is 2 mins).

Do you know if it’s executing a different plan when it takes a long time? 
auto_explain can help with that.



> 
> On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk 
>  wrote:
> 
> 
> > On Oct 23, 2020, at 9:52 AM, Ravi Krishna  wrote:
> > 
> >> My understanding is that when CONCURRENTLY is specified, Postgres 
> >> implements the refresh as a series of INSERT, UPDATE,
> >> and DELETE statements on the existing view. So the answer to your question 
> >> is no, Postgres doesn’t create another table and
> >> then swap it.
> > 
> > The INSERTS/UPDATE/DELETE happens only for the difference.  PG first 
> > creates a new temp table and then compares it with
> > the MV and detects the difference.  That is why for CONCURRENTLY, a unique 
> > index is required on the MV.
> 
> Yes, thank you, that’s what I understand too but I expressed it very poorly. 
> 
> 
> 
> -- 
> Regards,
> Ayub





Re: postgres materialized view refresh performance

2020-10-23 Thread Philip Semanchuk



> On Oct 23, 2020, at 9:52 AM, Ravi Krishna  wrote:
> 
>> My understanding is that when CONCURRENTLY is specified, Postgres implements 
>> the refresh as a series of INSERT, UPDATE,
>> and DELETE statements on the existing view. So the answer to your question 
>> is no, Postgres doesn’t create another table and
>> then swap it.
> 
> The INSERTS/UPDATE/DELETE happens only for the difference.  PG first creates 
> a new temp table and then compares it with
> the MV and detects the difference.  That is why for CONCURRENTLY, a unique 
> index is required on the MV.

Yes, thank you, that’s what I understand too but I expressed it very poorly. 





Re: postgres materialized view refresh performance

2020-10-23 Thread Philip Semanchuk



> On Oct 22, 2020, at 3:53 PM, Ayub M  wrote:
> 
> There is a table t which is used in a mview mv, this is the only table in the 
> mview definition.
> 
> create table t (c1 int, ..., c10 int
> );
> 
> -- there is a pk on say c1 column
> create materialized view mv as select c1, c2...c10 from
>  t;
> 
> ---there is a unique index on say c5 and bunch of other indexes on the mview.
> The reason there is a mview created instead of using table t is that the 
> table gets truncated and reloaded every couple of hours and we don't want 
> users to see an empty table at any point of time; that's why a mview is being 
> used.
> 
> Using "refresh materialized view concurrently", this mview is being used by 
> APIs and end users.
> 
> Couple of questions I have -

Hi Ayub,
I’m not an expert on the subject; I hope you’ll get an answer from someone who 
is. :-) Until then, my answers might help.

>   • Whenever mview refresh concurrently happens, does pg create another 
> set of table and indexes and switch it with the orig? If no, then does it 
> update the existing data?


My understanding is that when CONCURRENTLY is specified, Postgres implements 
the refresh as a series of INSERT, UPDATE, and DELETE statements on the 
existing view. So the answer to your question is no, Postgres doesn’t create 
another table and then swap it.


>   • The mview gets refreshed in a couple of mins sometimes and sometimes 
> it takes hours. When it runs for longer, there are no locks and no resource 
> shortage, the number of recs in the base table is 6m (7.5gb) which is not 
> huge so why does it take so long to refresh the mview?

Does the run time correlate with the number of changes being made?


>   • Does mview need vacuum/analyze/reindex?


My understanding is that when CONCURRENTLY is specified, yes it does need 
vacuuming, because of the aforementioned implementation of REFRESH as a series 
of INSERT, UPDATE, and DELETE statements. 

In our situation, we have large views that are refreshed once per week. We want 
to ensure that the view is in the best possible shape for users, so we create 
the view with autovacuum_enabled = false and then run an explicit 
vacuum/analyze step immediately after the refresh rather than leaving it to 
chance.
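
Roughly, that workflow looks like this (a sketch only, using the mv name from
your message; the storage parameter and the explicit VACUUM are the pieces I'm
describing above):

ALTER MATERIALIZED VIEW mv SET (autovacuum_enabled = false);

REFRESH MATERIALIZED VIEW CONCURRENTLY mv;  -- requires the unique index on the mview
VACUUM (ANALYZE) mv;                        -- clean up the dead rows the refresh leaves behind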

Cheers
Philip






Re: PKEY getting corrupted

2020-09-10 Thread Philip Semanchuk



> On Sep 10, 2020, at 6:43 AM, Abraham, Danny  wrote:
> 
> Hi,
> 
> We have seen several times a situation where a PKEY is compromised and 
> duplicate values are created within a table.
> 
> This has happened so far on PG 9.2.8 on Linux and Windows, and also on PG 9.5.5 
> on AIX.
> 
> We ran massive tests on PG 10.4 but they did not recreate the problem.
> 
> Has anyone faced the same issue? Any known bug?

In my experience, this is usually due to an application writing PK values 
rather than allowing a Postgres sequence to generate them, or due to the 
application misusing sequences. Postgres sequences are monotonically 
increasing, so even if you insert sequence values “out of order”, they won’t 
overlap. That’s not to say a bug in Postgres is impossible, but every time I’ve 
seen this problem, it’s been my application, not Postgres.

Postgres 10 added new syntax for PK columns "GENERATED ALWAYS AS IDENTITY” 
(https://www.postgresql.org/docs/10/sql-createtable.html). Postgres will raise 
an exception if an application tries to insert a specific value into a column 
defined this way. When we upgraded from 9.6 to 11 we converted all of our PK 
columns to use this syntax which added a nice safety net for us. 
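
A quick sketch of the difference (table and column names here are invented for
illustration):

CREATE TABLE widget (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text
);

INSERT INTO widget (name) VALUES ('ok');         -- id is assigned from the identity sequence
INSERT INTO widget (id, name) VALUES (42, 'no'); -- rejected: "id" is GENERATED ALWAYS, so an
                                                 -- explicit value raises an error unless you
                                                 -- use OVERRIDING SYSTEM VALUE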

You’re probably already aware that 9.5.23 is the oldest supported version in 
the 9.x series, so the sooner you can upgrade those unsupported 9.x versions 
the better. Maybe you’re in the middle of an upgrade already which is why this 
came up. :-) 

Cheers
Philip



Re: Understanding EXPLAIN ANALYZE estimates when loops != 1

2020-08-20 Thread Philip Semanchuk



> On Aug 19, 2020, at 6:24 PM, David Rowley  wrote:
> 
> On Thu, 20 Aug 2020 at 09:55, Philip Semanchuk
>  wrote:
>> I could use some help interpreting EXPLAIN ANALYZE output.
>> 
>> ->  Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) 
>> (actual time=0.006..0.918 rows=3760 loops=94)
>> 
>> The actual rows returned by this plan node ~= 3760 * 94 = 353,440.
> 
> Yes.  It's total rows / loops rounded to the nearest integer number.
> 
>> Did postgres expect (estimate) 3283 rows from this join, or 3283 * 94 = 
>> 308,602?
> 
> Yes, that's the case at least when the node is not a Parallel node.
> If this index scan was part of a parameterized nested loop, then
> you'll see the estimate of the number of expected loops from the outer
> side of the join.

Thanks, I was wondering where the 94 came from. 


> Same question for this node.
>> 
>> ->  Parallel Index Scan using pk_xyz on xyz  (cost=0.29..2354.67 rows=54285 
>> width=25) (actual time=0.049..6.326 rows=14864 loops=5)
>> 
>> Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5?
> 
> So parallel plans are a bit more complex.   The row estimates are the
> total estimated rows  / the amount of workers we expect to do useful
> work.  You might expect the divisor there to be an integer number
> since you can't really have 0.5 workers.  However, it's more complex
> than that since the leader has other tasks to take care of such as
> pulling tuples from workers, it's not dedicated to helping out.

Sounds like it would help to set max_parallel_workers = 1 before running EXPLAIN 
ANALYZE in order to simplify the numbers, yes? Or is there a possibility that 
doing so would send the planner down an entirely different path?
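
For reference, the kind of thing I have in mind looks like this (a sketch only;
max_parallel_workers_per_gather = 0 is a different knob than the one I mention
above, and it forces a fully serial plan for the session, which may itself
change the plan):

SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE) SELECT count(*) FROM t1;  -- or whichever query is being studied
RESET max_parallel_workers_per_gather;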

> 
> If you're into reading C code, then there's more information in
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c#L5699
> , if you hunt around for usages of that function then you'll see the
> estimated row counts are divided by the return value of that function.

Yes, I’ve spent some time reading that file and its relatives. It’s been 
helpful. 

Much appreciated
Philip





Understanding EXPLAIN ANALYZE estimates when loops != 1

2020-08-19 Thread Philip Semanchuk

Hi all,
I could use some help interpreting EXPLAIN ANALYZE output. 

->  Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) 
(actual time=0.006..0.918 rows=3760 loops=94)

The actual rows returned by this plan node ~= 3760 * 94 = 353,440. Did postgres 
expect (estimate) 3283 rows from this join, or 3283 * 94 = 308,602? 


Same question for this node.

->  Parallel Index Scan using pk_xyz on xyz  (cost=0.29..2354.67 rows=54285 
width=25) (actual time=0.049..6.326 rows=14864 loops=5)

Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5?


Thanks
Philip



Re: Implement a new data type

2020-08-12 Thread Philip Semanchuk



> On Aug 11, 2020, at 8:01 PM, raf  wrote:
> 
> On Tue, Aug 11, 2020 at 06:38:39AM -0700, Miles Elam 
>  wrote:
> 
>> Also of note: PostgreSQL already has a money type (
>> https://www.postgresql.org/docs/current/datatype-money.html)
>> But you shouldn't use it (
>> https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_money).
>> 
>> I only bring it up so that you can know to make your money type a slightly
>> different name to avoid a conflict. Money is deceptively hard to implement
>> correctly. I'd recommend reading the second link if you have not already to
>> avoid previously known issues.
> 
> I use decimal(10,2) for whole cents, and decimal(12,6)
> for sub-cents. Single currency only. I didn't know
> there was a money type originally, but it wouldn't be
> usable for me anyway without the ability to specify the
> scale and precision.
> 
> I recommend considering passing values to the database
> as "decimal '1.23'" rather than bare numeric literals,
> just so there's no chance of the value being
> interpreted as a float at any stage by postgres. Maybe
> that's being too paranoid but that's a good idea when
> it comes to money. :-)

Yes, I agree, this is also important (and easy to overlook) if you’re accessing 
the database via a non-SQL language. We use Python which, like most (all?) 
languages that rely on the underlying C library for floating point support, is 
vulnerable to floating point noise. Python has a fixed precision type, and like 
Postgres it also accepts character and float input. The float input can give 
surprising results.

>>> decimal.Decimal('1.79')   # This is OK
Decimal('1.79')
>>> decimal.Decimal(1.79) # This will not end well!
Decimal('1.79000000000000003552713678800500929355621337890625')
>>>

In the case of a Postgres column like numeric(10,2), input like 
1.79000000000000003552713678800500929355621337890625 will get rounded to 1.79 
anyway and no harm will be done. But like you said, raf, it’s a good idea to be 
too paranoid. :-)
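
A quick way to see that in psql (just a throwaway cast, not tied to any real
table):

SELECT CAST(1.79000000000000003552713678800500929355621337890625 AS numeric(10,2));
-- returns 1.79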

Cheers
Philip



Re: is JIT available

2020-07-27 Thread Philip Semanchuk



> On Jul 25, 2020, at 8:21 AM, Pavel Stehule  wrote:
> 
> 
> 
> so 25. 7. 2020 v 14:04 odesílatel Scott Ribe  
> napsal:
> > On Jul 24, 2020, at 9:55 PM, Pavel Stehule  wrote:
> > 
> > SELECT * FROM pg_config;
> 
> That doesn't tell me whether or not it can actually be used.
> 
> It shows if Postgres was compiled with JIT support.
> 
> When you run EXPLAIN ANALYZE SELECT ... then you can see info about JIT 
> overhead. If you don't see notices about JIT in EXPLAIN, then JIT was not 
> used.

I like Pavel’s 'EXPLAIN ANALYZE SELECT’ suggestion a lot. I think setting 
jit=on and jit_above_cost=1 and then running 'EXPLAIN ANALYZE SELECT’ is a very 
effective way to see whether jit is available in practice.

On installations where jit isn’t available (like on my Mac or on AWS Aurora), 
you can still set jit=on in a session and Postgres doesn’t complain, but that 
doesn’t mean it’s actually enabled.
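
Concretely, the check I have in mind looks something like this (the
generate_series query is only a stand-in; any query whose estimated cost
exceeds the threshold will do):

SET jit = on;
SET jit_above_cost = 1;  -- absurdly low so even a cheap query qualifies
EXPLAIN (ANALYZE) SELECT sum(x) FROM generate_series(1, 100000) AS t(x);
-- if JIT is really available, the plan output ends with a "JIT:" section
-- (functions, options, timings); if that section never shows up, JIT isn't being used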

Cheers
Philip



Re: Same query taking less time in low configuration machine

2020-07-14 Thread Philip Semanchuk



> On Jul 14, 2020, at 5:27 AM, Vishwa Kalyankar  
> wrote:
> 
> Hi,  
> 
> I have two machines - one with 8GB RAM & 4core CPU and the other with 64GB 
> Ram & 24 core CPU.  Both machines have the same DB (Postgres 12 + Postgis 
> 2.5.3).  Same query is taking less time in low end machine whereas more time 
> in high end machine.  Any thoughts on where to look?  I have tuned the db in 
> both machines according to https://pgtune.leopard.in.ua/#/ 
> 
> 
> Below I am pasting the output of query explain in both the machines.
> 
>  -bash-4.2$ psql -p 5434
> psql (12.3)
> Type "help" for help.
> 
> postgres=# \c IPDS_KSEB;
> You are now connected to database "IPDS_KSEB" as user "postgres".
> IPDS_KSEB=# explain analyze select * from 
> kseb_geometry_trace_with_barrier_partition(5,'kottarakara_version',437,'htline',2)
>  ;
>QUERY 
> PLAN
> -
>  Function Scan on kseb_geometry_trace_with_barrier_partition  
> (cost=0.25..10.25 rows=1000 width=169) (actual time=11626.548..11626.568 
> rows=254 loops=1)
>  Planning Time: 0.212 ms
>  Execution Time: 11628.590 ms
> 
> 
> -bash-4.2$ psql -p 5422
> psql (12.3)
> Type "help" for help.
> 
> postgres=# \c IPDS_KSEB;
> You are now connected to database "IPDS_KSEB" as user "postgres".
> IPDS_KSEB=# explain analyze select * from 
> kseb_geometry_trace_with_barrier_partition(5,'kottarakara_version',437,'htline',2)
>  ;
>QUERY 
> PLAN
> -
>  Function Scan on kseb_geometry_trace_with_barrier_partition  
> (cost=0.25..10.25 rows=1000 width=169) (actual time=22304.425..22304.448 
> rows=254 loops=1)
>  Planning Time: 0.219 ms
>  Execution Time: 22352.219 ms
> (3 rows)
> 

Hi Vishwa,
Is it possible that your data is in the cache on the low end machine but not on 
the high end machine? There’s both the Postgres cache and the OS disk cache to 
consider. You can see what’s in the Postgres cache with an extension like 
pg_buffercache. I don’t know of a way to see what’s in the OS cache; maybe 
others do.
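
For the Postgres side, something along these lines shows which relations are
occupying the most shared buffers (pg_buffercache is a contrib extension, so it
may need to be installed first):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;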

Cheers
Philip










Re: Enforcing uniqueness on [real estate/postal] addresses

2020-05-11 Thread Philip Semanchuk



> On May 11, 2020, at 12:55 PM, Peter Devoy  wrote:
> 
> Hi list
> 
> I need to store addresses for properties (as in real estate) so in my
> naivety I created a unique constraint like this:
> 
> ALTER TABLE properties
>ADD CONSTRAINT is_unique_address
>UNIQUE (
>description, --e.g. Land north of Foo Cottage
>address_identifier_general,
>street,
>postcode
>);
> 
> Of course, if any of the fields are NULL (which they often are) I end
> up with duplicates.
> 
> One solution may be to add NOT NULL constraints and use empty strings
> instead of NULL values but, until asking around today, I thought this was
> generally considered bad practice.
> 
> Please can anyone recommend a way of approaching this? Perhaps empty strings
> are pragmatic in this situation?

Hi Peter,
I wouldn’t use empty strings in place of NULL. It’s possible to define 
partial unique indexes that have more or less the same effect as a constraint. 
Have you looked into them? 
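
To sketch the idea with your column names (simplified to treat postcode as the
only nullable column; with several nullable columns you would need an index per
combination, or a unique index over COALESCE'd expressions instead):

CREATE UNIQUE INDEX uq_address_with_postcode
    ON properties (description, address_identifier_general, street, postcode)
    WHERE postcode IS NOT NULL;

CREATE UNIQUE INDEX uq_address_without_postcode
    ON properties (description, address_identifier_general, street)
    WHERE postcode IS NULL;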

Cheers
Philip