Re: Do we want a hashset type?

2023-06-24 Thread Joel Jacobson
On Thu, Jun 22, 2023, at 07:51, Joel Jacobson wrote:
> For instance, how should hashset_count() work?
>
> Given the query,
>
> SELECT hashset_count('{1,2,3,null}'::int4hashset);
>
> Should we,
>
> a) treat NULL as a distinct value and return 4?
>
> b) ignore NULL and return 3?
>
> c) return NULL? (since the presence of NULL can be thought to render 
> the entire count indeterminate)
>
> I think my personal preference is (b) since it is then consistent with 
> how COUNT() works.

Having thought a bit more on this matter,
I think it's better to remove hashset_count(), since its semantics are not
obvious, and instead provide a hashset_cardinality() function that would
obviously include a possible null value in the number of elements:

SELECT hashset_cardinality('{1,2,3,null}'::int4hashset);
4

SELECT hashset_cardinality('{null}'::int4hashset);
1

SELECT hashset_cardinality('{null,null}'::int4hashset);
1

SELECT hashset_cardinality('{}'::int4hashset);
0

SELECT hashset_cardinality(NULL::int4hashset);
NULL

Sounds good?

/Joel




Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-06-24 Thread Tomas Vondra
On 6/24/23 02:08, Tom Lane wrote:
> Tomas Vondra  writes:
>> The problem is that the selectivity for "IS NULL" is estimated using the
>> table-level statistics. But the LEFT JOIN entirely breaks the idea that
>> the null_frac has anything to do with NULLs in the join result.
> 
> Right.
> 
>> I wonder how to improve this, say by adjusting the IS NULL selectivity
>> when we know to operate on the outer side of the join. We're able to
>> do this for antijoins, so maybe we could do that here, somehow?
> 
> This mess is part of the long-term plan around the work I've been doing
> on outer-join-aware Vars.  We now have infrastructure that can let
> the estimator routines see "oh, this Var isn't directly from a scan
> of its table, it's been passed through a potentially-nulling outer
> join --- and I can see which one".  I don't have more than vague ideas
> about what happens next, but that is clearly an essential step on the
> road to doing better.
> 

I was wondering if that work on outer-join-aware Vars could help with
this, but I wasn't following it very closely. I agree the ability to
check if the Var could be NULL due to an outer join seems useful, as it
says whether applying raw attribute statistics makes sense or not.

I was thinking about what to do for the case when that's not possible,
i.e. when the Var refers to nullable side of the join. Knowing that this
is happening is clearly not enough - we need to know how many new NULLs
are "injected" into the join result, and "communicate" that to the
estimation routines.
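
For reference, the query shape under discussion looks roughly like this
(reconstructed from the plan below; the exact table definitions and data
volumes are assumptions):

SELECT *
  FROM large
  LEFT JOIN small ON large.id = small.id
 WHERE small.id IS NULL
    OR large.a IN (1000, 2000, 3000, 4000, 5000);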

Attached is a very ugly experimental patch doing that, and with it the
estimate changes to this:

                                 QUERY PLAN
 --------------------------------------------------------------------------
  Hash Left Join  (cost=3.25..18179.88 rows=00 width=16)
                  (actual time=0.528..596.151 rows=00 loops=1)
    Hash Cond: (large.id = small.id)
    Filter: ((small.id IS NULL) OR
             (large.a = ANY ('{1000,2000,3000,4000,5000}'::integer[])))
    Rows Removed by Filter: 100
    ->  Seq Scan on large  (cost=0.00..14425.00 rows=100 width=8)
                           (actual time=0.069..176.138 rows=100 loops=1)
    ->  Hash  (cost=2.00..2.00 rows=100 width=8)
              (actual time=0.371..0.373 rows=100 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 12kB
          ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8)
                                 (actual time=0.032..0.146 rows=100 loops=1)
  Planning Time: 3.845 ms
  Execution Time: 712.405 ms
 (10 rows)

Seems nice, but. The patch is pretty ugly, I don't claim it works for
other queries or that this is exactly what we should do. It calculates
"unmatched frequency" next to eqjoinsel_inner, stashes that info into
sjinfo and the estimator (nulltestsel) then uses that to adjust the
nullfrac it gets from the statistics.

The good thing is this helps even for IS NULL checks on non-join-key
columns (where we don't switch to an antijoin), but there's a couple
things that I dislike ...

1) It's not restricted to outer joins or anything like that (this is
mostly just my laziness / interest in one particular query, but also
something the outer-join-aware patch might help with).

2) We probably don't want to pass this kind of information through
sjinfo. That was the simplest thing for an experimental patch, but I
suspect it's not the only piece of information we may need to pass to
the lower levels of estimation code.

3) I kinda doubt we actually want to move this responsibility (to
consider fraction of unmatched rows) to the low-level estimation
routines (e.g. nulltestsel and various others). AFAICS this just
"introduces NULLs" into the relation, so maybe we could "adjust" the
attribute statistics (in examine_variable?) by inflating null_frac and
modifying the other frequencies in MCV/histogram.

4) But I'm not sure we actually want to do that in these low-level
selectivity functions. The outer join essentially produces output with
two subsets - one with matches on the outer side, one without them. But
the side without matches has NULLs in all columns. In a way, we know
exactly how these columns are correlated - if we do the usual estimation
(even with the null_frac adjusted), we just throw this information away.
And when there's a lot of rows without a match, that seems bad.

So maybe we should split the join estimate into two parts, one for each
subset of the join result. One for the rows with a match (and then we
can just do what we do now, with the attribute stats we already have).
And one for the "unmatched part" where we know the values on the outer
side are NULL (and then we can easily "fake" stats with null_frac=1.0).
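
Conceptually that corresponds to treating the left join result as a union
of two queries, something like this (illustrative SQL only, assuming the
table definitions from the example - not how the estimate would actually
be computed):

-- rows with a match: the usual estimation with existing attribute stats
SELECT l.id, l.a, s.id AS small_id
  FROM large l JOIN small s ON l.id = s.id
UNION ALL
-- rows without a match: the nullable side is all-NULL, so for this
-- subset we could "fake" per-column stats with null_frac = 1.0
SELECT l.id, l.a, NULL AS small_id
  FROM large l
 WHERE NOT EXISTS (SELECT 1 FROM small s WHERE s.id = l.id);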


I really hope what I just wrote makes at least a little bit of sense.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c

Re: Remove deprecation warnings when compiling PG ~13 with OpenSSL 3.0~

2023-06-24 Thread Michael Paquier
On Fri, Jun 23, 2023 at 10:41:06PM +0200, Peter Eisentraut wrote:
> Considering that, yes.

Thanks, applied to 11~13, then.
--
Michael




Re: Stampede of the JIT compilers

2023-06-24 Thread Tomas Vondra



On 6/24/23 02:33, David Rowley wrote:
> On Sat, 24 Jun 2023 at 02:28, James Coleman  wrote:
>> There are a couple of issues here. I'm sure it's been discussed
>> before, and it's not the point of my thread, but I can't help but note
>> that the default value of jit_above_cost of 100000 seems absurdly low.
>> On good hardware like we have even well-planned queries with costs
>> well above that won't be taking as long as JIT compilation does.
> 
> It would be good to know your evidence for thinking it's too low.
> 
> The main problem I see with it is that the costing does not account
> for how many expressions will be compiled. It's quite different to
> compile JIT expressions for a query to a single table with a simple
> WHERE clause vs some query with many joins which scans a partitioned
> table with 1000 partitions, for example.
> 

I think it's both - as explained by James, there are queries with much
higher cost, but the JIT compilation still takes much longer than just
running the query without JIT. So even a 100k cost difference is clearly
not sufficient to make up for the extra JIT compilation cost.

But it's true that's because the JIT costing is very crude, and there's
little effort to account for how expensive the compilation will be (say,
how many expressions, ...).

IMHO there's no "good" default that wouldn't hurt an awful lot of cases.

There's also a lot of bias - people are unlikely to notice/report cases
when the JIT (including costing) works fine. But they sure are annoyed
when it makes the wrong choice.
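
As an aside, for anyone hitting this: a simple way to check whether JIT is
the culprit for a particular query is to compare timings with it on and
off. Purely an illustration (and it assumes a server built with LLVM
support):

SET jit = on;
SET jit_above_cost = 0;  -- force JIT just for the demonstration
EXPLAIN (ANALYZE) SELECT sum(g.i) FROM generate_series(1, 1000000) AS g(i);
-- the output ends with a "JIT:" section (Functions, Options, Timing)

SET jit = off;
EXPLAIN (ANALYZE) SELECT sum(g.i) FROM generate_series(1, 1000000) AS g(i);
-- compare Execution Time with and without JIT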

>> But on the topic of the thread: I'd like to know if anyone has ever
>> considered implementing a GUC/feature like
>> "max_concurrent_jit_compilations" to cap the number of backends that
>> may be compiling a query at any given point so that we avoid an
>> optimization from running amok and consuming all of a server's
>> resources?
> 
> Why do the number of backends matter?  JIT compilation consumes the
> same CPU resources that the JIT compilation is meant to save.  If the
> JIT compilation in your query happened to be a net win rather than a
> net loss in terms of CPU usage, then why would
> max_concurrent_jit_compilations be useful? It would just restrict us
> on what we could save. This idea just covers up the fact that the JIT
> costing is disconnected from reality.  It's a bit like trying to tune
> your radio with the volume control.
> 

Yeah, I don't quite get this point either. If JIT for a given query
helps (i.e. makes execution shorter), it'd be harmful to restrict the
maximum number of concurrent compilations. If we just disable JIT after
some threshold is reached, that'd make queries longer and just make the
pileup worse.

If it doesn't help for a given query, we shouldn't be doing it at all.
But that should be based on better costing, not some threshold.

In practice there'll be a mix of queries where JIT does/doesn't help,
and this threshold would just arbitrarily (and quite unpredictably)
enable/disable JIT, making it yet harder to investigate slow queries
(as if we didn't have enough trouble with that already).

> I think the JIT costs would be better taking into account how useful
> each expression will be to JIT compile.  There were some ideas thrown
> around in [1].
> 

+1 to that


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Allow ordered partition scans in more cases

2023-06-24 Thread David Rowley
On Sat, 4 Mar 2023 at 00:56, David Rowley  wrote:
> What's on my mind now is if turning 1 Sort into N Sorts is a
> particularly good idea from a work_mem standpoint. I see that we don't
> do tuplesort_end() until executor shutdown, so that would mean that we
> could end up using 1 x work_mem per Sort node.  I idly wondered if we
> couldn't do tuplesort_end() after spitting out the final tuple when
> EXEC_FLAG_REWIND is not set, but that would still mean we could use N
> work_mems when EXEC_FLAG_REWIND *is* set. We only really have
> visibility of that during execution too, so can't really make a
> decision at plan time based on that.

Because of the above, I'm not planning on pursuing this patch any
further.  We can maybe revisit this if we come up with better ways to
manage the number of work_mems a plan can have in the future. As of
now, I'm a little too worried that this patch will end up consuming
too many work_mems by adding a Sort node per partition.

I'll mark this as withdrawn.

David




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-24 Thread Joe Conway

On 6/22/23 03:26, Heikki Linnakangas wrote:
> On 21/06/2023 01:02, Joe Conway wrote:
>> On 6/19/23 19:30, Heikki Linnakangas wrote:
>>> I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
>>> returning from the perl interpreter, instead of before setlocale()
>>> calls, if we want all Postgres code to run with the global locale. Not
>>> sure how much performance overhead that would have.
>>
>> I don't see how that is practical, or at least it does not really
>> address the issue. I think any loaded shared library could cause the
>> same problem by running newlocale() + uselocale() on init. Perhaps I
>> should go test that theory though.
>
> Any shared library could do that, that's true. Any shared library could
> also call 'chdir'. But most shared libraries don't. I think it's the
> responsibility of the extension that loads the shared library, plperl in
> this case, to make sure it doesn't mess up the environment for the
> postgres backend.

Ok, fair enough.

The attached fixes all of the issues raised on this thread by 
specifically patching plperl.


8<
create or replace function finnish_to_number()
returns numeric as
$$
  select to_number('1,23', '9D99')
$$ language sql set lc_numeric to 'fi_FI.utf8';

pl_regression=# show lc_monetary;
 lc_monetary
-------------
 C
(1 row)

DO LANGUAGE 'plperlu'
$$
  use POSIX qw(setlocale LC_NUMERIC);
  use locale;
  setlocale LC_NUMERIC, "fi_FI.utf8";
  $n = 5/2;   # Assign numeric 2.5 to $n
  spi_exec_query('SELECT finnish_to_number()');
  # Locale-dependent conversion to string
  $a = " $n";
  # Locale-dependent output
  elog(NOTICE, "half five is $n");
$$;
NOTICE:  half five is 2,5
DO

set lc_messages ='sv_SE.UTF8';
this prints syntax error in Swedish;
FEL:  syntaxfel vid eller nära "this"
LINE 1: this prints syntax error in Swedish;
^

set lc_messages ='en_GB.utf8';
this *should* print syntax error in English;
ERROR:  syntax error at or near "this"
LINE 1: this *should* print syntax error in English;
^
set lc_monetary ='sv_SE.UTF8';
SELECT 12.34::money AS price;
  price
-----------
 12,34 kr
(1 row)

set lc_monetary ='en_GB.UTF8';
SELECT 12.34::money AS price;
 price
--------
 £12.34
(1 row)

set lc_monetary ='en_US.UTF8';
SELECT 12.34::money AS price;
 price
--------
 $12.34
(1 row)
8<

This works correctly from what I can see -- tested against pg16beta1 on 
Linux Mint with perl v5.34.0 as well as against pg15.2 on RHEL 7 with 
perl v5.16.3.


Although I have not looked yet, presumably we could have similar 
problems with plpython. I would like to get agreement on this approach 
against plperl before diving into that though.


Thoughts?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Ensure correct locale is used when executing plperl

Newer versions of libperl, via plperl, call uselocale() which
has the effect of changing the current locale away from the
global locale underneath postgres. This can result in, among other
infelicities, localeconv() grabbing the wrong locale for
numeric and monetary symbols and formatting. Fix that by arranging
to capture the perl locale and swapping with the global locale
as appropriate when entering and exiting libperl. Importantly,
this dance is also needed when exiting perl via SPI calls made
while executing perl.

Backpatch to all supported versions.

Author: Joe Conway
Reviewed-By: Tom Lane and Heikki Linnakangas
Reported by: Guido Brugnara
Discussion: https://postgr.es/m/flat/17946-3e84cb577e9551c3%40postgresql.org
Backpatch-through: 11

diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 8638642..9831361 100644
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
*** typedef struct plperl_array_info
*** 223,228 
--- 223,233 
  static HTAB *plperl_interp_hash = NULL;
  static HTAB *plperl_proc_hash = NULL;
  static plperl_interp_desc *plperl_active_interp = NULL;
+ /*
+  * Newer versions of perl call uselocale() to switch away from
+  * the global locale used by the backend. Store that here.
+  */
+ static locale_t perl_locale_obj = LC_GLOBAL_LOCALE;
  
  /* If we have an unassigned "held" interpreter, it's stored here */
  static PerlInterpreter *plperl_held_interp = NULL;
*** static char *setlocale_perl(int category
*** 302,307 
--- 307,314 
  #define setlocale_perl(a,b)  Perl_setlocale(a,b)
  #endif			/* defined(WIN32) && PERL_VERSION_LT(5, 28, 0) */
  
+ static void plperl_xact_callback(XactEvent event, void *arg);
+ 
  /*
   * Decrement the refcount of the given SV within the active Perl interpreter
   *
*** _PG_init(void)
*** 482,487 
--- 489,508 
  	 */
  	plperl_held_interp = plperl_init_interp();
  
+ 	/*
+ 	 * Grab a copy of perl locale in use, and switch back
+ 	 * to the global one. We will need to switch back and
+ 	 * forth, such that the current locale is perl's whenever
+ 	 * we are about to evaluate perl code, and the global
+ 	 * locale whene

Re: psql: Add role's membership options to the \du+ command

2023-06-24 Thread Pavel Luzanov
Thank you for all the valuable comments. I can now continue working on
the patch.

Here's what I plan to do in the next version.

Changes for \du & \dg commands
* showing distinct roles in the "Member of" column
* explicit order for list of roles
* no changes for extended mode (\du+)

New meta-command \drg
* showing info from pg_auth_members based on a query:

SELECT r.rolname role, m.rolname member,
   pg_catalog.concat_ws(', ',
   CASE WHEN pam.admin_option THEN 'ADMIN' END,
   CASE WHEN pam.inherit_option THEN 'INHERIT' END,
   CASE WHEN pam.set_option THEN 'SET' END
   ) AS options,
   g.rolname grantor
FROM pg_catalog.pg_auth_members pam
 JOIN pg_catalog.pg_roles r ON (pam.roleid = r.oid)
 JOIN pg_catalog.pg_roles m ON (pam.member = m.oid)
 JOIN pg_catalog.pg_roles g ON (pam.grantor = g.oid)
WHERE r.rolname !~ '^pg_'
ORDER BY role, member, grantor;
       role        |      member       |       options       |     grantor
-------------------+-------------------+---------------------+------------------
 regress_du_role0  | regress_du_admin  | ADMIN, INHERIT, SET | postgres
 regress_du_role0  | regress_du_role1  | ADMIN, INHERIT, SET | regress_du_admin
 regress_du_role0  | regress_du_role1  | INHERIT             | regress_du_role1
 regress_du_role0  | regress_du_role1  | SET                 | regress_du_role2
 regress_du_role0  | regress_du_role2  | ADMIN               | regress_du_admin
 regress_du_role0  | regress_du_role2  | INHERIT, SET        | regress_du_role1
 regress_du_role0  | regress_du_role2  |                     | regress_du_role2
 regress_du_role1  | regress_du_admin  | ADMIN, INHERIT, SET | postgres
 regress_du_role1  | regress_du_role2  | ADMIN, SET          | regress_du_admin
 regress_du_role2  | regress_du_admin  | ADMIN, INHERIT, SET | postgres
(10 rows)

Notes
* The name of the new command. It's a good name, if not for the history.
There are two commands showing the same information about roles: \du and
\dg.
The addition of \drg may be misinterpreted: if there is \drg, then users
may assume there is also a \dug.
Maybe it's time to think about deprecating the \du command and leaving
only \dg in the next versions?


* 'empty'. I suggest thinking about forbidding the situation with empty 
options.

If we prohibit them, the issue will be resolved automatically.

* The new meta-command will also make sense for versions <16.
The ADMIN OPTION is available in all supported versions.

* The new meta-command will not show all roles. It will only show the 
roles included in other roles.
To show all roles you need to add an outer join between pg_roles and 
pg_auth_members.

But all columns except "role" will be left blank. Is it worth doing this?
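
For illustration, such a variant could look roughly like this (just a
sketch of the idea, not a final query):

SELECT r.rolname role, m.rolname member,
   pg_catalog.concat_ws(', ',
   CASE WHEN pam.admin_option THEN 'ADMIN' END,
   CASE WHEN pam.inherit_option THEN 'INHERIT' END,
   CASE WHEN pam.set_option THEN 'SET' END
   ) AS options,
   g.rolname grantor
FROM pg_catalog.pg_roles r
 LEFT JOIN pg_catalog.pg_auth_members pam ON (pam.roleid = r.oid)
 LEFT JOIN pg_catalog.pg_roles m ON (pam.member = m.oid)
 LEFT JOIN pg_catalog.pg_roles g ON (pam.grantor = g.oid)
WHERE r.rolname !~ '^pg_'
ORDER BY 1, 2, 4;

Roles that are not granted to anyone would then show up with blank member,
options and grantor columns.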

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: Things I don't like about \du's "Attributes" column

2023-06-24 Thread Pavel Luzanov

On 23.06.2023 03:50, Tom Lane wrote:
> Nearby I dissed psql's \du command for its incoherent "Attributes"
> column [1].  It's too late to think about changing that for v16,
> but here's some things I think we should consider for v17:


If there are no others willing, I am ready to take up this topic. There 
is definitely room for improvement here.

But first I want to finish with the \du command.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: psql: Add role's membership options to the \du+ command

2023-06-24 Thread David G. Johnston
On Sat, Jun 24, 2023 at 8:11 AM Pavel Luzanov 
wrote:

> Notes
> * The name of the new command. It's a good name, if not for the history.
> There are two commands showing the same information about roles: \du and
> \dg.
> The addition of \drg may be misinterpreted: if there is \drg, then users
> may assume there is also a \dug.
> Maybe it's time to think about deprecating the \du command and leaving
> only \dg in the next versions?
>

I would add \dr as the new official command to complement adding \drg and
deprecate both \du and \dg.  Though actual removal and de-documenting
doesn't seem like a good idea.  But if we ever did assign something
non-role to \dr it would be very confusing.



> * The new meta-command will also make sense for versions <16.
> The ADMIN OPTION is available in all supported versions.
>

Doesn't every role pre-16 gain SET permission?  We can also deduce whether
the grant provides INHERIT based upon the attribute of the role in question.
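
Something along these lines, perhaps - a rough sketch for pre-16 servers,
assuming SET is implicit for every membership and INHERIT follows the
member role's rolinherit attribute:

SELECT r.rolname role, m.rolname member,
   pg_catalog.concat_ws(', ',
   CASE WHEN pam.admin_option THEN 'ADMIN' END,
   CASE WHEN m.rolinherit THEN 'INHERIT' END,
   'SET'
   ) AS options,
   g.rolname grantor
FROM pg_catalog.pg_auth_members pam
 JOIN pg_catalog.pg_roles r ON (pam.roleid = r.oid)
 JOIN pg_catalog.pg_roles m ON (pam.member = m.oid)
 LEFT JOIN pg_catalog.pg_roles g ON (pam.grantor = g.oid)
WHERE r.rolname !~ '^pg_'
ORDER BY 1, 2;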


> * The new meta-command will not show all roles. It will only show the
> roles included in other roles.
> To show all roles you need to add an outer join between pg_roles and
> pg_auth_members.
> But all columns except "role" will be left blank. Is it worth doing this?
>
>
I'm inclined to want this.  It would be good, when specifying a role to
filter upon, that all rows matching that filter end up in the output
regardless of whether they are standalone or not.

David J.


Re: Stampede of the JIT compilers

2023-06-24 Thread James Coleman
On Sat, Jun 24, 2023 at 7:40 AM Tomas Vondra
 wrote:
>
>
>
> On 6/24/23 02:33, David Rowley wrote:
> > On Sat, 24 Jun 2023 at 02:28, James Coleman  wrote:
> >> There are a couple of issues here. I'm sure it's been discussed
> >> before, and it's not the point of my thread, but I can't help but note
> >> that the default value of jit_above_cost of 100000 seems absurdly low.
> >> On good hardware like we have even well-planned queries with costs
> >> well above that won't be taking as long as JIT compilation does.
> >
> > It would be good to know your evidence for thinking it's too low.

It's definitely possible that I stated this much more emphatically
than I should have -- it was coming out of my frustration with this
situation after all.

I think, though, that my later comments here will provide some
philosophical justification for it.

> > The main problem I see with it is that the costing does not account
> > for how many expressions will be compiled. It's quite different to
> > compile JIT expressions for a query to a single table with a simple
> > WHERE clause vs some query with many joins which scans a partitioned
> > table with 1000 partitions, for example.
> >
>
> I think it's both - as explained by James, there are queries with much
> higher cost, but the JIT compilation still takes much longer than just
> running the query without JIT. So even a 100k cost difference is clearly
> not sufficient to make up for the extra JIT compilation cost.
>
> But it's true that's because the JIT costing is very crude, and there's
> little effort to account for how expensive the compilation will be (say,
> how many expressions, ...).
>
> IMHO there's no "good" default that wouldn't hurt an awful lot of cases.
>
> There's also a lot of bias - people are unlikely to notice/report cases
> when the JIT (including costing) works fine. But they sure are annoyed
> when it makes the wrong choice.
>
> >> But on the topic of the thread: I'd like to know if anyone has ever
> >> considered implementing a GUC/feature like
> >> "max_concurrent_jit_compilations" to cap the number of backends that
> >> may be compiling a query at any given point so that we avoid an
> >> optimization from running amok and consuming all of a server's
> >> resources?
> >
> > Why do the number of backends matter?  JIT compilation consumes the
> > same CPU resources that the JIT compilation is meant to save.  If the
> > JIT compilation in your query happened to be a net win rather than a
> > net loss in terms of CPU usage, then why would
> > max_concurrent_jit_compilations be useful? It would just restrict us
> > on what we could save. This idea just covers up the fact that the JIT
> > costing is disconnected from reality.  It's a bit like trying to tune
> > your radio with the volume control.
> >
>
> Yeah, I don't quite get this point either. If JIT for a given query
> helps (i.e. makes execution shorter), it'd be harmful to restrict the
> maximum number of concurrent compilations. If we just disable JIT after
> some threshold is reached, that'd make queries longer and just make the
> pileup worse.

My thought process here is that, given the poor modeling of JIT costing
you've both described, we're likely to estimate the cost of "easy"
JIT compilation acceptably well but also likely to estimate "complex"
JIT compilation far lower than its actual cost.

Another way of saying this is that our range of JIT compilation costs
may well be fine on the bottom end but clamped on the high end, and
that means that our failure modes will tend towards the worst
mis-costings being the most painful (e.g., 2s compilation time for a
100ms query). This is even more the case in an OLTP system where the
majority of queries are already known to be quite fast.

In that context capping the number of backends compiling, particularly
where plans (and JIT?) might be cached, could well save us (depending
on workload).

That being said, I could imagine an alternative approach solving a
similar problem -- a way of exiting early from compilation if it takes
longer than we expect.

> If it doesn't help for a given query, we shouldn't be doing it at all.
> But that should be based on better costing, not some threshold.
>
> In practice there'll be a mix of queries where JIT does/doesn't help,
> and this threshold would just arbitrarily (and quite unpredictably)
> enable/disable JIT, making it yet harder to investigate slow queries
> (as if we didn't have enough trouble with that already).
>
> > I think the JIT costs would be better taking into account how useful
> > each expression will be to JIT compile.  There were some ideas thrown
> > around in [1].
> >
>
> +1 to that

That does sound like an improvement.

One thing about our JIT that is different from e.g. browser JS engine
JITing is that we don't substitute in the JIT code "on the fly" while
execution is already underway. That'd be another, albeit quite
difficult, way to solve these issues.

Regards,
James Coleman




Re: Stampede of the JIT compilers

2023-06-24 Thread Tom Lane
James Coleman  writes:
> On Sat, Jun 24, 2023 at 7:40 AM Tomas Vondra
>  wrote:
>> On 6/24/23 02:33, David Rowley wrote:
>>> On Sat, 24 Jun 2023 at 02:28, James Coleman  wrote:
>>>> There are a couple of issues here. I'm sure it's been discussed
>>>> before, and it's not the point of my thread, but I can't help but note
>>>> that the default value of jit_above_cost of 100000 seems absurdly low.
>>>> On good hardware like we have even well-planned queries with costs
>>>> well above that won't be taking as long as JIT compilation does.

>>> It would be good to know your evidence for thinking it's too low.

> It's definitely possible that I stated this much more emphatically
> than I should have -- it was coming out of my frustration with this
> situation after all.

I think there is *plenty* of evidence that it is too low, or at least
that for some reason we are too willing to invoke JIT when the result
is to make the overall cost of a query far higher than it is without.
Just see all the complaints on the mailing lists that have been
resolved by advice to turn off JIT.  You do not even have to look
further than our own regression tests: on my machine with current
HEAD, "time make installcheck-parallel" reports

real    0m8.544s
user    0m0.906s
sys     0m0.863s

for a build without --with-llvm, and

real    0m13.211s
user    0m0.917s
sys     0m0.811s

for a build with it (and all JIT settings left at defaults).  If you
do non-parallel "installcheck" the ratio is similar.  I don't see how
anyone can claim that 50% slowdown is just fine.

I don't know whether raising the default would be enough to fix that
in a nice way, and I certainly don't pretend to have a specific value
to offer.  But it's undeniable that we have a serious problem here,
to the point where JIT is a net negative for quite a few people.


> In that context capping the number of backends compiling, particularly
> where plans (and JIT?) might be cached, could well save us (depending
> on workload).

TBH I do not find this proposal attractive in the least.  We have
a problem here even when you consider a single backend.  If we fixed
that, so that we don't invoke JIT unless it really helps, then it's
not going to help less just because you have a lot of backends.
Plus, the overhead of managing a system-wide limit is daunting.

regards, tom lane




Re: Do we want a hashset type?

2023-06-24 Thread Joel Jacobson
New version of int4hashset_contains() that should follow the same
General Rules as MULTISET's MEMBER OF (8.16).

The first rule is to return False if the cardinality is 0 (zero).
However, we must first check if the first argument is null,
in which case the cardinality cannot be 0 (zero),
so if the first argument is null then we return Unknown
(represented as null).

We then proceed and check if the set is empty,
which is defined as nelements being 0 (zero)
as well as the new null_element field being false.
If the set is empty, then we always return False,
regardless of the second argument, that is,
even if it would be null we would still return False,
since the set is empty and can therefore not contain
any element.

The second rule is to return Unknown (represented as null)
if any of the arguments are null. We've already checked that
the first argument is not null, so now we check the second
argument, and return Unknown (represented as null) if it is null.

The third rule is to check for the element, and return True if
the set contains the element. Otherwise, if the set contains
the null element, we don't know if the element we're checking
for is in the set, so we then return Unknown (represented as null).
Finally, if the set doesn't contain the null element and nor the
element we're checking for, then we return False.

Datum
int4hashset_contains(PG_FUNCTION_ARGS)
{
	int4hashset_t  *set;
	int32			value;
	bool			result;

	/* A null set makes the result Unknown (rule 2, checked up front) */
	if (PG_ARGISNULL(0))
		PG_RETURN_NULL();

	set = PG_GETARG_INT4HASHSET(0);

	/* Rule 1: an empty set (no elements, no null element) contains nothing */
	if (set->nelements == 0 && !set->null_element)
		PG_RETURN_BOOL(false);

	/* Rule 2: a null second argument makes the result Unknown */
	if (PG_ARGISNULL(1))
		PG_RETURN_NULL();

	value = PG_GETARG_INT32(1);
	result = int4hashset_contains_element(set, value);

	/*
	 * Rule 3: if the element wasn't found but the set contains the null
	 * element, we can't tell whether the element is present, so return
	 * Unknown; otherwise return the lookup result.
	 */
	if (!result && set->null_element)
		PG_RETURN_NULL();

	PG_RETURN_BOOL(result);
}

Example queries and expected results:

SELECT hashset_contains(NULL::int4hashset, NULL::int); -- null
SELECT hashset_contains(NULL::int4hashset, 1::int); -- null
SELECT hashset_contains('{}'::int4hashset, NULL::int); -- false
SELECT hashset_contains('{}'::int4hashset, 1::int); -- false
SELECT hashset_contains('{null}'::int4hashset, NULL::int); -- null
SELECT hashset_contains('{null}'::int4hashset, 1::int); -- null
SELECT hashset_contains('{1}'::int4hashset, NULL::int); -- null
SELECT hashset_contains('{1}'::int4hashset, 1::int); -- true
SELECT hashset_contains('{1}'::int4hashset, 2::int); -- false

Looks good?

/Joel




Inquiry/Help with pg_adviser (problem in index_create function for creating indexes)

2023-06-24 Thread Ahmed Ibrahim
Hi everyone!

I am new to the PostgreSQL community and am currently working on the
pg_adviser project [https://github.com/DrPostgres/pg_adviser/].

The extension last worked with version 8.3, and currently I am working to
make it support version 16 and then the other active versions.

I will give a brief overview of the extension:
It's used to recommend useful indexes for a set of queries. It does that
by planning the query initially to see the initial cost, then creating
*virtual* indexes (based on the query and the columns used in it, etc.)
and planning again to see how those indexes changed the cost.

The problem I am facing is in creating those indexes in Postgres 16 (while
calling *index_create*), and you can find a detailed description of the
problem along with the code/PR here:
https://drive.google.com/file/d/1x2PnDEfEo094vgNiBd1-BfJtB5Fovrih/view

I would appreciate any help. Thanks :)


Re: Stampede of the JIT compilers

2023-06-24 Thread David Rowley
On Sun, 25 Jun 2023 at 05:54, Tom Lane  wrote:
>
> James Coleman  writes:
> > On Sat, Jun 24, 2023 at 7:40 AM Tomas Vondra
> >  wrote:
> >> On 6/24/23 02:33, David Rowley wrote:
> >>> On Sat, 24 Jun 2023 at 02:28, James Coleman  wrote:
> >>>> There are a couple of issues here. I'm sure it's been discussed
> >>>> before, and it's not the point of my thread, but I can't help but note
> >>>> that the default value of jit_above_cost of 100000 seems absurdly low.
> >>>> On good hardware like we have even well-planned queries with costs
> >>>> well above that won't be taking as long as JIT compilation does.
>
> >>> It would be good to know your evidence for thinking it's too low.
>
> > It's definitely possible that I stated this much more emphatically
> > than I should have -- it was coming out of my frustration with this
> > situation after all.
>
> I think there is *plenty* of evidence that it is too low, or at least
> that for some reason we are too willing to invoke JIT when the result
> is to make the overall cost of a query far higher than it is without.

I've seen plenty of other reports and I do agree there is a problem,
but I think you're jumping to conclusions in this particular case.
I've seen nothing here that couldn't equally indicate that the planner
overestimated the costs or some row estimate for the given query.
The solution to those problems shouldn't be bumping up the default JIT
thresholds; it could be to fix the costs or tune/add statistics to get
better row estimates.
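
For example (purely generic, not tied to this particular report), row
estimates can often be improved with something along these lines - the
table and column names here are just placeholders:

ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 1000;
CREATE STATISTICS some_stats (dependencies, ndistinct)
  ON col_a, col_b FROM some_table;
ANALYZE some_table;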

I don't think it's too big an ask to see a few more details so that we
can confirm what the actual problem is.

David




Re: Add support for AT LOCAL

2023-06-24 Thread Vik Fearing

On 6/12/23 17:37, Alvaro Herrera wrote:
> On 2023-Jun-06, Laurenz Albe wrote:
>> At a quick glance, it looks like you resolve "timezone" at the time
>> the query is parsed.  Shouldn't the resolution happen at query
>> execution time?
>
> Sounds like it -- consider the case where the timestamp value is a
> partition key and one of the partition boundaries falls in between two
> timezone offsets for some particular ts value; then you use a prepared
> query to read from a view defined with AT LOCAL.  Partition pruning
> would need to compute partitions to read from at runtime, not plan time.



Can you show me an example of that happening with my patch?
--
Vik Fearing





Re: Stampede of the JIT compilers

2023-06-24 Thread James Coleman
On Sat, Jun 24, 2023 at 1:54 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > In that context capping the number of backends compiling, particularly
> > where plans (and JIT?) might be cached, could well save us (depending
> > on workload).
>
> TBH I do not find this proposal attractive in the least.  We have
> a problem here even when you consider a single backend.  If we fixed
> that, so that we don't invoke JIT unless it really helps, then it's
> not going to help less just because you have a lot of backends.
> Plus, the overhead of managing a system-wide limit is daunting.
>
> regards, tom lane

I'm happy to withdraw that particular idea. My mental model was along
the lines of "this is a startup cost, and then we'll have it cached, so
the higher than expected cost won't matter as much when the system
settles down", and in that scenario limiting the size of the herd can
make sense.

But that's not the broader problem, so...

Regards,
James Coleman




Re: Stampede of the JIT compilers

2023-06-24 Thread James Coleman
On Sat, Jun 24, 2023 at 8:14 PM David Rowley  wrote:
>
> On Sun, 25 Jun 2023 at 05:54, Tom Lane  wrote:
> >
> > James Coleman  writes:
> > > On Sat, Jun 24, 2023 at 7:40 AM Tomas Vondra
> > >  wrote:
> > >> On 6/24/23 02:33, David Rowley wrote:
> > >>> On Sat, 24 Jun 2023 at 02:28, James Coleman  wrote:
> > >>>> There are a couple of issues here. I'm sure it's been discussed
> > >>>> before, and it's not the point of my thread, but I can't help but note
> > >>>> that the default value of jit_above_cost of 100000 seems absurdly low.
> > >>>> On good hardware like we have even well-planned queries with costs
> > >>>> well above that won't be taking as long as JIT compilation does.
> >
> > >>> It would be good to know your evidence for thinking it's too low.
> >
> > > It's definitely possible that I stated this much more emphatically
> > > than I should have -- it was coming out of my frustration with this
> > > situation after all.
> >
> > I think there is *plenty* of evidence that it is too low, or at least
> > that for some reason we are too willing to invoke JIT when the result
> > is to make the overall cost of a query far higher than it is without.
>
> I've seen plenty of other reports and I do agree there is a problem,
> but I think you're jumping to conclusions in this particular case.
> I've seen nothing here that couldn't equally indicate that the planner
> overestimated the costs or some row estimate for the given query.
> The solution to those problems shouldn't be bumping up the default JIT
> thresholds; it could be to fix the costs or tune/add statistics to get
> better row estimates.
>
> I don't think it's too big an ask to see a few more details so that we
> can confirm what the actual problem is.

I did say in the original email "encountered a situation where a
misplanned query (analyzing helped with this, but I think the issue is
still relevant)".

I'll look at specifics again on Monday, but what I do remember is that
there were a lot of joins, and we already know we have cases where
those are planned poorly too (even absent bad stats).

What I wanted to get at more broadly here was thinking along the lines
of how to prevent the misplanning from causing such a disaster.

Regards,
James Coleman




Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-06-24 Thread Richard Guo
On Wed, Jun 14, 2023 at 2:49 PM Etsuro Fujita 
wrote:

> On Mon, Jun 5, 2023 at 10:19 PM Etsuro Fujita 
> wrote:
> > To avoid this issue, I am wondering if we should modify
> > add_paths_to_joinrel() in back branches so that it just disallows the
> > FDW to consider pushing down joins when the restrictlist has
> > pseudoconstant clauses.  Attached is a patch for that.
>
> I think that custom scans have the same issue, so I modified the patch
> further so that it also disallows custom-scan providers to consider
> join pushdown in add_paths_to_joinrel() if necessary.  Attached is a
> new version of the patch.


Good point.  The v2 patch looks good to me for back branches.

I'm wondering what the plan is for HEAD.  Should we also disallow
foreign/custom join pushdown in the case that there is any
pseudoconstant restriction clause, or instead still allow join pushdown
in that case?  If it is the latter, I think we can do something like my
patch upthread does.  But that patch needs to be revised to consider
custom scans, maybe by storing the restriction clauses also in
CustomPath?

Thanks
Richard