Re: possible bug

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-21, Tom Lane wrote:

> "David G. Johnston"  writes:
> > On Fri, Oct 21, 2022 at 4:52 PM Ravi Krishna  wrote:
> >> on a diff note, is the word memoize inspired from Perl Module memoize
> >> which use to do the same thing.
> 
> > It is a general functional programming concept - not sure on the history
> > but probably academic and thus Perl and others picked it up "from the
> > source".
> 
> Looks to me like you suggested our use of the terminology:
> 
> https://www.postgresql.org/message-id/flat/CAKFQuwZQmCNyS_Vv2Jf3TNe7wRTiptWNs7xkgU%3DAEdqthkQe9A%40mail.gmail.com#bbcd739c97e28b17ef2e111be8cf214b

The word itself has been used in the PG lists before.  For example,
Greg Stark used it in 2008 when discussing WITH RECURSIVE
https://postgr.es/m/47c151f1.8050...@enterprisedb.com
There are a few other hits over the years.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Digital and video cameras have this adjustment and film cameras don't for the
same reason dogs and cats lick themselves: because they can."   (Ken Rockwell)




Re: possible bug

2022-10-21 Thread David G. Johnston
On Fri, Oct 21, 2022 at 6:09 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Fri, Oct 21, 2022 at 4:52 PM Ravi Krishna 
> wrote:
> >> on a diff note, is the word memoize inspired from Perl Module memoize
> >> which use to do the same thing.
>
> > It is a general functional programming concept - not sure on the history
> > but probably academic and thus Perl and others picked it up "from the
> > source".
>
> Looks to me like you suggested our use of the terminology:
>
>
> https://www.postgresql.org/message-id/flat/CAKFQuwZQmCNyS_Vv2Jf3TNe7wRTiptWNs7xkgU%3DAEdqthkQe9A%40mail.gmail.com#bbcd739c97e28b17ef2e111be8cf214b
>
>
Yeah, I just don't remember which book on functional programming (probably
JavaScript oriented though) I learned about it in.

David J.


Re: possible bug

2022-10-21 Thread Tom Lane
"David G. Johnston"  writes:
> On Fri, Oct 21, 2022 at 4:52 PM Ravi Krishna  wrote:
>> on a diff note, is the word memoize inspired from Perl Module memoize
>> which use to do the same thing.

> It is a general functional programming concept - not sure on the history
> but probably academic and thus Perl and others picked it up "from the
> source".

Looks to me like you suggested our use of the terminology:

https://www.postgresql.org/message-id/flat/CAKFQuwZQmCNyS_Vv2Jf3TNe7wRTiptWNs7xkgU%3DAEdqthkQe9A%40mail.gmail.com#bbcd739c97e28b17ef2e111be8cf214b

regards, tom lane




Re: possible bug

2022-10-21 Thread David G. Johnston
On Fri, Oct 21, 2022 at 4:52 PM Ravi Krishna  wrote:

> on a diff note, is the word memoize inspired from Perl Module memoize
> which use to
> do the same thing.
>

It is a general functional programming concept - not sure on the history
but probably academic and thus Perl and others picked it up "from the
source".

David J.


Re: possible bug

2022-10-21 Thread Ravi Krishna
on a diff note, is the word memoize inspired from Perl Module memoize which use 
todo the same thing.

Re: possible bug

2022-10-21 Thread Tom Lane
Les  writes:
> As always, you hit the nail on the head. set enable_memoize = on fixes the
> problem!
> Version is PostgreSQL 14.1, time to upgrade...

Yup, very likely fixed by c2dc7b9e1 then.

> I'm sorry that I wasted your time.

No need to apologize, it was an actual bug.

regards, tom lane




Re: possible bug

2022-10-21 Thread Les
> Which PG version is this exactly?  Given the Memoize node shown
> in your plan, I suppose 14.something, but is it up to date?
> There were Memoize-related bug fixes in 14.2 and 14.4, and the
> one in 14.2 looks particularly likely to be relevant.
>
> If you are on the current minor release, does "set enable_memoize = off"
> change the behavior?
>
As always, you hit the nail on the head. set enable_memoize = on fixes the
problem!

Version is PostgreSQL 14.1, time to upgrade...

I'm sorry that I wasted your time.


Re: possible bug

2022-10-21 Thread Tom Lane
Les  writes:
> We had a support request today, and we have narrowed down the problem to a
> query that behaves very strangely. The actual query was much more
> complicated, but I came up with this minimal example.

Which PG version is this exactly?  Given the Memoize node shown
in your plan, I suppose 14.something, but is it up to date?
There were Memoize-related bug fixes in 14.2 and 14.4, and the
one in 14.2 looks particularly likely to be relevant.

If you are on the current minor release, does "set enable_memoize = off"
change the behavior?

regards, tom lane




Re: possible bug

2022-10-21 Thread Les
>
>
>
> So what happens if query the table directly?:
>
> select * from wf.workflow  where head_table_id::float8::int8 = 25408438504;
>
> vs
>
> select * from wf.workflow  where head_table_id = 25408438504;
>
>
Both return lots of rows. The same number of rows.

select count(*) from wf.workflow  where head_table_id::float8::int8 =
25408438504;
count|
-+
62260|

select count(*) from wf.workflow  where head_table_id = 25408438504;
count|
-+
62260|

Both of them use seq scan.

QUERY PLAN |
---+
Aggregate  (cost=2985.00..2985.01 rows=1 width=8)  |
  ->  Seq Scan on workflow  (cost=0.00..2829.07 rows=62369 width=0)|
Filter: (head_table_id = '25408438504'::bigint)|

QUERY PLAN
  |
-+
Aggregate  (cost=3289.86..3289.87 rows=1 width=8)
 |
  ->  Seq Scan on workflow  (cost=0.00..3288.70 rows=460 width=0)
 |
Filter: (((head_table_id)::double precision)::bigint =
'25408438504'::bigint)|


Re: possible bug

2022-10-21 Thread Adrian Klaver

On 10/21/22 10:57 AM, Les wrote:
One of my colleagues pointed out, that they query returns a different 
result, if I cast the head_table_id condition to float8 and then back to 
int8.


SELECT
c.id ,
     tt.code,
     c.regno,
     (
         select count(*)
         FROM kap.course_user cu
         JOIN wf.workflow w_1 ON w_1.rec_id = cu.id  AND 
w_1.head_table_id::float8::int8 = 25408438504
         where cu.is_active AND cu.course_id = c.id  AND 
w_1.station_id = 25406740434

     ) AS col_3
FROM
     kap.course c
     INNER JOIN kap.training_type tt ON tt.id  = 
c.training_type_id

where c.id  in (26437091668, 26643094740)

Returns:

id         |code|regno|col_3|
---++-+-+
26437091668|TA-T| 2632|    1|
26643094740|PEG | 2905|    0|

Although all identifier columns are defined as int8.


So what happens if query the table directly?:

select * from wf.workflow  where head_table_id::float8::int8 = 25408438504;

vs

select * from wf.workflow  where head_table_id = 25408438504;

FYI, the convention on the list is to not top post, but instead to 
bottom or inline post. Also to trim out material which was covered in 
previous posts.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: possible bug

2022-10-21 Thread Les
Not that I know of.

I just tried this:

reindex table kap.course;
reindex table kap.course_user;
reindex table wf.workflow;
reindex table kap.training_type;

But it is still wrong.



Adrian Klaver  ezt írta (időpont: 2022. okt.
21., P, 19:57):

> On 10/21/22 10:50 AM, Les wrote:
> > Hello,
> >
> > We had a support request today, and we have narrowed down the problem to
> > a query that behaves very strangely. The actual query was much more
> > complicated, but I came up with this minimal example.
> >
> > This is what we have seen inside our application:
> >
> > select * from test where id in (26643094740, 26437091668);
> >
> > id |code|regno|col_3|
> > ---++-+-+
> > 26437091668|TA-T| 2632|1|
> > 26643094740|PEG | 2905|1|
> >
> > select * from test where id = 26643094740;
> >
> > id |code|regno|col_3|
> > ---++-+-+
> > 26643094740|PEG | 2905|0|
> >
> > The problem: value of col_3 changes for id=26643094740 if I query two
> > rows vs. one row. This is without changing any data. The problem is 100%
> > repeatable, if I query two rows from the same view, then I get different
> > data for one of the rows.
> >
> > I suspect that this is a bug. But I might be wrong. Please help me!
>
> I suspect an index problem. Have you tried reindexing the source table,
> kap.course if I am following correctly.
>
> Have there been any issues with the database lately, e.g. crash or other
> significant event?
>
> >
> > The actual test view looks like this:
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: possible bug

2022-10-21 Thread Les
One of my colleagues pointed out, that they query returns a different
result, if I cast the head_table_id condition to float8 and then back to
int8.

SELECT
c.id,
tt.code,
c.regno,
(
select count(*)
FROM kap.course_user cu
JOIN wf.workflow w_1 ON w_1.rec_id = cu.id AND
w_1.head_table_id::float8::int8 = 25408438504
where cu.is_active AND cu.course_id = c.id AND w_1.station_id =
25406740434
) AS col_3
FROM
kap.course c
INNER JOIN kap.training_type tt ON tt.id = c.training_type_id
where c.id in (26437091668, 26643094740)

Returns:

id |code|regno|col_3|
---++-+-+
26437091668|TA-T| 2632|1|
26643094740|PEG | 2905|0|

Although all identifier columns are defined as int8.

Les  ezt írta (időpont: 2022. okt. 21., P, 19:50):

> Hello,
>
> We had a support request today, and we have narrowed down the problem to a
> query that behaves very strangely. The actual query was much more
> complicated, but I came up with this minimal example.
>
> This is what we have seen inside our application:
>
> select * from test where id in (26643094740, 26437091668);
>
> id |code|regno|col_3|
> ---++-+-+
> 26437091668|TA-T| 2632|1|
> 26643094740|PEG | 2905|1|
>
> select * from test where id = 26643094740;
>
> id |code|regno|col_3|
> ---++-+-+
> 26643094740|PEG | 2905|0|
>
> The problem: value of col_3 changes for id=26643094740 if I query two rows
> vs. one row. This is without changing any data. The problem is 100%
> repeatable, if I query two rows from the same view, then I get different
> data for one of the rows.
>
> I suspect that this is a bug. But I might be wrong. Please help me!
>
> The actual test view looks like this:
>
> create view test as
> SELECT
> c.id,
> tt.code,
> c.regno,
> (
> select count(*)
> FROM kap.course_user cu
> JOIN wf.workflow w_1 ON w_1.rec_id = cu.id AND w_1.head_table_id
> = 25408438504
> where cu.is_active AND cu.course_id = c.id AND w_1.station_id =
> 25406740434
> ) AS col_3
> FROM
> kap.course c
> INNER JOIN kap.training_type tt ON tt.id = c.training_type_id;
>
> Below are some DDL s (simplified, the actual tables contain much more
> fields).
>
>
> Do you think that this might be a bug? If not, then can somebody please
> explain how this can happen?
>
>Laszlo
>
> explain analyze select * from test where id in (26643094740, 26437091668);
>
> QUERY PLAN
>
> |
>
> ---+
> Hash Join  (cost=16.08..124.99 rows=2 width=29) (actual time=0.067..0.107
> rows=2 loops=1)
>  |
>   Hash Cond: (tt.id = c.training_type_id)
>
>  |
>   ->  Seq Scan on training_type tt  (cost=0.00..12.71 rows=71 width=13)
> (actual time=0.004..0.033 rows=71 loops=1)
> |
>   ->  Hash  (cost=16.05..16.05 rows=2 width=24) (actual time=0.024..0.025
> rows=2 loops=1)
>  |
> Buckets: 1024  Batches: 1  Memory Usage: 9kB
>
> |
> ->  Bitmap Heap Scan on course c  (cost=8.58..16.05 rows=2
> width=24) (actual time=0.018..0.021 rows=2 loops=1)
> |
>   Recheck Cond: (id = ANY
> ('{26643094740,26437091668}'::bigint[]))
>   |
>   Heap Blocks: exact=2
>
> |
>   ->  Bitmap Index Scan on pk_course  (cost=0.00..8.58 rows=2
> width=0) (actual time=0.012..0.012 rows=2 loops=1)
>   |
> Index Cond: (id = ANY
> ('{26643094740,26437091668}'::bigint[]))
>   |
>   SubPlan 1
>
>  |
> ->  Aggregate  (cost=47.91..47.92 rows=1 width=8) (actual
> time=0.014..0.014 rows=1 loops=2)
>  |
>   ->  Nested Loop  (cost=0.59..47.90 rows=1 width=0) (actual
> time=0.010..0.012 rows=1 loops=2)
>   |
> ->  Index Scan using workflow_idx_station on workflow w_1
>  (cost=0.29..22.90 rows=3 width=8) (actual time=0.005..0.007 rows=1
> loops=2) |
>   Index Cond: (station_id = '25406740434'::bigint)
>
> |
>   Filter: (head_table_id = '25408438504'::bigint)
>
>  |
> ->  Memoize  (cost=0.30..8.32 rows=1 width=8) (actual
> time=0.004..0.004 rows=1 loops=2)
>  |
>   Cache Key: w_1.rec_id
>
>  |
>   Hits: 1  Misses: 1  Evictions: 0  Overflows: 0
>  Memory Usage: 1kB
>|
>   ->  Index Scan using pk_course_user on course_user
> cu  (cost=0.29..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=1
> loops=1)|
> Index Cond: (id = w_1.rec_id)
>
>  |
> Filter: (is_active AND (course_id = c.id))
>
>   |
> Planning Time: 0.527 ms
>
>  |
> Execution Time: 0.175 ms
>
> |
>
> explain analyze select * from test where id in (26643094740)
>
> QUERY PLAN
>
> |
>
> 

Re: possible bug

2022-10-21 Thread Adrian Klaver

On 10/21/22 10:50 AM, Les wrote:

Hello,

We had a support request today, and we have narrowed down the problem to 
a query that behaves very strangely. The actual query was much more 
complicated, but I came up with this minimal example.


This is what we have seen inside our application:

select * from test where id in (26643094740, 26437091668);

id         |code|regno|col_3|
---++-+-+
26437091668|TA-T| 2632|    1|
26643094740|PEG | 2905|    1|

select * from test where id = 26643094740;

id         |code|regno|col_3|
---++-+-+
26643094740|PEG | 2905|    0|

The problem: value of col_3 changes for id=26643094740 if I query two 
rows vs. one row. This is without changing any data. The problem is 100% 
repeatable, if I query two rows from the same view, then I get different 
data for one of the rows.


I suspect that this is a bug. But I might be wrong. Please help me!


I suspect an index problem. Have you tried reindexing the source table, 
kap.course if I am following correctly.


Have there been any issues with the database lately, e.g. crash or other 
significant event?




The actual test view looks like this:




--
Adrian Klaver
adrian.kla...@aklaver.com




possible bug

2022-10-21 Thread Les
Hello,

We had a support request today, and we have narrowed down the problem to a
query that behaves very strangely. The actual query was much more
complicated, but I came up with this minimal example.

This is what we have seen inside our application:

select * from test where id in (26643094740, 26437091668);

id |code|regno|col_3|
---++-+-+
26437091668|TA-T| 2632|1|
26643094740|PEG | 2905|1|

select * from test where id = 26643094740;

id |code|regno|col_3|
---++-+-+
26643094740|PEG | 2905|0|

The problem: value of col_3 changes for id=26643094740 if I query two rows
vs. one row. This is without changing any data. The problem is 100%
repeatable, if I query two rows from the same view, then I get different
data for one of the rows.

I suspect that this is a bug. But I might be wrong. Please help me!

The actual test view looks like this:

create view test as
SELECT
c.id,
tt.code,
c.regno,
(
select count(*)
FROM kap.course_user cu
JOIN wf.workflow w_1 ON w_1.rec_id = cu.id AND w_1.head_table_id =
25408438504
where cu.is_active AND cu.course_id = c.id AND w_1.station_id =
25406740434
) AS col_3
FROM
kap.course c
INNER JOIN kap.training_type tt ON tt.id = c.training_type_id;

Below are some DDL s (simplified, the actual tables contain much more
fields).


Do you think that this might be a bug? If not, then can somebody please
explain how this can happen?

   Laszlo

explain analyze select * from test where id in (26643094740, 26437091668);

QUERY PLAN

|
---+
Hash Join  (cost=16.08..124.99 rows=2 width=29) (actual time=0.067..0.107
rows=2 loops=1)
 |
  Hash Cond: (tt.id = c.training_type_id)
   |
  ->  Seq Scan on training_type tt  (cost=0.00..12.71 rows=71 width=13)
(actual time=0.004..0.033 rows=71 loops=1)
|
  ->  Hash  (cost=16.05..16.05 rows=2 width=24) (actual time=0.024..0.025
rows=2 loops=1)
 |
Buckets: 1024  Batches: 1  Memory Usage: 9kB

|
->  Bitmap Heap Scan on course c  (cost=8.58..16.05 rows=2
width=24) (actual time=0.018..0.021 rows=2 loops=1)
|
  Recheck Cond: (id = ANY
('{26643094740,26437091668}'::bigint[]))
  |
  Heap Blocks: exact=2

|
  ->  Bitmap Index Scan on pk_course  (cost=0.00..8.58 rows=2
width=0) (actual time=0.012..0.012 rows=2 loops=1)
  |
Index Cond: (id = ANY
('{26643094740,26437091668}'::bigint[]))
  |
  SubPlan 1
   |
->  Aggregate  (cost=47.91..47.92 rows=1 width=8) (actual
time=0.014..0.014 rows=1 loops=2)
 |
  ->  Nested Loop  (cost=0.59..47.90 rows=1 width=0) (actual
time=0.010..0.012 rows=1 loops=2)
  |
->  Index Scan using workflow_idx_station on workflow w_1
 (cost=0.29..22.90 rows=3 width=8) (actual time=0.005..0.007 rows=1
loops=2) |
  Index Cond: (station_id = '25406740434'::bigint)

|
  Filter: (head_table_id = '25408438504'::bigint)
   |
->  Memoize  (cost=0.30..8.32 rows=1 width=8) (actual
time=0.004..0.004 rows=1 loops=2)
 |
  Cache Key: w_1.rec_id
   |
  Hits: 1  Misses: 1  Evictions: 0  Overflows: 0
 Memory Usage: 1kB
   |
  ->  Index Scan using pk_course_user on course_user cu
 (cost=0.29..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=1)|
Index Cond: (id = w_1.rec_id)
   |
Filter: (is_active AND (course_id = c.id))

|
Planning Time: 0.527 ms
   |
Execution Time: 0.175 ms

|

explain analyze select * from test where id in (26643094740)

QUERY PLAN

|
---+
Nested Loop  (cost=0.42..64.60 rows=1 width=29) (actual time=0.033..0.035
rows=1 loops=1)
 |
  ->  Index Scan using pk_course on course c  (cost=0.28..8.30 rows=1
width=24) (actual time=0.007..0.008 rows=1 loops=1)
 |
Index Cond: (id = '26643094740'::bigint)

|
  ->  Index Scan using pk_training_type on training_type tt
 (cost=0.14..8.16 rows=1 width=13) (actual time=0.002..0.002 rows=1
loops=1)   |
Index Cond: (id = c.training_type_id)
   

Re: Possible bug: SQL function parameter in window frame definition

2019-09-30 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Hmm.  I think this is a reasonable direction to go in, but
>  Tom> what about groupingSets and rowMarks?

> groupingSets ultimately contains nothing but numbers which are
> meaningless without reference to the matching groupClause list. So
> anything that cares about those is really going to have to process them
> in its Query case in the walker function in order to get at both
> clauses.

Ah.  I was thinking there were SortGroupClauses under them, but that
was based on an overly hasty reading of the parsenodes.h comments.

No further complaints.

regards, tom lane




Re: Possible bug: SQL function parameter in window frame definition

2019-09-29 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Here is a draft patch along those lines; the intent of this one is
 >> that no existing walker or mutator should need to change (the change
 >> to the dependency code is basically cosmetic I believe, just avoids
 >> walking some things twice).

 Tom> Hmm.  I think this is a reasonable direction to go in, but
 Tom> what about groupingSets and rowMarks?

groupingSets ultimately contains nothing but numbers which are
meaningless without reference to the matching groupClause list. So
anything that cares about those is really going to have to process them
in its Query case in the walker function in order to get at both
clauses.

Similarly, rowMarks contains indexes into the rangetable (and no
recursive substructure at all), so it's likewise better processed at the
Query level.

 Tom> Also, in HEAD I'd be inclined to add assertions about utilityStmt
 Tom> being NULL.

Yup.

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-09-29 Thread Tom Lane
Andrew Gierth  writes:
> "Andrew" == Andrew Gierth  writes:
>  Andrew> We could minimize the chance of breakage in a back-patched fix
>  Andrew> by having query_tree_walker/mutator iterate the windowClause
>  Andrew> list itself

> Here is a draft patch along those lines; the intent of this one is that
> no existing walker or mutator should need to change (the change to the
> dependency code is basically cosmetic I believe, just avoids walking
> some things twice).

Hmm.  I think this is a reasonable direction to go in, but
what about groupingSets and rowMarks?

Also, in HEAD I'd be inclined to add assertions about utilityStmt
being NULL.

regards, tom lane




Re: Possible bug: SQL function parameter in window frame definition

2019-09-29 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> We could minimize the chance of breakage in a back-patched fix
 Andrew> by having query_tree_walker/mutator iterate the windowClause
 Andrew> list itself

Here is a draft patch along those lines; the intent of this one is that
no existing walker or mutator should need to change (the change to the
dependency code is basically cosmetic I believe, just avoids walking
some things twice).

Also added some tests.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index dd0a7d8dac..03582781f6 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2214,18 +2214,13 @@ find_expr_references_walker(Node *node,
 			   context->addrs);
 		}
 
-		/* query_tree_walker ignores ORDER BY etc, but we need those opers */
-		find_expr_references_walker((Node *) query->sortClause, context);
-		find_expr_references_walker((Node *) query->groupClause, context);
-		find_expr_references_walker((Node *) query->windowClause, context);
-		find_expr_references_walker((Node *) query->distinctClause, context);
-
 		/* Examine substructure of query */
 		context->rtables = lcons(query->rtable, context->rtables);
 		result = query_tree_walker(query,
    find_expr_references_walker,
    (void *) context,
-   QTW_IGNORE_JOINALIASES);
+   QTW_IGNORE_JOINALIASES |
+   QTW_EXAMINE_SORTGROUP);
 		context->rtables = list_delete_first(context->rtables);
 		return result;
 	}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 18bd5ac903..d063bee271 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2296,6 +2296,33 @@ query_tree_walker(Query *query,
 		return true;
 	if (walker(query->limitCount, context))
 		return true;
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		if (walker((Node *) query->groupClause, context))
+			return true;
+		if (walker((Node *) query->windowClause, context))
+			return true;
+		if (walker((Node *) query->sortClause, context))
+			return true;
+		if (walker((Node *) query->distinctClause, context))
+			return true;
+	}
+	else
+	{
+		/*
+		 * We need to walk the expressions in WindowClause nodes even if we're
+		 * not interested in SortGroupClause nodes.
+		 */
+		ListCell   *lc;
+		foreach(lc, query->windowClause)
+		{
+			WindowClause *wc = lfirst_node(WindowClause, lc);
+			if (walker(wc->startOffset, context))
+return true;
+			if (walker(wc->endOffset, context))
+return true;
+		}
+	}
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
 	{
 		if (walker((Node *) query->cteList, context))
@@ -3153,6 +3180,38 @@ query_tree_mutator(Query *query,
 	MUTATE(query->havingQual, query->havingQual, Node *);
 	MUTATE(query->limitOffset, query->limitOffset, Node *);
 	MUTATE(query->limitCount, query->limitCount, Node *);
+
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		MUTATE(query->groupClause, query->groupClause, List *);
+		MUTATE(query->windowClause, query->windowClause, List *);
+		MUTATE(query->sortClause, query->sortClause, List *);
+		MUTATE(query->distinctClause, query->distinctClause, List *);
+	}
+	else
+	{
+		/*
+		 * We need to mutate the expressions in WindowClause nodes even if
+		 * we're not interested in SortGroupClause nodes.
+		 */
+		List	   *resultlist;
+		ListCell   *temp;
+
+		resultlist = NIL;
+		foreach(temp, query->windowClause)
+		{
+			WindowClause *wc = lfirst_node(WindowClause, temp);
+			WindowClause *newnode;
+
+			FLATCOPY(newnode, wc, WindowClause);
+			MUTATE(newnode->startOffset, wc->startOffset, Node *);
+			MUTATE(newnode->endOffset, wc->endOffset, Node *);
+
+			resultlist = lappend(resultlist, (Node *) newnode);
+		}
+		query->windowClause = resultlist;
+	}
+
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
 		MUTATE(query->cteList, query->cteList, List *);
 	else		/* else copy CTE list as-is */
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 0cb931c82c..4b5408fa9b 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -27,6 +27,7 @@
 #define QTW_EXAMINE_RTES_AFTER		0x20	/* examine RTE nodes after their
 			 * contents */
 #define QTW_DONT_COPY_QUERY			0x40	/* do not copy top Query */
+#define QTW_EXAMINE_SORTGROUP		0x80	/* include SortGroupNode lists */
 
 /* callback function for check_functions_in_node */
 typedef bool (*check_function_callback) (Oid func_id, void *context);
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index edc93d5729..d5fd4045f9 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3821,3 +3821,45 @@ SELECT i, b, bool_and(b) OVER w, bool_or(b) OVER w
  5 | t | t| t
 (5 rows)
 
+-- Tests for problems with failure to walk or mutate expressions
+-- within window frame clauses.
+-- test walker (fails with collation error if expressions are not walked)
+SELECT array_ag

Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> However, we need to fix this in all active branches, and I
 Tom> definitely agree with minimizing the amount of change to back
 Tom> branches. The fact that the minimal change breaks (or exposes an
 Tom> oversight in) assign_collations_walker makes it very plausible
 Tom> that it will also break somebody's third-party code. If we push
 Tom> the API change further we increase the risk of breaking stuff.
 Tom> That seems OK in HEAD but not in back branches.

We could minimize the chance of breakage in a back-patched fix by having
query_tree_walker/mutator iterate the windowClause list itself and
invoke the walker only on offset expressions; is it worth it?

Walkers that follow the recommended code structure should be unaffected;
it only shows up in the collations walker because that treats
expressions as the "default" case and tries to explicitly handle all
non-expression nodes.

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Now probably this is never called on utility statements, and maybe
>  Tom> there is never a reason for anyone to examine or mutate
>  Tom> SortGroupClauses, GroupingSets, or RowMarkClauses, but I'm not
>  Tom> sure it's any business of this module to assume that.

> I think the logic that query_tree_walker is specifically there to walk
> places that might contain _expressions_ is reasonably valid. That said,
> the fact that we do have one caller that finds it necessary to
> explicitly walk some of the places that query_tree_walker omits suggests
> that this decision may have been a mistake.

I'm okay with assuming that these functions aren't used on utility
statements (but maybe we should add Assert(query->utilityStmt == NULL)?).
I'm a bit uncomfortable with skipping the other lists.  Admittedly,
there's probably not huge value in examining SortGroupClauses in a
vacuum (that is, without knowing which list they appear in).  The only
application I can think of offhand is extracting dependencies, which
is already covered by that one caller you mention.

However, we need to fix this in all active branches, and I definitely
agree with minimizing the amount of change to back branches.
The fact that the minimal change breaks (or exposes an oversight in)
assign_collations_walker makes it very plausible that it will also
break somebody's third-party code.  If we push the API change further
we increase the risk of breaking stuff.  That seems OK in HEAD but
not in back branches.

regards, tom lane




Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> It looks to me that the reason is that query_tree_mutator
 Tom> (likewise query_tree_walker) fails to visit query->windowClause,

I noticed this too. I spent some time looking at what might break if
that was changed (found two places so far, see attached draft patch).

 Tom> which is a bug of the first magnitude if we allow those to contain
 Tom> expressions. Not sure how we've missed that up to now.

I suspect because the partition/order by expressions are actually in the
targetlist instead (with only SortGroupClause nodes in the
windowClause), so only window framing expressions are being missed.

 Tom> Looking at struct Query, it seems like that's not the only
 Tom> questionable omission. We're also not descending into

 Tom> Node   *utilityStmt;/* non-null if commandType == CMD_UTILITY 
*/

I assume that utility statements are doing any necessary expression
processing themselves...

 Tom> List   *groupClause;/* a list of SortGroupClause's */

There's at least one place that walks this (and the distinct and sort
clauses) explicitly (find_expr_references_walker) but most places just
aren't interested in SortGroupClause nodes given that the actual
expressions are elsewhere.

 Tom> List   *groupingSets;   /* a list of GroupingSet's if present */

Likewise, GroupingSet nodes are not any form of expression, they only
reference the groupClause entries. 

 Tom> List   *distinctClause; /* a list of SortGroupClause's */
 Tom> List   *sortClause; /* a list of SortGroupClause's */

Same goes as for groupClause.

 Tom> List   *rowMarks;   /* a list of RowMarkClause's */

 Tom> Now probably this is never called on utility statements, and maybe
 Tom> there is never a reason for anyone to examine or mutate
 Tom> SortGroupClauses, GroupingSets, or RowMarkClauses, but I'm not
 Tom> sure it's any business of this module to assume that.

I think the logic that query_tree_walker is specifically there to walk
places that might contain _expressions_ is reasonably valid. That said,
the fact that we do have one caller that finds it necessary to
explicitly walk some of the places that query_tree_walker omits suggests
that this decision may have been a mistake.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index dd0a7d8dac..2862c47314 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2217,7 +2217,6 @@ find_expr_references_walker(Node *node,
 		/* query_tree_walker ignores ORDER BY etc, but we need those opers */
 		find_expr_references_walker((Node *) query->sortClause, context);
 		find_expr_references_walker((Node *) query->groupClause, context);
-		find_expr_references_walker((Node *) query->windowClause, context);
 		find_expr_references_walker((Node *) query->distinctClause, context);
 
 		/* Examine substructure of query */
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 18bd5ac903..7f485ae29a 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2292,6 +2292,8 @@ query_tree_walker(Query *query,
 		return true;
 	if (walker(query->havingQual, context))
 		return true;
+	if (walker(query->windowClause, context))
+		return true;
 	if (walker(query->limitOffset, context))
 		return true;
 	if (walker(query->limitCount, context))
@@ -3151,6 +3153,7 @@ query_tree_mutator(Query *query,
 	MUTATE(query->jointree, query->jointree, FromExpr *);
 	MUTATE(query->setOperations, query->setOperations, Node *);
 	MUTATE(query->havingQual, query->havingQual, Node *);
+	MUTATE(query->windowClause, query->windowClause, List *);
 	MUTATE(query->limitOffset, query->limitOffset, Node *);
 	MUTATE(query->limitCount, query->limitCount, Node *);
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index 31a5f702a9..dabd904999 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -485,6 +485,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 		case T_FromExpr:
 		case T_OnConflictExpr:
 		case T_SortGroupClause:
+		case T_WindowClause:
 			(void) expression_tree_walker(node,
 		  assign_collations_walker,
 		  (void *) &loccontext);


Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Tom Lane
Andrew Gierth  writes:
> "Alastair" == Alastair McKinley  writes:
>  Alastair> This appears to be a bug to me.

> Yes, it's a bug, related to function inlining (the select f(3); is not
> inlined and therefore works, but the select * from f(3); is being
> inlined, but the original Param is somehow making it into the final plan
> rather than being substituted with its value). Looking into why.

It looks to me that the reason is that query_tree_mutator (likewise
query_tree_walker) fails to visit query->windowClause, which is a
bug of the first magnitude if we allow those to contain expressions.
Not sure how we've missed that up to now.

Looking at struct Query, it seems like that's not the only questionable
omission.  We're also not descending into

Node   *utilityStmt;/* non-null if commandType == CMD_UTILITY */
List   *groupClause;/* a list of SortGroupClause's */
List   *groupingSets;   /* a list of GroupingSet's if present */
List   *distinctClause; /* a list of SortGroupClause's */
List   *sortClause; /* a list of SortGroupClause's */
List   *rowMarks;   /* a list of RowMarkClause's */

Now probably this is never called on utility statements, and maybe
there is never a reason for anyone to examine or mutate SortGroupClauses,
GroupingSets, or RowMarkClauses, but I'm not sure it's any business of
this module to assume that.

regards, tom lane




Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Andrew Gierth
> "Alastair" == Alastair McKinley  writes:

 Alastair> Hi all,

 Alastair> I noticed this strange behaviour whilst trying to write a
 Alastair> function for Postgres 11.5 (PostgreSQL 11.5 on
 Alastair> x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623
 Alastair> (Red Hat 4.8.5-36), 64-bit) and reduced it to this minimal
 Alastair> example. Using a function parameter in the window frame
 Alastair> definition seems to be the cause of the error.

 [...]

 Alastair> This appears to be a bug to me.

Yes, it's a bug, related to function inlining (the select f(3); is not
inlined and therefore works, but the select * from f(3); is being
inlined, but the original Param is somehow making it into the final plan
rather than being substituted with its value). Looking into why.

-- 
Andrew (irc:RhodiumToad)




Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Alastair McKinley
Hi all,

I noticed this strange behaviour whilst trying to write a function for Postgres 
11.5 (PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-36), 64-bit) and reduced it to this minimal example.  
Using a function parameter in the window frame definition seems to be the cause 
of the error.

create or replace function f(group_size bigint) returns setof int[] as
$$
select array_agg(s) over w
from generate_series(1,10) s
window w as (order by s rows between current row and group_size 
following)
$$ language sql immutable;

Calling the function without a column list succeeds:

postgres=# select f(3);
f

{1,2,3,4}
{2,3,4,5}
{3,4,5,6}
{4,5,6,7}
{5,6,7,8}
{6,7,8,9}
{7,8,9,10}
{8,9,10}
{9,10}
{10}
(10 rows)

Calling the function with select * fails:

postgres=# select * from f(3);
ERROR:  42704: no value found for parameter 1
LOCATION:  ExecEvalParamExtern, execExprInterp.c:2296

Using a plpgsql function with a stringified query works, which is my current 
workaround:

create or replace function f1(group_size bigint) returns setof int[] as
$$
begin
return query execute format($q$
select array_agg(s) over w as t
from generate_series(1,10) s
window w as (order by s rows between current row and %1$s following)
$q$,group_size);
end;
$$ language plpgsql immutable;

This appears to be a bug to me.  If confirmed that this is not some expected 
behaviour unknown to me I will report this.

Alastair









Re: GiST index on INT8, possible bug in query planner?

2018-12-03 Thread Jan Behrens
On Mon, 03 Dec 2018 11:47:17 -0500
Tom Lane  wrote:

> Jan Behrens  writes:
> 
> > However, the GiST index seems not to work as expected by me when
> > 64-bit integers are involved. I tried to create a minimal
> > proof-of-concept to demonstrate this. Consider the following setup:
> > 
> > CREATE TABLE test8_gist (id SERIAL4, ctx INT8);
> > CREATE INDEX ON test8_gist USING gist (ctx, id);
> > EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
> > -- uses Index Cond: (id = 2)
> > 
> > The query planning for the select on table "test8_gist" does not
> > include "ctx" in the "Index Cond".
> 
> Probably it would if you'd written "WHERE ctx = 1::int8".  Without
> the cast, what you'll have is "int8 = int4", and I suspect that
> btree_gist doesn't include cross-type operators in its opclasses.
> 
>   regards, tom lane

You are right! I just tested it and ::int8 does the job.

It might be good to add a short notice or warning in the documentation
at: https://www.postgresql.org/docs/current/btree-gist.html

It might help other people who run into the same problem.


Thanks for helping me,
Jan Behrens

-- 
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany

www.public-software-group.org
vorst...@public-software-group.org

eingetragen in das Vereinregister
des Amtsgerichtes Charlottenburg
Registernummer: VR 28873 B

Vorstände (einzelvertretungsberechtigt):
Jan Behrens
Axel Kistner
Andreas Nitsche
Björn Swierczek



Re: GiST index on INT8, possible bug in query planner?

2018-12-03 Thread Tom Lane
Jan Behrens  writes:
> However, the GiST index seems not to work as expected by me when
> 64-bit integers are involved. I tried to create a minimal
> proof-of-concept to demonstrate this. Consider the following setup:
> CREATE TABLE test8_gist (id SERIAL4, ctx INT8);
> CREATE INDEX ON test8_gist USING gist (ctx, id);
> EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
> -- uses Index Cond: (id = 2)
> The query planning for the select on table "test8_gist" does not
> include "ctx" in the "Index Cond".

Probably it would if you'd written "WHERE ctx = 1::int8".  Without
the cast, what you'll have is "int8 = int4", and I suspect that
btree_gist doesn't include cross-type operators in its opclasses.

regards, tom lane



GiST index on INT8, possible bug in query planner?

2018-12-03 Thread Jan Behrens
Dear colleagues,

I have developed two indices using PostgreSQL's awesome GiST support,
one of them available here:

http://www.public-software-group.org/pgLatLon

(which is a lightweight and MIT-licensed alternative to PostGIS for
certain simple tasks involving geographic coordinates on the WGS-84
spheroid)


Recently I had the requirement of creating a multi-column index on
an integer in the first column and a custom data type in the second
column of the index. Since integers are not supported by GiST indices
by default, I used the btree_gist extension by Teodor Sigaev,
Oleg Bartunov, Janko Richter, and Paul Jungwirth, see:
https://www.postgresql.org/docs/10/btree-gist.html

However, the GiST index seems not to work as expected by me when
64-bit integers are involved. I tried to create a minimal
proof-of-concept to demonstrate this. Consider the following setup:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (id SERIAL4, ctx INT4);
CREATE TABLE test8_btree (id SERIAL4, ctx INT8);
CREATE TABLE test4_gist (id SERIAL4, ctx INT4);
CREATE TABLE test8_gist (id SERIAL4, ctx INT8);

I create multi-column indices on all four tables, with "ctx" as primary
and "id" as secondary column:

CREATE INDEX ON test4_btree (ctx, id);
CREATE INDEX ON test8_btree (ctx, id);
CREATE INDEX ON test4_gist USING gist (ctx, id);
CREATE INDEX ON test8_gist USING gist (ctx, id);

Now we add some data:

INSERT INTO test4_btree (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);
INSERT INTO test8_btree (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);
INSERT INTO test4_gist (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);
INSERT INTO test8_gist (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);

Only the tables directly using the B-tree index ("test4_btree" and
"test8_btree") and the table where "ctx" is 32-bit wide seem to work
properly:

EXPLAIN SELECT * FROM test4_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test4_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: (id = 2)

The query planning for the select on table "test8_gist" does not
include "ctx" in the "Index Cond".


To verify that the above problem isn't just an optimization because of
a low row count, I created a larger example with different values:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (ctx INT4, src INT4);
CREATE TABLE test8_btree (ctx INT8, src INT4);
CREATE TABLE test4_gist (ctx INT4, src INT4);
CREATE TABLE test8_gist (ctx INT8, src INT4);

CREATE INDEX ON test4_btree (ctx, src);
CREATE INDEX ON test8_btree (ctx, src);
CREATE INDEX ON test4_gist USING gist (ctx, src);
CREATE INDEX ON test8_gist USING gist (ctx, src);

INSERT INTO test4_btree SELECT floor(random()*1)+1, floor(random()*2)+1 
FROM generate_series(1, 100);
INSERT INTO test8_btree SELECT floor(random()*1)+1, floor(random()*2)+1 
FROM generate_series(1, 100);
INSERT INTO test4_gist SELECT floor(random()*1)+1, floor(random()*2)+1 FROM 
generate_series(1, 100);
INSERT INTO test8_gist SELECT floor(random()*1)+1, floor(random()*2)+1 FROM 
generate_series(1, 100);

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: (src = 2)

ANALYZE;

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- does not use Index Cond at all, but Filter: ((ctx = 1) AND (src = 2))

SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;  -- slow!

The query on "test8_gist" is significantly slower than in all other
three cases.


I wonder if this is a bug in the query planner, in the GiST facilities
of PostgreSQL, a problem of the "btree_gist" extension, or something
else? Can anyone help me?


Kind regards,
Jan Behrens


-- 
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany

www.public-software-group.o

Re: Possible bug: could not open relation with OID [numbers] SQL State: XX000

2018-01-19 Thread pinker
I would like to refresh the topic and add another report about the issue that
just happened to me.I'm sure it's the toast table that cannot be opened
inside the function.I have added following RAISE NOTICE clauses to it and
run analyze inside of the function:  
analyze verbose temp_table; 
raise notice 'oid temp_table %', ( SELECT array_agg(relname::TEXT||
relfilenode::TEXT|| 'relpages:'||relpages::TEXT|| 'reltuples:' ||
reltuples::TEXT|| 'relallvisible:' ||relallvisible::TEXT||'reltoastrelid:'||
reltoastrelid::TEXT) FROM pg_class where relname= 'temp_table');
raise notice 'rel size %', (select
pg_total_relation_size('temp_table'));
It's pointing to the toast table:
1 live rows and 1 dead rows; 1 rows in sample, 1 estimated total
rowspsql:/tmp/gg:23: NOTICE:  oid temp_table
{temp_table106538relpages:1reltuples:1relallvisible:0reltoastrelid:*106541*}psql:/tmp/gg:23:
NOTICE:  rel size 32768psql:/tmp/gg:23: ERROR:  could not open relation with
OID *106541*
Thank you for the advice about ON COMMIT DROP - it's working.When the table
size is smaller, about 16k this issue simply disappears.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html