Re: [GENERAL] Incremental refresh - Materialized view

2019-01-02 Thread Nguyễn Trần Quốc Vinh
>
> Dear all,
>
> We have some results on incremental updates for MVs. We generate triggers in
> C to do the incremental maintenance. We posted the code to GitHub about a
> year ago, but unfortunately I posted an incorrect version of the ctrigger.h
> header. The mistake came to light when someone could not compile the
> generated triggers and reported it to me. I have now re-posted the correct
> ctrigger.h file.
>
> You can find the code of the generator here:
> https://github.com/ntqvinh/PgMvIncrementalUpdate/commits/master. You can
> find how we did it here:
> https://link.springer.com/article/10.1134/S0361768816050066. The paper is
> about generating the code in PL/pgSQL, but I think it is still useful for
> reading the code. I am not sure whether I may share the paper, so I have not
> published it anywhere else. The text about how to generate the triggers in C
> was published open-access but, unfortunately, it is in Vietnamese.
>
> We hope the code is useful to someone.
>
> Thank you and best regards,
>
> NTQ Vinh
>
-- 
TS. Nguyễn Trần Quốc Vinh
---
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng

Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn ;
http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh 
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98
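
For readers who have not seen the technique, here is a minimal hand-written
sketch of trigger-based incremental maintenance of a summary table. It is not
the generated code from the project above, and all table, column and function
names are purely illustrative.

CREATE TABLE orders (
    id       serial PRIMARY KEY,
    customer integer NOT NULL,
    amount   numeric NOT NULL
);

-- The "materialized" aggregate, kept up to date by the trigger below.
CREATE TABLE orders_by_customer (
    customer integer PRIMARY KEY,
    total    numeric NOT NULL
);

CREATE FUNCTION orders_by_customer_maintain() RETURNS trigger AS $$
BEGIN
    -- Remove the old row's contribution on DELETE and UPDATE.
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        UPDATE orders_by_customer
           SET total = total - OLD.amount
         WHERE customer = OLD.customer;
    END IF;
    -- Add the new row's contribution on INSERT and UPDATE.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO orders_by_customer (customer, total)
        VALUES (NEW.customer, NEW.amount)
        ON CONFLICT (customer) DO UPDATE
            SET total = orders_by_customer.total + EXCLUDED.total;
    END IF;
    RETURN NULL;   -- AFTER trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_by_customer_trg
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_by_customer_maintain();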


Re: Thoughts on row-level security for webapps?

2019-01-02 Thread Achilleas Mantzios

On 31/12/18 6:57 μ.μ., Siegfried Bilstein wrote:

Hi all,

I'm evaluating a tool called PostGraphile that generates a GraphQL server from a Postgres setup. The recommended way of handling security is to implement RLS within Postgres and simply have
the webserver take a cookie or similar and determine which user is querying the data.


I've normally built webapps like this: pull out the user id from a session cookie ->
the API endpoint verifies the user and whether they have access to the given data
-> app code mutates the data.

With PostGraphile the request specifies the mutation, and the server processes
the request and relies on Postgres to determine whether the user has the correct
access rights.

It seems like I would need to create a ROLE for every single member that signs
up for my website, which I'm a little concerned about.


Why?
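
One common way to avoid a role per member is a single application role plus a
per-request setting that the RLS policies read. A minimal sketch, with purely
illustrative names (app_user, app.user_id, todo):

CREATE ROLE app_user LOGIN;

CREATE TABLE todo (
    id       serial PRIMARY KEY,
    owner_id integer NOT NULL,
    body     text
);

ALTER TABLE todo ENABLE ROW LEVEL SECURITY;
GRANT SELECT, INSERT, UPDATE, DELETE ON todo TO app_user;
GRANT USAGE ON SEQUENCE todo_id_seq TO app_user;

-- Rows are visible (and writable) only for the user id set for the request.
CREATE POLICY todo_owner ON todo
    USING (owner_id = current_setting('app.user_id', true)::integer);

-- Per request, inside a transaction, the web layer runs e.g.:
--   SET LOCAL app.user_id = '42';

The web layer sets app.user_id from the verified session/cookie before issuing
the request's queries, so no per-member database role is required.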


Is this a common usage pattern for SQL security? Any gotchas relying on RLS?

--
Siggy Bilstein
CTO of Ayuda Care
Book some time with me!



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Mark
Hi Jeff,

Thanks for your help. That is exactly what is happening.

I have a long running job which deletes all of the common_student table and
then repopulates it. It takes a long time to load all the other data and
commit the transaction. I didn't think the delete inside the transaction
would have any effect until it was committed or rolled back.

I will have to rewrite the application so it updates the existing rows
rather than deleting all and then inserting.

Thanks again for helping me understand what's happening here.

Proof:

db=> explain analyze select * from common_student where school_id = 36;
  QUERY PLAN

--
 Bitmap Heap Scan on common_student  (cost=88.30..3846.49 rows=1533
width=384) (actual time=4.852..7.065 rows=1388 loops=1)
   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
 Index Cond: (school_id = 36)
 Planning time: 0.097 ms
 Execution time: 8.084 ms
(7 rows)

db=> /* At this point I have started a long running transaction that
deletes all of common_student for school_id 36  */ ;

db=> analyse verbose common_student(school_id);
INFO:  analyzing "public.common_student"
INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431 live
rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
 QUERY
PLAN

 Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388
loops=1)
   Index Cond: (school_id = 36)
 Planning time: 0.098 ms
 Execution time: 2.583 ms
(4 rows)

db=> /* At this point I have killed the long running transaction that
deletes all of common_student for school_id 36  */ ;
db=> vacuum analyze common_student;
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
  QUERY PLAN

--
 Bitmap Heap Scan on common_student  (cost=79.17..3357.79 rows=1388
width=383) (actual time=0.088..1.302 rows=1388 loops=1)
   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..78.83
rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
 Index Cond: (school_id = 36)
 Planning time: 0.327 ms
 Execution time: 2.311 ms
(7 rows)


On Sun, 23 Dec 2018 at 02:57 Jeff Janes  wrote:

>
>> - Does the analyse output below mean that it only scanned 51538 of 65463
>> rows in the table? Is school_id 36 just being missed in the sample? (This
>> happens when the analyse is repeated )
>>
>
> Is there a transaction which had deleted all of school_id=36, and then was
> just left open indefinitely without either committing or rolling back?
>
> That would explain it, and I don't know of anything else that could.  The
> deleted but not committed tuples are still live, but don't get sampled.
>
> Cheers,
>
> Jeff
>
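
As an aside, a quick way to spot such a lingering open transaction is to query
pg_stat_activity (a generic sketch, not taken from the thread):

-- Sessions ordered by how long their current transaction has been open;
-- an "idle in transaction" session with a large xact_age is the usual culprit.
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_age DESC;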


Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Mark
Hi Maxim,

Thanks for your help.

Jeff has pointed me in the right direction here: it seems that the rows=1
is due to a long running transaction which deletes all of common_student
for school_id 36 and then repopulates it.

I was unaware that the delete inside the transaction would affect the
VACUUM ANALYSE. As I said to Jeff, I will have to rewrite the application to
update the rows if this is the normal behaviour.

Here are the queries:


db=> explain analyze select * from common_student where school_id = 36;
  QUERY PLAN

--
 Bitmap Heap Scan on common_student  (cost=88.30..3846.49 rows=1533
width=384) (actual time=4.852..7.065 rows=1388 loops=1)
   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
 Index Cond: (school_id = 36)
 Planning time: 0.097 ms
 Execution time: 8.084 ms
(7 rows)


db=> /* At this point I have started a long running transaction that
deletes all of common_student for school_id 36  */ ;

db=> analyse verbose common_student(school_id);
INFO:  analyzing "public.common_student"
INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431 live
rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
 QUERY
PLAN

 Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388
loops=1)
   Index Cond: (school_id = 36)
 Planning time: 0.098 ms
 Execution time: 2.583 ms
(4 rows)

db=> create table test_table AS SELECT * from common_student;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
   QUERY PLAN

-
 Seq Scan on test_table  (cost=0.00..3848.89 rows=1381 width=384) (actual
time=10.105..22.680 rows=1388 loops=1)
   Filter: (school_id = 36)
   Rows Removed by Filter: 64043
 Planning time: 0.390 ms
 Execution time: 23.767 ms
(5 rows)

db=> drop table test_table;
DROP TABLE
db=> create table test_table AS SELECT * from common_student ORDER BY
school_id;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
   QUERY PLAN


 Seq Scan on test_table  (cost=0.00..3850.89 rows=1341 width=382) (actual
time=5.674..27.585 rows=1388 loops=1)
   Filter: (school_id = 36)
   Rows Removed by Filter: 64043
 Planning time: 0.264 ms
 Execution time: 28.643 ms
(5 rows)

db=> explain analyze select * from test_table where school_id = 36;
   QUERY PLAN

-
 Seq Scan on test_table  (cost=0.00..3850.89 rows=1341 width=382) (actual
time=20.848..43.272 rows=1388 loops=1)
   Filter: (school_id = 36)
   Rows Removed by Filter: 64043
 Planning time: 0.068 ms
 Execution time: 44.423 ms
(5 rows)

db=> /* At this point I have killed the long running transaction that
deletes all of common_student for school_id 36  */ ;
db=> vacuum analyze common_student;
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
  QUERY PLAN

--
 Bitmap Heap Scan on common_student  (cost=79.17..3357.79 rows=1388
width=383) (actual time=0.088..1.302 rows=1388 loops=1)
   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..78.83
rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
 Index Cond: (school_id = 36)
 Planning time: 0.327 ms
 Execution time: 2.311 ms
(7 rows)

On Sun, 23 Dec 2018 at 15:28 Maxim Boguk  wrote:

> Hi Mark,
>
> It looks very weird.
> Can you try something like this (first check that you have enough disk space
> for a second copy of common_student):
>
> create table test_table AS SELECT * from common_student;
> Vacuum analyze test_table;
> explain analyze select * from test_table where school_id = 36;
> drop table test

Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Ron
But Jeff said "left open indefinitely without either committing or rolling 
back".  Your process is committing the transaction.


On 1/2/19 6:15 AM, Mark wrote:

Hi Jeff,

Thanks for your help. That is exactly what is happening.

I have a long running job which deletes all of the common_student table
and then repopulates it. It takes a long time to load all the other data and
commit the transaction. I didn't think the delete inside the transaction
would have any effect until it was committed or rolled back.


I will have to rewrite the application so it updates the existing rows 
rather than deleting all and then inserting.


Thanks again for helping me understand what's happening here.

Proof:

db=> explain analyze select * from common_student where school_id = 36;
    QUERY PLAN
--
 Bitmap Heap Scan on common_student  (cost=88.30..3846.49 rows=1533 
width=384) (actual time=4.852..7.065 rows=1388 loops=1)

   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid (cost=0.00..87.91 
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)

         Index Cond: (school_id = 36)
 Planning time: 0.097 ms
 Execution time: 8.084 ms
(7 rows)

db=> /* At this point I have started a long running transaction that 
deletes all of common_student for school_id 36  */ ;


db=> analyse verbose common_student(school_id);
INFO:  analyzing "public.common_student"
INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431 live 
rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows

ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
       QUERY PLAN

 Index Scan using idx_common_student_sid on common_student  
(cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388 
loops=1)

   Index Cond: (school_id = 36)
 Planning time: 0.098 ms
 Execution time: 2.583 ms
(4 rows)

db=> /* At this point I have killed the long running transaction that 
deletes all of common_student for school_id 36  */ ;

db=> vacuum analyze common_student;
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
    QUERY PLAN
--
 Bitmap Heap Scan on common_student  (cost=79.17..3357.79 rows=1388 
width=383) (actual time=0.088..1.302 rows=1388 loops=1)

   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid (cost=0.00..78.83 
rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)

         Index Cond: (school_id = 36)
 Planning time: 0.327 ms
 Execution time: 2.311 ms
(7 rows)


On Sun, 23 Dec 2018 at 02:57 Jeff Janes > wrote:



- Does the analyse output below mean that it only scanned 51538 of
65463 rows in the table? Is school_id 36 just being missed in the
sample? (This happens when the analyse is repeated )


Is there a transaction which had deleted all of school_id=36, and then
was just left open indefinitely without either committing or rolling back?

That would explain it, and I don't know of anything else that could. 
The deleted but not committed tuples are still live, but don't get
sampled.

Cheers,

Jeff



--
Angular momentum makes the world go 'round.


Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Mark
Hi Ron,

Yes, my process will commit the transaction (or roll it back) eventually.
The problem is the window where one transaction has deleted all the rows
(before committing) and an ANALYSE has run.

The deleted rows won't make it into the sample even though the transaction
has NOT been committed.

During this time I will get bad row estimates on rows for that ID. You can
see this in the example below with two
database connections (labelled 1 and 2).

I would have expected the DELETE to have no effect until it was committed.

connection 1=> create table test_table(school_id integer, note text);
CREATE TABLE
connection 1=> insert into test_table(school_id, note) SELECT
g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT
g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT
g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> select * from test_table ;
 school_id |   note
---+--
 1 | 0e08cf3990a04f0e943584517c564d31
 2 | 96bf83ae5f8eb9342e8408b1ac25cb14
 1 | f8fd943012edfe42a03a421df660bc33
 2 | cd5d7ff0abca61f18857df9b21d234e0
 1 | 60d731f430cb68c7285ddbcd9186eaa0
 2 | 635e6c9cf305147ad8684213f0a9299c
(6 rows)

connection 1=> analyse verbose test_table ;
INFO:  analyzing "public.test_table"
INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
dead rows; 6 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2
;
 QUERY PLAN

-
 Seq Scan on test_table  (cost=0.00..1.07 rows=3 width=37) (actual
time=0.011..0.015 rows=3 loops=1)
   Filter: (school_id = 2)
   Rows Removed by Filter: 3
 Planning time: 0.164 ms
 Execution time: 0.043 ms
(5 rows)

connection 2=> BEGIN ;
BEGIN
connection 2=> delete from test_table where school_id = 2 ;
DELETE 3
connection 2=> /* This connection is now idle */


connection 1=> analyse verbose test_table ;
INFO:  analyzing "public.test_table"
INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
dead rows; 3 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2
;
 QUERY PLAN

-
 Seq Scan on test_table  (cost=0.00..1.07 rows=1 width=37) (actual
time=0.009..0.014 rows=3 loops=1)
   Filter: (school_id = 2)
   Rows Removed by Filter: 3
 Planning time: 0.095 ms
 Execution time: 0.039 ms
(5 rows)


On Wed, 2 Jan 2019 at 14:04 Ron  wrote:

> But Jeff said "left open indefinitely without either committing or rolling
> back".  Your process is committing the transaction.
>
>
> On 1/2/19 6:15 AM, Mark wrote:
>
> Hi Jeff,
>
> Thanks for your help. That is exactly what is happening.
>
> I have a long running job which deletes all of the common_student table
> and then repopulates it. It takes long time to load all the other data and
> commit the transaction. I didn't think the delete inside the transaction
> would have any effect until it is commited or rolled back.
>
> I will have to rewrite the application so it updates the existing rows
> rather than deleting all and then inserting.
>
> Thanks again for helping me understand what's happening here.
>
> Proof:
>
> db=> explain analyze select * from common_student where school_id = 36;
>   QUERY PLAN
>
>
> --
>  Bitmap Heap Scan on common_student  (cost=88.30..3846.49 rows=1533
> width=384) (actual time=4.852..7.065 rows=1388 loops=1)
>Recheck Cond: (school_id = 36)
>Heap Blocks: exact=67
>->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91
> rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
>  Index Cond: (school_id = 36)
>  Planning time: 0.097 ms
>  Execution time: 8.084 ms
> (7 rows)
>
> db=> /* At this point I have started a long running transaction that
> deletes all of common_student for school_id 36  */ ;
>
> db=> analyse verbose common_student(school_id);
> INFO:  analyzing "public.common_student"
> INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431 live
> rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
> ANALYZE
> db=> explain analyze select * from common_student where school_id = 36;
>  QUERY
> PLAN
>
> -

Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Ron

Try using SERIALIZABLE transactions instead of the default READ COMMITTED.

On 1/2/19 9:28 AM, Mark wrote:

Hi Ron,

Yes, my process will commit the transaction (or roll it back) eventually.
The problem is the window where one transaction has deleted all the rows
(before committing) and an ANALYSE has run.


The deleted rows won't make it into the sample even though the transaction
has NOT been committed.


During this time I will get bad row estimates on rows for that ID. You can 
see this in the example below with two

database connections (labelled 1 and 2).

I would have expected the DELETE to have no effect until it was committed.

connection 1=> create table test_table(school_id integer, note text);
CREATE TABLE
connection 1=> insert into test_table(school_id, note) SELECT g.id 
,md5(random()::text) FROM generate_series(1,2) as g(id);

INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT g.id 
,md5(random()::text) FROM generate_series(1,2) as g(id);

INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT g.id 
,md5(random()::text) FROM generate_series(1,2) as g(id);

INSERT 0 2
connection 1=> select * from test_table ;
 school_id |               note
---+--
         1 | 0e08cf3990a04f0e943584517c564d31
         2 | 96bf83ae5f8eb9342e8408b1ac25cb14
         1 | f8fd943012edfe42a03a421df660bc33
         2 | cd5d7ff0abca61f18857df9b21d234e0
         1 | 60d731f430cb68c7285ddbcd9186eaa0
         2 | 635e6c9cf305147ad8684213f0a9299c
(6 rows)

connection 1=> analyse verbose test_table ;
INFO:  analyzing "public.test_table"
INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0 
dead rows; 6 rows in sample, 6 estimated total rows

ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2 ;
                                             QUERY PLAN
-
 Seq Scan on test_table  (cost=0.00..1.07 rows=3 width=37) (actual 
time=0.011..0.015 rows=3 loops=1)

   Filter: (school_id = 2)
   Rows Removed by Filter: 3
 Planning time: 0.164 ms
 Execution time: 0.043 ms
(5 rows)

connection 2=> BEGIN ;
BEGIN
connection 2=> delete from test_table where school_id = 2 ;
DELETE 3
connection 2=> /* This connection is now idle */


connection 1=> analyse verbose test_table ;
INFO:  analyzing "public.test_table"
INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0 
dead rows; 3 rows in sample, 6 estimated total rows

ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2 ;
                                             QUERY PLAN
-
 Seq Scan on test_table  (cost=0.00..1.07 rows=1 width=37) (actual 
time=0.009..0.014 rows=3 loops=1)

   Filter: (school_id = 2)
   Rows Removed by Filter: 3
 Planning time: 0.095 ms
 Execution time: 0.039 ms
(5 rows)


On Wed, 2 Jan 2019 at 14:04 Ron > wrote:


But Jeff said "left open indefinitely without either committing or
rolling back".  Your process is committing the transaction.


On 1/2/19 6:15 AM, Mark wrote:

Hi Jeff,

Thanks for your help. That is exactly what is happening.

I have a long running job which deletes all of the common_student
table and then repopulates it. It takes long time to load all the
other data and commit the transaction. I didn't think the delete
inside the transaction would have any effect until it is commited or
rolled back.

I will have to rewrite the application so it updates the existing
rows rather than deleting all and then inserting.

Thanks again for helping me understand what's happening here.

Proof:

db=> explain analyze select * from common_student where school_id = 36;
              QUERY PLAN

--
 Bitmap Heap Scan on common_student (cost=88.30..3846.49 rows=1533
width=384) (actual time=4.852..7.065 rows=1388 loops=1)
   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
         Index Cond: (school_id = 36)
 Planning time: 0.097 ms
 Execution time: 8.084 ms
(7 rows)

db=> /* At this point I have started a long running transaction that
deletes all of common_student for school_id 36  */ ;

db=> analyse verbose common_student(school_id);
INFO:  analyzing "public.common_student"
INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431
live rows and 8060 dead rows; 56818 rows in sample, 65431 estimated
total rows
ANALYZE

Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Mark
Hi Ron,

I tried my test_table example below, swapping 'BEGIN' for:

=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

It made no difference to the behaviour.

On Wed, 2 Jan 2019 at 15:45 Ron  wrote:

> Try using SERIALIZABLE transactions instead of the default READ COMMITTED.
>
>
> On 1/2/19 9:28 AM, Mark wrote:
>
> Hi Ron,
>
> Yes, my process will commit the transaction (or roll it back) eventually.
> It's the window where
> one transaction has deleted all the rows (before committing) and an
> ANALYSE has ran.
>
> The deleted rows won't make it into the sample even though the transaction
> has NOT been commited.
>
> During this time I will get bad row estimates on rows for that ID. You can
> see this in the example below with two
> database connections (labelled 1 and 2).
>
> I would have expected the DELETE to have no effect until it was committed.
>
> connection 1=> create table test_table(school_id integer, note text);
> CREATE TABLE
> connection 1=> insert into test_table(school_id, note) SELECT 
> g.id,md5(random()::text)
> FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> insert into test_table(school_id, note) SELECT 
> g.id,md5(random()::text)
> FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> insert into test_table(school_id, note) SELECT 
> g.id,md5(random()::text)
> FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> select * from test_table ;
>  school_id |   note
> ---+--
>  1 | 0e08cf3990a04f0e943584517c564d31
>  2 | 96bf83ae5f8eb9342e8408b1ac25cb14
>  1 | f8fd943012edfe42a03a421df660bc33
>  2 | cd5d7ff0abca61f18857df9b21d234e0
>  1 | 60d731f430cb68c7285ddbcd9186eaa0
>  2 | 635e6c9cf305147ad8684213f0a9299c
> (6 rows)
>
> connection 1=> analyse verbose test_table ;
> INFO:  analyzing "public.test_table"
> INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
> dead rows; 6 rows in sample, 6 estimated total rows
> ANALYZE
> connection 1=> explain analyse select * from test_table where school_id =
> 2 ;
>  QUERY PLAN
>
>
> -
>  Seq Scan on test_table  (cost=0.00..1.07 rows=3 width=37) (actual
> time=0.011..0.015 rows=3 loops=1)
>Filter: (school_id = 2)
>Rows Removed by Filter: 3
>  Planning time: 0.164 ms
>  Execution time: 0.043 ms
> (5 rows)
>
> connection 2=> BEGIN ;
> BEGIN
> connection 2=> delete from test_table where school_id = 2 ;
> DELETE 3
> connection 2=> /* This connection is now idle */
>
>
> connection 1=> analyse verbose test_table ;
> INFO:  analyzing "public.test_table"
> INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
> dead rows; 3 rows in sample, 6 estimated total rows
> ANALYZE
> connection 1=> explain analyse select * from test_table where school_id =
> 2 ;
>  QUERY PLAN
>
>
> -
>  Seq Scan on test_table  (cost=0.00..1.07 rows=1 width=37) (actual
> time=0.009..0.014 rows=3 loops=1)
>Filter: (school_id = 2)
>Rows Removed by Filter: 3
>  Planning time: 0.095 ms
>  Execution time: 0.039 ms
> (5 rows)
>
>
> On Wed, 2 Jan 2019 at 14:04 Ron  wrote:
>
>> But Jeff said "left open indefinitely without either committing or
>> rolling back".  Your process is committing the transaction.
>>
>>
>> On 1/2/19 6:15 AM, Mark wrote:
>>
>> Hi Jeff,
>>
>> Thanks for your help. That is exactly what is happening.
>>
>> I have a long running job which deletes all of the common_student table
>> and then repopulates it. It takes long time to load all the other data and
>> commit the transaction. I didn't think the delete inside the transaction
>> would have any effect until it is commited or rolled back.
>>
>> I will have to rewrite the application so it updates the existing rows
>> rather than deleting all and then inserting.
>>
>> Thanks again for helping me understand what's happening here.
>>
>> Proof:
>>
>> db=> explain analyze select * from common_student where school_id = 36;
>>   QUERY PLAN
>>
>>
>> --
>>  Bitmap Heap Scan on common_student  (cost=88.30..3846.49 rows=1533
>> width=384) (actual time=4.852..7.065 rows=1388 loops=1)
>>Recheck Cond: (school_id = 36)
>>Heap Blocks: exact=67
>>->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91
>> rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
>>  Index Cond: (school_id = 36)
>>  Planning time: 0.097 ms
>>  Execution time: 8.084 ms
>> (7 rows)
>>
>> db=> /* At this point I have started a long running transaction tha

Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Tom Lane
Mark  writes:
> I have a long running job which deletes all of the common_student table and
> then repopulates it. It takes a long time to load all the other data and
> commit the transaction. I didn't think the delete inside the transaction
> would have any effect until it was committed or rolled back.
> I will have to rewrite the application so it updates the existing rows
> rather than deleting all and then inserting.

Hmm ... I'm not sure if that will actually make things better.  The root
of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:

 * We count delete-in-progress rows as still live, using
 * the same reasoning given above; but we don't bother to
 * include them in the sample.

The "reasoning given above" is a reference to what happens for
INSERT_IN_PROGRESS tuples:

 * Insert-in-progress rows are not counted.  We assume
 * that when the inserting transaction commits or aborts,
 * it will send a stats message to increment the proper
 * count.  This works right only if that transaction ends
 * after we finish analyzing the table; if things happen
 * in the other order, its stats update will be
 * overwritten by ours.  However, the error will be large
 * only if the other transaction runs long enough to
 * insert many tuples, so assuming it will finish after us
 * is the safer option.

Now the problem with this, from your perspective, is that *neither*
INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
ANALYZE's data sample.  So a long-running update transaction will
cause all the rows it changes to be excluded from the sample until
commit.  This seems like a bad thing, and it definitely means that
adjusting your app as you're contemplating won't help.

In order to handle the bulk-update scenario sanely, it seems like
we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
to be included.  And it looks like, for consistency with the row-counting
logic, the one that's included needs to be DELETE_IN_PROGRESS.  This
is a slightly annoying conclusion, because it means we're accumulating
stats that we know are likely to soon be obsolete, but I think sampling
INSERT_IN_PROGRESS tuples instead would lead to very strange results
in some cases.

In short, I think we want to do this to the DELETE_IN_PROGRESS logic:

                if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                    deadrows += 1;
                else
+               {
+                   sample_it = true;
                    liverows += 1;
+               }

with suitable adjustment of the adjacent comment.

Thoughts?

regards, tom lane



Implementing standard SQL's DOMAIN constraint

2019-01-02 Thread Rich Shepard

Happy New Year all,

My reading taught me that standard SQL has a DOMAIN constraint that enforces
the same set of valid values for a column common to multiple tables.
Example:

CREATE DOMAIN state_code AS char(2)
DEFAULT '??'
CONSTRAINT valid_state_code
CHECK (value IN ('AL', 'AK', 'AZ', ...));

This applies to all tables each having a column named state_code.

I see the value of this feature when multiple tables have start_date and
end_date columns with a constraint that ensures the start_date is <= the
end_date.

Reading the release 10 manual, I find many constraints and want to learn
which one implements this feature. A pointer would be appreciated.

Rich





Re: Implementing standard SQL's DOMAIN constraint

2019-01-02 Thread David G. Johnston
On Wednesday, January 2, 2019, Rich Shepard 
wrote:

>
> CREATE DOMAIN state_code AS char(2)
> DEFAULT '??'
> CONSTRAINT valid_state_code
> CHECK (value IN ('AL', 'AK', 'AZ', ...));
>
> This applies to all tables each having a column named state_code.
>

There is no magic name logic involved.  A domain is just a type with
inherent constraints that are user definable.  You make use of it like any
other type.

Create table tbl (
column_name state_code not null
)

Values stored in column_name are now of type state_code and constrained to
be one of the check constraint values.

David J.


Re: Implementing standard SQL's DOMAIN constraint

2019-01-02 Thread Rich Shepard

On Wed, 2 Jan 2019, David G. Johnston wrote:


There is no magic name logic involved. A domain is just a type with
inherent constraints that are user definable. You make use of it like any
other type.

Create table tbl (
column_name state_code not null
)

Values stored in column_name are now of type state_code and constrained to
be one of the check constraint values.


David,

  I'm not following you. I have two tables each with a column,
state_code char(2) NOT NULL.

  Do you mean that I need to write the column constraint for each table? If
not, I don't see from your response how to implement the multi-table
constraint for this column.

Regards,

Rich



Re: Implementing standard SQL's DOMAIN constraint

2019-01-02 Thread David G. Johnston
On Wednesday, January 2, 2019, Rich Shepard 
wrote:

> On Wed, 2 Jan 2019, David G. Johnston wrote:
>
> There is no magic name logic involved. A domain is just a type with
>> inherent constraints that are user definable. You make use of it like any
>> other type.
>>
>> Create table tbl (
>> column_name state_code not null
>> )
>>
>> Values stored in column_name are now of type state_code and constrained to
>> be one of the check constraint values.
>>
>
> David,
>
>   I'm not following you. I have two tables each with a column,
> state_code char(2) NOT NULL.
>
>   Do you mean that I need to write the column constraint for each table? If
> not, I don't see from your response how to implement the multi-table
> constraint for this column.
>
>
That is a char(2) column for which ‘??’ is a valid value.  The fact that it
is named state_code is immaterial; the domain that you created doesn’t get
used.  There is no magic linking just by virtue of using the same name.

Change char(2) to state_code if you wish to apply the domain on the column.

David J.


Re: Implementing standard SQL's DOMAIN constraint

2019-01-02 Thread Rich Shepard

On Wed, 2 Jan 2019, David G. Johnston wrote:


  I'm not following you. I have two tables each with a column,
state_code char(2) NOT NULL.



That is a char(2) column for which ‘??’ is a valid value.  The fact that it
is named state_code is immaterial; the domain that you created doesn’t get
used.  There is no magic linking just by virtue of using the same name.

Change char(2) to state_code if you wish to apply the domain on the column.


David,

  I think I'm now on your page. In the schema I change the column data type
to state_code, then I add the SQL code creating the domain at the top of the
.sql file. Yes?

Thanks,

Rich






Re: Implementing standard SQL's DOMAIN constraint

2019-01-02 Thread David G. Johnston
On Wednesday, January 2, 2019, Rich Shepard 
wrote:

> On Wed, 2 Jan 2019, David G. Johnston wrote:
>
>   I'm not following you. I have two tables each with a column,
>>> state_code char(2) NOT NULL.
>>>
>>
> That is a char(2) column for which ‘??’ is a valid value.  The fact that it
>> is named state_code is immaterial; the domain that you created doesn’t get
>> used.  There is no magic linking just by virtue of using the same name.
>>
>> Change char(2) to state_code if you wish to apply the domain on the
>> column.
>>
>
> David,
>
>   I think I'm now on your page. In the schema I change the column data type
> to state_code, then I add the SQL code creating the domain at the top of
> the
> .sql file. Yes?
>
>
You add the create domain command once before any objects that make use of
it.  If you only have one .sql file then at the top of it works.
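
For example, a minimal end-to-end sketch (the table names are illustrative, and
'??' is added to the CHECK list so that the DEFAULT itself passes the
constraint):

CREATE DOMAIN state_code AS char(2)
    DEFAULT '??'
    CONSTRAINT valid_state_code
        CHECK (value IN ('??', 'AL', 'AK', 'AZ'));   -- ... and so on

CREATE TABLE site (
    site_id    serial PRIMARY KEY,
    state_code state_code NOT NULL      -- the domain is the column's type
);

CREATE TABLE mailing_address (
    address_id serial PRIMARY KEY,
    state_code state_code NOT NULL      -- the same domain reused in another table
);

INSERT INTO site (state_code) VALUES ('AZ');   -- accepted
INSERT INTO site (state_code) VALUES ('ZZ');   -- rejected by valid_state_code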

David J.


Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Mark
Hi Tom,

Thanks for your reply.

Am I correct in my understanding that any row that has been modified (i.e.
UPDATE) is in state HEAPTUPLE_INSERT_IN_PROGRESS so it will not be included
in the sample?

I'm going to rework the application so there is less time between the
DELETE and the COMMIT so I will only see the problem if ANALYZE runs during
this smaller time window. Looks like your patch will help if this happens.

Then again, it also seems no one has had a problem with its current
behaviour (for 11 years!).

Thanks,

Mark

On Wed, 2 Jan 2019 at 16:11 Tom Lane  wrote:

> Mark  writes:
> > I have a long running job which deletes all of the common_student table
> and
> > then repopulates it. It takes long time to load all the other data and
> > commit the transaction. I didn't think the delete inside the transaction
> > would have any effect until it is commited or rolled back.
> > I will have to rewrite the application so it updates the existing rows
> > rather than deleting all and then inserting.
>
> Hmm ... I'm not sure if that will actually make things better.  The root
> of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:
>
>  * We count delete-in-progress rows as still live,
> using
>  * the same reasoning given above; but we don't bother
> to
>  * include them in the sample.
>
> The "reasoning given above" is a reference to what happens for
> INSERT_IN_PROGRESS tuples:
>
>  * Insert-in-progress rows are not counted.  We assume
>  * that when the inserting transaction commits or
> aborts,
>  * it will send a stats message to increment the proper
>  * count.  This works right only if that transaction
> ends
>  * after we finish analyzing the table; if things
> happen
>  * in the other order, its stats update will be
>  * overwritten by ours.  However, the error will be
> large
>  * only if the other transaction runs long enough to
>  * insert many tuples, so assuming it will finish
> after us
>  * is the safer option.
>
> Now the problem with this, from your perspective, is that *neither*
> INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
> ANALYZE's data sample.  So a long-running update transaction will
> cause all the rows it changes to be excluded from the sample until
> commit.  This seems like a bad thing, and it definitely means that
> adjusting your app as you're contemplating won't help.
>
> In order to handle the bulk-update scenario sanely, it seems like
> we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
> to be included.  And it looks like, for consistency with the row-counting
> logic, the one that's included needs to be DELETE_IN_PROGRESS.  This
> is a slightly annoying conclusion, because it means we're accumulating
> stats that we know are likely to soon be obsolete, but I think sampling
> INSERT_IN_PROGRESS tuples instead would lead to very strange results
> in some cases.
>
> In short, I think we want to do this to the DELETE_IN_PROGRESS logic:
>
> if
> (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
> deadrows += 1;
> else
> +   {
> +   sample_it = true;
> liverows += 1;
> +   }
>
> with suitable adjustment of the adjacent comment.
>
> Thoughts?
>
> regards, tom lane
>


Re: Implementing standard SQL's DOMAIN constraint [RESOLVED]

2019-01-02 Thread Rich Shepard

On Wed, 2 Jan 2019, David G. Johnston wrote:


You add the create domain command once before any objects that make use of
it.


David,

  This is the answer I sought: postgres supports the create domain command.
I did not see this in your first response.

Thanks very much,

Rich



Re: Implementing standard SQL's DOMAIN constraint [RESOLVED]

2019-01-02 Thread Ron

On 1/2/19 12:05 PM, Rich Shepard wrote:

On Wed, 2 Jan 2019, David G. Johnston wrote:


You add the create domain command once before any objects that make use of
it.


David,

  This is the answer I sought: postgres supports the create domain command.
I did not see this in your first response.


Note that a CHECK constraint with 50 items is a *Very Bad Idea*, since 
changing such a constraint is very painful.  Use a FK constraint instead.
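
For example, a sketch of the lookup-table approach (table names are
illustrative; '??' is kept as the "unknown" placeholder from the earlier domain
example):

CREATE TABLE state_codes (
    state_code char(2) PRIMARY KEY
);

INSERT INTO state_codes VALUES ('??'), ('AL'), ('AK'), ('AZ');   -- ... and so on

CREATE TABLE project (
    project_id serial PRIMARY KEY,
    state_code char(2) NOT NULL DEFAULT '??' REFERENCES state_codes
);

-- Adding or retiring a code is now a plain INSERT/DELETE on state_codes,
-- not an ALTER of a CHECK constraint on every table that uses it.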


--
Angular momentum makes the world go 'round.


Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Tom Lane
Mark  writes:
> Am I correct in my understanding that any row that has been modified (i.e.
> UPDATE) is in state HEAPTUPLE_INSERT_IN_PROGRESS so it will not be included
> in the sample?

An update will mark the existing tuple as delete-in-progress and then
insert a new tuple (row version) that's insert-in-progress.

A concurrent ANALYZE scan will definitely see the old tuple (modulo
sampling considerations) but it's timing-dependent which state it sees it
in --- it could still be "live" when we see it, or already
delete-in-progress.  ANALYZE might or might not see the new tuple at all,
depending on timing and where the new tuple gets placed.  So "count/sample
delete-in-progress but not insert-in-progress" seems like a good rule to
minimize the timing sensitivity of the results.  It's not completely
bulletproof, but I think it's better than what we're doing now.
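
To make that concrete, here is a sketch in the style of Mark's earlier
two-connection example (illustrative, not taken from the thread):

connection 2=> BEGIN ;
BEGIN
connection 2=> update test_table set note = note where school_id = 2 ;
UPDATE 3
connection 2=> /* transaction left open, connection now idle */

connection 1=> analyse verbose test_table ;

The old row versions are delete-in-progress and the new ones are
insert-in-progress, so neither makes it into the sample, and the estimate for
school_id = 2 is again likely to collapse to rows=1 until connection 2 commits
or rolls back.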

> I'm going to rework the application so there is less time between the
> DELETE and the COMMIT so I will only see the problem if ANALYZE runs during
> this smaller time window.

Yeah, that's about the best you can do from the application side.

regards, tom lane



RE: Relocatable Binaries (RPMs) : custom installation path for PostgreSQL

2019-01-02 Thread Kevin Brannen
From: chiru r 

> I have installed the community PostgreSQL RPMs, and they go into
> "/usr/pgsql-11/" by default.
> Please let us know how to get PostgreSQL 11 installed into the above custom
> paths using RPMs.


I've never tried it, but look at the "--relocate" option for the rpm command. 
The "--prefix" option might do it too, but I think "--relocate" is what you 
need.

HTH,
Kevin




Re: Implementing standard SQL's DOMAIN constraint [RESOLVED]

2019-01-02 Thread Rich Shepard

On Wed, 2 Jan 2019, Ron wrote:


Note that a CHECK constraint with 50 items is a *Very Bad Idea*, since
changing such a constraint is very painful. Use a FK constraint instead.


Ron,

  It's even longer with Canadian provinces included. I gratefully accept
your advice and will use a table and foreign key instead.

Regards,

Rich




Default for date field: today vs CURRENT_DATE

2019-01-02 Thread Rich Shepard

  Reading the manual I saw that 'today' is a special value, but it did not
work when I used it as a column default; e.g.,
start_date date DEFAULT today,

  Appending parentheses also failed, but changing today to CURRENT_DATE
worked. I've not found an explanation and would appreciate learning why
'today' fails.

TIA,

Rich



Re: Default for date field: today vs CURRENT_DATE

2019-01-02 Thread Tom Lane
Rich Shepard  writes:
>Reading the manual I saw that 'today' is a special value, but it did not
> work when I used it as a column default; e.g.,
> start_date date DEFAULT today,
>Appending parentheses also failed. But, changing today to CURRENT_DATE
> worked. I've not found an explanation and would appreciate learning why
> 'today' fails.

'today' is special as a date input string, so you can use it as a literal:

regression=# select 'today'::date;
    date    
------------
 2019-01-02
(1 row)

But it's not a SQL keyword, nor a function name, so you can't write it
without quotes.

Also, it wouldn't be very useful for this purpose, because it's resolved
on sight in date_in().  Thus

regression=# create table wrong_thing (start_date date DEFAULT 'today');
CREATE TABLE
regression=# \d wrong_thing
  Table "public.wrong_thing"
   Column   | Type | Collation | Nullable |  Default   
+--+---+--+
 start_date | date |   |  | '2019-01-02'::date

The default would effectively be the creation date of the table,
not the insertion date of any particular row.

So CURRENT_DATE or one of its sibling functions is what you want
here.  On the other hand, something like

INSERT INTO my_table VALUES ('today', ...);

might be perfectly sensible code.
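
For completeness, a minimal sketch of the default that behaves as intended (the
expression is stored as-is and evaluated for each inserted row):

regression=# create table right_thing (start_date date DEFAULT CURRENT_DATE);
CREATE TABLE

Each row inserted into right_thing later picks up the date of its own
insertion.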

regards, tom lane



Re: Default for date field: today vs CURRENT_DATE [RESOLVED]

2019-01-02 Thread Rich Shepard

On Wed, 2 Jan 2019, Tom Lane wrote:


'today' is special as a date input string, so you can use it as a literal:

regression=# select 'today'::date;
    date    
------------
 2019-01-02
(1 row)

But it's not a SQL keyword, nor a function name, so you can't write it
without quotes.


Tom,

  Now I understand. I tried it with quotes, too, but this was in the table
definition, not as a selection criterion. I did not pick up this difference
when reading about the date-time data types.

  Thanks very much for clarifying.

Best regards,

Rich