[HACKERS] Strange result with LATERAL query

2016-08-23 Thread Jeevan Chalke
Hi,

While playing with LATERAL along with some aggregates in a sub-query, I have
observed somewhat unusual behavior.

Consider the following steps:

create table tab1(c1 int, c2 int);
insert into tab1 select id, 1 from generate_series(1, 3) id;
create function sum_tab1(extra int) returns setof bigint as $$
  select sum(c1 + extra) sum from tab1 group by c1
$$ language sql;

-- This gives wrong output
select c1, c2, sum from tab1 t1, lateral
  (select sum(t2.c1 + t1.c1) sum from tab1 t2 group by t2.c1) qry
  order by 1, 2, 3;
-- This gives correct output
select c1, c2, sum from tab1 t1, lateral
  (select sum_tab1 sum from sum_tab1(c1)) qry
  order by 1, 2, 3;
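
For reference (my own arithmetic, so worth double-checking): with the three
rows (1,1), (2,1) and (3,1) in tab1, both queries should return the nine rows
below, and this is indeed what the function-based query gives me:

 c1 | c2 | sum
----+----+-----
  1 |  1 |   2
  1 |  1 |   3
  1 |  1 |   4
  2 |  1 |   3
  2 |  1 |   4
  2 |  1 |   5
  3 |  1 |   4
  3 |  1 |   5
  3 |  1 |   6
(9 rows)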

I would expect the same result from both of these queries, but unfortunately
that's not the case. Per my understanding, the second query, involving the
function, gives the correct result, whereas the first query's output seems
wrong.

Is this expected behavior, or are we giving a wrong result in the case of the
first query?

Thanks
-- 
Jeevan B Chalke


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire
On Wed, Aug 24, 2016 at 2:04 AM, Alvaro Herrera wrote:
> Claudio Freire wrote:
>
>> After looking at it from a bird's-eye view, I agree it's conceptually
>> complex (reading HeapTupleSatisfiesSelf already makes one dizzy).
>>
>> But other than that, the implementation seems rather simple. It seems
>> to me, if one figures out that it is safe to do so (a-priori, xmin not
>> committed, xmax is current transaction), it would simply be a matter
>> of chasing the HOT chain root, setting all LP except the first to
>> LP_UNUSED and the first one to LP_DEAD.
>>
>> Of course I may be missing a ton of stuff.
>
> What you seem to be missing is that rows corresponding to temp tables
> are not "visible to its own transaction only".  The rows are valid
> after the transaction is gone; what makes the tables temporary is the
> fact that they are in a temporary schema.  And what makes them invisible
> to one backend is the fact that they are in the temporary schema for
> another backend.  Not that they are uncommitted.

Yeah, I was thinking of "on commit drop" behavior, but granted there are
all the others.




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Mark Kirkwood

On 24/08/16 16:52, Mark Kirkwood wrote:

On 24/08/16 16:33, Amit Kapila wrote:

On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwood wrote:

On 24/08/16 15:36, Amit Kapila wrote:

On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood wrote:
Can you get the call stacks?


For every stuck backend? (just double checking)...

Yeah.



Cool,

I managed to reproduce with a reduced workload of 4 backends, then 
noticed that the traces for 3 of 'em were all the same. So I've 
attached the 2 unique ones, plus noted what pg_stat_activity thought 
the wait event was, in case that is useful.


...actually I was wrong there, only 2 of them were the same. So I've 
attached a new log of bt's of them all.



$ pgbench -c 4 -T600 bench

postgres=# SELECT pid, state, now()-xact_start AS wait, wait_event_type, wait_event, query
           FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC;
  pid  | state  |      wait       | wait_event_type |   wait_event   | query
-------+--------+-----------------+-----------------+----------------+-------
 16549 | active | 00:06:00.252146 | LWLockTranche   | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (190, 24, 9513890, -2566, CURRENT_TIMESTAMP);
 16548 | active | 00:06:00.235003 | LWLockTranche   | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (445, 3, 5688076, 1021, CURRENT_TIMESTAMP);
 16547 | active | 00:06:00.218032 | LWLockTranche   | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (803, 97, 6871273, -35, CURRENT_TIMESTAMP);
 16546 | active | 00:06:00.192543 | Lock            | transactionid  | UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3;
(4 rows)

$ gdb -p 16546
(gdb) bt
#0  0x7f51daa3a133 in __epoll_wait_nocancel () at 
../sysdeps/unix/syscall-template.S:84
#1  0x006abdb6 in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7fffd197d3a0, cur_timeout=-1, set=0xdc9288)
at latch.c:987
#2  WaitEventSetWait (set=set@entry=0xdc9288, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7fffd197d3a0, 
nevents=nevents@entry=1) at latch.c:941
#3  0x006ac1ca in WaitLatchOrSocket (latch=0x7f51d9fd969c, 
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, 
timeout=-1, timeout@entry=0) at latch.c:347
#4  0x006ac27d in WaitLatch (latch=, 
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0)
at latch.c:302
#5  0x006bb5e3 in ProcSleep (locallock=locallock@entry=0xd3d690, 
lockMethodTable=lockMethodTable@entry=0x9149a0 ) at 
proc.c:1219
#6  0x006b639d in WaitOnLock (locallock=locallock@entry=0xd3d690, 
owner=owner@entry=0xdc4cb0) at lock.c:1703
#7  0x006b7a77 in LockAcquireExtended 
(locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, 
sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', 
reportMemoryError=reportMemoryError@entry=1 '\001')
at lock.c:998
#8  0x006b7d51 in LockAcquire (locktag=locktag@entry=0x7fffd197d700, 
lockmode=lockmode@entry=5, 
sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') 
at lock.c:688
#9  0x006b58fe in XactLockTableWait (xid=xid@entry=7667, 
rel=rel@entry=0x7f51db5c6398, ctid=ctid@entry=0x7fffd197d844, 
oper=oper@entry=XLTW_Update) at lmgr.c:587
#10 0x004a5d29 in heap_update (relation=relation@entry=0x7f51db5c6398, 
otid=otid@entry=0x7fffd197d9e0, 
newtup=newtup@entry=0xd4b3d0, cid=2, crosscheck=0x0, wait=wait@entry=1 
'\001', hufd=0x7fffd197d8f0, lockmode=0x7fffd197d8ec)
at heapam.c:3755
#11 0x005dbeaa in ExecUpdate (tupleid=tupleid@entry=0x7fffd197d9e0, 
oldtuple=oldtuple@entry=0x0, 
slot=slot@entry=0xd4adb0, planSlot=planSlot@entry=0xdc8ea0, 
epqstate=epqstate@entry=0xdc76c8, estate=estate@entry=0xdc73d0, 
canSetTag=1 '\001') at nodeModifyTable.c:922
#12 0x005dc633 in ExecModifyTable (node=node@entry=0xdc7620) at 
nodeModifyTable.c:1501
#13 0x005c3948 in ExecProcNode (node=node@entry=0xdc7620) at 
execProcnode.c:396
#14 0x005bfdbf in ExecutePlan (dest=0xdc9e60, direction=, numberTuples=0, sendTuples=, 
operation=CMD_UPDATE, use_parallel_mode=, 
planstate=0xdc7620, estate=0xdc73d0) at execMain.c:1567
#15 standard_ExecutorRun (queryDesc=0xdc6fc0, direction=, 
count=0) at execMain.c:338
#16 0x006ce669 in ProcessQuery (plan=, 
sourceText=0xd8ff70 "UPDATE pgbench_branches SET bbalance = bbalance + 
-2823 WHERE bid = 3;", params=0x0, dest=0xdc9e60, 
completionTag=0x7fffd197dfc0 "") at pquery.c:187
#17 0x006ce89c in PortalRunMulti (portal=portal@entry=0xd35000, 
isTopLevel=isTopLevel@entry=1 '\001', 

Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Amit Kapila
On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes  wrote:
> Hi Amit,
>
> Thanks for working on this.
>
> When building with --enable-cassert, I get compiler warning:
>
> hash.c: In function 'hashbucketcleanup':
> hash.c:722: warning: 'new_bucket' may be used uninitialized in this function
>

This warning is from the concurrent index patch. I will fix it and post
the patch on that thread.

>
> After an intentionally created crash, I get an Assert triggering:
>
> TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] &
> (1<<((bitmapbit)%32", File: "hashovfl.c", Line: 553)
>
> freep[0] is zero and bitmapbit is 16.
>

What is happening here is that when it tries to clear the bitmap bit,
it expects the bit to be set.  Now, I think the reason it didn't find
the bit set could be that after the new overflow page was added and the
bit corresponding to it was set, you crashed the system, and replay did
not set the bit.  Then, while freeing the overflow page, it can hit the
Assert you mention.  I think the underlying problem is that I am using
REGBUF_STANDARD to log the bitmap page updates.  As the bitmap page
doesn't follow the standard page layout, the full-page image would have
omitted the actual contents, and then during replay the bit would not
have been set, because the page doesn't need REDO.  I think the fix
here is to use REGBUF_NO_IMAGE, as we do for vm buffers.

If you can send me the detailed steps for how you have produced the
problem, then I can verify after fixing whether you are seeing the
same problem or something else.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Alvaro Herrera
Claudio Freire wrote:

> After looking at it from a bird's-eye view, I agree it's conceptually
> complex (reading HeapTupleSatisfiesSelf already makes one dizzy).
> 
> But other than that, the implementation seems rather simple. It seems
> to me, if one figures out that it is safe to do so (a-priori, xmin not
> committed, xmax is current transaction), it would simply be a matter
> of chasing the HOT chain root, setting all LP except the first to
> LP_UNUSED and the first one to LP_DEAD.
> 
> Of course I may be missing a ton of stuff.

What you seem to be missing is that rows corresponding to temp tables
are not "visible to its own transaction only".  The rows are valid
after the transaction is gone; what makes the tables temporary is the
fact that they are in a temporary schema.  And what makes them invisible
to one backend is the fact that they are in the temporary schema for
another backend.  Not that they are uncommitted.
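
To illustrate (just an ordinary catalog query, nothing specific to this
patch): from any backend you can list every session's temporary tables,
because their pg_class rows are committed like any others; only the
pg_temp_N schema tells you which backend they belong to:

select relname, relnamespace::regnamespace as schema
from pg_class
where relpersistence = 't';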

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Mark Kirkwood

On 24/08/16 16:33, Amit Kapila wrote:

On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwood wrote:

On 24/08/16 15:36, Amit Kapila wrote:

On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood wrote:
Can you get the call stacks?


For every stuck backend? (just double checking)...

Yeah.



Cool,

I managed to reproduce with a reduced workload of 4 backends, then 
noticed that the traces for 3 of 'em were all the same. So I've attached 
the 2 unique ones, plus noted what pg_stat_activity thought the wait 
event was, in case that is useful.


Cheers

Mark

$ pgbench -c 4 -T600 bench

postgres=# SELECT pid, state, now()-xact_start AS wait, wait_event_type, wait_event, query
           FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC;
  pid  | state  |      wait       | wait_event_type |   wait_event   | query
-------+--------+-----------------+-----------------+----------------+-------
 16549 | active | 00:06:00.252146 | LWLockTranche   | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (190, 24, 9513890, -2566, CURRENT_TIMESTAMP);
 16548 | active | 00:06:00.235003 | LWLockTranche   | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (445, 3, 5688076, 1021, CURRENT_TIMESTAMP);
 16547 | active | 00:06:00.218032 | LWLockTranche   | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (803, 97, 6871273, -35, CURRENT_TIMESTAMP);
 16546 | active | 00:06:00.192543 | Lock            | transactionid  | UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3;
(4 rows)


postgres@zmori:~$ gdb -p 16546
0x7f51daa3a133 in __epoll_wait_nocancel ()
at ../sysdeps/unix/syscall-template.S:84
84  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x7f51daa3a133 in __epoll_wait_nocancel ()
at ../sysdeps/unix/syscall-template.S:84
#1  0x006abdb6 in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7fffd197d3a0, cur_timeout=-1, set=0xdc9288)
at latch.c:987
#2  WaitEventSetWait (set=set@entry=0xdc9288, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7fffd197d3a0, 
nevents=nevents@entry=1) at latch.c:941
#3  0x006ac1ca in WaitLatchOrSocket (latch=0x7f51d9fd969c, 
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, 
timeout@entry=0) at latch.c:347
#4  0x006ac27d in WaitLatch (latch=, 
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302
#5  0x006bb5e3 in ProcSleep (locallock=locallock@entry=0xd3d690, 
lockMethodTable=lockMethodTable@entry=0x9149a0 )
at proc.c:1219
#6  0x006b639d in WaitOnLock (locallock=locallock@entry=0xd3d690, 
owner=owner@entry=0xdc4cb0) at lock.c:1703
#7  0x006b7a77 in LockAcquireExtended (
locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, 
sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', 
reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998
#8  0x006b7d51 in LockAcquire (locktag=locktag@entry=0x7fffd197d700, 
lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', 
dontWait=dontWait@entry=0 '\000') at lock.c:688
#9  0x006b58fe in XactLockTableWait (xid=xid@entry=7667, 
rel=rel@entry=0x7f51db5c6398, ctid=ctid@entry=0x7fffd197d844, 
oper=oper@entry=XLTW_Update) at lmgr.c:587
#10 0x004a5d29 in heap_update (relation=relation@entry=0x7f51db5c6398, 
otid=otid@entry=0x7fffd197d9e0, newtup=newtup@entry=0xd4b3d0, cid=2, 
crosscheck=0x0, wait=wait@entry=1 '\001', hufd=0x7fffd197d8f0, 
lockmode=0x7fffd197d8ec) at heapam.c:3755
#11 0x005dbeaa in ExecUpdate (tupleid=tupleid@entry=0x7fffd197d9e0, 
oldtuple=oldtuple@entry=0x0, slot=slot@entry=0xd4adb0, 
planSlot=planSlot@entry=0xdc8ea0, epqstate=epqstate@entry=0xdc76c8, 
estate=estate@entry=0xdc73d0, canSetTag=1 '\001') at nodeModifyTable.c:922
#12 0x005dc633 in ExecModifyTable (node=node@entry=0xdc7620)
at nodeModifyTable.c:1501
#13 0x005c3948 in ExecProcNode (node=node@entry=0xdc7620)
at execProcnode.c:396
#14 0x005bfdbf in ExecutePlan (dest=0xdc9e60, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_UPDATE, use_parallel_mode=, 
planstate=0xdc7620, estate=0xdc73d0) at execMain.c:1567
#15 standard_ExecutorRun (queryDesc=0xdc6fc0, direction=, 
count=0) at execMain.c:338
#16 0x006ce669 in ProcessQuery (plan=, 
sourceText=0xd8ff70 "UPDATE pgbench_branches SET bbalance = bbalance + 
-2823 WHERE bid = 3;", params=0x0, dest=0xdc9e60, completionTag=0x7fffd197dfc0 
"")

Re: [HACKERS] LSN as a recovery target

2016-08-23 Thread Michael Paquier
On Tue, Aug 23, 2016 at 8:50 PM, Michael Paquier wrote:
> On Tue, Aug 23, 2016 at 6:10 PM, Simon Riggs  wrote:
>> On 23 August 2016 at 09:39, Petr Jelinek  wrote:
>>
>>> Looks very reasonable to me (both patches). Thanks for doing that.
>>>
>>> I am inclined to mark this as ready for committer.
>>
>> Looking at it now.
>>
>> The messages for recovery_target_lsn don't mention after or before, as
>> do other targets... e.g.
>>  recoveryStopAfter ? "after" : "before",
>> My understanding is that if you request an LSN that isn't the exact
>> end point of a WAL record then it will either stop before or after the
>> requested point, so that needs to be described in the docs and in the
>> messages generated prior to starting to search.
>>
>> Everything else looks in good order.
>
> You are right, this message should be completed as such. Do you want
> an updated patch?

Well, I ended up updating the patch anyway.
-- 
Michael


0001-Introduce-recovery_target_lsn-in-recovery.conf.patch
Description: invalid/octet-stream


0002-Introduce-error-callback-when-parsing-recovery.conf.patch
Description: invalid/octet-stream



Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Amit Kapila
On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwood wrote:
> On 24/08/16 15:36, Amit Kapila wrote:
>>
>> On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood wrote:
>>>
>>
>> Can you get the call stacks?
>>
>
> For every stuck backend? (just double checking)...

Yeah.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Mark Kirkwood

On 24/08/16 15:36, Amit Kapila wrote:

On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood wrote:

On 24/08/16 12:09, Mark Kirkwood wrote:

On 23/08/16 15:24, Amit Kapila wrote:


Thoughts?

Note - To use this patch, first apply latest version of concurrent
hash index patch [2].

[1] - https://commitfest.postgresql.org/10/647/
[2] -
https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com



Firstly - really nice! Patches applied easily etc to latest version 10
checkout.

I thought I'd test by initializing pgbench schema, adding a standby, then
adding some hash indexes and running pgbench:

Looking on the standby:

bench=# \d pgbench_accounts
Table "public.pgbench_accounts"
   Column  | Type  | Modifiers
--+---+---
  aid  | integer   | not null
  bid  | integer   |
  abalance | integer   |
  filler   | character(84) |
Indexes:
 "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
 "pgbench_accounts_bid" hash (bid)<

bench=# \d pgbench_history
   Table "public.pgbench_history"
  Column |Type | Modifiers
+-+---
  tid| integer |
  bid| integer |
  aid| integer |
  delta  | integer |
  mtime  | timestamp without time zone |
  filler | character(22)   |
Indexes:
 "pgbench_history_bid" hash (bid) <=


they have been replicated there ok.

However I'm seeing a hang on the master after a while:

bench=# SELECT datname,application_name,state,now()-xact_start AS
wait,query FROM pg_stat_activity ORDER BY wait DESC;
  datname | application_name | state  |  wait | query

-+--++-+
  | walreceiver  | idle   | |
  bench   | pgbench  | active | 00:31:38.367467 | INSERT INTO
pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973,
-3868, CURRENT_TIMESTAMP);
  bench   | pgbench  | active | 00:31:38.215378 | INSERT INTO
pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814,
2091, CURRENT_TIMESTAMP);
  bench   | pgbench  | active | 00:31:35.991056 | INSERT INTO
pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237,
3438, CURRENT_TIMESTAMP);
  bench   | pgbench  | active | 00:31:35.619798 | INSERT INTO
pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994,
-2443, CURRENT_TIMESTAMP);
  bench   | pgbench  | active | 00:31:35.544196 | INSERT INTO
pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053,
CURRENT_TIMESTAMP);
  bench   | pgbench  | active | 00:31:35.334504 | UPDATE
pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33;
  bench   | pgbench  | active | 00:31:35.234112 | UPDATE
pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38;
  bench   | pgbench  | active | 00:31:34.434676 | UPDATE
pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33;
  bench   | psql | active | 00:00:00| SELECT
datname,application_name,state,now()-xact_start AS wait,query FROM
pg_stat_activity ORDER BY wait DESC;
(10 rows)

but no errors in the logs, any thoughts?

Can you get the call stacks?



For every stuck backend? (just double checking)...




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Amit Kapila
On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood wrote:
> On 24/08/16 12:09, Mark Kirkwood wrote:
>>
>> On 23/08/16 15:24, Amit Kapila wrote:
>>>
>>>
>>> Thoughts?
>>>
>>> Note - To use this patch, first apply latest version of concurrent
>>> hash index patch [2].
>>>
>>> [1] - https://commitfest.postgresql.org/10/647/
>>> [2] -
>>> https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com
>>>
>>>
>>
>> Firstly - really nice! Patches applied easily etc to latest version 10
>> checkout.
>>
>> I thought I'd test by initializing pgbench schema, adding a standby, then
>> adding some hash indexes and running pgbench:
>>
>> Looking on the standby:
>>
>> bench=# \d pgbench_accounts
>>Table "public.pgbench_accounts"
>>   Column  | Type  | Modifiers
>> --+---+---
>>  aid  | integer   | not null
>>  bid  | integer   |
>>  abalance | integer   |
>>  filler   | character(84) |
>> Indexes:
>> "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
>> "pgbench_accounts_bid" hash (bid)<
>>
>> bench=# \d pgbench_history
>>   Table "public.pgbench_history"
>>  Column |Type | Modifiers
>> +-+---
>>  tid| integer |
>>  bid| integer |
>>  aid| integer |
>>  delta  | integer |
>>  mtime  | timestamp without time zone |
>>  filler | character(22)   |
>> Indexes:
>> "pgbench_history_bid" hash (bid) <=
>>
>>
>> they have been replicated there ok.
>>
>> However I'm seeing a hang on the master after a while:
>>
>> bench=# SELECT datname,application_name,state,now()-xact_start AS
>> wait,query FROM pg_stat_activity ORDER BY wait DESC;
>>  datname | application_name | state  |  wait | query
>>
>> -+--++-+
>>  | walreceiver  | idle   | |
>>  bench   | pgbench  | active | 00:31:38.367467 | INSERT INTO
>> pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973,
>> -3868, CURRENT_TIMESTAMP);
>>  bench   | pgbench  | active | 00:31:38.215378 | INSERT INTO
>> pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814,
>> 2091, CURRENT_TIMESTAMP);
>>  bench   | pgbench  | active | 00:31:35.991056 | INSERT INTO
>> pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237,
>> 3438, CURRENT_TIMESTAMP);
>>  bench   | pgbench  | active | 00:31:35.619798 | INSERT INTO
>> pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994,
>> -2443, CURRENT_TIMESTAMP);
>>  bench   | pgbench  | active | 00:31:35.544196 | INSERT INTO
>> pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053,
>> CURRENT_TIMESTAMP);
>>  bench   | pgbench  | active | 00:31:35.334504 | UPDATE
>> pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33;
>>  bench   | pgbench  | active | 00:31:35.234112 | UPDATE
>> pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38;
>>  bench   | pgbench  | active | 00:31:34.434676 | UPDATE
>> pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33;
>>  bench   | psql | active | 00:00:00| SELECT
>> datname,application_name,state,now()-xact_start AS wait,query FROM
>> pg_stat_activity ORDER BY wait DESC;
>> (10 rows)
>>
>> but no errors in the logs, any thoughts?
>

Can you get the call stacks?

>
>
> FWIW, retesting with the WAL logging patch removed (i.e. leaving the
> concurrent hash one) works fine.
>

Okay, information noted.


Thanks for testing and showing interest in the patch.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-23 Thread Etsuro Fujita

On 2016/08/10 5:19, Robert Haas wrote:

On Mon, Aug 8, 2016 at 12:22 AM, Etsuro Fujita wrote:

One thing we need to do to leave that as is would be to fix a bug that I
pointed out upthread.  Let me explain that again.  The EXPLAIN command
selects relation aliases to be used in printing a query so that each
selected alias is unique, but postgres_fdw doesn't consider that uniqueness.
Here is an example:

postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1, ft2
t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 where t1.a
= t2.a) as t(t1a, t2a);
 QUERY PLAN

 Unique  (cost=204.12..204.13 rows=2 width=8)
   Output: t1.a, t2.a
   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
 Output: t1.a, t2.a
 Sort Key: t1.a, t2.a
 ->  Append  (cost=100.00..204.11 rows=2 width=8)
   ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
 Output: t1.a, t2.a
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
JOIN public.t2 r2 ON (((r1.a = r2.a)))
   ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
 Output: t1_1.a, t2_1.a
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
JOIN public.t2 r2 ON (((r1.a = r2.a)))
(14 rows)

The relation aliases in the Relations line in the second Foreign Scan, t1
and t2 for ft1 and ft2, are not unique; they should be t1_1 and t2_1
(compare the aliases in the Relations line with the ones in the Output line
directly above it, created by core).  The reason is that postgres_fdw
creates the Relations info using rte->eref->aliasname as the relation
aliases, as is, at path-creation time.  That is probably a little bit too
early for postgres_fdw to do.  Couldn't postgres_fdw create that info after
query planning, for example, during ExplainForeignScan?



Yes, it seems what we are doing now is not consistent with what
happens for plain tables; that should probably be fixed.


OK, I think we should fix the issue (that postgres_fdw produces incorrect 
aliases for joining relations shown in the Relations line in EXPLAIN for 
a join pushdown query like the above) in advance of the 9.6 release, so 
I'll add this to the 9.6 open items.
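
Concretely, the expectation is that the second Foreign Scan's Relations line
would read something like the following (aliases inferred from the Output
line above it, so treat this as illustrative):

  Relations: (public.ft1 t1_1) INNER JOIN (public.ft2 t2_1)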


Best regards,
Etsuro Fujita






Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Mark Kirkwood

On 24/08/16 12:09, Mark Kirkwood wrote:

On 23/08/16 15:24, Amit Kapila wrote:


Thoughts?

Note - To use this patch, first apply latest version of concurrent
hash index patch [2].

[1] - https://commitfest.postgresql.org/10/647/
[2] - 
https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com





Firstly - really nice! Patches applied easily etc to latest version 10 
checkout.


I thought I'd test by initializing pgbench schema, adding a standby, 
then adding some hash indexes and running pgbench:


Looking on the standby:

bench=# \d pgbench_accounts
   Table "public.pgbench_accounts"
  Column  | Type  | Modifiers
--+---+---
 aid  | integer   | not null
 bid  | integer   |
 abalance | integer   |
 filler   | character(84) |
Indexes:
"pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
"pgbench_accounts_bid" hash (bid)<

bench=# \d pgbench_history
  Table "public.pgbench_history"
 Column |Type | Modifiers
+-+---
 tid| integer |
 bid| integer |
 aid| integer |
 delta  | integer |
 mtime  | timestamp without time zone |
 filler | character(22)   |
Indexes:
"pgbench_history_bid" hash (bid) <=


they have been replicated there ok.

However I'm seeing a hang on the master after a while:

bench=# SELECT datname,application_name,state,now()-xact_start AS 
wait,query FROM pg_stat_activity ORDER BY wait DESC;

 datname | application_name | state  |  wait | query
-+--++-+ 


 | walreceiver  | idle   | |
 bench   | pgbench  | active | 00:31:38.367467 | INSERT INTO 
pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, 
-3868, CURRENT_TIMESTAMP);
 bench   | pgbench  | active | 00:31:38.215378 | INSERT INTO 
pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 
3954814, 2091, CURRENT_TIMESTAMP);
 bench   | pgbench  | active | 00:31:35.991056 | INSERT INTO 
pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 
8355237, 3438, CURRENT_TIMESTAMP);
 bench   | pgbench  | active | 00:31:35.619798 | INSERT INTO 
pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 
4839994, -2443, CURRENT_TIMESTAMP);
 bench   | pgbench  | active | 00:31:35.544196 | INSERT INTO 
pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 
4053, CURRENT_TIMESTAMP);
 bench   | pgbench  | active | 00:31:35.334504 | UPDATE 
pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33;
 bench   | pgbench  | active | 00:31:35.234112 | UPDATE 
pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38;
 bench   | pgbench  | active | 00:31:34.434676 | UPDATE 
pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33;
 bench   | psql | active | 00:00:00| SELECT 
datname,application_name,state,now()-xact_start AS wait,query FROM 
pg_stat_activity ORDER BY wait DESC;

(10 rows)

but no errors in the logs, any thoughts?



FWIW, retesting with the WAL logging patch removed (i.e. leaving the
concurrent hash one) works fine.





Re: [HACKERS] recent compiler warnings

2016-08-23 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-23 19:54:39 -0700, Jeff Janes wrote:
>> commit ed0097e4f9e6b1 has introduced two compiler warnings:

> That's probably when compiling without assertions, right? rd_index is
> only read from within Assert()s.  Probably needs a
> PG_USED_FOR_ASSERTS_ONLY slapped on.

Yeah, looks like that to me too.  I'm on it.

regards, tom lane




Re: [HACKERS] recent compiler warnings

2016-08-23 Thread Andres Freund
On 2016-08-23 19:54:39 -0700, Jeff Janes wrote:
> Sorry for starting a new thread, I can't find the correct one to reply to.
> 
> Using: gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE 
> Linux)
> 
> commit ed0097e4f9e6b1 has introduced two compiler warnings:
> 
> gistutil.c: In function 'gistproperty':
> gistutil.c:855:16: warning: variable 'rd_index' set but not used
> [-Wunused-but-set-variable]
>   Form_pg_index rd_index;
> ^
> amutils.c: In function 'test_indoption':
> amutils.c:118:16: warning: variable 'rd_index' set but not used
> [-Wunused-but-set-variable]
>   Form_pg_index rd_index;
> ^
> 
> commit ed0097e4f9e6b1227935e01fa67f12a238b66064
> Author: Tom Lane 
> Date:   Sat Aug 13 18:31:14 2016 -0400
> 
> Add SQL-accessible functions for inspecting index AM properties.
> 
> 
> I don't know if this compiler is too clever by half, or not clever enough.

That's probably when compiling without assertions, right? rd_index is
only read from within Assert()s.  Probably needs a
PG_USED_FOR_ASSERTS_ONLY slapped on.




[HACKERS] recent compiler warnings

2016-08-23 Thread Jeff Janes
Sorry for starting a new thread, I can't find the correct one to reply to.

Using: gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE Linux)

commit ed0097e4f9e6b1 has introduced two compiler warnings:

gistutil.c: In function 'gistproperty':
gistutil.c:855:16: warning: variable 'rd_index' set but not used
[-Wunused-but-set-variable]
  Form_pg_index rd_index;
^
amutils.c: In function 'test_indoption':
amutils.c:118:16: warning: variable 'rd_index' set but not used
[-Wunused-but-set-variable]
  Form_pg_index rd_index;
^

commit ed0097e4f9e6b1227935e01fa67f12a238b66064
Author: Tom Lane 
Date:   Sat Aug 13 18:31:14 2016 -0400

Add SQL-accessible functions for inspecting index AM properties.


I don't know if this compiler is too clever by half, or not clever enough.

Cheers,

Jeff




Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-23 Thread Tsunakawa, Takayuki
From: Peter Geoghegan [mailto:p...@heroku.com]
> On Tue, Aug 23, 2016 at 1:44 PM, Bruce Momjian  wrote:
> >> [Windows]
> >> #clients     on      off
> >>       12  29793    38169
> >>       24  31587    87237
> >>       48  32588    83335
> >>       96  34261    67668
> >
> > This ranges from a 28% to a 97% speed improvement on Windows!  Those
> > are not typos!  This is a game-changer for use of Postgres on Windows
> > for certain workloads!
> 
> While I don't care all that much about performance on windows, it is a little
> sad that it took this long to fix something so simple. Consider this exchange,
> as a further example of our lack of concern here:
> 
> https://www.postgresql.org/message-id/30619.1428157...@sss.pgh.pa.us

Probably, the useful Windows Performance Toolkit, which is a counterpart of 
perf on Linux, was not available before.  Maybe we can dig deeper into 
performance problems with it now.

On a similar topic, I wonder whether the following still holds true, after many 
improvements on shared buffer lock contention.

https://www.postgresql.org/docs/devel/static/runtime-config-resource.html

"The useful range for shared_buffers on Windows systems is generally 
from 64MB to 512MB."

Regards
Takayuki Tsunakawa




Re: [HACKERS] dump/restore doesn't preserve row ordering?

2016-08-23 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-23 17:22:03 -0400, Tom Lane wrote:
>> I can't immediately think of a reason for this.  In everyday
>> updates you could theorize about effects like autovacuum
>> asynchronously updating the FSM, but surely the FSM should have no
>> impact on where COPY puts stuff when loading into an empty table.

> It seems possible that a larger row didn't fit on a page anymore, then
> later when a new page is needed for a smaller row, the earlier page
> is found again.  Due to RelationGetBufferForTuple() updating the fsm
> when an old target buffer is present:

Ah.  That matches the symptoms --- small groups of rows are getting
relocated, seems like.  And there's definitely a wide range of row
lengths in this data.

It's interesting that nobody has complained about this behavior.
Maybe the old fogies are all gone ...

regards, tom lane




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire
On Tue, Aug 23, 2016 at 9:12 PM, Tomas Vondra wrote:
> On 08/24/2016 12:38 AM, Claudio Freire wrote:
>>
>> On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra wrote:
>
> Could someone please explain how the unlogged tables are supposed to
> fix
> the
> catalog bloat problem, as stated in the initial patch submission? We'd
> still
> need to insert/delete the catalog rows when creating/dropping the
> temporary
> tables, causing the bloat. Or is there something I'm missing?



 Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

 Perhaps reserving a worker or N to run only on catalog schemas?

 That'd be far simpler.
>>>
>>>
>>>
>>> Maybe, although IIRC the issues with catalog bloat were due to a
>>> combination
>>> of long queries and many temporary tables being created/dropped. In that
>>> case simply ramping up autovacuum (or even having dedicated workers for
>>> catalogs) would not really help due to the xmin horizon being blocked by
>>> the
>>> long-running queries.
>>>
>>> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
>>> couldn't we vacuum the temporary table records differently? For example,
>>> couldn't we just consider them removable as soon as the backend that owns
>>> them disappears?
>>
>>
>> Or perhaps go all the way and generalize that to rows that never
>> become visible outside their parent transaction.
>>
>> As in, delete of rows created by the deleting transaction could clean
>> up, carefully to avoid voiding indexes and all that, but more
>> aggressively than regular deletes.
>>
>
> Maybe, but I wouldn't be surprised if such generalization would be an order
> of magnitude more complicated - and even the vacuuming changes I mentioned
> are undoubtedly a fair amount of work.

After looking at it from a bird's-eye view, I agree it's conceptually
complex (reading HeapTupleSatisfiesSelf already makes one dizzy).

But other than that, the implementation seems rather simple. It seems
to me, if one figures out that it is safe to do so (a-priori, xmin not
committed, xmax is current transaction), it would simply be a matter
of chasing the HOT chain root, setting all LP except the first to
LP_UNUSED and the first one to LP_DEAD.

Of course I may be missing a ton of stuff.

> Sadly, I don't see how this might fix the other issues mentioned in this
> thread (e.g. impossibility to create temp tables on standbys),

No it doesn't :(




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-23 Thread neha khatri
On Wed, Aug 24, 2016 at 10:07 AM, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:

> On 24/08/16 12:02, neha khatri wrote:
>
>> >Andres Freund writes:
>> >> On 2016-08-22 13:54:43 -0400, Robert Haas wrote:
>> >> On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>  I'm inclined to suggest you forget this approach and propose a single
>>  counter for "SQL commands executed", which avoids all of the above
>>  definitional problems.  People who need more detail than that are
>>  probably best advised to look to contrib/pg_stat_statements, anyway.
>>
>> >>> I disagree.  I think SQL commands executed, lumping absolutely
>> >>> everything together, really isn't much use.
>>
>> >> I'm inclined to agree. I think that's a quite useful stat when looking
>> >> at an installation one previously didn't have a lot of interaction
>> with.
>>
>> >Well, let's at least have an "other" category so you can add up the
>> >counters and get a meaningful total.
>>
>> How would that meaningful total help a user? What might a user analyse
>> with the counter in the 'other' category?
>>
>>
>>
>> The user could then judge if there were a significant number of examples
> not covered in the other categories - this may, or may not, be a problem;
> depending on the use case.
>
>
> Cheers,
> Gavin
>
For the user to be able to judge whether the number in the 'other' category
is a problem or not, the user also needs to know what might fall under the
'other' category. It may not be good to say that _anything_ that is not part
of an already defined category is part of 'other'. Probably, 'other' should
also be a set of predefined operations.

Neha


Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-23 Thread Michael Paquier
On Wed, Aug 24, 2016 at 9:55 AM, Alvaro Herrera wrote:
> Ryan Murphy wrote:
>> Thanks for committing it Tom!  Thank you all for your help.
>>
>> I really like the Postgres project!  If there's anything that especially
>> needs worked on let me know, I'd love to help.
>
> https://wiki.postgresql.org/wiki/Todo

Even with that, there are always patches waiting for reviews, tests or input:
https://commitfest.postgresql.org/10/
-- 
Michael




Re: [HACKERS] Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire
On Tue, Aug 23, 2016 at 8:50 PM, Greg Stark  wrote:
> On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev wrote:
>> Frankly, I have much more faith in Tom's idea of using a virtual part of the
>> catalog for all temporary tables, i.e. turning all temporary tables into
>> "fast" temporary tables. Instead of introducing a new type of temporary
>> table that solves the catalog bloating problem and forcing users to rewrite
>> applications, why not just avoid creating the problem in the first place?
>
> Would applications really need to be rewritten? Are they really
> constructing temporary tables where the definition of the table is
> dynamic, not just the content?

Mine is. But it wouldn't be a big deal to adapt.




Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-23 Thread Alvaro Herrera
Ryan Murphy wrote:
> Thanks for committing it Tom!  Thank you all for your help.
> 
> I really like the Postgres project!  If there's anything that especially
> needs worked on let me know, I'd love to help.

https://wiki.postgresql.org/wiki/Todo

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Tomas Vondra



On 08/24/2016 12:38 AM, Claudio Freire wrote:

On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra wrote:

Could someone please explain how the unlogged tables are supposed to fix
the
catalog bloat problem, as stated in the initial patch submission? We'd
still
need to insert/delete the catalog rows when creating/dropping the
temporary
tables, causing the bloat. Or is there something I'm missing?



Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Perhaps reserving a worker or N to run only on catalog schemas?

That'd be far simpler.



Maybe, although IIRC the issues with catalog bloat were due to a combination
of long queries and many temporary tables being created/dropped. In that
case simply ramping up autovacuum (or even having dedicated workers for
catalogs) would not really help due to the xmin horizon being blocked by the
long-running queries.

Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
couldn't we vacuum the temporary table records differently? For example,
couldn't we just consider them removable as soon as the backend that owns
them disappears?


Or perhaps go all the way and generalize that to rows that never
become visible outside their parent transaction.

As in, delete of rows created by the deleting transaction could clean
up, carefully to avoid voiding indexes and all that, but more
aggressively than regular deletes.



Maybe, but I wouldn't be surprised if such generalization would be an 
order of magnitude more complicated - and even the vacuuming changes I 
mentioned are undoubtedly a fair amount of work.


Sadly, I don't see how this might fix the other issues mentioned in this 
thread (e.g. impossibility to create temp tables on standbys),



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Mark Kirkwood

On 23/08/16 15:24, Amit Kapila wrote:


Thoughts?

Note - To use this patch, first apply latest version of concurrent
hash index patch [2].

[1] - https://commitfest.postgresql.org/10/647/
[2] - 
https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com




Firstly - really nice! Patches applied easily etc to latest version 10 
checkout.


I thought I'd test by initializing pgbench schema, adding a standby, 
then adding some hash indexes and running pgbench:
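
The hash indexes were essentially the obvious ones implied by the \d output
below, i.e. something along the lines of:

create index pgbench_accounts_bid on pgbench_accounts using hash (bid);
create index pgbench_history_bid on pgbench_history using hash (bid);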


Looking on the standby:

bench=# \d pgbench_accounts
   Table "public.pgbench_accounts"
  Column  | Type  | Modifiers
--+---+---
 aid  | integer   | not null
 bid  | integer   |
 abalance | integer   |
 filler   | character(84) |
Indexes:
"pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
"pgbench_accounts_bid" hash (bid)<

bench=# \d pgbench_history
  Table "public.pgbench_history"
 Column |Type | Modifiers
+-+---
 tid| integer |
 bid| integer |
 aid| integer |
 delta  | integer |
 mtime  | timestamp without time zone |
 filler | character(22)   |
Indexes:
"pgbench_history_bid" hash (bid) <=


they have been replicated there ok.

However I'm seeing a hang on the master after a while:

bench=# SELECT datname,application_name,state,now()-xact_start AS wait,query
        FROM pg_stat_activity ORDER BY wait DESC;
 datname | application_name | state  |      wait       | query
---------+------------------+--------+-----------------+-------
         | walreceiver      | idle   |                 |
 bench   | pgbench          | active | 00:31:38.367467 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, -3868, CURRENT_TIMESTAMP);
 bench   | pgbench          | active | 00:31:38.215378 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, 2091, CURRENT_TIMESTAMP);
 bench   | pgbench          | active | 00:31:35.991056 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, 3438, CURRENT_TIMESTAMP);
 bench   | pgbench          | active | 00:31:35.619798 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, -2443, CURRENT_TIMESTAMP);
 bench   | pgbench          | active | 00:31:35.544196 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, CURRENT_TIMESTAMP);
 bench   | pgbench          | active | 00:31:35.334504 | UPDATE pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33;
 bench   | pgbench          | active | 00:31:35.234112 | UPDATE pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38;
 bench   | pgbench          | active | 00:31:34.434676 | UPDATE pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33;
 bench   | psql             | active | 00:00:00        | SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC;
(10 rows)

but no errors in the logs, any thoughts?

Cheers

Mark









Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-23 Thread Gavin Flower

On 24/08/16 12:02, neha khatri wrote:

>Andres Freund writes:
>> On 2016-08-22 13:54:43 -0400, Robert Haas wrote:
>> On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane wrote:

 I'm inclined to suggest you forget this approach and propose a single
 counter for "SQL commands executed", which avoids all of the above
 definitional problems.  People who need more detail than that are
 probably best advised to look to contrib/pg_stat_statements, anyway.

>>> I disagree.  I think SQL commands executed, lumping absolutely
>>> everything together, really isn't much use.

>> I'm inclined to agree. I think that's a quite useful stat when looking
>> at an installation one previously didn't have a lot of interaction 
with.


>Well, let's at least have an "other" category so you can add up the
>counters and get a meaningful total.

How would that meaningful total help a user? What might a user analyse
with the counter in the 'other' category?



Neha

The user could then judge if there were a significant number of examples 
not covered in the other categories - this may, or may not, be a 
problem; depending on the use case.



Cheers,
Gavin





Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-23 Thread Michael Paquier
On Wed, Aug 24, 2016 at 5:34 AM, Martín Marqués  wrote:
> Hi,
>
> 2016-08-23 16:46 GMT-03:00 Martín Marqués :
>>
>> I will add tests for sequence and functions as you mention and test again.
>>
>> Then I'll check if other tests should be added as well.
>
> I found quite some other objects we should be checking as well, but
> this will add some duplication to the tests, as I'd just copy (with
> minor changes) what's in src/bin/pg_dump/t/002_pg_dump.pl
>
> I can't think of a way to avoid this duplication, not that it really
> hurts. We would have to make sure that any new objects added to one
> test, if needed, are added to the other (that's a bit cumbersome).
>
> Other things to check:
>
> CREATE AGGREGATE
> CREATE DOMAIN
> CREATE FUNCTION
> CREATE TYPE
> CREATE MATERIALIZED VIEW
> CREATE POLICY
>
> Maybe even CREATE INDEX over a table created in the schema.
>
> Also, ACLs have to be tested against objects in the schema.
>
> I hope I didn't miss anything there.

I'll take a look at that soon and review your patch as well as your
tests. For now I have added an entry in the CF app to not lose track
of it:
https://commitfest.postgresql.org/10/736/
-- 
Michael




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-23 Thread neha khatri
>Andres Freund  writes:
>> On 2016-08-22 13:54:43 -0400, Robert Haas wrote:
>> On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane  wrote:
 I'm inclined to suggest you forget this approach and propose a single
 counter for "SQL commands executed", which avoids all of the above
 definitional problems.  People who need more detail than that are
 probably best advised to look to contrib/pg_stat_statements, anyway.

>>> I disagree.  I think SQL commands executed, lumping absolutely
>>> everything together, really isn't much use.

>> I'm inclined to agree. I think that's a quite useful stat when looking
>> at an installation one previously didn't have a lot of interaction with.

>Well, let's at least have an "other" category so you can add up the
>counters and get a meaningful total.

How would that meaningful total help a user? What might a user analyse
with the counter in the 'other' category?


Neha


[HACKERS] Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Greg Stark
On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev wrote:
> Frankly, I have much more faith in Tom's idea of using a virtual part of the
> catalog for all temporary tables, i.e. turning all temporary tables into
> "fast" temporary tables. Instead of introducing a new type of temporary table
> that solves the catalog bloating problem and forcing users to rewrite
> applications, why not just avoid creating the problem in the first place?

Would applications really need to be rewritten? Are they really
constructing temporary tables where the definition of the table is
dynamic, not just the content? I think application authors would be
pretty happy to not need to keep recreating the same tables over and
over again and dealing with DDL in their run-time code. It's not
really rewriting an application to just remove that DDL and move it to
the one-time database schema creation.

I think it's clear we got the idea of temporary tables wrong when we
implemented them and the SQL standard is more useful. It's not just
some implementation artifact that it's possible to implement them in
an efficient way. It's a fundamental design change and experience
shows that separating DDL and making it static while the DML is
dynamic is just a better design.
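
Roughly, under the standard's model (sketching from memory, and nothing
PostgreSQL implements today; the table name is made up), the declaration
lives in the static schema and only the contents are per-session:

  -- declared once, as part of the application's schema
  CREATE GLOBAL TEMPORARY TABLE session_scratch (k int, v text)
      ON COMMIT DELETE ROWS;

  -- versus what applications do today: re-running the DDL in every session
  CREATE TEMP TABLE session_scratch (k int, v text) ON COMMIT DROP;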


-- 
greg




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire
On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra wrote:
>>> Could someone please explain how the unlogged tables are supposed to fix
>>> the
>>> catalog bloat problem, as stated in the initial patch submission? We'd
>>> still
>>> need to insert/delete the catalog rows when creating/dropping the
>>> temporary
>>> tables, causing the bloat. Or is there something I'm missing?
>>
>>
>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>>
>> Perhaps reserving a worker or N to run only on catalog schemas?
>>
>> That'd be far simpler.
>
>
> Maybe, although IIRC the issues with catalog bloat were due to a combination
> of long queries and many temporary tables being created/dropped. In that
> case simply ramping up autovacuum (or even having dedicated workers for
> catalogs) would not really help due to the xmin horizon being blocked by the
> long-running queries.
>
> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
> couldn't we vacuum the temporary table records differently? For example,
> couldn't we just consider them removable as soon as the backend that owns
> them disappears?

Or perhaps go all the way and generalize that to rows that never
become visible outside their parent transaction.

As in, delete of rows created by the deleting transaction could clean
up, carefully to avoid voiding indexes and all that, but more
aggressively than regular deletes.




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Andres Freund
On 2016-08-23 19:33:33 -0300, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund  wrote:
> >> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
> >
> > Not really in my experience, at least not without more drastic vacuum
> > changes. The issue is that if you have a single "long running"
> > transaction (in some workloads that can even just be a 3 min taking
> > query/xact), nothing will be cleaned up during that time. If you have a
> > few hundred temp tables created per sec, you'll be in trouble even
> > then. Not to speak of the case where you have queries taking hours (say
> > a backup).
> 
> Well, my experience isn't as extreme as that (just a few dozen temp
> tables per minute), but when I see bloat in catalog tables it's
> because all autovacuum workers are stuck vacuuming huge tables for
> huge periods of time (hours or days).

Well, that's because our defaults are batshit stupid.




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire
On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund  wrote:
>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>
> Not really in my experience, at least not without more drastic vacuum
> changes. The issue is that if you have a single "long running"
> transaction (in some workloads that can even be just a query/xact taking
> 3 minutes), nothing will be cleaned up during that time. If you have a
> few hundred temp tables created per sec, you'll be in trouble even
> then. Not to speak of the case where you have queries taking hours (say
> a backup).

Well, my experience isn't as extreme as that (just a few dozen temp
tables per minute), but when I see bloat in catalog tables it's
because all autovacuum workers are stuck vacuuming huge tables for
huge periods of time (hours or days).

So that's certainly another bloat case to consider.




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Tomas Vondra



On 08/24/2016 12:18 AM, Claudio Freire wrote:

On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra wrote:

On 08/22/2016 10:32 PM, Robert Haas wrote:



...

1. The number of tables for which we would need to add a duplicate,
unlogged table is formidable.  You need pg_attribute, pg_attrdef,
pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
And the backend changes needed so that we used the unlogged copy for
temp tables and the permanent copy for regular tables is probably
really large.

2. You can't write to unlogged tables on standby servers, so this
doesn't help solve the problem of wanting to use temporary tables on
standbys.

3. While it makes creating temporary tables a lighter-weight
operation, because you no longer need to write WAL for the catalog
entries, there's probably still substantially more overhead than just
stuffing them in backend-local RAM.  So the performance benefits are
probably fairly modest.

Overall I feel like the development effort that it would take to make
this work would almost certainly be better-expended elsewhere.  But of
course I'm not in charge of how people who work for other companies
spend their time...



Could someone please explain how the unlogged tables are supposed to fix the
catalog bloat problem, as stated in the initial patch submission? We'd still
need to insert/delete the catalog rows when creating/dropping the temporary
tables, causing the bloat. Or is there something I'm missing?


Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Perhaps reserving a worker or N to run only on catalog schemas?

That'd be far simpler.


Maybe, although IIRC the issues with catalog bloat were due to a
combination of long queries and many temporary tables being
created/dropped. In that case simply ramping up autovacuum (or even
having dedicated workers for catalogs) would not really help, due to the
xmin horizon being blocked by the long-running queries.


Maybe it's an entirely crazy idea due to the wine I drank at dinner,
but couldn't we vacuum the temporary table records differently? For
example, couldn't we just consider them removable as soon as the backend
that owns them disappears?
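
A minimal sketch of the bloat pattern being discussed (not from this thread;
the table definition and loop count are made up for illustration):

-- Session 1: hold the xmin horizon back with a long-running transaction.
BEGIN;
SELECT txid_current();   -- assigns an xid, then the session sits idle

-- Session 2: churn temporary tables, then check catalog growth.
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        EXECUTE 'CREATE TEMP TABLE bloat_demo (a int, b text, c numeric)';
        EXECUTE 'DROP TABLE bloat_demo';
    END LOOP;
END
$$;
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
-- The dead catalog rows stay "recently dead" until session 1 finishes,
-- so even very aggressive autovacuum settings cannot reclaim them.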


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Andres Freund
On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
>  wrote:
> > Could someone please explain how the unlogged tables are supposed to fix the
> > catalog bloat problem, as stated in the initial patch submission? We'd still
> > need to insert/delete the catalog rows when creating/dropping the temporary
> > tables, causing the bloat. Or is there something I'm missing?

Beats me.


> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Not really in my experience, at least not without more drastic vacuum
changes. The issue is that if you have a single "long running"
transaction (in some workloads that can even just be a 3 min taking
query/xact), nothing will be cleaned up during that time. If you have a
few hundred temp tables created per sec, you'll be in trouble even
then. Not to speak of the case where you have queries taking hours (say
a backup).

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire
On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
 wrote:
> On 08/22/2016 10:32 PM, Robert Haas wrote:
>>
>>
>> ...
>>
>> 1. The number of tables for which we would need to add a duplicate,
>> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
>> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
>> And the backend changes needed so that we used the unlogged copy for
>> temp tables and the permanent copy for regular tables is probably
>> really large.
>>
>> 2. You can't write to unlogged tables on standby servers, so this
>> doesn't help solve the problem of wanting to use temporary tables on
>> standbys.
>>
>> 3. While it makes creating temporary tables a lighter-weight
>> operation, because you no longer need to write WAL for the catalog
>> entries, there's probably still substantially more overhead than just
>> stuffing them in backend-local RAM.  So the performance benefits are
>> probably fairly modest.
>>
>> Overall I feel like the development effort that it would take to make
>> this work would almost certainly be better-expended elsewhere.  But of
>> course I'm not in charge of how people who work for other companies
>> spend their time...
>>
>
> Could someone please explain how the unlogged tables are supposed to fix the
> catalog bloat problem, as stated in the initial patch submission? We'd still
> need to insert/delete the catalog rows when creating/dropping the temporary
> tables, causing the bloat. Or is there something I'm missing?

Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Perhaps reserving a worker or N to run only on catalog schemas?

That'd be far simpler.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Tomas Vondra



On 08/22/2016 10:32 PM, Robert Haas wrote:


...

1. The number of tables for which we would need to add a duplicate,
unlogged table is formidable.  You need pg_attribute, pg_attrdef,
pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
And the backend changes needed so that we used the unlogged copy for
temp tables and the permanent copy for regular tables is probably
really large.

2. You can't write to unlogged tables on standby servers, so this
doesn't help solve the problem of wanting to use temporary tables on
standbys.

3. While it makes creating temporary tables a lighter-weight
operation, because you no longer need to write WAL for the catalog
entries, there's probably still substantially more overhead than just
stuffing them in backend-local RAM.  So the performance benefits are
probably fairly modest.

Overall I feel like the development effort that it would take to make
this work would almost certainly be better-expended elsewhere.  But of
course I'm not in charge of how people who work for other companies
spend their time...



Could someone please explain how the unlogged tables are supposed to fix 
the catalog bloat problem, as stated in the initial patch submission? 
We'd still need to insert/delete the catalog rows when creating/dropping 
the temporary tables, causing the bloat. Or is there something I'm missing?



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Michael Paquier
On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila  wrote:
> On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
>  wrote:
>> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada  
>> wrote:
>>> As for PoC, I implemented parallel vacuum so that each worker
>>> processes both 1 and 2 phases for particular block range.
>>> Suppose we vacuum 1000 blocks table with 4 workers, each worker
>>> processes 250 consecutive blocks in phase 1 and then reclaims dead
>>> tuples from heap and indexes (phase 2).
>>
>> So each worker is assigned a range of blocks, and processes them in
>> parallel? This does not sound good performance-wise. I recall Robert's and
>> Amit's emails on the matter for sequential scan saying that this would suck
>> performance out, particularly for rotating disks.
>>
>
> The implementation in the patch is the same as we had initially thought of for
> sequential scan, but it turned out that it is not a good way to do it, because
> it can lead to an inappropriate balance of work among workers.  Suppose
> one worker finishes its range early; it won't be able to take on more work.

Ah, so it was the reason. Thanks for confirming my doubts on what is proposed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-23 Thread Jim Nasby

On 8/23/16 3:34 PM, Martín Marqués wrote:

I found quite a few other objects we should be checking as well, but
this will add some duplication to the tests, as I'd just copy (with
minor changes) what's in src/bin/pg_dump/t/002_pg_dump.pl

I can't think of a way to avoid this duplication, not that it really
hurts. We would have to make sure that any new objects added to one
test, if needed, are added to the other (that's a bit cumbersome).


At one point I had some code that understood what object names (i.e., 
AGGREGATE, TABLE, etc.) went with what catalog tables, which ones lived 
inside a schema (as opposed to globally), and which ones were 
shared/global (cross-database). I think I needed this for some automatic 
handling of comments, but it's been a while. Maybe something like that 
would help reduce the duplication...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dump/restore doesn't preserve row ordering?

2016-08-23 Thread Andres Freund
On 2016-08-23 17:22:03 -0400, Tom Lane wrote:
> I happened to notice, while experimenting with the data set used
> in the SPGIST-for-inet thread, that loading the supplied pg_dump
> script and immediately dumping it does not reproduce the row order
> appearing in the original dump script.  I thought maybe this had
> something to do with the heap_multi_insert() optimization that
> COPY uses nowadays, but disabling that didn't change it.  Further
> experimentation says it's been like that since 8.4; 8.3 is the
> last version that reproduces the source row order in this test.

That's
http://archives.postgresql.org/message-id/CAE2gYzxv8YKEd4O%2B9HUYuQ%3DQMH4pwt9n9cmU-OchV-%3DN8Q7yXQ%40mail.gmail.com
?
> I can't immediately think of a reason for this.  In everyday
> updates you could theorize about effects like autovacuum
> asynchronously updating the FSM, but surely the FSM should have no
> impact on where COPY puts stuff when loading into an empty table.

It seems possible that a larger row didn't fit on a page anymore, then
later, when a new page is needed for a smaller row, the earlier page
is found again.  Due to RelationGetBufferForTuple() updating the fsm
when an old target buffer is present:

	/*
	 * Update FSM as to condition of this page, and ask for another page
	 * to try.
	 */
	targetBlock = RecordAndGetPageWithFreeSpace(relation,
	                                            targetBlock,
	                                            pageFreeSpace,
	                                            len + saveFreeSpace);
that looks like it's even possible without a concurrent autovacuum.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Remove superuser() checks from pgstattuple

2016-08-23 Thread Stephen Frost
Greetings,

Attached is a rebased and updated patch to remove the explicit
superuser() checks from the contrib extension pgstattuple, in favor of
using the GRANT system to control access, and give the admin flexibility
to GRANT access to these functions without having to write wrapper
functions, similar to what has been done with the backend functions.

This, naturally, REVOKEs EXECUTE from public on installation for these
functions.
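
As a rough illustration of the intended usage once EXECUTE is revoked from
PUBLIC (the role name below is invented, and only a couple of the extension's
functions are shown):

CREATE ROLE monitor NOLOGIN;
GRANT EXECUTE ON FUNCTION pgstattuple(regclass) TO monitor;
GRANT EXECUTE ON FUNCTION pgstatindex(regclass) TO monitor;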

Prior discussion was on this thread:

20160408214511.gq10...@tamriel.snowman.net

but it seemed prudent to open a new thread, to avoid anyone missing that
this isn't about default roles (which was the subject of that thread).

Thanks!

Stephen
From b4f04c81c88c126c470a7afc74ea9e4f860be412 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Tue, 23 Aug 2016 17:13:09 -0400
Subject: [PATCH] Remove superuser checks in pgstattuple

Now that we track initial privileges on extension objects and changes to
those permissions, we can drop the superuser() checks from the various
functions which are part of the pgstattuple extension.

Since a pg_upgrade will preserve the version of the extension which
existed prior to the upgrade, we can't simply modify the existing
functions but instead need to create new functions which remove the
checks and update the SQL-level functions to use the new functions
(and to REVOKE EXECUTE rights on those functions from PUBLIC).

Approach suggested by Noah.
---
 contrib/pgstattuple/Makefile  |   7 +-
 contrib/pgstattuple/pgstatapprox.c|  35 ++--
 contrib/pgstattuple/pgstatindex.c | 108 +++--
 contrib/pgstattuple/pgstattuple--1.4--1.5.sql | 111 ++
 contrib/pgstattuple/pgstattuple--1.4.sql  |  95 --
 contrib/pgstattuple/pgstattuple--1.5.sql  | 111 ++
 contrib/pgstattuple/pgstattuple.c |  36 +
 contrib/pgstattuple/pgstattuple.control   |   2 +-
 8 files changed, 395 insertions(+), 110 deletions(-)
 create mode 100644 contrib/pgstattuple/pgstattuple--1.4--1.5.sql
 delete mode 100644 contrib/pgstattuple/pgstattuple--1.4.sql
 create mode 100644 contrib/pgstattuple/pgstattuple--1.5.sql

diff --git a/contrib/pgstattuple/Makefile b/contrib/pgstattuple/Makefile
index e732680..dc5ddba 100644
--- a/contrib/pgstattuple/Makefile
+++ b/contrib/pgstattuple/Makefile
@@ -4,9 +4,10 @@ MODULE_big	= pgstattuple
 OBJS		= pgstattuple.o pgstatindex.o pgstatapprox.o $(WIN32RES)
 
 EXTENSION = pgstattuple
-DATA = pgstattuple--1.4.sql pgstattuple--1.3--1.4.sql \
-	pgstattuple--1.2--1.3.sql pgstattuple--1.1--1.2.sql \
-	pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
+DATA = pgstattuple--1.5.sql pgstattuple--1.4--1.5.sql \
+	pgstattuple--1.3--1.4.sql pgstattuple--1.2--1.3.sql \
+	pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql \
+	pgstattuple--unpackaged--1.0.sql
 PGFILEDESC = "pgstattuple - tuple-level statistics"
 
 REGRESS = pgstattuple
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index a49ff54..11bae14 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -29,6 +29,9 @@
 #include "commands/vacuum.h"
 
 PG_FUNCTION_INFO_V1(pgstattuple_approx);
+PG_FUNCTION_INFO_V1(pgstattuple_approx_v1_5);
+
+Datum pgstattuple_approx_internal(Oid relid, FunctionCallInfo fcinfo);
 
 typedef struct output_type
 {
@@ -209,6 +212,33 @@ Datum
 pgstattuple_approx(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to use pgstattuple functions"))));
+
+	PG_RETURN_DATUM(pgstattuple_approx_internal(relid, fcinfo));
+}
+
+/*
+ * As of pgstattuple version 1.5, we no longer need to check if the user
+ * is a superuser because we REVOKE EXECUTE on the function from PUBLIC.
+ * Users can then grant access to it based on their policies.
+ *
+ * Otherwise identical to pgstattuple_approx (above).
+ */
+Datum
+pgstattuple_approx_v1_5(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+
+	PG_RETURN_DATUM(pgstattuple_approx_internal(relid, fcinfo));
+}
+
+Datum
+pgstattuple_approx_internal(Oid relid, FunctionCallInfo fcinfo)
+{
 	Relation	rel;
 	output_type stat = {0};
 	TupleDesc	tupdesc;
@@ -217,11 +247,6 @@ pgstattuple_approx(PG_FUNCTION_ARGS)
 	HeapTuple	ret;
 	int			i = 0;
 
-	if (!superuser())
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 (errmsg("must be superuser to use pgstattuple functions"))));
-
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 		elog(ERROR, "return type must be a row type");
 
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 6084589..753e52d 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -54,6 +54,14 @@ PG_FUNCTION_INFO_V1(pg_relpages);
 

[HACKERS] dump/restore doesn't preserve row ordering?

2016-08-23 Thread Tom Lane
I happened to notice, while experimenting with the data set used
in the SPGIST-for-inet thread, that loading the supplied pg_dump
script and immediately dumping it does not reproduce the row order
appearing in the original dump script.  I thought maybe this had
something to do with the heap_multi_insert() optimization that
COPY uses nowadays, but disabling that didn't change it.  Further
experimentation says it's been like that since 8.4; 8.3 is the
last version that reproduces the source row order in this test.

I can't immediately think of a reason for this.  In everyday
updates you could theorize about effects like autovacuum
asynchronously updating the FSM, but surely the FSM should have no
impact on where COPY puts stuff when loading into an empty table.

I know we've had complaints in the past about dump/reload failing
to preserve row order; that's the reason why pg_dump takes care
to turn off synchronize_seqscans.  And it seems bad from a testing
standpoint too.  So this bothers me.  But I have no idea what's
causing it.  Does this ring any bells for anyone?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-23 Thread Jeff Janes
Hi Amit,

Thanks for working on this.

When building with --enable-cassert, I get a compiler warning:

hash.c: In function 'hashbucketcleanup':
hash.c:722: warning: 'new_bucket' may be used uninitialized in this function


After an intentionally created crash, I get an Assert triggering:

TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] &
(1<<((bitmapbit)%32", File: "hashovfl.c", Line: 553)

freep[0] is zero and bitmapbit is 16.

With this backtrace:

(gdb) bt
#0  0x003838c325e5 in raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x003838c33dc5 in abort () at abort.c:92
#2  0x0081a8fd in ExceptionalCondition (conditionName=, errorType=, fileName=,
lineNumber=) at assert.c:54
#3  0x004a4199 in _hash_freeovflpage (rel=0x7f3f745d86b8,
bucketbuf=198, ovflbuf=199, wbuf=198, itups=0x7ffc258fa090,
itup_offsets=0x126e8a8,
tups_size=0x7ffc258f93d0, nitups=70, bstrategy=0x12ba320) at
hashovfl.c:553
#4  0x004a4c32 in _hash_squeezebucket (rel=,
bucket=38, bucket_blkno=56, bucket_buf=198, bstrategy=0x12ba320)
at hashovfl.c:1010
#5  0x004a042a in hashbucketcleanup (rel=0x7f3f745d86b8,
bucket_buf=198, bucket_blkno=56, bstrategy=0x12ba320, maxbucket=96,
highmask=127,
lowmask=63, tuples_removed=0x7ffc258fc1c8,
num_index_tuples=0x7ffc258fc1c0, bucket_has_garbage=0 '\000', delay=1
'\001',
callback=0x5e9bd0 , callback_state=0x126e248) at
hash.c:937
#6  0x004a07e7 in hashbulkdelete (info=0x7ffc258fc2b0, stats=0x0,
callback=0x5e9bd0 , callback_state=0x126e248) at hash.c:580
#7  0x005e98c5 in lazy_vacuum_index (indrel=0x7f3f745d86b8,
stats=0x126ecc0, vacrelstats=0x126e248) at vacuumlazy.c:1599
#8  0x005ea7f9 in lazy_scan_heap (onerel=,
options=, params=0x12ba290, bstrategy=)
at vacuumlazy.c:1291
#9  lazy_vacuum_rel (onerel=, options=, params=0x12ba290, bstrategy=) at vacuumlazy.c:255
#10 0x005e8939 in vacuum_rel (relid=17329, relation=0x7ffc258fcbd0,
options=99, params=0x12ba290) at vacuum.c:1399
#11 0x005e8d01 in vacuum (options=99, relation=0x7ffc258fcbd0,
relid=, params=0x12ba290, va_cols=0x0,
bstrategy=, isTopLevel=1 '\001') at vacuum.c:307
#12 0x006a07f1 in autovacuum_do_vac_analyze () at autovacuum.c:2823
#13 do_autovacuum () at autovacuum.c:2341
#14 0x006a0f9c in AutoVacWorkerMain (argc=,
argv=) at autovacuum.c:1656
#15 0x006a1116 in StartAutoVacWorker () at autovacuum.c:1461
#16 0x006afb00 in StartAutovacuumWorker (postgres_signal_arg=) at postmaster.c:5323
#17 sigusr1_handler (postgres_signal_arg=) at
postmaster.c:5009
#18 
#19 0x003838ce1503 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:82
#20 0x006b0ec0 in ServerLoop (argc=,
argv=) at postmaster.c:1657
#21 PostmasterMain (argc=, argv=)
at postmaster.c:1301
#22 0x00632e88 in main (argc=4, argv=0x11f4d50) at main.c:228

Cheers,

Jeff


Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-23 Thread Bruce Momjian
On Tue, Aug 23, 2016 at 01:58:02PM -0700, Peter Geoghegan wrote:
> On Tue, Aug 23, 2016 at 1:44 PM, Bruce Momjian  wrote:
> >> [Windows]
> >> #clients     on       off
> >> 12        29793     38169
> >> 24        31587     87237
> >> 48        32588     83335
> >> 96        34261     67668
> >
> > This ranges from a 28% to a 97% speed improvement on Windows!  Those are
> > not typos!  This is a game-changer for use of Postgres on Windows for
> > certain workloads!
> 
> While I don't care all that much about performance on windows, it is a
> little sad that it took this long to fix something so simple. Consider
> this exchange, as a further example of our lack of concern here:

Agreed, but we really can only react to the research we are given.  If
no Windows users are digging in to find problems, those problems will
not be found.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-23 Thread Peter Geoghegan
On Tue, Aug 23, 2016 at 1:44 PM, Bruce Momjian  wrote:
>> [Windows]
>> #clients     on       off
>> 12        29793     38169
>> 24        31587     87237
>> 48        32588     83335
>> 96        34261     67668
>
> This ranges from a 28% to a 97% speed improvement on Windows!  Those are
> not typos!  This is a game-changer for use of Postgres on Windows for
> certain workloads!

While I don't care all that much about performance on windows, it is a
little sad that it took this long to fix something so simple. Consider
this exchange, as a further example of our lack of concern here:

https://www.postgresql.org/message-id/30619.1428157...@sss.pgh.pa.us

ISTM that we don't even care about Windows performance to a minimal
degree. Hopefully, the ICU stuff Peter Eisentraut is working on will
level the playing field here a little bit, if only as an accidental
side-effect.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-23 Thread Bruce Momjian
On Tue, Aug 16, 2016 at 11:53:25AM +0200, Magnus Hagander wrote:
> On Fri, Aug 5, 2016 at 12:25 PM, Tsunakawa, Takayuki <
> tsunakawa.ta...@jp.fujitsu.com> wrote:
> 
> > From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> > Yeah, I think I agree.  It would be bad to disable it by default on Unix,
> > because ps(1) is a very standard tool there, but the same argument doesn't
> > hold for Windows.
> 
> It seems that we could reach a consensus.  The patch is attached.  I'll add
> this to the next CommitFest.
>
> > Another route to a solution would be to find a cheaper way to update the
> > process title on Windows ... has anyone looked for alternatives?
> 
> I couldn't find an alternative solution after asking some Windows support
> staff.
> 
> 
> FWIW, I remember this from back in the days when this was written. We kind of
> expected this would be slow already back then, but couldn't find a better way
> at the time either. And back then, I guess there were just enough *other*
> things that were slow with pg-on-windows that it didn't become as obvious.
> 
> So - +1 for changing the defaults as suggested.

I am shocked at how much of a speedup we get by turning off the process
title on Windows:

> From Tsunakawa, Takayuki
> 
> C:\> pgbench -h  -T 30 -c #clients -j 12 -S benchdb
> 
> [Windows]
> #clients     on       off
> 12        29793     38169
> 24        31587     87237
> 48        32588     83335
> 96        34261     67668

This ranges from a 28% to a 97% speed improvement on Windows!  Those are
not typos!  This is a game-changer for use of Postgres on Windows for
certain workloads!
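
Until the default actually changes, the same effect can be had by hand; a
minimal sketch, assuming a superuser session (update_process_title is a
superuser-settable GUC, so ALTER SYSTEM plus a reload is enough):

ALTER SYSTEM SET update_process_title = off;
SELECT pg_reload_conf();
SHOW update_process_title;   -- should now report 'off'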

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index Onlys Scan for expressions

2016-08-23 Thread Vladimir Sitnikov
Hi,

I've tried your indexonlypatch5.patch against REL9_6_BETA3.
Here are some results.

TL;DR:
1) <<select type, upper(vc) from vlsi where type=42 and upper(vc) like
'%ABC%'>> does not support an index-only
scan for the index (type, upper(vc) varchar_pattern_ops).
3) <<(... where type=42 offset 0) where upper_vc like '%ABC%'>> does
trigger an index-only scan. IOS reduces the number of buffers from 977 to 17
and that is impressive.

Can IOS be used for simple query like #1 as well?

Here are the details.

drop table vlsi;
create table vlsi(type numeric, vc varchar(500));
insert into vlsi(type,vc) select round(x/1000),
md5(''||x)||md5(''||x+1)||md5(''||x+2) from generate_series(1,100) as
s(x);
create index type_vc__vlsi on vlsi(type, upper(vc) varchar_pattern_ops);
vacuum analyze vlsi;

0) Smoke test (index only scan works when selecting indexed expression):

explain (analyze, buffers) select type, upper(vc) from vlsi where type=42;

 Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97 rows=971
width=36) (actual time=0.012..0.212 rows=1000 loops=1)
   Index Cond: (type = '42'::numeric)
   Heap Fetches: 0
   Buffers: shared hit=17
 Planning time: 0.112 ms
 Execution time: 0.272 ms

1) When trying to apply "like condition", index only scan does not work.
Note: "buffers hit" becomes 977 instead of 17.

explain (analyze, buffers) select type, upper(vc) from vlsi where type=42
and upper(vc) like '%ABC%';

 Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20
width=36) (actual time=0.069..1.343 rows=23 loops=1)
   Index Cond: (type = '42'::numeric)
   Filter: (upper((vc)::text) ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=939
 Planning time: 0.104 ms
 Execution time: 1.358 ms

Mere "subquery" does not help: still no index-only scan

2) explain (analyze, buffers) select * from (select type, upper(vc)
upper_vc from vlsi where type=42) as x where upper_vc like '%ABC%';

 Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20
width=36) (actual time=0.068..1.344 rows=23 loops=1)
   Index Cond: (type = '42'::numeric)
   Filter: (upper((vc)::text) ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=939
 Planning time: 0.114 ms
 Execution time: 1.357 ms

3) "offset 0" trick does help:
explain (analyze, buffers) select * from (select type, upper(vc) upper_vc
from vlsi where type=42 offset 0) as x where upper_vc like '%ABC%';

 Subquery Scan on x  (cost=0.55..80.11 rows=39 width=36) (actual
time=0.033..0.488 rows=23 loops=1)
   Filter: (x.upper_vc ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=17
   ->  Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97
rows=971 width=36) (actual time=0.015..0.210 rows=1000 loops=1)
 Index Cond: (type = '42'::numeric)
 Heap Fetches: 0
 Buffers: shared hit=17
 Planning time: 0.086 ms
 Execution time: 0.503 ms

Vladimir


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-23 Thread Martín Marqués
Hi,

2016-08-23 16:46 GMT-03:00 Martín Marqués :
>
> I will add tests for sequence and functions as you mention and test again.
>
> Then I'll check if other tests should be added as well.

I found quite a few other objects we should be checking as well, but
this will add some duplication to the tests, as I'd just copy (with
minor changes) what's in src/bin/pg_dump/t/002_pg_dump.pl

I can't think of a way to avoid this duplication, not that it really
hurts. We would have to make sure that any new objects added to one
test, if needed, are added to the other (that's a bit cumbersome).

Other things to check:

CREATE AGGREGATE
CREATE DOMAIN
CREATE FUNCTION
CREATE TYPE
CREATE MATERIALIZED VIEW
CREATE POLICY

Maybe even CREATE INDEX over a table created in the schema.

Also, ACLs have to be tested against objects in the schema.

I hope I didn't miss anything there.
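
Purely as illustration (object names invented here), the extra objects the
tests would create inside regress_pg_dump_schema, outside the extension,
could look something like:

CREATE DOMAIN regress_pg_dump_schema.test_domain AS text CHECK (VALUE <> '');
CREATE FUNCTION regress_pg_dump_schema.test_func(i int) RETURNS int
    AS 'SELECT i + 1' LANGUAGE sql;
CREATE TYPE regress_pg_dump_schema.test_type AS (x int, y int);
CREATE AGGREGATE regress_pg_dump_schema.test_agg(int) (sfunc = int4pl, stype = int);
CREATE MATERIALIZED VIEW regress_pg_dump_schema.test_matview AS SELECT 1 AS one;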

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-23 Thread Martín Marqués
Hi Michael,

2016-08-23 5:02 GMT-03:00 Michael Paquier :
> On Sat, Aug 13, 2016 at 6:58 AM, Martín Marqués  
> wrote:
>> I believe the fix will be simple after the back and forth mails with
>> Michael, Stephen and Tom. I will work on that later, but preferred to
>> have the tests the show the problem which will also make testing the fix
>> easier.
>>
>> Thoughts?
>
> It seems to me that we are going to need a bit more coverage for more
> object types depending on the code paths that are being changed. For
> example, sequences or functions should be checked for as well, and not
> only tables. By the way, do you need help to build a patch or should I
> jump in?

I wanted to test what I had in mind with one object, and then see if
any replication is needed for other objects.

I was struggling over the last few days because, from what I was reading, my
patched pg_dump.c should have worked as expected and dumped the tables not
created by the test_pg_dump extension but located inside the schema
regress_pg_dump_schema.

Today I decided to go over the test I wrote, and found a bug there,
reason why I couldn't get a successful make check.

Here go 2 patches. One is a fix for the test I sent earlier. The other
implements the idea Tom proposed, using the dump_contains machinery that
Stephen committed in 9.6.

So far I've checked that it fixes the dumpable flag for tables, but I think
it should work for all other objects as well, as all this patch does
is have checkExtensionMembership run at the end of
selectDumpableNamespace, leaving dump_contains untouched.

Checks pass ok.

I will add tests for sequence and functions as you mention and test again.

Then I'll check if other tests should be added as well.

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From 6cbba9e87d1733ccb03d3d705e135e607be63cba Mon Sep 17 00:00:00 2001
From: Martin 
Date: Tue, 23 Aug 2016 16:17:38 -0300
Subject: [PATCH 1/2] New tests for pg_dump which make sure that tables from a
 schema depending on an extension will get dumped when they don't depend on
 the extension.

The only objects we shouldn't dump are the ones created by the extension.

We will provide a patch that fixes this bug.
---
 src/test/modules/test_pg_dump/t/001_base.pl| 25 ++
 .../modules/test_pg_dump/test_pg_dump--1.0.sql |  9 
 2 files changed, 34 insertions(+)

diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
index fb4f573..7bb92e7 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -283,6 +283,31 @@ my %tests = (
 			schema_only=> 1,
 			section_pre_data   => 1,
 			section_post_data  => 1, }, },
+	'CREATE TABLE regress_test_schema_table' => {
+		create_order => 3,
+		create_sql   => 'CREATE TABLE regress_pg_dump_schema.test_schema_table (
+		   col1 serial primary key,
+		   CHECK (col1 <= 1000)
+	   );',
+		regexp => qr/^
+			\QCREATE TABLE test_schema_table (\E
+			\n\s+\Qcol1 integer NOT NULL,\E
+			\n\s+\QCONSTRAINT test_schema_table_col1_check CHECK \E
+			\Q((col1 <= 1000))\E
+			\n\);/xm,
+		like => {
+			binary_upgrade => 1,
+			clean  => 1,
+			clean_if_exists=> 1,
+			createdb   => 1,
+			defaults   => 1,
+			no_privs   => 1,
+			no_owner   => 1,
+			schema_only=> 1,
+			section_pre_data   => 1, },
+		unlike => {
+			pg_dumpall_globals => 1,
+			section_post_data  => 1, }, },
 	'CREATE ACCESS METHOD regress_test_am' => {
 		regexp => qr/^
 			\QCREATE ACCESS METHOD regress_test_am TYPE INDEX HANDLER bthandler;\E
diff --git a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
index c2fe90d..3f88e6c 100644
--- a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
+++ b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
@@ -10,6 +10,15 @@ CREATE TABLE regress_pg_dump_table (
 
 CREATE SEQUENCE regress_pg_dump_seq;
 
+-- We want to test that schemas and objects created in the schema by the
+-- extension are not dumped, yet other objects created afterwards will be
+-- dumped.
+CREATE SCHEMA regress_pg_dump_schema
+   CREATE TABLE regress_pg_dump_schema_table (
+  col1 serial,
+  col2 int
+	);
+
 GRANT USAGE ON regress_pg_dump_seq TO regress_dump_test_role;
 
 GRANT SELECT ON regress_pg_dump_table TO regress_dump_test_role;
-- 
2.5.5

From 0dfe2224fae86ab2b6f8da4fc25b96fb13ca0f6c Mon Sep 17 00:00:00 2001
From: Martin 
Date: Tue, 23 Aug 2016 16:23:44 -0300
Subject: [PATCH 2/2] This patch fixes a bug reported against pg_dump, and
 makes pg_dump dump the objects contained in schemas depending on an
 extension.

Schema will not be dumped, as it will be created by the extension, but
any other tables created outside the extension's SQL 

Re: [HACKERS] Question about MVCC-unsafe commands

2016-08-23 Thread Antonin Houska
Robert Haas  wrote:
> On Tue, Aug 23, 2016 at 10:10 AM, Antonin Houska  wrote:

> CLUSTER preserves xmin/xmax when rewriting, but ALTER TABLE does not.

Indeed, CLUSTER was the command I picked for my experiments. I didn't expect
such a subtle difference. Thanks for the answer.
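
A quick way to see the difference Robert describes (sketch only; table and
column names are made up):

CREATE TABLE rewrite_demo (a int PRIMARY KEY, b text);
INSERT INTO rewrite_demo VALUES (1, 'x');

SELECT xmin FROM rewrite_demo;                        -- note the value
CLUSTER rewrite_demo USING rewrite_demo_pkey;
SELECT xmin FROM rewrite_demo;                        -- unchanged: CLUSTER preserves xmin/xmax

ALTER TABLE rewrite_demo ALTER COLUMN a TYPE bigint;  -- forces a table rewrite
SELECT xmin FROM rewrite_demo;                        -- changed: the rewrite assigns a new xmin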

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-23 Thread Tom Lane
I've pushed this patch with mostly (not entirely) cosmetic adjustments.
I still think it'd be worth looking into why the produced index is larger
than the GiST equivalent, and for that matter exactly why the GiST
equivalent is so much slower to search.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer  wrote:
> Also fine by me. You're right, keep it simple. It means the potential set of
> values isn't discoverable the same way, but ... meh. Using it usefully means
> reading the docs anyway.
>
> The remaining 2 patches of interest are attached - txid_status() and
> txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().
>
> Now I'd best stop pretending I'm in a sensible timezone.

I reviewed this version some more and found some more problems.

+uint32 xid_epoch = (uint32)(xid_with_epoch >>32);
+TransactionId xid = (TransactionId)(xid_with_epoch);

I think this is not project style.  In particular, I think that the
first one needs a space after the cast and another space before the
32; and I think the second one has an unnecessary set of parentheses
and needs a space added.

+/*
+ * Underlying implementation of txid_status, which is mapped to an enum in
+ * system_views.sql.
+ */

Not any more...

+ errmsg("transaction ID "UINT64_FORMAT" is an invalid xid",
+xid_with_epoch)));

Spacing.

+if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
+status = gettext_noop("in progress");
+else if (TransactionIdDidCommit(xid))
+status = gettext_noop("committed");
+else if (TransactionIdDidAbort(xid))
+status = gettext_noop("aborted");
+else
+/* shouldn't happen */
+ereport(ERROR,
+(errmsg_internal("unable to determine commit status of
xid "UINT64_FORMAT, xid)));

Maybe I'm all wet here, but it seems like there might be a problem
here if the XID is older than the CLOG truncation point but less than
one epoch old. get_xid_in_recent_past only guarantees that the
transaction is less than one epoch old, not that we still have CLOG
data for it.  And there's nothing to keep NextXID from advancing under
us, so if somebody asks about a transaction that's just under 2^32
transactions old, then get_xid_in_recent_past could say it's valid,
then NextXID advances and we look up the XID extracted from the txid
and get the status of the new transaction instead of the old one!
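
For readers following along, a rough SQL illustration of the epoch/xid split
performed by the shift and cast in the hunk quoted above (illustration only;
the column names on the right are mine):

SELECT txid_current()              AS txid_with_epoch,
       txid_current() >> 32        AS epoch,
       txid_current() & 4294967295 AS xid32;  -- low 32 bits = on-disk TransactionId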

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-08-23 Thread Pavel Stehule
Hi

2016-08-19 10:58 GMT+02:00 Pavel Stehule :

> Hi
>
> I am sending implementation of xmltable function. The code should to have
> near to final quality and it is available for testing.
>
> I invite any help with documentation and testing.
>

new update - the work with nodes is much more correct now.

Regards

Pavel


> Regards
>
> Pavel
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 169a385..a6334b6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10099,6 +10099,47 @@ SELECT xmlroot(xmlparse(document 'abc'),
 

 
+   
+xmltable
+
+   
+xmltable
+   
+
+
+xmltable(xmlnamespaces(namespace uri AS namespace name|DEFAULT namespace uri , ...) rowexpr PASSING BY REF xml BY REF COLUMNS name type PATH columnexpr DEFAULT expr NOT NULL|NULL , ...)
+
+
+
+  The xmltable function produces a table based on the passed XML value.
+
+
+
+
+
+   
+

 xmlagg
 
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 743e7d6..7abc367 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -189,6 +189,9 @@ static Datum ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 static Datum ExecEvalGroupingFuncExpr(GroupingFuncExprState *gstate,
 		 ExprContext *econtext,
 		 bool *isNull, ExprDoneCond *isDone);
+static Datum ExecEvalTableExpr(TableExprState *tstate,
+		ExprContext *econtext,
+		bool *isnull, ExprDoneCond *isDone);
 
 
 /* 
@@ -4500,6 +4503,218 @@ ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 	return 0;	/* keep compiler quiet */
 }
 
+/* 
+ *		ExecEvalTableExpr
+ *
+ * 
+ */
+
+#define XMLTABLE_DEFAULT_NAMESPACE			"pgdefxmlns"
+
+static Datum
+execEvalTableExpr(TableExprState *tstate,
+		ExprContext *econtext,
+		bool *isNull, ExprDoneCond *isDone)
+{
+	TupleDesc tupdesc;
+	HeapTuple		tuple;
+	HeapTupleHeader		result;
+	int	i;
+	Datum		*values;
+	bool		*nulls;
+	Datum	value;
+	bool	isnull;
+	xmltype		   *xmlval;
+	text		   *row_path;
+
+	tupdesc = tstate->tupdesc;
+
+	if (tstate->firstRow)
+	{
+		ListCell	*ns;
+
+		/* Evaluate expression first */
+		value = ExecEvalExpr(tstate->expr, econtext, &isnull, NULL);
+		if (isnull)
+		{
+			*isDone = ExprSingleResult;
+			*isNull = true;
+			return (Datum) 0;
+		}
+		xmlval = DatumGetXmlP(value);
+
+		value = ExecEvalExpr(tstate->row_path_expr, econtext, &isnull, NULL);
+		if (isnull)
+ereport(ERROR,
+		(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+	  errmsg("row query must not be null")));
+		row_path = DatumGetTextP(value);
+
+		Assert(tstate->xmltableCxt == NULL);
+		tstate->xmltableCxt = initXmlTableContext(xmlval,
+	tstate->used_dns ?
+		XMLTABLE_DEFAULT_NAMESPACE : NULL,
+	tstate->ncols,
+			  tstate->in_functions,
+			  tstate->typioparams,
+	econtext->ecxt_per_query_memory);
+
+		foreach(ns, tstate->namespaces)
+		{
+			Node	   *n = (Node *) lfirst(ns);
+			ExprState  *expr;
+			char	   *ns_name;
+			char	   *ns_uri;
+
+			if (IsA(n, NamedArgExpr))
+			{
+NamedArgExpr *na = (NamedArgExpr *) n;
+
+expr = (ExprState *) na->arg;
+ns_name = na->name;
+			}
+			else
+			{
+expr = (ExprState *) n;
+ns_name = NULL;
+			}
+
+			value = ExecEvalExpr((ExprState *) expr, econtext, &isnull, NULL);
+			if (isnull)
+ereport(ERROR,
+		(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+	  errmsg("namespace uri must not be null")));
+			ns_uri = text_to_cstring(DatumGetTextP(value));
+
+			XmlTableSetNs(tstate->xmltableCxt, ns_name, ns_uri);
+		}
+
+		XmlTableSetRowPath(tstate->xmltableCxt, row_path);
+
+		/* Path caclulation */
+		for (i = 0; i < tstate->ncols; i++)
+		{
+			char *col_path;
+
+			if (tstate->col_path_expr[i] != NULL)
+			{
+value = ExecEvalExpr(tstate->col_path_expr[i], econtext, &isnull, NULL);
+if (isnull)
+	ereport(ERROR,
+			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+		  errmsg("column path for column \"%s\" must not be null",
+			  NameStr(tupdesc->attrs[i]->attname;
+col_path = text_to_cstring(DatumGetTextP(value));
+			}
+			else
+col_path = NameStr(tupdesc->attrs[i]->attname);
+
+			XmlTableSetColumnPath(tstate->xmltableCxt, i,
+			tupdesc->attrs[i]->atttypid, col_path);
+		}
+		tstate->firstRow = false;
+	}
+
+	values = tstate->values;
+	nulls = tstate->nulls;
+
+	if (XmlTableFetchRow(tstate->xmltableCxt))
+	{
+		if (tstate->ncols > 0)
+		{
+			for (i = 0; i < tstate->ncols; i++)
+			{
+if (i != tstate->for_ordinality_col - 1)
+{
+	values[i] = XmlTableGetValue(tstate->xmltableCxt, i,
+  tupdesc->attrs[i]->atttypid,
+  tupdesc->attrs[i]->atttypmod,
+			  &isnull);
+	if (isnull && 

Re: [HACKERS] Changed SRF in targetlist handling

2016-08-23 Thread Andres Freund
On 2016-08-17 17:41:28 -0700, Andres Freund wrote:
> a) Allow to avoid using a tuplestore for SRF_PERCALL SRFs in ROWS FROM -
>otherwise our performance would regress noticeably in some cases.

To demonstrate the problem:

master:
=# COPY (SELECT generate_series(1, 5000)) TO '/dev/null';
COPY 5000
Time: 6859.830 ms
=# COPY (SELECT * FROM generate_series(1, 5000)) TO '/dev/null';
COPY 5000
Time: 11314.507 ms

getting rid of the materialization indeed fixes the problem:

dev:
=# COPY (SELECT generate_series(1, 5000)) TO
'/dev/null';
COPY 5000
Time: 5757.547 ms

=# COPY (SELECT * FROM generate_series(1, 5000)) TO
'/dev/null';
COPY 5000
Time: 5842.524 ms

I've currently implemented this by having nodeFunctionscan.c store
enough state in FunctionScanPerFuncState to continue the ValuePerCall
protocol.  That all seems to work well, without big problems.

The open issue here is whether / how we want to deal with
EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD. Currently that, with some added
complications, is implemented in nodeFunctionscan.c itself. But for
ValuePerCall SRFs that doesn't directly work anymore.

ISTM that the easiest way here is actually to rip out support for
EXEC_FLAG_REWIND/EXEC_FLAG_BACKWARD from nodeFunctionscan.c. If the plan
requires that, the planner will slap a Material node on top, which will
even be more efficient when ROWS FROM with multiple SRFs, or WITH
ORDINALITY, is used.  Alternatively we can continue to create a
tuplestore for ValuePerCall when eflags indicates that's required. But
personally I don't see that as an advantageous course.
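
For reference, the constructs mentioned above look like this (illustration
only; whether the planner actually adds a Material node depends on the
surrounding plan):

SELECT *
FROM ROWS FROM (generate_series(1, 3), generate_series(1, 2))
     WITH ORDINALITY AS t(a, b, ord);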

Comments?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dsm_unpin_segment

2016-08-23 Thread Robert Haas
On Mon, Aug 22, 2016 at 10:04 AM, Amit Kapila  wrote:
> On Mon, Aug 22, 2016 at 6:06 PM, Thomas Munro
>  wrote:
>> On Tue, Aug 23, 2016 at 12:07 AM, Amit Kapila  
>> wrote:
>>> + int control_slot = -1;
>>> ...
>>> + if (control_slot == -1)
>>> + elog(ERROR, "cannot unpin unknown segment handle");
>>>
>>> Isn't it better to use INVALID_CONTROL_SLOT for control_slot and use
>>> datatype as uint32 (same is used for dsm_segment->control_slot and
>>> nitems)?
>>
>> Yes, it is better.  New version attached.
>>
>
> This version of patch looks good to me.  I have marked it as Ready For
> Committer.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 2:33 PM, Bruce Momjian  wrote:
> Well, the patch was updated several times, and the final version was not
> objected to until you objected.

It is not clear what you mean by "the final version", because you
posted two different final versions.  I don't see a clear vote from
anybody in favor of either of those things, and Peter's replies seem
to me to suggest that he does not support either of your proposals.
So I am not sure that I would agree with the statement that nobody
objected, but in any case there certainly wasn't a consensus in favor
of either change.

Also, the subject of this thread is "wrong suffix for pg_size_pretty",
which may not have tipped people off to the fact that you were
proposing to replace "kB" with "KB" everywhere.  Even after reading
your email, I didn't realize that you were proposing that until I
actually opened the patch and looked at it.  Such widespread changes
tend to draw objections, and IMHO shouldn't be made unless it's
abundantly clear that most people are in favor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slowness of extended protocol

2016-08-23 Thread Shay Rojansky
Just a note from me - I also agree this thread evolved (or rather devolved)
in a rather unproductive and strange way.

One important note that came out, though, is that adding a new client
message does have a backwards compatibility issue - intelligent proxies
such as pgbouncer/pgpool will probably break once they see an unknown
client message. Even if they don't, they may miss potentially important
information being transmitted to the server. Whether this is a deal-breaker
for introducing new messages is another matter (I personally don't think
so).

On Tue, Aug 23, 2016 at 4:54 PM, Andres Freund  wrote:

> On 2016-08-23 11:42:53 -0400, Robert Haas wrote:
> > I think this could possibly be done, but it seems a lot better to me
> > to just bite the bullet and add a new protocol message.  That was
> > proposed by Tom Lane on July 31st and I think it's still by far the
> > best and easiest idea proposed, except I think we could introduce it
> > without waiting for a bigger rework of the protocol if we design the
> > libpq APIs carefully.  Most of the rest of this thread seems to have
> > devolved into an argument about whether this is really necessary,
> > which IMHO is a pretty silly argument, instead of focusing on how it
> > might be done, which I think would be a much more productive
> > conversation.
>
> I agree about the odd course of the further discussion, especially the
> tone was rather odd.  But I do think it's valuable to think about a path
> that fixes the issue without requiring version-dependent adaptations in
> all client drivers.
>
> Greetings,
>
> Andres Freund
>


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Tue, Aug 23, 2016 at 02:31:26PM -0400, Robert Haas wrote:
> > On Tue, Aug 23, 2016 at 1:57 PM, Bruce Momjian  wrote:
> > > That's why I was asking you to comment on the final patch, which I am
> > > planning to apply to PG 10 soon.
> > 
> > Oh, OK.  I didn't understand that that was what you are asking.  I
> > don't find either of your proposed final patches to be an improvement
> > over the status quo.  I think the selection of kB rather than KB was a
> > deliberate decision by Peter Eisentraut, and I don't think changing
> > our practice now buys us anything meaningful.  Your first patch
> > introduces an odd wart into the GUC mechanism, with a strange wording
> > for the message, to fix something that's not really broken in the
> > first place.  Your second one alters kB to KB in zillions of places
> > all over the code base, and I am quite sure that there is no consensus
> > to do anything of that sort.
> 
> Well, the patch was updated several times, and the final version was not
> objected to until you objected.  Does anyone else want to weigh in?

I think this should be left alone -- it looks more like pointless
tinkering than something useful.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Bruce Momjian
On Tue, Aug 23, 2016 at 11:35:35AM -0700, Andres Freund wrote:
> On 2016-08-23 14:33:15 -0400, Bruce Momjian wrote:
> > On Tue, Aug 23, 2016 at 02:31:26PM -0400, Robert Haas wrote:
> > > On Tue, Aug 23, 2016 at 1:57 PM, Bruce Momjian  wrote:
> > > > That's why I was asking you to comment on the final patch, which I am
> > > > planning to apply to PG 10 soon.
> > > 
> > > Oh, OK.  I didn't understand that that was what you are asking.  I
> > > don't find either of your proposed final patches to be an improvement
> > > over the status quo.  I think the selection of kB rather than KB was a
> > > deliberate decision by Peter Eisentraut, and I don't think changing
> > > our practice now buys us anything meaningful.  Your first patch
> > > introduces an odd wart into the GUC mechanism, with a strange wording
> > > for the message, to fix something that's not really broken in the
> > > first place.  Your second one alters kB to KB in zillions of places
> > > all over the code base, and I am quite sure that there is no consensus
> > > to do anything of that sort.
> > 
> > Well, the patch was updated several times, and the final version was not
> > objected to until you objected.  Does anyone else want to weigh in?
> 
> To me the change doesn't seem beneficial. Noise aside, the added
> whitespace even seems detrimental to me.  But I also don't really
> care much.

Well, right now we are inconsistent, so we should decide on the spacing
and make it consistent.  I think we are consistent on using 'k' instead
of 'K'.  There were at least eight people on this thread and when no one
objected to my final patch, I thought people wanted it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Andres Freund
On 2016-08-23 14:33:15 -0400, Bruce Momjian wrote:
> On Tue, Aug 23, 2016 at 02:31:26PM -0400, Robert Haas wrote:
> > On Tue, Aug 23, 2016 at 1:57 PM, Bruce Momjian  wrote:
> > > That's why I was asking you to comment on the final patch, which I am
> > > planning to apply to PG 10 soon.
> > 
> > Oh, OK.  I didn't understand that that was what you are asking.  I
> > don't find either of your proposed final patches to be an improvement
> > over the status quo.  I think the selection of kB rather than KB was a
> > deliberate decision by Peter Eisentraut, and I don't think changing
> > our practice now buys us anything meaningful.  Your first patch
> > introduces an odd wart into the GUC mechanism, with a strange wording
> > for the message, to fix something that's not really broken in the
> > first place.  Your second one alters kB to KB in zillions of places
> > all over the code base, and I am quite sure that there is no consensus
> > to do anything of that sort.
> 
> Well, the patch was updated several times, and the final version was not
> objected to until you objected.  Does anyone else want to weigh in?

To me the change doesn't seem beneficial. Noise aside, the added
whitespace even seems detrimental to me.  But I also don't really
care much.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Bruce Momjian
On Tue, Aug 23, 2016 at 02:31:26PM -0400, Robert Haas wrote:
> On Tue, Aug 23, 2016 at 1:57 PM, Bruce Momjian  wrote:
> > That's why I was asking you to comment on the final patch, which I am
> > planning to apply to PG 10 soon.
> 
> Oh, OK.  I didn't understand that that was what you are asking.  I
> don't find either of your proposed final patches to be an improvement
> over the status quo.  I think the selection of kB rather than KB was a
> deliberate decision by Peter Eisentraut, and I don't think changing
> our practice now buys us anything meaningful.  Your first patch
> introduces an odd wart into the GUC mechanism, with a strange wording
> for the message, to fix something that's not really broken in the
> first place.  Your second one alters kB to KB in zillions of places
> all over the code base, and I am quite sure that there is no consensus
> to do anything of that sort.

Well, the patch was updated several times, and the final version was not
objected to until you objected.  Does anyone else want to weigh in?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 1:57 PM, Bruce Momjian  wrote:
> That's why I was asking you to comment on the final patch, which I am
> planning to apply to PG 10 soon.

Oh, OK.  I didn't understand that that was what you are asking.  I
don't find either of your proposed final patches to be an improvement
over the status quo.  I think the selection of kB rather than KB was a
deliberate decision by Peter Eisentraut, and I don't think changing
our practice now buys us anything meaningful.  Your first patch
introduces an odd wart into the GUC mechanism, with a strange wording
for the message, to fix something that's not really broken in the
first place.  Your second one alters kB to KB in zillions of places
all over the code base, and I am quite sure that there is no consensus
to do anything of that sort.
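
For reference (my example, not from the thread), the status-quo spelling under
discussion looks like this:

SELECT pg_size_pretty(10240::bigint);     -- '10 kB', lower-case k
SELECT pg_size_pretty(10485760::bigint);  -- '10 MB'
SHOW work_mem;                            -- GUC memory units use the same spellings, e.g. '4MB'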

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 11:50 AM, Andres Freund  wrote:
> On 2016-08-23 07:26:31 -0500, Kevin Grittner wrote:
>> On Tue, Aug 23, 2016 at 7:10 AM, Kevin Grittner  wrote:
>> > On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer  
>> > wrote:
>>
>> >> Could you provide an example of a case where xacts replayed in
>> >> commit order will produce incorrect results?
>> >
>> > https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>> >
>> > ... where T3 is on the replication target.
>>
>> I should, perhaps, have mentioned that the cases where this is a
>> problem are "eventually consistent" -- it's a matter of being able
>> to see a state that violates business rule invariants or where data
>> which is "locked down" according to one part of the database is
>> still changing.  Such problems are prevented on a single database,
>> but would not be prevented on a logical replica where transactions
>> are applied in commit order.  Given enough time, the replica would
>> eventually settle into a state without the anomalies, similar to
>> some other products with eventual consistency.
>
> I generally have a bit of difficulty seeing this as a significant problem
> for logical rep, as long as hot standby, and crash recovery in general,
> also have this problem...

Serialization anomalies cannot be seen on a hot standby nor on
crash recovery.  Granted, the mechanism which prevents it on the
hot standby is that we don't allow the transaction isolation level
to be set to SERIALIZABLE, to prevent the expectation that queries
will be free of anomalies, which is pretty crude.  On crash
recovery you cannot see any anomalies as far as I'm aware -- what
has given you the idea that it is possible?

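(Illustration, not from the original mail, of the crude mechanism mentioned
above -- on a hot standby the rejection happens as soon as a serializable
transaction tries to take its snapshot; the exact message wording here is
from memory and may differ slightly:)

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT 1;   -- first statement that needs a snapshot
ERROR:  cannot use serializable mode in a hot standby
HINT:  You can use REPEATABLE READ instead.
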
I'm not sure that the same technique I described for logical
replication could be made to work for page-level physical
replication, but a "safe snapshot" technique has been previously
discussed on the lists which would work for physical replication
(although *that* technique seems unsuited to logical replication).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Bruce Momjian
On Tue, Aug 23, 2016 at 01:53:25PM -0400, Robert Haas wrote:
> On Tue, Aug 23, 2016 at 1:47 PM, Bruce Momjian  wrote:
> >> I have already read the entire thread, and replied only after reading
> >> all messages.
> >
> > Well, what are you replying to then?
> 
> Your original message.  I'm arguing that we should not change the
> behavior, as you proposed to do.

That's why I was asking you to comment on the final patch, which I am
planning to apply to PG 10 soon.

> > There is no GUC used, and
> > everything is backward compatible.
> 
> Greg Stark proposed a GUC.  I don't think that's a good idea.  You
> proposed to change the behavior in a way that is not
> backward-compatible.  I don't think that's a good idea either.  If you
> are saying that you've dropped those proposals, fine, but I think it's
> entirely reasonable for me to express my opinion on them.  It was not
> evident to me that the thread had reached any kind of consensus.

Uh, the patch was the consensus, as I had several versions.  It was not
clear from your email what you thought of the patch, or if your comments
applied to the final patch at all.  The email you quoted was mine, but
from a very early stage in the discussion.

> > Your hyperbole about a new user
> > being confused is also not helpful.  What is this "chaos" you are
> > talking about?
> 
> Behavior-changing GUCs are bad news for reasons that have been
> discussed many times before: they create a requirement that everybody
> who writes code intended to run on arbitrary PostgreSQL installation
> be prepared to cater to every possible value of that GUC.
> pg_size_pretty() is pretty likely to appear in queries that we give
> users to run on their systems, so it would be a particularly poor
> choice to make its behavior configurable.

There is no question on that point.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 1:47 PM, Bruce Momjian  wrote:
>> I have already read the entire thread, and replied only after reading
>> all messages.
>
> Well, what are you replying to then?

Your original message.  I'm arguing that we should not change the
behavior, as you proposed to do.

> There is no GUC used, and
> everything is backward compatible.

Greg Stark proposed a GUC.  I don't think that's a good idea.  You
proposed to change the behavior in a way that is not
backward-compatible.  I don't think that's a good idea either.  If you
are saying that you've dropped those proposals, fine, but I think it's
entirely reasonable for me to express my opinion on them.  It was not
evident to me that the thread had reached any kind of consensus.

> Your hyperbole about a new user
> being confused is also not helpful.  What is this "chaos" you are
> talking about?

Behavior-changing GUCs are bad news for reasons that have been
discussed many times before: they create a requirement that everybody
who writes code intended to run on arbitrary PostgreSQL installation
be prepared to cater to every possible value of that GUC.
pg_size_pretty() is pretty likely to appear in queries that we give
users to run on their systems, so it would be a particularly poor
choice to make its behavior configurable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Bruce Momjian
On Tue, Aug 23, 2016 at 01:45:44PM -0400, Robert Haas wrote:
> On Tue, Aug 23, 2016 at 1:43 PM, Bruce Momjian  wrote:
> > On Tue, Aug 23, 2016 at 01:30:29PM -0400, Robert Haas wrote:
> >> On Fri, Jul 29, 2016 at 8:18 PM, Bruce Momjian  wrote:
> >> > and the units were copied when pg_size_pretty() was implemented.  These
> >> > units are based on the International System of Units (SI)/metric.
> >> > However, the SI system is power-of-10-based, and we just re-purposed
> >> > them to be 1024 or 2^10-based.
> >> >
> >> > However, that is not the end of the story.
> >>
> >> Sure it is.  The behavior of the code matches the documentation.  The
> >> documentation describes one of several reasonable behaviors.  Full
> >> stop.
> >>
> >> > I am thinking Postgres 10 would be a good time to switch to KB as a
> >> > 1024-based prefix.  Unfortunately, there is no similar fix for MB, GB,
> >> > etc.  'm' is 'milli' so there we never used mB, so in JEDEC and Metric,
> >> > MB is ambiguous as 1000-based or 1024-based.
> >>
> >> I think this would be a backward compatibility break that would
> >> probably cause confusion for years.  I think we can add new functions
> >> that behave differently, but I oppose revising the behavior of the
> >> existing functions ... and I *definitely* oppose adding new
> >> behavior-changing GUCs.  The result of that will surely be chaos.
> >
> > Can you read up through August 1 and then reply?
> 
> I have already read the entire thread, and replied only after reading
> all messages.

Well, what are you replying to then?  There is no GUC used, and
everything is backward compatible.  Your hyperbole about a new user
being confused is also not helpful.  What is this "chaos" you are
talking about?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate prototype for socket_set_nonblocking.

2016-08-23 Thread Robert Haas
On Thu, Jul 28, 2016 at 9:12 PM, Kyotaro HORIGUCHI
 wrote:
> Looking into pqcomm.c, I noticed that the prototype of
> socket_set_nonblocking defined twice. Actually the prototype is
> not necessary at all but one may remain as a convention.
>
> It does no harm, so backpatching seems unnecessary.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 1:43 PM, Bruce Momjian  wrote:
> On Tue, Aug 23, 2016 at 01:30:29PM -0400, Robert Haas wrote:
>> On Fri, Jul 29, 2016 at 8:18 PM, Bruce Momjian  wrote:
>> > and the units were copied when pg_size_pretty() was implemented.  These
>> > units are based on the International System of Units (SI)/metric.
>> > However, the SI system is power-of-10-based, and we just re-purposed
>> > them to be 1024 or 2^10-based.
>> >
>> > However, that is not the end of the story.
>>
>> Sure it is.  The behavior of the code matches the documentation.  The
>> documentation describes one of several reasonable behaviors.  Full
>> stop.
>>
>> > I am thinking Postgres 10 would be a good time to switch to KB as a
>> > 1024-based prefix.  Unfortunately, there is no similar fix for MB, GB,
>> > etc.  'm' is 'milli' so there we never used mB, so in JEDEC and Metric,
>> > MB is ambiguous as 1000-based or 1024-based.
>>
>> I think this would be a backward compatibility break that would
>> probably cause confusion for years.  I think we can add new functions
>> that behave differently, but I oppose revising the behavior of the
>> existing functions ... and I *definitely* oppose adding new
>> behavior-changing GUCs.  The result of that will surely be chaos.
>
> Can you read up through August 1 and then reply?

I have already read the entire thread, and replied only after reading
all messages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Bruce Momjian
On Tue, Aug 23, 2016 at 01:30:29PM -0400, Robert Haas wrote:
> On Fri, Jul 29, 2016 at 8:18 PM, Bruce Momjian  wrote:
> > and the units were copied when pg_size_pretty() was implemented.  These
> > units are based on the International System of Units (SI)/metric.
> > However, the SI system is power-of-10-based, and we just re-purposed
> > them to be 1024 or 2^10-based.
> >
> > However, that is not the end of the story.
> 
> Sure it is.  The behavior of the code matches the documentation.  The
> documentation describes one of several reasonable behaviors.  Full
> stop.
> 
> > I am thinking Postgres 10 would be a good time to switch to KB as a
> > 1024-based prefix.  Unfortunately, there is no similar fix for MB, GB,
> > etc.  'm' is 'milli' so there we never used mB, so in JEDEC and Metric,
> > MB is ambiguous as 1000-based or 1024-based.
> 
> I think this would be a backward compatibility break that would
> probably cause confusion for years.  I think we can add new functions
> that behave differently, but I oppose revising the behavior of the
> existing functions ... and I *definitely* oppose adding new
> behavior-changing GUCs.  The result of that will surely be chaos.

Can you read up through August 1 and then reply?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-23 Thread Robert Haas
On Fri, Jul 29, 2016 at 8:18 PM, Bruce Momjian  wrote:
> The Postgres docs specify that kB is based on 1024 or 2^10:
>
> https://www.postgresql.org/docs/9.6/static/functions-admin.html
>
> Note: The units kB, MB, GB and TB used by the functions
> pg_size_pretty and pg_size_bytes are defined using powers of 2 rather
> than powers of 10, so 1kB is 1024 bytes, 1MB is 1024^2 = 1048576 bytes,
> and so on.
>
> These prefixes were introduced to GUC variable specification in 2006:
>
> commit b517e653489f733893d61e7a84c118325394471c
> Author: Peter Eisentraut 
> Date:   Thu Jul 27 08:30:41 2006 +
>
> Allow units to be specified with configuration settings.
>
> and added to postgresql.conf:
>
> # Memory units:  kB = kilobytesTime units:  ms  = milliseconds
> #MB = megabytes s   = seconds
> #GB = gigabytes min = minutes
> #TB = terabytes h   = hours
> #   d   = days
>
> and the units were copied when pg_size_pretty() was implemented.  These
> units are based on the International System of Units (SI)/metric.
> However, the SI system is power-of-10-based, and we just re-purposed
> them to be 1024 or 2^10-based.
>
> However, that is not the end of the story.

Sure it is.  The behavior of the code matches the documentation.  The
documentation describes one of several reasonable behaviors.  Full
stop.
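
(For reference, and not part of the original mail, the behavior under
discussion is easy to check from psql; both functions treat kB as
1024-based, and pg_size_bytes assumes 9.6 or later:)

SELECT pg_size_pretty(10240::bigint);   -- '10 kB'   (10240 = 10 * 1024)
SELECT pg_size_bytes('10 kB');          -- 10240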

> I am thinking Postgres 10 would be a good time to switch to KB as a
> 1024-based prefix.  Unfortunately, there is no similar fix for MB, GB,
> etc.  'm' is 'milli' so there we never used mB, so in JEDEC and Metric,
> MB is ambiguous as 1000-based or 1024-based.

I think this would be a backward compatibility break that would
probably cause confusion for years.  I think we can add new functions
that behave differently, but I oppose revising the behavior of the
existing functions ... and I *definitely* oppose adding new
behavior-changing GUCs.  The result of that will surely be chaos.

J. Random User: I'm having a problem!
Mailing List: Gee, how big are your tables?
J. Random User: Here's some pg_size_pretty output.
Mailing List: Gosh, we don't know what that means, what do you have
this obscure GUC set to?
J. Random User: Maybe I'll just give up on SQL and use MongoDB.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v19)

2016-08-23 Thread Robert Haas
On Tue, Aug 2, 2016 at 9:58 PM, Tomas Vondra
 wrote:
> Attached is v19 of the "multivariate stats" patch series - essentially v18
> rebased on top of current master.

Tom:

ISTR that you were going to try to look at this patch set.  It seems
from the discussion that it's not really ready for serious
consideration for commit yet, but also that some high-level design
comments from you at this stage could go a long way toward making sure
that the final form of the patch is something that will be acceptable.

I'd really like to see us get some kind of capability along these
lines, but I'm sure it will go a lot better if you or Dean handle it
than if I try to do it ... not to mention that there are only so many
hours in the day.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Craig Ringer
On 24 August 2016 at 00:50, Andres Freund  wrote:

> On 2016-08-23 07:26:31 -0500, Kevin Grittner wrote:
> > On Tue, Aug 23, 2016 at 7:10 AM, Kevin Grittner 
> wrote:
> > [an explanation of SSI anomalies]
>

> I generally have a bit of difficulty seeing this as a significant problem
> for logical rep, as long as hot standby, and crash recovery in general,
> also have this problem...
>

Same here, as commented upthread. I think it'd be cool to be able to
deliver SSI-alike behaviour on a standby, but it's far from a priority, and
it applies to physical as well as logical rep.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-23 Thread Craig Ringer
On 23 August 2016 at 22:18, Robert Haas  wrote:

> On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer 
> wrote:
> > Updated patch series attached. As before, 0-4 intended for commit, 5 just
> > because it'll be handy to have around for people doing wraparound related
> > testing.
> >
> > Again, thanks for taking a look.
>
> /me reviews a bit more deeply.
>
> In 0001, it seems to me that "in-progress" should be "in progress".
>

Fine by me. I was on the fence about it anyway.

> +CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted');
>


> I'm not really that keen on this approach.  I don't think we need to
> introduce a new data type for this, and I would rather not use SQL,
> either.  It would be faster and simpler to just return the appropriate
> string from a C function defined directly.
>

Also fine by me. You're right, keep it simple. It means the potential set
of values isn't discoverable the same way, but ... meh. Using it usefully
means reading the docs anyway.

The remaining 2 patches of interest are attached - txid_status() and
txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().

Now I'd best stop pretending I'm in a sensible timezone.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
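
(A usage sketch, not part of the original mail: the table name and xid value
below are made up, and txid_status and its status strings are the ones this
patch series proposes, so treat it as illustrative only.)

-- in the application's transaction:
BEGIN;
SELECT txid_current();                      -- suppose it returns 8123; remember it client-side
INSERT INTO payments (amount) VALUES (42);  -- 'payments' is a hypothetical table
COMMIT;                                     -- suppose the connection drops right here

-- after reconnecting, the application can resolve the doubt:
SELECT txid_status(8123);                   -- 'committed', 'aborted', 'in progress',
                                            -- or NULL if the xid is too old
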
From 737fe0a0f8e76fc24c19e2e56a2890ff56f37075 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 19 Aug 2016 14:44:15 +0800
Subject: [PATCH 1/3] Introduce txid_status(bigint) to get status of an xact

If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc., then the application may not be
sure whether or not the commit completed successfully or was rolled
back. While two-phase commit solves this, it does so at a considerable
overhead, so introduce a lighter alternative.

txid_status(bigint) lets an application determine the status of a
commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.

Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.
---
 doc/src/sgml/func.sgml | 26 
 src/backend/access/transam/clog.c  | 23 ---
 src/backend/utils/adt/txid.c   | 85 ++
 src/include/access/clog.h  | 23 +++
 src/include/catalog/pg_proc.h  |  2 +
 src/include/utils/builtins.h   |  1 +
 src/test/regress/expected/txid.out | 50 ++
 src/test/regress/sql/txid.sql  | 35 
 8 files changed, 222 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6355300..420cced 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17143,6 +17143,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
 txid_visible_in_snapshot

 
+   
+txid_status
+   
+

 The functions shown in 
 provide server transaction information in an exportable form.  The main
@@ -17193,6 +17197,11 @@ SELECT collation for ('foo' COLLATE "de_DE");
boolean
is transaction ID visible in snapshot? (do not use with subtransaction ids)
   
+  
+   txid_status(bigint)
+   txid_status
+   report the status of the given xact - committed, aborted, in progress, or NULL if the xid is too old
+  
  
 

@@ -17263,6 +17272,23 @@ SELECT collation for ('foo' COLLATE "de_DE");

 

+txid_status(bigint) reports the commit status of a recent
+transaction. Any recent transaction can be identified as one of
+
+ in progress
+ committed
+ aborted
+
+Prepared transactions are identified as in progress.
+The commit status of transactions older than the transaction ID wrap-around
+threshold is no longer known by the system, so txid_status
+returns NULL for such transactions. Applications may use
+txid_status to determine whether a transaction committed
+or aborted when the application and/or database server crashed or lost
+connection while a COMMIT command was in progress.
+   
+
+   
 The functions shown in 
 provide information about transactions that have been already committed.
 These functions mainly provide information about when the transactions
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2634476..1a6e26d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -41,29 +41,6 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 
-/*
- * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
- * everywhere else 

Re: [HACKERS] Index Onlys Scan for expressions

2016-08-23 Thread Robert Haas
On Tue, Aug 16, 2016 at 2:43 AM, Alexander Korotkov
 wrote:
> ...g of pg_operator.oprassociative...
> Another problem is computational complexity of such transformations.  AFAIR,
> few patches for making optimizer smarter with expressions were already
> rejected because of this reason.

s/few/many/

> Also, even if we would have such transformation of expressions, floating
> points types would be still problematic, because (a + b) + c = a + (b + c)
> is not exactly true for them because of computational error.

Right.  I think matching expressions exactly is plenty good enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slowness of extended protocol

2016-08-23 Thread Andres Freund
On 2016-08-23 11:42:53 -0400, Robert Haas wrote:
> I think this could possibly be done, but it seems a lot better to me
> to just bite the bullet and add a new protocol message.  That was
> proposed by Tom Lane on July 31st and I think it's still by far the
> best and easiest idea proposed, except I think we could introduce it
> without waiting for a bigger rework of the protocol if we design the
> libpq APIs carefully.  Most of the rest of this thread seems to have
> devolved into an argument about whether this is really necessary,
> which IMHO is a pretty silly argument, instead of focusing on how it
> might be done, which I think would be a much more productive
> conversation.

I agree about the odd course of the further discussion, especially the
tone was rather odd.  But I do think it's valuable to think about a path
that fixes the issue without requiring version-dependent adaptations in
all client drivers.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-23 Thread Andres Freund
On 2016-08-23 18:18:57 +0300, Heikki Linnakangas wrote:
> I installed Greg Smith's pgbench-tools kit on that server, and ran some
> tests. I'm seeing some benefit on "pgbench -N" workload, but only after
> modifying the test script to use "-M prepared", and using Unix domain
> sockets instead of TCP to connect. Apparently those things add enough
> overhead to mask out the little difference.

To make the problem more apparent, how do the differences look for something
as extreme as just a
SELECT 1;
or a
SELECT 1 FROM pgbench_accounts WHERE aid = 1;

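(Not from the original mail: two single-statement custom scripts along these
lines, run with pgbench -f and -M prepared, would exercise exactly that; the
random() syntax in \set assumes pgbench 9.6 or later.)

-- select1.sql
SELECT 1;

-- pkey-select.sql
\set aid random(1, 100000 * :scale)
SELECT 1 FROM pgbench_accounts WHERE aid = :aid;
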
Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Andres Freund
On 2016-08-23 07:26:31 -0500, Kevin Grittner wrote:
> On Tue, Aug 23, 2016 at 7:10 AM, Kevin Grittner  wrote:
> > On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer  wrote:
> 
> >> Could you provide an example of a case where xacts replayed in
> >> commit order will produce incorrect results?
> >
> > https://wiki.postgresql.org/wiki/SSI#Deposit_Report
> >
> > ... where T3 is on the replication target.
> 
> I should, perhaps, have mentioned that the cases where this is a
> problem are "eventually consistent" -- it's a matter of being able
> to see a state that violates business rule invariants or where data
> which is "locked down" according to one part of the database is
> still changing.  Such problems are prevented on a single database,
> but would not be prevented on a logical replica where transactions
> are applied in commit order.  Given enough time, the replica would
> eventually settle into a state without the anomalies, similar to
> some other products with eventual consistency.

I generally have a bit of difficulty seeing this as a significant problem
for logical rep, as long as hot standby, and crash recovery in general,
also have this problem...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Andres Freund
On 2016-08-23 12:17:30 -0400, Robert Haas wrote:
> On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
>  wrote:
> > Robert Haas wrote:
> >> 2. When you finish the heap scan, or when the array of dead tuple IDs
> >> is full (or very nearly full?), perform a cycle of index vacuuming.
> >> For now, have each worker process a separate index; extra workers just
> >> wait.  Perhaps use the condition variable patch that I posted
> >> previously to make the workers wait.  Then resume the parallel heap
> >> scan, if not yet done.
> >
> > At least btrees should easily be scannable in parallel, given that we
> > process them in physical order rather than logically walk the tree.  So
> > if there are more workers than indexes, it's possible to put more than
> > one worker on the same index by carefully indicating each to stop at a
> > predetermined index page number.
> 
> Well that's fine if we figure it out, but I wouldn't try to include it
> in the first patch.  Let's make VACUUM parallel one step at a time.

Given that index scan(s) are, in my experience, way more often the
bottleneck than the heap-scan(s), I'm not sure that order is the
best. The heap-scan benefits from the VM, the index scans don't.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Craig Ringer
On 24 August 2016 at 00:26, Kevin Grittner  wrote:

> On Tue, Aug 23, 2016 at 9:36 AM, Kevin Grittner  wrote:
> > On Tue, Aug 23, 2016 at 9:07 AM, Kevin Grittner 
> wrote:
> >> On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer 
> wrote:
> >>> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:
> 
>  On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 
> >>
> > Could you provide an example of a case where xacts replayed in
> > commit order will produce incorrect results?
> 
>  https://wiki.postgresql.org/wiki/SSI#Deposit_Report
> 
>  ... where T3 is on the replication target.
> >>>
> >>> Right. But we don't attempt to replicate locking let alone SSI state.
> As I
> >>> said this is expected. If T1, T2 and T3 run in the master in either
> READ
> >>> COMMITTED or SERIALIZABLE we will correctly replay whatever got
> committed
> >>> and leave the replica in the same state as the master.
> >>
> >> Eventually.  Between the commit of T3 and T2 a state can be seen on
> >> the replica which would not have been allowed on the source.
> >>
> >>> It is row level replication so there is no simple way to detect this
> >>> anomaly.
> >>
> >> That is probably true, but there is a way to *prevent* it.
> >>
> >>> We would have to send a lot of co-ordination data *in both
> >>> directions*, right?
> >>
> >> No.  The source has all the information about both commit order and
> >> read-write dependencies, and could produce a stream of transaction
> >> IDs to specify the safe order for applying transactions to prevent
> >> the anomaly from appearing on the replica.  In this case the commit
> >> order is T1->T3->T2, but the apparent order of execution (AOoE) is
> >> T1->T2->T3.
> >
> > Sorry, trying to keep this conversation going while doing something
> > else and sent a response there which doesn't really make sense,
> > since the issue is whether to allow T3 to run *on the replica*.
> > I'll take another look when I'm less distracted.  You may be right.
>
> I had the right idea, but messed up the details.  The source has
> commit order T2->T1 and AOoE T1->T2.  So as long as making any work
> from T2 visible is held up until at least the point where the work
> of T1 is visible, T3 on the replica cannot see an anomalous state.
>
> It is still true that a one-way stream of information from the
> primary to the replica regarding AOoE would be sufficient
> communication.
>

Hm. That's really interesting.   So we could achieve SSI-like properties
for read-only xacts on a replica if we could defer xact decoding, or buffer
the xact on the downstream and defer apply, based on SSI info from the
upstream. That'd be pretty cool.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question about MVCC-unsafe commands

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 10:10 AM, Antonin Houska  wrote:
> I'm failing to understand why [1] mentions "table-rewriting forms of ALTER
> TABLE" besides the TRUNCATE command.
>
> For TRUNCATE it seems clear: if transaction A takes the snapshot before it
> accesses the table for the first time (typically because the isolation level
> is at least REPEATABLE READ) and transaction B manages to commit TRUNCATE
> soon enough, then A sees the pg_class entry of the table already affected by
> B, which has the new (empty) relfilenode. (The original pg_class entry is no
> longer visible via the catalog snapshot, nor does it contain a valid OID of
> the original relfilenode.)
>
> But IMO heap rewriting changes neither the table contents nor visibility.
> Can anyone explain what I'm missing?

CLUSTER preserves xmin/xmax when rewriting, but ALTER TABLE does not.

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# insert into foo values (1);
INSERT 0 1
rhaas=# insert into foo values (2);
INSERT 0 1
rhaas=# select xmin, a from foo;
 xmin | a
------+---
  934 | 1
  935 | 2
(2 rows)

rhaas=# alter table foo alter a type text;
ALTER TABLE
rhaas=# select xmin, a from foo;
 xmin | a
------+---
  936 | 1
  936 | 2
(2 rows)

This is sad.

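(For contrast, a sketch of the CLUSTER half of the claim -- not from the
original mail, and starting over from the same two-row table before the
ALTER; the xmin values shown are illustrative:)

rhaas=# create index foo_a_idx on foo (a);
CREATE INDEX
rhaas=# cluster foo using foo_a_idx;
CLUSTER
rhaas=# select xmin, a from foo;
 xmin | a
------+---
  934 | 1
  935 | 2
(2 rows)
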
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 9:36 AM, Kevin Grittner  wrote:
> On Tue, Aug 23, 2016 at 9:07 AM, Kevin Grittner  wrote:
>> On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer  wrote:
>>> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:

 On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 
>>
> Could you provide an example of a case where xacts replayed in
> commit order will produce incorrect results?

 https://wiki.postgresql.org/wiki/SSI#Deposit_Report

 ... where T3 is on the replication target.
>>>
>>> Right. But we don't attempt to replicate locking let alone SSI state. As I
>>> said this is expected. If T1, T2 and T3 run in the master in either READ
>>> COMMITTED or SERIALIZABLE we will correctly replay whatever got committed
>>> and leave the replica in the same state as the master.
>>
>> Eventually.  Between the commit of T3 and T2 a state can be seen on
>> the replica which would not have been allowed on the source.
>>
>>> It is row level replication so there is no simple way to detect this
>>> anomaly.
>>
>> That is probably true, but there is a way to *prevent* it.
>>
>>> We would have to send a lot of co-ordination data *in both
>>> directions*, right?
>>
>> No.  The source has all the information about both commit order and
>> read-write dependencies, and could produce a stream of transaction
>> IDs to specify the safe order for applying transactions to prevent
>> the anomaly from appearing on the replica.  In this case the commit
>> order is T1->T3->T2, but the apparent order of execution (AOoE) is
>> T1->T2->T3.
>
> Sorry, trying to keep this conversation going while doing something
> else and sent a response there which doesn't really make sense,
> since the issue is whether to allow T3 to run *on the replica*.
> I'll take another look when I'm less distracted.  You may be right.

I had the right idea, but messed up the details.  The source has
commit order T2->T1 and AOoE T1->T2.  So as long as making any work
from T2 visible is held up until at least the point where the work
of T1 is visible, T3 on the replica cannot see an anomalous state.

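(A compressed sketch of that scenario -- not from the original mail, with
made-up table and column names; each transaction runs in its own session.
On the master, SSI keeps T3 from observing this state; on a commit-order
replica it can.)

-- setup
CREATE TABLE control (deposit_no int NOT NULL);
INSERT INTO control VALUES (1);
CREATE TABLE receipt (deposit_no int NOT NULL, amount numeric NOT NULL);

-- T1, in session 1 on the master: record a receipt against the current batch
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO receipt SELECT deposit_no, 100.00 FROM control;
-- (T1 is left open for the moment)

-- T2, in session 2 on the master: close the batch
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE control SET deposit_no = deposit_no + 1;
COMMIT;

-- back in session 1, T1 now commits, so the commit order is T2 -> T1
-- even though the apparent order of execution is T1 -> T2:
COMMIT;

-- T3, on the replica, after T2 has been applied but before T1:
SELECT * FROM receipt WHERE deposit_no = 1;   -- batch 1 looks closed yet empty
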
It is still true that a one-way stream of information from the
primary to the replica regarding AOoE would be sufficient
communication.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
>  wrote:
> > Robert Haas wrote:
> >> 2. When you finish the heap scan, or when the array of dead tuple IDs
> >> is full (or very nearly full?), perform a cycle of index vacuuming.
> >> For now, have each worker process a separate index; extra workers just
> >> wait.  Perhaps use the condition variable patch that I posted
> >> previously to make the workers wait.  Then resume the parallel heap
> >> scan, if not yet done.
> >
> > At least btrees should easily be scannable in parallel, given that we
> > process them in physical order rather than logically walk the tree.  So
> > if there are more workers than indexes, it's possible to put more than
> > one worker on the same index by carefully indicating each to stop at a
> > predetermined index page number.
> 
> Well that's fine if we figure it out, but I wouldn't try to include it
> in the first patch.  Let's make VACUUM parallel one step at a time.

Sure, just putting the idea out there.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> 2. When you finish the heap scan, or when the array of dead tuple IDs
>> is full (or very nearly full?), perform a cycle of index vacuuming.
>> For now, have each worker process a separate index; extra workers just
>> wait.  Perhaps use the condition variable patch that I posted
>> previously to make the workers wait.  Then resume the parallel heap
>> scan, if not yet done.
>
> At least btrees should easily be scannable in parallel, given that we
> process them in physical order rather than logically walk the tree.  So
> if there are more workers than indexes, it's possible to put more than
> one worker on the same index by carefully indicating each to stop at a
> predetermined index page number.

Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch.  Let's make VACUUM parallel one step at a time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Masahiko Sawada
On Tue, Aug 23, 2016 at 10:50 PM, Robert Haas  wrote:
> On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada  
> wrote:
>> I'd like to propose block level parallel VACUUM.
>> This feature makes it possible for VACUUM to use multiple CPU cores.
>
> Great.  This is something that I have thought about, too.  Andres and
> Heikki recommended it as a project to me a few PGCons ago.
>
>> As for the PoC, I implemented parallel vacuum so that each worker
>> processes both phases 1 and 2 for a particular block range.
>> Suppose we vacuum a 1000-block table with 4 workers: each worker
>> processes 250 consecutive blocks in phase 1 and then reclaims dead
>> tuples from the heap and indexes (phase 2).
>> To use the visibility map efficiently, each worker scans a particular block
>> range of the relation and collects dead tuple locations.
>> After each worker finishes its task, the leader process gathers these
>> vacuum statistics and updates relfrozenxid if possible.
>
> This doesn't seem like a good design, because it adds a lot of extra
> index scanning work.  What I think you should do is:
>
> 1. Use a parallel heap scan (heap_beginscan_parallel) to let all
> workers scan in parallel.  Allocate a DSM segment to store the control
> structure for this parallel scan plus an array for the dead tuple IDs
> and a lock to protect the array.
>
> 2. When you finish the heap scan, or when the array of dead tuple IDs
> is full (or very nearly full?), perform a cycle of index vacuuming.
> For now, have each worker process a separate index; extra workers just
> wait.  Perhaps use the condition variable patch that I posted
> previously to make the workers wait.  Then resume the parallel heap
> scan, if not yet done.
>
> Later, we can try to see if there's a way to have multiple workers
> work together to vacuum a single index.  But the above seems like a
> good place to start.

Thank you for the advice.
That's what I had thought of as an alternative design; I will change the
patch to this design.

>> I also changed the buffer lock infrastructure so that multiple
>> processes can wait for cleanup lock on a buffer.
>
> You won't need this if you proceed as above, which is probably a good thing.

Right.

>
>> And the new GUC parameter vacuum_parallel_workers controls the number
>> of vacuum workers.
>
> I suspect that for autovacuum there is little reason to use parallel
> vacuum, since most of the time we are trying to slow vacuum down, not
> speed it up.  I'd be inclined, for starters, to just add a PARALLEL
> option to the VACUUM command, for when people want to speed up
> parallel vacuums.  Perhaps
>
> VACUUM (PARALLEL 4) relation;
>
> ...could mean to vacuum the relation with the given number of workers, and:
>
> VACUUM (PARALLEL) relation;
>
> ...could mean to vacuum the relation in parallel with the system
> choosing the number of workers - 1 worker per index is probably a good
> starting formula, though it might need some refinement.

It looks convenient.
I was thinking that we could manage the number of parallel workers per
table for autovacuum using this parameter, like
ALTER TABLE relation SET (parallel_vacuum_workers = 2)

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slowness of extended protocol

2016-08-23 Thread Robert Haas
On Tue, Aug 16, 2016 at 4:48 PM, Andres Freund  wrote:
> One approach to solving this, without changing the protocol, would be to
> "fuse" parse/bind/execute/sync together, by peeking ahead in the
> protocol stream. When that combination is seen looking ahead (without
> blocking), optimize it by handing it to something closer to
> exec_simple_query() which also handles parameters.  Even if we don't
> recognize that pattern everytime, e.g. because later messages are in
> different, not yet arrived, tcp packets, that'd speed up the common
> case.  As our client socket is nearly always is in non-blocking mode
> these days, that shouldn't be too expensive.

I think this could possibly be done, but it seems a lot better to me
to just bite the bullet and add a new protocol message.  That was
proposed by Tom Lane on July 31st and I think it's still by far the
best and easiest idea proposed, except I think we could introduce it
without waiting for a bigger rework of the protocol if we design the
libpq APIs carefully.  Most of the rest of this thread seems to have
devolved into an argument about whether this is really necessary,
which IMHO is a pretty silly argument, instead of focusing on how it
might be done, which I think would be a much more productive
conversation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-23 Thread Heikki Linnakangas

On 08/22/2016 08:38 PM, Andres Freund wrote:

> On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
>> I remember seeing ProcArrayLock contention very visible earlier, but I can't
>> hit that now. I suspect you'd still see contention on bigger hardware,
>> though, my laptop has only 4 cores. I'll have to find a real server for the
>> next round of testing.


> Yea, I think that's true. I can just about see ProcArrayLock contention
> on my more powerful laptop, to see it really bad you need bigger
> hardware / higher concurrency.


As soon as I sent my previous post, Vladimir Borodin kindly offered 
access to a 32-core server for performance testing. Thanks Vladimir!


I installed Greg Smith's pgbench-tools kit on that server, and ran some 
tests. I'm seeing some benefit on "pgbench -N" workload, but only after 
modifying the test script to use "-M prepared", and using Unix domain 
sockets instead of TCP to connect. Apparently those things add enough 
overhead to mask out the little difference.


Attached is a graph with the results. Full results are available at 
https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch 
improved throughput, measured in TPS, with >= 32 or so clients. The 
biggest difference was with 44 clients, which saw about 5% improvement.


So, not phenomenal, but it's something. I suspect that with more cores, 
the difference would become more clear.


As if on cue, Alexander Korotkov just offered access to a 72-core
system :-). Thanks! I'll run the same tests on that.


- Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Alvaro Herrera
Robert Haas wrote:

> 2. When you finish the heap scan, or when the array of dead tuple IDs
> is full (or very nearly full?), perform a cycle of index vacuuming.
> For now, have each worker process a separate index; extra workers just
> wait.  Perhaps use the condition variable patch that I posted
> previously to make the workers wait.  Then resume the parallel heap
> scan, if not yet done.

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree.  So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Aleksander Alekseev
Thank you everyone for great comments!

> have a permanent pg_class which holds the records for permanent tables
> and an *unlogged* table, say pg_class_unlogged, which holds records
> for temporary tables.  Now everybody can see everybody else's data,
> yet we don't have to create permanent catalog entries.  So we are not
> dead.  All of the temporary catalog tables vanish on a crash, too, and
> in a very clean way, which is great.
>
> [...]
>
> Overall I feel like the development effort that it would take to make
> this work would almost certainly be better-expended elsewhere.

Agreed. This is an interesting idea, but considering the drawbacks named
above, especially:

> 2. You can't write to unlogged tables on standby servers, so this
> doesn't help solve the problem of wanting to use temporary tables on
> standbys.

... I don't think it's worth the effort.

>> when you use global temporary tables, then you create it only once -
>> like usual tables.
>>
>> you don't drop these tables.
>
> I share the view that this is a better/simpler solution to the problem.
> It will still require virtual (in-memory) tuples for pg_statistic
> records, but everything else works pretty much as for regular tables. In
> particular there are no problems with dependencies.
>
> The obvious disadvantage is that it requires changes to applications.

Frankly I have much more faith in Tom's idea of using a virtual part of the
catalog for all temporary tables, i.e. turning all temporary tables into
"fast" temporary tables. Instead of introducing a new type of temporary table
that solves the catalog bloat problem but forces users to rewrite their
applications, why not just avoid creating the problem in the first place?

> I think one way to fix that would be to store the virtual tuples in
> shared memory (instead of process memory). That will certainly require
> locking and synchronization, but well - it needs to be shared.

I believe this is currently the most promising course of action. In a first
implementation we could just place the entire virtual part of the catalog in
shared memory and protect it with a single lock. If it works as expected, the
next step would be eliminating bottlenecks --- using multiple locks, moving
parts of the virtual catalog to the local backend's memory, etc.

As always, please don't hesitate to share any thoughts on this topic!

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Petr Jelinek

On 23/08/16 15:59, Robert Haas wrote:

> On Mon, Aug 22, 2016 at 4:32 PM, Andres Freund  wrote:
>> On 2016-08-22 16:29:12 -0400, Robert Haas wrote:
>>> So, I wish I could give you some better advice on this topic, but
>>> sadly I am not an expert in this area.  However, it seems to me that
>>> this is just one facet of a much more general problem: given two
>>> transactions T1 and T2, the order of replay must match the order of
>>> commit unless you can prove that there are no dependencies between
>>> them.  I don't see why it matters whether the operations are sequence
>>> operations or data operations; it's just a question of whether they're
>>> modifying the same "stuff".
>>>
>>> Of course, it's possible I'm missing something important here...
>>
>> Maybe that normally logical decoding outputs stuff in commit order?
>
> As slow as I sometimes am, I did know that.  :-)
>
> I think what I was missing is that nextval() operations are
> non-transactional.  Craig describes them as non-transactional except
> when they are transactional, but I think that's not really the right
> way of looking at it.  Creating the sequence is transactional, and
> updating the value is not.  What seems to be causing trouble for Craig
> is that if the nextval() operation is inserted into the replication
> stream non-transactionally, it can happen before the sequence actually
> gets created.  I'm wondering if we can't find a way to make it so that
> it's OK for those operations to happen out of order, rather than
> trying to make the nextval() operation sometimes transactional and
> other times non-transactional.



Well, that's what Craig is trying to do by tracking whether a transactional
change has happened on a sequence in the current transaction, no?


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-23 Thread Robert Haas
On Mon, Aug 22, 2016 at 8:18 PM, Thomas Munro
 wrote:
> On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas  wrote:
>> We could test to see how much it slows things down.  But it
>> may be worth paying the cost even if it ends up being kinda expensive.
>
> Here are some numbers from a Xeon E7-8830 @ 2.13GHz running Linux 3.10
> running the attached program.  It's fairly noisy and I didn't run
> super long tests with many repeats, but the general theme is visible.
> If you're actually going to USE the memory, it's only a small extra
> cost to have reserved seats.  But if there's a strong chance you'll
> never access most of the memory, you might call it expensive.
>
> Segment size 1MB:
>
> base = shm_open + ftruncate + mmap + munmap + close = 5us
> base + fallocate = 38us
> base + memset = 332us
> base + fallocate + memset = 346us
>
> Segment size 1GB:
>
> base = shm_open + ftruncate + mmap + munmap + close = 10032us
> base + fallocate = 30774us
> base + memset = 602925us
> base + fallocate + memset = 655433us

Typical DSM segments for parallel query seem to be much smaller than
1MB.  I just added an elog(NOTICE, ...) to dsm_create to print the
size and ran the regression tests.  I got these results:

+ NOTICE:  dsm_create: 89352
+ NOTICE:  dsm_create: 332664
+ NOTICE:  dsm_create: 86664

So for parallel query we're looking at a hit that is probably in the
range of one-tenth of one millisecond or less, which seems like it's
not really a big deal considering that the typical startup time is 4ms
and, really, at this point, we're aiming to use this primarily for
queries with runtimes in the hundreds of milliseconds and more.  Also,
the code can be arbitrarily fast if it doesn't have to be safe.

Now, for bigger segment sizes, I think there actually could be a
little bit of a noticeable performance hit here, because it's not just
about total elapsed time.  Even if the code eventually touches all of
the memory, it might not touch it all before starting to fire up
workers or whatever else it wants to do with the DSM segment.  But I'm
thinking we still need to bite the bullet and pay the expense, because
crash-and-restart cycles are *really* bad.

Assuming the DSA code you submitted gets committed, that's really
where the hit will be here: you'll be merrily allocating chunks of
dynamic shared memory until your existing DSM segment fills up, and
then, kaboom, you'll go into the tank for half a second when you try
to do the next allocation, supposing the next segment is 1GB in size.
That's not much fun, especially considering that .  But again, unless
we have a faster way to force the system to allocate the pages, I
think we're just going to have to live with that.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UTF-8 docs?

2016-08-23 Thread Victor Wagner
On Mon, 22 Aug 2016 10:53:25 -0400
Peter Eisentraut  wrote:

> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
> > I don't know what kind of problem you are seeing with encoding
> > handling, but at least UTF-8 is working for Japanese, French and
> > Russian.  
> 
> Those translations are using DocBook XML.
> 

The Russian translation by Postgres Professional does use DocBook SGML,
although it uses XML as an intermediate representation when applying
gettext to the documentation. I've already posted the URL where the
PostgreSQL sources with the Russian documentation included, in SGML
format, can be downloaded.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgbench - fix stats when using \sleep

2016-08-23 Thread Fabien COELHO


Hello devs,

When \sleep is used within a pgbench script it resets txn_scheduled which 
is used for computing stats about the transaction, resulting in absurd 
statistics:


 latency average = 0.649 ms *** ??? ***
 ...
 script statistics:
  - statement latencies in milliseconds:
  0.235  BEGIN;
100.301  \sleep 100 ms
  0.351  END;

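(For anyone wanting to reproduce this: the script behind these numbers is
presumably just the three lines from the per-statement list, run as a custom
script, e.g. pgbench -f sleep.sql:)

BEGIN;
\sleep 100 ms
END;
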
I probably created this bug when adding "--rate" in 9.4 and trying to be
too clever. As nobody has complained about it yet, I'm not sure it is worth
fixing there, though.


The fix is that "\sleep" does not have to interfere with the txn_scheduled 
field, see the attached patch.


 latency average = 100.237 ms  *** BETTER ***
 ...
 script statistics:
  - statement latencies in milliseconds:
  0.099  BEGIN;
100.001  \sleep 100 ms
  0.135  END;

--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8027955..de9b1b6 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -250,6 +250,7 @@ typedef struct
 	int			nvariables;		/* number of variables */
 	bool		vars_sorted;	/* are variables sorted by name? */
 	int64		txn_scheduled;	/* scheduled start time of transaction (usec) */
+	int64		sleep_until;	/* scheduled start time of next cmd (usec) */
 	instr_time	txn_begin;		/* used for measuring schedule lag times */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
 	int			use_file;		/* index in sql_scripts for this client */
@@ -1828,6 +1829,7 @@ top:
 			}
 		}
 
+		st->sleep_until = st->txn_scheduled;
 		st->sleeping = true;
 		st->throttling = true;
 		st->is_throttled = true;
@@ -1840,7 +1842,7 @@ top:
 	{			/* are we sleeping? */
 		if (INSTR_TIME_IS_ZERO(now))
 			INSTR_TIME_SET_CURRENT(now);
-		if (INSTR_TIME_GET_MICROSEC(now) < st->txn_scheduled)
+		if (INSTR_TIME_GET_MICROSEC(now) < st->sleep_until)
 			return true;		/* Still sleeping, nothing to do here */
 		/* Else done sleeping, go ahead with next command */
 		st->sleeping = false;
@@ -2139,7 +2141,7 @@ top:
 usec *= 100;
 
 			INSTR_TIME_SET_CURRENT(now);
-			st->txn_scheduled = INSTR_TIME_GET_MICROSEC(now) + usec;
+			st->sleep_until = INSTR_TIME_GET_MICROSEC(now) + usec;
 			st->sleeping = true;
 
 			st->listen = true;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Masahiko Sawada
On Tue, Aug 23, 2016 at 9:40 PM, Alexander Korotkov
 wrote:
> On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane  wrote:
>>
>> Claudio Freire  writes:
>> > Not only that, but from your description (I haven't read the patch,
>> > sorry), you'd be scanning the whole index multiple times (one per
>> > worker).
>>
>> What about pointing each worker at a separate index?  Obviously the
>> degree of concurrency during index cleanup is then limited by the
>> number of indexes, but that doesn't seem like a fatal problem.
>
>
> +1
> We could eventually need some effective way of parallelizing vacuum of
> single index.
> But pointing each worker at separate index seems to be fair enough for
> majority of cases.
>

Or we can improve vacuuming of a single index by changing the data
representation of dead tuples to a bitmap.
That can reduce the number of whole-index scans during vacuum and make
comparing index items to the dead tuples faster.
This is listed on the TODO list and I've implemented it.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 9:07 AM, Kevin Grittner  wrote:
> On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer  wrote:
>> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:
>>>
>>> On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 
>
 Could you provide an example of a case where xacts replayed in
 commit order will produce incorrect results?
>>>
>>> https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>>>
>>> ... where T3 is on the replication target.
>>
>> Right. But we don't attempt to replicate locking let alone SSI state. As I
>> said this is expected. If T1, T2 and T3 run in the master in either READ
>> COMMITTED or SERIALIZABLE we will correctly replay whatever got committed
>> and leave the replica in the same state as the master.
>
> Eventually.  Between the commit of T3 and T2 a state can be seen on
> the replica which would not have been allowed on the source.
>
>> It is row level replication so there is no simple way to detect this
>> anomaly.
>
> That is probably true, but there is a way to *prevent* it.
>
>> We would have to send a lot of co-ordination data *in both
>> directions*, right?
>
> No.  The source has all the information about both commit order and
> read-write dependencies, and could produce a stream of transaction
> IDs to specify the safe order for applying transactions to prevent
> the anomaly from appearing on the replica.  In this case the commit
> order is T1->T3->T2, but the apparent order of execution (AOoE) is
> T1->T2->T3.

Sorry, while trying to keep this conversation going I was also doing
something else, and sent a response that doesn't really make sense,
since the issue is whether to allow T3 to run *on the replica*.
I'll take another look when I'm less distracted.  You may be right.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 10:18 AM, Robert Haas  wrote:
> On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer  wrote:
>> Updated patch series attached. As before, 0-4 intended for commit, 5 just
>> because it'll be handy to have around for people doing wraparound related
>> testing.
>>
>> Again, thanks for taking a look.
>
> /me reviews a bit more deeply.
>
> In 0001, ...

0002 looks good, so I committed it.  You forgot a function prototype
for the new SQL-callable function, though, so I added that.  For me, a
missing prototype generates a compiler warning; you might want to set
up your build to do the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-23 Thread Robert Haas
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer  wrote:
> Updated patch series attached. As before, 0-4 intended for commit, 5 just
> because it'll be handy to have around for people doing wraparound related
> testing.
>
> Again, thanks for taking a look.

/me reviews a bit more deeply.

In 0001, it seems to me that "in-progress" should be "in progress".  I
don't think it's normal to hyphenate that.  We have admittedly
sometimes done so, but:

[rhaas pgsql]$ git grep 'in-progress' | wc -l
  63
[rhaas pgsql]$ git grep 'in progress' | wc -l
 346

It may make sense to speak of an "in-progress transaction" but I would
say "the transaction is in progress" not "the transaction is
in-progress", which seems to me to argue for a space as the proper
separator here.

Also:

+CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted');
+
+CREATE FUNCTION
+  txid_status(txid bigint)
+RETURNS txid_status
+LANGUAGE sql
+VOLATILE PARALLEL SAFE
+AS $$
+SELECT CASE
+  WHEN s IS NULL THEN NULL::txid_status
+  WHEN s = -1 THEN 'aborted'::txid_status
+  WHEN s = 0 THEN 'in-progress'::txid_status
+  WHEN s = 1 THEN 'committed'::txid_status
+END
+FROM pg_catalog.txid_status_internal($1) s;
+$$;
+
+COMMENT ON FUNCTION txid_status(bigint)
+IS 'get commit status of given recent xid or null if too old';

I'm not really that keen on this approach.  I don't think we need to
introduce a new data type for this, and I would rather not use SQL,
either.  It would be faster and simpler to just return the appropriate
string from a C function defined directly.
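
For illustration, a direct C implementation could look roughly like the
sketch below.  This is only a sketch, not code from the patch;
get_xid_status() is a hypothetical helper standing in for whatever lookup
txid_status_internal() performs today (1 = committed, 0 = in progress,
-1 = aborted, anything else = too old to determine):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

extern int  get_xid_status(int64 txid);     /* hypothetical helper */

Datum       txid_status(PG_FUNCTION_ARGS);

Datum
txid_status(PG_FUNCTION_ARGS)
{
    int64       txid;
    int         status;
    const char *result;

    if (PG_ARGISNULL(0))
        PG_RETURN_NULL();
    txid = PG_GETARG_INT64(0);

    status = get_xid_status(txid);          /* hypothetical helper */

    switch (status)
    {
        case 1:
            result = "committed";
            break;
        case 0:
            result = "in progress";
            break;
        case -1:
            result = "aborted";
            break;
        default:
            PG_RETURN_NULL();               /* xid too old to look up */
    }

    PG_RETURN_TEXT_P(cstring_to_text(result));
}

Returning plain text would also do away with the separate enum type
entirely.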

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Question about MVCC-unsafe commands

2016-08-23 Thread Antonin Houska
I'm failing to understand why [1] mentions "table-rewriting forms of ALTER
TABLE" besides the TRUNCATE command.

For TRUNCATE it seems clear: if transaction A takes its snapshot before it
accesses the table for the first time (typically because the isolation level
is at least REPEATABLE READ) and transaction B manages to commit a TRUNCATE
soon enough, then A sees the pg_class entry of the table already affected by
B, which has the new (empty) relfilenode. (The original pg_class entry is no
longer visible to the catalog snapshot, nor does it contain a valid OID of
the original relfilenode.)

But IMO a heap rewrite changes neither the table contents nor their
visibility. Can anyone explain what I'm missing?

[1] https://www.postgresql.org/docs/9.6/static/mvcc-caveats.html
-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] "Some tests to cover hash_index"

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 2:17 AM, Ashutosh Sharma  wrote:
> I have also removed the warning message that we used to get for hash indexes,
> 'WARNING: hash indexes are not WAL-logged and their use is discouraged',
> as this message is no longer emitted for hash indexes after the WAL patch
> for hash indexes. Please have a look and let me know your thoughts.

Well, that change should be part of Amit's patch to add WAL logging,
not this patch, whose mission is just to improve test coverage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 7:40 AM, Craig Ringer  wrote:
> On 23 Aug 2016 20:10, "Kevin Grittner"  wrote:
>>
>> On Mon, Aug 22, 2016 at 6:39 PM, Craig Ringer 

>>> Could you provide an example of a case where xacts replayed in
>>> commit order will produce incorrect results?
>>
>> https://wiki.postgresql.org/wiki/SSI#Deposit_Report
>>
>> ... where T3 is on the replication target.
>
> Right. But we don't attempt to replicate locking let alone SSI state. As I
> said this is expected. If T1, T2 and T3 run in the master in either READ
> COMMITTED or SERIALIZABLE we will correctly replay whatever got committed
> and leave the replica in the same state as the master.

Eventually.  Between the commit of T3 and T2 a state can be seen on
the replica which would not have been allowed on the source.

> It is row level replication so there is no simple way to detect this
> anomaly.

That is probably true, but there is a way to *prevent* it.

> We would have to send a lot of co-ordination data *in both
> directions*, right?

No.  The source has all the information about both commit order and
read-write dependencies, and could produce a stream of transaction
IDs to specify the safe order for applying transactions to prevent
the anomaly from appearing on the replica.  In this case the commit
order is T1->T3->T2, but the apparent order of execution (AOoE) is
T1->T2->T3.  If the source communicated that to the replica, and
the replica held up application of any changes from T3 until T2 was
committed, there would be no chance to read incorrect results.  It
would not matter if T2 and T3 were committed on the replica
simultaneously or in AOoE, as long as the work of T3 does not
appear before the work of T2.

The replica does not need to send anything back to the source for
this to work.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] comment typo lmgr.c

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 8:13 AM, Erik Rijkers  wrote:
> comment typo in  src/backend/storage/lmgr.c  attached

Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-23 Thread Robert Haas
On Mon, Aug 22, 2016 at 4:32 PM, Andres Freund  wrote:
> On 2016-08-22 16:29:12 -0400, Robert Haas wrote:
>> So, I wish I could give you some better advice on this topic, but
>> sadly I am not an expert in this area.  However, it seems to me that
>> this is just one facet of a much more general problem: given two
>> transactions T1 and T2, the order of replay must match the order of
>> commit unless you can prove that there are no dependencies between
>> them.  I don't see why it matters whether the operations are sequence
>> operations or data operations; it's just a question of whether they're
>> modifying the same "stuff".
>>
>> Of course, it's possible I'm missing something important here...
>
> Maybe that normally logical decoding outputs stuff in commit order?

As slow as I sometimes am, I did know that.  :-)

I think what I was missing is that nextval() operations are
non-transactional.  Craig describes them as non-transactional except
when they are transactional, but I think that's not really the right
way of looking at it.  Creating the sequence is transactional, and
updating the value is not.  What seems to be causing trouble for Craig
is that if the nextval() operation is inserted into the replication
stream non-transactionally, it can happen before the sequence actually
gets created.  I'm wondering if we can't find a way to make it so that
it's OK for those operations to happen out of order, rather than
trying to make the nextval() operation sometimes transactional and
other times non-transactional.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Robert Haas
On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada  wrote:
> I'd like to propose block level parallel VACUUM.
> This feature makes VACUUM possible to use multiple CPU cores.

Great.  This is something that I have thought about, too.  Andres and
Heikki recommended it as a project to me a few PGCons ago.

> As for the PoC, I implemented parallel vacuum so that each worker
> processes both phase 1 and phase 2 for a particular block range.
> Suppose we vacuum a 1000-block table with 4 workers: each worker
> processes 250 consecutive blocks in phase 1 and then reclaims dead
> tuples from the heap and indexes (phase 2).
> To use the visibility map efficiently, each worker scans a particular
> block range of the relation and collects dead tuple locations.
> After each worker has finished its task, the leader process gathers the
> vacuum statistics and updates relfrozenxid if possible.

This doesn't seem like a good design, because it adds a lot of extra
index scanning work.  What I think you should do is:

1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel.  Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait.  Perhaps use the condition variable patch that I posted
previously to make the workers wait.  Then resume the parallel heap
scan, if not yet done.

Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index.  But the above seems like a
good place to start.
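
Roughly, the per-worker control flow would look like the sketch below.
This is a structural sketch only: every function marked "hypothetical" is
made up for illustration, and the real thing needs the DSM setup, a lock
around the dead-TID array, and the condition-variable waits.  Only the
heap-scan calls (heap_beginscan_parallel, heap_getnext, heap_endscan) are
the existing APIs:

#include "postgres.h"
#include "access/heapam.h"
#include "access/relscan.h"
#include "utils/rel.h"

/* lives in the DSM segment, shared by all workers */
typedef struct LVSharedState
{
    /* in the real thing, a lock protecting the fields below */
    int             num_dead;
    int             max_dead;
    ItemPointerData dead_tids[FLEXIBLE_ARRAY_MEMBER];
} LVSharedState;

/* hypothetical helpers, not existing APIs */
extern bool tuple_is_dead(HeapTuple tuple);
extern void remember_dead_tid(LVSharedState *shared, ItemPointer tid);
extern bool dead_tid_array_nearly_full(LVSharedState *shared);
extern int  claim_unvacuumed_index(LVSharedState *shared);
extern void vacuum_one_index(Relation onerel, int idx, LVSharedState *shared);
extern void wait_for_index_cycle(LVSharedState *shared);

static void
parallel_vacuum_worker(Relation onerel, ParallelHeapScanDesc pscan,
                       LVSharedState *shared)
{
    HeapScanDesc scan = heap_beginscan_parallel(onerel, pscan);
    HeapTuple   tuple;

    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        if (tuple_is_dead(tuple))                       /* hypothetical */
            remember_dead_tid(shared, &tuple->t_self);  /* hypothetical */

        if (dead_tid_array_nearly_full(shared))         /* hypothetical */
        {
            int         idx;

            /*
             * Pause the heap scan and run one index-vacuum cycle: each
             * worker claims one index at a time; workers with nothing
             * left to claim wait (e.g. on a condition variable) until
             * the cycle is finished.
             */
            while ((idx = claim_unvacuumed_index(shared)) >= 0)
                vacuum_one_index(onerel, idx, shared);
            wait_for_index_cycle(shared);
            /* the dead-TID array is reset before the scan resumes */
        }
    }

    /* a final index-vacuum cycle for the remaining TIDs follows the scan */
    heap_endscan(scan);
}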

> I also changed the buffer lock infrastructure so that multiple
> processes can wait for a cleanup lock on a buffer.

You won't need this if you proceed as above, which is probably a good thing.

> And the new GUC parameter vacuum_parallel_workers controls the number
> of vacuum workers.

I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up.  I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up a
manually-run vacuum.  Perhaps

VACUUM (PARALLEL 4) relation;

...could mean to vacuum the relation with the given number of workers, and:

VACUUM (PARALLEL) relation;

...could mean to vacuum the relation in parallel with the system
choosing the number of workers; 1 worker per index is probably a good
starting formula, though it might need some refinement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




  1   2   >