Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup

2014-10-22 Thread Heikki Linnakangas

On 10/20/2014 11:02 PM, jes...@krogh.cc wrote:

I do suspect the majority is from 30 concurrent processes updating a
506GB GIN index, but it would be nice to confirm that. There is also a
message-queue in the DB with a fairly high turnaround.


A 506GB GIN index? Uh, interesting :). What's it used for? Trigrams?

It is for full-text search, but it is being updated entirely regularly,
~100M records. A dump/restore cycle typically reduces the size to 30-40%
of current size.


Try 9.4 beta. The on-disk format of GIN indexes was rewritten in 9.4, 
making them a lot smaller. That might help with WAL volume too. Or not, 
but I'd love to hear what the impact is, in a real-life database :-).


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Reducing lock strength of adding foreign keys

2014-10-22 Thread Andreas Karlsson

Hi,

I have been thinking about why we need to grab an AccessExclusiveLock on 
the table with the PK when we add a foreign key. Adding new tables with 
foreign keys to old ones is common so it would be really nice if the 
lock strength could be reduced.


A comment in the code claims that we need to lock so no rows are deleted 
under us and that adding a trigger will lock in AccessExclusive anyway. 
But with MVCC catalog access and the removal of SnapshotNow I do not see 
why adding a trigger would require an exclusive lock. Just locking for 
data changes should be enough.


Looking at the code, the only place I see relying on the fact that we have
grabbed an exclusive lock is the duplicate name check, and that should work
anyway due to the unique constraint; but since I am pretty new to the code
base I could be missing something obvious. I have attached a proof of
concept patch which reduces the lock strength to ShareLock.


What do you say? Am I missing something?

My gut feeling also says that if we know it is an RI trigger we are adding,
then AccessShare should be enough for the PK table, since we could rely
on row locks to prevent rows from being deleted. It would be really nice
if this were possible, since it would make it possible to add a
new table with foreign keys and data without locking anything more than
the referred rows.
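
For reference, a quick way to observe the current behavior from psql (a
sketch only; the table names here are made up):

CREATE TABLE pk (id int PRIMARY KEY);
CREATE TABLE fk (id int);

BEGIN;
ALTER TABLE fk ADD CONSTRAINT fk_pk_fkey FOREIGN KEY (id) REFERENCES pk (id);
SELECT mode, granted FROM pg_locks
WHERE relation = 'pk'::regclass AND pid = pg_backend_pid();
-- currently reports AccessExclusiveLock on pk;
-- with the patch this would become ShareLock
ROLLBACK;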


Andreas
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index ecdff1e..49ad323
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ATAddForeignKeyConstraint(AlteredTableIn
*** 6016,6031 
  	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
  	/*
! 	 * Grab an exclusive lock on the pk table, so that someone doesn't delete
! 	 * rows out from under us. (Although a lesser lock would do for that
! 	 * purpose, we'll need exclusive lock anyway to add triggers to the pk
! 	 * table; trying to start with a lesser lock will just create a risk of
! 	 * deadlock.)
  	 */
  	if (OidIsValid(fkconstraint->old_pktable_oid))
! 		pkrel = heap_open(fkconstraint->old_pktable_oid, AccessExclusiveLock);
  	else
! 		pkrel = heap_openrv(fkconstraint->pktable, AccessExclusiveLock);
  
  	/*
  	 * Validity checks (permission checks wait till we have the column
--- 6016,6028 
  	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
  	/*
! 	 * Grab a share lock on the pk table, so that someone doesn't delete
! 	 * rows out from under us.
  	 */
  	if (OidIsValid(fkconstraint->old_pktable_oid))
! 		pkrel = heap_open(fkconstraint->old_pktable_oid, ShareLock);
  	else
! 		pkrel = heap_openrv(fkconstraint->pktable, ShareLock);
  
  	/*
  	 * Validity checks (permission checks wait till we have the column
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
new file mode 100644
index f4c0ffa..214755d
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*** CreateTrigger(CreateTrigStmt *stmt, cons
*** 157,165 
  referenced;
  
  	if (OidIsValid(relOid))
! 		rel = heap_open(relOid, AccessExclusiveLock);
  	else
! 		rel = heap_openrv(stmt->relation, AccessExclusiveLock);
  
  	/*
  	 * Triggers must be on tables or views, and there are additional
--- 157,165 
  referenced;
  
  	if (OidIsValid(relOid))
! 		rel = heap_open(relOid, ShareLock);
  	else
! 		rel = heap_openrv(stmt->relation, ShareLock);
  
  	/*
  	 * Triggers must be on tables or views, and there are additional
*** CreateTrigger(CreateTrigStmt *stmt, cons
*** 525,532 
  	 * can skip this for internally generated triggers, since the name
  	 * modification above should be sufficient.
  	 *
! 	 * NOTE that this is cool only because we have AccessExclusiveLock on the
! 	 * relation, so the trigger set won't be changing underneath us.
  	 */
  	if (!isInternal)
  	{
--- 525,531 
  	 * can skip this for internally generated triggers, since the name
  	 * modification above should be sufficient.
  	 *
! 	 * NOTE that this is cool only because of the unique constraint.
  	 */
  	if (!isInternal)
  	{
*** RemoveTriggerById(Oid trigOid)
*** 1102,1108 
  	 */
  	relid = ((Form_pg_trigger) GETSTRUCT(tup))->tgrelid;
  
! 	rel = heap_open(relid, AccessExclusiveLock);
  
  	if (rel->rd_rel->relkind != RELKIND_RELATION &&
  		rel->rd_rel->relkind != RELKIND_VIEW &&
--- 1101,1107 
  	 */
  	relid = ((Form_pg_trigger) GETSTRUCT(tup))->tgrelid;
  
! 	rel = heap_open(relid, ShareLock);
  
  	if (rel->rd_rel->relkind != RELKIND_RELATION &&
  		rel->rd_rel->relkind != RELKIND_VIEW &&
*** renametrig(RenameStmt *stmt)
*** 1257,1263 
  	 * Look up name, check permissions, and acquire lock (which we will NOT
  	 * release until end of transaction).
  	 */
! 	relid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock,
  	 false, false,
  	 RangeVarCallbackForRenameTrigger,
  	 NULL);
--- 1256,1262 

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-22 Thread Amit Kapila
On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
 On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com
wrote:
 
  On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:

Today, I have verified all previous comments raised by
me and looked at new code and below are my findings:


 4.
 LWLockAcquireCommon()
 {
 ..
 if (!LWLockDequeueSelf(l))
 {
  for (;;)
 {
 PGSemaphoreLock(&proc->sem, false);
  if (!proc->lwWaiting)
 break;
 extraWaits++;
  }
 lock->releaseOK = true;
 ..
 }

 Setting releaseOK in above context might not be required  because if the
 control comes in this part of code, it will not retry to acquire another
 time.

 Hm. You're probably right.

You have agreed to fix this comment, but it seems you have forgot
to change it.

 11.
 LWLockRelease()
 {
 ..
 PRINT_LWDEBUG("LWLockRelease", lock, mode);
 }

 Shouldn't this be in begining of LWLockRelease function rather than
 after processing held_lwlocks array?

 Ok.

You have agreed to fix this comment, but it seems you have forgot
to change it.

Below comment doesn't seem to be addressed?

 LWLockAcquireOrWait()
 {
 ..
 LOG_LWDEBUG("LWLockAcquireOrWait", lockid, mode, "succeeded");
 ..
 }

 a. such a log is not there in any other LWLock.. variants,
  if we want to introduce it, then shouldn't it be done at
  other places as well.


Below point is yet to be resolved.

  12.
  #ifdef LWLOCK_DEBUG
  lock->owner = MyProc;
  #endif
 
  Shouldn't it be reset in LWLockRelease?

 That's actually intentional. It's quite useful to know the last owner
 when debugging lwlock code.

Won't it cause any problem if the last owner process exits?


Can you explain how pg_read_barrier() in below code makes this
access safe?

LWLockWakeup()
{
..
+ pg_read_barrier(); /* pairs with nwaiters-- */
+ if (!BOOL_ACCESS_ONCE(lock->releaseOK))
..
}



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API

2014-10-22 Thread David Rowley
On Fri, Oct 10, 2014 at 10:08 PM, Craig Ringer cr...@2ndquadrant.com
wrote:

 On 09/17/2014 08:27 PM, Craig Ringer wrote:
  Hi all
 
  Attached is a patch to switch 9.5 over to using the
  GetSystemTimeAsFileTime call instead of separate GetSystemTime and
  SystemTimeToFileTime calls.

 Following on from my prior patch that switches to using
 GetSystemTimeAsFileTime, I now attach a two-patch series that also adds
 support for GetFileTimePreciseAsFileTime where it is available.


Hi Craig,

I was just having a quick look at this with the view of testing it on a
windows 8 machine.

Here's a couple of things I've noticed:

+ pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
+ GetModuleHandle(TEXT("kernel32.dll")),
+ "GetSystemRimePreciseAsFileTime");


"Rime", no doubt, is meant to be "Time".

Same here:

+ elog(DEBUG1, "GetProcAddress(\"GetSystemRimePreciseAsFileTime\") on
kernel32.dll failed with error code %d not expected
ERROR_PROC_NOT_FOUND(127)", errcode);

But I don't think you'll be able to elog quite that early.  I tested by
getting rid of the if (errcode != ERROR_PROC_NOT_FOUND) test and I get:

D:\Postgres\install\bin>postgres -D ..\data
error occurred at src\port\gettimeofday.c:87 before error message
processing is available

Perhaps we needn't bother with this debug message? Either that, or you'll
probably need to cache the error code and do something when the logger is
initialised.

Also, just for testing the resolution of the 2 functions, I added some code
into PostgreSQL's gettimeofday()

do {
(*pg_get_system_time)(&ft2);
} while ((file_time.dwLowDateTime - ft2.dwLowDateTime) == 0);

#ifndef FRONTEND
elog(NOTICE, "%d",  ft2.dwLowDateTime - file_time.dwLowDateTime);
if (pg_get_system_time == GetSystemTimeAsFileTime)
elog(NOTICE, "GetSystemTimeAsFileTime");
else
elog(NOTICE, "GetSystemTimePreciseAsFileTime");
#endif

return 0;


I see:
test=# select current_timestamp;
NOTICE:  4
NOTICE:  GetSystemTimePreciseAsFileTime

Which indicates that this is quite a precise timer (FILETIME ticks are
100 ns, so a delta of 4 means 400 ns resolution).

Whereas if I add a quick hack into init_win32_gettimeofday() so that it
always chooses GetSystemTimeAsFileTime() I see:

test=# select current_timestamp;
NOTICE:  9953
NOTICE:  GetSystemTimeAsFileTime


Also, I've attached gettimeofday.c, which is a small test programme which
just makes 10 million calls to each function, in order to find out which
one of the 3 methods performs best. Here I just wanted to ensure that we'd
not get any performance regression in gettimeofday() calls.

Here's the output from running this on this windows 8.1 laptop.

D:\>cl /Ox gettimeofday.c
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.61030 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

gettimeofday.c
Microsoft (R) Incremental Linker Version 11.00.61030.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:gettimeofday.exe
gettimeofday.obj

D:\>gettimeofday.exe
GetSystemTimePreciseAsFileTime() in 0.157480 seconds
GetSystemTimeAsFileTime() in 0.060075 seconds
Current Method in 0.742677 seconds

Regards

David Rowley

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

#define LOOPS 10000000

int main(void)
{
	LARGE_INTEGER start, end, frequency;
	FILETIME ft;
	SYSTEMTIME	system_time;
	
	int x;
	if (!QueryPerformanceFrequency(&frequency))
		exit(-1);
	
	QueryPerformanceCounter(&start);
	
	for (x = 0; x < LOOPS; x++)
	{
		GetSystemTimePreciseAsFileTime(&ft);
	}
	QueryPerformanceCounter(&end);
	
	printf("GetSystemTimePreciseAsFileTime() in %f seconds\n", (end.QuadPart - start.QuadPart) / (double)frequency.QuadPart);
	
	QueryPerformanceCounter(&start);
	
	for (x = 0; x < LOOPS; x++)
	{
		GetSystemTimeAsFileTime(&ft);
	}
	QueryPerformanceCounter(&end);
	
	printf("GetSystemTimeAsFileTime() in %f seconds\n", (end.QuadPart - start.QuadPart) / (double)frequency.QuadPart);
	
	QueryPerformanceCounter(&start);

	for (x = 0; x < LOOPS; x++)
	{
		GetSystemTime(&system_time);
		SystemTimeToFileTime(&system_time, &ft);
	}
	QueryPerformanceCounter(&end);
	
	printf("Current Method in %f seconds\n", (end.QuadPart - start.QuadPart) / (double)frequency.QuadPart);
	
	return 0;
}
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API

2014-10-22 Thread Craig Ringer
On 10/22/2014 04:12 PM, David Rowley wrote:

 I was just having a quick look at this with the view of testing it on a
 windows 8 machine.

Thank you. I really appreciate your taking the time to do this, as one of
the barriers to getting Windows-specific patches accepted is that
usually people just don't want to review Windows patches.

I see you've already linked it on the commitfest too, so thanks again.

I'll follow up with a fixed patch and my test code shortly. I'm at PGEU
in Madrid so things are a bit chaotic, but I can make some time.

 +pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
 +GetModuleHandle(TEXT("kernel32.dll")),
 +"GetSystemRimePreciseAsFileTime");
 
 
 "Rime", no doubt, is meant to be "Time".

Hm.

I must've somehow managed to attach and post the earlier version of the
patch before I fixed a few issues and tested it, because I've compiled
and tested this feature on Win2k12.

Apparently just not the version I published.

Thanks for catching that. I'll fix it up.

 +elog(DEBUG1, "GetProcAddress(\"GetSystemRimePreciseAsFileTime\") on
 kernel32.dll failed with error code %d not expected
 ERROR_PROC_NOT_FOUND(127)", errcode);
 
 But I don't think you'll be able to elog quite that early.  I tested by
 getting rid of the if (errcode != ERROR_PROC_NOT_FOUND) test and I get:
 
 D:\Postgres\install\bin>postgres -D ..\data
 error occurred at src\port\gettimeofday.c:87 before error message
 processing is available

Thank you. I didn't consider that logging wouldn't be available there.

This case shouldn't really happen.

 Perhaps we needn't bother with this debug message? Either that, or you'll
 probably need to cache the error code and do something when the logger
 is initialised.

In a shouldn't happen case like this I think it'll be OK to just print
to stderr.

 I see:
 test=# select current_timestamp;
 NOTICE:  4
 NOTICE:  GetSystemTimePreciseAsFileTime
 
 Which indicates that this is quite a precise timer.

Great.

Because I was testing on AWS I wasn't getting results that fine, but the
kind of granularity you're seeing is consistent with what I get on my
Linux laptop.

 Whereas if I add a quick hack into init_win32_gettimeofday() so that it
 always chooses GetSystemTimeAsFileTime() I see:
 
 test=# select current_timestamp;
 NOTICE:  9953
 NOTICE:  GetSystemTimeAsFileTime

I'll publish the test code I was using too. I was doing it from SQL
level with no code changes other than the ones required for timestamp
precision.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about RI checks

2014-10-22 Thread Florian Pflug
Florian Pflug f...@phlo.org wrote:
 So in conclusion, the lock avoids raising constraint violation errors in
 a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
 constraint violation errors into serialization failures. Or at least that's
 how it looks to me.

I got the REPEATABLE READ case wrong there, and that case is actually broken!
I initially assumed that trying to lock a row whose latest version is invisible
to a transaction in mode REPEATABLE READ or above will result in a serialization
error. But for the queries run by ri_triggers.c, this is not the case.

AFAICS, the only executor node which takes the crosscheck snapshot into account
is nodeModifyTable. So both a plain SELECT and a SELECT FOR [KEY] SHARE | UPDATE
will simply run with the snapshot passed to the SPI, which for the query in
question is always a *current* snapshot (we pass the original snapshot as the
crosscheck snapshot in REPEATABLE READ). Thus, if there's a child row that
is visible to the transaction deleting the parent row, but that child was
meanwhile removed, it seems that we currently *won't* complain. The same
holds, of course, for mode SERIALIZABLE -- we don't distinguish the two in
ri_triggers.c.

But that's wrong. The transaction's snapshot *would* see that row, so we
ought to raise an error. Note that this applies also to mode SERIALIZABLE, and
breaks true serializability in some cases, since we don't do conflict detection
for RI enforcement queries.

Here's a test case, involving two transactions A and B. I tried this on
REL9_4_STABLE.

Setup
-
CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
CREATE TABLE child (id int NOT NULL PRIMARY KEY,
parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (1, 1);

Failure Case

A:: set default_transaction_isolation='serializable';
A:: begin;
A:: select * from child;
 ->  id | parent_id 
    ----+-----------
      1 |         1
B:: set default_transaction_isolation='serializable';
B:: begin;
B:: delete from child;
 -> DELETE 1
B:: commit;
A:: delete from parent;
 -> DELETE 1
A:: commit;

A can neither come before B in any serializable history (since the DELETE
should fail then), nor after it (since it shouldn't see the row in the child
table then). Yet we don't complain, even though both transactions run in
SERIALIZABLE mode.

On Oct21, 2014, at 19:27 , Nick Barnes nickbarne...@gmail.com wrote:
 On Wed, Oct 22, 2014 at 3:19 AM, Kevin Grittner kgri...@ymail.com wrote:
 
 It doesn't seem like this analysis considers all of the available ON
 DELETE and ON UPDATE behaviors available.  Besides RESTRICT there is
 CASCADE, SET NULL, SET DEFAULT, and NO ACTION.  Some of those
 require updating the referencing rows.
  
 I think the logic in question is specific to RESTRICT and NO ACTION.
 The other cases don't look like they need to explicitly lock anything;
 the UPDATE / DELETE itself should take care of that.

Exactly. We run the dubious SELECT FROM fkrel ... FOR KEY SHARE query *only*
for RESTRICT and NO ACTION.
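
For context, the query ri_triggers.c issues here has roughly this shape
(paraphrased; the real query is built with fully qualified, quoted names):

SELECT 1 FROM ONLY fktable x
WHERE fkcol OPERATOR(pg_catalog.=) $1
FOR KEY SHARE OF x;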

I've traced through the revisions of ri_triggers.c a bit. Locking the
conflicting child rows in the parent table's DELETE and UPDATE triggers was
added in commit 0882951b0cdb4c6686e57121d56216cb2044f7eb which is dated
1999-12-08. (This was before we even had row-level SHARE locks, so the code
did a SELECT .. FOR UPDATE back then). This is also the commit which added
FOR UPDATE locking to RI_FKey_check, i.e. which ensured that parent rows are
locked when adding new child rows. Its commit message unfortunately doesn't
offer much in terms of explanation - it just says "Fixed concurrent
visibility bug."

When SHARE locks were implemented in commit
bedb78d386a47fd66b6cda2040e0a5fb545ee371,
dating from 2005-04-28, it seems that FOR UPDATE was simply replaced by
FOR SHARE throughout ri_triggers.c. The same seems to have happened when
KEY SHARE locks were introduced on 2013-01-13 with commit
0ac5ad5134f2769ccbaefec73844f8504c4d6182.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Simplify EXISTS subqueries containing LIMIT

2014-10-22 Thread David Rowley
On Tue, Oct 21, 2014 at 11:00 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Sun, Oct 19, 2014 at 1:22 PM, David Rowley dgrowle...@gmail.com
 wrote:
  the argument for this would
  have been much stronger if anti join support had just been added last
 week.
  It's been quite a few years now and the argument for this must be getting
  weaker with every release.

 I see your point, but I would put it another way: we've had this for a
 few years, but people haven't learned and are *still* using LIMIT 1.


I've had a bit of a look at this and here's a couple of things:

/*
+* LIMIT clause can be removed if it's a positive constant or ALL, to
+* prevent it from being an optimization barrier. It's a common meme to put
+* LIMIT 1 within EXISTS subqueries.
+*/

I think this comment may be better explained along the lines of:

A subquery which has a LIMIT clause with a positive value is effectively a
no-op in this scenario. With EXISTS we only care about the first row
anyway, so any positive limit value will have no behavioral change to the
query, so we'll simply remove the LIMIT clause. If we're unable to prove
that the LIMIT value is a positive number then we'd better not touch it.


+ /* Checking for negative values is done later; 0 is just silly */
+ if (!limit->constisnull && DatumGetInt64(limit->constvalue) <= 0)

I'm a bit confused by the comment here. You're saying that we'll check for
negatives later... but you're checking for <= 0 on the next line... Did I
miss something or is this a mistake?


This test:

+select * from int4_tbl o where exists (select 1 limit 0);
+ f1 
+----
+(0 rows)

I guess here you're just testing to ensure that you've not broken this and
that the new code does not accidentally remove the LIMIT clause. I think it
also might be worth adding some tests to ensure that correlated EXISTS
clauses with a positive LIMIT value get properly changed into a SEMI JOIN
instead of remaining as a subplan. And also make sure that a LIMIT 0 or
LIMIT -1 gets left as a subplan. I'd say such a test would supersede the
above test, and I think it's also the case you were talking about
optimising anyway?

You can use EXPLAIN (COSTS OFF) to get a stable explain output.
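
Something along these lines (a sketch only, reusing the regression suite's
int4_tbl and its f1 column):

-- should become a semi-join once the no-op LIMIT 1 is stripped
EXPLAIN (COSTS OFF)
SELECT * FROM int4_tbl o
WHERE EXISTS (SELECT 1 FROM int4_tbl i WHERE i.f1 = o.f1 LIMIT 1);

-- should stay a subplan, since LIMIT 0 must not be removed
EXPLAIN (COSTS OFF)
SELECT * FROM int4_tbl o
WHERE EXISTS (SELECT 1 FROM int4_tbl i WHERE i.f1 = o.f1 LIMIT 0);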

Regards

David Rowley


[HACKERS] btree_gin and ranges

2014-10-22 Thread Teodor Sigaev

The suggested patch adds GIN support for the contains operator for ranges
over a scalar column.

It allows a more effective GIN scan. Currently, queries like
SELECT * FROM test_int4 WHERE i >= -1 and i <= 1
will be executed by GIN with two scans: one from minus infinity to 1 and
another from -1 to plus infinity. That's because GIN is generalized and it
doesn't know the semantics of the operation.


With the patch it's possible to rewrite the query with ranges
SELECT * FROM test_int4 WHERE i <@ '[-1, 1]'::int4range
and the GIN index will support this query with a single scan from -1 to 1.
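
A minimal usage sketch, assuming the patch extends the existing btree_gin
opclasses in place:

CREATE EXTENSION btree_gin;
CREATE TABLE test_int4 (i int4);
CREATE INDEX test_int4_idx ON test_int4 USING gin (i);

-- one GIN scan from -1 to 1 instead of two half-open scans
SELECT * FROM test_int4 WHERE i <@ '[-1, 1]'::int4range;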

Patch provides index support only for existing range types: int4, int8, numeric, 
date and timestamp with and without time zone.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


btree_gin_range-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gin and ranges

2014-10-22 Thread Marti Raudsepp
Hi

On Wed, Oct 22, 2014 at 1:55 PM, Teodor Sigaev teo...@sigaev.ru wrote:
 With patch it's possible to rewrite query with ranges
 SELECT * FROM test_int4 WHERE i <@ '[-1, 1]'::int4range
 and GIN index will support this query with single scan from -1 to 1.

Shouldn't this be implemented in a more generic manner? An ordinary
btree index could very well support <@ queries too, but your patch
only adds this capability to btree-gin.

The documentation describes btree-gin as providing "GIN operator
classes that implement B-tree equivalent behavior", but now the
behavior diverges.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] speedup tidbitmap patch: cache page

2014-10-22 Thread Teodor Sigaev
In specific workloads postgres can spend a lot of time in tbm_add_tuples, up
to 50% of query time. The hash_search call is expensive and called twice for
each ItemPointer to insert. The suggested patch tries to cache the last
PagetableEntry pointer in the hope that the next ItemPointer will be on the
same relation page.
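
Loosely, the idea looks like this (a sketch, not the patch itself;
tbm_get_pageentry is the internal hash lookup in tidbitmap.c):

void
tbm_add_tuples(TIDBitmap *tbm, const ItemPointer tids, int ntids, bool recheck)
{
	BlockNumber		cached_blk = InvalidBlockNumber;
	PagetableEntry *page = NULL;
	int				i;

	for (i = 0; i < ntids; i++)
	{
		BlockNumber blk = ItemPointerGetBlockNumber(tids + i);

		/* reuse the entry when consecutive TIDs fall on the same page */
		if (page == NULL || blk != cached_blk)
		{
			page = tbm_get_pageentry(tbm, blk);	/* the hash_search call */
			cached_blk = blk;
		}
		/* ... set the offset bit (and recheck flag) in 'page' as before ... */
	}
}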


The patch is rather trivial and I don't believe that it could cause a
performance degradation. But it can increase performance on some workloads.


An example:
# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 500) as v;
# create index idx on t using gin (i);
# set enable_seqscan = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch:    Execution time: 2058.718 ms

# explain analyze select * from t where i >= 100 and i <= 100;
without patch: Execution time: 524.441 ms
with patch:    Execution time: 195.381 ms
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


tbm_cachepage-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-22 Thread Heikki Linnakangas

On 10/20/2014 09:26 AM, Michael Paquier wrote:

On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao masao.fu...@gmail.com wrote:


In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?


The real problem currently is that it is possible to have a segment file not
marked as .done during recovery when the stream connection is abruptly cut as
this segment is switched, marking it as .ready in archive_status and simply
leaving this segment in pg_xlog, because it will neither be recycled nor
removed. I have not been able to look much at this code these days, so I am
not sure how invasive it would be in back-branches, but perhaps we should try
to improve the code such that when a segment file is switched and the
connection to the root node is cut, we guarantee that this file is completed
and marked as .done.


I have spent more time on that, with a bit more of underground work...
First, the problem can be reproduced most of the time by running this
simple command:
psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate

This will enforce a segment file switch and restart the master in
crash recovery. This has the effect of immediately cutting the WAL stream
on the slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0.
For example, let's imagine that the stream fails when switching from
000000010000000000000003 to the next segment; then the last XLogRecPtr in
XLogWalRcvProcessMsg for dataStart is for example 0/310, and
walrcv->latestWalEnd is 0/400. When the stream restarts it will begin once
again from 0/400, ignoring that 000000010000000000000003 should be marked
as .done, ultimately marking it as .ready when old segment files are
recycled or removed.
There is nothing that can really be done to enforce the creation of a
.done file before the FATAL of libpqrcv_receive because we cannot
predict the stream failure.

Now, we can do better than what we have now by looking at the WAL start
position used when starting streaming in the WAL receiver and enforcing
.done if the start position is the last one of the previous segment.
Hence, in the case of start position 0/400 found previously, the file
that will be forced to .done is 000000010000000000000003. I have written
the attached patch that implements this idea and fixes the problem. Now
let's see if you guys see any flaws in this simple logic, which uses a
sniper rifle instead of a bazooka as in the previous patches sent.


Hmm. This will still miss the .done file if you don't re-establish the 
streaming replication connection after the restart. For example, if you 
shut down the master, and promote the standby server.


I think we should take a more wholesale approach to this. We should 
enforce the rule that the server only ever archives WAL files belonging 
to the same timeline that the server generates. IOW, the server only 
archives the WAL that it has generated.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gin and ranges

2014-10-22 Thread Teodor Sigaev

Shouldn't this be implemented in a more generic manner? An ordinary

Unfortunately I don't see a way to do that. GIN is generalized - and it doesn't
know the semantics. Actually, we could fix the first five strategy numbers to
BTree's strategies, and then we could teach the GIN core to use BTree
semantics, but what about already existing operator classes?




The documentation describes btree-gin as providing "GIN operator
classes that implement B-tree equivalent behavior", but now the
behavior diverges.


Anyway, GIN can't be used for an ORDER BY clause.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API

2014-10-22 Thread Craig Ringer
Here's an updated patch addressing David's points.

I haven't had a chance to test it yet on win2k8 or win2k12, due to
pgconf.eu.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From d5cd8c82bdc3caa9d18586eac8e4e083a2729d61 Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Thu, 18 Sep 2014 23:02:14 +0800
Subject: [PATCH 2/2] Use GetSystemTimePreciseAsFileTime when available

This will cause PostgreSQL on Windows 8 or Windows Server 2012 to
obtain high-resolution timestamps while allowing the same binaries
to run without problems on older releases.
---
 src/backend/main/main.c |  6 +
 src/include/port.h  |  2 ++
 src/port/gettimeofday.c | 59 +++--
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c51b391..73c30c5 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -260,6 +260,12 @@ startup_hacks(const char *progname)
 
 		/* In case of general protection fault, don't show GUI popup box */
 		SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
+
+#ifndef HAVE_GETTIMEOFDAY
+		/* Figure out which syscall to use to capture timestamp information */
+		init_win32_gettimeofday();
+#endif
+
 	}
 #endif   /* WIN32 */
 
diff --git a/src/include/port.h b/src/include/port.h
index 9f8465e..4f8af0a 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -328,6 +328,8 @@ extern FILE *pgwin32_popen(const char *command, const char *type);
 #ifndef HAVE_GETTIMEOFDAY
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval * tp, struct timezone * tzp);
+/* On windows we need to call some backend start setup for accurate timing */
+extern void init_win32_gettimeofday(void);
 #endif
 #else			/* !WIN32 */
 
diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index 73ec406..b8871d9 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -30,14 +30,69 @@
 
 #include <sys/time.h>
 
+#ifndef FRONTEND
+#include "utils/elog.h"
+#endif
+
 
 /* FILETIME of Jan 1 1970 00:00:00. */
static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
 
 /*
+ * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a
+ * signature, so we can just store a pointer to whichever we find. This
+ * is the pointer's type.
+ */
+typedef VOID (WINAPI *PgGetSystemTimeFn)(LPFILETIME);
+/* Storage for the function we pick at runtime */
+static PgGetSystemTimeFn pg_get_system_time = NULL;
+
+/*
+ * During backend startup, determine if GetSystemTimePreciseAsFileTime is
+ * available and use it; if not, fall back to GetSystemTimeAsFileTime.
+ */
+void
+init_win32_gettimeofday(void)
+{
+	/*
+	 * Because it's guaranteed that kernel32.dll will be linked into our
+	 * address space already, we don't need to LoadLibrary it and worry about
+	 * closing it afterwards, so we're not using Pg's dlopen/dlsym() wrapper.
+	 *
+	 * We'll just look up the address of GetSystemTimePreciseAsFileTime if
+	 * present.
+	 *
+	 * While we could look up the Windows version and skip this on Windows
+	 * versions below Windows 8 / Windows Server 2012 there isn't much point,
+	 * and determining the windows version is itself somewhat Windows version
+	 * and development SDK specific...
+	 */
+	pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
+	pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
+			GetModuleHandle(TEXT("kernel32.dll")),
+"GetSystemTimePreciseAsFileTime");
+	if (pg_get_system_time == NULL)
+	{
+		/*
+		 * The expected error is ERROR_PROC_NOT_FOUND, if the function isn't
+		 * present. No other error should occur.
+		 *
+		 * It's too early in startup to elog(...) if we get some unexpected
+		 * error, and not serious enough to warrant a fprintf to stderr about
+		 * it or save the error and report it later. So silently fall back
+		 * to GetSystemTimeAsFileTime unless we're on a cassert build.
+		 */
+		DWORD errcode = GetLastError();
+		Assert(errcode == ERROR_PROC_NOT_FOUND);
+
+		pg_get_system_time = GetSystemTimeAsFileTime;
+	}
+
+}
+
+/*
  * timezone information is stored outside the kernel so tzp isn't used anymore.
  *
- * Note: this function is not for Win32 high precision timing purpose. See
+ * Note: this function is not for Win32 high precision timing purposes. See
  * elapsed_time().
  */
 int
@@ -46,7 +101,7 @@ gettimeofday(struct timeval * tp, struct timezone * tzp)
 	FILETIME	file_time;
 	ULARGE_INTEGER ularge;
 
-	GetSystemTimeAsFileTime(&file_time);
+	(*pg_get_system_time)(&file_time);
 	ularge.LowPart = file_time.dwLowDateTime;
 	ularge.HighPart = file_time.dwHighDateTime;
 
-- 
1.9.3

From 810abe48ca9bc6dfa0c4422052ea7817e0fb256a Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Fri, 12 Sep 2014 12:41:35 +0800
Subject: [PATCH 1/2] Use GetSystemTimeAsFileTime directly in windows
 gettimeofday
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c

2014-10-22 Thread Heikki Linnakangas

On 10/19/2014 11:29 AM, Peter Geoghegan wrote:

On Fri, Oct 10, 2014 at 12:33 AM, Peter Geoghegan p...@heroku.com wrote:

Attached


Have you looked at this?


Committed now, thanks.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Dag-Erling Smørgrav
Tom Lane t...@sss.pgh.pa.us writes:
 This looks to me like re-fighting the last war.  Such a GUC has zero value
 *unless* some situation exactly like the POODLE bug comes up again, and
 the odds of that are not high.

Many people would have said the exact same thing before POODLE, and
BEAST, and CRIME, and Heartbleed.  You never know what sort of bugs or
weaknesses will show up or when; all you know is that there are a lot of
people working very hard to find these things and exploit them, and that
they *will* succeed, again and again and again.  You can gamble that
PostgreSQL will not be vulnerable due to specific details of its
protocol or how it uses TLS, but that's a gamble which you will
eventually lose.

 Moreover, the GUC could easily be misused to decrease rather than increase
 one's security, if it's carelessly set.

That's the user's responsibility.

DES
-- 
Dag-Erling Smørgrav - d...@des.no


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Dag-Erling Smørgrav
Magnus Hagander mag...@hagander.net writes:
 If anything, I think the default should be default, and then we have
 that map out to something. Because once you've initdb'ed, the config
 file will be stuck with a default and we can't change that in a minor
 release *if* something like POODLE shows up again.

Actually, I had that in an earlier version of the patch, but I thought
it was too obscure / circular.  I can easily re-add it.

 In a case like POODLE we probably wouldn't have done it anyway, as it
 doesn't affect our connections (we're not http)

If I understand correctly, imaps has been shown to be vulnerable as
well, so I wouldn't be so sure.

 Having the guc could certainly be useful in some cases. If we do, we
 should of course *also* have a corresponding configuration option in
 libpq, so I'd say this patch is incomplete if we do want to do it.

I can update the patch to include the client side.

DES
-- 
Dag-Erling Smørgrav - d...@des.no


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] speedup tidbitmap patch: hash BlockNumber

2014-10-22 Thread Teodor Sigaev
Just replacing tag_hash in tidbitmap, which uses hash_any, with a direct call
to hash_uint32 saves ~5% of execution time.
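
Roughly, the change amounts to supplying a custom hash function for the page
table's HTAB instead of the generic tag_hash (a sketch, not the patch itself):

static uint32
hash_blocknumber(const void *key, Size keysize)
{
	Assert(keysize == sizeof(BlockNumber));
	return DatumGetUInt32(hash_uint32(*((const BlockNumber *) key)));
}

	/* ... and when creating tbm->pagetable: */
	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
	hash_ctl.keysize = sizeof(BlockNumber);
	hash_ctl.entrysize = sizeof(PagetableEntry);
	hash_ctl.hash = hash_blocknumber;	/* instead of tag_hash */
	hash_ctl.hcxt = tbm->mcxt;
	tbm->pagetable = hash_create("TIDBitmap", 128, &hash_ctl,
								 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);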


An example:
# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 500) as v;
# create index idx on t using gin (i);
# set enable_seqscan = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch:    Execution time: 2319.376 ms

# explain analyze select * from t where i >= 100 and i <= 100;
without patch: Execution time: 524.441 ms
with patch:    Execution time: 504.478 ms
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


tbm_blocknumber-2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Dag-Erling Smørgrav
Tom Lane t...@sss.pgh.pa.us writes:
 And in the end, if we set values like this from PG --- whether
 hard-wired or via a GUC --- the SSL library people will have exactly
 the same perspective with regards to *our* values.  And not without
 reason; we were forcing very obsolete settings up till recently,
 because nobody had looked at the issue for a decade.  I see no reason
 to expect that that history won't repeat itself.

I'm not sure what you're saying here, but - I'm not sure how familiar
you are with the OpenSSL API, but it's insecure by default.  There is
*no other choice* for an application than to explicitly select which
protocols it wants to use (or at least which protocols it wants to
avoid).  And you can't change OpenSSL, because a ton of old crappy
software is going to break.

DES
-- 
Dag-Erling Smørgrav - d...@des.no


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-22 Thread Michael Paquier
On Fri, Oct 17, 2014 at 2:23 PM, Fujii Masao masao.fu...@gmail.com wrote:

 I found one problem in the 0002 patch. The patch changes the recovery so
 that
 it creates .done files for every WAL files which exist in pg_xlog
 directory at
 the end of recovery. But even WAL files which will have to be archived
 later
 can exist in pg_xlog at that moment. For example, the latest, recycled and
 fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL
 files.
 The patch wrongly prevents them from being archived at all.

Re-looking at patch 2, yes you are right. Even though it was mentioned that we
should do that for a node that had promotion triggered, it was not done this
way, as a check on CheckForStandbyTrigger() is actually missing. Attached is
an updated patch.


 ISTM that the 0001 patch has the similar problem. Please imagine the
 following
 scenario.
 In this case, the patch seems to make the restartpoint recycle even WAL
 files
 which have .ready files and will have to be archived later. Thought?

Right, that's really backward. This was the approach taken before c9cc7e0,
and this commit actually prevents removal of unarchived WAL files during
recovery.
-- 
Michael
From fdf1f4a46a2f870ee2aeda11f909c465ed12450a Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Wed, 22 Oct 2014 15:18:33 +0200
Subject: [PATCH] Enforce all WAL segment files to be marked as .done at  node
 promotion

This is a safety mechanism to ensure that there are no files that are not
considered as .done, as some segments may have been missed, particularly in
the case of partially written files or files being written when a disconnection
occurred between a streaming standby and its root node. This makes a node
reaching promotion have a state consistent with what is expected, using
the assumption that all the WAL segment files that are done being streamed
should always be considered as archived by the node.
---
 src/backend/access/transam/xlog.c| 10 
 src/backend/access/transam/xlogarchive.c | 39 +++-
 src/include/access/xlog_internal.h   |  1 +
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 235b442..7179003 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6862,6 +6862,16 @@ StartupXLOG(void)
 	}
 
 	/*
+	 * Create a .done entry for each WAL file present in pg_xlog that has
+	 * not been yet marked as such for a node that has been promoted. In some
+	 * cases where for example a streaming replica has had a connection to a
+	 * remote node cut abruptly, such WAL files may have been only partially
+	 * written or even not flagged correctly with .done.
+	 */
+	if (InRecovery && CheckForStandbyTrigger())
+		XLogArchiveForceDoneAll();
+
+	/*
 	 * Kill WAL receiver, if it's still running, before we continue to write
 	 * the startup checkpoint record. It will trump over the checkpoint and
 	 * subsequent records if it's still alive when we start writing WAL.
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 047efa2..931106f 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -554,6 +554,40 @@ XLogArchiveNotifySeg(XLogSegNo segno)
 }
 
 /*
+ * XLogArchiveForceDoneAll
+ *
+ * Wrapper of XLogArchiveForceDone scanning all the XLOG segments files in
+ * XLOGDIR, switching them forcibly to XLOG.done.
+ */
+void
+XLogArchiveForceDoneAll(void)
+{
+	DIR			   *xldir;
+	struct dirent  *xlde;
+
+
+	xldir = AllocateDir(XLOGDIR);
+	if (xldir == NULL)
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg(could not open transaction log directory \%s\: %m,
+		XLOGDIR)));
+
+	/*
+	 * Scan all the WAL segments present in the archives and switch them to
+	 * .done.
+	 */
+	while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
+	{
+		if (strlen(xlde->d_name) == 24 &&
+			strspn(xlde->d_name, "0123456789ABCDEF") == 24)
+			XLogArchiveForceDone(xlde->d_name);
+	}
+
+	FreeDir(xldir);
+}
+
+/*
  * XLogArchiveForceDone
  *
  * Emit notification forcibly that an XLOG segment file has been successfully
@@ -582,7 +616,6 @@ XLogArchiveForceDone(const char *xlog)
 	(errcode_for_file_access(),
  	 errmsg("could not rename file \"%s\" to \"%s\": %m",
 			archiveReady, archiveDone)));
-
 		return;
 	}
 
@@ -604,6 +637,10 @@ XLogArchiveForceDone(const char *xlog)
 		archiveDone)));
 		return;
 	}
+
+	ereport(DEBUG2,
+			(errmsg("archive status file of \"%s\" has been forcibly switched from \"%s\" to \"%s\"",
+	xlog, archiveReady, archiveDone)));
 }
 
 /*
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 27b9899..04a0de3 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -286,6 +286,7 @@ extern void ExecuteRecoveryCommand(char *command, char *commandName,
 extern void 

Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Dag-Erling Smørgrav
Tom Lane t...@sss.pgh.pa.us writes:
 As far as protocol version goes, I think our existing coding basically
 says prefer newest available version, but at least TLS 1.0.  I think
 that's probably a reasonable approach.

The client side forces TLS 1.0:

SSL_context = SSL_CTX_new(TLSv1_method());

In typical OpenSSL fashion, this does *not* mean 1.0 or higher.  It
means 1.0 exactly.
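
For reference, the usual way to get "1.0 or newer" out of the OpenSSL API of
that era is the version-flexible method plus option flags (a sketch):

SSL_CTX *ctx = SSL_CTX_new(SSLv23_method());
/* negotiate the highest mutually supported version, but never SSLv2/v3 */
SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);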

 If the patch exposed a GUC that set a minimum version, rather than
 calling out specific acceptable protocols, it might be less risky.

Not necessarily.  Someone might find a weakness in TLS 1.1 which is not
present in 1.0 because it involves a specific algorithm or mode that 1.0
does not support.

DES
-- 
Dag-Erling Smørgrav - d...@des.no


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Dag-Erling Smørgrav
Magnus Hagander mag...@hagander.net writes:
 Yes, it does that. Though it only does it on 9.4, but with the facts we
 know now, what 9.4+ does is perfectly safe.

Agreed.

DES
-- 
Dag-Erling Smørgrav - d...@des.no


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-22 Thread Heikki Linnakangas

On 10/17/2014 01:59 PM, Simon Riggs wrote:

On 17 October 2014 09:55,  furu...@pm.nttdata.co.jp wrote:


A new parameter to send feedback should be called --feedback



A second parameter to decide whether to fsync should be called --fsync


I think keeping --reply-fsync and --fsync-interval is better than making
new options.
Thoughts?


We already have hot_standby_feedback, so using the name feedback is the best idea.

I am suggesting that we send feedback even if we do not fsync, to
allow the master to track our progress. Hence the name of the second
parameter was just fsync.

So both names were suggested because of links to those terms already
being used for similar reasons elsewhere in Postgres.


We seem to be going in circles. You suggested having two options, 
--feedback, and --fsync, which is almost exactly what Furuya posted 
originally. I objected to that, because I think that user interface is 
too complicated. Instead, I suggested having just a single option called 
--synchronous, or even better, have no option at all and have the server 
tell the client if it's participating in synchronous replication, and 
have pg_receivexlog automatically fsync when it is, and not otherwise 
[1]. That way you don't need to expose any new options to the user. What 
did you think of that idea?


[1] http://www.postgresql.org/message-id/5434e0ef.9050...@vmware.com

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Michael Paquier
On Wed, Oct 22, 2014 at 3:12 PM, Dag-Erling Smørgrav d...@des.no wrote:

 Tom Lane t...@sss.pgh.pa.us writes:
  This looks to me like re-fighting the last war.  Such a GUC has zero
 value
  *unless* some situation exactly like the POODLE bug comes up again, and
  the odds of that are not high.

 Many people would have said the exact same thing before POODLE, and
 BEAST, and CRIME, and Heartbleed.  You never know what sort of bugs or
 weaknesses will show up or when; all you know is that there are a lot of
 people working very hard to find these things and exploit them, and that
 they *will* succeed, again and again and again.  You can gamble that
 PostgreSQL will not be vulnerable due to specific details of its
 protocol or how it uses TLS, but that's a gamble which you will
 eventually lose.

There are some companies, without naming them, that have increased the
resources dedicated to analyzing existing security protocols and libraries,
so even if the chances are low, IMO the odds that similar problems will
show up keep increasing with time.


  Moreover, the GUC could easily be misused to decrease rather than
 increase
  one's security, if it's carelessly set.

 That's the user's responsibility.

I actually just had a user asking about having a way to control the
protocols used by the server... So, changing my opinion on the matter, it
would be nice to have such a parameter even on back-branches, with
'default' to let the server decide which one is better.
-- 
Michael


[HACKERS] compress method for spgist

2014-10-22 Thread Teodor Sigaev
When we developed SP-GiST we missed an analogue of GiST's compress method.
There were two reasons for that: a lack of imagination to come up with a case
where the indexed value and the column have different types, and we didn't
want to call some method while the index fits in one page. Some discussion
on that:

http://www.postgresql.org/message-id/27542.1323534...@sss.pgh.pa.us
http://www.postgresql.org/message-id/4ee77ea1.6030...@sigaev.ru

But we were wrong: the PostGIS guys found an example: polygon indexing that
stores just a bounding box. Actually, I don't know an index structure for
boxes suitable for SP-GiST, but I'm not a geometer. They are.


The attached patch provides support for an optional compress method for SP-GiST.
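
To illustrate the PostGIS-style use case, a compress support function might
look like this (a hypothetical example; polygon indexing is not part of the
patch itself):

/* hypothetical compress method: index a polygon by its bounding box only */
Datum
spg_poly_compress(PG_FUNCTION_ARGS)
{
	POLYGON    *poly = PG_GETARG_POLYGON_P(0);
	BOX		   *box = (BOX *) palloc(sizeof(BOX));

	*box = poly->boundbox;		/* polygons carry a precomputed bounding box */

	PG_RETURN_BOX_P(box);
}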

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


spgist_compress_method-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-22 Thread Andres Freund
On 2014-10-22 13:32:07 +0530, Amit Kapila wrote:
 On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  
   On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
 
 Today, I have verified all previous comments raised by
 me and looked at new code and below are my findings:
 
 
  4.
  LWLockAcquireCommon()
  {
  ..
  if (!LWLockDequeueSelf(l))
  {
   for (;;)
  {
  PGSemaphoreLock(&proc->sem, false);
   if (!proc->lwWaiting)
  break;
  extraWaits++;
   }
  lock->releaseOK = true;
  ..
  }
 
  Setting releaseOK in above context might not be required  because if the
  control comes in this part of code, it will not retry to acquire another
  time.
 
  Hm. You're probably right.
 
 You have agreed to fix this comment, but it seems you have forgot
 to change it.

After I've thought more about it, it is actually required. This
essentially *is* a retry. Somebody woke us up, which is where releaseOK
is supposed to be set.

  11.
  LWLockRelease()
  {
  ..
  PRINT_LWDEBUG("LWLockRelease", lock, mode);
  }
 
  Shouldn't this be in begining of LWLockRelease function rather than
  after processing held_lwlocks array?
 
  Ok.
 
 You have agreed to fix this comment, but it seems you have forgot
 to change it.



 Below comment doesn't seem to be adressed?
 
  LWLockAcquireOrWait()
  {
  ..
  LOG_LWDEBUG("LWLockAcquireOrWait", lockid, mode, "succeeded");
  ..
  }
 
  a. such a log is not there in any other LWLock.. variants,
   if we want to introduce it, then shouldn't it be done at
   other places as well.

I think you're placing unnecessarily high consistency constraints on a
debugging feature here.

 Below point is yet to be resolved.
 
   12.
   #ifdef LWLOCK_DEBUG
   lock->owner = MyProc;
   #endif
  
   Shouldn't it be reset in LWLockRelease?
 
  That's actually intentional. It's quite useful to know the last owner
  when debugging lwlock code.
 
 Won't it cause any problem if the last owner process exits?

No. PGPROCs aren't deallocated or anything. And it's a debugging only
variable.

 Can you explain how pg_read_barrier() in below code makes this
 access safe?
 
 LWLockWakeup()
 {
 ..
 + pg_read_barrier(); /* pairs with nwaiters-- */
  + if (!BOOL_ACCESS_ONCE(lock->releaseOK))
 ..
 }

What's the concern you have? Full memory barriers (the atomic
nwaiters--) pair with read memory barriers.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-22 Thread Simon Riggs
On 22 October 2014 14:26, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 We seem to be going in circles. You suggested having two options,
 --feedback, and --fsync, which is almost exactly what Furuya posted
 originally. I objected to that, because I think that user interface is too
 complicated. Instead, I suggested having just a single option called
 --synchronous, or even better, have no option at all and have the server
 tell the client if it's participating in synchronous replication, and have
 pg_receivexlog automatically fsync when it is, and not otherwise [1]. That
 way you don't need to expose any new options to the user. What did you think
 of that idea?

Sorry, if we're going in circles.

Yes, I like the idea.

The master setting of synchronous_standby_names defines which
standbys/rep clients will have their feedback used to release waiting
COMMITs. That uses application_name, which is set at the replication
client connection. (That has a default value, but let's assume the user
sets this). So when a replication client connects, the WALSender knows
its priority, as defined in sync_standby_names.

I guess we can have WALSender send a protocol message to the client
every time the sync priority changes (as a result of a changed value
of sync_standby_names); the relevant function is SyncRepInitConfig().

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function array_agg(array)

2014-10-22 Thread Pavel Stehule
Hi

2014-10-19 8:02 GMT+02:00 Ali Akbar the.ap...@gmail.com:

 So, is there any idea how we will handle NULL and empty array in
 array_agg(anyarray)?
 I propose we just reject those input because the output will make no
 sense:
 - array_agg(NULL::int[]) -- the result will be indistinguishable from
 array_agg of NULL ints.
 - array_agg('{}'::int[]) -- how do we determine the dimension of the
 result? Is it 0? Or will the result be just an empty array {}?


 This updated patch rejects NULL and {} arrays as noted above.



I agree with your proposal. I have a few comments on the design:

1. the patch doesn't include documentation and regression tests, please add them.

2. this functionality (multidimensional aggregation) can be interesting
more times, so maybe some interface like array builder should be preferred.

3. array_agg was consistent with array(subselect), so it should be fixed too

postgres=# select array_agg(a) from test;
       array_agg
-----------------------
 {{1,2,3,4},{1,2,3,4}}
(1 row)

postgres=# select array(select a from test);
ERROR:  could not find array type for data type integer[]

4. why do you use a magic constant (64) there?

+ astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+ astate->aitems = 64 * nitems;

+ astate->nullbitmap = (bits8 *)
+ repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);

Regards

Pavel



 Regards,
 --
 Ali Akbar


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] Reducing lock strength of adding foreign keys

2014-10-22 Thread Tom Lane
Andreas Karlsson andr...@proxel.se writes:
 I have attached a proof of concept 
 patch which reduces the lock strength to ShareLock.

You're kidding, right?  ShareLock isn't even self-exclusive.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-22 Thread Andres Freund
On 2014-10-21 19:56:05 +0530, Amit Kapila wrote:
 On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund and...@2ndquadrant.com
 wrote:
 
  On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
   2.
   LWLockWakeup()
   {
   ..
   #ifdef LWLOCK_STATS
   lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
   #else
   SpinLockAcquire(&lock->mutex);
   #endif
   ..
   }
  
   Earlier while releasing lock, we don't count it towards LWLock stats
   spin_delay_count.  I think if we see other places in lwlock.c, it only
 gets
   counted when we try to acquire it in a loop.
 
  I think the previous situation was clearly suboptimal. I've now modified
  things so that all spinlock acquisitions are counted.
 
 Code has mainly 4 stats (sh_acquire_count, ex_acquire_count,
 block_count, spin_delay_count) to track, if I try to see
 all stats together to understand the contention situation,
 the unpatched code makes sense.

I don't think it does. It completely disregards that the contention may
actually be in LWLockRelease(). That contributes to spinlock
contention just as much as LWLockAcquire().

 spin_delay_count gives
 how much delay has happened to acquire spinlock which when
 combined with other stats gives the clear situation about
 the contention around acquisition of the corresponding LWLock.
 Now if we want to count the spin lock delay for Release call
 as well, then the meaning of the stat is getting changed.
 It might be that new meaning of spin_delay_count stat is more
 useful in some situations, however the older one has its own
 benefits, so I am not sure if changing this as part of this
 patch is the best thing to do.

In which case does the old definition make sense, where the new one
doesn't? I don't think it exists.

And changing it here seems to make sense because spinlock contention
fundamentally changes its meaning for lwlocks anyway, as in most paths we
don't take a spinlock anymore.

   5.
   LWLockWakeup()
   {
   ..
   dlist_foreach_modify(iter, (dlist_head *) &wakeup)
   {
   PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
   LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
   dlist_delete(&waiter->lwWaitLink);
   pg_write_barrier();
   waiter->lwWaiting = false;
   PGSemaphoreUnlock(&waiter->sem);
   }
   ..
   }
  
   Why can't we decrement the nwaiters after waking up? I don't think
   there is any major problem even if callers do that themselves, but
   in some rare cases LWLockRelease() might spuriously assume that
   there are some waiters and tries to call LWLockWakeup().  Although
   this doesn't create any problem, keeping the value sane is good unless
   there is some problem in doing so.
 
  That won't work because then LWLockWakeup() wouldn't be called when
  necessary - precisely because nwaiters is 0.
 
  The reason I've done so is that it's otherwise much harder to debug
  issues where there are backend that have been woken up already, but
  haven't rerun yet. Without this there's simply no evidence of that
  state. I can't see this being relevant for performance, so I'd rather
  have it stay that way.
 
 I am not sure what useful information we can get during debugging by not
 doing this in LWLockWakeup()

It's useful because you can detect backends that have been scheduled to
acquire the lock, but haven't yet. They're otherwise invisible.

 and w.r.t. performance it can lead to an extra
 function call, a few checks, and I think in some cases it can even
 acquire/release a spinlock.

I fail to see how that could be the case. And again, this is code that's
only executed around a couple syscalls. And the cacheline will be
touched around there *anyway*.

   6.
   LWLockWakeup()
   {
   ..
   dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
   {
   ..
   if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
   continue;
   ..
   if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
   {
   ..
   wokeup_somebody = true;
   }
   ..
   }
   ..
   }
  
   a.
   IIUC above logic, if the waiter queue is as follows:
   (S-Shared; X-Exclusive) S X S S S X S S
  
   it can skip the exclusive waiters and release a shared waiter.
  
   If my understanding is right, then I think instead of continue, there
   should be *break* in above logic.
 
  No, it looks correct to me. What happened is that the first S was woken
  up. So there's no point in waking up an exclusive locker, but further
  non-exclusive lockers can be woken up.
 
 Okay, but even then it changes the current wakeup logic,
 which I am not sure is what this patch is intended
 for.

It's already done in a separate patch...

   b.
   Consider below sequence of waiters:
   (S-Shared; X-Exclusive) S S X S S
  
   I think as per the unpatched code, it will wake up waiters up to
   (and including) the first Exclusive, but the patch will wake up waiters
   up to (*excluding*) the first Exclusive.
 
  I don't think the current code does that.
 
 LWLockRelease()
 {
 ..
 /*
  * If the front waiter wants exclusive lock, awaken him only.
  *
 Otherwise awaken as many waiters as want 

Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-22 Thread Michael Meskes
On Wed, Oct 15, 2014 at 09:50:16AM -0400, Tom Lane wrote:
 The same thought had occurred to me.  Probably the main use of the
 datetime parsing code in ecpg is for interpreting outputs from the
 server, and (at least by default) the server doesn't use timezone
 abbreviations when printing timestamps.  So maybe that's largely
 dead code anyhow.  I would not propose back-patching such a change,
 but we could try it in 9.5 and see if anyone complains.

Agreed on all accounts.

 A less drastic remedy would be to remove just those abbreviations
 whose meaning has actually changed over time.  Eventually that
 might be all of them ... but in the meantime, we could at least
 argue that we weren't breaking any case that worked well before.

This is what your patch did, right?

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] from_collapse_limit considerations

2014-10-22 Thread Antonin Houska
[ I think I responded earlier but don't see my mail in the archive... ]

Tom Lane t...@sss.pgh.pa.us wrote:
 Antonin Houska a...@cybertec.at writes:
  I noticed that - unlike join_collapse_limit - the from_collapse_limit does 
  not
  enforce maximum length of the top-level list. Shouldn't it do?

 How would it do that?  You want it to fail outright if there are more than
 N tables?  That seems unhelpful.

Sure, truncation of the FROM list would be crazy, and that's not what I tried 
to propose.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Function array_agg(array)

2014-10-22 Thread Ali Akbar
Thanks for the review

2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 I agree with your proposal. I have a few comments to design:


 1. patch doesn't hold documentation and regress tests, please append it.

OK, i'll add the documentation and regression test


 2. this functionality (multidimensional aggregation) can be interesting
 more times, so maybe some interface like array builder should be preferred.

We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
don't we?

Actually array_agg(anyarray) can be implemented using accumArrayResult
and makeMdArrayResult, but in the process we would deconstruct the array
data and null bit-flags into ArrayBuildState->dvalues and dnulls, and then
reconstruct them through makeMdArrayResult. I don't think that is
efficient, so I decided to reimplement it and memcpy the array data and
null flags as-is.

In other places, I think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as the API for building (multidimensional)
arrays.


 3. array_agg was consistent with array(subselect), so it should be fixed
 too

 postgres=# select array_agg(a) from test;
array_agg
 ---
  {{1,2,3,4},{1,2,3,4}}
 (1 row)

 postgres=# select array(select a from test);
 ERROR:  could not find array type for data type integer[]


I'm pretty new to PostgreSQL development. Can you point out where
array(subselect) is implemented?



 4. why you use a magic constant (64) there?

 + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
 + astate->aitems = 64 * nitems;

 + astate->nullbitmap = (bits8 *)
 + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);


just following the arbitrary size chosen in accumArrayResult (arrayfuncs.c):
astate->alen = 64; /* arbitrary starting array size */

it can be any number that is neither too small nor too big. Too small, and
we will realloc shortly. Too big, and we will end up wasting memory.

Regards,
-- 
Ali Akbar


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-22 Thread Merlin Moncure
On Fri, Oct 17, 2014 at 10:47 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Oct 17, 2014 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think it wouldn't; Merlin is proposing that f would be taken as the
 field name.  A more realistic objection goes like this:

 create table foo(f int, g int);
 update foo x set x = (1,2);  -- works
 alter table foo add column x int;
 update foo x set x = (1,2,3);  -- no longer works

 It's not a real good thing if a column addition or renaming can
 so fundamentally change the nature of a query.

 That's exactly how SELECT works.  I also dispute that the user should
 be surprised in such cases.

 Well, the reason we have a problem in SELECT is that we support an
 unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that
 "SELECT foo FROM foo" could represent a whole-row selection is nowhere
 to be found in the SQL standard, for good reason IMO.  But we've never
 had the courage to break cleanly with this Berkeley leftover and
 insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo".
 I'd just as soon not introduce the same kind of ambiguity into UPDATE
 if we have a reasonable alternative.

 Ah, interesting point (I happen to like the terse syntax and use it
 often).  This is for posterity anyways since you guys seem to like
 Atri's proposal, which surprised me.  However, I think you're over
 simplifying things here.  Syntax aside: I think
 SELECT f FROM foo f;
 and a hypothetical
 SELECT row(f.*) FROM foo f;

 give different semantics.  The former gives an object of type 'f' and
 the latter gives type 'row'.  To get parity, you'd have to add an
 extra cast which means you'd have to play tricky games to avoid extra
 performance overhead besides being significantly more verbose.  I'm
 aware some of the other QUELisms are pretty dodgy and have burned us
 in the past (like that whole function as record reference thing) but
 pulling a record as a field in select is pretty nice.  It's also
 widely used and quite useful in json serialization.

Been thinking about this in the back of my mind the last couple of
days.  There are some things you can do with the QUEL 'table as
column' in select syntax that are impossible otherwise, at least
today, and its usage is going to proliferate because of that.  Row
construction via () or row() needs to be avoided whenever the column
names are important and there is no handy type to cast to. For
example, especially during json serialization it's convenient to do
things like:

select
  array_agg((select q from (select a, b) q) order by ...)
from foo;

...where a,b are fields of foo.  FWICT, this is the only way to do
json serialization of arbitrary runtime row constructions in a way
that does not anonymize the type.  Until I figured out this trick I
used to create lots of composite types that served no purpose other
than to give me a type to cast to which is understandably annoying.
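
To spell the trick out with concrete values (a self-contained sketch; the
table and column names are illustrative):

create table foo (a int, b int);
insert into foo values (1, 2);

select row_to_json((select q from (select a, b) q)) from foo;
-- {"a":1,"b":2}    -- the keys keep the real column names

select row_to_json((a, b)) from foo;
-- {"f1":1,"f2":2}  -- an anonymous row constructor loses them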

if:
select (x).* from (select (1, 2) as x) q;

worked and properly expanded x to given names should they exist and:
SELECT row(f.*) FROM foo f;

worked and did same, and:
SELECT (row(f.*)).* FROM foo f;

was as reasonably performant and gave the same results as:
SELECT (f).* FROM foo f;

...then IMSNHO you'd have a reasonable path to deprecating the QUEL
inspired syntax.  Food for thought.

merlin




Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-22 Thread Tom Lane
Michael Meskes mes...@postgresql.org writes:
 On Wed, Oct 15, 2014 at 09:50:16AM -0400, Tom Lane wrote:
 The same thought had occurred to me.  Probably the main use of the
 datetime parsing code in ecpg is for interpreting outputs from the
 server, and (at least by default) the server doesn't use timezone
 abbreviations when printing timestamps.  So maybe that's largely
 dead code anyhow.  I would not propose back-patching such a change,
 but we could try it in 9.5 and see if anyone complains.

 Agreed on all accounts.

 A less drastic remedy would be to remove just those abbreviations
 whose meaning has actually changed over time.  Eventually that
 might be all of them ... but in the meantime, we could at least
 argue that we weren't breaking any case that worked well before.

 This is what your patch did, right?

No, I did not touch ecpg's set of tokens at all, just changed the
representation of datetktbl to match the new backend coding.
I figured we could discuss behavioral changes separately.

I don't have a strong opinion about which of the above things to do ...
what's your preference?

regards, tom lane




Re: [HACKERS] Question about RI checks

2014-10-22 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 Florian Pflug f...@phlo.org wrote:

 But that's wrong. The transaction's snapshot *would* see that row, so we
 ought to raise an error. Note that this applies also to mode SERIALIZABLE, and
 breaks true serializability in some cases, since we don't do conflict 
 detection
 for RI enforcement queries.

 Here's a test case, involving two transaction A and B. I tried this on
 REL9_4_STABLE.

 Setup
 -
 CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
 CREATE TABLE child (id int NOT NULL PRIMARY KEY,
 parent_id int NOT NULL REFERENCES parent (id));
 INSERT INTO parent (id) VALUES (1);
 INSERT INTO child (id, parent_id) VALUES (1, 1);

 Failure Case
 
 A:: set default_transaction_isolation='serializable';
 A:: begin;
 A:: select * from child;
 ->  id | parent_id
    ----+-----------
      1 |         1
 B:: set default_transaction_isolation='serializable';
 B:: begin;
 B:: delete from child;
 -> DELETE 1
 B:: commit;
 A:: delete from parent;
 -> DELETE 1
 A:: commit;

 A can neither come before B in any serializable history (since the DELETE
 should fail then), nor after it (since it shouldn't see the row in the child
 table then). Yet we don't complain, even though both transaction run in
 SERIALIZABLE mode.

Simplifying the display of this a bit:

  Tables parent and child each have one row.

  Transaction A
  =
  select * from child;
  [it sees the child row]
Transaction B
=
delete from child;
[child row is deleted]
  delete from parent;
  [parent row deleted]

TA sees the row that TB deletes, creating a rw-dependency that
implies that TA ran first.  On the other hand, the FK trigger
fired by the delete from parent *doesn't* see the row deleted by
TB, which implies that TB ran first.  I agree, this is a problem
for serializable transactions.

This should not be considered a problem for repeatable read
transactions because the change in visible rows meet the definition
of phantom reads, which are allowed in repeatable read: "A
transaction re-executes a query returning a set of rows that
satisfy a search condition and finds that the set of rows
satisfying the condition has changed due to another
recently-committed transaction."  Phantom reads are not *required*
to occur in repeatable read transactions, and in PostgreSQL they
generally don't, so we *might* want to change this behavior; I'm
just saying that we are conforming to requirements of the standard
even if we don't.  Leaving this alone for repeatable read
transactions would require a change to our documentation, though,
since we currently assert that we don't allow phantom reads in our
repeatable read implementation.

Either SSI needs to include the RI checking queries in its tracking
*or* TA needs to throw an error if there are child rows visible
according to its transaction snapshot -- at least in serializable
transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Function array_agg(array)

2014-10-22 Thread Pavel Stehule
2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com:

 Thanks for the review

 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 I agree with your proposal. I have a few comments to design:


 1. patch doesn't hold documentation and regress tests, please append it.

 OK, i'll add the documentation and regression test


 2. this functionality (multidimensional aggregation) can be interesting
 more times, so maybe some interface like array builder should be preferred.

 We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
 haven't we?

 Actually array_agg(anyarray) can be implemented by using accumArrayResult
 and makeMdArrayResult, but in the process we will deconstruct the array
  data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we
 will reconstruct it through makeMdArrayResult. I think this way it's not
 efficient, so i decided to reimplement it and memcpy the array data and
 null flags as-is.


Aha, so isn't it better to fix the performance of accumArrayResult?



 In other places, i think it's clearer if we just use accumArrayResult and
 makeArrayResult/makeMdArrayResult as API for building (multidimensional)
 arrays.


 3. array_agg was consistent with array(subselect), so it should be fixed
 too

 postgres=# select array_agg(a) from test;
array_agg
 ---
  {{1,2,3,4},{1,2,3,4}}
 (1 row)

 postgres=# select array(select a from test);
 ERROR:  could not find array type for data type integer[]


 I'm pretty new in postgresql development. Can you point out where is
 array(subselect) implemented?


Where can you start?

postgres=# explain select array(select a from test);
ERROR:  42704: could not find array type for data type integer[]
LOCATION:  exprType, nodeFuncs.c:116

Look at the code ... your magic keyword is ARRAY_SUBLINK -- search the
PostgreSQL sources for this keyword.
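
For orientation, ARRAY(subselect) is the construct that is parsed into an
ARRAY_SUBLINK node, e.g.:

select array(select g from generate_series(1, 3) g);
-- {1,2,3}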





 4. why you use a magic constant (64) there?

 + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
 + astate->aitems = 64 * nitems;

 + astate->nullbitmap = (bits8 *)
 + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);


 just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
 astate->alen = 64; /* arbitrary starting array size */

 it can be any number not too small and too big. Too small, and we will
 realloc shortly. Too big, we will end up wasting memory.


You can try to allocate 1KB as the starting size instead -- that size is
used in more places in Postgres. Then the overhead is at most 1KB per
aggregate call, which is acceptable.

You took this value from accumArrayResult, but that is targeted at short
scalars -- you should expect any array to be much larger.





 Regards,
 --
 Ali Akbar



[HACKERS] idea: allow AS label inside ROW constructor

2014-10-22 Thread Pavel Stehule
Hi

with the new row_to_json() function, the ROW constructor gets used much more
often. Giving names to its fields is relatively difficult. Because ROW has a
special clause in the parser, I think we could enable labeling inside the ROW
constructor.

So instead of the currently supported:

select row_to_json(r) from (select 10 as a, 20 as b) r;

users could write:

select row_to_json(row(10 as a, 20 as b));

Labeling would be enabled only inside the ROW constructor; I don't propose
enabling it everywhere.

What do you think about it?

Regards

Pavel

The currently supported syntax is natural for a long-time PostgreSQL user,
but it is rather strange for an ordinary user.
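
For comparison, the only ways to get named keys today are the subselect form
above or casting the ROW to a pre-declared composite type (a sketch; the
type name is illustrative):

create type pair as (a int, b int);
select row_to_json(row(10, 20)::pair);
-- {"a":10,"b":20}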


Re: [HACKERS] idea: allow AS label inside ROW constructor

2014-10-22 Thread Pavel Stehule
Here is the motivation for why I propose this feature:

http://dba.stackexchange.com/questions/27732/set-names-to-attributes-when-creating-json-with-row-to-json

The same question came up in the Czech Postgres users mailing list.

Pavel

2014-10-22 18:21 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 with new functions row_to_json(b), there is more often usage of ROW
 constructor. Using names in fields is relative difficult. Because ROW has
 special clause in parser, I am thinking so we can enable labeling inside
 ROW constructor

 so instead currently supported:

 select row_to_json(r) from (select 10 as a, 20 as b) r;

 users can to write:

 select row_to_json(row(10 as a,20 as b));

 labeling will be enabled only inside ROW constructor. I don't propose
 enable it everywhere.

 What do you think about it?

 Regards

 Pavel

 Currently supported syntax is natural for long time PostgreSQL user, but
 it is relative strange for usual user.




Re: [HACKERS] idea: allow AS label inside ROW constructor

2014-10-22 Thread Merlin Moncure
On Wed, Oct 22, 2014 at 11:21 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hi

 with new functions row_to_json(b), there is more often usage of ROW
 constructor. Using names in fields is relative difficult. Because ROW has
 special clause in parser, I am thinking so we can enable labeling inside ROW
 constructor

 so instead currently supported:

 select row_to_json(r) from (select 10 as a, 20 as b) r;

 users can to write:

 select row_to_json(row(10 as a,20 as b));

 labeling will be enabled only inside ROW constructor. I don't propose
 enable it everywhere.

 What do you think about it?

It's a neat idea -- maybe a better alternative to what I was thinking
here: 
http://postgresql.1045698.n5.nabble.com/Support-UPDATE-table-SET-tp5823073p5823944.html

Some questions:
*) What would the parser transformation resolve to
*) Are we ok with SQL standard
*) Do you think this (or some similar variant) would work?

select row_to_json(row(foo.*)) from foo;

merlin




Re: [HACKERS] Question about RI checks

2014-10-22 Thread Florian Pflug
 This should not be considered a problem for repeatable read
 transactions because the change in visible rows meet the definition
 of phantom reads, which are allowed in repeatable read: A
 transaction re-executes a query returning a set of rows that
 satisfy a search condition and finds that the set of rows
 satisfying the condition has changed due to another
 recently-committed transaction. 

Now I'm confused. Isn't the whole point of REPEATABLE READ to provide, well, 
repeatable reads? Also, note that after the DELETE FROM parent, further SELECTs
in the same transaction will use the original snapshot again, and thus will see
the conflicting child rows again that were ignored by the RI trigger. But they
won't, of course, see the parent row.

IOW, transaction A will, after the delete, see a state of the database in which 
the PK constraint is broken. I don't think that's acceptable in any isolation 
level.
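
To make the broken state concrete, continue the failure case above while
transaction A is still open:

A:: select * from child;
->  id | parent_id
   ----+-----------
     1 |         1
A:: select * from parent;
-> (0 rows)

The snapshot still shows a child row whose parent, by A's own delete, is gone.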

Best regards,
Florian Pflug



Re: [HACKERS] Question about RI checks

2014-10-22 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:

 This should not be considered a problem for repeatable read
 transactions because the change in visible rows meet the
 definition of phantom reads, which are allowed in repeatable
 read: A transaction re-executes a query returning a set of rows
 that satisfy a search condition and finds that the set of rows
 satisfying the condition has changed due to another
 recently-committed transaction.

 Now I'm confused. Isn't the whole point of REPEATABLE READ to
 provide, well, repeatable reads?

What the standard requires for REPEATABLE READ is that if you
re-read the same row later in a transaction it will contain the
same values -- the read of any particular *row* is repeatable (if
it exists at both times); it does not guarantee that the same
selection criteria will return the same set of rows every time.
The standard only requires that of SERIALIZABLE transactions.

PostgreSQL has historically provided more rigorous protections at
REPEATABLE READ than the standard requires.  Our docs claim:

| When you select the level Read Uncommitted you really get Read
| Committed, and phantom reads are not possible in the PostgreSQL
| implementation of Repeatable Read, so the actual isolation level
| might be stricter than what you select. This is permitted by the
| SQL standard: the four isolation levels only define which
| phenomena must not happen, they do not define which phenomena
| must happen.

As you have shown, our FK/RI implementation exposes phantom reads
to clients, so at a minimum our docs are wrong.

 Also, note that after the DELETE FROM parent, further SELECTS in
 the same transaction will use the original snapshot again, und
 thus will see the conflicting child rows again that were ignored
 by the RI trigger. But they won't, of course, see the parent row.

 IOW, transaction A will, after the delete, see a state of the
 database in which the PK constraint is broken. I don't think
 that's acceptable in any isolation level.

Good point.  Based on that observation, I agree that our RI is
broken at both the REPEATABLE READ and SERIALIZABLE isolation
levels.  I think that READ COMMITTED is OK, because it will see the
child row as deleted in time to prevent problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] idea: allow AS label inside ROW constructor

2014-10-22 Thread Pavel Stehule
2014-10-22 18:35 GMT+02:00 Merlin Moncure mmonc...@gmail.com:

 On Wed, Oct 22, 2014 at 11:21 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  Hi
 
  with new functions row_to_json(b), there is more often usage of ROW
  constructor. Using names in fields is relative difficult. Because ROW has
  special clause in parser, I am thinking so we can enable labeling inside
 ROW
  constructor
 
  so instead currently supported:
 
  select row_to_json(r) from (select 10 as a, 20 as b) r;
 
  users can to write:
 
  select row_to_json(row(10 as a,20 as b));
 
  labeling will be enabled only inside ROW constructor. I don't propose
  enable it everywhere.
 
  What do you think about it?

 It's a neat idea -- maybe a better alternative to what I was thinking
 here:
 http://postgresql.1045698.n5.nabble.com/Support-UPDATE-table-SET-tp5823073p5823944.html

 Some questions:
 *) What would the parser transformation resolve to


row:  ROW '(' expr_list ')'             { $$ = $3; }
    | ROW '(' ')'                       { $$ = NIL; }
    | '(' expr_list ',' a_expr ')'      { $$ = lappend($2, $4); }
    ;

We can replace expr_list there with target_list. As far as I can tell this
doesn't cause any grammar problems -- bison doesn't raise any warnings.


*) Are we ok with SQL standard


The SQL standard doesn't define named attributes in a row, so this is outside
the scope of ANSI SQL, but it is not in conflict with the standard either.
AS name is used in SQL/MM and SQL/XML -- and named function parameters have
a different syntax, parameter_name => value. I checked this against SQL99.


 *) Do you think this (or some similar variant) would work?

 select row_to_json(row(foo.*)) from foo;


That looks like an independent feature and could work too -- it is more
natural for the user.



 merlin



Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Martijn van Oosterhout
On Wed, Oct 22, 2014 at 03:14:26PM +0200, Dag-Erling Smørgrav wrote:
  In a case like POODLE we probably wouldn't have done it anyway, as it
  doesn't affect our connections (we're not http)
 
 If I understand correctly, imaps has been shown to be vulnerable as
 well, so I wouldn't be so sure.

Reference? It's an SSL3 specific attack, so most servers are not
vulnerable because TLS will negotiate to the highest supported
protocol.  So unless one of the client/server doesn't support TLS1.0
there's no issue.  The only reason http is vulnerable is because
browsers do protocol downgrading, something strictly forbidden by the
spec.

Additionally, the man-in-the-middle must be able to control the padding
in the startup packet, which just isn't possible (no scripting language
in the client).

Since you can already specify the cipher list, couldn't you just add
-SSLv3 to the cipher list and be done?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-10-22 Thread Heikki Linnakangas

On 10/22/2014 04:24 PM, Michael Paquier wrote:

On Wed, Oct 22, 2014 at 1:53 PM, Heikki Linnakangas hlinnakan...@vmware.com

wrote:



I think we should take a more wholesale approach to this. We should
enforce the rule that the server only ever archives WAL files belonging to
the same timeline that the server generates. IOW, the server only archives
the WAL that it has generated.


Hm?! Would that really be back-patchable? There may be tools or users in
the wild that rely on the fact that a node archives segment files from all
timelines.


Hmm, so it would be a tool or user that manually copies a file to the 
pg_xlog directory of a standby server, and expects the standby to 
archive the file after promotion. That seems a bit far-fetched, although 
I've seen people do strange things. I think it would be acceptable as 
long as we document the change in behavior in the release notes. I don't 
have much hope that we'll ever be able to nail down the correct behavior 
with the current approach.


- Heikki





Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-10-22 Thread Dag-Erling Smørgrav
Martijn van Oosterhout klep...@svana.org writes:
 Dag-Erling Smørgrav d...@des.no writes:
  If I understand correctly, imaps has been shown to be vulnerable as
  well, so I wouldn't be so sure.
 Reference?

Sorry, no reference.  I was told that Thunderbird was vulnerable to
POODLE when talking imaps.

 Since you can already specify the cipher list, couldn't you just add
 -SSLv3 to the cipher list and be done?

I didn't want to change the existing behavior; all I wanted was to give
users a way to do so if they wish.

DES
-- 
Dag-Erling Smørgrav - d...@des.no




Re: [HACKERS] delta relations in AFTER triggers

2014-10-22 Thread Robert Haas
On Tue, Sep 23, 2014 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Add a new p_tableref_hook function pointer, similar to p_paramref_hook.
 Whenever the parser sees a RangeVar that it doesn't recognize (or actually,
 I think it should call it *before* resolving regular tables, but let's
 ignore that for now), it calls the p_tableref_hook. It can return a new
 RelationParam node (similar to regular Param), which contains a numeric ID
 for the table/tuplestore, as well as its tuple descriptor.

 For the execution phase, add a new array of Tuplestorestates to
 ParamListInfo. Similar to the existing array of ParamExternalDatas.

 I haven't been following this issue closely, but this sounds like a
 really nice design.

 I'm on board with the parser hooks part of that.  I don't especially agree
 with the idea of a new sub-structure for ParamListInfo: if you do that you
 will need a whole bunch of new boilerplate infrastructure to allocate,
 copy, and generally manage that structure, for darn little gain.  What I'd
 suggest is just saying that some Params might have type INTERNAL with
 Datum values that are pointers to tuplestores; then all you need to do is
 remember which Param number has been assigned to the particular tuplestore
 you want.  There is already precedent for that in the recursive CTE code,
 IIRC.

Studying this proposed design a bit further, I am a little fuzzy on
what is supposed to happen in this design during parse analysis.  In
Kevin's current code, after checking whether a RangeVar might be a CTE
reference and before deciding that it must be a table reference,
scanNameSpaceForTsr() is used to check whether there's a tuplestore by
that name and, if so, then we insert a RTE with type RTE_TUPLESTORE
which references the tuplestore by name.  To me, the obvious thing to
do here seems to be to instead call p_tableref_hook and let it return
a suitable RangeTblRef (or NULL if it wishes to take no action).  In
the case where the hook wants to redirect the use of that name to a
tuplestore, it can add a range-table entry of type RTE_TUPLESTORE, and
that entry can store a parameter-index rather than, as in the current
design, a name.  But then where does Heikki's notion of a
RelationParam get used?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Moving of INT64_FORMAT to c.h

2014-10-22 Thread Steve Singer

On 10/16/2014 08:08 AM, Andres Freund wrote:

On 2014-10-16 08:04:17 -0400, Jan Wieck wrote:

Hi,

PostgreSQL has for ages defined INT64_FORMAT and UINT64_FORMAT in
pg_config.h. This commit

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ce486056ecd28050

moved those definitions to c.h, which breaks compilation of all released
Slony-I versions against current master. Can those be moved back to where
they used to be?

Well, you could add additional configure stuff to also emit what you
want.


Slony uses the definitions in external tools, like slon and slonik, to
format sequence numbers in log output.

Then it should include c.h/postgres_fe.h?


So the header of c.h says "Note that the definitions here are not
intended to be exposed to clients",

but
postgres_fe.h says "This should be the first file included by PostgreSQL
client libraries and ..."


Should client programs that live outside the postgres source tree be
including postgres_fe.h?  I have a feeling the answer is no. If the
answer is no, then why does a make install install postgres_fe.h?


Slonik used to include postgres_fe.h, but back in 2011 or so we stopped
doing so because it was causing issues (I think on win32 builds).


Maybe slony client programs shouldn't be trying to steal portability 
definitions from postgres headers, but I doubt we are the only ones 
doing that.  It isn't a big deal for slony to define its own
INT64_FORMAT for 9.5+ but third-party projects that have been including
pg_config.h will hit similar issues.  If there was a good reason for the
change then fine (Postgres isn't intended to be a general purpose C 
portability layer).




Steve


Greetings,

Andres Freund







Re: [HACKERS] Moving of INT64_FORMAT to c.h

2014-10-22 Thread Robert Haas
On Wed, Oct 22, 2014 at 4:12 PM, Steve Singer st...@ssinger.info wrote:
 So the header of c.h says Note that the definitions here are not intended
 to be exposed to clients
 but
 postgres_fe.h says This should be the first file included by PostgreSQL
 client libraries and

 Should client programs that live outside the postgres source tree be
 including postgres_fe.h ?  I have a feeling the answer is no. If the answer
 is no, then why does a make install install postgres_fe.h ?

I think the answer is yes.

 Slonik used to include postgres_fe.h but back in 2011 or so we stopped doing
 so because it was causing issues (I think on win32 builds)

That seems like something we ought to consider fixing, but obviously
we'd need more details.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Moving of INT64_FORMAT to c.h

2014-10-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 22, 2014 at 4:12 PM, Steve Singer st...@ssinger.info wrote:
 So the header of c.h says Note that the definitions here are not intended
 to be exposed to clients
 but
 postgres_fe.h says This should be the first file included by PostgreSQL
 client libraries and
 
 Should client programs that live outside the postgres source tree be
 including postgres_fe.h ?  I have a feeling the answer is no. If the answer
 is no, then why does a make install install postgres_fe.h ?

 I think the answer is yes.

Yeah.  In particular, postgres_fe.h sees to it that FRONTEND is #define'd;
without that there is *not* a guarantee that c.h will work for non-backend
compiles.  We would much rather you were including postgres_fe.h than c.h
directly.  Having said that, there is not and never will be a guarantee
that everything that postgres_fe.h declares is immutable, and that
certainly applies to pg_config.h in particular.  There's a reason why
libpq doesn't include that into its public headers ...

 Slonik used to include postgres_fe.h but back in 2011 or so we stopped doing
 so because it was causing issues (I think on win32 builds)

 That seems like something we ought to consider fixing, but obviously
 we'd need more details.

Indeed.  But I suspect the core of it is going to be that clients of
postgres_fe.h are expected to be linking in libpgport.a ...

regards, tom lane




Re: [HACKERS] delta relations in AFTER triggers

2014-10-22 Thread Heikki Linnakangas

On 10/22/2014 11:10 PM, Robert Haas wrote:

On Tue, Sep 23, 2014 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Add a new p_tableref_hook function pointer, similar to p_paramref_hook.
Whenever the parser sees a RangeVar that it doesn't recognize (or actually,
I think it should call it *before* resolving regular tables, but let's
ignore that for now), it calls the p_tableref_hook. It can return a new
RelationParam node (similar to regular Param), which contains a numeric ID
for the table/tuplestore, as well as its tuple descriptor.

For the execution phase, add a new array of Tuplestorestates to
ParamListInfo. Similar to the existing array of ParamExternalDatas.



I haven't been following this issue closely, but this sounds like a
really nice design.


I'm on board with the parser hooks part of that.  I don't especially agree
with the idea of a new sub-structure for ParamListInfo: if you do that you
will need a whole bunch of new boilerplate infrastructure to allocate,
copy, and generally manage that structure, for darn little gain.  What I'd
suggest is just saying that some Params might have type INTERNAL with
Datum values that are pointers to tuplestores; then all you need to do is
remember which Param number has been assigned to the particular tuplestore
you want.  There is already precedent for that in the recursive CTE code,
IIRC.


Studying this proposed design a bit further, I am a little fuzzy on
what is supposed to happen in this design during parse analysis.  In
Kevin's current code, after checking whether a RangeVar might be a CTE
reference and before deciding that it must be a table reference,
scanNameSpaceForTsr() is used to check whether there's a tuplestore by
that name and, if so, then we insert a RTE with type RTE_TUPLESTORE
which references the tuplestore by name.  To me, the obvious thing to
do here seems to be to instead call p_tableref_hook and let it return
a suitable RangeTblRef (or NULL if it wishes to take no action).  In
the case where the hook wants to redirect the use of that name to a
tuplestore, it can add a range-table entry of type RTE_TUPLESTORE, and
that entry can store a parameter-index rather than, as in the current
design, a name.  But then where does Heikki's notion of a
RelationParam get used?


I was thinking that the hook would return a RelationParam. When parse 
analysis sees the returned RelationParam, it adds an entry for that to 
the range table, and creates the RangeTblRef for it. The way you 
describe it works too, but manipulating the range table directly in the 
hook seems a bit too low-level.


- Heikki




Re: [HACKERS] delta relations in AFTER triggers

2014-10-22 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 10/22/2014 11:10 PM, Robert Haas wrote:
 Studying this proposed design a bit further, I am a little fuzzy on
 what is supposed to happen in this design during parse analysis.  In
 Kevin's current code, after checking whether a RangeVar might be a CTE
 reference and before deciding that it must be a table reference,
 scanNameSpaceForTsr() is used to check whether there's a tuplestore by
 that name and, if so, then we insert a RTE with type RTE_TUPLESTORE
 which references the tuplestore by name.  To me, the obvious thing to
 do here seems to be to instead call p_tableref_hook and let it return
 a suitable RangeTblRef (or NULL if it wishes to take no action).  In
 the case where the hook wants to redirect the use of that name to a
 tuplestore, it can add a range-table entry of type RTE_TUPLESTORE, and
 that entry can store a parameter-index rather than, as in the current
 design, a name.  But then where does Heikki's notion of a
 RelationParam get used?

 I was thinking that the hook would return a RelationParam. When parse 
 analysis sees the returned RelationParam, it adds an entry for that to 
 the range table, and creates the RangeTblRef for it. The way you 
 describe it works too, but manipulating the range table directly in the 
 hook seems a bit too low-level.

The problem with that idea is that then the API for the hook has to cover
every possible sort of RTE that hooks might wish to create; I see no
reason to restrict them to creating just one kind.  I agree that the hook
should avoid *physically* manipulating the rangetable, but it seems
reasonable to expect that it can call one of the addRangeTableEntryXXX
functions provided by parse_relation.c, and then return a RangeTblEntry*
gotten that way.  So hooks would have an API more or less equivalent
to, eg, transformRangeFunction().

regards, tom lane




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-22 Thread Robert Haas
On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote:
 I got to ask: Why is it helpful that we have this in contrib? I have a
 good share of blame to bear for that, but I think we need to stop
 diluting contrib evermore with test programs. These have a place, but I
 don't think it should be contrib.

I don't think pg_background is merely a test program: I think it's a
quite useful piece of functionality.  It can be used for running
VACUUM from a procedure, which is something people have asked for more
than once, or for simulating an autonomous transaction.  Granted,
it'll be a lot slower than a real autonomous transaction, but it's
still better than doing it via dblink, because you don't have to futz
with authentication.
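
For instance, using the function names from the posted patch (the table name
is illustrative):

SELECT * FROM pg_background_result(pg_background_launch('VACUUM VERBOSE foo'))
    AS (result text);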

I would be all in favor of moving things like test_decoding,
test_parser, and test_shm_mq to src/test or contrib/test or wherever
else we want to put them, but I think pg_background belongs in
contrib.

 +/* Fixed-size data passed via our dynamic shared memory segment. */
 +typedef struct pg_background_fixed_data
 +{
 + Oid database_id;
 + Oid authenticated_user_id;
 + Oid current_user_id;
 + int sec_context;
 + char database[NAMEDATALEN];
 + char authenticated_user[NAMEDATALEN];
 +} pg_background_fixed_data;

 Why not NameData?

No particular reason.  Changed.

 whitespace damage.

I went through and fixed everything that git diff --check complained
about.  Let me know if you see other problems yet.

 +static HTAB *worker_hash;

 Hm. So the lifetime of this hash's contents is managed via
 on_dsm_detach(), do I understand that correctly?

More or less, yes.

 Hm. So every user can do this once the extension is created as the
 functions are most likely to be PUBLIC. Is this a good idea?

Why not?  If they can log in, they could start separate sessions with
similar effect.

 + /*
 +  * Whether we succeed or fail, a future invocation of this function
 +  * may not try to read from the DSM once we've begun to do so.
 +  * Accordingly, make arrangements to clean things up at end of query.
 +  */
 + dsm_unkeep_mapping(info->seg);
 +
 + /* Set up tuple-descriptor based on column definition list. */
 + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 + ereport(ERROR,
 + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 +  errmsg("function returning record called in context "
 + "that cannot accept type record"),
 +  errhint("Try calling the function in the FROM clause "
 +  "using a column definition list.")));

 Hm, normally we don't add linebreaks inside error messages.

I copied it from dblink.

 I'm unsure right now about the rules surrounding this, but shouldn't we
 check that the user is allowed to execute these? And shouldn't we fall
 back to non binary functions if no binary ones are available?

I can't see any reason to do either of those things.  I'm not aware
that returning data in binary format is in any way intended to be a
security-restricted operation, or that we have any data types that
actually matter without send and receive functions.  If we do, I think
the solution is to add them, not make this more complicated.

 + /*
 +  * Limit the maximum error level to ERROR.  We don't want
 +  * a FATAL inside the background worker to kill the user
 +  * session.
 +  */
 + if (edata.elevel > ERROR)
 + edata.elevel = ERROR;

 Hm. But we still should report that it FATALed? Maybe add error context
 notice about it? Not nice, but I don't have an immediately better idea. I
 think it generally would be rather helpful to add the information that
 this wasn't originally an error triggered by this process. The user
 might otherwise be confused when looking for the origin of the error in
 the log.

Yeah, I was wondering if we needed some kind of annotation here.  What
I'm wondering about is appending something to the errcontext, perhaps
"background worker, PID %d".

 + case 'A':
 + {
 + /* Propagate NotifyResponse. */
 + pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
 + break;

 Hm. Are we sure to be in a situation where the client expects these? And
 are we sure their encoding is correct? The other data goes through
 input/output methods checking for that, but here we don't.

Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-22 Thread Joshua D. Drake


On 10/22/2014 04:03 PM, Robert Haas wrote:

On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote:

I got to ask: Why is it helpful that we have this in contrib? I have a
good share of blame to bear for that, but I think we need to stop
dilluting contrib evermore with test programs. These have a place, but I
don't think it should be contrib.


I don't think pg_background is merely a test program: I think it's a
quite useful piece of functionality.  It can be used for running
VACUUM from a procedure, which is something people have asked for more
than once, or for simulating an autonomous transaction.  Granted,
it'll be a lot slower than a real autonomous transaction, but it's
still better than doing it via dblink, because you don't have to futz
with authentication.


I think this is a very useful program, but I wonder if it makes more
sense to push it out to PGXN. Why do we need an ever-growing contrib?
Further, if it is good enough to go into contrib, why isn't it just in core?


Sincerely,

Joshua D. Drake


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.




Re: [HACKERS] superuser() shortcuts

2014-10-22 Thread Alvaro Herrera
Brightwell, Adam wrote:
 All,
 
 
  Thanks!  Please add it to the next commitfest.
 
 
  Sounds good.  I'll update the patch and add accordingly.
 
 
 Attached is an updated patch.

I noticed something strange while perusing this patch, but the issue
predates the patch.  Some messages say "must be superuser or replication
role to foo", but our longstanding practice is to say "permission denied
to foo".  Why do we have this inconsistency?  Should we remove it?  If
we do want to keep the old wording this patch should use "permission
denied to" in the places that it touches.

Other than that, since we already agreed that it's something we want,
the only comment I have about this patch is an empty line in variable
declarations here which should be removed:

 diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
 new file mode 100644
 index c9a9baf..ed89b23
 *** a/src/backend/commands/alter.c

 --- 807,848 
   bool   *nulls;
   bool   *replaces;
   
 ! AclObjectKind aclkind = get_object_aclkind(classId);
 ! 


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-10-22 Thread Alvaro Herrera
Sawada Masahiko wrote:

 Thank you for reviewing.
 I agree 2) - 5).
 Attached patch is latest version patch I modified above.
 Also, I noticed I had forgotten to add the patch regarding document of
 reindexdb.

Please don't use pg_catalog in the regression test.  That way we will
need to update the expected file whenever a new catalog is added, which
seems pointless.  Maybe create a schema with a couple of tables
specifically for this, instead.
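
Something like this, say (a sketch using the patch's proposed syntax; the
names are illustrative):

CREATE SCHEMA schema_to_reindex;
CREATE TABLE schema_to_reindex.tbl1 (col1 int PRIMARY KEY);
CREATE TABLE schema_to_reindex.tbl2 (col2 text UNIQUE);
REINDEX SCHEMA schema_to_reindex;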

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-22 Thread Robert Haas
On Wed, Oct 22, 2014 at 7:12 PM, Joshua D. Drake j...@commandprompt.com wrote:
 I don't think pg_background is merely a test program: I think it's a
 quite useful piece of functionality.  It can be used for running
 VACUUM from a procedure, which is something people have asked for more
 than once, or for simulating an autonomous transaction.  Granted,
 it'll be a lot slower than a real autonomous transaction, but it's
 still better than doing it via dblink, because you don't have to futz
 with authentication.

 I think this is a very useful program but I wonder if it makes more sense to
 push it out to pgxn? Why do we need an ever growing contrib? Further, if it
 is good enough to go into contrib, why isn't it just in core?

Sure, there's always this debate.  Certainly, there's no reason it has
to be in contrib rather than core or PGXN.  From my point of view,
putting it on PGXN would be a less-than-awesome development because
then the infrastructure that the patch introduces (patches 1-5 of the
series) would have no users in core or contrib, which means that (A)
if anyone is thinking of modifying that code in the future, they'll
have more difficulty knowing whether their changes break anything and
(B) if someone breaks something unwittingly, the buildfarm will not
tell us.

Now, admittedly, I expect that eventually all of this stuff will be
used in core - for parallelism.  But in the meantime I think having at
least one example in the PostgreSQL source tree of how each piece of
extensibility infrastructure that we have can be used is handy and
useful and generally something that we ought to encourage.  And this
one has the advantage - unlike some stuff in contrib - of actually
being somewhat useful for real work, which a decent amount of the
stuff in contrib isn't.

What about moving it the opposite direction, from contrib all the way
into core?  The main advantage of that in my book is that it might
make it easier to reduce the code duplication between pg_background
and exec_simple_query; the main disadvantage is that it reduces the
super-user's control, because now the functionality is always
available instead of being something that the super-user can choose to
install, or not.

But these are all judgement calls, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Getting rid of accept incoming network connections prompts on OS X

2014-10-22 Thread Robert Haas
On Tue, Oct 21, 2014 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If you do any Postgres development on OS X, you've probably gotten
 seriously annoyed by the way that, every single time you reinstall the
 postmaster executable, you get a dialog box asking whether you'd like
 to allow it to accept incoming network connections.  (At least, you
 do unless you disable the OS firewall, which is not a great idea.)
 It's particularly awful to run make check-world in this environment,
 because you get a pop-up for each test install.

Ugh.  This must be new in Mavericks, because I don't get any such
behavior on 10.8.5.

What an awful, awful behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-22 Thread Joshua D. Drake




What about moving it the opposite direction, from contrib all the way
into core?  The main advantage of that in my book is that it might
make it easier to reduce the code duplication between pg_background
and exec_simple_query; the main disadvantage is that it reduces the
super-user's control, because now the functionality is always
available instead of being something that the super-user can choose to
install, or not.

But these are all judgement calls, of course.


I lean toward a push into core because:

1. "I expect that eventually all of this stuff will be
used in core - for parallelism."

2. Contrib is already bloated out. (which is a whole other discussion)

3. I am not sure I buy into the super-user argument. Just because the 
functionality is there, doesn't mean it has to be used.


4. "The main advantage of that in my book is that it might
make it easier to reduce the code duplication between pg_background
and exec_simple_query".

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.




Re: [HACKERS] Getting rid of accept incoming network connections prompts on OS X

2014-10-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 21, 2014 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If you do any Postgres development on OS X, you've probably gotten
 seriously annoyed by the way that, every single time you reinstall the
 postmaster executable, you get a dialog box asking whether you'd like
 to allow it to accept incoming network connections.

 Ugh.  This must be new in Mavericks, because I don't get any such
 behavior on 10.8.5.

Hm, I thought it went further back than that ... I remember having put up
with it for some time now.

regards, tom lane




Re: [HACKERS] Moving of INT64_FORMAT to c.h

2014-10-22 Thread Steve Singer

On 10/22/2014 04:17 PM, Robert Haas wrote:



Slonik used to include postgres_fe.h, but back in 2011 or so we stopped
doing so because it was causing issues (I think on win32 builds).

That seems like something we ought to consider fixing, but obviously
we'd need more details.

I'll try to find some time to get a Windows environment running and 
figure out what the problems were.
It's also possible that I removed the postgres_fe include at the same 
time as I was fixing other win32 issues, and that it wasn't actually 
causing a problem; it was just removed because it was unnecessary.


At the moment Slony only includes the server include dir if we are 
building with pgport (which we normally do on Windows). We have had 
reports of issues like the one described in 
http://www.slony.info/bugzilla/show_bug.cgi?id=315, where some installs 
make us pick up server and client includes from different versions of PG.
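
(For context, a minimal sketch -- not Slony code -- of what the header
provides: INT64_FORMAT expands to the platform's full printf conversion
spec for int64, e.g. "%ld", "%lld", or "%I64d", so frontend code has no
portable way to print 64-bit values without it:)

#include "postgres_fe.h"
#include <stdio.h>

int
main(void)
{
	/* hypothetical value; the int64 typedef also comes from the header */
	int64		biggest_xid = INT64CONST(1234567890123456789);

	/* INT64_FORMAT is a complete conversion spec, pasted into the string */
	printf("largest txid seen: " INT64_FORMAT "\n", biggest_xid);
	return 0;
}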






Re: [HACKERS] Simplify calls of pg_class_aclcheck when multiple modes are used

2014-10-22 Thread Peter Eisentraut
On 10/21/14 6:19 PM, Michael Paquier wrote:
 On Wed, Oct 22, 2014 at 5:03 AM, Peter Eisentraut pete...@gmx.net wrote:
 
 While looking at this, I wrote a few test cases for sequence
 privileges, because that was not covered at all.  That patch is
 attached.
 
 +1 for those tests.
 -- 
 Michael

Committed your patch and tests.
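
(For readers of the archives, a sketch of the simplification in question --
the shape is assumed, not the committed hunk. pg_class_aclcheck() resolves
OR-ed mode bits with ACLMASK_ANY semantics, i.e. "holds at least one of
these privileges", so paired checks collapse into one call:)

/*
 * Before: two separate ACL lookups, e.g. in sequence.c:
 *   if (pg_class_aclcheck(relid, GetUserId(), ACL_USAGE) != ACLCHECK_OK &&
 *       pg_class_aclcheck(relid, GetUserId(), ACL_UPDATE) != ACLCHECK_OK)
 * After: one call with the modes OR-ed together.
 */
if (pg_class_aclcheck(relid, GetUserId(),
					  ACL_USAGE | ACL_UPDATE) != ACLCHECK_OK)
	ereport(ERROR,
			(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
			 errmsg("permission denied for sequence %s",
					RelationGetRelationName(seqrel))));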




Re: [HACKERS] Function array_agg(array)

2014-10-22 Thread Ali Akbar
2014-10-22 22:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com:

 Thanks for the review

 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 I agree with your proposal. I have a few comments on the design:


 1. The patch doesn't include documentation and regression tests; please
 append them.

 OK, I'll add the documentation and regression tests.


 2. This functionality (multidimensional aggregation) can be interesting
 in other places too, so maybe some interface like an array builder should
 be preferred.

 We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
 don't we?

 Actually array_agg(anyarray) can be implemented by using accumArrayResult
 and makeMdArrayResult, but in the process we would deconstruct the array
 data and null bit-flags into ArrayBuildState->dvalues and dnulls, and then
 reconstruct them through makeMdArrayResult. I think that is not efficient,
 so I decided to reimplement it and memcpy the array data and null flags
 as-is.


 Aha, so isn't it better to fix the performance of accumArrayResult?


OK, I'll go this route. While reading the code of array(subselect) as you
pointed out below, I think the easiest way to implement support for both
array_agg(anyarray) and array(subselect) is to change accumArrayResult so
it operates on both scalar datums and array datums, with a performance
optimization for the latter.

In other places, I think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as the API for building
(multidimensional) arrays.
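
(For reference, a minimal sketch of the existing scalar-only builder API,
with assumed caller-side names -- the proposal above would extend it to
accept whole arrays as input:)

Datum		result;
ArrayBuildState *astate = NULL;
int			i;

/* values[]/isnull[] are assumed inputs; astate is allocated on first use */
for (i = 0; i < nvalues; i++)
	astate = accumArrayResult(astate,
							  values[i],	/* element value */
							  isnull[i],	/* element null flag */
							  INT4OID,		/* element type */
							  CurrentMemoryContext);

/* build a one-dimensional result ... */
result = makeArrayResult(astate, CurrentMemoryContext);

/* ... or a multidimensional one, if the caller tracked dims[] and lbs[] */
result = makeMdArrayResult(astate, ndims, dims, lbs,
						   CurrentMemoryContext, true);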


 3. array_agg was consistent with array(subselect), so it should be fixed
 too

 postgres=# select array_agg(a) from test;
array_agg
 ---
  {{1,2,3,4},{1,2,3,4}}
 (1 row)

 postgres=# select array(select a from test);
 ERROR:  could not find array type for data type integer[]


 I'm pretty new to PostgreSQL development. Can you point out where
 array(subselect) is implemented?


 Where can you start?

 postgres=# explain select array(select a from test);
 ERROR:  42704: could not find array type for data type integer[]
 LOCATION:  exprType, nodeFuncs.c:116

 Look at the code ... your magic keyword is ARRAY_SUBLINK; search the
 PostgreSQL sources for this keyword.


Found it. Thanks.

On a side note: in Postgres, an array of integer[] is syntactically still
integer[] (multidimensional arrays are the same type), but in pg_type,
integer[] itself has no array type. I propose we consider changing that so
the array type of an array type is itself (not in this patch, of course,
because it is a big change). Consider the following code in nodeFuncs.c:109:

if (sublink->subLinkType == ARRAY_SUBLINK)
{
	type = get_array_type(type);
	if (!OidIsValid(type))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("could not find array type for data type %s",
						format_type_be(exprType((Node *) tent->expr)))));
}

To implement the array(subselect) fix you pointed out above, we must add
special checks for array types. A quick search for get_array_type returns
10 call sites.

I'll think more about this later. For this patch, I'll go without changes
to pg_type.h.

4. Why do you use a magic constant (64) there?

 + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
 + astate->aitems = 64 * nitems;

 + astate->nullbitmap = (bits8 *)
 +	repalloc(astate->nullbitmap, (astate->aitems + 7) / 8);


 Just following the arbitrary size chosen in accumArrayResult
 (arrayfuncs.c):
 astate->alen = 64;	/* arbitrary starting array size */

 It can be any number that is neither too small nor too big. Too small,
 and we will repalloc soon; too big, and we will end up wasting memory.


 You can try to allocate 1KB as the starting size instead -- that size is
 used in more places in Postgres. Then the overhead is at most 1KB per
 aggregate call, which is acceptable.

 You took this value from accumArrayResult, but that is targeted at short
 scalars -- you should expect any array to be much larger.


Ok.
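
(For reference, the enlarge-on-demand logic in accumArrayResult that this
constant feeds -- roughly as it appears in arrayfuncs.c; the starting size
only controls how soon the first repalloc happens, since the buffers
double whenever they fill up:)

if (astate->nelems >= astate->alen)
{
	astate->alen *= 2;
	astate->dvalues = (Datum *) repalloc(astate->dvalues,
										 astate->alen * sizeof(Datum));
	astate->dnulls = (bool *) repalloc(astate->dnulls,
									   astate->alen * sizeof(bool));
}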


Regards,
--
Ali Akbar


Re: [HACKERS] Function array_agg(array)

2014-10-22 Thread Pavel Stehule
2014-10-23 4:00 GMT+02:00 Ali Akbar the.ap...@gmail.com:

 2014-10-22 22:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com:

 Thanks for the review

 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 I agree with your proposal. I have a few comments to design:


 1. patch doesn't hold documentation and regress tests, please append it.

 OK, i'll add the documentation and regression test


 2. this functionality (multidimensional aggregation) can be interesting
 more times, so maybe some interface like array builder should be preferred.

 We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
 haven't we?

 Actually array_agg(anyarray) can be implemented by using
 accumArrayResult and makeMdArrayResult, but in the process we will
 deconstruct the array data and null bit-flags into ArrayBuildState-dvalues
 and dnulls. And we will reconstruct it through makeMdArrayResult. I think
 this way it's not efficient, so i decided to reimplement it and memcpy the
 array data and null flags as-is.


 aha, so isn't better to fix a performance for accumArrayResult ?


 Ok, i'll go this route. While reading the code of array(subselect) as you
 pointed below, i think the easiest way to implement support for both
 array_agg(anyarray) and array(subselect) is to change accumArrayResult so
 it operates both with scalar datum(s) and array datums, with performance
 optimization for the latter.

 In other places, i think it's clearer if we just use accumArrayResult and
 makeArrayResult/makeMdArrayResult as API for building (multidimensional)
 arrays.


 3. array_agg was consistent with array(subselect), so it should be
 fixed too

 postgres=# select array_agg(a) from test;
array_agg
 ---
  {{1,2,3,4},{1,2,3,4}}
 (1 row)

 postgres=# select array(select a from test);
 ERROR:  could not find array type for data type integer[]


 I'm pretty new in postgresql development. Can you point out where is
 array(subselect) implemented?


 where you can start?

 postgres=# explain select array(select a from test);
 ERROR:  42704: could not find array type for data type integer[]
 LOCATION:  exprType, nodeFuncs.c:116

 look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
 postgresql sources this keyword


 Found it. Thanks.

 On a side note: in Postgres, an array of integer[] is syntactically
 still integer[] (multidimensional arrays are the same type), but in
 pg_type, integer[] itself has no array type. I propose we consider
 changing that so the array type of an array type is itself (not in this
 patch, of course, because it is a big change).


Yes, it is true - this is a really big change and maybe needs a separate
discussion - if we allow a cycle there, I am not sure about the possible
side effects.

Maybe this change is not necessary; you can fix only the check: if the
type is not an array and get_array_type returns InvalidOid, raise an error.
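
(A sketch of that narrower fix -- hypothetical, not committed code, and
assuming the type_is_array() macro from lsyscache.h: an expression that is
already array-typed would simply keep its own type:)

if (sublink->subLinkType == ARRAY_SUBLINK)
{
	if (!type_is_array(type))
	{
		type = get_array_type(type);
		if (!OidIsValid(type))
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_OBJECT),
					 errmsg("could not find array type for data type %s",
							format_type_be(exprType((Node *) tent->expr)))));
	}
}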




 To implement the array(subselect) fix you pointed out above, we must add
 special checks for array types. A quick search for get_array_type returns
 10 call sites.


Attention: we probably don't want to allow arrays everywhere.


