Re: [HACKERS] security labels on databases are bad for dump & restore

2015-08-01 Thread Noah Misch
On Thu, Jul 30, 2015 at 10:37:33PM -0400, Adam Brightwell wrote:
> On Thu, Jul 30, 2015 at 02:26:34PM -0400, Robert Haas wrote:
> > > 1. pg_dumpall -g
> > > 2. pg_dump --create per database
> >
> > Gah, OK, I see your point.  But we better document this, because if
> > you need a PhD in PostgreSQL-ology to take a backup, we're not in a
> > good place.

Definitely.

> Agreed.  Though, honestly, I find this to be a cumbersome approach.  I
> think it just makes things more confusing, even if it is well
> documented.  Perhaps it might be necessary as a bridge to get to a
> better place.  But my first question as an end user would be, 'why
> can't one tool do this?'.

pg_dumpall (without -g) is that one tool.  It has excellent dump fidelity.
It lacks the slicing and dump format options of pg_dump, which are important
to many sites.

> Also, by using 'pg_dumpall -g' aren't you
> potentially getting things that you don't want/need/care about?  For
> instance, if database 'foo' is owned by 'user1' and database 'bar' is
> owned by 'user2' and neither have any knowledge/relation of/to the
> other, then when I dump 'foo', in this manner, wouldn't I also be
> including 'user2'?  Said differently, a restore of a 'foo'-only dump
> would also include a 'bar' related role.  That seems like a bad idea,
> IMHO.  Maybe it can't be avoided, but I'd expect that only relevant
> information for the database being dumped would be included.

Nothing in core PostgreSQL attempts to answer the high-level question "Is
user1 relevant to database bar?"  PostgreSQL has no one concept of a role's
relevance to databases.  Some reasonable heuristics come to mind, but nothing
I'd be eager to memorialize in a core tool like pg_dumpall.

Thanks,
nm




[HACKERS] Cleaning up missing ERRCODE assignments

2015-08-01 Thread Tom Lane
I believe we have a project policy that all user-facing error reports
should go through ereport not elog (so that they're translatable) and
should not have ERRCODE_INTERNAL_ERROR as SQLSTATE.  It's sometimes
debatable where the line is between user-facing and not, but surely any
error that is triggered by the standard regression tests has to be
considered user-facing.  Yet running the tests with log_error_verbosity
set to verbose turns up quite a few XX000 log entries.

The attached patch fixes all the cases exposed by the regression tests.
I also went through each affected file and adjusted any other elog or
ereport calls that seemed to need it, on the theory that such oversights
probably travel in herds.  I don't pretend that this is a complete fix,
but it's at least a down payment on the problem.

Probably the main thing that's worth discussing here is what to do in
plperl/plpython/pltcl when reporting an error thrown from the respective
language interpreter; in most cases we don't have a clear idea what
SQLSTATE should be used, but that doesn't make ERRCODE_INTERNAL_ERROR an
acceptable choice.  I used ERRCODE_EXTERNAL_ROUTINE_EXCEPTION where there
was not an obviously better candidate, but I wonder if anyone has a
different idea?
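
As a concrete illustration, the pattern I used for those PL cases is
essentially this (a minimal sketch; the message variable here is a
hypothetical placeholder, not actual PL glue code):

    /*
     * Reporting an error thrown by the embedded interpreter, when no
     * more specific SQLSTATE is known.  "interpreter_error_message"
     * stands in for whatever text the language runtime gave us.
     */
    ereport(ERROR,
            (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
             errmsg("%s", interpreter_error_message)));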

I propose to back-patch this into 9.5, but not further; it's not an
important enough issue to justify changing SQLSTATE behavior in stable
branches.

regards, tom lane

diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 8a95d47..cb1d029 100644
*** a/contrib/tablefunc/tablefunc.c
--- b/contrib/tablefunc/tablefunc.c
*** crosstab(PG_FUNCTION_ARGS)
*** 432,438 ****
  			break;
  		default:
  			/* result type isn't composite */
! 			elog(ERROR, "return type must be a row type");
  			break;
  	}
  
--- 432,440 ----
  			break;
  		default:
  			/* result type isn't composite */
! 			ereport(ERROR,
! 	(errcode(ERRCODE_DATATYPE_MISMATCH),
! 	 errmsg("return type must be a row type")));
  			break;
  	}
  
*** build_tuplestore_recursively(char *key_f
*** 1350,1356 ****
  appendStringInfo(&chk_current_key, "%s%s%s",
   branch_delim, current_key, branch_delim);
  if (strstr(chk_branchstr.data, chk_current_key.data))
! 	elog(ERROR, "infinite recursion detected");
  			}
  
  			/* OK, extend the branch */
--- 1352,1360 ----
  appendStringInfo(&chk_current_key, "%s%s%s",
   branch_delim, current_key, branch_delim);
  if (strstr(chk_branchstr.data, chk_current_key.data))
! 	ereport(ERROR,
! 			(errcode(ERRCODE_INVALID_RECURSION),
! 			 errmsg("infinite recursion detected")));
  			}
  
  			/* OK, extend the branch */
*** validateConnectbyTupleDesc(TupleDesc tup
*** 1429,1435 ****
  	{
  		if (tupdesc->natts != (CONNECTBY_NCOLS + serial_column))
  			ereport(ERROR,
! 	(errcode(ERRCODE_SYNTAX_ERROR),
  	 errmsg("invalid return type"),
  	 errdetail("Query-specified return tuple has "
  			   "wrong number of columns.")));
--- 1433,1439 ----
  	{
  		if (tupdesc->natts != (CONNECTBY_NCOLS + serial_column))
  			ereport(ERROR,
! 	(errcode(ERRCODE_DATATYPE_MISMATCH),
  	 errmsg("invalid return type"),
  	 errdetail("Query-specified return tuple has "
  			   "wrong number of columns.")));
*** validateConnectbyTupleDesc(TupleDesc tup
*** 1438,1444 ****
  	{
  		if (tupdesc->natts != CONNECTBY_NCOLS_NOBRANCH + serial_column)
  			ereport(ERROR,
! 	(errcode(ERRCODE_SYNTAX_ERROR),
  	 errmsg("invalid return type"),
  	 errdetail("Query-specified return tuple has "
  			   "wrong number of columns.")));
--- 1442,1448 ----
  	{
  		if (tupdesc->natts != CONNECTBY_NCOLS_NOBRANCH + serial_column)
  			ereport(ERROR,
! 	(errcode(ERRCODE_DATATYPE_MISMATCH),
  	 errmsg("invalid return type"),
  	 errdetail("Query-specified return tuple has "
  			   "wrong number of columns.")));
*** validateConnectbyTupleDesc(TupleDesc tup
*** 1447,1460 ****
  	/* check that the types of the first two columns match */
  	if (tupdesc->attrs[0]->atttypid != tupdesc->attrs[1]->atttypid)
  		ereport(ERROR,
! (errcode(ERRCODE_SYNTAX_ERROR),
   errmsg("invalid return type"),
   errdetail("First two columns must be the same type.")));
  
  	/* check that the type of the third column is INT4 */
  	if (tupdesc->attrs[2]->atttypid != INT4OID)
  		ereport(ERROR,
! (errcode(ERRCODE_SYNTAX_ERROR),
   errmsg("invalid return type"),
   errdetail("Third column must be type %s.",
  		   format_type_be(INT4OID))));
--- 1451,1464 ----
  	/* check that the types of the first two columns match */
  	if (tupdesc->attrs[0]->atttypid != tupdesc->attrs[1]->atttypid)
  		ereport(ERROR,
! (errcode(ERRCODE_DATATYPE_MISMATCH),
   errmsg("invalid return type"),
   errdetail("First two columns must be the same type.")));
  
  	/* check that the type of the third column is INT4 */
  	if (tupdesc->attrs[2]->atttypid != INT4OID)
  		

Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2015-08-01 Thread Andreas Seltenreich
Tom Lane writes:

 Well, I certainly think all of these represent bugs:

 [...]

thanks for prioritizing them.  I'll try to digest them somewhat before
posting.

 This one's pretty darn odd, because 2619 is pg_statistic and not an index
 at all:

  4 | ERROR:  cache lookup failed for index 2619

This is actually the one from the README :-).  Quoting to spare media
discontinuity:

--8<---------------cut here---------------start------------->8---
Taking a closer look at it reveals that it happens when you query a
certain catalog view like this:

  self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
  FEHLER:  cache lookup failed for index 2619

This is because the planner then puts pg_get_indexdef(oid) in a context
where it sees non-index-oids, which causes it to croak:

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=17.60..30.65 rows=9 width=4)
   Hash Cond: (i.oid = x.indexrelid)
   ->  Seq Scan on pg_class i  (cost=0.00..12.52 rows=114 width=8)
         Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char"))
   ->  Hash  (cost=17.31..17.31 rows=23 width=4)
         ->  Hash Join  (cost=12.52..17.31 rows=23 width=4)
               Hash Cond: (x.indrelid = c.oid)
               ->  Seq Scan on pg_index x  (cost=0.00..4.13 rows=113 width=8)
               ->  Hash  (cost=11.76..11.76 rows=61 width=8)
                     ->  Seq Scan on pg_class c  (cost=0.00..11.76 rows=61 width=8)
                           Filter: (relkind = ANY ('{r,m}'::"char"[]))
--8<---------------cut here---------------end--------------->8---

thanks,
Andreas




Re: [HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-08-01 Thread Michael Paquier
On Sat, Aug 1, 2015 at 5:00 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Fabrízio de Royes Mello wrote:

 In this patch I didn't change all lockmode comparison places previously
 pointed by you, but I can change it maybe adding other method called
 LockModeIsValid(lockmode) to do the comparison "lockmode >= NoLock &&
 lockmode < MAX_LOCKMODES" used in many places.

 I don't like this.  Is it possible to write these comparisons in terms
 of what they conflict with?  I think there are two main cases in the
 existing code:

 1. is this lock mode valid (sounds reasonable)
 2. can this be acquired in hot standby (not so much, but makes
sense.)

 and now we have your third thing, what is the strongest of these two
 locks.

The third method already exists in tablecmds.c:
/*
 * Take the greatest lockmode from any subcommand
 */
if (cmd_lockmode > lockmode)
    lockmode = cmd_lockmode;

 For instance, if you told me to choose between ShareLock and
 ShareUpdateExclusiveLock I wouldn't know which one is strongest.  I
 don't think it's sensible to have the lock mode compare primitive, honestly.
 I don't have any great ideas to offer ATM sadly.

Yes, the thing is that lowering the lock levels is good for
concurrency, but the non-monotonicity of the lock levels makes it
impossible to choose an intermediate state correctly. Hence I think
that the one and only safe answer to this problem is that we check if
the locks taken for each subcommand are all the same, and use this
lock. If even one lock is different, for example if one subcommand uses
ShareLock and a second one ShareUpdateExclusiveLock (your good example),
we simply fall back to AccessExclusiveLock,
based on the fact that we are sure that this lock conflicts with all
the other ones.
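
As a rough sketch, that rule could look like this (a hypothetical
helper, not part of the patch, taking the per-subcommand lock modes as
a list of integers):

    /*
     * If every subcommand requires the same lock mode, use it; as soon
     * as two subcommands disagree, fall back to AccessExclusiveLock,
     * which is sure to conflict with everything.
     */
    static LOCKMODE
    choose_alter_table_lockmode(List *subcmd_lockmodes)
    {
        LOCKMODE    result = NoLock;
        ListCell   *lc;

        foreach(lc, subcmd_lockmodes)
        {
            LOCKMODE    cur = (LOCKMODE) lfirst_int(lc);

            if (result == NoLock)
                result = cur;       /* first subcommand sets the lock */
            else if (cur != result)
                return AccessExclusiveLock; /* mixed: take the big one */
        }
        return result;
    }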

At the same time I think that we should as well patch the existing
code path of tablecmds.c that already does those comparisons.
 Regards,
-- 
Michael




Re: [HACKERS] [DESIGN] ParallelAppend

2015-08-01 Thread Kouhei Kaigai
 On Tue, Jul 28, 2015 at 6:08 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
   On Tue, Jul 28, 2015 at 7:59 AM, Kouhei Kaigai kai...@ak.jp.nec.com 
   wrote:
   
 -----Original Message-----
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
 Sent: Monday, July 27, 2015 11:07 PM
 To: Amit Kapila
 
  Is there a real need to have new node like ParallelAppendPath?
  Can't we have Funnel node beneath AppendNode and then each
  worker will be responsible to have SeqScan on each inherited child
  relation.  Something like
 
  Append
   --> Funnel
        --> SeqScan rel1
        --> SeqScan rel2
 
 If Funnel can handle both horizontal and vertical parallelism,
 it is a great simplification. I don't insist on a new node.

 Once Funnel gets the capability to have multiple child nodes, the
 Append node above will probably be gone. I expect set_append_rel_pathlist()
 to add two paths, based on Append and Funnel, and the planner will choose
 the cheaper one according to its cost.

In the latest v16 patch, Funnel is declared as follows:
   
  typedef struct Funnel
  {
  Scan    scan;
  int num_workers;
  } Funnel;
   
If we try to add Append capability here, I expects the structure will
be adjusted as follows, for example:
   
  typedef struct Funnel
  {
  Scan    scan;
  List   *funnel_plans;
  List   *funnel_num_workers;
  } Funnel;
   
As the names suggest, funnel_plans stores the underlying Plan nodes
instead of the lefttree. Also, funnel_num_workers stores the number of
expected workers to be assigned to individual child plans.
   
  
   or shall we have a node like the above and name it FunnelAppend or
   AppendFunnel?
  
  It is better to have a smaller number of node types capable of
  kicking background workers, because it simplifies path construction.
 
  Let's assume the case below. When the planner considers a path to append
  child scans on rel1, rel2 and rel3, but the cheapest path of rel2 is
  Funnel+PartialSeqScan, we cannot put a Funnel here unless we pull
  up the Funnel of rel2, can we?
 
 (Append? or Funnel)
  --> SeqScan on rel1
  --> Funnel
       --> PartialSeqScan on rel2
  --> IndexScan on rel3
 
 
 I am not sure, but what problem do you see in putting Funnel node
 for one of the relation scans and not for the others.

At this moment, I'm not certain whether a background worker can/ought
to launch other background workers.
If a sub-Funnel node is executed by 10 processes and each of them also
launches 10 processes, will 100 processes run in total?

  If we pull up the Funnel here, I think the plan shall be as follows:
    Funnel
     --> SeqScan on rel1
     --> PartialSeqScan on rel2
     --> IndexScan on rel3
 
 
 So if we go this route, then Funnel should have capability
 to execute non-parallel part of plan as well, like in this
 case it should be able to execute non-parallel IndexScan on
 rel3 as well and then it might need to distinguish between
 parallel and non-parallel part of plans.  I think this could
 make Funnel node complex.

It is different from what I plan now. In the above example, the
Funnel node has two non-parallel-aware nodes (rel1 and rel3)
and one parallel-aware node, so three PlannedStmts, one for each,
shall be put on the TOC segment. Even though the background
workers pick up a PlannedStmt from the three, only one worker
can pick up the PlannedStmt for rel1 or rel3; rel2, however,
can be executed by multiple workers simultaneously.
(Note: if the number of workers is less than three in this case,
the PlannedStmt for rel3 shall not be picked up unless some other
worker completes its plan on rel1 or rel2.)

From the standpoint of the Funnel, it just kicks background
workers with:
 - multiple PlannedStmt nodes
 - maximum number of workers for each plan
in addition to the current form.

Then, it continues to fetch records from the shm_mq.
Probably, it does not change the current form so much.
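
As a rough sketch of that pick-up rule (all names here are
hypothetical, not what any patch implements):

    /*
     * A parallel-aware subplan may be picked up by any number of
     * workers, while a non-parallel-aware one must be claimed by
     * exactly one worker.
     */
    static int
    funnel_pick_subplan(int nplans, const bool *parallel_aware,
                        pg_atomic_flag *claimed)
    {
        int     i;

        /* First try to claim a non-parallel-aware subplan exclusively. */
        for (i = 0; i < nplans; i++)
        {
            if (!parallel_aware[i] && pg_atomic_test_set_flag(&claimed[i]))
                return i;
        }
        /* Otherwise join in on any parallel-aware subplan. */
        for (i = 0; i < nplans; i++)
        {
            if (parallel_aware[i])
                return i;
        }
        return -1;      /* nothing left for this worker to run */
    }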

  If Funnel is the only node we have to pay attention to, it makes the
  code around path construction and pull-up logic much simpler than if
  multiple node types can kick background workers.
 
 
 Okay, but I think pulling up the Funnel node makes sense only when all
 nodes beneath it need to be executed in parallel.

I think this decision should be based on the cost, which includes the
additional startup_cost to launch a background worker, as long as a
non-parallel node is also capable of running on the worker side.

Even though create_parallelscan_paths() in v16 sets num_workers no
larger than parallel_seqscan_degree, the total number of concurrent
background workers may exceed this configuration if two or more
PartialSeqScan nodes are underlying.
It is a different configuration from max_worker_processes, so it is
not a matter as long as we 

Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-08-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 The problem that was bothering us (or at least what was bothering me)
 is that the PlannerInfo provides only a list of SpecialJoinInfo
 structures, which don't directly give you the original join order.  In
 fact, min_righthand and min_lefthand are intended to constrain the
 *possible* join orders, and are deliberately designed *not* to specify
 a single join order.  If you're sending a query to a remote PostgreSQL
 node, you don't want to know what all the possible join orders are;
 it's the remote side's job to plan the query.  You do, however, need
 an easy way to identify one join order that you can use to construct a
 query.  It didn't seem easy to do that without duplicating
 make_join_rel(), which seemed like a bad idea.

In principle it seems like you could traverse root->parse->jointree
as a guide to reconstructing the original syntactic structure; though
I'm not sure how hard it would be to ignore the parts of that tree
that correspond to relations you're not shipping.

 But maybe there's a good way to do it.  Tom wasn't crazy about this
 hook both because of the frequency of calls and also because of the
 long argument list.  I think those concerns are legitimate; I just
 couldn't see how to make the other way work.

In my vision you probably really only want one call per build_join_rel
event (that is, per construction of a new RelOptInfo), not per
make_join_rel event.

It's possible that an FDW that wants to handle joins but is not talking to
a remote query planner would need to grovel through all the join ordering
possibilities individually, and then maybe hooking at make_join_rel is
sensible rather than having to reinvent that logic.  But I'd want to see a
concrete use-case first, and I certainly don't think that that's the main
case to design the API around.

regards, tom lane




Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2015-08-01 Thread Tom Lane
Andreas Seltenreich seltenre...@gmx.de writes:
 Tom Lane writes:
 What concerns me more is that what you're finding is only cases that trip
 an assertion sanity check.  It seems likely that you're also managing to
 trigger other bugs with less drastic consequences, such as could not
 devise a query plan failures or just plain wrong answers.

 Yes, some of these are logged as well[1], but most of them are really as
 undrastic as can get, and I was afraid reporting them would be more of a
 nuisance.

Well, I certainly think all of these represent bugs:

  3 | ERROR:  plan should not reference subplan's variable
  2 | ERROR:  failed to assign all NestLoopParams to plan nodes
  1 | ERROR:  could not find pathkey item to sort

This I'm not sure about; it could be that the query gave conflicting
collation specifiers, but on the other hand we've definitely had bugs
with people forgetting to run assign_query_collations on subexpressions:

   4646 | ERROR:  could not determine which collation to use for string comparison

This one's pretty darn odd, because 2619 is pg_statistic and not an index
at all:

  4 | ERROR:  cache lookup failed for index 2619

These seem likely to be bugs as well, though maybe they are race
conditions during a DROP and not worth fixing:

   1171 | ERROR:  cache lookup failed for index 16862
172 | ERROR:  cache lookup failed for index 257148
 84 | ERROR:  could not find member 1(34520,34520) of opfamily 1976
 55 | ERROR:  missing support function 1(34516,34516) in opfamily 1976
 13 | ERROR:  could not find commutator for operator 34538
  2 | ERROR:  cache lookup failed for index 12322

I would say anything of the sort that is repeatable definitely deserves
investigation, because even if it's an expectable error condition, we
should be throwing a more user-friendly error message.

regards, tom lane




Re: [HACKERS] brin index vacuum versus transaction snapshots

2015-08-01 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Untested patch attached.

That fixes the installcheck failure on my machine.


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-08-01 Thread Andres Freund
On August 1, 2015 2:17:24 PM GMT+02:00, Michael Paquier 
michael.paqu...@gmail.com wrote:
On Sat, Aug 1, 2015 at 5:00 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Fabrízio de Royes Mello wrote:

 In this patch I didn't change all lockmode comparison places previously
 pointed by you, but I can change it maybe adding other method called
 LockModeIsValid(lockmode) to do the comparison "lockmode >= NoLock &&
 lockmode < MAX_LOCKMODES" used in many places.

 I don't like this.  Is it possible to write these comparisons in
terms
 of what they conflict with?  I think there are two main cases in the
 existing code:

 1. is this lock mode valid (sounds reasonable)
 2. can this be acquired in hot standby (not so much, but makes
sense.)

 and now we have your third thing, what is the strongest of these two
 locks.

The third method already exists in tablecmds.c:
/*
 * Take the greatest lockmode from any subcommand
 */
if (cmd_lockmode > lockmode)
    lockmode = cmd_lockmode;

 For instance, if you told me to choose between ShareLock and
 ShareUpdateExclusiveLock I wouldn't know which one is strongest.  I
 don't think it's sensible to have the lock mode compare primitive,
honestly.
 I don't have any great ideas to offer ATM sadly.

Yes, the thing is that lowering the lock levels is good for
concurrency, but the non-monotonicity of the lock levels makes it
impossible to choose an intermediate state correctly. 

How about simply acquiring all the locks individually if they're different
types? These few acquisitions won't matter.

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-08-01 Thread Amit Kapila
On Tue, Feb 17, 2015 at 2:56 PM, Ashutosh Bapat 
ashutosh.ba...@enterprisedb.com wrote:

 2. New catalog - This method takes out the need to have separate method
 for C1, C5 and even C2, also the synchronization will be taken care of by
 row locks, there will be no limit on the number of foreign transactions as
 well as the size of foreign prepared transaction information. But big
 problem with this approach is that, the changes to the catalogs are atomic
 with the local transaction. If a foreign prepared transaction can not be
 aborted while the local transaction is rolled back, that entry needs to
 retained. But since the local transaction is aborting the corresponding
 catalog entry would become invisible and thus unavailable to the resolver
 (alas! we do not have autonomous transaction support). We may be able to
 overcome this, by simulating autonomous transaction through a background
 worker (which can also act as a resolver). But the amount of communication
 and synchronization, might affect the performance.


For Rollback, why can't we do it the reverse way: first roll back the
transactions on the foreign servers and then roll back the local transaction?

I think for Commit, it is essential that we first commit in local
server, so that we can resolve the transaction status of prepared
transactions on foreign servers after crash recovery.  However
for the Abort case, I think even if we don't roll back in the local server,
it can be deduced (any transaction which is not committed should be
rolled back) during crash recovery for the purpose of resolving the
transaction status of prepared transactions.
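
Sketching that ordering with hypothetical helper names (there is no
such API today; this only illustrates the argument):

    /*
     * On commit, the local transaction must become durable first, so
     * that after a crash the resolver knows the prepared foreign
     * transactions should be committed.
     */
    static void
    commit_distributed_xact(void)
    {
        CommitLocalTransaction();          /* local commit first */
        ResolveForeignPreparedXacts(true); /* then commit prepared xacts */
    }

    /*
     * On abort the order can be reversed: an uncommitted local
     * transaction is treated as rolled back by crash recovery anyway,
     * so resolving the foreign servers first loses nothing.
     */
    static void
    abort_distributed_xact(void)
    {
        ResolveForeignPreparedXacts(false); /* roll back foreign servers */
        AbortLocalTransaction();            /* then the local transaction */
    }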



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-01 Thread Andres Freund
On 2015-07-31 13:31:54 -0400, Robert Haas wrote:
 On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
  Heapification is O(n) already, whether siftup (existing) or down.
 
 That's not my impression, or what Wikipedia says.  Source?

Building a binary heap via successive insertions is O(n log n), but
building it directly is O(n). Looks like wikipedia agrees too
https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap
I'm pretty sure that there's a bunch of places where we intentionally
build a heap at once instead of successively. At least reorderbuffer.c does
so, and it looks like nodeMergeAppend as well (that's why they use
binaryheap_add_unordered and then binaryheap_build).
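
For illustration, here is the textbook form of the O(n) construction
(a self-contained sketch in plain C, not the binaryheap.c API): sift
down every internal node, deepest first, instead of inserting n times.

    /* Sift the element at position i down to its place in a max-heap
     * of size n stored in a[]. */
    static void
    sift_down(int a[], int n, int i)
    {
        for (;;)
        {
            int     largest = i;
            int     l = 2 * i + 1;
            int     r = 2 * i + 2;
            int     tmp;

            if (l < n && a[l] > a[largest])
                largest = l;
            if (r < n && a[r] > a[largest])
                largest = r;
            if (largest == i)
                break;
            tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    /* Build the heap in one pass: the work sums to O(n), unlike n
     * successive sift-up insertions, which cost O(n log n). */
    static void
    heapify(int a[], int n)
    {
        int     i;

        for (i = n / 2 - 1; i >= 0; i--)
            sift_down(a, n, i);
    }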

Greetings,

Andres Freund




Re: [HACKERS] Solaris testers wanted for strxfrm() behavior

2015-08-01 Thread Noah Misch
On Fri, Jul 31, 2015 at 08:29:51PM -0700, Josh Berkus wrote:
 Where are we with this?  Do we feel confident that this bug is only on
 old versions of Solaris we don't care about?  Or does it remain to be
 resolved?

Affected systems either have an available vendor update addressing the problem
or are so old that we don't care.  Commit be8b06c makes affected systems fail
early, which resolves the matter.




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-08-01 Thread Robert Haas
On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 Can't FDWs get the join information through the root, which I think we would
 pass to the API as the argument?

This is exactly what Tom suggested originally, and it has some appeal,
but neither KaiGai nor I could see how to make it work.  Do you have
an idea?  It's not too late to go back and change the API.

The problem that was bothering us (or at least what was bothering me)
is that the PlannerInfo provides only a list of SpecialJoinInfo
structures, which don't directly give you the original join order.  In
fact, min_righthand and min_lefthand are intended to constrain the
*possible* join orders, and are deliberately designed *not* to specify
a single join order.  If you're sending a query to a remote PostgreSQL
node, you don't want to know what all the possible join orders are;
it's the remote side's job to plan the query.  You do, however, need
an easy way to identify one join order that you can use to construct a
query.  It didn't seem easy to do that without duplicating
make_join_rel(), which seemed like a bad idea.

But maybe there's a good way to do it.  Tom wasn't crazy about this
hook both because of the frequency of calls and also because of the
long argument list.  I think those concerns are legitimate; I just
couldn't see how to make the other way work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-08-01 Thread Kouhei Kaigai
 On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
  Can't FDWs get the join information through the root, which I think we would
  pass to the API as the argument?
 
 This is exactly what Tom suggested originally, and it has some appeal,
 but neither KaiGai nor I could see how to make it work.  Do you have
 an idea?  It's not too late to go back and change the API.
 
 The problem that was bothering us (or at least what was bothering me)
 is that the PlannerInfo provides only a list of SpecialJoinInfo
 structures, which don't directly give you the original join order.  In
 fact, min_righthand and min_lefthand are intended to constrain the
 *possible* join orders, and are deliberately designed *not* to specify
 a single join order.  If you're sending a query to a remote PostgreSQL
 node, you don't want to know what all the possible join orders are;
 it's the remote side's job to plan the query.  You do, however, need
 an easy way to identify one join order that you can use to construct a
 query.  It didn't seem easy to do that without duplicating
 make_join_rel(), which seemed like a bad idea.
 
 But maybe there's a good way to do it.  Tom wasn't crazy about this
 hook both because of the frequency of calls and also because of the
 long argument list.  I think those concerns are legitimate; I just
 couldn't see how to make the other way work.

I could have a discussion with Fujita-san about this topic.
He has an idea to tackle this matter that is a little bit tricky, but
I didn't have a clear reason to deny it.
At the line just above set_cheapest() in standard_join_search(), at
least one path from the built-in join logic has already been added to
the RelOptInfo, thus the FDW driver can reference the cheapest path by
built-in logic, and its lefttree and righttree that construct the joinrel.
The assumption is that the best paths by built-in logic represent at
least as reasonable a join order as the other potential ones.
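
A rough sketch of what a hook could do at that point, assuming (as
above) that a built-in join path has already been added to the
pathlist (illustration only, not committed API):

    /*
     * Just before set_cheapest(), read one concrete join order off the
     * best built-in join path accumulated so far.
     */
    static void
    derive_join_order(RelOptInfo *joinrel, Path **outer, Path **inner)
    {
        ListCell   *lc;

        *outer = *inner = NULL;
        foreach(lc, joinrel->pathlist)
        {
            Path       *path = (Path *) lfirst(lc);

            if (IsA(path, NestPath) ||
                IsA(path, MergePath) ||
                IsA(path, HashPath))
            {
                JoinPath   *jp = (JoinPath *) path;

                /* one usable (outer, inner) order for a remote query */
                *outer = jp->outerjoinpath;
                *inner = jp->innerjoinpath;
                return;
            }
        }
    }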

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-01 Thread Jeremy Harris
On 31/07/15 18:31, Robert Haas wrote:
 On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
 Heapification is O(n) already, whether siftup (existing) or down.
 
 That's not my impression, or what Wikipedia says.  Source?

Measurements done last year:

http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
(spreadsheet attachment)

http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
(measurement procedure and spreadsheet explanation)

-- 
Cheers,
  Jeremy




[HACKERS] No more libedit?! - openssl plans to switch to APL2

2015-08-01 Thread Andres Freund
Hi,

According to https://www.openssl.org/blog/blog/2015/08/01/cla/ OpenSSL
is planning to relicense to the Apache License 2.0. While APL2 is not
compatible with GPL2 it *is* compatible with GPL3.

Greetings,

Andres Freund




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-01 Thread Rahila Syed
The total number of heap pages is known, and the total number of index
pages is also known, so it's possible to derive a percentage out of
this part.

The total number of index pages scanned during the entire vacuum will
depend on the number of index scans that happen.
In order to extrapolate percent complete for phase two (index scan), we
need number of index scans * total index pages.
The number of index scans can vary from 1 to n (n depending on
maintenance_work_mem).

Summarizing suggestions in previous mails, the following information can
be reported:
Phase 1.heap pages scanned / total heap pages
Phase 2.index pages scanned / total index pages (across all indexes)
Phase 3.count of heap pages vacuumed

Additional info like number of index scans so far, number of dead tuples
being vacuumed in one batch can also be provided.

A combined estimate for vacuum percent complete can be provided by summing
heap pages scanned and index pages scanned, against total heap pages plus
total index pages * number of index scans.

Thank you,
Rahila Syed


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-08-01 Thread Heikki Linnakangas

On 07/17/2015 06:28 AM, Michael Paquier wrote:

On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

On 06/29/2015 09:44 AM, Michael Paquier wrote:


On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:


But we'll still need to handle the pg_xlog symlink case somehow. Perhaps
it would be enough to special-case pg_xlog for now.



Well, sure, pg_rewind does not copy the soft links either way. Now it
would be nice to have an option to be able to recreate the soft link
of at least pg_xlog even if it can be scripted as well after a run.


Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
any non-trivial scenario, just copying all the files from pg_xlog isn't
enough anyway, and you need to set up a recovery.conf after running
pg_rewind that contains a restore_command or primary_conninfo, to fetch the
WAL. So you can argue that by not copying pg_xlog automatically, we're
actually doing a favour to the DBA, by forcing him to set up the
recovery.conf file correctly. Because if you just test simple scenarios
where not much time has passed between the failover and running pg_rewind,
it might be enough to just copy all the WAL currently in pg_xlog, but it
would not be enough if more time had passed and not all the required WAL is
present in pg_xlog anymore.  And by not copying the WAL, we can avoid some
copying, as restore_command or streaming replication will only copy what's
needed, while pg_rewind would copy all WAL it can find in the target's data
directory.

pg_basebackup also doesn't include any WAL, unless you pass the --xlog
option. It would be nice to also add an optional --xlog option to pg_rewind,
but with pg_rewind it's possible that all the required WAL isn't present in
the pg_xlog directory anymore, so you wouldn't always achieve the same
effect of making the backup self-contained.

So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
in both the source and the target.


If pg_xlog is simply ignored, some old WAL files may remain in target server.
Don't these old files cause the subsequent startup of target server as new
standby to fail? That is, it's the case where the WAL file with the same name
but different content exist both in target and source. If that's harmful,
pg_rewind also should remove the files in pg_xlog of target server.


This would reduce usability. The rewound node will replay WAL from the
previous checkpoint where WAL forked up to the minimum recovery point
of source node where pg_rewind has been run. Hence if we remove
completely the contents of pg_xlog we'd lose a portion of the logs
that need to be replayed until timeline is switched on the rewound
node when recovering it (while streaming from the promoted standby,
whatever). I don't really see why recycled segments would be a
problem, as that's perhaps what you are referring to, but perhaps I am
missing something.


Hmm. My thinking was that you need to set up restore_command or 
primary_conninfo anyway, to fetch the old WAL, so there's no need to 
copy any WAL. But there's a problem with that: you might have WAL files 
in the source server that haven't been archived yet, and you need them 
to recover the rewound target node. That's OK for libpq mode, I think, as
the server is still running and presumably you can fetch the WAL
with streaming replication, but for copy-mode, that's not a good 
assumption. You might be relying on a WAL archive, and the file might 
not be archived yet.


Perhaps it's best if we copy all the WAL files from source in copy-mode, 
but not in libpq mode. Regarding old WAL files in the target, it's 
probably best to always leave them alone. They should do no harm, and as 
a general principle it's best to avoid destroying evidence.
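
In filemap terms the proposed rule might look roughly like this (a
sketch with simplified, hypothetical names, not the actual patch):

    /*
     * Decide whether a source file under pg_xlog should be copied:
     * yes in copy-mode, where unarchived segments could be lost; no
     * in libpq mode, where streaming replication or restore_command
     * will fetch the WAL that is actually needed.
     */
    static bool
    should_copy_wal(const char *path, bool copy_mode)
    {
        if (strncmp(path, "pg_xlog/", strlen("pg_xlog/")) == 0)
            return copy_mode;
        return true;        /* non-WAL files follow the normal rules */
    }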


It'd be nice to get some fix for this for alpha2, so I'll commit a fix 
to do that on Monday, unless we come to a different conclusion before that.


- Heikki




Re: [HACKERS] Parallel Seq Scan

2015-08-01 Thread Kouhei Kaigai
Amit,

Let me ask three more detailed questions.

Why does Funnel have a valid qual of the subplan?
The 2nd argument of make_funnel() is the qualifier of the subplan
(PartialSeqScan); it is initialized in ExecInitFunnel,
but never executed at run time. Why does the Funnel node have a
useless qualifier expression here (even though it is harmless)?

Why is Funnel derived from Scan? Even though it constructs
a target-list compatible with the underlying partial-scan node,
that does not require the node also be derived from Scan.
For example, Sort and Append don't change the target-list
definition from their input, and don't have their own qualifier.
It seems to me the definition below is more suitable...
typedef struct Funnel
{
Plan    plan;
int num_workers;
} Funnel;

Does ExecFunnel() need to have a special code path to handle
EvalPlanQual()? Probably, it just calls the underlying node in the
local context. ExecScan() of PartialSeqScan will check its
qualifier against estate->es_epqTuple[].

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -----Original Message-----
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
 Sent: Thursday, July 30, 2015 8:45 AM
 To: Amit Kapila
 Cc: Robert Haas; Gavin Flower; Jeff Davis; Andres Freund; Amit Langote; Amit
 Langote; Fabrízio Mello; Thom Brown; Stephen Frost; pgsql-hackers; Haribabu 
 Kommi
 Subject: Re: [HACKERS] Parallel Seq Scan
 
  On Wed, Jul 29, 2015 at 7:32 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  
   Hi Amit,
  
   Could you tell me the code intention around ExecInitFunnel()?
  
   ExecInitFunnel() calls InitFunnel() that opens the relation to be
   scanned by the underlying PartialSeqScan and setup ss_ScanTupleSlot
   of its scanstate.
 
  The main need is for relation descriptor which is then required to set
  the scan tuple's slot.  Basically it is required for tuples flowing from
  worker which will use the scan tuple slot of FunnelState.
 
    According to the comment of InitFunnel(), it opens the relation and
    takes an appropriate lock on it. However, an equivalent initialization
    is also done in InitPartialScanRelation().
  
   Why does it acquire the relation lock twice?
  
 
  I think locking twice is not required, it is just that I have used the API
  ExecOpenScanRelation() which is used during other node's initialisation
   due to which it locks twice.  I think in general it should be harmless.
 
 Thanks, I could get reason of the implementation.
 
 It looks to me like this design is not problematic even if Funnel gets the
 capability to have multiple sub-plans and thus is not associated with a
 particular relation, as long as target-list and projection-info are
 appropriately initialized.
 
 Best regards,
 --
 NEC Business Creation Division / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com
 



Re: [HACKERS] TAP tests are badly named

2015-08-01 Thread Noah Misch
On Thu, Jul 30, 2015 at 07:54:27PM -0400, Andrew Dunstan wrote:
 On 07/30/2015 12:40 PM, Andrew Dunstan wrote:
 We should describe test sets by what they test, not by how they test. TAP

I agree with that philosophy.  I also respect the practicality of grouping by
test harness as a shorthand.  Tests sharing a harness tend to share
dependencies, harness bugs, logging mechanisms, etc.

 is a testing tool/protocol. The current set of tests we have test the
 programs in src/bin, and we should really name the test set by a name that
 reflects that, rather than the fact that we are using TAP tools to run the
 tests. What if we decide to test something else using TAP? Would we call
 that set of tests TAP tests too?

Yes.  "TAP test" refers to any test in a suite written for the "prove"
harness, including the existing suite outside src/bin:

$ find . -name t
./src/bin/pg_controldata/t
./src/bin/scripts/t
./src/bin/pg_rewind/t
./src/bin/pg_basebackup/t
./src/bin/pg_ctl/t
./src/bin/pg_config/t
./src/bin/initdb/t
./src/test/ssl/t

 --enable-tap-tests is a reasonable configuration setting, because it's
 about whether or not we have a TAP testing framework available, but I
 think we should stop calling the bin tests TAP tests and we should
 change the test name in vcregress.pl to a more appropriate name. In the
 buildfarm I'm calling the step bin-check: 
  http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2015-07-30%2012%3A25%3A58&stg=bin-check
 
 Thoughts?

While lack of granularity in vcregress.pl is hostile, with so much about MSVC
development being hostile, this one is below my noise floor.

 In fact, looking more closely at the changes that have been made to
 vcregress.pl, I don't really like the way this has been done. I'm putting
 together some changes to bring things more into line with how the Makefiles
 work.

The commit history of vcregress.pl shows that doing good here is difficult,
and the rewards are low.  If you wish to write tightly-focused patches to
align vcregress.pl with the experience of testing with GNU make, I am
cautiously optimistic.




Re: [HACKERS] Cleaning up missing ERRCODE assignments

2015-08-01 Thread Peter Geoghegan
On Sat, Aug 1, 2015 at 8:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I propose to back-patch this into 9.5, but not further; it's not an
 important enough issue to justify changing SQLSTATE behavior in stable
 branches.

+1. As I've said in the past, I think that making it possible to
determine mechanically that a "can't happen" error has in fact
occurred is of huge value to the project.

-- 
Peter Geoghegan




Re: [HACKERS] TAP tests are badly named

2015-08-01 Thread Noah Misch
On Sat, Aug 01, 2015 at 07:13:04PM -0400, Andrew Dunstan wrote:
 On 08/01/2015 04:44 PM, Noah Misch wrote:
 --enable-tap-tests is a reasonable configuration setting, because it's
 about whether or not we have a TAP testing framework available, but I
 think we should stop calling the bin tests TAP tests and we should
 change the test name in vcregress.pl to a more appropriate name. In the
 buildfarm I'm calling the step bin-check: 
  http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2015-07-30%2012%3A25%3A58&stg=bin-check
 
 Thoughts?
 While lack of granularity in vcregress.pl is hostile, with so much about MSVC
 development being hostile, this one is below my noise floor.
 
 
 Speaking with my buildfarm hat on, it's well above mine. And these changes
 make the situation worse quite gratuitously. Anyway, I'll fix it.

Sounds good.  I'll be interested to read your design.




Re: [HACKERS] upgrade failure from 9.5 to head

2015-08-01 Thread Andrew Dunstan


On 08/01/2015 07:13 PM, Noah Misch wrote:

On Wed, Jul 29, 2015 at 04:42:55PM -0400, Andrew Dunstan wrote:

The next hump is this, in restoring contrib_regression_test_ddl_parse:

pg_restore: creating FUNCTION public.text_w_default_in(cstring)
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 243; 1255 62534
FUNCTION text_w_default_in(cstring) buildfarm
pg_restore: [archiver (db)] could not execute query: ERROR: pg_type
OID value not set when in binary upgrade mode
 Command was: CREATE FUNCTION text_w_default_in(cstring)
RETURNS text_w_default
 LANGUAGE internal STABLE STRICT
 AS $$texti...

Is this worth bothering about, or should I simply remove the database before
trying to upgrade?

That's a bug.  The test_ddl_deparse suite leaves a shell type, which
pg_upgrade fails to reproduce.  Whether to have pg_upgrade support that or
just error out cleanly is another question.



Well, for now I'll just drop the database.  Thanks for the diagnosis.

cheers

andrew




Re: [HACKERS] TAP tests are badly named

2015-08-01 Thread Andrew Dunstan


On 08/01/2015 04:44 PM, Noah Misch wrote:



--enable-tap-tests is a reasonable configuration setting, because it's
about whether or not we have a TAP testing framework available, but I
think we should stop calling the bin tests TAP tests and we should
change the test name in vcregress.pl to a more appropriate name. In the
buildfarm I'm calling the step bin-check: 
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2015-07-30%2012%3A25%3A58&stg=bin-check

Thoughts?

While lack of granularity in vcregress.pl is hostile, with so much about MSVC
development being hostile, this one is below my noise floor.



Speaking with my buildfarm hat on, it's well above mine. And these 
changes make the situation worse quite gratuitously. Anyway, I'll fix it.



cheers

andrew




Re: [HACKERS] upgrade failure from 9.5 to head

2015-08-01 Thread Noah Misch
On Wed, Jul 29, 2015 at 04:42:55PM -0400, Andrew Dunstan wrote:
 The next hump is this, in restoring contrib_regression_test_ddl_parse:
 
pg_restore: creating FUNCTION public.text_w_default_in(cstring)
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 243; 1255 62534
FUNCTION text_w_default_in(cstring) buildfarm
pg_restore: [archiver (db)] could not execute query: ERROR: pg_type
OID value not set when in binary upgrade mode
 Command was: CREATE FUNCTION text_w_default_in(cstring)
RETURNS text_w_default
 LANGUAGE internal STABLE STRICT
 AS $$texti...
 
 Is this worth bothering about, or should I simply remove the database before
 trying to upgrade?

That's a bug.  The test_ddl_deparse suite leaves a shell type, which
pg_upgrade fails to reproduce.  Whether to have pg_upgrade support that or
just error out cleanly is another question.




Re: [HACKERS] 64-bit XIDs again

2015-08-01 Thread Simon Riggs
On 31 July 2015 at 22:46, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/31/2015 12:29 AM, Josh Berkus wrote:

 On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:


 You'd never be forced to do anti-wraparound
 vacuums, you could just let the clog grow arbitrarily large


 When I introduced the same idea a few years back, having the clog get
 arbitrarily large was cited as a major issue.  I was under the
 impression that clog size had some major performance impacts.


 Well, sure, if you don't want the clog to grow arbitrarily large, then you
 need to freeze.


This statement isn't quite right, things are better than that.

We don't need to freeze in order to shrink the clog, we just need to hint
and thereby ensure we move forward the lowest unhinted xid. That does
involve scanning, but doesn't need to scan indexes. That scan won't produce
anywhere near as much additional WAL traffic or I/O.

In practice, larger clog would only happen with higher transaction rate,
which means more system resources, so I don't think it's too much of a
problem overall.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Encoding of early PG messages

2015-08-01 Thread Shay Rojansky
Oh sorry, I think I misunderstood your suggestion - setting lc_messages in
the startup packet wouldn't work any more than setting client_encoding,
would it. So any solution here would be on the database/backend side, and
so irrelevant for a general-purpose driver...

On Fri, Jul 31, 2015 at 4:28 PM, Shay Rojansky r...@roji.org wrote:

 Thanks for the suggestions Tom.

 As I'm developing a general-purpose driver I can't do anything in
 PostgreSQL config, but it's a good workaround suggestion for users who
 encounter this error.

 Sending lc_messages in the startup packet could work, but if I understand
 correctly that setting combines both encoding and language. I guess I can
 look at the user's locale preference on the client machine, try to
 translate that into a PostgreSQL language/encoding and send that in
 lc_messages - that seems like it might work.

 Shay


 On Fri, Jul 31, 2015 at 3:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Shay Rojansky r...@roji.org writes:
  Developing Npgsql I've encountered the problem described in
 
 http://www.postgresql.org/message-id/20081223212414.gd3...@merkur.hilbert.loc
 :
  a German installation of PostgreSQL seems to respond to an incorrect
  password with a non-UTF8 encoding of the error messages, even if the
  startup message contains client_encoding=UTF8.

 I wouldn't hold your breath waiting for that to change.

 A possible workaround is to run the postmaster with lc_messages=C and
 then switch to your desired message language per-session.  It would
 certainly work to send lc_messages along with client_encoding in the
 startup packet; or possibly you could set those settings as per-database
 or per-role settings to avoid needing to teach the application code
 about it.  This would mean that bad-password and similar errors would
 come out in English, but at least they'd be validly encoded ...

 regards, tom lane





Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2015-08-01 Thread Andreas Seltenreich
Tom Lane writes:

 What concerns me more is that what you're finding is only cases that trip
 an assertion sanity check.  It seems likely that you're also managing to
 trigger other bugs with less drastic consequences, such as could not
 devise a query plan failures or just plain wrong answers.

Yes, some of these are logged as well[1], but most of them are really as
undrastic as can get, and I was afraid reporting them would be more of a
nuisance.  I analysed a couple of the cache lookup failures, and they
all had a similar severity to the example in the README[2].  The
operator ones I analysed seem due to intentionally broken operators in
the regression db.  The NestLoopParams and subplan reference one sound
interesting though…

 I'm not sure how we could identify wrong answers automatically :-(

Csmith isn't doing this either.  They discuss differential testing
though in their papers, i.e., comparing the results of different
products.  Maybe a simple metric like the number of rows returned might
already be valuable for correctness checks.

I also thought about doing some sampling on the data and simulating
relational operations and check for witness tuples, but it is probably
not appropriate to start implementing a mini-rdbms on the client side.

 but it might be worth checking for XX000 SQLSTATE responses, since
 generally that should be a can't-happen case.  (Or if it can happen,
 we need to change the errcode.)

The sqlstate is currently missing in the reports because libpqxx is not
putting it in its exceptions :-/.

regards,
Andreas

Footnotes: 
[1]  smith=# select * from report24h;
 count |                             error
-------+------------------------------------------------------------------
 43831 | ERROR:  unsupported XML feature
 39496 | ERROR:  invalid regular expression: quantifier operand invalid
 27261 | ERROR:  canceling statement due to statement timeout
 21386 | ERROR:  operator does not exist: point = point
  8580 | ERROR:  cannot compare arrays of different element types
  5019 | ERROR:  invalid regular expression: brackets [] not balanced
  4646 | ERROR:  could not determine which collation to use for string comparison
  2583 | ERROR:  invalid regular expression: nfa has too many states
  2248 | ERROR:  operator does not exist: xml = xml
  1198 | ERROR:  operator does not exist: polygon = polygon
  1171 | ERROR:  cache lookup failed for index 16862
   677 | ERROR:  invalid regular expression: parentheses () not balanced
   172 | ERROR:  cache lookup failed for index 257148
    84 | ERROR:  could not find member 1(34520,34520) of opfamily 1976
    55 | ERROR:  missing support function 1(34516,34516) in opfamily 1976
    42 | ERROR:  operator does not exist: city_budget = city_budget
    13 | ERROR:  could not find commutator for operator 34538
    10 | ERROR:  could not identify a comparison function for type xid
     4 | Connection to database failed
     4 | ERROR:  cache lookup failed for index 2619
     3 | ERROR:  plan should not reference subplan's variable
     2 | ERROR:  cache lookup failed for index 12322
     2 | ERROR:  failed to assign all NestLoopParams to plan nodes
     2 | ERROR:  invalid regular expression: invalid character range
     1 | ERROR:  could not find pathkey item to sort
(25 rows)
Time: 1158,990 ms

[2]  https://github.com/anse1/sqlsmith/blob/master/README.org

