Re: [HACKERS] TABLESAMPLE doesn't actually satisfy the SQL spec, does it?

2015-07-14 Thread Simon Riggs
On 12 July 2015 at 18:50, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Sun, Jul 12, 2015 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  As best I can tell (evidence below), the SQL standard requires that if a
  single query reads a table with a TABLESAMPLE clause multiple times
 (say,
  because it's on the inside of a nestloop), then the exact same set of
  sampled rows are returned each time.

  Hmm, I tend to agree that it would be good if it behaved that way.
  Otherwise, it seems like the behavior could be quite surprising.

 Yeah.  As a concrete example, consider

 select * from t1, t2 tablesample ... where t1.x = t2.x

 and suppose that there are multiple occurrences of x = 10 in both tables.
 As things stand, if the join is done as a nestloop then a particular t2
 row with x = 10 might appear in the output joined with some of the t1 rows
 with x = 10 but not with others.  On the other hand, the results of a hash
 join would not be inconsistent in that way, since t2 would be read only
 once.
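
(For illustration, a sketch of the kind of query under discussion -- the
tables here are hypothetical:)

-- t1(x int) and t2(x int), each with several x = 10 rows.  Under a
-- nestloop plan, t2 is rescanned once per t1 row; if the sampler draws
-- a fresh sample on each rescan, a given t2 row may join with only some
-- of the t1 rows with x = 10, which is the inconsistency described above.
SELECT *
FROM t1
JOIN t2 TABLESAMPLE BERNOULLI (50) ON t1.x = t2.x;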


Hmm, a non-key join to a sampled table. What would the meaning of such a
query be? The table would need to be big enough to experience updates and also
be under current update activity. BERNOULLI isn't likely to have many users
because it is so slow. So overall, such a query is not useful and as such
unlikely.

The mechanism of sampling was discussed heavily before and there wasn't an
approach that met all of the goals: IIRC we would need to test visibility
twice on each tuple to get around these problems. Given that users of
TABLESAMPLE have already explicitly stated their preference for speed over
accuracy, minor tweaks to handle corner cases don't seem warranted.

If you have a simple, better way I would not object. Forgive me, I haven't
yet understood your proposal about the sampling rule above.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

2015-07-14 Thread Michael Paquier
On Tue, Jul 14, 2015 at 1:42 AM, Heikki Linnakangas wrote:
 There was one bug in this patch: the COMMENT statement that was constructed
 didn't schema-qualify the relation, so if the ALTERed table was not in
 search_path, the operation would fail with a relation not found error (or
 add the comment to the wrong object). Fixed that.
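
(To illustrate the failure mode -- object names here are hypothetical:)

SET search_path = public;
ALTER TABLE myschema.t ALTER COLUMN c TYPE bigint;
-- internally this rebuilds t's constraints/indexes and re-adds their
-- comments with statements along the lines of
--   COMMENT ON CONSTRAINT t_c_check ON t IS '...';
-- without schema-qualifying t, which fails (or comments on the wrong
-- object) because myschema is not in search_path.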

Ouch. I had a look at the patches and they look neater than what I
drafted. Thanks!

 I plan to commit the attached patches later today or tomorrow. But how do
 you feel about back-patching this? It should be possible to backpatch,
 although at a quick test it seems that there have been small changes to the
 affected code in many versions, so it would require some work. Also, in
 back-branches we'd need to put the new AT_ReAddComment type to the end of
 the list, like we've done when we added AT_ReAddConstraint in 9.3. I'm
 inclined to backpatch this to 9.5 only, even though this is clearly a bug
 fix, on the grounds that it's not a very serious bug and there's always some
 risk in backpatching.

Well, while it's clearly a bug I don't think that it is worth the risk
to destabilize back branches older than 9.5 in this code path. So +1
for doing it the way you are suggesting. We could still revisit that
later on if there are more complaints, but I doubt there will be much.
-- 
Michael




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-14 Thread Shulgin, Oleksandr
 Yes, but I think the plugin is the right place to do it. What is more,
this won't actually prevent you completely from producing non-ECMAScript
compliant JSON, since json or jsonb values containing offending numerics
won't be caught, AIUI.

Ah, that's a good catch indeed.

 But a fairly simple to write function that reparsed and fixed the JSON
inside the decoder would work.

Need to rethink this, but reparsing was never my favorite option here. :-)

--
Alex


[HACKERS] GSets: Fix bug involving GROUPING and HAVING together

2015-07-14 Thread Jeevan Chalke
Hi,

I have observed some fishy behavior related to GROUPING in HAVING clause
and when we have only one element in GROUPING SETS.

Basically, when we have only one element in GROUPING SETS, we are treating
it as a simple GROUP BY with one column, due to which we are ending up with
this error.

If we have ROLLUP/CUBE or GROUPING SETS with multiple elements, then we are
not getting this error.

Here are some examples:

postgres=# select ten, grouping(ten) from onek
postgres-# group by grouping sets(ten) having grouping(ten) = 0
postgres-# order by 2,1;
ERROR:  parent of GROUPING is not Agg node
postgres=# select ten, grouping(ten) from onek
postgres-# group by grouping sets(ten, four) having grouping(ten) <> 0
postgres-# order by 2,1;
 ten | grouping 
-----+----------
(0 rows)

fix_bug_related_to_grouping_v1.patch

postgres=# select ten, grouping(ten) from onek
postgres-# group by rollup(ten) having grouping(ten) <> 0
postgres-# order by 2,1;
 ten | grouping 
-----+----------
     |        1
(1 row)

postgres=# select ten, grouping(ten) from onek
postgres-# group by cube(ten) having grouping(ten) <> 0
postgres-# order by 2,1;
 ten | grouping 
-----+----------
     |        1
(1 row)


I had a look over the relevant code and found that contain_agg_clause_walker()
is not considering GroupingFunc as an aggregate node, due to which it fails
to account for it in a HAVING clause in subquery_planner().

Fix this by adding the GroupingFunc node to this walker.  We already do this
correctly in contain_aggs_of_level_walker(), which has handling for
GroupingFunc.

Attached patch to fix this.

The side effect is that, even with a plain GROUP BY clause, we can now use
GROUPING in the HAVING clause. But I guess that is fine.

Let me know if I missed anything to consider here.

Thanks

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index d40083d..cdb6955 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -390,7 +390,7 @@ make_ands_implicit(Expr *clause)
 
 /*
  * contain_agg_clause
- *	  Recursively search for Aggref nodes within a clause.
+ *	  Recursively search for Aggref/GroupingFunc nodes within a clause.
  *
  *	  Returns true if any aggregate found.
  *
@@ -417,6 +417,11 @@ contain_agg_clause_walker(Node *node, void *context)
 		Assert(((Aggref *) node)->agglevelsup == 0);
 		return true;			/* abort the tree traversal and return true */
 	}
+	if (IsA(node, GroupingFunc))
+	{
+	Assert(((GroupingFunc *) node)->agglevelsup == 0);
+		return true;			/* abort the tree traversal and return true */
+	}
 	Assert(!IsA(node, SubLink));
 	return expression_tree_walker(node, contain_agg_clause_walker, context);
 }
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 842c2ae..adb39b3 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -486,6 +486,68 @@ having exists (select 1 from onek b where sum(distinct a.four) = b.four);
9 |   3
 (25 rows)
 
+-- HAVING with GROUPING queries
+select ten, grouping(ten) from onek
+group by grouping sets(ten) having grouping(ten) = 0
+order by 2,1;
+ ten | grouping 
+-----+----------
+   0 |        0
+   1 |        0
+   2 |        0
+   3 |        0
+   4 |        0
+   5 |        0
+   6 |        0
+   7 |        0
+   8 |        0
+   9 |        0
+(10 rows)
+
+select ten, grouping(ten) from onek
+group by grouping sets(ten, four) having grouping(ten) <> 0
+order by 2,1;
+ ten | grouping 
+-----+----------
+     |        1
+     |        1
+     |        1
+     |        1
+(4 rows)
+
+select ten, grouping(ten) from onek
+group by rollup(ten) having grouping(ten) <> 0
+order by 2,1;
+ ten | grouping 
+-----+----------
+     |        1
+(1 row)
+
+select ten, grouping(ten) from onek
+group by cube(ten) having grouping(ten) <> 0
+order by 2,1;
+ ten | grouping 
+-----+----------
+     |        1
+(1 row)
+
+select ten, grouping(ten) from onek
+group by (ten) having grouping(ten) = 0
+order by 2,1;
+ ten | grouping 
+-----+----------
+   0 |        0
+   1 |        0
+   2 |        0
+   3 |        0
+   4 |        0
+   5 |        0
+   6 |        0
+   7 |        0
+   8 |        0
+   9 |        0
+(10 rows)
+
 -- FILTER queries
 select ten, sum(distinct four) filter (where four::text ~ '123') from onek a
 group by rollup(ten);
diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql
index 0bffb85..0883afd 100644
--- a/src/test/regress/sql/groupingsets.sql
+++ b/src/test/regress/sql/groupingsets.sql
@@ -154,6 +154,23 @@ select ten, sum(distinct four) from onek a
 group by grouping sets((ten,four),(ten))
 having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 
+-- HAVING with GROUPING queries
+select ten, grouping(ten) from onek
+group by 

Re: [HACKERS] GSets: Fix bug involving GROUPING and HAVING together

2015-07-14 Thread Andrew Gierth
 Jeevan == Jeevan Chalke jeevan.cha...@enterprisedb.com writes:

 Jeevan Basically, when we have only one element in GROUPING SETS, we
 Jeevan are assuming it as a simple GROUP BY with one column. Due to
 Jeevan which we are ending up with this error.

 Jeevan If we have ROLLUP/CUBE or GROUPING SETS with multiple elements,
 Jeevan then we are not getting this error.

There's two issues here. One you correctly identified, which is that
contain_agg_clause() should be true for GroupingFuncs too.

The other is that in subquery_planner, the optimization of converting
HAVING clauses to WHERE clauses is suppressed if parse->groupingSets
isn't empty. (It is empty if there's either no group by clause at all,
or if there's exactly one grouping set, which must not be empty, however
derived.) This is costing us some optimizations, especially in the case
of an explicit GROUP BY () clause; I'll look into this.
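
(To illustrate the rewrite being suppressed -- my example, using the
regression-test table from the queries upthread:)

select ten, count(*) from onek group by ten having ten > 5;
-- "ten > 5" contains no aggregates (and no GROUPING), so it could be
-- evaluated before aggregation, i.e. as WHERE ten > 5; that rewrite is
-- currently skipped whenever parse->groupingSets is non-empty.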

In the meantime your patch looks OK (and necessary) to me.

 Jeevan The side effect is that, if we have plain group by clause, then
 Jeevan too we can use GROUPING in HAVING clause. But I guess it is
 Jeevan fine.

GROUPING is, per spec, explicitly allowed anywhere you could have an
aggregate call, whether the group by clause is plain or not.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

2015-07-14 Thread Heikki Linnakangas

On 07/14/2015 10:29 AM, Michael Paquier wrote:

On Tue, Jul 14, 2015 at 1:42 AM, Heikki Linnakangas wrote:

I plan to commit the attached patches later today or tomorrow. But how do
you feel about back-patching this? It should be possible to backpatch,
although at a quick test it seems that there have been small changes to the
affected code in many versions, so it would require some work. Also, in
back-branches we'd need to put the new AT_ReAddComment type to the end of
the list, like we've done when we added AT_ReAddConstraint in 9.3. I'm
inclined to backpatch this to 9.5 only, even though this is clearly a bug
fix, on the grounds that it's not a very serious bug and there's always some
risk in backpatching.


Well, while it's clearly a bug I don't think that it is worth the risk
to destabilize back branches older than 9.5 in this code path. So +1
for doing it the way you are suggesting. We could still revisit that
later on if there are more complaints, but I doubt there will be much.


Ok, committed to master and 9.5. Thanks!

- Heikki





Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Simon Riggs
On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:

 Michael Paquier michael.paqu...@gmail.com writes:
  Regarding the fact that those two contrib modules can be part of a
  -contrib package and could be installed, nuking those two extensions
  from the tree and preventing the creating of custom tablesample
  methods looks like a correct course of things to do for 9.5.

 TBH, I think the right thing to do at this point is to revert the entire
 patch and send it back for ground-up rework.  I think the high-level
 design is wrong in many ways and I have about zero confidence in most
 of the code details as well.


Based on the various comments here, I don't see that as the right course of
action at this point.

There are no issues relating to security or data loss, just various fixable
issues in a low-impact feature, which in my view is an important feature
also.

 If it's to stay, it *must* get a line-by-line review from some
 committer-level person; and I think there are other more important things
 for us to be doing for 9.5.


Honestly, I am very surprised by this. My feeling was the code was neat,
clear and complete, much more so than many patches I review. If I had
thought the patch or its implementation was in any way contentious I would
not have committed it.

I take responsibility for the state of the code and will put time into
addressing the concerns mentioned and others.

If we cannot resolve them in reasonable time, a revert is possible: there
is nothing riding on this from me.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Improving log capture of TAP tests with IPC::Run

2015-07-14 Thread Robert Haas
On Thu, Jul 9, 2015 at 9:43 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas hlinn...@iki.fi writes:
 Pushed, thanks.

 Shouldn't we consider back-patching these improvements into 9.5 and 9.4?
 ISTM the main point is to help debug buildfarm failures, and we won't be
 getting much benefit if only one-third of such reports have decent
 logging.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] First Aggregate Funtion?

2015-07-14 Thread sudalai
The above implementation of first aggregate returns the first non-NULL item
value. 

To get the *first row's item value* for a column, use the implementation below.

-- create a function that pushes at most two elements onto a given array
-- (the first row's value ends up at index 2 of the array)
CREATE OR REPLACE FUNCTION two_value_holder(anyarray, anyelement) 
returns anyarray as  $$ 
     select case when array_length($1,1) < 2 then array_append($1,$2) else
$1 end ; 
$$ language sql immutable;

-- create a function that returns second element of an array
CREATE  OR replace FUNCTION second_element (ANYARRAY) 
RETURNS ANYELEMENT AS $$ SELECT $1[2]; $$ LANGUAGE SQL;

-- create the first() aggregate that returns the first row's item value
CREATE AGGREGATE first(anyelement)(
SFUNC=two_value_holder,
STYPE=ANYARRAY,
INITCOND='{NULL}',
FINALFUNC=second_element
);
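
For example (table and column names here are hypothetical):

-- earliest status per customer, ordering the aggregate's input explicitly
SELECT customer_id, first(status ORDER BY placed_at)
FROM orders
GROUP BY customer_id;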

I hope this works.
--
Sudalai








Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-14 Thread Robert Haas
On Fri, Jul 10, 2015 at 12:33 PM, Alexander Korotkov
a.korot...@postgrespro.ru wrote:
 I can propose the following:

 1) Expose more information about current lock to user. For instance, having
 duration of current wait event, user can determine if backend is getting
 stuck on particular event without sampling.

Although this is appealing from a functionality perspective, I predict
that the overhead of it will be quite significant.  We'd need to do
gettimeofday() every time somebody calls pgstat_report_waiting(), and
if we do that every time we (say) initiate a disk write, I think
that's going to be pretty costly even on platforms where
gettimeofday() is fast, let alone those where it's slow.  If somebody
does a sequential scan of a non-cached table, I don't care to add a
gettimeofday() call for every read().

 2) Accumulate per backend statistics about each wait event type: number of
 occurrences and total duration. With this statistics user can identify
 system bottlenecks again without sampling.

This is even more expensive: now you've got to do TWO gettimeofday()
calls per wait event, one when it starts and one when it ends.  Plus,
you've got to do updates to a backend-local hash table.  It might be
that this is tolerable for wait events that only happen in contended
paths - e.g. when a lock or lwlock acquisition actually blocks, or
when we decide to do a spin-delay - but I suspect it's going to stink
for things that happen frequently even when things are going well,
like reading and writing blocks.  So the effect will either add a lot
of performance overhead, or else we just can't add some of the wait
events that people would like to see.

I really think we should do the simple thing first.  If we make this
complicated and add lots of bells and whistles, it is going to be much
harder to get anything committed, because there will be more things
for somebody to object to.  If we start with something simple, we can
always improve it later, if we are confident that the design for
improving it is good.  The hardest thing about a proposal like this is
going to be getting down the overhead to a level that is acceptable,
and every expansion of the basic design that has been proposed -
gathering more than one byte of information, or gathering times, or
having the backend update a tracking hash - adds *significant*
overhead to the design I proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-14 Thread Robert Haas
On Fri, Jul 10, 2015 at 7:57 AM, Andres Freund and...@anarazel.de wrote:
 pg_dump dumps security labels on databases. Which makes sense. The
 problem is that they're dumped including the database name.

 Which means that if you dump a database and restore it into a
 differently named one you'll either get a failure because the database
 does not exist, or worse you'll update the label of the wrong database.

 So I think we need CURRENT_DATABASE (or similar) support for security
 labels on databases.

 I won't have time to do anything about this anytime soon, but I think we
 should fix that at some point.  Shall I put this on the todo? Or do we
 want to create an 'open items' page that's not major version specific?

I think adding it to the TODO would be great.
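
(For context, a sketch of what pg_dump might emit under such a proposal --
the CURRENT_DATABASE syntax is hypothetical and the label value is only an
example:)

SECURITY LABEL FOR selinux ON DATABASE CURRENT_DATABASE
    IS 'system_u:object_r:sepgsql_db_t:s0';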

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Default Roles (was: Additional role attributes)

2015-07-14 Thread Fujii Masao
On Tue, Jul 14, 2015 at 3:46 AM, Stephen Frost sfr...@snowman.net wrote:
 Fujii,

 * Fujii Masao (masao.fu...@gmail.com) wrote:
 The documentation of the functions for which the corresponding default roles
 are added by this patch needs to be updated. For example, the description
 of pg_xlog_replay_pause() says Pauses recovery immediately (restricted
 to superusers).. I think that the description needs to mention
 the corresponding default role pg_replay. Otherwise, it's difficult for
 users to see which default role is related to the function they want to use.
 Or probably we can add the table explaining all the relationships between
 default roles and corresponding operations. And it's useful.

 Certainly, totally agree that we need to make it clear in the function
 descriptions also.

 Why do we allow users to change the attributes of default roles?
 For example, ALTER ROLE default_role or GRANT ... TO default_role.
 Those changes are not dumped by pg_dumpall. So if users change
 the attributes for some reason but they disappear via pg_dumpall,
 maybe the system goes into an unexpected state.

 Good point.  I'm fine with simply disallowing that completely; does
 anyone want to argue that we should allow superusers to ALTER or GRANT
 to these roles?  I have a hard time seeing the need for that and it
 could make things quite ugly.

 I think that it's better to allow the roles with pg_monitor to
 execute pgstattuple functions. They are usually used for monitoring.
 Thought?

 Possibly, but I'd need to look at them more closely than I have time to
 right now.  Can you provide a use-case?  That would certainly help.

I have seen a monitoring system which periodically executes
a statement like

SELECT * FROM pgstattuple('hoge');

to monitor the relation's physical length, the number of dead
tuples, etc. Then, for example, if the number of dead tuples is
increased unexpectedly and the relation becomes bloated, DBA tries
to find out the cause and execute the maintenance commands
if necessary to alleviate the situation. The monitoring system calls
pgstattuple() at off-peak times because pgstattuple() needs to
scan all the pages in the relation and its performance penalty
might be big.

 Also, we are mostly focused on things which are currently superuser-only
 capabilities, if you don't need to be superuser today then the
 monitoring system could be granted access using the normal mechanisms.

Currently only superusers can call pgstattuple().

Regards,

-- 
Fujii Masao




Re: [HACKERS] GSets: Fix bug involving GROUPING and HAVING together

2015-07-14 Thread Jeevan Chalke
On Tue, Jul 14, 2015 at 4:23 PM, Andrew Gierth and...@tao11.riddles.org.uk
wrote:

  Jeevan == Jeevan Chalke jeevan.cha...@enterprisedb.com writes:

  Jeevan Basically, when we have only one element in GROUPING SETS, we
  Jeevan are assuming it as a simple GROUP BY with one column. Due to
  Jeevan which we are ending up with this error.

  Jeevan If we have ROLLUP/CUBE or GROUPING SETS with multiple elements,
  Jeevan then we are not getting this error.

 There's two issues here. One you correctly identified, which is that
 contain_agg_clause() should be true for GroupingFuncs too.

 The other is that in subquery_planner, the optimization of converting
 HAVING clauses to WHERE clauses is suppressed if parse-groupingSets
 isn't empty. (It is empty if there's either no group by clause at all,
 or if there's exactly one grouping set, which must not be empty, however
 derived.) This is costing us some optimizations, especially in the case
 of an explicit GROUP BY () clause;


I have observed that. But I thought that it is indeed intentional.
As we don't want to choose code path for GSets when we have
only one column which is converted to plain group by and
thus setting parse-groupingSets to FALSE.


 I'll look into this.


OK. Thanks


 In the meantime your patch looks OK (and necessary) to me.

  Jeevan The side effect is that, if we have plain group by clause, then
  Jeevan too we can use GROUPING in HAVING clause. But I guess it is
  Jeevan fine.

 GROUPING is, per spec, explicitly allowed anywhere you could have an
 aggregate call, whether the group by clause is plain or not.

 --
 Andrew (irc:RhodiumToad)




-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] RLS fails to work with UPDATE ... WHERE CURRENT OF

2015-07-14 Thread Robert Haas
On Thu, Jul 9, 2015 at 5:47 PM, Joe Conway m...@joeconway.com wrote:
 On 06/08/2015 02:08 AM, Dean Rasheed wrote:
 Actually I think it is fixable just by allowing the CURRENT OF
 expression to be pushed down into the subquery through the
 security barrier view. The planner is then guaranteed to generate a
 TID scan, filtering by any other RLS quals, which ought to be the
 optimal plan. Patch attached.

 This looks good to me. I have tested and don't find any issues with
 it. Will commit in a day or so unless someone has objections.

Is this fix needed in all versions that support security barrier
views, or just in 9.5 and 9.6 that have RLS specifically?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support retrieving value from any sequence

2015-07-14 Thread David G. Johnston
On Tue, Jul 14, 2015 at 10:52 AM, Thom Brown t...@linux.com wrote:

 The use-case I have in mind is for finding out how close to the 32-bit
 integer limit sequences have reached.  At the moment, this isn't possible
 without creating a custom function to go fetch the last_value from the
 specified sequence.


Why wouldn't you just query the catalog?  I was under the impression last
said values were extra-transactional so that table should reflect the
global state.

What am I missing here?

David J.


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I really think we should do the simple thing first.  If we make this
 complicated and add lots of bells and whistles, it is going to be much
 harder to get anything committed, because there will be more things
 for somebody to object to.  If we start with something simple, we can
 always improve it later, if we are confident that the design for
 improving it is good.  The hardest thing about a proposal like this is
 going to be getting down the overhead to a level that is acceptable,
 and every expansion of the basic design that has been proposed -
 gathering more than one byte of information, or gathering times, or
 having the backend update a tracking hash - adds *significant*
 overhead to the design I proposed.

FWIW, I entirely share Robert's opinion that adding gettimeofday()
overhead in routinely-taken paths is likely not to be acceptable.
But there's no need to base this solely on opinions.  I suggest somebody
try instrumenting just one hotspot in the various ways that are being
proposed, and then we can actually measure what it costs, instead
of guessing about that.
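
(A minimal standalone sketch of such a measurement -- this just times the
bare call in a loop, not a PostgreSQL hotspot, so treat the result as a
lower bound on the added cost:)

#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
	struct timeval start, stop, tmp;
	const long iters = 10 * 1000 * 1000;
	long i;
	double secs;

	gettimeofday(&start, NULL);
	for (i = 0; i < iters; i++)
		gettimeofday(&tmp, NULL);	/* the call being measured */
	gettimeofday(&stop, NULL);

	secs = (stop.tv_sec - start.tv_sec) +
		(stop.tv_usec - start.tv_usec) / 1e6;
	printf("%.1f ns per gettimeofday()\n", secs * 1e9 / iters);
	return 0;
}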

regards, tom lane




[HACKERS] Support retrieving value from any sequence

2015-07-14 Thread Thom Brown
Hi all,

When using currval() to find the current value of all sequences, it chokes
on those that aren't initialised.  This is expected and documented as
behaving in this manner.  However, I think it would be useful to also
support retrieving the current value of a sequence, regardless of whether
it's been used.  As this wouldn't be to get a sequence value for the
current session, but all sessions, this would ideally get the real current
value.

The use-case I have in mind is for finding out how close to the 32-bit
integer limit sequences have reached.  At the moment, this isn't possible
without creating a custom function to go fetch the last_value from the
specified sequence.

So would it be desirable to have a function which accepts a sequence
regclass as a parameter, and returns the last_value from the sequence?

Effectively, the same result as what this provides:

CREATE FUNCTION lastval(tablename regclass) RETURNS bigint AS $$
DECLARE
  last_value bigint;
BEGIN
  -- regclass output is already safely quoted, so interpolate it with %s
  EXECUTE format('SELECT last_value FROM %s', tablename) INTO last_value;
  RETURN last_value;
END
$$ LANGUAGE plpgsql;
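
For example, to flag sequences past 90% of the 32-bit limit (the threshold
is arbitrary):

SELECT c.oid::regclass AS seq, lastval(c.oid::regclass) AS last_value
FROM pg_class c
WHERE c.relkind = 'S'
  AND lastval(c.oid::regclass) > 0.9 * 2147483647;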

Thom


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Simon Riggs
On 14 July 2015 at 15:32, Tom Lane t...@sss.pgh.pa.us wrote:

 Simon Riggs si...@2ndquadrant.com writes:
  On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:
  TBH, I think the right thing to do at this point is to revert the entire
  patch and send it back for ground-up rework.  I think the high-level
  design is wrong in many ways and I have about zero confidence in most
  of the code details as well.

  There are no issues relating to security or data loss, just various
 fixable
  issues in a low-impact feature, which in my view is an important feature
  also.

 There is a *very large* amount of work needed here, and I do not hear you
 promising to do it.


I thought I had done so clearly enough, happy to do so again.

I promise that either work will be done, or the patch will be reverted.
Since I have more time now, I view that as a realistic prospect.


 What I'm hearing is stonewalling, and I am not happy.


I'm not sure what you mean by that but it sounds negative and is almost
certainly not justified, in this or other cases.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-14 Thread Simon Riggs
On 10 July 2015 at 15:11, Sawada Masahiko sawada.m...@gmail.com wrote:


  Oops, I had forgotten to add the new file heapfuncs.c.
 Latest patch is attached.


I think we've established the approach is desirable and defined the way
forwards for this, so this is looking good.

Some of my requests haven't been actioned yet, so I personally would not
commit this yet. I am happy to continue as reviewer/committer unless others
wish to take over.

The main missing item is pg_upgrade support, which won't happen by end of
CF1, so I am marking this as Returned With Feedback. Hopefully we can
review this again before CF2.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Robert Haas
On Fri, Jul 10, 2015 at 8:55 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 If somebody still needs it, I'll rebase and adjust the patch towards
 the latest custom-scan interface. However, I cannot be motivated for
 the feature nobody wants.

 Robert, can you weigh in on this? Do we currently have anything in the
 tree that tests the Custom Scan interface? If not, would this be helpful
 for that purpose?

We don't have anything that currently tests the Custom Scan interface
in the tree.  The question is how important that is, and whether it's
worth having what's basically a toy implementation just to demonstrate
that the feature can work.  If so, I think ctidscan is as good a toy
example as any; in the interest of full disclosure, I was the one who
suggested it in the first place.  But I am not entirely sure it's a
good idea to saddle ourselves with that maintenance effort.  It would
be a lot more interesting if we had an example that figured to be
generally useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Enhanced ALTER OPERATOR

2015-07-14 Thread Uriy Zhuravlev
On Tuesday 14 July 2015 18:22:26 you wrote:
  I think you had copy-pasted from one of the generic 
 ALTER variants, like AlterOwnerStmt, which was overly generic for ALTER 
 OPERATOR.
You're right.

 Thanks, committed after some editing. 
Thank you. And it was BIG editing. ;)

After that, can you review my next patch? ^_^
https://commitfest.postgresql.org/5/253/


-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Support retrieving value from any sequence

2015-07-14 Thread David G. Johnston
On Tue, Jul 14, 2015 at 11:05 AM, Thom Brown t...@linux.com wrote:

 On 14 July 2015 at 16:02, David G. Johnston david.g.johns...@gmail.com
 wrote:

 On Tue, Jul 14, 2015 at 10:52 AM, Thom Brown t...@linux.com wrote:

 The use-case I have in mind is for finding out how close to the 32-bit
 integer limit sequences have reached.  At the moment, this isn't possible
 without creating a custom function to go fetch the last_value from the
 specified sequence.


 Why wouldn't you just query the catalog?  I was under the impression
 last said values were extra-transactional so that table should reflect the
 global state.

 What am I missing here?


 Where in the catalog do you mean?


In attempting to answer your question I now better understand your
original proposal.  Indeed the only way to get the sequence information is
to query it like a table.

This prompts the question: why a function and not (or in addition to) a
view?
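
(A sketch of one way to approximate such a view today, assuming plpgsql is
available -- which, as noted elsewhere in the thread, it may not be; the
names here are mine:)

CREATE OR REPLACE FUNCTION all_sequence_values()
RETURNS TABLE (seq regclass, last_value bigint) AS $$
DECLARE
  s regclass;
BEGIN
  FOR s IN SELECT c.oid::regclass FROM pg_class c WHERE c.relkind = 'S'
  LOOP
    -- one query per sequence; %s is safe because regclass output is quoted
    RETURN QUERY EXECUTE
      format('SELECT %L::regclass, last_value FROM %s', s::text, s);
  END LOOP;
END
$$ LANGUAGE plpgsql;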

David J.


Re: [HACKERS] WIP: Enhanced ALTER OPERATOR

2015-07-14 Thread Heikki Linnakangas

On 07/13/2015 03:43 PM, Uriy Zhuravlev wrote:

  Hello hackers.

Attached is a new version of patch:
* port syntax from NULL to truth NONE
* fix documentation (thanks Heikki)
* rebase to master


Thanks, committed after some editing. I put all the logic into 
AlterOperator function, in operatorcmds.c, rather than pg_operator.c. I 
think that's how we've lately organized commands. I also simplified the 
code quite a bit - I think you had copy-pasted from one of the generic 
ALTER variants, like AlterOwnerStmt, which was overly generic for ALTER 
OPERATOR. No need to look up the owner/name attributes dynamically, etc.


There was one genuine bug that I fixed: the ALTER OPERATOR command 
didn't check all the same conditions that CREATE OPERATOR did, namely 
that only binary operators can have join selectivity, and that only 
boolean operators can have restriction/join selectivity.


- Heikki





Re: [HACKERS] Minor issue with BRIN regression tests

2015-07-14 Thread Robert Haas
On Fri, Jul 10, 2015 at 2:56 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Jun 2, 2015 at 3:11 PM, Peter Geoghegan p...@heroku.com wrote:
 Here is another patch, this time removing a useless ON CONFLICT DO UPDATE 
 test.

 Can someone commit this, please?

Removing that test doesn't seem important to me.  Why does it seem
important to you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] multivariate statistics / patch v7

2015-07-14 Thread Tomas Vondra

Hi,

On 07/13/2015 10:51 AM, Kyotaro HORIGUCHI wrote:


Ok, I understood the difference between what I thought and what you
say. The code is actually conscious of OR clauses but it looks somewhat
confused.


I'm not sure which part is confused by the OR clauses, but it's 
certainly possible. Initially it only handled AND clauses, and the 
support for OR clauses was added later, so it's possible some parts are 
not behaving correctly.




Currently, choosing mv stats in clauselist_selectivity can be
outlined as follows:

1. find_stats finds candidate mv stats containing *all*
attributes that appear in the whole clause list, regardless of
and/or exprs, by walking the whole clause tree.

Perhaps this is a measure for early bailout.


Not entirely. The goal of find_stats() is to lookup all stats on the 
'current' relation - it's coded the way it is because I had to deal with 
varRelid=0 cases, in which case I have to inspect the Var nodes. But 
maybe I got this wrong and there's much simpler way to do that?


It is an early bailout in the sense that if there are no multivariate 
stats defined on the table, there's no point in doing any of the 
following steps. So that we don't increase planning times for users not 
using multivariate stats.



2.1. Within every disjunction element, collect mv-related
attributes while checking whether all the leaf nodes (binop or
ifnull) are compatible, by (eventually) walking the whole
clause tree.


Generally, yes. The idea is to check whether there are clauses that 
might be estimated using multivariate statistics, and whether the 
clauses reference at least two different attributes. Imagine a query 
like this:


   SELECT * FROM t WHERE (a=1) AND (a>0) AND (a<100)

It makes no sense to process this using multivariate statistics, because 
all the Var nodes reference a single attribute.


Similarly, the check is not just about the leaf nodes - to be able to 
estimate a clause at this point, we have to be able to process the whole 
tree, starting from the top-level clause. Although maybe that's no 
longer true, now that support for OR clauses was added ... I wonder 
whether there are other BoolExpr-like nodes, that might make the tree 
incompatible with multivariate statistics (in the sense that the current 
implementation does not know how to handle them).


Also note that even though the clause may be incompatible at this 
level, it may get partially processed by multivariate statistics later. 
For example with a query:


   SELECT * FROM t WHERE (a=1 OR b=2 OR c ~* 'xyz') AND (q=1 OR r=4)

the first clause is incompatible because it contains the unsupported 
operator '~*', but it will eventually be processed as BoolExpr node, and 
should be split into two parts - (a=1 OR b=2) which is compatible, and 
(c ~* 'xyz') which is incompatible.


This split should happen in clauselist_selectivity_or(), and the other 
thing that may be interesting is that it uses (q=1 OR r=4) as a 
condition. So if there's a statistics built on (a,b,q,r) we'll compute 
conditional probability


P(a=1,b=2 | q=1,r=4)
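
(In other words -- my reading of the above, not the patch's exact formula --
the overall selectivity is being decomposed as

    P(a=1, b=2, q=1, r=4) = P(q=1, r=4) * P(a=1, b=2 | q=1, r=4)

where the conditional term is what the statistics on (a,b,q,r) are used for.)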


 2.2. Check if all the collected attribute are contained in
 mv-stats columns.

No, I think you got this wrong. We do not check that *all* the 
attributes are contained in mvstats columns - we only need two such 
columns (then there's a chance that the multivariate statistics will get 
applied).


Anyway, both 2.1 and 2.2 are meant as a quick bailout, before doing the 
most expensive part, which is choose_mv_statistics(). Which is however 
missing in this list.



3. Finally, clauseset_mv_selectivity_histogram() (and others).

This function applies every ExprOp to every attribute in
every histogram bucket and (tries to) combine the result
bitmaps with the boolean operation.


Yes, but this only happens after choose_mv_statistics(), because that's 
the code that decides which statistics will be used and in what order.


The list is also missing handling of the 'functional dependencies', so a 
complete list of steps would look like this:


1) find_stats - lookup stats on the current relation (from RelOptInfo)

2) apply functional dependencies

   a) check if there are equality clauses that may be reduced using
  functional dependencies, referencing at least two columns

   b) if yes, perform the clause reduction

3) apply MCV lists and histograms

   a) check if there are clauses 'compatible' with those types of
  statistics, again containing at least two columns

   b) if yes, use choose_mv_statistics() to decide which statistics to
 apply and in which order

   c) apply the selected histograms and MCV lists

4) estimate the remaining clauses using the regular statistics

   a) this is where the clauselist_mv_selectivity_histogram and other
  are called

I tried to explain this in the comment before clauselist_selectivity(), 
but maybe it's not detailed enough / missing some important details.




I have some comments on the implement and I 

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:
 TBH, I think the right thing to do at this point is to revert the entire
 patch and send it back for ground-up rework.  I think the high-level
 design is wrong in many ways and I have about zero confidence in most
 of the code details as well.

 There are no issues relating to security or data loss, just various fixable
 issues in a low-impact feature, which in my view is an important feature
 also.

There is a *very large* amount of work needed here, and I do not hear you
promising to do it.  What I'm hearing is stonewalling, and I am not happy.

regards, tom lane




Re: [HACKERS] Support retrieving value from any sequence

2015-07-14 Thread Thom Brown
On 14 July 2015 at 16:02, David G. Johnston david.g.johns...@gmail.com
wrote:

 On Tue, Jul 14, 2015 at 10:52 AM, Thom Brown t...@linux.com wrote:

 The use-case I have in mind is for finding out how close to the 32-bit
 integer limit sequences have reached.  At the moment, this isn't possible
 without creating a custom function to go fetch the last_value from the
 specified sequence.


 ​Why wouldn't you just query the catalog?​  I was under the impression
 last said values were extra-transactional so that table should reflect the
 global state.

 What am I missing here?


Where in the catalog do you mean?

Thom


Re: [HACKERS] pg_upgrade + Extensions

2015-07-14 Thread Smitha Pamujula
On Mon, Jul 13, 2015 at 11:56 AM, Andrew Dunstan and...@dunslane.net
wrote:


 On 07/13/2015 01:12 PM, Smitha Pamujula wrote:

 Yes. I have checked that the extension didn't exist in any of the
 databases. I used \dx to see if json_build was listed and I didn't
 see any. Is that sufficient to check its existence? I am about to do
 another test in a few minutes on a different machine. I will capture
 before/after shots.



 Please don't top-post on the PostgreSQL lists - see 
 http://idallen.com/topposting.html

 In theory it should be enough if it was installed in the standard way. But
 a more thorough procedure would be to run this command:

select count(*)
from pg_proc
where prosrc ~ 'json_build';

 Here's a one-liner that will check every database for you:

for db in `psql -t -c 'select datname from pg_database where
datallowconn'` ; do C=`psql -t -c "select count(*) from pg_proc
where prosrc ~ 'json_build'" $db`; echo $db $C; done

 cheers

 andrew



OK, I have tested it on our DB. Sorry for the long mail; most of it is just
output from the commands. My comments are in blue.

Pre-upgrade:

psql -l
                                        List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |    Access privileges
-----------+----------+----------+-------------+-------------+--------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 reporting | sqitch   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/sqitch               +
           |          |          |             |             | sqitch=CTc/sqitch        +
           |          |          |             |             | owner_gulper=C/sqitch    +
           |          |          |             |             | owner_reporting=C/sqitch
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres              +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres              +
           |          |          |             |             | postgres=CTc/postgres


Removed the json_build extension from the 9.3 database. Verified here:

for db in `psql -t -c 'select datname from pg_database where
datallowconn'` ; do C=`psql -t -c "select count(*) from pg_proc
where prosrc ~ 'json_build'" $db`; echo $db $C; done
template1 0
postgres 0
reporting 0

Then I installed PG 9.4 and started the empty instance.

psql -d postgres
psql (9.3.5, server 9.4.4)
WARNING: psql major version 9.3, server major version 9.4.
         Some psql features might not work.
Type "help" for help.

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres           +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres           +
           |          |          |             |             | postgres=CTc/postgres


Now I ran the same extension check on the 9.4 instance.

for db in `psql -t -c 'select datname from pg_database where
datallowconn'` ; do C=`psql -t -c "select count(*) from pg_proc
where prosrc ~ 'json_build'" $db`; echo $db $C; done
template1 4
postgres 4

I see that it's got the new procs as part of 9.4. Now if I do the
check with --link it's giving me this error.

[postgres@pdxqarptsrd04 pg_94_upgrade]$ /usr/pgsql-9.4/bin/pg_upgrade
--check --link
Performing Consistency Checks
-
Checking cluster versions   ok
Checking database user is a superuser   ok
Checking database connection settings   ok
Checking for prepared transactions  ok
Checking for reg* system OID user data typesok
Checking for contrib/isn with bigint-passing mismatch   ok
Checking for invalid "line" user columns   ok
Checking for presence of required libraries fatal

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
loadable_libraries.txt

Failure, exiting
[postgres@pdxqarptsrd04 pg_94_upgrade]$ cat loadable_libraries.txt
Could not load library $libdir/json_build
ERROR:  could not access file $libdir/json_build: No such file or directory


[postgres@pdxqarptsrd04 pg_94_upgrade]$ rpm -qa|grep json_build
json_build93-1.0.0-1iov.x86_64

This error will go away only if I install the new json_build94.

[postgres@pdxqarptsrd04 pg_94_upgrade]$ rpm -qa|grep 

Re: [HACKERS] Support retrieving value from any sequence

2015-07-14 Thread Tom Lane
Thom Brown t...@linux.com writes:
 On 14 July 2015 at 17:17, Robert Haas robertmh...@gmail.com wrote:
 Since it's trivial to define this function if you need it, I'm not
 sure there's a reason to include it in core.

 It's not always possible to create functions on a system when access
 is restricted.  It may even be the case that procedural languages are
 prohibited, and plpgsql has been removed.

By that argument, *any* random function has to be in the core.

I really don't see what's wrong with SELECT last_value FROM sequence,
especially since that has worked in every Postgres version since 6.x.
Anyone slightly worried about backwards compatibility wouldn't use
an equivalent function even if we did add one.

regards, tom lane




Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-14 Thread Adam Brightwell
All,

 I won't have time to do anything about this anytime soon, but I think we
 should fix that at some point.  Shall I put this on the todo? Or do we
 want to create an 'open items' page that's not major version specific?

 I think adding it to the TODO would be great.

I'd be willing to look/dive into this one further.

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com




Re: [HACKERS] WIP: Enhanced ALTER OPERATOR

2015-07-14 Thread Jeff Janes
On Tue, Jul 14, 2015 at 8:22 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/13/2015 03:43 PM, Uriy Zhuravlev wrote:

   Hello hackers.

 Attached is a new version of patch:
 * port syntax from NULL to truth NONE
 * fix documentation (thanks Heikki)
 * rebase to master


 Thanks, committed after some editing. I put all the logic into
 AlterOperator function, in operatorcmds.c, rather than pg_operator.c. I
 think that's how we've lately organized commands. I also simplified the
 code quite a bit - I think you had copy-pasted from one of the generic
 ALTER variants, like AlterOwnerStmt, which was overly generic for ALTER
 OPERATOR. No need to look up the owner/name attributes dynamically, etc.

 There was one genuine bug that I fixed: the ALTER OPERATOR command didn't
 check all the same conditions that CREATE OPERATOR did, namely that only
 binary operators can have join selectivity, and that only boolean operators
 can have restriction/join selectivity.



I'm getting some compiler warnings now:

operatorcmds.c: In function 'AlterOperator':
operatorcmds.c:504: warning: 'address.objectSubId' may be used
uninitialized in this function
operatorcmds.c:365: note: 'address.objectSubId' was declared here
operatorcmds.c:504: warning: 'address.objectId' may be used uninitialized
in this function
operatorcmds.c:365: note: 'address.objectId' was declared here
operatorcmds.c:504: warning: 'address.classId' may be used uninitialized in
this function
operatorcmds.c:365: note: 'address.classId' was declared here

gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)


Thanks,

Jeff


Re: [HACKERS] Support retrieving value from any sequence

2015-07-14 Thread Thom Brown
On 14 July 2015 at 17:17, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jul 14, 2015 at 10:52 AM, Thom Brown t...@linux.com wrote:
  When using currval() to find the current value of all sequences, it chokes
  on those that aren't initialised.  This is expected and documented as
  behaving in this manner.  However, I think it would be useful to also
  support retrieving the current value of a sequence, regardless of whether
  it's been used.  As this wouldn't be to get a sequence value for the current
  session, but all sessions, this would ideally get the real current value.
 
  The use-case I have in mind is for finding out how close to the 32-bit
  integer limit sequences have reached.  At the moment, this isn't possible
  without creating a custom function to go fetch the last_value from the
  specified sequence.
 
  So would it be desirable to have a function which accepts a sequence
  regclass as a parameter, and returns the last_value from the sequence?
 
  Effectively, the same result as what this provides:
 
  CREATE FUNCTION lastval(tablename regclass) RETURNS bigint AS $$
  DECLARE
last_value bigint;
  BEGIN
EXECUTE format('SELECT last_value FROM %I ', tablename) INTO last_value
  USING tablename;
RETURN last_value;
  END
  $$ LANGUAGE plpgsql;

 Since it's trivial to define this function if you need it, I'm not
 sure there's a reason to include it in core.

It's not always possible to create functions on a system when access
is restricted.  It may even be the case that procedural languages are
prohibited, and plpgsql has been removed.

Thom




Re: [HACKERS] WIP: Enhanced ALTER OPERATOR

2015-07-14 Thread Heikki Linnakangas

On 07/14/2015 07:28 PM, Jeff Janes wrote:

I'm getting some compiler warnings now:

operatorcmds.c: In function 'AlterOperator':
operatorcmds.c:504: warning: 'address.objectSubId' may be used
uninitialized in this function
operatorcmds.c:365: note: 'address.objectSubId' was declared here
operatorcmds.c:504: warning: 'address.objectId' may be used uninitialized
in this function
operatorcmds.c:365: note: 'address.objectId' was declared here
operatorcmds.c:504: warning: 'address.classId' may be used uninitialized in
this function
operatorcmds.c:365: note: 'address.classId' was declared here

gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)


Ah, thanks, fixed. I was apparently compiling with -O0.

- Heikki





Re: [HACKERS] git push hook to check for outdated timestamps

2015-07-14 Thread Peter Eisentraut
On 7/14/15 3:44 AM, Alvaro Herrera wrote:
 Peter Eisentraut wrote:
 On 6/25/15 8:08 PM, Robert Haas wrote:
 Because I don't want to have to do git log --format=fuller to see when
 the thing was committed, basically.

 Then I suggest to you the following configuration settings:

 [format]
 pretty=cmedium
 [pretty]
 cmedium=format:%C(auto,yellow)commit %H%C(reset)%nCommit:     %cn <%ce>%nCommitDate: %cd%n%n%w(80,4,4)%B
 
 I have been using a slightly tweaked version of this and I have found
 that the %w(80,4,4)%B thingy results in mangled formatting;

I have since refined this to

... %n%n%w(0,4,4)%s%n%+b

You might find that that works better.
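
(Putting the two together, the full setting would presumably look something
like this in ~/.gitconfig:)

[format]
        pretty = cmedium
[pretty]
        cmedium = format:%C(auto,yellow)commit %H%C(reset)%nCommit:     %cn <%ce>%nCommitDate: %cd%n%n%w(0,4,4)%s%n%+b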

One of the curiosities is that the built-in log formats don't appear to
be defined or easily definable in terms of the formatting language.





Re: [HACKERS] git push hook to check for outdated timestamps

2015-07-14 Thread Alvaro Herrera
Peter Eisentraut wrote:
 On 7/14/15 3:44 AM, Alvaro Herrera wrote:

  I have been using a slightly tweaked version of this and I have found
  that the %w(80,4,4)%B thingy results in mangled formatting;
 
 I have since refined this to
 
 ... %n%n%w(0,4,4)%s%n%+b
 
 You might find that that works better.

Ah, yes it does, thanks.

 One of the curiosities is that the built-in log formats don't appear to
 be defined or easily definable in terms of the formatting language.

TBH I'm not surprised.  Normally the built-in formats for things grow
organically in ad-hoc ways and the mini-languages for the generic
mechanisms don't support all the same features.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Could be improved point of UPSERT

2015-07-14 Thread Peter Geoghegan
On Sun, Jul 12, 2015 at 4:09 AM, Yourfriend doudou...@gmail.com wrote:
 Suggestion:  When a conflict is found for UPSERT, don't access the
 sequence, so users can have a reasonable list of IDs.

This is not technically feasible. What if the arbiter index is a serial PK?

The same thing can happen when a transaction is aborted. SERIAL is not
guaranteed to be gapless.
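
(A minimal sketch of the gap behavior -- table and values are mine:)

CREATE TABLE t (id serial PRIMARY KEY, k text UNIQUE, v int);
INSERT INTO t (k, v) VALUES ('a', 1) ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;
INSERT INTO t (k, v) VALUES ('a', 2) ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;
INSERT INTO t (k, v) VALUES ('b', 3) ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;
-- 'b' ends up with id = 3, not 2: the second INSERT evaluated nextval()
-- for its proposed row even though it took the UPDATE path.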
-- 
Peter Geoghegan




Re: [HACKERS] Minor issue with BRIN regression tests

2015-07-14 Thread Robert Haas
On Tue, Jul 14, 2015 at 1:39 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Jul 14, 2015 at 10:28 AM, Robert Haas robertmh...@gmail.com wrote:
 And what, in your opinion, is the issue?

 The test does not match the comment above it. It looks like someone
 (possibly me) pasted one too many template queries, that were never
 appropriately modified to fit the area under consideration.

OK, now I understand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Robert Haas
On Tue, Jul 14, 2015 at 3:07 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 We don't have anything that currently tests the Custom Scan interface
 in the tree.  The question is how important that is, and whether it's
 worth having what's basically a toy implementation just to demonstrate
 that the feature can work.  If so, I think ctidscan is as good a toy
 example as any; in the interest of full disclosure, I was the one who
 suggested it in the first place.  But I am not entirely sure it's a
 good idea to saddle ourselves with that maintenance effort.  It would
 be a lot more interesting if we had an example that figured to be
 generally useful.

 As a general principle, I think it's a good idea to have a module that's
 mostly just a skeleton that guides people into writing something real to
 use whatever API is being tested.  It needs to be simple enough that not
 much needs to be deleted when writing the real thing, and complex enough
 to cover the parts that need covering.  If whatever replaces ctidscan is
 too complex, it will not serve that purpose.

 My guess is that something whose only purpose is to test the custom scan
 interface for coverage purposes can be simpler than this module.

See, I actually think the opposite: I think we've been accumulating a
reasonable amount of test code that actually serves no really useful
purpose and is just cruft.  Stuff like test_shm_mq and test_decoding
seem like they actually catch bugs, so I like that, but I think
stuff like worker_spi is actually TOO simple to be useful in building
anything real, and it provides no useful test coverage, either.  But
this is all a matter of opinion, of course, and I'll defer to whatever
the consensus is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_trgm version 1.2

2015-07-14 Thread Jeff Janes
On Tue, Jul 7, 2015 at 6:33 AM, Alexander Korotkov 
a.korot...@postgrespro.ru wrote:



 See Tom Lane's comment about downgrade scripts. I think just remove it is
 a right solution.


The new patch removes the downgrade path and the ability to install the old
version.

(If anyone wants an easy downgrade path for testing, they can keep using
the prior patch--no functional changes)

It also added a comment before the trigramsMatchGraph call.

I retained the palloc and the loop to promote the ternary array to a binary
array.  While I also think it is tempting to get rid of that by abusing the
type system and would do it that way in my own standalone code, it seems
contrary to the way the project usually does things.  And I couldn't measure a
difference in performance.





 Let's consider the regular expression '^(?!.*def).*abc' as an example. It
 matches strings which contain 'abc' and don't contain 'def'.

 # SELECT 'abc' ~ '^(?!.*def).*abc';
  ?column?
 ----------
  t
 (1 row)

 # SELECT 'def abc' ~ '^(?!.*def).*abc';
  ?column?
 ----------
  f
 (1 row)

 # SELECT 'abc def' ~ '^(?!.*def).*abc';
  ?column?
 ----------
  f
 (1 row)

 Theoretically, our trigram regex processing could support negative
 matching of 'def' trigram, i.e. trigramsMatchGraph(abc = true, def = false)
 = true but trigramsMatchGraph(abc = true, def = true) = false. Actually, it
 doesn't because trigramsMatchGraph() implements a monotonic function. I
 just think it should be stated explicitly.


Do you think it is likely to change to stop being monotonic and so support
the (def=GIN_TRUE) = false case?

^(?!.*def)   seems like a profoundly niche situation.  (Although one that I
might actually start using myself now that I know it isn't just a Perl-ism).

It doesn't make any difference to this patch, other than perhaps how to
word the comments.
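
To make the monotonicity point concrete, here is a hedged sketch (table,
column and index names are hypothetical) of the kind of query under
discussion; only the positive 'abc' trigrams can drive the index scan,
while '(?!.*def)' is enforced by the recheck against the full regex:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE docs (body text);
CREATE INDEX docs_trgm_idx ON docs USING gin (body gin_trgm_ops);
-- The index lookup cannot use 'def' as a negative requirement because
-- trigramsMatchGraph() is monotonic; candidate rows are rechecked.
EXPLAIN SELECT * FROM docs WHERE body ~ '^(?!.*def).*abc';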

Cheers,

Jeff


pg_trgm_1_2_v003.patch
Description: Binary data



Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2015-07-14 Thread Jim Nasby

On 7/7/15 7:07 AM, Andres Freund wrote:

On 2015-07-03 18:03:58 -0400, Tom Lane wrote:

I have just looked through this thread, and TBH I think we should reject
this patch altogether --- not RWF, but no we don't want this.  The
use-case remains hypothetical: no performance numbers showing a real-world
benefit have been exhibited AFAICS.


It's not that hard to imagine a performance benefit though? If the
toasted column is accessed infrequently/just after filtering on other
columns (not uncommon) it could very well be beneficial to put the main
table on fast storage (SSD) and the toast table on slow storage
(spinning rust).

I've seen humongous toast tables that are barely ever read for tables
that are constantly accessed, a couple of times in the field.


+1. I know of one case where the heap was ~8GB and TOAST was over 400GB.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Implementation of global temporary tables?

2015-07-14 Thread Jim Nasby

On 7/9/15 12:45 AM, Pavel Stehule wrote:


2015-07-09 7:32 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu
mailto:zhy...@cs.ucsd.edu:

  I am not sure, if it is not useless work.

I don't understand why an implementation taking approach 2.a would
be useless. As I said, its performance will be no worse than current
temp tables and it will provide a lot of convenience to users who
need to create temp tables in every session.


Surely it would be a step forward. But you will have to solve a lot of
problems with duplicated tables in the system catalogue, and still it
doesn't solve the main problem with temporary tables - the bloated
catalogue - and the related performance degradation.


That being the main problem is strictly a matter of opinion based on 
your experience. Many people don't have a performance problem today, but 
do have to deal with all the pain of handling this manually (as well as 
all the limitations that go with that).


If it's easy to fix the bloat problem at the same time as adding GLOBAL 
TEMP then great! But there's no reason to reject this just because it 
doesn't fix that issue.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Both you and Andres have articulated the concern that CustomScan isn't
 actually useful, but I still don't really understand why not.

 I'm curious, for example, whether CustomScan would have been
 sufficient to build TABLESAMPLE, and if not, why not.  Obviously the
 syntax has to be in core,

... so you just made the point ...

 but why couldn't the syntax just call an
 extension-provided callback that returns a custom scan, instead of
 having a node just for TABLESAMPLE?

Because that only works for small values of work.  As far as TABLESAMPLE
goes, I intend to hold Simon's feet to the fire until there's a less
cheesy answer to the problem of scan reproducibility.  Assuming we're
going to allow sample methods that can't meet the reproducibility
requirement, we need something to prevent them from producing visibly
broken query results.  Ideally, the planner would avoid putting such a
scan on the inside of a nestloop.  A CustomScan-based implementation could
not possibly arrange such a thing; we'd have to teach the core planner
about the concern.

Or, taking the example of a GpuScan node, it's essentially impossible
to persuade the planner to delegate any expensive function calculations,
aggregates, etc to such a node; much less teach it that that way is cheaper
than doing such things the usual way.  So yeah, KaiGai-san may have a
module that does a few things with a GPU, but it's far from doing all or
even very much of what one would want.

Now, as part of the upper-planner-rewrite business that I keep hoping to
get to when I'm not riding herd on bad patches, it's likely that we might
have enough new infrastructure soon that that particular problem could
be solved.  But there would just be another problem after that; a likely
example is not having adequate statistics, or sufficiently fine-grained
function cost estimates, to be able to make valid choices once there's
more than one way to do such calculations.  (I'm not really impressed by
"the GPU is *always* faster" approaches.)  Significant improvements of
that sort are going to take core-code changes.

Even worse, if there do get to be any popular custom-scan extensions,
we'll break them anytime we make any nontrivial planner changes, because
there is no arms-length API there.  A trivial example is that even adding
or changing any fields in struct Path will necessarily break custom scan
providers, because they build Paths for themselves with no interposed API.

In large part this is the same as my core concern about the TABLESAMPLE
patch: exposing dubiously-designed APIs is soon going to force us to make
choices between breaking those APIs or not being able to make changes we
need to make.  In the case of custom scans, I will not be particularly
sad when (not if) we break custom scan providers; but in other cases such
tradeoffs are going to be harder to make.

regards, tom lane




Re: [HACKERS] RLS fails to work with UPDATE ... WHERE CURRENT OF

2015-07-14 Thread Dean Rasheed
On 14 July 2015 at 13:59, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jul 9, 2015 at 5:47 PM, Joe Conway m...@joeconway.com wrote:
 On 06/08/2015 02:08 AM, Dean Rasheed wrote:
 Actually I think it is fixable just by allowing the CURRENT OF
 expression to be pushed down into the subquery through the
 security barrier view. The planner is then guaranteed to generate a
 TID scan, filtering by any other RLS quals, which ought to be the
 optimal plan. Patch attached.

 This looks good to me. I have tested and don't find any issues with
 it. Will commit in a day or so unless someone has objections.

 Is this fix needed in all versions that support security barrier
 views, or just in 9.5 and 9.6 that have RLS specifically?


It's only needed in 9.5 and later for tables with RLS, because WHERE
CURRENT OF isn't supported on views.
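
For reference, a minimal sketch of the sort of statement affected (table,
policy and cursor names are hypothetical):

CREATE TABLE accounts (id int PRIMARY KEY, owner text, balance int);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY own_rows ON accounts USING (owner = current_user);

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM accounts FOR UPDATE;
FETCH 1 FROM c;
-- Before the fix, the RLS quals wrapped around the scan kept the planner
-- from producing the TID scan that WHERE CURRENT OF relies on:
UPDATE accounts SET balance = 0 WHERE CURRENT OF c;
COMMIT;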

Regards,
Dean




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Robert Haas
On Tue, Jul 14, 2015 at 4:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think this ties into my core unhappiness with the customscan stuff,
 which is that I don't believe it's *possible* to do anything of very
 great interest with it.  I think anything really useful will require
 core code modifications and/or hooks that don't exist now.  So a finger
 exercise like ctidscan, even though it might have some marginal use,
 doesn't do much to alleviate that concern.  It certainly doesn't seem
 like it's a suitable placeholder proving that we aren't breaking any
 actual use cases for the feature.

Both you and Andres have articulated the concern that CustomScan isn't
actually useful, but I still don't really understand why not.  KaiGai
has got working code (under a GPL license) that shows that it can be
used for what he calls GpuScan and GpuJoin, and the speedups are
actually pretty cool if you happen to have the right sort of query to
take advantage of it.  That code may be buggy and the license
precludes us using it anyway, but FWICT it does seem to work. I'd be
entirely amenable to improving the infrastructure such that it could
be used for a broader array of purposes, and I'm also amenable to
adding more hooks if we need more hooks to make it useful, but I'm not
clear at all on what you think is missing.

I'm curious, for example, whether CustomScan would have been
sufficient to build TABLESAMPLE, and if not, why not.  Obviously the
syntax has to be in core, but why couldn't the syntax just call an
extension-provided callback that returns a custom scan, instead of
having a node just for TABLESAMPLE?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Alvaro Herrera
Robert Haas wrote:

 We don't have anything that currently tests the Custom Scan interface
 in the tree.  The question is how important that is, and whether it's
 worth having what's basically a toy implementation just to demonstrate
 that the feature can work.  If so, I think ctidscan is as good a toy
 example as any; in the interest of full disclosure, I was the one who
 suggested it in the first place.  But I am not entirely sure it's a
 good idea to saddle ourselves with that maintenance effort.  It would
 be a lot more interesting if we had an example that figured to be
 generally useful.

As a general principle, I think it's a good idea to have a module that's
mostly just a skeleton that guides people into writing something real to
use whatever API is being tested.  It needs to be simple enough that not
much needs to be deleted when writing the real thing, and complex enough
to cover the parts that need covering.  If whatever replaces ctidscan is
too complex, it will not serve that purpose.

My guess is that something whose only purpose is to test the custom scan
interface for coverage purposes can be simpler than this module.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Memory Accounting v11

2015-07-14 Thread Robert Haas
On Sat, Jul 11, 2015 at 2:28 AM, Jeff Davis pg...@j-davis.com wrote:
 After talking with a few people at PGCon, small noisy differences in CPU
 timings can appear for almost any tweak to the code, and aren't
 necessarily cause for major concern.

I agree with that in general, but the concern is a lot bigger when the
function is something that is called everywhere and accounts for a
measurable percentage of our total CPU usage on almost any workload.
If memory allocation got slower because, say, you added some code to
regexp.c and it caused AllocSetAlloc to split a cache line where it
hadn't previously, I wouldn't be worried about that; the next patch,
like as not, will buy the cost back again.  But here you really are
adding code to a hot path.

tuplesort.c does its own accounting, and TBH that seems like the right
thing to do here, too.  The difficulty is, I think, that some
transition functions use an internal data type for the transition
state, which might not be a single palloc'd chunk.  But since we can't
spill those aggregates to disk *anyway*, that doesn't really matter.
If the transition state is a varlena or a fixed-length type, we can know how
much space it's consuming without hooking into the memory context
framework.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Minor issue with BRIN regression tests

2015-07-14 Thread Peter Geoghegan
On Tue, Jul 14, 2015 at 12:45 PM, Robert Haas robertmh...@gmail.com wrote:
 OK, now I understand.

Thanks.


-- 
Peter Geoghegan




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 14, 2015 at 3:07 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 As a general principle, I think it's a good idea to have a module that's
 mostly just a skeleton that guides people into writing something real to
 use whatever API is being tested.  It needs to be simple enough that not
 much needs to be deleted when writing the real thing, and complex enough
 to cover the parts that need covering.  If whatever replaces ctidscan is
 too complex, it will not serve that purpose.
 
 My guess is that something whose only purpose is to test the custom scan
 interface for coverage purposes can be simpler than this module.

 See, I actually think the opposite: I think we've been accumulating a
 reasonable amount of test code that actually serves no really useful
 purpose and is just cruft.  Stuff like test_shm_mq and test_decoding
 seem like they actually catch bugs, so I like that, but I think
 stuff like worker_spi is actually TOO simple to be useful in building
 anything real, and it provides no useful test coverage, either.  But
 this is all a matter of opinion, of course, and I'll defer to whatever
 the consensus is.

I think this ties into my core unhappiness with the customscan stuff,
which is that I don't believe it's *possible* to do anything of very
great interest with it.  I think anything really useful will require
core code modifications and/or hooks that don't exist now.  So a finger
exercise like ctidscan, even though it might have some marginal use,
doesn't do much to alleviate that concern.  It certainly doesn't seem
like it's a suitable placeholder proving that we aren't breaking any
actual use cases for the feature.

(BTW, if we care about the use cases this has, such as data recovery from
partially-corrupt tables, it would make way more sense and take way less
net new code to just teach TidScan about it.)

regards, tom lane




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-14 Thread Haribabu Kommi
On Mon, Jul 13, 2015 at 5:16 PM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi
 kommi.harib...@gmail.com wrote:
 On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com 
 wrote:

 I will do some performance tests and send you the results.

 Here are the performance results tested on my machine.


                  Head      vm patch   vm+prefetch patch

 First vacuum     120 sec   1 sec      1 sec
 Second vacuum    180 sec   180 sec    30 sec

 I made some modifications in the code to skip the vacuum truncation in
 the first vacuum command.
 This way I collected the time taken by the second vacuum.

 I just combined your vm and prefetch patch into a single patch
 vm+prefetch patch without a GUC.
 I kept the prefetch as 32 and did the performance test. I chose the
 prefetch value based on the current buffer access strategy, which is 32
 for vacuum presently, instead of adding a user option.
 Here I attached the modified patch with both vm+prefetch logic.

 I will do some tests on a machine with an SSD and let you know the
 results. Based on those results, we can decide whether we need a GUC or
 not, depending on the impact of prefetch on SSD machines.

 Following are the performance readings on a machine with SSD.
 I increased the pgbench scale factor to 1000 in the test instead of 500
 to show better performance numbers.

                  Head       vm patch   vm+prefetch patch

 First vacuum     6.24 sec   2.91 sec   2.91 sec
 Second vacuum    6.66 sec   6.66 sec   7.19 sec

 There is a small performance impact on SSD with prefetch.

The above prefetch overhead is observed with prefetching of 1639345 pages.
I feel this overhead is small.
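
For context, a rough sketch of the shape of such a test; the exact setup is
not shown in the thread, so the steps below are assumptions:

-- after initializing a large table, e.g. "pgbench -i -s 1000" (assumed),
-- delete the tail so VACUUM has pages to truncate:
DELETE FROM pgbench_accounts WHERE aid > 10000000;
VACUUM pgbench_accounts;  -- first vacuum: pays the truncation-scan cost
VACUUM pgbench_accounts;  -- second vacuum: measures the rescan cost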

Hi Jeff,

If you are fine with the earlier attached patch, then I will mark it as
ready for committer, to get a committer's view on the patch.


Regards,
Hari Babu
Fujitsu Australia




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Kouhei Kaigai
 -Original Message-
 From: Tom Lane [mailto:t...@sss.pgh.pa.us]
 Sent: Wednesday, July 15, 2015 5:47 AM
 To: Robert Haas
 Cc: Alvaro Herrera; hlinnaka; Kaigai Kouhei(海外 浩平); Michael Paquier; Jim
 Nasby; Kohei KaiGai; PgHacker; Simon Riggs
 Subject: Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)
 
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Jul 14, 2015 at 3:07 PM, Alvaro Herrera
  alvhe...@2ndquadrant.com wrote:
  As a general principle, I think it's a good idea to have a module that's
  mostly just a skeleton that guides people into writing something real to
  use whatever API is being tested.  It needs to be simple enough that not
  much needs to be deleted when writing the real thing, and complex enough
  to cover the parts that need covering.  If whatever replaces ctidscan is
  too complex, it will not serve that purpose.
 
  My guess is that something whose only purpose is to test the custom scan
  interface for coverage purposes can be simpler than this module.
 
  See, I actually think the opposite: I think we've been accumulating a
  reasonable amount of test code that actually serves no really useful
  purpose and is just cruft.  Stuff like test_shm_mq and test_decoding
  seem like they actually catch bugs, so I like that, but I think
  stuff like worker_spi is actually TOO simple to be useful in building
  anything real, and it provides no useful test coverage, either.  But
  this is all a matter of opinion, of course, and I'll defer to whatever
  the consensus is.
 
 I think this ties into my core unhappiness with the customscan stuff,
 which is that I don't believe it's *possible* to do anything of very
 great interest with it.  I think anything really useful will require
 core code modifications and/or hooks that don't exist now.  So a finger
 exercise like ctidscan, even though it might have some marginal use,
 doesn't do much to alleviate that concern.  It certainly doesn't seem
 like it's a suitable placeholder proving that we aren't breaking any
 actual use cases for the feature.

The ctidscan module was originally designed to validate the behavior of the
custom-scan interface, so it is natural that it is valuable only in limited cases.

However, I don't think that anything valuable always requires core code
enhancement and/or new hooks, because we already have various extensions
around the core code that utilize existing infrastructures (even though their
specifications may change at a major version up).
At least, custom-scan makes it possible to implement edge features which are
not easy to merge into core for various reasons: dependency on a proprietary
library, too-experimental features, too large a code base to review as a
minimum valuable portion, and so on.

 (BTW, if we care about the use cases this has, such as data recovery from
 partially-corrupt tables, it would make way more sense and take way less
 net new code to just teach TidScan about it.)

What I discussed with Hanada-san before was, a custom-scan provider that
replaces a particular relations join by simple scan of materialized-view
transparently. It is probably one other use case. Its design is in my
brain, but time for development is missing piece for me.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com





Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-14 Thread Kouhei Kaigai
 Or, taking the example of a GpuScan node, it's essentially impossible
 to persuade the planner to delegate any expensive function calculations,
 aggregates, etc to such a node; much less teach it that that way is cheaper
 than doing such things the usual way.  So yeah, KaiGai-san may have a
 module that does a few things with a GPU, but it's far from doing all or
 even very much of what one would want.

Why do we need to run all the functions on the GPU device? PG-Strom simply
gives up injecting a CustomPath if the required qualifier contains unsupported
functions or data types; those workloads are then executed as usual.
I don't intend 100% coverage by the GPU; that doesn't matter. People who run
massive numerical/mathematical workloads will love the GPU.

 Now, as part of the upper-planner-rewrite business that I keep hoping to
 get to when I'm not riding herd on bad patches, it's likely that we might
 have enough new infrastructure soon that that particular problem could
 be solved.  But there would just be another problem after that; a likely
 example is not having adequate statistics, or sufficiently fine-grained
 function cost estimates, to be able to make valid choices once there's
 more than one way to do such calculations.  (I'm not really impressed by
  "the GPU is *always* faster" approaches.)  Significant improvements of
 that sort are going to take core-code changes.

We never guarantee interface compatibility across major version upgrades.
If we add or modify an interface in v9.6, it is the duty of extension developers
to follow the new version, and that is not specific to custom-scan providers.
If v9.6 adds support for fine-grained function cost estimation, I also have
to follow that feature, but that is natural.

 Even worse, if there do get to be any popular custom-scan extensions,
 we'll break them anytime we make any nontrivial planner changes, because
 there is no arms-length API there.  A trivial example is that even adding
 or changing any fields in struct Path will necessarily break custom scan
 providers, because they build Paths for themselves with no interposed API.

I cannot understand... If a Path field gets changed, it is the duty of the
extension to follow the core change. We usually add or modify API
specifications at a major version up. For example, I remember that
ProcessUtility_hook has been changed a few times in the last several years.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] Memory Accounting v11

2015-07-14 Thread Tomas Vondra

Hi,

On 07/14/2015 10:19 PM, Robert Haas wrote:

On Sat, Jul 11, 2015 at 2:28 AM, Jeff Davis pg...@j-davis.com wrote:

After talking with a few people at PGCon, small noisy differences
in CPU timings can appear for almost any tweak to the code, and
aren't necessarily cause for major concern.


I agree with that in general, but the concern is a lot bigger when the
function is something that is called everywhere and accounts for a
measurable percentage of our total CPU usage on almost any workload.
If memory allocation got slower because, say, you added some code to
regexp.c and it caused AllocSetAlloc to split a cache line where it
hadn't previously, I wouldn't be worried about that; the next patch,
like as not, will buy the cost back again.  But here you really are
adding code to a hot path.


I don't think Jeff was suggesting that we should ignore changes measurably 
affecting performance - I'm one of those he discussed this patch with at 
pgcon, and I can assure you the impact on performance was one of the main 
topics of the discussion.


Firstly, do we really have good benchmarks and measurements? I really 
doubt that. We do have some numbers for REINDEX, where we observed 
0.5-1% regression on noisy results from a Power machine (and we've been 
unable to reproduce that on x86). I don't think that's a representative 
benchmark, and I'd like to see more thorough measurements. And I agreed 
to do this, once Jeff comes up with a new version of the patch.


Secondly, the question is whether the performance is impacted more by 
the additional instructions, or by other things - say, random padding, 
as was explained by Andrew Gierth in [1].


I don't know whether that's happening in this patch, but if it is, it 
seems rather strange to use this against this patch and not the others 
(because there surely will be other patches causing similar issues).


[1] 
http://www.postgresql.org/message-id/87vbitb2zp@news-spur.riddles.org.uk



tuplesort.c does its own accounting, and TBH that seems like the right
thing to do here, too.  The difficulty is, I think, that some
transition functions use an internal data type for the transition
state, which might not be a single palloc'd chunk.  But since we can't
spill those aggregates to disk *anyway*, that doesn't really matter.
If the transition state is a varlena or a fixed-length type, we can know how
much space it's consuming without hooking into the memory context
framework.


I respectfully disagree. Our current inability to dump/load the state 
has little to do with how we measure consumed memory, IMHO.


It's true that we do have two kinds of aggregates, depending on the 
nature of the aggregate state:


(a) fixed-size state (data types passed by value, variable length types
that do not grow once allocated, ...)

(b) continuously growing state (as in string_agg/array_agg)

Jeff's HashAgg patch already fixes (a) and can fix (b) once we get a 
solution for dump/load of the aggregate state - which we need to 
implement anyway for parallel aggregate.


I know there was a proposal to force all aggregates to use regular data 
types as aggregate state, but I can't see how that could work without a 
significant performance penalty. For example array_agg() is using 
'internal' to pass ArrayBuildState - how do you turn that into a regular 
data type without effectively serializing/deserializing the whole array 
on every transition?


And even if we come up with a solution for array_agg, do we really 
believe it's possible to do for all custom aggregates? Maybe I'm missing 
something but I doubt that. ISTM designing an ephemeral data structure 
allows tweaks that are impossible otherwise.


What might be possible is extending the aggregate API with another 
custom function returning the size of the aggregate state. So when defining 
an aggregate using 'internal' for aggregate state, you'd specify 
transfunc, finalfunc and sizefunc. That seems doable, I guess.
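
A hedged sketch of what that could look like; SIZEFUNC is hypothetical
syntax, not an existing CREATE AGGREGATE option, and the definition is
simplified (the real array_agg final function also needs FINALFUNC_EXTRA):

CREATE AGGREGATE my_array_agg(anynonarray) (
    SFUNC     = array_agg_transfn,
    STYPE     = internal,
    FINALFUNC = array_agg_finalfn,
    SIZEFUNC  = my_array_agg_size  -- hypothetical: reports state size in bytes
);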


I find the memory accounting as a way more elegant solution, though.

kind regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Noah Misch
On Tue, Jul 14, 2015 at 11:14:55AM +0100, Simon Riggs wrote:
 On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:
  TBH, I think the right thing to do at this point is to revert the entire
  patch and send it back for ground-up rework.  I think the high-level
  design is wrong in many ways and I have about zero confidence in most
  of the code details as well.
 
 
 Based on the various comments here, I don't see that as the right course of
 action at this point.
 
 There are no issues relating to security or data loss, just various fixable
 issues in a low-impact feature, which in my view is an important feature
 also.
 
  If it's
  to stay, it *must* get a line-by-line review from some committer-level
  person; and I think there are other more important things for us to be
  doing for 9.5.
 
 
 Honestly, I am very surprised by this.

Tom's partial review found quite a crop of unvarnished bugs:

1. sample node can give different tuples across rescans within an executor run
2. missing dependency machinery to restrict dropping a sampling extension
3. missing pg_dump --binary-upgrade treatment
4. potential core dumps due to dereferencing values that could be null
5. factually incorrect comments
6. null argument checks in strict functions
7. failure to check for constisnull
8. failure to sanity-check ntuples
9. arithmetic errors in random_relative_prime()

(That's after sifting out design counterproposals, feature requests, and other
topics of regular disagreement.  I erred on the side of leaving things off
that list.)  Finding one or two like that during a complete post-commit review
would be business as usual.  Finding nine in a partial review diagnoses a
critical shortfall in pre-commit review vigilance.  Fixing the bugs found to
date will not cure that shortfall.  A qualified re-review could cure it.

nm




Re: [HACKERS] assessing parallel-safety

2015-07-14 Thread Amit Kapila
On Fri, Jul 3, 2015 at 5:30 PM, Amit Kapila amit.kapil...@gmail.com wrote:


 Attached, find the rebased patch which can be used to review/test latest
 version of parallel_seqscan patch which I am going to post in the parallel
 seq. scan thread soonish.


I went ahead and completed this patch with respect to adding a new clause
to CREATE/ALTER FUNCTION which allows users to define the
parallel property for user-defined functions.

The new clause is defined as

Create Function ... PARALLEL {SAFE | RESTRICTED | UNSAFE}
Alter Function ... PARALLEL {SAFE | RESTRICTED | UNSAFE}
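
For instance, a user-defined function could then be declared or changed like
this (function name and body are just an example):

CREATE FUNCTION add_one(i int) RETURNS int
    AS $$ SELECT i + 1 $$
    LANGUAGE sql PARALLEL SAFE;

ALTER FUNCTION add_one(int) PARALLEL RESTRICTED;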

I have added PARALLEL as a non-reserved keyword and stored the other
parameters as options which can be verified during CreateFunction.

Apart from this, added pg_dump support and updated the docs.

I have changed the parallel safety for some of the newly added
functions for pg_replication_origin* which will be defined as:

pg_replication_origin_create - unsafe, because it can start a transaction

pg_replication_origin_drop   - unsafe, because it can start a transaction

pg_replication_origin_session_setup - unsafe, because it changes shared
memory state(ReplicationState) that is not shared among workers.

pg_replication_origin_session_reset - unsafe, because it changes shared
memory state(ReplicationState) that is not shared among workers.

pg_replication_origin_advance  - unsafe, because it changes shared
memory state(ReplicationState) that is not shared among workers.

pg_replication_origin_progress - unsafe, because it can call XlogFlush
which can change shared memory state.

pg_replication_origin_session_progress - unsafe, because it can call
XlogFlush which can change shared memory state.

pg_replication_origin_xact_setup- Restricted,
because replorigin_sesssion_origin_lsn is not shared

pg_replication_origin_xact_reset- Restricted,
because replorigin_sesssion_origin_lsn is not shared

pg_replication_origin_session_is_setup - Restricted,
because replorigin_sesssion_origin is not shared

pg_show_replication_origin_status - Restricted, because
ReplicationState is not shared.

pg_replication_origin_oid - Safe



One issue which I think we should fix in this patch, as pointed out earlier,
is that in some cases a function containing a SELECT statement won't be
able to use a parallel plan even though it is marked as parallel safe.

create or replace function parallel_func_select() returns integer
as $$
declare
ret_val int;
begin
Select c1 into ret_val from t1 where c1 = 1;
return ret_val;
end;
$$ language plpgsql Parallel Safe;

The above function won't be able to use a parallel plan for the SELECT
statement because of the code below:

static int
exec_stmt_execsql(PLpgSQL_execstate *estate,
                  PLpgSQL_stmt_execsql *stmt)
{
..
    if (expr->plan == NULL)
    {
..
        exec_prepare_plan(estate, expr, 0);
    }
..
}

As the cursorOptions passed in this function are always 0, the planner
will treat it as an unsafe function. Shouldn't we check parallelOk
to pass the cursor option in the above function, as is done in
exec_run_select() in the patch?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


assess-parallel-safety-v7.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-14 Thread Kouhei Kaigai
 That doesn't answer my question. I'm talking about a client and server
 running on the same system with SELinux MLS policy, so that getpeercon
 will return the context of the client process unless it has explicitly
 set the socket create context. So again: if the sepgsql module is
 loaded, will postgresql call a function in sepgsql to compute the
 access vector for the source (getpeercon label) context's access to the
 target context (the table's context set by SECURITY LABEL) and fail the
 operation, generating an AVC, if access is denied because there is no
 policy?

Yes. You may see AVC denial/allowed message on PostgreSQL log, like:

LOG:  SELinux: allowed { create } scontext=unconfined_u:unconfined_r:unconfined_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="regtest_schema.regtest_table"

scontext comes from getpeercon(3),
tcontext comes from the configuration by SECURITY LABEL command.
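
As an illustration (a sketch; the label string is copied from the log entry
above), the tcontext side of an existing table can be set explicitly with:

SECURITY LABEL FOR selinux ON TABLE regtest_schema.regtest_table
    IS 'unconfined_u:object_r:sepgsql_table_t:s0';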

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-14 Thread Beena Emerson
On Jul 14, 2015 7:15 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Jul 14, 2015 at 9:00 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Mon, Jul 13, 2015 at 10:34 PM, Fujii Masao wrote:
  On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson wrote:
  On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:
 
  Something like pg_syncinfo/ coupled with a LW lock, we already do
  something similar for replication slots with pg_replslot/.
 
  I was trying to figure out how the JSON metadata can be used.
  It would have to be set using a given set of functions.
 
  So we can use only such a set of functions to configure synch rep?
  I don't like that idea. Because it prevents us from configuring that
  while the server is not running.
 
  If you store a json blob in a set of files of PGDATA you could update
  them manually there as well. That's perhaps re-inventing the wheel
  with what is available with GUCs though.

 Why don't we just use GUC? If the quorum setting is not so complicated
 in real scenario, GUC seems enough for that.

I agree GUC would be enough.
We could also name groups in it.

I am thinking of the following format similar to JSON

group_name:count (list)
Use of square brackets for priority.

Ex:
s_s_names = 'remotes: 2 (london: 1 [lndn1, lndn2], nyc: 1 [nyc1, nyc2])'

Regards,

Beena Emerson


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Simon Riggs
On 15 July 2015 at 05:58, Noah Misch n...@leadboat.com wrote:


   If it's
   to stay, it *must* get a line-by-line review from some committer-level
   person; and I think there are other more important things for us to be
   doing for 9.5.
  
 
  Honestly, I am very surprised by this.

 Tom's partial review found quite a crop of unvarnished bugs:

 1. sample node can give different tuples across rescans within an executor
 run
 2. missing dependency machinery to restrict dropping a sampling extension
 3. missing pg_dump --binary-upgrade treatment
 4. potential core dumps due to dereferencing values that could be null
 5. factually incorrect comments
 6. null argument checks in strict functions
 7. failure to check for constisnull
 8. failure to sanity-check ntuples
 9. arithmetic errors in random_relative_prime()

 (That's after sifting out design counterproposals, feature requests, and
 other
 topics of regular disagreement.  I erred on the side of leaving things off
 that list.)  Finding one or two like that during a complete post-commit
 review
 would be business as usual.  Finding nine in a partial review diagnoses a
 critical shortfall in pre-commit review vigilance.  Fixing the bugs found
 to
 date will not cure that shortfall.  A qualified re-review could cure it.


Thank you for the summary of points. I agree with that list.

I will work on the re-review as you suggest.

1 and 4 relate to the sample API exposed, which needs some rework. We'll
see how big that is; at this time I presume not that hard, but I will wait
for Petr's opinion also when he returns on Friday.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] creating extension including dependencies

2015-07-14 Thread Michael Paquier
On Fri, Jul 10, 2015 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@anarazel.de writes:
 On July 10, 2015 4:16:59 PM GMT+02:00, Tom Lane t...@sss.pgh.pa.us wrote:
 Would that propagate down through multiple levels of CASCADE?  (Although
 I'm not sure it would be sensible for a non-relocatable extension to
 depend on a relocatable one, so maybe the need doesn't arise in
 practice.)

 I'd say so. I was thinking of adding a flag that allows passing a
 schema to a non-relocatable extension. That'd then be passed down. I
 agree that it's unlikely to be used often.

 Yeah, I was visualizing it slightly differently, as a separate
 internal-only option schema_if_needed, but it works out to the
 same thing either way.

I just had a look at this patch, and here are some comments:
+ [ RECURSIVE ]
After reading the thread, CASCADE sounds like a good thing as well to me.
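
As a usage sketch (extension names are just examples), the behavior under
discussion would let a dependent extension pull in its requirements in one
statement:

CREATE EXTENSION hstore_plperl CASCADE;
-- instead of failing because a required extension is not installed and
-- making the user run CREATE EXTENSION hstore; (and plperl) by hand first.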

+   /* Create and execute new CREATE EXTENSION statement. */
+   ces = makeNode(CreateExtensionStmt);
+   ces->extname = curreq;
+   ces->if_not_exists = false;
+   parents = lappend(list_copy(recursive_parents), stmt->extname);
+   ces->options = list_make1(makeDefElem("recursive", (Node *) parents));
+   CreateExtension(ces);
+   list_free(parents);
ces should be free'd after calling CreateExtension perhaps?

The test_ext*--*.sql files should not be completely empty. They should
include a header like this one (hoge is the Japanese foo...):
/* src/test/modules/test_extension/hoge--1.0.sql */
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION hoge" to load this file. \quit

That is good practice compared to the other modules, and this way
there is no need to have the Makefile, for example, touch those files
before installing them (I have created them manually to test this
patch).

The list of contrib modules excluded from build in the case of MSVC
needs to include test_extensions ($contrib_excludes in
src/tools/msvc/Mkvcbuild.pm) or builds on Windows using MSVC will
fail. commit_ts does that for example.

Regression tests of contrib/ modules doing transforms should be
updated to use this new stuff IMO. That's not part of the core patch
obviously, but it looks worth including them as well.
-- 
Michael




Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-14 Thread Ted Toth
That doesn't answer my question. I'm talking about a client and server
running on the same system with SELinux MLS policy, so that getpeercon
will return the context of the client process unless it has explicitly
set the socket create context. So again: if the sepgsql module is
loaded, will postgresql call a function in sepgsql to compute the
access vector for the source (getpeercon label) context's access to the
target context (the table's context set by SECURITY LABEL) and fail the
operation, generating an AVC, if access is denied because there is no
policy?

On Tue, Jul 14, 2015 at 8:35 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 So if I label a table with an SELinux context and the type of my
 client connection does not have policy to be able to access the table
 type, will an AVC be generated and the access denied?

 Of course, it depends on the policy of the system.

 If the client connection comes from a non-SELinux system, use netlabelctl
 to configure a default fallback security context. It gives getpeercon(3)
 the client label that shall be applied when netlabel is not configured on
 the connection.

 Thanks,
 --
 NEC Business Creation Division / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ted Toth
 Sent: Wednesday, July 15, 2015 2:59 AM
 To: Kohei KaiGai
 Cc: Robert Haas; Adam Brightwell; Andres Freund; 
 pgsql-hackers@postgresql.org;
 Alvaro Herrera
 Subject: Re: [HACKERS] security labels on databases are bad for dump & restore

 So if I label a table with an SELinux context and the type of my
 client connection does not have policy to be able to access the table
  type, will an AVC be generated and the access denied?

 Ted

 On Tue, Jul 14, 2015 at 12:53 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  2015-07-15 2:39 GMT+09:00 Ted Toth txt...@gmail.com:
  That's exactly what I'm talking about; like I said, KaiGai's branch was
  never merged into the mainline so I do not believe that it is used at
  all.
 
  It depends on the definition of integrated.
  The PostgreSQL core offers an infrastructure for label based security
  mechanism, not only selinux. Also, one extension module that is
  usually distributed with PostgreSQL bridges the world of database and
  the world of selinux (even though all the features I initially designed
  are not yet implemented). I like to say it is integrated.
 
  On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas robertmh...@gmail.com 
  wrote:
  On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth txt...@gmail.com wrote:
  I'm sort of new to this so maybe I'm missing something but since the
  sepgsql SELinux userspace object manager was never integrated into
  postgresql (AFAIK KaiGai's branch was never merged into the mainline)
  who uses these labels? What use are they?
 
  See contrib/sepgsql
 
  --
  Robert Haas
  EnterpriseDB: http://www.enterprisedb.com
  The Enterprise PostgreSQL Company
 
 
 
  --
  KaiGai Kohei kai...@kaigai.gr.jp






Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-14 Thread Kouhei Kaigai
 So if I label a table with an SELinux context and the type of my
 client connection does not have policy to be able to access the table
 type, will an AVC be generated and the access denied?

Of course, it depends on the policy of the system.

If the client connection comes from a non-SELinux system, use netlabelctl
to configure a default fallback security context. It gives getpeercon(3)
the client label that shall be applied when netlabel is not configured on
the connection.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ted Toth
 Sent: Wednesday, July 15, 2015 2:59 AM
 To: Kohei KaiGai
 Cc: Robert Haas; Adam Brightwell; Andres Freund; pgsql-hackers@postgresql.org;
 Alvaro Herrera
 Subject: Re: [HACKERS] security labels on databases are bad for dump & restore
 
 So if I label a table with an SELinux context and the type of my
 client connection does not have policy to be able to access the table
  type, will an AVC be generated and the access denied?
 
 Ted
 
 On Tue, Jul 14, 2015 at 12:53 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  2015-07-15 2:39 GMT+09:00 Ted Toth txt...@gmail.com:
  That's exactly what I'm talking about; like I said, KaiGai's branch was
  never merged into the mainline so I do not believe that it is used at
  all.
 
  It depends on the definition of integrated.
  The PostgreSQL core offers an infrastructure for label based security
  mechanism, not only selinux. Also, one extension module that is
  usually distributed with PostgreSQL bridges the world of database and
  the world of selinux (even though all the features I initially designed
  are not yet implemented). I like to say it is integrated.
 
  On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas robertmh...@gmail.com 
  wrote:
  On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth txt...@gmail.com wrote:
  I'm sort of new to this so maybe I'm missing something but since the
  sepgsql SELinux userspace object manager was never integrated into
  postgresql (AFAIK KaiGai's branch was never merged into the mainline)
  who uses these labels? What use are they?
 
  See contrib/sepgsql
 
  --
  Robert Haas
  EnterpriseDB: http://www.enterprisedb.com
  The Enterprise PostgreSQL Company
 
 
 
  --
  KaiGai Kohei kai...@kaigai.gr.jp
 
 



Re: [HACKERS] Could be improved point of UPSERT

2015-07-14 Thread Gianni
On Tuesday 14 July 2015 11:33:34 Peter Geoghegan wrote:
 On Sun, Jul 12, 2015 at 4:09 AM, Yourfriend doudou...@gmail.com wrote:
  Suggestion:  When a conflict was found for UPSERT, don't access the
  sequence, so users can have a reasonable list of IDs.
 
 This is not technically feasible. What if the arbiter index is a serial PK?
 
 The same thing can happen when a transaction is aborted. SERIAL is not
 guaranteed to be gapless.

Could there be a version of UPSERT where an update is tried, and if 0 records 
are modified, an insert is done?

Just wondering, I haven't got a use-case for that.  I don't mind gaps in 
sequences.




Re: [HACKERS] Bug in bttext_abbrev_convert()

2015-07-14 Thread Peter Geoghegan
On Mon, Jul 6, 2015 at 12:52 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I suggest CC'ing Peter as a first measure.  I already suggested this (or
 something similar) to him months ago.

This would be a worthwhile effort. tuplesort.c only has 46% coverage.
There is no coverage for functions that I know are used all the time
in production, like dumptuples(), or ExecHashIncreaseNumBatches().

We should make increasing test coverage an explicit goal. Ideally,
there'd be a lower quality set of tests that fill in certain gaps in
code coverage, but are used less frequently, and may take much longer
to run. Simply having the code executed will allow tools like Valgrind
to catch bugs.

-- 
Peter Geoghegan




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-14 Thread David Guimaraes
Yes Michael, I agree.

This is the CloseArchive function at pg_backup_custom.c:

WriteHead(AH);
tpos = ftello(AH->FH);
WriteToc(AH);
ctx->dataStart = _getFilePos(AH, ctx);
WriteDataChunks(AH);

This is the WriteHead function at pg_backup_archiver.c:

(*AH->WriteBufPtr) (AH, "PGDMP", 5); /* Magic code */
(*AH->WriteBytePtr) (AH, AH->vmaj);
(*AH->WriteBytePtr) (AH, AH->vmin);
(*AH->WriteBytePtr) (AH, AH->vrev);
(*AH->WriteBytePtr) (AH, AH->intSize);
(*AH->WriteBytePtr) (AH, AH->offSize);
(*AH->WriteBytePtr) (AH, AH->format);
WriteInt(AH, AH->compression);
crtm = *localtime(&AH->createDate);
WriteInt(AH, crtm.tm_sec);
WriteInt(AH, crtm.tm_min);
WriteInt(AH, crtm.tm_hour);
WriteInt(AH, crtm.tm_mday);
WriteInt(AH, crtm.tm_mon);
WriteInt(AH, crtm.tm_year);
WriteInt(AH, crtm.tm_isdst);
WriteStr(AH, PQdb(AH->connection));
WriteStr(AH, AH->public.remoteVersionStr);
WriteStr(AH, PG_VERSION);

There is no mention of the file size or anything of the sort in the header.

WriteToc, however, writes the number of TOC structs at the beginning:

void WriteToc(ArchiveHandle *AH) {
...
WriteInt(AH, tocCount);

but these structs are dynamic (a linked list), so there is no way to know the
size of each one...

At the definition of the tocEntry struct, there is no reference to size or
anything like that; it is a linked list with a count number.

And at the end, the CloseArchive function calls WriteDataChunks to write
blob information... I don't understand what this function is doing. Does it
save size information for the blob data at the beginning?

(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);

What does this function do?



David


On Mon, Jul 13, 2015 at 11:00 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Tue, Jul 14, 2015 at 11:20 AM, David Guimaraes skys...@gmail.com
 wrote:
  Yeah bingo

 Hm. While there is a magic-code header for the custom format, by
 looking at the code I am not seeing any traces of a similar thing at
 the end of the dump file (_CloseArchive in pg_backup_custom.c), and I
 don't recall either that there is an estimate of the size of the
 dump in the header. If those files were stored close to each
 other, one idea may be to look for the next header present, or to
 attempt to roughly estimate the size that they would have, I am afraid.
 In any case, applying reverse engineering methods seems like the most
 reliable method to reconstitute an archive handler that could be used
 by pg_restore or pg_dump, but perhaps others have other ideas.
 --
 Michael




-- 
David Gomes Guimarães


Re: [HACKERS] Could be improved point of UPSERT

2015-07-14 Thread Peter Geoghegan
On Tue, Jul 14, 2015 at 11:40 AM, Gianni nasus.maxi...@gmail.com wrote:
 Could there be a version of UPSERT where an update is tried, and if 0 records
 are modified, an insert is done?

 Just wondering, I haven't got a use-case for that.  I don't mind gaps in
 sequences.

Perhaps, if you don't mind having a severely restricted set of
qualifications in the UPDATE, which the existing command effectively
has anyway. That would be a very odd thing.
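
For reference, the manual pattern being alluded to looks roughly like this
(a sketch with a hypothetical table; note it is race-prone under concurrency
without retry logic, which is much of what ON CONFLICT exists to avoid):

UPDATE t SET v = 2 WHERE k = 'a';
-- if zero rows were updated, fall back to an insert:
INSERT INTO t (k, v)
SELECT 'a', 2
WHERE NOT EXISTS (SELECT 1 FROM t WHERE k = 'a');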

-- 
Peter Geoghegan




Re: [HACKERS] Minor issue with BRIN regression tests

2015-07-14 Thread Peter Geoghegan
On Tue, Jul 14, 2015 at 9:27 AM, Robert Haas robertmh...@gmail.com wrote:
 Removing that test doesn't seem important to me.  Why does it seem
 important to you?

It's a minor issue, but it's easily fixed.

-- 
Peter Geoghegan




Re: [HACKERS] Minor issue with BRIN regression tests

2015-07-14 Thread Robert Haas
On Tue, Jul 14, 2015 at 1:26 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Jul 14, 2015 at 9:27 AM, Robert Haas robertmh...@gmail.com wrote:
 Removing that test doesn't seem important to me.  Why does it seem
 important to you?

 It's a minor issue, but it's easily fixed.

And what, in your opinion, is the issue?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-14 Thread Robert Haas
On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth txt...@gmail.com wrote:
 I'm sort of new to this so maybe I'm missing something but since the
 sepgsql SELinux userspace object manager was never integrated into
 postgresql (AFAIK KaiGai's branch was never merged into the mainline)
 who uses these labels? What use are they?

See contrib/sepgsql

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-14 Thread Jim Nasby

On 7/13/15 4:02 PM, David Christensen wrote:



On Jul 13, 2015, at 3:50 PM, Jim Nasby jim.na...@bluetreble.com wrote:

On 7/13/15 3:26 PM, David Christensen wrote:

* Incremental Checksums

PostgreSQL users should have a way of upgrading their cluster to use data 
checksums without having to do a costly pg_dump/pg_restore; in particular, 
checksums should be able to be enabled/disabled at will, with the database 
enforcing the logic of whether the pages considered for a given database are 
valid.

Considered approaches for this are having additional flags to pg_upgrade to set 
up the new cluster to use checksums where they did not before (or optionally 
turning these off).  This approach is a nice tool to have, but in order to be 
able to support this process in a manner which has the database online while 
the database is going throught the initial checksum process.


It would be really nice if this could be extended to handle different page 
formats as well, something that keeps rearing its head. Perhaps that could be 
done with the cycle idea you've described.


I had had this thought too, but the main issues I saw were that new page 
formats were not guaranteed to take up the same space/storage, so there was an 
inherent limitation on the ability to restructure things *arbitrarily*; 
that being said, there may be a use-case for the types of modifications that 
this approach *would* be able to handle.


After some discussion on IRC, I think there are two main points to consider.

First, we're currently unhappy with how relfrozenxid works, and this 
proposal follows the same pattern of having essentially a counter field 
in pg_class. Perhaps this is OK because things like checksums really 
shouldn't change that often. (My inclination is that fields in pg_class 
are OK for now.)


Second, there are 4 use cases here that are very similar. We should 
*consider* them now, while designing this. That doesn't mean the first 
patch needs to support anything other than checksums.


1) Page layout changes
2) Page run-time changes (currently only checksums)
3) Tuple layout changes (ie: HEAP_MOVED_IN)
4) Tuple run-time changes (ie: DROP COLUMN)

1 is currently handled in pg_upgrade by forcing a page-by-page copy 
during upgrade. Doing this online would require the same kind of 
conversion plugin pg_upgrade uses. If we want to support conversions 
that need extra free space on a page we'd also need support for that.


2 is similar to 1, except this can change via GUC or similar. Checksums 
are an example of this, as is creating extra free space on a page to 
support an upgrade.


3 & 4 are tuple-level equivalents to 1 & 2.

I think the bigger challenge to these things is how to track the status 
of a conversion (as opposed to the conversion function itself).


- Do we want each of these to have a separate counter in pg_class? 
(rellastchecksum, reloldestpageversion, etc)


- Should that info be combined? (a sketch of both options follows below)
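
To make the question concrete, a purely hypothetical sketch; none of
these columns exists today:

    -- Option 1: one counter per feature.
    SELECT relname, rellastchecksum, reloldestpageversion
      FROM pg_class;

    -- Option 2: fold everything into a single generic page-format
    -- epoch, of which checksum state is just one component.
    SELECT relname, relminpageversion
      FROM pg_class;
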
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-14 Thread Alvaro Herrera
Michael Paquier wrote:
 On Mon, Jul 13, 2015 at 9:03 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:

  We use script file which are generated by pg_upgrade.
 
 I haven't followed this thread closely, but I am sure you recall that
 vacuumdb has a parallel mode.

I think having to vacuum the whole database during pg_upgrade (or
immediately thereafter, which in practice means that the database is
unusable for queries until that has finished) is way too impractical.
Even in parallel mode, it could take far too long.  People already
complain that our upgrading procedure takes too long as opposed to that
of other database systems.
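
For reference, if the generated script boils down to a freeze pass, what
runs is essentially, once per database:

    VACUUM (FREEZE);

which scans every heap page rather than skipping already-all-visible ones.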

I don't think there's any problem with rewriting the existing server's
VM file into vfm format during pg_upgrade, since we expect those files
to be much smaller than the data itself.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git push hook to check for outdated timestamps

2015-07-14 Thread Alvaro Herrera
Peter Eisentraut wrote:
 On 6/25/15 8:08 PM, Robert Haas wrote:
  Because I don't want to have to do git log --format=fuller to see when
  the thing was committed, basically.
 
 Then I suggest to you the following configuration settings:
 
 [format]
 pretty=cmedium
 [pretty]
 cmedium=format:%C(auto,yellow)commit %H%C(reset)%nCommit: %cn 
 <%ce>%nCommitDate: %cd%n%n%w(80,4,4)%B

I have been using a slightly tweaked version of this and I have found
that the %w(80,4,4)%B thingy results in mangled formatting; for
instance, commit bbfd7edae5 ends with this:

Discussion: 54b58ba3.8040...@ohmu.fi Author: Oskari Saarenmaa, with some
minor changes by me.

whereas it originally was written as

Discussion: 54b58ba3.8040...@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.

I find this a bad enough problem that I'll probably have to remove that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-14 Thread Ted Toth
I'm sort of new to this, so maybe I'm missing something, but since the
sepgsql SELinux userspace object manager was never integrated into
postgresql (AFAIK KaiGai's branch was never merged into the mainline),
who uses these labels? What use are they?

Ted

On Tue, Jul 14, 2015 at 12:09 PM, Adam Brightwell
adam.brightw...@crunchydatasolutions.com wrote:
 All,

 I won't have time to do anything about this anytime soon, but I think we
 should fix that at some point.  Shall I put this on the todo? Or do we
 want to create an 'open items' page that's not major version specific?

 I think adding it to the TODO would be great.

 I'd be willing to look/dive into this one further.

 -Adam

 --
 Adam Brightwell - adam.brightw...@crunchydatasolutions.com
 Database Engineer - www.crunchydatasolutions.com


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add missing \ddp psql command

2015-07-14 Thread Alvaro Herrera
Fujii Masao wrote:

 Sometimes I type TAB after \ to display all the psql meta commands.
 Even single-character completion like \s may be useful for that case.
 Yeah, I agree that's a narrow use case, though.

I agree that that's useful, so thanks for having pushed it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-14 Thread Jeff Janes
On Tue, May 26, 2015 at 4:58 AM, Uriy Zhuravlev u.zhurav...@postgrespro.ru
wrote:

 Hello.

 Attached patch based on:

 http://www.postgresql.org/message-id/capphfdssy+qepdcovxx-b4lp3ybr+qs04m6-arggknfk3fr...@mail.gmail.com

 and adds selectivity estimation functions to @@ (port from tsquery). Now we
 support &&, @>, <@ and @@.
 In addition, a migration to intarray version 1.1 was written. Because of
 that, this patch requires my other patch:
 http://www.postgresql.org/message-id/14346041.DNcb5Y1inS@dinodell

 Alexander Korotkov know about this patch.



Hi Uriy,

This patch looks pretty good.

The first line of intarray--1.1.sql mis-identifies itself as /*
contrib/intarray/intarray--1.0.sql */

The real intarray--1.0.sql file probably should not be included in the
final patch, but I like having it for testing.

It applies and builds cleanly over the alter operator patch (and now the
commit as well), passes make check of the contrib module both with and
without cassert.

I could successfully upgrade from version 1.0 to 1.1 without having to drop
the gin__int_ops indexes in the process.

I could do pg_upgrade from 9.2 and 9.4 to 9.6devel with large indexes in
place, and then upgrade the extension to 1.1, and it worked without having
to rebuild the index.
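
For reference, the extension upgrade step in those tests is just the
standard command, assuming the version naming used by the patch:

    ALTER EXTENSION intarray UPDATE TO '1.1';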

It does what it says, and I think we want this.

There were some cases where the estimates were not very good, but that
seems to be a limitation of what pg_stats makes available, not of this
patch.  Specifically, if the elements listed in the query text are not part
of most_common_elems (or worse yet, most_common_elems is null), then it is
left to guess with no real data, and those guesses can be pretty bad.  It
is not this patch's job to fix that, however.

It assumes all the stats are independent and so doesn't account for
correlation between members.  This is also how the core selectivity
estimates work between columns, so I can't really complain about that.  It
is easy to trick it with things like @@ '(!300 & 300)'::query_int, but I
don't think it is necessary to fix that.
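
A self-contained way to reproduce that, with an invented table and data:

    CREATE EXTENSION intarray;
    CREATE TABLE docs (tags int[]);
    -- 300 appears in half the rows, 301 in the other half.
    INSERT INTO docs
      SELECT ARRAY[i % 100, 300 + (i % 2)]
        FROM generate_series(1, 10000) AS i;
    ANALYZE docs;

    -- The query is contradictory, so the actual result is empty, but
    -- treating the two conditions as independent yields an estimate of
    -- roughly a quarter of the table.
    EXPLAIN ANALYZE
    SELECT * FROM docs WHERE tags @@ '!300 & 300'::query_int;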

I have not been creative enough to come up with queries for which this
improvement in selectivity estimate causes the execution plan to change in
important ways.  I'm sure the serious users of this module would have such
cases, however.

I haven't tested gist__int_ops as I can't get those indexes to build in a
feasible amount of time.  But the selectivity estimates are independent of
any actual index, so testing those doesn't seem to be critical.

There is no documentation change, which makes sense as this is internal stuff
which isn't documented to start with.

There are no regression test changes.  Not sure about that; it does seem
like regression tests would be desirable.

I haven't gone through the C code.

Cheers,

Jeff


Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-14 Thread Kohei KaiGai
2015-07-15 2:39 GMT+09:00 Ted Toth txt...@gmail.com:
 That's exactly what I'm talking about; like I said, KaiGai's branch was
 never merged into the mainline, so I do not believe that it is used at
 all.

It depends on the definition of "integrated".
The PostgreSQL core offers an infrastructure for label-based security
mechanisms, not only selinux. Also, one extension module that is
usually distributed with PostgreSQL bridges the world of the database and
the world of selinux (even though all the features I initially designed
are not yet implemented). I like to say it is integrated.

 On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth txt...@gmail.com wrote:
 I'm sort of new to this, so maybe I'm missing something, but since the
 sepgsql SELinux userspace object manager was never integrated into
 postgresql (AFAIK KaiGai's branch was never merged into the mainline),
 who uses these labels? What use are they?

 See contrib/sepgsql

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-14 Thread Ted Toth
So if I label a table with an SELinux context, and the type of my
client connection does not have policy allowing it to access the table
type, will an AVC be generated and the access denied?

Ted

On Tue, Jul 14, 2015 at 12:53 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2015-07-15 2:39 GMT+09:00 Ted Toth txt...@gmail.com:
 That's exactly what I'm talking about; like I said, KaiGai's branch was
 never merged into the mainline, so I do not believe that it is used at
 all.

 It depends on the definition of "integrated".
 The PostgreSQL core offers an infrastructure for label-based security
 mechanisms, not only selinux. Also, one extension module that is
 usually distributed with PostgreSQL bridges the world of the database and
 the world of selinux (even though all the features I initially designed
 are not yet implemented). I like to say it is integrated.

 On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth txt...@gmail.com wrote:
 I'm sort of new to this, so maybe I'm missing something, but since the
 sepgsql SELinux userspace object manager was never integrated into
 postgresql (AFAIK KaiGai's branch was never merged into the mainline),
 who uses these labels? What use are they?

 See contrib/sepgsql

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



 --
 KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-14 Thread Sawada Masahiko
On Wed, Jul 15, 2015 at 12:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 July 2015 at 15:11, Sawada Masahiko sawada.m...@gmail.com wrote:


 Oops, I had forgotten to add new file heapfuncs.c.
 Latest patch is attached.


 I think we've established the approach is desirable and defined the way
 forward for this, so this is looking good.

If we want to move stuff like pgstattuple and pg_freespacemap into core,
we could move them into heapfuncs.c.
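
For context, today both live in contrib and are used along these lines
(the table name is just an example):

    CREATE EXTENSION pgstattuple;
    CREATE EXTENSION pg_freespacemap;

    SELECT * FROM pgstattuple('some_table');    -- tuple-level statistics
    SELECT * FROM pg_freespace('some_table');   -- free space per block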

 Some of my requests haven't been actioned yet, so I personally would not
 commit this yet. I am happy to continue as reviewer/committer unless others
 wish to take over.
 The main missing item is pg_upgrade support, which won't happen by end of
 CF1, so I am marking this as Returned With Feedback. Hopefully we can review
 this again before CF2.

I appreciate your reviewing.
Yeah, the pg_upgrade support and regression tests for the VFM patch are
almost done now; I will submit the patch this week after testing it.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers