date:20131113

On 12 November 2013 19:54, Claudio Freire klaussfre...@gmail.com wrote:

 On Tue, Nov 12, 2013 at 7:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I still think we need to look at this from a query perspective though.
 We need to check whether there is a class of real world queries that
 are not well optimised by minmax indexes, or cannot be made to be in
 future releases. For example, large DELETEs from a table are almost
 trivially optimised for min max.

 Only if you don't have a PK (or other index).

Right. Min max indexes are optimised for large DELETEs, btrees are not
(yet), which is what we are discussing.

I believe it remains to be shown that a btree is actually desirable on
a very big table. So far the discussion has just assumed this is the
case, without looking at specific SQL. It might be better to look at
ways of avoiding a btree altogether than to spend time optimising
btrees for this case.

Perhaps we can enforce a PK constraint without using a btree, if one
is required. This might be guaranteed by using a sequence or other
mechanism.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Nicolas Barbier

2013/11/12 Simon Riggs si...@2ndquadrant.com:

 On 12 November 2013 21:41, Nicolas Barbier nicolas.barb...@gmail.com wrote:

 Look-up speed is as follows: Each look-up must look through all
 B-trees.

 That can be optimised by using a min max approach, so we need only
 look at sub-trees that may contain data.

Under the assumption that the data are really *really* inserted in
random order, I think that min max indexes would not help much: All
subtrees would contain so many different values that the boundaries
will almost never be narrow enough to exclude scanning it (unless one
is searching for outliers).

I think that min max indexes are very useful for data that is inserted
in a some less-than-random way: When rather large swathes of inserted
data fall in between boundaries that a lot of other data doesn’t fall
in between. For min max index scans to be fast, the data doesn’t
exactly have to be ordered in the heap (that’s one of the advantages
vs. B-trees, where a different order in the index and the heap is very
bad for scans of any significant part of the index), but it can also
not be entirely arbitrary.

I would say that min max indexes should be used in either of the
following cases (and probably some more that I don’t think of right
now):

* When the data is inserted close to ordered (i.e., past or future
dates that have a tendency to be correlated with “the current day,”
such as invoice dates). For this usecase, a normal B-tree would also
work, but would take more space and require more heavy maintenance.
* When large batches of “similar” data (fitting in between boundaries
that are more narrow than the “global boundaries”) are inserted at a
time, even when multiple of such batches arrive in random order.
* When insertion is up to entirely random, but the goal is to optimize
look-ups for (relatively rare) outliers.
* (combined with any the above to actually make the index *useful*)
When needing a lot a indexes on the same data or having a very high
rate of insertion, and therefore the maintenance burden matters a lot.

 I would add that it is possible to optimise large DELETEs from a table
 if complete sub-trees of the btree can be easily removed. This for me
 would be the compelling benefit of this approach.

Idem WRT the randomness: If the data are really deleted in a
completely random fashion (e.g., Leonardo might want to delete all
phone calls in a certain year, while the index is on phone number),
whole subtrees would almost never become candidates for deletion (the
same problem applies to a normal min max index). Of course, everything
would change if the data is not really deleted randomly.

The same usecases as mentioned above (replace “insertion” with
“deletion”) seem to apply for deletion in min max indexes.

Note that B-forests as described before don’t work well for a high (or
even not-so-high) rate of deletions: During VACUUM the updates to the
bigger trees would kill all performance similar to how a high rate of
insertion kills the performance of normal B-trees once they get big.
To remedy this, one could adds stuff such as “B-trees of deleted
entries” (i.e., negative trees) that may then afterwards be merged
with other such “negative” trees + “positive” trees. Look-ups would
need to take all those trees into account.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

Simon Riggs wrote
On 5 November 2013 14:28, Leonardo Francalanci lt;

m_lists@

gt; wrote:

Either my sql is not correct (likely), or my understanding of the minmax
index is
not correct (even more likely), or the minmax index is not usable in a
random inputs
scenario.

Please show the real world SQL you intend to run, so we can comment on
it. Inventing a use case that breaks effectiveness of any optimisation
is always easy, but the question is whether the use case is likely or
even desirable.

The use case is pretty simple.
Think it as the NSA, as it would be much easier.
Collect all calls made/received by any user on a mobile network.
(in fact, it's something more than calls, so in fact is 2-10 rows per call).
Keep the data for 20 days.
That's the insert part.

Query:
search calls made/received by the user using IMSI (caller id) or IMEI
(phone id). Date range is usually days (past 4 days, from 10 days ago
to 5 days ago...)

The result is just a very small percentage of the rows present in the
table: a single user doesn't call that much!
Searches are made by a human, so no that many request per second.

It's not a write mostly scenario, it's a 99% write 1% read scenario.

Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.
Solution so far? partition every 15 minutes, create the indexes in bulk.

Simon Riggs wrote
If we have a query to show the most recent calls by a particular caller

SELECT *
FROM cdr
WHERE callerid = X
ORDER BY call_timestamp DESC
LIMIT 100

then this could potentially be optimised using a minmax index, by
traversing the data ranges in call_timestamp order. That is not part
of the code in this initial release, since the main use case is for
WHERE call_timestamp = X, or WHERE primarykey = Y

I don't understand how a index on call_timestamp would help
in the query above.

Simon Riggs wrote
I don't believe there is a credible business case for running that
same query but without the ORDER BY and LIMIT, since it could
potentially return gazillions of rows

Gazillion of rows??? We're talking about calls made/received by
one user here. How many calls do you make in 10 days???

Simon Riggs wrote
so it isn't surprising at all
that it would access a large % of the table.

In fact, the query I use return a fraction of the table, and only
a very small amount of users get searched.

Simon, you keep on talking about these minmax indexes, and
I still don't see any reference to some performance tests.

And, again, I think that random values insertion is the worst
use case for minmax indexes.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778092.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLE not synonymous with SELECT * FROM?

2013-11-13 Thread Colin 't Hart

David et al,

How about something like this?

Cheers,

Colin
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index e603b76..a68014b 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -33,13 +33,14 @@ PostgreSQL documentation
  refsynopsisdiv
 synopsis
 [ WITH [ RECURSIVE ] replaceable class=parameterwith_query/replaceable 
[, ...] ]
-SELECT [ ALL | DISTINCT [ ON ( replaceable 
class=parameterexpression/replaceable [, ...] ) ] ]
+{ SELECT [ ALL | DISTINCT [ ON ( replaceable 
class=parameterexpression/replaceable [, ...] ) ] ]
 * | replaceable class=parameterexpression/replaceable [ [ AS ] 
replaceable class=parameteroutput_name/replaceable ] [, ...]
 [ FROM replaceable class=parameterfrom_item/replaceable [, ...] ]
 [ WHERE replaceable class=parametercondition/replaceable ]
 [ GROUP BY replaceable class=parameterexpression/replaceable [, ...] 
]
 [ HAVING replaceable class=parametercondition/replaceable [, ...] ]
 [ WINDOW replaceable class=parameterwindow_name/replaceable AS ( 
replaceable class=parameterwindow_definition/replaceable ) [, ...] ]
+| TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ] 
}
 [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] replaceable 
class=parameterselect/replaceable ]
 [ ORDER BY replaceable class=parameterexpression/replaceable [ ASC | 
DESC | USING replaceable class=parameteroperator/replaceable ] [ NULLS { 
FIRST | LAST } ] [, ...] ]
 [ LIMIT { replaceable class=parametercount/replaceable | ALL } ]
@@ -60,8 +61,6 @@ SELECT [ ALL | DISTINCT [ ON ( replaceable 
class=parameterexpression/replac
 phraseand replaceable class=parameterwith_query/replaceable 
is:/phrase
 
 replaceable class=parameterwith_query_name/replaceable [ ( 
replaceable class=parametercolumn_name/replaceable [, ...] ) ] AS ( 
replaceable class=parameterselect/replaceable | replaceable 
class=parametervalues/replaceable | replaceable 
class=parameterinsert/replaceable | replaceable 
class=parameterupdate/replaceable | replaceable 
class=parameterdelete/replaceable )
-
-TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ]
 /synopsis
 
  /refsynopsisdiv
@@ -198,6 +197,27 @@ TABLE [ ONLY ] replaceable 
class=parametertable_name/replaceable [ * ]
literalUPDATE/literal privilege as well (for at least one column
of each table so selected).
   /para
+
+  refsect2 id=SQL-TABLE
+   titleliteralTABLE/literal Command/title
+
+   para
+The command
+programlisting
+TABLE replaceable class=parametername/replaceable
+/programlisting
+is equivalent to
+programlisting
+SELECT * FROM replaceable class=parametername/replaceable
+/programlisting
+It can be used as a top-level command or as a space-saving syntax
+variant in parts of complex queries. Only the literalWITH/, 
literalORDER BY/, literalLIMIT/,
+and Locking clauses and set operations can be used with commandTABLE/; 
the
+literalWHERE/ clause and any form of aggregation cannot be used.
+
+Note that on this page and other places in the documentation, where 
commandSELECT/ is mentioned, commandTABLE/ is also assumed, subject to 
the restrictions mentioned here.
+   /para
+  /refsect2
  /refsect1
 
  refsect1
@@ -211,7 +231,7 @@ TABLE [ ONLY ] replaceable 
class=parametertable_name/replaceable [ * ]
 subqueries that can be referenced by name in the primary query.
 The subqueries effectively act as temporary tables or views
 for the duration of the primary query.
-Each subquery can be a commandSELECT/command, 
commandVALUES/command,
+Each subquery can be a commandSELECT/command, commandTABLE/, 
commandVALUES/command,
 commandINSERT/command, commandUPDATE/command or
 commandDELETE/command statement.
 When writing a data-modifying statement (commandINSERT/command,
@@ -1437,23 +1457,6 @@ SELECT * FROM (SELECT * FROM mytable FOR UPDATE) ss 
ORDER BY column1;
/para
   /caution
   /refsect2
-
-  refsect2 id=SQL-TABLE
-   titleliteralTABLE/literal Command/title
-
-   para
-The command
-programlisting
-TABLE replaceable class=parametername/replaceable
-/programlisting
-is completely equivalent to
-programlisting
-SELECT * FROM replaceable class=parametername/replaceable
-/programlisting
-It can be used as a top-level command or as a space-saving syntax
-variant in parts of complex queries.
-   /para
-  /refsect2
  /refsect1
 
  refsect1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] Sort contents entries in reference documentation

2013-11-13 Thread Colin 't Hart

Hi,

While looking at the documentation on SELECT I noticed that the
entries in reference.sgml aren't sorted correctly -- psql \h does have
them in the correct order.

Attached a trivial patch to fix this.

In addition, reference.sgml doesn't have entries for TABLE or WITH
which should link to ref/select.sgml but I couldn't figure out how to
achieve this. psql \h does have this so the reference page probably
should too.

Cheers,

Colin
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 14e217a..d967f66 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -40,8 +40,8 @@
alterDatabase;
alterDefaultPrivileges;
alterDomain;
-   alterExtension;
alterEventTrigger;
+   alterExtension;
alterForeignDataWrapper;
alterForeignTable;
alterFunction;
@@ -84,8 +84,8 @@
createConversion;
createDatabase;
createDomain;
-   createExtension;
createEventTrigger;
+   createExtension;
createForeignDataWrapper;
createForeignTable;
createFunction;
@@ -124,8 +124,8 @@
dropConversion;
dropDatabase;
dropDomain;
-   dropExtension;
dropEventTrigger;
+   dropExtension;
dropForeignDataWrapper;
dropForeignTable;
dropFunction;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] init_sequence spill to hash table

2013-11-13 Thread David Rowley

Here http://www.postgresql.org/message-id/24278.1352922...@sss.pgh.pa.us there
was some talk about init_sequence being a bottleneck when many sequences
are used in a single backend.

The attached I think implements what was talked about in the above link
which for me seems to double the speed of a currval() loop over 3
sequences. It goes from about 7 seconds to 3.5 on my laptop.

I thought I would post the patch early to see if this is actually wanted
before I do too much more work on it.

My implementation maintains using the linear list for sequences up to a
defined threshold (currently 32) then it moves everything over to a
hashtable and free's off the list.

A more complete solution would contain regression tests to exercise the
hash table code.

I know there is a desire to move sequences over to a single table still,
but I see this as a separate patch and storing current values in a hash
table for each backend should still be a win even if/when the single table
stuff gets implemented.

Regards

David Rowley
CREATE FUNCTION create_seq(n integer) RETURNS integer
LANGUAGE plpgsql
AS $$
BEGIN
WHILE n  0 LOOP
EXECUTE 'CREATE SEQUENCE test' || CAST(n AS TEXT);
  n := n - 1;
END LOOP;
RETURN 0;
END
$$;

CREATE FUNCTION currval_seq(n integer) RETURNS integer
LANGUAGE plpgsql
AS $_$
BEGIN
WHILE n  0 LOOP
EXECUTE $$SELECT currval('test$$ || CAST(n AS TEXT) || $$')$$;
  n := n - 1;
END LOOP;
RETURN 0;
END
$_$;

CREATE FUNCTION drop_seq(n integer) RETURNS integer
LANGUAGE plpgsql
AS $$
BEGIN
WHILE n  0 LOOP
EXECUTE 'DROP SEQUENCE test' || CAST(n AS TEXT);
  n := n - 1;
END LOOP;
RETURN 0;
END
$$;

CREATE OR REPLACE FUNCTION nextval_seq(n integer) RETURNS integer
LANGUAGE plpgsql
AS $_$
BEGIN
WHILE n  0 LOOP
EXECUTE $$SELECT nextval('test$$ || CAST(n AS TEXT) || $$')$$;
n := n - 1;
END LOOP;
RETURN 0;
END
$_$;


SELECT create_seq(1);
SELECT nextval_seq(1);
SELECT currval_seq(1);



hashtab_seq_v0.1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-11-13 Thread Haribabu kommi

On 01 October 2013 00:56 Amit Kapila wrote:
 On Mon, Sep 30, 2013 at 9:07 PM, Peter Eisentraut pete...@gmx.net
 wrote:
  On 9/28/13 3:05 AM, Amit Kapila wrote:
  Now as we have an agreement, I had updated patch for below left
 issues:
 
  Regression tests are failing.
 
Thanks for informing. I am sorry for not running regression before
 sending patch.
 
Reason for failure was that source for GUC in new function
 validate_conf_option() was hardcoded to PGC_S_FILE which was okay for
 Alter System, but
not for SET path. I had added new parameter source in this function
 and get the value of source when this is called from SET path.

Some of the initial observation of the patch are,
1. Patch is not applying against git head, needs a rebase.
2. Patch doesn't contain the tests.

I started reviewing the patch, will share the details once I finish.

Regards,
Hari babu.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

On 13 November 2013 06:07, Leonardo Francalanci m_li...@yahoo.it wrote:

 The use case is pretty simple.
 Think it as the NSA, as it would be much easier.
 Collect all calls made/received by any user on a mobile network.
 (in fact, it's something more than calls, so in fact is 2-10 rows per call).
 Keep the data for 20 days.
 That's the insert part.

 Query:
 search calls made/received by the user using IMSI (caller id) or IMEI
 (phone id). Date range is usually days (past 4 days, from 10 days ago
 to 5 days ago...)

 The result is just a very small percentage of the rows present in the
 table: a single user doesn't call that much!
 Searches are made by a human, so no that many request per second.

 It's not a write mostly scenario, it's a 99% write 1% read scenario.

 Problem? having 4 btree indexes on random values (imsi+imei * 2,
 since we have calling and caller) kills the performance in insertion
 after a while.
 Solution so far? partition every 15 minutes, create the indexes in bulk.

So in the use case you describe, the min max index would require a
scan of only 25% of the table, not the 80% described earlier for
random inserts. In my experience, people wish to keep data for much
longer periods and so the percentage of scan required would drop lower
than 25%, possibly to 5% or less for many applications.

The plan would use sequential I/O so could still work quickly; given
the low read rate, longer query times could be acceptable.

Minmax indexes are simply a way to make this use case happen
automatically, without the need for manual partitioning of the table.
They are not the answer to every prayer, but with respect they are
better than you had claimed they would be. (25% not 80%, in your use
case). I saw this was likely to be the case and this is why I
challenged you to describe in more detail. Thank you.

 Simon Riggs wrote
 If we have a query to show the most recent calls by a particular caller

 SELECT *
 FROM cdr
 WHERE callerid = X
 ORDER BY call_timestamp DESC
 LIMIT 100

 then this could potentially be optimised using a minmax index, by
 traversing the data ranges in call_timestamp order. That is not part
 of the code in this initial release, since the main use case is for
 WHERE call_timestamp = X, or WHERE primarykey = Y

 I don't understand how a index on call_timestamp would help
 in the query above.

The min max index would cover call_timestamp and would be used to
optimise the query. That is not in the current version, it is a later
optimisation that I think is possible after considering your use case
and similar ones. This is a similar optimisation to the Merge Append
case for partitioned tables.

 Simon Riggs wrote
 I don't believe there is a credible business case for running that
 same query but without the ORDER BY and LIMIT, since it could
 potentially return gazillions of rows


 Gazillion of rows??? We're talking about calls made/received by
 one user here. How many calls do you make in 10 days???


 Simon Riggs wrote
  so it isn't surprising at all
 that it would access a large % of the table.

 In fact, the query I use return a fraction of the table, and only
 a very small amount of users get searched.

 Simon, you keep on talking about these minmax indexes, and
 I still don't see any reference to some performance tests.

Performance tests are only possible with a clear use case. It would be
helpful if you could benchmark your own use case and I request others
do the same. I have and will challenge people that simply assert this
new type of index is not useful based on a generic argument, with no
use case and no tests. That is nothing personal, it is simply that I
do not wish misunderstandings to block the adoption of new features.

Please see that Alvaro and I have gone out of our way to provide a new
facility to help you and others, and that it requires changing how we
think about the solution. I accept it may not provide for every case
but it requires careful analysis before deciding that is so.

 And, again, I think that random values insertion is the worst
 use case for minmax indexes.

I agree, it is. What we have disagreed on is the extent to which that
scenario exists for use cases on very large tables, which are
typically append-mostly with most queries accessing a subset of the
table, e.g. date range.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] init_sequence spill to hash table

2013-11-13 Thread Heikki Linnakangas


On 13.11.2013 11:55, David Rowley wrote:

I thought I would post the patch early to see if this is actually wanted
before I do too much more work on it.


Seems reasonable.


My implementation maintains using the linear list for sequences up to a
defined threshold (currently 32) then it moves everything over to a
hashtable and free's off the list.


Did you check how it would perform if you just always used the hash 
table? Or if you just have a single entry before you move to hash table, 
ie. set the threshold to 2? That would be slightly simpler.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [OT] why not keeping the original column name in catalog after a drop?

2013-11-13 Thread Luca Ferrari

On Wed, Nov 13, 2013 at 9:00 AM, Andres Freund and...@2ndquadrant.com wrote:

 The old name might not fit there, attribute names have a relatively low
 maximum length (64 by default), so we cannot always fit the entire old
 name there.


Thanks, I was guessing this.


 Also, think about:
 CREATE TABLE foo(cola int);
 ALTER TABLE foo DROP COLUMN cola;
 ALTER TABLE foo ADD COLUMN cola;
 ALTER TABLE foo DROP COLUMN cola; -- should not error out

Well, I was talking about appending the original column name, and
therefore the above should have been respectively  pg.dropped.1.cola.
and pg.dropped.2.cola. Of course the original name is not very much
interesting, I was just curios about the conflicts.

Luca


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] The number of character limitation of custom script on pgbench

2013-11-13 Thread Sawada Masahiko

Hi all,

The function of custom script of pgbench allows only  BUFSIZ
(i.g.,1024byte) or less as length of a SQL.
I think that  when we want to bench mark with long SQL then it will difficult.
At that time even pgbench doesn't return ERROR. It will try to do
query with the broken SQL.
And user can not know why function of custom script is not work fine.
It look as just error of SQL to user.

So I'm thinking following solution.
(1) to increase buffer size
(2) to change to variable buffer size
(3) to return ERROR with information

Thought?


Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Jeremy Harris


On 13/11/13 09:07, Leonardo Francalanci wrote:

Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.


Surely there's good correlation between IMSI  IMEI, so have a separate
table to translate one to (a group of) the others, and
halve the indexes on your main table?
--
Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] patch to fix unused variable warning on windows build

On Thu, 2013-11-07 at 19:13 +1300, David Rowley wrote:
 Attached is a small patch which fixes the unused variable warning in
 the visual studios build. Seems like VS does not
 support  __attribute__((unused)) but looks like all other places we
 must assign to the variable.
 
committed




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] Report exit code from external recovery commands properly

When an external recovery command such as restore_command or
archive_cleanup_command fails, it just reports return code 34567 or
something, but we have facilities to do decode this properly, so use
them.

From 8aa3cf503fe1c1f41a2a833c008f4273c22a86c9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut pete...@gmx.net
Date: Wed, 13 Nov 2013 06:38:18 -0500
Subject: [PATCH] Report exit code from external recovery commands properly

When an external recovery command such as restore_command or
archive_cleanup_command fails, report the exit code properly,
distinguishing signals and normal exists, using the existing
wait_result_to_str() facility, instead of just reporting the return
value from system().
---
 src/backend/access/transam/xlogarchive.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 342975c..be95684 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -300,8 +300,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc)  125;
 
 	ereport(signaled ? FATAL : DEBUG2,
-		(errmsg(could not restore file \%s\ from archive: return code %d,
-xlogfname, rc)));
+		(errmsg(could not restore file \%s\ from archive: %s,
+xlogfname, wait_result_to_str(rc;
 
 not_available:
 
@@ -410,9 +410,10 @@ ExecuteRecoveryCommand(char *command, char *commandName, bool failOnSignal)
 		ereport((signaled  failOnSignal) ? FATAL : WARNING,
 		/*--
 		   translator: First %s represents a recovery.conf parameter name like
-		  recovery_end_command, and the 2nd is the value of that parameter. */
-(errmsg(%s \%s\: return code %d, commandName,
-		command, rc)));
+		  recovery_end_command, the 2nd is the value of that parameter, the
+		  third an already translated error message. */
+(errmsg(%s \%s\: %s, commandName,
+		command, wait_result_to_str(rc;
 	}
 }
 
-- 
1.8.4.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

On 1 August 2013 01:53, Noah Misch n...@leadboat.com wrote:

 A remediation strategy that seemed attractive when I last contemplated this
 problem is to repoint rd_att immediately but arrange to free the obsolete
 TupleDesc in AtEOXact_RelationCache().

I agree that the best way to resolve this is to retain a copy of the
TupleDesc, so that copied pointers to it remain valid.

EOXact is actually longer than strictly necessary in some cases, but
trying to work out a more minimal approach seems hard and possibly
inefficient.

Comments in relcache.c indicate that the Relation swapping concept
might be replaced by refcounting approach. I can't see how that
differs from your suggested route.

Which means I can't see any other way of doing this other than the way
you suggest. Will implement.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN bugs in master branch

2013-11-13 Thread Heikki Linnakangas


On 12.11.2013 21:33, Teodor Sigaev wrote:

Suppose, some last changes in GIN are broken, 9.3 works fine.


Fixed, thanks for the report.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

Simon Riggs wrote
So in the use case you describe, the min max index would require a
scan of only 25% of the table, not the 80% described earlier for
random inserts. In my experience, people wish to keep data for much
longer periods and so the percentage of scan required would drop lower
than 25%, possibly to 5% or less for many applications.

The plan would use sequential I/O so could still work quickly; given
the low read rate, longer query times could be acceptable.

Quickly??? Seq scan 25% of a table of TB is not quick.

Simon Riggs wrote
Minmax indexes are simply a way to make this use case happen
automatically, without the need for manual partitioning of the table.

You logic assumes that we don't index anything but the call_timestamp.
That would lead to huge query response times.
Plus: btree doesn't have a big problem to keep up in sequential insertion
scenario (such as the call_timestamp). So I still don't see the gain
in using the minmax indexes: again, could you point me to some
performance tests of *any* use case?

Simon Riggs wrote
They are not the answer to every prayer, but with respect they are
better than you had claimed they would be. (25% not 80%, in your use
case). I saw this was likely to be the case and this is why I
challenged you to describe in more detail. Thank you.

I claimed they would scan the 80% of the table because I assumed I
had to use them in the random fields; not in the call_timestamp field.
I don't need a better index in the call_timestamp, because it's sequential,
I don't have problems with that. But it's useless: I don't want to seq scan
25% of a multi-TB table.

Simon Riggs wrote
Performance tests are only possible with a clear use case.

Well, so I can add my weird_index patch in postgresql source code,
and it would be committed right away??? I assumed you had
to prove somehow that the new index was better than what
is already available, at least for some cases.

Or, in other words: what are you going to write in the minmax index
documentation, try and see if they work better for you?

Simon Riggs wrote
Please see that Alvaro and I have gone out of our way to provide a new
facility to help you and others, and that it requires changing how we
think about the solution. I accept it may not provide for every case
but it requires careful analysis before deciding that is so.

If I came out too rough, I ask your pardon. I always appreciate people
taking their time to help someone else for free.

Plus, I'm *very* interested in the minmax index, especially for
call_timestamp (some queries are date-range only, such as give me
all the calls in this particular 2 secs range) or the id column I have.
But, at the same time, I don't see any evidence that they work
better than btrees (except for the size of the index).
I would like to see some numbers.

I worked a little in the bitmap index implementation, and I stopped because
on the large tables these indexes are supposed to be used, the heap
lookup took so much time that the (slightly) faster index access didn't
really help, because it was a fraction of the query time... I'm afraid
it would be the same with minmax indexes, that's why I wanted to
see some numbers...

Simon Riggs wrote
And, again, I think that random values insertion is the worst
use case for minmax indexes.

I agree, it is. What we have disagreed on is the extent to which that
scenario exists for use cases on very large tables, which are
typically append-mostly with most queries accessing a subset of the
table, e.g. date range.

Mhh... maybe this is this point we don't understand each other?

I query the table by userid + date range.
The date range is *not* selective enough (it's, as you said, 25%
of the multi-TB table). The userid is selective *a lot*.

I'm looking for a better index for the userid column(s).

The new indexes I mentioned in the OP claim they are better
in this scenario (but I don't blindly believe them)

I don't see how indexing the call_timestamp only could be
of any interest, since it would require seq-scanning 25%
of a huge table for every query.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778124.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

Jeremy Harris wrote
 Surely there's good correlation between IMSI  IMEI, so have a separate
 table to translate one to (a group of) the others, and
 halve the indexes on your main table?

Yes; unfortunately not always both are available; but it's something
we are thinking about (it requires logic in the inserting application
that at the moment doesn't exist, but it is something that we'll
have to add sooner or later).
But in the end yes, trying to use less indexed-fields is a good path.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778125.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical changeset generation v6.5

2013-11-13 Thread Steve Singer


On 11/11/2013 02:06 PM, Andres Freund wrote:

On 2013-11-10 14:45:17 -0500, Steve Singer wrote:

Not really keen - that'd be a noticeable overhead. Note that in the
cases where DEFAULT|INDEX is used, you can just use the new tuple to
extract what you need for the pkey lookup since they now have the same
format and since it's guaranteed that the relevant columns haven't
changed if oldtup is null and there's a key.

What are you actually doing with those columns? Populating a WHERE
clause?


Yup building a WHERE clause


Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The number of character limitation of custom script on pgbench

On 11/13/13, 6:18 AM, Sawada Masahiko wrote:
 Hi all,
 
 The function of custom script of pgbench allows only  BUFSIZ
 (i.g.,1024byte) or less as length of a SQL.
 I think that  when we want to bench mark with long SQL then it will difficult.
 At that time even pgbench doesn't return ERROR. It will try to do
 query with the broken SQL.
 And user can not know why function of custom script is not work fine.
 It look as just error of SQL to user.
 
 So I'm thinking following solution.
 (1) to increase buffer size
 (2) to change to variable buffer size
 (3) to return ERROR with information

I'd go for #2.  But at least an error.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The number of character limitation of custom script on pgbench

Peter Eisentraut pete...@gmx.net writes:
 On 11/13/13, 6:18 AM, Sawada Masahiko wrote:
 The function of custom script of pgbench allows only  BUFSIZ
 (i.g.,1024byte) or less as length of a SQL.
 So I'm thinking following solution.
 (1) to increase buffer size
 (2) to change to variable buffer size
 (3) to return ERROR with information

 I'd go for #2.  But at least an error.

#2 definitely.  I've run into this limitation myself recently, and
so have other people.  It's time to fix it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote:

 Or, in other words: what are you going to write in the minmax index
 documentation, try and see if they work better for you?

That seems like good advice to me. Bacon  Aristotle.

There is very little written about suitability of any index type for
Postgres. I'm sure the docs could be improved there.


 Plus, I'm *very* interested in the minmax index

Good, thanks.

 But, at the same time, I don't see any evidence that they work
 better than btrees (except for the size of the index).

Min max indexes are not designed to be a better btree, they are
designed to be roughly same as automatic partitioning. They offer
considerably improved time to build, significantly reduced index size
and significantly improved insert performance. Min max will be clearly
slower than btrees for small numbers of records, though for large
numbers of records we may expect min max to perform same as btrees,
though that requires better testing to get a more accurate picture.
There may yet be optimisations of the patch also.

Based what we have discussed here, we've come up with two new
optimisations that can be used with MinMax, namely the bulk DELETE
case and the Merge Append case. Thank you for highlighting additional
cases and requirements.

From our discussions here, IMHO there is a strong case for avoiding
btrees completely for larger historical data tables. That isn't
something I had even considered as desirable before this conversation
but ISTM now that taking that approach will be more fruitful than
attempting to implement LSM trees.

Having said that, I am also in favour of declarative partitioning,
just that there is no funding available to work on that at present.

Further work on bitmap indexes is expected. They are already designed
with good insert performance in mind and this discussion re-emphasises
that requirement.

 I would like to see some numbers.

Alvaro has given me some results for his patch. The figures I have are
for a 2GB table.

Index Build Time
MinMax 11 s
Btree   96s

Index Size
MinMax 2 pages + metapage
Btree approx 200,000 pages + metapage

Load time
MinMax no overhead, same as raw COPY
BTree - considerably slower

Index SELECT
Slower for small groups of rows
Broadly same for large requests (more results required on that assessment)

I expect to publish results against TPC-H cases in next few weeks.

Additional tests are welcome for other use cases.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation

On 11/12/13, 10:48 PM, Andrew Dunstan wrote:
 You quoted me out of context. Your prevuious para referred to
 duplicate_oids.

... which was in turn quoted out of context. ;-)

 What exactly is your argument, here?

If we change unused_oids to a Perl implementation, we will add
additional inconvenience to users who don't have /usr/bin/perl in that
exact location.  Versus allowing use by users who don't have /bin/sh.  I
don't know what the ratio between those two camps is, among potential
users of unused_oids.  But most of the discussion so far appeared to be
under the impression that unused_oids is called from a makefile, which
is not correct.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] hail the CFM

The commit fest manager mace has been passed on to me[*].  More to follow.


[*] Actually, I found it behind the dumpster in the alley.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hail the CFM

2013-11-13 Thread Robert Haas

On Wed, Nov 13, 2013 at 8:45 AM, Peter Eisentraut pete...@gmx.net wrote:
 The commit fest manager mace has been passed on to me[*].  More to follow.

 [*] Actually, I found it behind the dumpster in the alley.

ROFL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-11-13 Thread Amit Kapila

On Wed, Nov 13, 2013 at 4:05 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 01 October 2013 00:56 Amit Kapila wrote:
 On Mon, Sep 30, 2013 at 9:07 PM, Peter Eisentraut pete...@gmx.net
 wrote:
  On 9/28/13 3:05 AM, Amit Kapila wrote:
  Now as we have an agreement, I had updated patch for below left
 issues:
 
  Regression tests are failing.

Thanks for informing. I am sorry for not running regression before
 sending patch.

Reason for failure was that source for GUC in new function
 validate_conf_option() was hardcoded to PGC_S_FILE which was okay for
 Alter System, but
not for SET path. I had added new parameter source in this function
 and get the value of source when this is called from SET path.

 Some of the initial observation of the patch are,
 1. Patch is not applying against git head, needs a rebase.
 2. Patch doesn't contain the tests.
   It was intentional and as per feedback for this patch. As for
testing this feature, we need to put sleep after operation, so it was
suggested to remove tests.

 I started reviewing the patch, will share the details once I finish.
Thanks.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation



On 11/13/2013 08:36 AM, Peter Eisentraut wrote:

What exactly is your argument, here?

If we change unused_oids to a Perl implementation, we will add
additional inconvenience to users who don't have /usr/bin/perl in that
exact location.  Versus allowing use by users who don't have /bin/sh.  I
don't know what the ratio between those two camps is, among potential
users of unused_oids.



According to my count we currently have 30 instances of scripts with 
this shebang line. That includes things not called from makefiles, such 
as pgindent.


I don't recall this causing an issue.

For any small group of people that might exists who don't have 
/usr/bin/perl, they have a simple workaround to call perl unused_oids.


Can we please get on with dealing with real problems? I don't believe 
this is one.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

On 11/12/13, 1:57 AM, Tatsuo Ishii wrote:
 Currently there's no such an universal encoding in the universe, I
 think the only way is, inventing it by ourselves.

I think ISO 2022 is something in that direction, but it's not
ASCII-safe, AFAICT.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The number of character limitation of custom script on pgbench

2013-11-13 Thread Sawada Masahiko

On Wed, Nov 13, 2013 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On 11/13/13, 6:18 AM, Sawada Masahiko wrote:
 The function of custom script of pgbench allows only  BUFSIZ
 (i.g.,1024byte) or less as length of a SQL.
 So I'm thinking following solution.
 (1) to increase buffer size
 (2) to change to variable buffer size
 (3) to return ERROR with information

 I'd go for #2.  But at least an error.

 #2 definitely.  I've run into this limitation myself recently, and
 so have other people.  It's time to fix it.


Yes, I also think #2 is good.
I will implement the patch.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

Simon Riggs wrote
 From our discussions here, IMHO there is a strong case for avoiding
 btrees completely for larger historical data tables. That isn't
 something I had even considered as desirable before this conversation
 but ISTM now that taking that approach will be more fruitful than
 attempting to implement LSM trees.

Eh? I don't understand this point. How can I avoid btrees, and
searching by caller_id? I don't get it...


Simon Riggs wrote
 Alvaro has given me some results for his patch. The figures I have are
 for a 2GB table.
 
 Index Build Time
 MinMax 11 s
 Btree   96s
 
 Index Size
 MinMax 2 pages + metapage
 Btree approx 200,000 pages + metapage
 
 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

Great!!! This looks very promising. Were the values indexed
sequential?




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778150.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 11/13/2013 02:34 AM, Andrew Dunstan wrote:

 If there's agreement on taking these, I will prepare patches and submit
 them by the 15th.

 With JSON enhancement, my only concern is that there's work ongoing to
 integrate the v2 development version of hstore with json, providing
 typed hstore and an efficient binary storage format for json.

 It might be worth seeing how that work is going and what functionality
 needs to be added to it, rather than enhancing the existing json support
 that may soon change dramatically.

I'm not so sure we should require hstore to do things like build
arbitrary json objects even though I agree that hstore will probably
displace json for must cases where you want to store nested data (as
opposed to (de-)serialize).  Andrew's patches just fill out a couple
of missing cases that are handled in the existing API.   Putting all
the patches together, ISTM there might be a function or two too many.
I'm not sure why the json_  prefix was abandoned for build_json_object
and build_json_array.

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format, and I don't agree with the
statement but it is the caller's reponsibility to ensure that keys
are not repeated..  I think the caller should have no such
responsibility.  Keys should be able to repeated.  Also, I'm not sure
how the {k,v,k,v,k,v}...convention serialized into a string is very
useful in general practice.  I greatly prefer the aggregation and the
variadic methods in json_build.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote:
 I would like to see some numbers.

 Alvaro has given me some results for his patch. The figures I have are
 for a 2GB table.

 Index Build Time
 MinMax 11 s
 Btree   96s

 Index Size
 MinMax 2 pages + metapage
 Btree approx 200,000 pages + metapage

 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

 Index SELECT
 Slower for small groups of rows
 Broadly same for large requests (more results required on that assessment)

 I expect to publish results against TPC-H cases in next few weeks.

 Additional tests are welcome for other use cases.

Those are pretty exciting numbers.  These days for analytics work I'm
using mostly covering index type approaches.  I bet the tiny index
would more than offset the extra heap accesses.   Can you CLUSTER
against a minmax index?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote:
 I would like to see some numbers.

 Alvaro has given me some results for his patch. The figures I have are
 for a 2GB table.

 Index Build Time
 MinMax 11 s
 Btree   96s

 Index Size
 MinMax 2 pages + metapage
 Btree approx 200,000 pages + metapage

 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

 Index SELECT
 Slower for small groups of rows
 Broadly same for large requests (more results required on that assessment)

 I expect to publish results against TPC-H cases in next few weeks.

 Additional tests are welcome for other use cases.

 Those are pretty exciting numbers.  These days for analytics work I'm
 using mostly covering index type approaches.

If you're using index only scans then this will work for you as well,
hopefully better. Same principle wrt all visible page ranges.

 I bet the tiny index
 would more than offset the extra heap accesses.

That's the trade-off, yes. I was hoping that would lead to cases where
the min max is better than a btree, but not there yet.

 Can you CLUSTER
 against a minmax index?

Not in this release, at least in my understanding. It's not yet
possible to do an ordered fetch, so the cluster scan probably won't
work.

I was hoping to include some special Freespace Map modes that would help there.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote:

 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

And just as a general comment, the min max index does not slow down
COPY as the table gets larger, whereas the btree gets slower as the
table gets larger. Which is the reason Leonardo requires partitioned
tables.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Idea for debug/recovery snapshots

2013-11-13 Thread Bruce Momjian

On Tue, Nov 12, 2013 at 08:39:35PM -0500, Bruce Momjian wrote:
  My specific wording is incoherent mostly because it really belongs to a
  larger corpus that currently exists only in my head.
 
 Oh, OK, it sounds fine.  The user really doesn't choose what timeline to
 see --- rather, it is the current xid at the time they take their
 snapshot and other running xids that controls that.  You can control
 your transaction isolation level, but that only controls how often you
 take snapshots.

[ moved to hackers ]

Actually, it would be possible to allow snapshots that are equal to the
earliest active snapshot for the current database.  I don't think it
would be useful in production, but perhaps for debugging or some
disaster recovery.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add \i option to bring in the specified file as a quoted literal

2013-11-13 Thread Bruce Momjian

On Wed, Nov 13, 2013 at 08:58:07AM +0530, Amit Kapila wrote:
 On Tue, Nov 12, 2013 at 9:37 PM, Bruce Momjian br...@momjian.us wrote:
  On Wed, Oct 23, 2013 at 10:31:39AM +0530, Amit Kapila wrote:
  On Tue, Oct 22, 2013 at 3:04 AM, Piotr Marcinczyk pmarc...@gmail.com 
  wrote:
   Hi,
  
   I would like to implement item from TODO marked as easy: Add \i option
   to bring in the specified file as a quoted literal. I understand intent
   of this item, to be able to have parts of query written in separate
   files (now it is impossible, because \i tries to execute content of file
   as a separate command by function process_file).
 
  For the usecase discussed in the mail chain of that TODO item, Robert
  Haas has provided an alternative to achieve it, please check below
  link:
  http://www.postgresql.org/message-id/AANLkTi=7c8xfyf7uqw0y+si8ondkoy2nx8jc4bu0g...@mail.gmail.com
 
  If you think that alternative is not sufficient for the use case, then
  adding new option/syntax is worth, otherwise it might be a shortcut or
  other form of some existing way which can be useful depending on how
  frequently users use this syntax.
 
  So, can we remove this TODO item?
   TODO item is created before Robert Haas has provided an alternative
 way to achieve the same thing. In some cases there are multiple ways
 to
   achieve the same thing (example: shortcut options in psql) if it is
 used quite frequently and people want some easy way of doing it. In
 this case I
   don't think this is used frequently, so I don't see the need of
 doing it. We should remove this TODO item.

OK, removed.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_upgrade misreports full disk

When pg_upgrade encounters a full disk while copying relation files,
it reports this as:

error while copying relation xyz (...): Success

because it doesn't set errno in some error cases.  In other places we
treat short writes as ENOSPC, so here is a patch to do that for
pg_upgrade as well.
diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
index dfeb79f..b35034b 100644
--- a/contrib/pg_upgrade/file.c
+++ b/contrib/pg_upgrade/file.c
@@ -136,16 +136,22 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
 	int			save_errno = 0;
 
 	if ((srcfile == NULL) || (dstfile == NULL))
+	{
+		errno = EINVAL;
 		return -1;
+	}
 
 	if ((src_fd = open(srcfile, O_RDONLY, 0))  0)
 		return -1;
 
 	if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | (force ? 0 : O_EXCL), S_IRUSR | S_IWUSR))  0)
 	{
+		save_errno = errno;
+
 		if (src_fd != 0)
 			close(src_fd);
 
+		errno = save_errno;
 		return -1;
 	}
 
@@ -170,6 +176,8 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
 
 		if (write(dest_fd, buffer, nbytes) != nbytes)
 		{
+			if (errno == 0)
+errno = ENOSPC;
 			save_errno = errno;
 			ret = -1;
 			break;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

Simon Riggs wrote
 Can you CLUSTER
 against a minmax index?
 
 Not in this release, at least in my understanding. It's not yet
 possible to do an ordered fetch, so the cluster scan probably won't
 work.

As per the patch I helped writing, CLUSTER should use the
sequential heap scan+sort when it makes sense.
So I think that if the index is not able to do an ordered fetch, 
CLUSTER should fall back to scan+sort automatically (which is
what you want in a large table anyway).

Obviously, that should be tested.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778171.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality



On 11/13/2013 09:45 AM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer cr...@2ndquadrant.com wrote:

On 11/13/2013 02:34 AM, Andrew Dunstan wrote:

If there's agreement on taking these, I will prepare patches and submit
them by the 15th.

With JSON enhancement, my only concern is that there's work ongoing to
integrate the v2 development version of hstore with json, providing
typed hstore and an efficient binary storage format for json.

It might be worth seeing how that work is going and what functionality
needs to be added to it, rather than enhancing the existing json support
that may soon change dramatically.


I'm going to be fairly upset if I'm told I have to wait for the new 
format work, and then I'm later told it's too late to bring this into 9.4.


I think these are really orthogonal issues. Adding a new serialization 
format (which I have been discussing with Oleg and Teodor, and which I 
hope to help in bringing to JSON) will make some things lots faster than 
they now are, and might make some things easier or possible where now 
they are hard or impossible, but it won't remove any functionality 
requirement.


In particular, json_build, which lets you build up arbitrarily complex 
and irregular json in a way that's just not possible without using a PL 
right now, is quite an important requirement. I've given talks about it 
and JSON users have been quite excited by the possibilities it opens up.


The patch for json_to_record is quite small (two functions), and it has 
the advantage that unlike the json_populate_record functions you don't 
need to have or create a named type to use it. I think that makes it 
worth having in itself.



I'm not so sure we should require hstore to do things like build
arbitrary json objects even though I agree that hstore will probably
displace json for must cases where you want to store nested data (as
opposed to (de-)serialize).


I have no idea what this means.

The plan with the work that Oleg and Teodor are doing is to provide a 
set of common code that can be used by either a binary json 
representation (which will be able to be distinguished from a text 
representation, so there would be no pg_upgrade problems) or nested 
hstore. In effect, nested hstore and json would have pretty much 
identical capabilities, so using one ovber another should be largely a 
matter of preference than a forced choice. Frankly, I believe the 
audience for JSON is vastly larger, and I expect it to be the treeish 
data format of choice for almost all users.




Andrew's patches just fill out a couple
of missing cases that are handled in the existing API.   Putting all
the patches together, ISTM there might be a function or two too many.
I'm not sure why the json_  prefix was abandoned for build_json_object
and build_json_array.



I'm quite happy to change it.



Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format,



What? there is no new serialization format. This is a way to generate a 
json object in the existing format from a one or two dimensional array 
of text. c.f. |existing function hstore(text[]) = hstore|



and I don't agree with the
statement but it is the caller's reponsibility to ensure that keys
are not repeated..  I think the caller should have no such
responsibility.  Keys should be able to repeated.



They can be repeated, as they can in the current json text format. 
However, the function makes no attempt to deal with repeated keys. If a 
key is repeated in the inout it will be repeated in the output. In this 
respect it differs from the hstore function.


Note too, that one effect of moving to a non-text representation of json 
will be that duplicated keys will be resolved (last value will win). But 
that's a much wider issue that this function.




Also, I'm not sure
how the {k,v,k,v,k,v}...convention serialized into a string is very
useful in general practice.  I greatly prefer the aggregation and the
variadic methods in json_build.



The extension was built before json_build. But it met a requirement that 
existed at the time. It probably wouldn't be a tragedy to leave it out, 
but there is probably a place for it just as there is for the hstore 
function.




Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.


The third point seems to conflict with the first. I'd only consider that 
if we *do* add the one-array version of json_object.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clang 3.3 Analyzer Results

2013-11-13 Thread Kevin Grittner

Tom Lane t...@sss.pgh.pa.us wrote:

 No, this isn't about test code vs production, it's about not bothering
 to free memory explicitly when a program is about to terminate.  Alvaro
 is suggesting that the proposed addition to pg_regress.c is just a waste
 of cycles.  IMO it's not that big a deal either way in this case, since
 it's just one line of code that isn't going to take too long.

Right.  IMV, it's easier in this case to silence the warnings for
all future static code analysis runs, by this tool or any other, by
fixing it rather than having this particular triviality resurface
to annoy anyone in the future.

Fix pushed to the master branch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] alter_table regression test problem

2013-11-13 Thread Robert Haas

On Mon, Nov 11, 2013 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote:
I'm pretty sure that the current coding, which blows away the whole
relation, is used in other places, and I really don't see why it
should be fundamentally flawed, or why we should change it to clear
the cache entries out one by one instead of en masse.
RelidByRelfilenode definitely needs to use HASH_FIND rather than
HASH_ENTER, so that part I agree with.

 It surely is possible to go that route, but imagine what happens if the 
 heap_open() blows away the entire hash. We'd either need to recheck if the 
 hash exists before entering or recreate it after dropping. It seemed simpler 
 to follow attoptcache's example.

I'm not sure if this is the best way forward, but I don't feel like
arguing about it, either, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ERROR during end-of-xact/FATAL

2013-11-13 Thread Noah Misch

On Tue, Nov 12, 2013 at 09:55:34AM -0500, Robert Haas wrote:
 On Fri, Nov 8, 2013 at 4:13 PM, Noah Misch n...@leadboat.com wrote:
  A PANIC will reinitialize everything relevant, largely resolving the 
  problems
  around ERROR during FATAL.  It's a heavy-handed solution, but it may well be
  the best solution.  Efforts to harden CommitTransaction() and
  AbortTransaction() seem well-spent, but the additional effort to make FATAL
  exit cope where AbortTransaction() or another exit action could not cope 
  seems
  to be slicing ever-smaller portions of additional robustness.
 
  I pondered a variant of that conclusion that distinguished critical cleanup
  needs from the rest.  Each shared resource (heavyweight locks, buffer pins,
  LWLocks) would have an on_shmem_exit() callback that cleans up the resource
  under a critical section.  (AtProcExit_Buffers() used to fill such a role, 
  but
  resowner.c's work during AbortTransaction() has mostly supplanted it.)  The
  ShutdownPostgres callback would not use a critical section, so lesser 
  failures
  in AbortTransaction() would not upgrade to a PANIC.  But I'm leaning against
  such a complication on the grounds that it would add seldom-tested code 
  paths
  posing as much a chance of eroding robustness as bolstering it.
 
 The current situation is just plain weird: in the ERROR-then-ERROR
 case, we emit a WARNING and bounce right back into AbortTransaction(),
 and if it doesn't work any better the second time than the first time,
 we recurse again, and eventually if it fails enough times in a row, we
 just give up and PANIC.  But in the ERROR-then-FATAL case, we *don't*
 retry AbortTransaction(); instead, we just continue running the rest
 of the on_shmem_exit callbacks and then exit.
 
 So, in short, ERROR + ERROR*10 = PANIC, but FATAL + ERROR*10 = FATAL.
 That's bizarre.

Quite so.

 Given that that's where we are, promoting an ERROR during FATAL
 processing to PANIC doesn't seem like it's losing much; we're
 essentially already doing that in the (probably more likely) case of a
 persistent ERROR during ERROR processing.  But since PANIC sucks, I'd
 rather go the other direction: let's make an ERROR during ERROR
 processing promote to FATAL.  And then let's do what you write above:
 make sure that there's a separate on-shmem-exit callback for each
 critical shared memory resource and that we call of those during FATAL
 processing.

Many of the factors that can cause AbortTransaction() to fail can also cause
CommitTransaction() to fail, and those would still PANIC if the transaction
had an xid.  How practical might it be to also escape from an error during
CommitTransaction() with a FATAL instead of PANIC?  There's more to fix up in
that case (sinval, NOTIFY), but it may be within reach.  If such a technique
can only reasonably fix abort, though, I have doubts it buys us enough.

 It seems to me that that's how things were originally designed to
 work, but that we've drifted away from it basically because the
 low-level callbacks to release heavyweight locks and buffer pins
 turned out to be kinda, uh, slow, and we thought those code paths
 couldn't be taken anyway (turns out they can).  I think we could
 either make those routines very fast, or arrange only to run that code
 at all in the case where AbortTransaction() didn't complete
 successfully.

Agreed; the performance hazards look tractable.

 It's true that such code will be rarely run, but the
 logic is simple enough that I think we can verify it by hand, and it's
 sure nice to avoid PANICs.

True.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical changeset generation v6.6

2013-11-13 Thread Andres Freund

On 2013-11-12 19:24:39 +0100, Andres Freund wrote:
 On 2013-11-12 13:18:19 -0500, Robert Haas wrote:
  On Tue, Nov 12, 2013 at 12:50 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
   Completely agreed. As evidenced by the fact that the current change
   doesn't update all relevant comments  code. I wonder if we shouldn't
   leave the function the current way and just add a new function for the
   new behaviour.
   The hard thing with that would be coming up with a new
   name. IsSystemRelationId() having a different behaviour than
   IsSystemRelation() seems strange to me, so just keeping that and
   adapting the callers seems wrong to me.
   IsInternalRelation()? IsCatalogRelation()?
  
  Well, I went through and looked at the places that were affected by
  this and I tend to think that most places will be happier with the new
  definition.
 
 I agree that many if not most want the new definition.
 
  If there are call sites that want the existing test, maybe we should
  have IsRelationInSystemNamespace() for that, and reserve
  IsSystemRelation() for the test as to whether it's a bona fide system
  catalog.
 
 The big reason that I think we do not want the new behaviour for all is:
 
  *NB: TOAST relations are considered system relations by this test
  *for compatibility with the old IsSystemRelationName function.
  *This is appropriate in many places but not all.  Where it's not,
  *also check IsToastRelation.
 
 the current state of things would allow to modify toast relations in
 some places :/

So, I think I found a useful defintion of IsSystemRelation() that fixes
many of the issues with moving relations to pg_catalog: Continue to
treat all pg_toast.* relations as system tables, but only consider
initdb created relations in pg_class.
I've then added IsCatalogRelation() which has a narrower definition of
system relations, namely, it only counts toast tables if they are a
catalog's toast table.

This allows far more actions on user defined relations moved to
pg_catalog. Now they aren't stuck there anymore and can be renamed,
dropped et al. With one curious exception: We still cannot move a
relation out of pg_catalog.
I've included a hunk to allow creation of indexes on relations in
pg_catalog in heap_create(), indexes on catalog relations are prevented
way above, but maybe that should rather be a separate commit.

What do you think?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 91a22bd7fb998b609e73a69b941999596ce4569f Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 13 Nov 2013 16:15:16 +0100
Subject: [PATCH] Don't regard user defined relations in pg_catalog as system
 tables.

This allows for a more consistent behaviour of user defined relations
in pg_catalog. As before it's not possible to create relations in
pg_catalog, but now they can, after being moved there, manipulated
normally. Before it was impossible to move or drop user defined
relations in pg_catalog, now that's allowed.

To that end, modify IsSystemRelation/Class() to not regard user
defined tables in pg_catalog as system relations and add
IsCatalogRelation/Class() which don't regard toast relations as
catalog relation unless they are a catalog relation's toast table.

This is also preparation for logical decoding which needs
IsCatalogRelation().
---
 src/backend/access/heap/heapam.c  |  2 +-
 src/backend/catalog/aclchk.c  |  2 +-
 src/backend/catalog/catalog.c | 69 ++-
 src/backend/catalog/heap.c| 11 -
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  |  5 ++-
 src/backend/commands/tablecmds.c  |  8 ++--
 src/backend/commands/trigger.c|  2 +-
 src/backend/optimizer/util/plancat.c  |  2 +-
 src/backend/rewrite/rewriteDefine.c   |  4 +-
 src/backend/tcop/utility.c|  2 +-
 src/include/catalog/catalog.h |  4 +-
 src/test/regress/expected/alter_table.out | 34 +++
 src/test/regress/sql/alter_table.sql  | 28 +
 14 files changed, 148 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a21f31b..bbb71c7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2465,7 +2465,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	 * because the heaptuples data structure is all in local memory, not in
 	 * the shared buffer.
 	 */
-	if (IsSystemRelation(relation))
+	if (IsCatalogRelation(relation))
 	{
 		for (i = 0; i  ntuples; i++)
 			CacheInvalidateHeapTuple(relation, heaptuples[i], NULL);
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 06aa766..5a46fd9 100644
--- a/src/backend/catalog/aclchk.c
+++

Re: [HACKERS] Re: Exempting superuser from row-security isn't enough. Run predicates as DEFINER?

2013-11-13 Thread Robert Haas

On Mon, Nov 11, 2013 at 11:10 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 11/11/2013 06:37 PM, Kohei KaiGai wrote:
 I'd like to vote the last options. It is a separate problem (or, might
 be specification), I think.

 I tend to agree, but I'm nervous about entirely hand-waving around this,
 as doing so would *expand* the existing problem.

Suppose we define a new GUC, allow_row_level_security, which defaults
to true.  When set to false, any attempt to access a table protected
with RLS will either (1) bypass RLS, if you have sufficient privileges
to do that (presumably table owner and superuser, at least, would be
sufficient) or (2) fail with an error if RLS cannot be bypassed.  But,
when allow_row_level_security is false, *under no circumstances* will
we evaluate RLS quals - it's bypass-or-error.

Then, we can teach pg_dump to set allow_row_level_security = false on
server versions = 9.4, with an option --allow-row-level-security that
bypasses this behavior.

With these changes, pg_dump is safe by default, not only against
hijacking attacks but against accidentally failing to dump all the
data because you fail to realize that you're subject to an RLS qual.
You'll either get a clean, restorable dump, or you'll fail with an
easy-to-understand error.  In the latter case, if you want to try to
back up that portion of the table you can access, there's an option
for that behavior, which can be documented to imply trust in the table
owner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

On Wed, Nov 13, 2013 at 9:32 AM, Andrew Dunstan and...@dunslane.net wrote:
 I'm not so sure we should require hstore to do things like build
 arbitrary json objects even though I agree that hstore will probably
 displace json for must cases where you want to store nested data (as
 opposed to (de-)serialize).

 I have no idea what this means.

What I'm saying there is I agree: what's going with hstore is not
relevant here.  json features currently provide standalone
serialization and deserialization minus a couple of edge cases that
you are fixing up here.  Hstore will emerge as a json manipulation
engine.

Aside: I thought we blew it (as you know) by not unifying the hstore
and json APIs in the 9.2 cycle and now with the emerging json stuff I
really think so...it's way to late to do anything about it now even if
there was consensus on that point.

 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format,

 What? there is no new serialization format. This is a way to generate a json
 object in the existing format from a one or two dimensional array of text.
 c.f. |existing function hstore(text[]) = hstore|

Right --  I thought it took text, not text[] -- withdrawn.  I consider
a variant taking (text[], text[]) to be generally more practical than
the one argument version (either 1d or 2d variant).  Dealing with 2d
arrays is a headache unless you really know what you're doing.

 and I don't agree with the
 statement but it is the caller's reponsibility to ensure that keys
 are not repeated..  I think the caller should have no such
 responsibility.  Keys should be able to repeated.

 They can be repeated, as they can in the current json text format. However,
 the function makes no attempt to deal with repeated keys. If a key is
 repeated in the inout it will be repeated in the output. In this respect it
 differs from the hstore function.

Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

 Note too, that one effect of moving to a non-text representation of json
 will be that duplicated keys will be resolved (last value will win). But
 that's a much wider issue that this function.

Right, exactly.  And I think this a pretty serious problem with 'non
text json' unless there is a relatively robust and fast process to
recompose the json properly for serialization purposes (but that's
mostly off topic for your proposed patch).

 Also, I'm not sure
 how the {k,v,k,v,k,v}...convention serialized into a string is very
 useful in general practice.  I greatly prefer the aggregation and the
 variadic methods in json_build.

 The extension was built before json_build. But it met a requirement that
 existed at the time. It probably wouldn't be a tragedy to leave it out, but
 there is probably a place for it just as there is for the hstore function.

 Putting it all together, I'd consider:
 *) dropping json_object (although maybe there is a case I'm not thinking
 about)
 *) changing json_build function names to get the json prefix
 *) adding a json object constructor that takes two parallel arrays as
 arguments.

 The third point seems to conflict with the first. I'd only consider that if
 we *do* add the one-array version of json_object.

ok, agreed.  so now I'm just saying to unify function names over json
prefix and maybe add text[], text[] variant for the object builder, or
maybe just drop json_object completely.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-11-13 Thread J Smith

Looks like we got another set of errors overnight. Here's the log file
from the errors. (Log file scrubbed slightly to remove private data,
but still representative of the problem I believe.)

Nov 13 05:34:34 dev postgres[6084]: [4-1] user=dev,db=dev ERROR:
could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6084]: [4-2] user=dev,db=dev DETAIL:
Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6084]: [4-3] user=dev,db=dev CONTEXT:
SQL statement SELECT 1 FROM ONLY typhon.collection_batches x
WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6084]: [4-4] user=dev,db=dev STATEMENT:
update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6076]: [4-1] user=dev,db=dev ERROR:
could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6076]: [4-2] user=dev,db=dev DETAIL:
Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6076]: [4-3] user=dev,db=dev CONTEXT:
SQL statement SELECT 1 FROM ONLY typhon.collection_batches x
WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6076]: [4-4] user=dev,db=dev STATEMENT:
update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6087]: [4-1] user=dev,db=dev ERROR:
could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6087]: [4-2] user=dev,db=dev DETAIL:
Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6087]: [4-3] user=dev,db=dev CONTEXT:
SQL statement SELECT 1 FROM ONLY typhon.collection_batches x
WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6087]: [4-4] user=dev,db=dev STATEMENT:
update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6086]: [4-1] user=dev,db=dev ERROR:
could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6086]: [4-2] user=dev,db=dev DETAIL:
Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6086]: [4-3] user=dev,db=dev CONTEXT:
SQL statement SELECT 1 FROM ONLY typhon.collection_batches x
WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6086]: [4-4] user=dev,db=dev STATEMENT:
update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6088]: [4-1] user=dev,db=dev ERROR:
could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6088]: [4-2] user=dev,db=dev DETAIL:
Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6088]: [4-3] user=dev,db=dev CONTEXT:
SQL statement SELECT 1 FROM ONLY typhon.collection_batches x
WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6088]: [4-4] user=dev,db=dev STATEMENT:
update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6085]: [4-1] user=dev,db=dev ERROR:
could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6085]: [4-2] user=dev,db=dev DETAIL:
Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6085]: [4-3] user=dev,db=dev CONTEXT:
SQL statement SELECT 1 FROM ONLY typhon.collection_batches x
WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6085]: [4-4] user=dev,db=dev STATEMENT:
update listings set deactivated_at=$1 where id=$2 and lock_version=$3

Several processes all seemed to hit the problem at the same moment,
and all of them refer to the same transaction ID. Again, the file
pg_subtrans/0060 doesn't exist, and the only file that does exist is
pg_subtrans/005A which appears to be a zeroed-out file 245760 bytes in
length.

Still don't have a clue as to how I can reproduce the problem. It
seems that in all cases the error occurred during either an UPDATE to
a table_X or an INSERT to table_Y. In all cases, the error occurred in
a manner identical to those shown in the log above, the only
difference being either an UPDATE on table_X or an INSERT on table_Y.

Not sure what direction I should head to now. Perhaps some aggressive
logging would help, so we can see the queries surrounding the
problems? I could reconfigure things to capture all statements and set
up monit or something to send an alert when the problem resurfaces,
for instance.

Cheers all.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clang 3.3 Analyzer Results

Kevin Grittner kgri...@ymail.com writes:
 If nobody objects, I'll fix that small memory leak in the
 regression test driver.  Hopefully someone more familiar with
 pg_basebackup will fix the double-free (and related problems
 mentioned by Tom) in streamutil.c.

Here's a less convoluted (IMHO) approach to the password management logic
in streamutil.c.  One thing I really didn't care for about the existing
coding is that the loop-for-password included all the rest of the
function, even though there's no intention to retry for any purpose except
collecting a password.  So I moved up the bottom of the loop.  For ease of
review, I've not reindented the code below the new loop bottom, but would
do so before committing.

Any objections to this version?

regards, tom lane

diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 1dfb80f..c89fca9 100644
*** a/src/bin/pg_basebackup/streamutil.c
--- b/src/bin/pg_basebackup/streamutil.c
*** GetConnection(void)
*** 40,47 
  	int			i;
  	const char **keywords;
  	const char **values;
- 	char	   *password = NULL;
  	const char *tmpparam;
  	PQconninfoOption *conn_opts = NULL;
  	PQconninfoOption *conn_opt;
  	char	   *err_msg = NULL;
--- 40,47 
  	int			i;
  	const char **keywords;
  	const char **values;
  	const char *tmpparam;
+ 	bool		need_password;
  	PQconninfoOption *conn_opts = NULL;
  	PQconninfoOption *conn_opt;
  	char	   *err_msg = NULL;
*** GetConnection(void)
*** 114,140 
  		i++;
  	}
  
  	while (true)
  	{
! 		if (password)
! 			free(password);
  
  		if (dbpassword)
  		{
- 			/*
- 			 * We've saved a password when a previous connection succeeded,
- 			 * meaning this is the call for a second session to the same
- 			 * database, so just forcibly reuse that password.
- 			 */
  			keywords[i] = password;
  			values[i] = dbpassword;
- 			dbgetpassword = -1; /* Don't try again if this fails */
  		}
! 		else if (dbgetpassword == 1)
  		{
! 			password = simple_prompt(_(Password: ), 100, false);
! 			keywords[i] = password;
! 			values[i] = password;
  		}
  
  		tmpconn = PQconnectdbParams(keywords, values, true);
--- 114,143 
  		i++;
  	}
  
+ 	/* If -W was given, force prompt for password, but only the first time */
+ 	need_password = (dbgetpassword == 1  dbpassword == NULL);
+ 
  	while (true)
  	{
! 		/* Get a new password if appropriate */
! 		if (need_password)
! 		{
! 			if (dbpassword)
! free(dbpassword);
! 			dbpassword = simple_prompt(_(Password: ), 100, false);
! 			need_password = false;
! 		}
  
+ 		/* Use (or reuse, on a subsequent connection) password if we have it */
  		if (dbpassword)
  		{
  			keywords[i] = password;
  			values[i] = dbpassword;
  		}
! 		else
  		{
! 			keywords[i] = NULL;
! 			values[i] = NULL;
  		}
  
  		tmpconn = PQconnectdbParams(keywords, values, true);
*** GetConnection(void)
*** 150,163 
  			exit(1);
  		}
  
  		if (PQstatus(tmpconn) == CONNECTION_BAD 
  			PQconnectionNeedsPassword(tmpconn) 
  			dbgetpassword != -1)
  		{
- 			dbgetpassword = 1;	/* ask for password next time */
  			PQfinish(tmpconn);
! 			continue;
  		}
  
  		if (PQstatus(tmpconn) != CONNECTION_OK)
  		{
--- 153,169 
  			exit(1);
  		}
  
+ 		/* If we need a password and -w wasn't given, loop back and get one */
  		if (PQstatus(tmpconn) == CONNECTION_BAD 
  			PQconnectionNeedsPassword(tmpconn) 
  			dbgetpassword != -1)
  		{
  			PQfinish(tmpconn);
! 			need_password = true;
  		}
+ 		else
+ 			break;
+ 	}
  
  		if (PQstatus(tmpconn) != CONNECTION_OK)
  		{
*** GetConnection(void)
*** 204,212 
  			exit(1);
  		}
  
- 		/* Store the password for next run */
- 		if (password)
- 			dbpassword = password;
  		return tmpconn;
- 	}
  }
--- 210,214 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] pg_basebackup: progress report max once per second

2013-11-13 Thread Mika Eloranta

Prevent excessive progress reporting that can grow to gigabytes
of output with large databases.
---
 src/bin/pg_basebackup/pg_basebackup.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
index a1e12a8..90c4683 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -45,6 +45,7 @@ bool  streamwal = false;
 bool   fastcheckpoint = false;
 bool   writerecoveryconf = false;
 intstandby_message_timeout = 10 * 1000;/* 10 
sec = default */
+intlast_progress_report = 0;
 
 /* Progress counters */
 static uint64 totalsize;
@@ -74,7 +75,7 @@ static PQExpBuffer recoveryconfcontents = NULL;
 /* Function headers */
 static void usage(void);
 static void verify_dir_is_empty_or_create(char *dirname);
-static void progress_report(int tablespacenum, const char *filename);
+static void progress_report(int tablespacenum, const char *filename, int 
force);
 
 static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
 static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
@@ -399,12 +400,15 @@ verify_dir_is_empty_or_create(char *dirname)
  * is enabled, also print the current file name.
  */
 static void
-progress_report(int tablespacenum, const char *filename)
+progress_report(int tablespacenum, const char *filename, int force)
 {
int percent = (int) ((totaldone / 1024) * 100 / 
totalsize);
chartotaldone_str[32];
chartotalsize_str[32];
 
+   if(!showprogress || (time(NULL) == last_progress_report  !force)) 
return; /* Max once per second */
+   last_progress_report = time(NULL);
+
/*
 * Avoid overflowing past 100% or the full size. This may make the total
 * size number change as we approach the end of the backup (the estimate
@@ -850,9 +854,9 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
}
}
totaldone += r;
-   if (showprogress)
-   progress_report(rownum, filename);
+   progress_report(rownum, filename, 0);
}   /* while (1) */
+   progress_report(rownum, filename, 1);
 
if (copybuf != NULL)
PQfreemem(copybuf);
@@ -1073,8 +1077,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int 
rownum)
disconnect_and_exit(1);
}
totaldone += r;
-   if (showprogress)
-   progress_report(rownum, filename);
+   progress_report(rownum, filename, 0);
 
current_len_left -= r;
if (current_len_left == 0  current_padding == 0)
@@ -1090,6 +1093,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int 
rownum)
}
}   /* continuing 
data in existing file */
}   /* loop over 
all data blocks */
+   progress_report(rownum, filename, 1);
 
if (file != NULL)
{
@@ -1450,8 +1454,7 @@ BaseBackup(void)
tablespacecount = PQntuples(res);
for (i = 0; i  PQntuples(res); i++)
{
-   if (showprogress)
-   totalsize += atol(PQgetvalue(res, i, 2));
+   totalsize += atol(PQgetvalue(res, i, 2));
 
/*
 * Verify tablespace directories are empty. Don't bother with 
the
@@ -1498,7 +1501,7 @@ BaseBackup(void)
 
if (showprogress)
{
-   progress_report(PQntuples(res), NULL);
+   progress_report(PQntuples(res), NULL, 1);
fprintf(stderr, \n);  /* Need to move to next line */
}
PQclear(res);
-- 
1.8.4.2



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality



On 11/13/2013 11:37 AM, Merlin Moncure wrote:


Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.


I think you're probably going to lose any argument that says we should 
necessarily preserve key order (and possibly key duplication) in 
objects. The standard doesn't support such a contention, either:


   An object is an unordered collection of zero or more name/value
   pairs

   ...

   The names within an object SHOULD be unique.


Forcing us to preserve order and key duplication would be a pretty 
effective barrier to any performance improvements.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] nested hstore patch

On 11/12/13, 1:35 PM, Teodor Sigaev wrote:
 Attatched patch adds nesting feature, types (string, boll and numeric
 values), arrays and scalar to hstore type.

Could you check your email client for next time?  It's sending
Content-Type: application/x-tar for a *.patch.gz file.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transaction-lifespan memory leak with plpgsql DO blocks

Robert Haas robertmh...@gmail.com writes:
 On Tue, Nov 12, 2013 at 11:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Or we could say what the heck are you doing executing tens of
 thousands of DO blocks?  Make it into a real live function;
 you'll save a lot of cycles on parsing costs.  I'm not sure that
 this is a usage pattern we ought to be optimizing for.

 I'm not volunteering to spend time fixing this, but I disagree with
 the premise that it isn't worth fixing, because I think it's a POLA
 violation.

Yeah, I'm not terribly comfortable with letting it go either.  Attached
is a proposed patch.  I couldn't see any nice way to do it without adding
a field to PLpgSQL_execstate, so this isn't a feasible solution for
back-patching (it'd break the plpgsql debugger).  However, given the
infrequency of complaints, I think fixing it in 9.4 and up is good enough.

I checked that this eliminates the memory leak using this test case:

do $outer$
begin
  for i in 1..100 loop
execute $e$
  do $$
  declare x int = 0;
  begin
x := x + 1;
  end;
  $$;
$e$;
  end loop;
end;
$outer$;

which eats a couple GB in HEAD and nothing with the patch.
The run time seems to be the same or a bit less, too.

Any objections to applying this to HEAD?

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index bc31fe9..76da842 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*** typedef struct
*** 50,81 
  	bool	   *freevals;		/* which arguments are pfree-able */
  } PreparedParamsData;
  
! /*
!  * All plpgsql function executions within a single transaction share the same
!  * executor EState for evaluating simple expressions.  Each function call
!  * creates its own eval_econtext ExprContext within this estate for
!  * per-evaluation workspace.  eval_econtext is freed at normal function exit,
!  * and the EState is freed at transaction end (in case of error, we assume
!  * that the abort mechanisms clean it all up).	Furthermore, any exception
!  * block within a function has to have its own eval_econtext separate from
!  * the containing function's, so that we can clean up ExprContext callbacks
!  * properly at subtransaction exit.  We maintain a stack that tracks the
!  * individual econtexts so that we can clean up correctly at subxact exit.
!  *
!  * This arrangement is a bit tedious to maintain, but it's worth the trouble
!  * so that we don't have to re-prepare simple expressions on each trip through
!  * a function.	(We assume the case to optimize is many repetitions of a
!  * function within a transaction.)
!  */
! typedef struct SimpleEcontextStackEntry
! {
! 	ExprContext *stack_econtext;	/* a stacked econtext */
! 	SubTransactionId xact_subxid;		/* ID for current subxact */
! 	struct SimpleEcontextStackEntry *next;		/* next stack entry up */
! } SimpleEcontextStackEntry;
! 
! static EState *simple_eval_estate = NULL;
! static SimpleEcontextStackEntry *simple_econtext_stack = NULL;
  
  /
   * Local function forward declarations
--- 50,57 
  	bool	   *freevals;		/* which arguments are pfree-able */
  } PreparedParamsData;
  
! /* Shared simple-expression eval context for regular plpgsql functions */
! static SimpleEvalContext simple_eval_context = {NULL, NULL};
  
  /
   * Local function forward declarations
*** static int exec_stmt_dynfors(PLpgSQL_exe
*** 136,142 
  
  static void plpgsql_estate_setup(PLpgSQL_execstate *estate,
  	 PLpgSQL_function *func,
! 	 ReturnSetInfo *rsi);
  static void exec_eval_cleanup(PLpgSQL_execstate *estate);
  
  static void exec_prepare_plan(PLpgSQL_execstate *estate,
--- 112,119 
  
  static void plpgsql_estate_setup(PLpgSQL_execstate *estate,
  	 PLpgSQL_function *func,
! 	 ReturnSetInfo *rsi,
! 	 SimpleEvalContext *simple_context);
  static void exec_eval_cleanup(PLpgSQL_execstate *estate);
  
  static void exec_prepare_plan(PLpgSQL_execstate *estate,
*** static char *format_preparedparamsdata(P
*** 230,239 
  /* --
   * plpgsql_exec_function	Called by the call handler for
   *function execution.
   * --
   */
  Datum
! plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo)
  {
  	PLpgSQL_execstate estate;
  	ErrorContextCallback plerrcontext;
--- 207,222 
  /* --
   * plpgsql_exec_function	Called by the call handler for
   *function execution.
+  *
+  * This is also used to execute inline code blocks (DO blocks).  The only
+  * difference that this code is aware of is that for a DO block, we want
+  * to use a private SimpleEvalContext, whose address must be passed as
+  * simple_context.  For regular functions, pass NULL.
   * --
   */
  Datum
! plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Martijn van Oosterhout

On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote:
 I have been thinking about this for years and I think the key idea for
 this is, implementing universal encoding. The universal encoding
 should have following characteristics to implement N2 encoding in a
 database.
 
 1) no loss of round trip encoding conversion
 
 2) no mapping table is necessary to convert from/to existing encodings
 
 Once we implement the universal encoding, other problem such as
 pg_database with multiple encoding problem can be solved easily.

Isn't this essentially what the MULE internal encoding is?

 Currently there's no such an universal encoding in the universe, I
 think the only way is, inventing it by ourselves.

This sounds like a terrible idea. In the future people are only going
to want more advanced text functions, regular expressions, indexing and
making encodings that don't exist anywhere else seems like a way to
make a lot of work for little benefit.

A better idea seems to me is to (if postgres is configured properly)
embed the non-round-trippable characters in the custom character part
of the unicode character set. In other words, adjust the mappings
tables on demand and voila.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature

[HACKERS] pg_upgrade rebuild_tsvector_tables.sql includes child table columns

When pg_upgrade generates a rebuild_tsvector_tables.sql script to
rewrite tsvector columns, it includes ALTER TABLE commands for child
tables that cannot be altered independently from their parents:

psql:rebuild_tsvector_tables.sql:673: ERROR:  cannot alter inherited column 
xxx

This isn't a problem unless you run the script in single-transaction
mode, because the commands will just fail and the work has already
been done in the parent table.

It's probably not worth fixing this, given that this only applies to
upgrades from 8.3, but I wanted to get it into the archives.  If we
ever need to do something like this again, we should make it smarter.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

Martijn van Oosterhout klep...@svana.org writes:
 On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote:
 Once we implement the universal encoding, other problem such as
 pg_database with multiple encoding problem can be solved easily.

 Isn't this essentially what the MULE internal encoding is?

MULE is completely evil.  It has N different encodings for the same
character, not to mention no support code available.

 Currently there's no such an universal encoding in the universe, I
 think the only way is, inventing it by ourselves.

 This sounds like a terrible idea. In the future people are only going
 to want more advanced text functions, regular expressions, indexing and
 making encodings that don't exist anywhere else seems like a way to
 make a lot of work for little benefit.

Agreed.

 A better idea seems to me is to (if postgres is configured properly)
 embed the non-round-trippable characters in the custom character part
 of the unicode character set. In other words, adjust the mappings
 tables on demand and voila.

From the standpoint of what will happen with existing library code
(like strcoll), I'm not sure it's all that easy.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hail the CFM

2013-11-13 Thread Gavin Flower


On 14/11/13 02:45, Peter Eisentraut wrote:

The commit fest manager mace has been passed on to me[*].  More to follow.


[*] Actually, I found it behind the dumpster in the alley.


Did you take care not to touch it with bare skin  properly sterilize 
it?  As a used mace, may pick up blood from people with diseases you'd 
rather not know about...


Ensure that you know how to use it effectively, as PostgreSQL developers 
are surly lot and respond well to a good bit of applied violence (just 
ask Tom Lane about his extensive list of fractured bones  other old war 
injuries). If you are not sure how to use it, then there are plenty of 
sites on the web to help you, such as: 
http://www.lordsandladies.org/maces.htm


Also ensure you apply for the correct licence from your local Police 
Station, as they will be very interested in how you intend to use such a 
weapon, so it is very important to consult them before you first use it!



Cheers,
Gavin
(Who is now glad that he not actually a PostgreSQL developer)



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Getting the clog bits for a particular xid

2013-11-13 Thread Christophe Pettus

As part of doing some database corruption investigation, I'm trying to get the 
pg_clog/ bit pair for a particular transaction.

Let's say we check on a particular tuple, and get:

SELECT xmin, id FROM mytable WHERE pk=4727366;

   xmin|id
---+--
 107898222 |  4727366

Each pg_clog file (as least, as of 9.3.1) is 256KB, so there are 256KB * 8 
bits/byte / 2 bits/transaction = 1M transactions per file

So:

107898222 / 1048576 = 102, or 0x0066
107898222 % 1048576 = 943470.

So, we're looking at file 0x0066.  It's the 943470th transaction in that file, 
or the 943470*2 = 1886940th bit.  So, (counting from the MSB being 0), it's the 
4th and 5th bit of byte offset 235867 in that file.

Is that correct?
--
-- Christophe Pettus
   x...@thebuild.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 11/13/2013 11:37 AM, Merlin Moncure wrote:

 Yes.  and I think this is one of the major advantages of the json API
 vs hstore: you can serialize objects that hstore cannot -- at least
 not without extra scaffolding (at least, AIUI, I haven't fully
 grappled with the coming hstore stuff yet).  In other words, just
 because key order and cardinality is unimportant in an associative
 array, it does not in any way follow it is similarly unimportant for
 object serialization.

An object is an unordered collection ofz ero or more name/value
pairs
...

The names within an object SHOULD be unique.

 Forcing us to preserve order and key duplication would be a pretty effective
 barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

  An object structure is represented as a pair of curly brackets
   surrounding zero or more name/value pairs (or members).  A name is a
   string.  A single colon comes after each name, separating the name
   from the value.  A single comma separates a value from a following
   name.  The names within an object SHOULD be unique.

And SHOULD means
  3. SHOULD. This word, or the adjective RECOMMENDED, mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

As far as I'm concerned, that settles things right there.  Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {a: 1, b: 2} and {b: 1, a: 2} should
be considered equivalent.  Another consequence is that creating
particular legal constructions should be discouraged.  I disagree with
this.

This is simply not the case with many json consuming clients.  It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC.  I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical changeset generation v6.5

On 11/9/13, 5:56 AM, Andres Freund wrote:
 ISTM ecpg's regression tests should be built (not run!) during
 $(recurse) not just during make check.

Actually, I did just the opposite change some years ago.  The rationale
is, the build builds that which you want to install.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-13 Thread Peter Geoghegan

On Wed, Oct 23, 2013 at 8:52 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Hmm, now if we had portable atomic addition, so that we could spare the
 spinlock ...

That certainly seems like an interesting possibility.

I think that pg_stat_statements should be made to do this kind of
thing by a third party tool that aggregates snapshots of deltas.
Time-series data, including (approximate) *local* minima and maxima
should be built from that. I think tools like KONDO-san's pg_statsinfo
tool have an important role to play here. I would like to see it or a
similar tool become a kind of defacto standard for consuming
pg_stat_statements' output.

At this point we are in general very much chasing diminishing returns
by adding new things to the counters struct, particularly given that
it's currently protected by a spinlock. And adding a histogram or
min/max for something like execution time isn't an approach that can
be made to work for every existing cost tracked by pg_stat_statements.
So, taking all that into consideration, I'm afraid this patch gets a
-1 from me.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

2013-11-13 Thread Josh Berkus

On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should
require hstore to do things like build
 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format, and I don't agree with the
 statement but it is the caller's reponsibility to ensure that keys
 are not repeated..

This is pretty standard in the programming languages I know of which use
JSON.

 I think the caller should have no such
 responsibility.  Keys should be able to repeated.

Apparently your experience with using JSON in practice has been fairly
different from mine; the projects I work on, the JSON is being
constantly converted back and forth to hashes and dictionaries, which
means that ordering is not preserved and keys have to be unique (or
become unique within one conversion cycle).  I think, based on the
language of the RFC and common practice, that it's completely valid for
us to require unique keys within JSON-manipulation routines.

Certainly the upcoming binary storage is going to require unique keys.
For that matter, both MongoDB and CouchDB store unique, unordered keys.
 And ever supporting CRUD functions (i.e. update this key) is going to
require uniqueness.

 Putting it all together, I'd consider:
 *) dropping json_object (although maybe there is a case I'm not
thinking about)
 *) changing json_build function names to get the json prefix
 *) adding a json object constructor that takes two parallel arrays as
 arguments.

I was with you until the third idea.  Huh?

The scripting languages I use (Perl, Python) have functions which
convert a list/array to a hash/dictionary.  In each case, the standard
input is a single list/array in the form [ k, v, k, v, k, v ].  Now,
while there are standard language functions which support munging two
parallel arrays into one hash (such as Python's zip()), these are less
frequently used.  Supporting the zip() option without supporting the [
k, v ] array option would be a bizarre and puzzling approach to most
programmers I know.  I can see three approaches which make sense:

1. we don't include json_object at all.
2. we include the existing json_object
3. we include json_object, plus a second json_object function which
takes two arrays

Keep in mind that all of Andrew's functions came out of real-life use
cases of writing applications which return JSON to the caller, so they
are based on real needs to fill holes in our JSON-building function library.

In the case of json_object, the need was to supply column labels
where, usually due to calculated columns, none exist in the input.  Take
the example where I want to return a bunch of aggregates from a table as
a series of json objects with user-friendly labels:

SELECT build_json_object( dept, department, total_costs, sum(costs),
running_total, running_sum() )
FROM 

Where it becomes even more useful is when you want the json label to be
the result of a calculated expression:

SELECT build_json_object ( department, sum() )

Yes, you could do this with a two-array version as well; it's just not
more intuitive, and in cases where you have dozens of columns, puts you
in column-counting hell.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality



On 11/13/2013 04:58 PM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan and...@dunslane.net wrote:

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

An object is an unordered collection ofz ero or more name/value
pairs
...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty effective
barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

   An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members).  A name is a
string.  A single colon comes after each name, separating the name
from the value.  A single comma separates a value from a following
name.  The names within an object SHOULD be unique.

And SHOULD means
   3. SHOULD. This word, or the adjective RECOMMENDED, mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

As far as I'm concerned, that settles things right there.  Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {a: 1, b: 2} and {b: 1, a: 2} should
be considered equivalent.  Another consequence is that creating
particular legal constructions should be discouraged.  I disagree with
this.

This is simply not the case with many json consuming clients.  It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC.  I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.




I understand the difference between should and must. But there is 
nothing that REQUIRES us to preserve key order or duplicate keys. If you 
really need textual preservation, you should probably store the data as 
text and convert it to json to do json-ish things to it. If not, we're 
going to face huge demands to implement another type which almost 
everyone but you will move to in rapid order because it performs so much 
better. The strong consensus I have seen in discussions at conferences 
and elsewhere is to go the way we're going, instead.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

2013-11-13 Thread Gavin Flower


On 14/11/13 11:33, Andrew Dunstan wrote:


On 11/13/2013 04:58 PM, Merlin Moncure wrote:
On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan and...@dunslane.net 
wrote:

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes.  and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet).  In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

An object is an unordered collection ofz ero or more name/value
pairs
...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty 
effective

barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

   An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members).  A name is a
string.  A single colon comes after each name, separating the name
from the value.  A single comma separates a value from a following
name.  The names within an object SHOULD be unique.

And SHOULD means
   3. SHOULD. This word, or the adjective RECOMMENDED, mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

As far as I'm concerned, that settles things right there. Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {a: 1, b: 2} and {b: 1, a: 2} should
be considered equivalent.  Another consequence is that creating
particular legal constructions should be discouraged.  I disagree with
this.

This is simply not the case with many json consuming clients. It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC.  I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.




I understand the difference between should and must. But there is 
nothing that REQUIRES us to preserve key order or duplicate keys. If 
you really need textual preservation, you should probably store the 
data as text and convert it to json to do json-ish things to it. If 
not, we're going to face huge demands to implement another type which 
almost everyone but you will move to in rapid order because it 
performs so much better. The strong consensus I have seen in 
discussions at conferences and elsewhere is to go the way we're going, 
instead.


cheers

andrew




I can see that both points of view are valid, in different contexts.

Would be possible to have a boolean, such as 'strict' - so that unique  
ordered was only imposed when strict was TRUE? Alternately, separate 
functions to allow the same choice?



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

 Isn't this essentially what the MULE internal encoding is?

No. MULE is not powerfull enough and overly complicated to deal with
different encodings (character sets).

 Currently there's no such an universal encoding in the universe, I
 think the only way is, inventing it by ourselves.
 
 This sounds like a terrible idea. In the future people are only going
 to want more advanced text functions, regular expressions, indexing and
 making encodings that don't exist anywhere else seems like a way to
 make a lot of work for little benefit.

That is probably a misunderstanding. We don't need to modify existing
text handling modules such as text functions, regular expressions,
indexing etc. We just convert from the universal encoding X to the
original encoding before calling them. The process is pretty easy and
fast because it just requires skipping encoding identifier and
encoding length part.

Basically the encoding X should be used for lower layer modules of
PostgreSQL and higher layer module such as living in
src/backend/utils/adt should not aware it.

 A better idea seems to me is to (if postgres is configured properly)
 embed the non-round-trippable characters in the custom character part
 of the unicode character set. In other words, adjust the mappings
 tables on demand and voila.

Using Unicode requires overhead for encoding conversion because it
needs to look up mapping tables. That will be a huge handicap for
large data and that I want to avoid in the first place.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should
 require hstore to do things like build
 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format, and I don't agree with the
 statement but it is the caller's reponsibility to ensure that keys
 are not repeated..

 This is pretty standard in the programming languages I know of which use
 JSON.

 I think the caller should have no such
 responsibility.  Keys should be able to repeated.

 Apparently your experience with using JSON in practice has been fairly
 different from mine; the projects I work on, the JSON is being
 constantly converted back and forth to hashes and dictionaries, which
 means that ordering is not preserved and keys have to be unique (or
 become unique within one conversion cycle).  I think, based on the
 language of the RFC and common practice, that it's completely valid for
 us to require unique keys within JSON-manipulation routines.

Common practice?  The internet is littered with complaints about
documents being spontaneously re-ordered and or de-duplicated in
various stacks.  Other stacks provide mechanisms for explicit key
order handling (see here: http://docs.python.org/2/library/json.html).
  Why do you think they did that?

I use pg/JSON all over the place.  In several cases I have to create
documents with ordered keys because the parser on the other side wants
them that way -- this is not a hypothetical argument.  The current
json serialization API handles that just fine and the hstore stuff
coming down the pike will not.  I guess that's a done deal based on
'performance'.  I'm clearly not the only one to have complained about
this though.

merln


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

2013-11-13 Thread Mike Blackwell



On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:


  Putting it all together, I'd consider:
  *) dropping json_object (although maybe there is a case I'm not
 thinking about)
  *) changing json_build function names to get the json prefix
  *) adding a json object constructor that takes two parallel arrays as
  arguments.

 I was with you until the third idea.  Huh?
 


I actually had a use case for this today, though with hstore, importing a
fixed length record with something along the lines of:

hstore(
  ARRAY['field 1', 'field 2', 'field 3'],
  regexp_matches(fixed_field,'(.{4})(.{10})(.{5})')
)

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com


http://www.rrdonnelley.com/
* mike.blackw...@rrd.com*

Re: [HACKERS] additional json functionality

2013-11-13 Thread Josh Berkus

Merlin,


 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.

It's not just a matter of performance.  It's the basic conflict of
JSON as document format vs. JSON as data storage.  For the latter,
unique, unordered keys are required, or certain functionality isn't
remotely possible: indexing, in-place key update, transformations, etc.

XML went through the same thing, which is part of how we got a bunch of
incompatible dialects of XML.

Now, your use case does show us that there's a case to be made for still
having text JSON even after we have binary JSON.  There's a strong
simplicity argument against that, though ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] -d option for pg_isready is broken

2013-11-13 Thread Josh Berkus


handyrep@john:~/handyrep$ pg_isready --version
pg_isready (PostgreSQL) 9.3.1

handyrep@john:~/handyrep$ pg_isready -h john -p 5432 -U postgres -d
postgres -q
pg_isready: missing = after postgres in connection info string

handyrep@john:~/handyrep$ pg_isready --host=john --port=5432
--user=postgres --dbname=postgres
pg_isready: missing = after postgres in connection info string

handyrep@john:~/handyrep$ pg_isready -h john -p 5432 -U postgres
john:5432 - accepting connections

so apparently the -d option:

a) doesn't work, and
b) doesn't do anything

I suggest simply removing it from the utility.

I'll note that the -U option doesn't appear to do anything relevant
either, but at least it doesn't error unnecessarily:

handyrep@john:~/handyrep$ pg_isready -h john -p 5432 -U no_such_user
john:5432 - accepting connections

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] First patch -- somewhat trivial feature

2013-11-13 Thread Robert Berry

This is my first attempt at writing a patch, so it's pretty simple.

Commit log sums it up:

Adds a convenience feature to the explain command which prints out GUC cost
parameters in explain text output.

For example:

explain (params) select * from table;

will include text output like the following:

Cost Params:
 seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01,
cpu_ind: 0.005000, cpu_op: 0.002500
 amenabled: 111

The bit vector is enable variables in the order listed in cost.h,
though mainly provides a high level view on whether or not any strategies
are disabled.

--

I recognize that this is kind of a frivolous feature which may not be worth
any potential maintenance burden, so submitted for what it's worth as an
initial effort.

Best Regards,
Robert
From 960dc64864306b554a2e150cb8e28e7a63f218d1 Mon Sep 17 00:00:00 2001
From: robert berry berrydigi...@gmail.com
Date: Wed, 13 Nov 2013 15:29:23 -0800
Subject: [PATCH 1/1] Adds a convenience feature to the explain command which
 prints out GUC cost parameters in explain text output.

For example:

will include text output like the following:

Cost Params:
 seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01, cpu_ind: 0.005000, cpu_op: 0.002500
 amenabled: 111

The bit vector is enable variables in the order listed in cost.h, though mainly provides a high level
view on whether or not any strategies are disabled.
---
 src/backend/commands/explain.c | 18 ++
 src/include/commands/explain.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 4e93df2..8feeca8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -31,6 +31,7 @@
 #include utils/snapmgr.h
 #include utils/tuplesort.h
 #include utils/xml.h
+#include optimizer/cost.h
 
 
 /* Hook for plugins to get control in ExplainOneQuery() */
@@ -140,6 +141,8 @@ ExplainQuery(ExplainStmt *stmt, const char *queryString,
 			es.costs = defGetBoolean(opt);
 		else if (strcmp(opt-defname, buffers) == 0)
 			es.buffers = defGetBoolean(opt);
+		else if (strcmp(opt-defname, params) == 0)
+			es.params = defGetBoolean(opt);
 		else if (strcmp(opt-defname, timing) == 0)
 		{
 			timing_set = true;
@@ -516,6 +519,21 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 
 	PopActiveSnapshot();
 
+	/* include one line display of cost parameters */
+	if (es-params) 
+	{
+		if (es-format == EXPLAIN_FORMAT_TEXT)
+		{
+		  appendStringInfo(es-str, \nCost Params:\n);
+			appendStringInfo(es-str, \tseq_page: %f, rnd_page: %f, cpu_tup: %f, cpu_ind: %f, cpu_op: %f,
+		seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost, enable_seqscan);
+		  appendStringInfo(es-str, \n\tamenabled: %d%d%d%d%d%d%d%d%d%d%d\n\n,
+		enable_seqscan, enable_indexscan, enable_indexonlyscan, enable_bitmapscan, 
+		enable_tidscan, enable_sort, enable_hashagg, enable_nestloop, enable_material,
+		enable_mergejoin, enable_hashjoin);  
+		}
+	}
+
 	/* We need a CCI just in case query expanded to multiple plans */
 	if (es-analyze)
 		CommandCounterIncrement();
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ca213d7..d76476c 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -33,6 +33,7 @@ typedef struct ExplainState
 	bool		costs;			/* print costs */
 	bool		buffers;		/* print buffer usage */
 	bool		timing;			/* print timing */
+	bool		params;			/* print optimizer cost params */
 	ExplainFormat format;		/* output format */
 	/* other states */
 	PlannedStmt *pstmt;			/* top of plan */
-- 
1.8.3.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] nested hstore patch

2013-11-13 Thread Hannu Krosing

On 11/13/2013 01:37 AM, Andrew Dunstan wrote:

 On 11/12/2013 01:35 PM, Teodor Sigaev wrote:
 Hi!

 Attatched patch adds nesting feature, types (string, boll and numeric
 values), arrays and scalar to hstore type.

 All new features are described in PGConf.EU talk
 http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf
 (since PGCon some features was added).

 Patch includes:
 1 implementaion SRF_RETURN_NEXT_NULL()
 2 contrib/hstore changes
 3 docs of new hstore module (many thanks to  David E. Wheeler
 david.whee...@pgexperts.com)

 In current state patch is in WIP status, for short period I plan to
 move support of binary nested structure to core to share binary
 representation for hstore and json types.




 Thanks, Teodor.

 As soon as we have that shared binary representation available, I will
 be working on adapting it to JSON.
As I remember from earlier discussions, current json has some
artefacts that some people want to preserve and which are incompatible
with hstore approach where you have actual object behind the serialisation.

I remember strong voices in support of *not* normalising json, so that
things like

{a:1,a:true, a:b, a:none}

would go through the system unaltered, for claimed standard usage of
json as
processing instructions. That is as source code which can possibly
converted
to JavaScript Object and not something that would come out of
serialising of
any existing JavaScript Object.

I suggest we add another type, maybe jsobj, which has input and output
as standard
 JSON but which is defined from the start to be equivalent of existing
object
and not preservable source code to such object.


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

2013-11-13 Thread Hannu Krosing

On 11/14/2013 12:09 AM, Merlin Moncure wrote:
 On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should
 require hstore to do things like build
 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format, and I don't agree with the
 statement but it is the caller's reponsibility to ensure that keys
 are not repeated..
 This is pretty standard in the programming languages I know of which use
 JSON.

 I think the caller should have no such
 responsibility.  Keys should be able to repeated.
 Apparently your experience with using JSON in practice has been fairly
 different from mine; the projects I work on, the JSON is being
 constantly converted back and forth to hashes and dictionaries, which
 means that ordering is not preserved and keys have to be unique (or
 become unique within one conversion cycle).  I think, based on the
 language of the RFC and common practice, that it's completely valid for
 us to require unique keys within JSON-manipulation routines.
 Common practice?  The internet is littered with complaints about
 documents being spontaneously re-ordered and or de-duplicated in
 various stacks.  Other stacks provide mechanisms for explicit key
 order handling (see here: http://docs.python.org/2/library/json.html).
   Why do you think they did that?

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  
I guess we should not replace current JSON type with hstore based
one, but add something json-like based on nested hstore instead.

Maybe call it jsdoc or jdoc or jsobj or somesuch.

For some time I was also pretty perplexed by by some PostgreSQL JSON
type discussions where JSON was not really being a defined as
the the type constructed from its string representation, and even not
a string which results from serialising an existing javascript object,
but rather a source code, which can be parsed into a structured type.

So PostgreSQL json type is *not* a structured type like hstore is but
is really a string type with a few syntax checks.

Some of the json_* functions are then defined on top of this
json-source type which treat this source as if it were actual
structured type.

It is kind of defining an int-notation type, which acts like an integer
when added to another integer, but is required to  also keep its original
representation:

select '1+1'::int-notation + 2;
== 4
select '1+1'::int-notation
== 1+1

 I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.
I am pretty sure we can not move to internal object representation and
preserve the current 'json source behaviour.

this is why I recommend not replacing json, but rather adding another
built-in
for real structured type.

then you can keep using current json for the earlier-quoted uses of
processing instructions and do real data manipulation on jsdoc/jsobj type.

Also most of the current json functions should also be moved to work on
jsobj instead with explicit cast from json to jsobj


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] First patch -- somewhat trivial feature

2013-11-13 Thread Stephen Frost

* Robert Berry (berrydigi...@gmail.com) wrote:
 This is my first attempt at writing a patch, so it's pretty simple.

Neat!

  seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01,
 cpu_ind: 0.005000, cpu_op: 0.002500
  amenabled: 111
 
 The bit vector is enable variables in the order listed in cost.h,
 though mainly provides a high level view on whether or not any strategies
 are disabled.

While I like the general idea, I have to admit that I don't particularly
like the format and I'm not sure why it makes sense to have this as part
of 'explain'?  Why not do a 'show all;' ahead of the explain?

 I recognize that this is kind of a frivolous feature which may not be worth
 any potential maintenance burden, so submitted for what it's worth as an
 initial effort.

A bit more understanding about why you find it particularly useful (your
specific use-case for it) would be helpful.  Also, may I suggest that
you look at http://commitfest.postgresql.org as you'll be asked to post
any patches you wish to submit there for review and consideration.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] First patch -- somewhat trivial feature

2013-11-13 Thread David Johnston

Stephen Frost wrote
 * Robert Berry (

 berrydigital@

 ) wrote:
 This is my first attempt at writing a patch, so it's pretty simple.
 
 Neat!
 
  seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01,
 cpu_ind: 0.005000, cpu_op: 0.002500
  amenabled: 111
 
 The bit vector is enable variables in the order listed in cost.h,
 though mainly provides a high level view on whether or not any strategies
 are disabled.
 
 While I like the general idea, I have to admit that I don't particularly
 like the format and I'm not sure why it makes sense to have this as part
 of 'explain'?  Why not do a 'show all;' ahead of the explain?

I kinda get the theory behind this but, WRT formatting, explain can output
multiple formats and any patch affecting said output should provide for
changing all of them.  Having each of the sample outputs in the post would
allow for comments from those who would not generally apply such patches.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/First-patch-somewhat-trivial-feature-tp5778245p5778250.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

 MULE is completely evil.

 It has N different encodings for the same
 character,

What's wrong with that? It aims that in the first place.

 not to mention no support code available.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

Tatsuo Ishii is...@postgresql.org writes:
 MULE is completely evil.
 It has N different encodings for the same character,

 What's wrong with that? It aims that in the first place.

It greatly complicates comparisons --- at least, if you'd like to preserve
the principle that strings that appear the same are equal.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] First patch -- somewhat trivial feature

2013-11-13 Thread Robert Berry

Snowman --

Thanks for feedback. I imagined this feature could be useful in a couple
of contexts, though mainly in the sense of documenting query optimization
efforts.

In one case you may be tweaking cost parameters and having a built in
record of the parameters in the explain output can make that more reliable.

In a support context, it can help communicate all the details behind the
selection of a query plan, perhaps on IRC support.

David J. --

I originally conceived the notion for the feature while off in the deep end
thinking about automatic cost parameter selection.

It seems plausible that the ideal theoretical costs would be influenced by
server activity (io/cpu utilization) and various buffer states at a point
in time. This kind of optimization work is a little beyond my
understanding and capability so this was a first step in thinking about
this topic. It turned into a trivial user text output feature, but
implementing output formats that are more machine friendly still makes a
lot of sense.

Best Regards,
-Robert

On Wed, Nov 13, 2013 at 4:16 PM, David Johnston pol...@yahoo.com wrote:

Stephen Frost wrote
* Robert Berry (

berrydigital@

) wrote:
This is my first attempt at writing a patch, so it's pretty simple.

Neat!

seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01,
cpu_ind: 0.005000, cpu_op: 0.002500
amenabled: 111

The bit vector is enable variables in the order listed in cost.h,
though mainly provides a high level view on whether or not any
strategies
are disabled.

While I like the general idea, I have to admit that I don't particularly
like the format and I'm not sure why it makes sense to have this as part
of 'explain'? Why not do a 'show all;' ahead of the explain?

I kinda get the theory behind this but, WRT formatting, explain can output
multiple formats and any patch affecting said output should provide for
changing all of them. Having each of the sample outputs in the post would
allow for comments from those who would not generally apply such patches.

David J.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/First-patch-somewhat-trivial-feature-tp5778245p5778250.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] nested hstore patch

2013-11-13 Thread David E. Wheeler

On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 I remember strong voices in support of *not* normalising json, so that
 things like
 
 {a:1,a:true, a:b, a:none}
 
 would go through the system unaltered, for claimed standard usage of
 json as
 processing instructions. That is as source code which can possibly
 converted
 to JavaScript Object and not something that would come out of
 serialising of
 any existing JavaScript Object.

My recollection from PGCon was that there was consensus to normalize on the way 
in -- or at least, if we switched to a binary representation as proposed by 
Oleg  Teodor, it was not worth the hassle to try to keep it.

 I suggest we add another type, maybe jsobj, which has input and output
 as standard
 JSON but which is defined from the start to be equivalent of existing
 object
 and not preservable source code to such object.

-1 Let's try to keep this simple. See also VARCHAR and VARCHAR2 on Oracle.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

2013-11-13 Thread David E. Wheeler

On Nov 13, 2013, at 2:41 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote:

 Would be possible to have a boolean, such as 'strict' - so that unique  
 ordered was only imposed when strict was TRUE? Alternately, separate 
 functions to allow the same choice?

It should be a pretty-printing function option, perhaps, but not core to the 
type itself, IMO.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality



On 11/13/2013 07:01 PM, Hannu Krosing wrote:


I guess we should not replace current JSON type with hstore based
one, but add something json-like based on nested hstore instead.



Well, that's two voices for that course of action.

Interesting that I don't think I heard a single voice for this either at 
pgCon or pgOpen, although I spent large amounts of time at both talking 
to people about Json, so I'd be interested to hear more voices.


It would actually simplify things in a way if we do that - we've been 
working on a way of doing this that wouldn't upset pg_upgrade. This 
would render that effort unnecessary.


However it will complicate things for users who will have to choose 
between the data types, and function authors who will possibly have to 
write versions of functions to work with both types.




Also most of the current json functions should also be moved to work on
jsobj instead with explicit cast from json to jsobj



Sure, we can overload them - that's probably the least of our worries.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

 Tatsuo Ishii is...@postgresql.org writes:
 MULE is completely evil.
 It has N different encodings for the same character,
 
 What's wrong with that? It aims that in the first place.
 
 It greatly complicates comparisons --- at least, if you'd like to preserve
 the principle that strings that appear the same are equal.

You don't need to consider it because there's no place in PostgreSQL
where a MULE encoded text consists of multiple encodings as far as I
know.

BTW, same characters are assigned different code points are pretty
common in many character sets (Unicode, for example).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality



On 11/13/2013 07:39 PM, David E. Wheeler wrote:

On Nov 13, 2013, at 2:41 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote:


Would be possible to have a boolean, such as 'strict' - so that unique  
ordered was only imposed when strict was TRUE? Alternately, separate functions to 
allow the same choice?

It should be a pretty-printing function option, perhaps, but not core to the 
type itself, IMO.



I don't in the least understand how it could be a pretty printing 
option. If we move to a binary rep using the hstore stuff order will be 
destroyed and not stored anywhere, and duplicate keys will be lost. Once 
that's done, how would a pretty print option restore the lost info?


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

2013-11-13 Thread David E. Wheeler

On Nov 13, 2013, at 4:45 PM, Andrew Dunstan and...@dunslane.net wrote:

 It should be a pretty-printing function option, perhaps, but not core to the 
 type itself, IMO.
 
 I don't in the least understand how it could be a pretty printing option. If 
 we move to a binary rep using the hstore stuff order will be destroyed and 
 not stored anywhere, and duplicate keys will be lost. Once that's done, how 
 would a pretty print option restore the lost info?

I meant ordering the keys, usually in lexicographic order. I agree that 
preserving order is untenable.

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second

2013-11-13 Thread Michael Paquier

Iy

On Thu, Nov 14, 2013 at 3:51 AM, Mika Eloranta m...@ohmu.fi wrote:
 Prevent excessive progress reporting that can grow to gigabytes
 of output with large databases.
It might be interesting to add this patch to the next commit fest
where you could get a formal review:
https://commitfest.postgresql.org/action/commitfest_view?id=20

Then just be sure to attach a patch file properly to your email such
as people can grab and test the patch easily.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] commit fest 2013-11 starts this Friday

Reminder: Commit fest 2013-11, the third commit fest (out of four) in
the PostgreSQL development cycle, will start this Friday, November 15.
If you have a patch that you would like to see considered for inclusion
into PostgreSQL 9.4, please register it in the commit fest application
before Friday:

https://commitfest.postgresql.org/action/commitfest_view?id=20

While you're there, submitting your patch, please also put your name
down to review at least one other patch.  If you don't, we might frown
upon your patch.

Note that the reverse is not true: Yo can put your name down to review a
patch if you have not submitted one.  We will not frown upon your review
in that case.

For first-time and forgetful reviewers, here are some instructions:

https://wiki.postgresql.org/wiki/Reviewing_a_Patch




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

Tatsuo Ishii is...@postgresql.org writes:
 BTW, same characters are assigned different code points are pretty
 common in many character sets (Unicode, for example).

This is widely considered a security bug; read section 10 in RFC 3629 (the
definition of UTF8), and search the CVE database a bit if you still doubt
it's a threat.  I'm going to push back very hard on any suggestion that
Postgres should build itself around a text representation with that kind
of weakness designed in.

regards, tom lane

[1] http://tools.ietf.org/html/rfc3629#section-10


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Report exit code from external recovery commands properly

2013-11-13 Thread Peter Geoghegan

On Wed, Nov 13, 2013 at 3:42 AM, Peter Eisentraut pete...@gmx.net wrote:
 When an external recovery command such as restore_command or
 archive_cleanup_command fails, it just reports return code 34567 or
 something, but we have facilities to do decode this properly, so use
 them.

I think this is a very good idea, but you should go a bit further:
document the special relationship restore_command has to special
return codes. Currently, the documentation says:

It is important that the archive command return zero exit status if
and only if it succeeded. Upon getting a zero result, PostgreSQL will
assume that the WAL segment file has been successfully archived, and
will remove or recycle it. However, a nonzero status tells PostgreSQL
that the file was not archived; it will try again periodically until
it succeeds.

Yes, this concerns archive_command (where return code values that are
non-zero *are* never distinguished), but nothing much is said about
the return code of restore_command specifically anywhere else, so it's
implied that it's exactly inverse to archive_command. In reality, some
special return codes have a significance to restore_command: they make
recovery abort, because they're taking as proxies for various failures
that it isn't sensible to continue recovery in the event of.

We're talking about the difference between recovery aborting, and
recovery having conceptually reached the end of the WAL stream, so
it's very surprising that this isn't documented currently.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get more from indices.

On Tue, 2013-11-12 at 17:48 +0900, Kyotaro HORIGUCHI wrote:
 Hello, this is the revised patch.

Since you're using git, please check your patch for trailing whitespace
with git diff --check.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Using indices for UNION.

On Wed, 2013-11-13 at 17:25 +0900, Kyotaro HORIGUCHI wrote:

 Added explicit cast there and rebased to current master.
 Checked no new warning by this patch.
 make check succeeded at both $(src) and $(src)/src/test.

This patch also has whitespace errors detected by git diff --check.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] stats for network traffic WIP

On Fri, 2013-11-08 at 10:01 -0500, Nigel Heron wrote:
 here's v4 of the patch. I added documentation and a new global view
 called pg_stat_socket (includes bytes_sent, bytes_received and
 stats_reset time)

Your patch needs to be rebased:

CONFLICT (content): Merge conflict in src/test/regress/expected/rules.out



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertions in PL/PgSQL

On Wed, 2013-10-09 at 18:57 +0200, Pavel Stehule wrote:
 here is a patch for RAISE WHEN clause

Your patch needs to be rebased.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] inherit support for foreign tables

2013-11-13 Thread Shigeru Hanada

Hi hackers,

I'd like to propose adding inheritance support for foriegn tables.
David Fetter mentioned this feature last July, but it seems stalled.

http://www.postgresql.org/message-id/20130719005601.ga5...@fetter.org

Supporting inheritance by foreign tables allows us to distribute query
to remote servers by using foreign tables as partition table of a
(perhaps ordinary) table.  For this purpose, I think that constraint
exclusion is necessary.

As result of extending Devid's patch for PoC, and AFAIS we need these changes:

1) Add INHERITS(rel, ...) clause to CREATE/ALTER FOREIGN TABLE
Apperantly we need to add new syntax to define parent table(s) of a
foreign table.  We have options about the position of INHERIT clause,
but I'd prefer before SERVER clause because having options specific to
foreign tables at the tail would be most extensible.

a) CREATE FOREIGN TABLE child (...) INHERITS(p1, p2) SERVER server;
b) CREATE FOREIGN TABLE child (...) SERVER server INHERITS(p1, p2);

2) Allow foreign tables to have CHECK constraints
Like NOT NULL, I think we don't need to enforce the check duroing
INSERT/UPDATE against foreign table.

3) Allow foreign table as a child node of Append
Currently prepunion.c assumes that children of Append have
RELKIND_RELATION as relkind always, so we need to set relkind of child
RTE explicitly.

Please see attached PoC patch.  I'll enhance implementation, tests and
document and submit the patch for the next CF.

Regards,
-- 
Shigeru HANADA
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0b31f55..2f2dc88 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -465,10 +465,25 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid 
ownerId)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 errmsg(ON COMMIT can only be used on 
temporary tables)));
-   if (stmt-constraints != NIL  relkind == RELKIND_FOREIGN_TABLE)
-   ereport(ERROR,
-   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-errmsg(constraints are not supported on 
foreign tables)));
+/*
+ * Shouldn't this have been checked in parser?
+ */
+   if (relkind == RELKIND_FOREIGN_TABLE)
+   {
+   ListCell   *lc;
+   foreach(lc, stmt-constraints)
+   {
+   NewConstraint  *nc = lfirst(lc);
+
+   if (nc-contype != CONSTR_CHECK 
+   nc-contype != CONSTR_DEFAULT 
+   nc-contype != CONSTR_NULL 
+   nc-contype != CONSTR_NOTNULL)
+   ereport(ERROR,
+   
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg(only check constraints 
are supported on foreign tables)));
+   }
+   }
 
/*
 * Look up the namespace in which we are supposed to create the 
relation,
@@ -1463,10 +1478,11 @@ MergeAttributes(List *schema, List *supers, char 
relpersistence,
 */
relation = heap_openrv(parent, ShareUpdateExclusiveLock);
 
-   if (relation-rd_rel-relkind != RELKIND_RELATION)
+   if (relation-rd_rel-relkind != RELKIND_RELATION 
+   relation-rd_rel-relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-errmsg(inherited relation \%s\ is 
not a table,
+errmsg(inherited relation \%s\ is 
not a table or foreign table,
parent-relname)));
/* Permanent rels cannot inherit from temporary ones */
if (relpersistence != RELPERSISTENCE_TEMP 
@@ -3043,7 +3059,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_INDEX;
break;
case AT_AddConstraint:  /* ADD CONSTRAINT */
-   ATSimplePermissions(rel, ATT_TABLE);
+   ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
/* Recursion occurs during execution phase */
/* No command-specific prep needed except saving 
recurse flag */
if (recurse)
@@ -3057,7 +3073,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_ADD_CONSTR;
break;
case AT_DropConstraint: /* DROP CONSTRAINT */
-   ATSimplePermissions(rel, ATT_TABLE);
+   ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
/* Recursion

Re: [HACKERS] inherit support for foreign tables

Shigeru Hanada shigeru.han...@gmail.com writes:
 I'd like to propose adding inheritance support for foriegn tables.
 David Fetter mentioned this feature last July, but it seems stalled.
   http://www.postgresql.org/message-id/20130719005601.ga5...@fetter.org

The discussion there pointed out that not enough consideration had been
given to interactions with other commands.  I'm not really satisfied
with your analysis here.  In particular:

 2) Allow foreign tables to have CHECK constraints
 Like NOT NULL, I think we don't need to enforce the check duroing
 INSERT/UPDATE against foreign table.

Really?  It's one thing to say that somebody who adds a CHECK constraint
to a foreign table is responsible to make sure that the foreign data will
satisfy the constraint.  It feels like a different thing to say that ALTER
TABLE ADD CONSTRAINT applied to a parent table will silently assume that
some child table that happens to be foreign doesn't need any enforcement.

Perhaps more to the point, inheritance trees are the main place where the
planner depends on the assumption that CHECK constraints represent
reality.  Are we really prepared to say that it's the user's fault if the
planner generates an incorrect plan on the strength of a CHECK constraint
that's not actually satisfied by the foreign data?  If so, that had better
be documented by this patch.  But for a project that refuses to let people
create a local CHECK or FOREIGN KEY constraint without mechanically
checking it, it seems pretty darn weird to be so laissez-faire about
constraints on foreign data.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] tcp_keepalives_idle