Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Michael Paquier
On Tue, Dec 9, 2014 at 4:44 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Tue, Dec 9, 2014 at 10:10 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Tue, Dec 2, 2014 at 3:42 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Adding on top of that a couple of things cleaned up, like docs and
 typos, and I got the patch attached. Let's have a committer have a
 look at it now; I am marking this as Ready for Committer.
 For the archives, this has been committed as fe263d1. Thanks Simon for
 looking and the final push. And sorry that I didn't spot the issue
 with the TAP tests when reviewing; check-world passed, but my dev VM was
 missing the necessary Perl packages.
 While re-looking at that, I just found that when selecting the
 relations that are reindexed for a schema we ignore materialized views,
 as the key scan is only done using 'r' as relkind. The patch attached
 fixes that.
Here is an updated patch that also does the following:
- Fix the regression test checking if the user has permissions on the
schema (it was broken)
- Silence the NOTICE messages of REINDEX by having client_min_messages
set to WARNING (thoughts about having a plpgsql function doing
consistency checks of relfilenode before and after reindex?)
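
Something like this rough sketch is what I have in mind (the function
name is only illustrative; REINDEX TABLE, unlike REINDEX SCHEMA, can
run inside a function):

CREATE FUNCTION reindex_relfilenode_check(rel regclass)
RETURNS boolean AS $$
DECLARE
	before oid[];
BEGIN
	SELECT array_agg(c.relfilenode ORDER BY c.oid) INTO before
	  FROM pg_class c JOIN pg_index i ON c.oid = i.indexrelid
	 WHERE i.indrelid = rel;
	EXECUTE format('REINDEX TABLE %s', rel);
	-- true when at least one index of the relation got a new relfilenode
	RETURN before IS DISTINCT FROM
	  (SELECT array_agg(c.relfilenode ORDER BY c.oid)
	     FROM pg_class c JOIN pg_index i ON c.oid = i.indexrelid
	    WHERE i.indrelid = rel);
END;
$$ LANGUAGE plpgsql;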
-- 
Michael
From 402afad6c124d2b74a5a82e36e017d2dedb0186d Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Tue, 9 Dec 2014 16:40:39 +0900
Subject: [PATCH] Fix a couple of bugs in REINDEX SCHEMA

The following issues are fixed:
- The key scan was using a filter on the relation's relkind, but that's not
actually necessary, as a filter is applied when building the list of OIDs
to reindex.
- The regression test checking permissions on the reindexed schema was broken.
- Raise client_min_messages to 'warning' per complaints from jaguarundi
and leech, as the table reindex ordering is not entirely guaranteed (we
could as well use a plpgsql function here that checks whether relfilenode
has changed for the reindexed relations).
---
 src/backend/commands/indexcmds.c   |  8 ++--
 src/test/regress/expected/create_index.out | 21 +
 src/test/regress/sql/create_index.sql  | 10 ++
 3 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a3e8a15..9b07216 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1867,16 +1867,12 @@ ReindexObject(const char *objectName, ReindexObjectType objectKind)
 	 */
 	if (objectKind == REINDEX_OBJECT_SCHEMA)
 	{
-		scan_keys = palloc(sizeof(ScanKeyData) * 2);
+		scan_keys = palloc(sizeof(ScanKeyData));
 		ScanKeyInit(&scan_keys[0],
 	Anum_pg_class_relnamespace,
 	BTEqualStrategyNumber, F_OIDEQ,
 	ObjectIdGetDatum(objectOid));
-		ScanKeyInit(&scan_keys[1],
-	Anum_pg_class_relkind,
-	BTEqualStrategyNumber, F_CHAREQ,
-	'r');
-		num_keys = 2;
+		num_keys = 1;
 	}
 	else
 		num_keys = 0;
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index ebac939..418b0ec 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2834,31 +2834,28 @@ explain (costs off)
 --
 -- REINDEX SCHEMA
 --
+SET client_min_messages TO 'warning';
 REINDEX SCHEMA schema_to_reindex; -- failure, schema does not exist
 ERROR:  schema "schema_to_reindex" does not exist
 CREATE SCHEMA schema_to_reindex;
-CREATE TABLE schema_to_reindex.table1(col1 SERIAL PRIMARY KEY);
-CREATE TABLE schema_to_reindex.table2(col1 SERIAL PRIMARY KEY, col2 VARCHAR(100) NOT NULL);
-CREATE INDEX ON schema_to_reindex.table2(col2);
+CREATE TABLE schema_to_reindex."table"(col1 SERIAL PRIMARY KEY);
+CREATE MATERIALIZED VIEW schema_to_reindex.matview AS SELECT col1 FROM schema_to_reindex."table";
+CREATE INDEX ON schema_to_reindex.matview(col1);
 REINDEX SCHEMA schema_to_reindex;
-NOTICE:  table "schema_to_reindex.table1" was reindexed
-NOTICE:  table "schema_to_reindex.table2" was reindexed
 BEGIN;
 REINDEX SCHEMA schema_to_reindex; -- failure, cannot run in a transaction
 ERROR:  REINDEX SCHEMA cannot run inside a transaction block
 END;
+RESET client_min_messages;
 -- Failure for unauthorized user
-CREATE ROLE reindexuser login;
+CREATE ROLE user_reindex login;
 SET SESSION ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 REINDEX SCHEMA schema_to_reindex;
-NOTICE:  table "schema_to_reindex.table1" was reindexed
-NOTICE:  table "schema_to_reindex.table2" was reindexed
+ERROR:  must be owner of schema schema_to_reindex
 -- Clean up
 RESET ROLE;
 DROP ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 DROP SCHEMA schema_to_reindex CASCADE;
 NOTICE:  drop cascades to 2 other objects
-DETAIL:  drop cascades to table schema_to_reindex.table1
-drop cascades to table schema_to_reindex.table2
+DETAIL:  drop cascades to table schema_to_reindex."table"
+drop cascades to materialized view schema_to_reindex.matview
diff --git a/src/test/regress/sql/create_index.sql 

[HACKERS] committsdesc.c not ignored in contrib/pg_xlogdump

2014-12-09 Thread Michael Paquier
Hi all,

As mentioned in $subject, we are missing an entry in pg_xlogdump's .gitignore.
Patch attached.
Regards,
-- 
Michael
diff --git a/contrib/pg_xlogdump/.gitignore b/contrib/pg_xlogdump/.gitignore
index 37974e0..16cf749 100644
--- a/contrib/pg_xlogdump/.gitignore
+++ b/contrib/pg_xlogdump/.gitignore
@@ -2,6 +2,7 @@
 # Source files copied from src/backend/access/
 /brindesc.c
 /clogdesc.c
+/committsdesc.c
 /dbasedesc.c
 /gindesc.c
 /gistdesc.c



Re: [HACKERS] [v9.5] Custom Plan API

2014-12-09 Thread Simon Riggs
On 7 December 2014 at 08:21, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

 Please wait for a few days. The ctidscan module is not adjusted for the
 latest interface yet.

I am in many ways a patient man. At this point it is 12 days since my
request for a working example.

Feedback I am receiving is that the API is unusable. That could be
because it is impenetrable, or because it is unusable. I'm not sure it
matters which.

We need a working example to ensure that the API meets the needs of a
wide section of users and if it does not, to give other users a chance
to request changes to the API so that it becomes usable. The window
for such feedback is approaching zero very quickly now and we need
action.

Thanks

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Langote


On Tue, Dec 9, 2014 at 12:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Dec 9, 2014 at 8:08 AM, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:
  From: Robert Haas [mailto:robertmh...@gmail.com]
  On Sat, Dec 6, 2014 at 2:59 AM, Amit Kapila amit.kapil...@gmail.com
  wrote:
   I guess you could list or hash partition on multiple columns, too.
  
   How would you distinguish values in a list partition for multiple
   columns? I mean for a range partition, we are sure there will be
   one value for each column, but for list it could be multiple and
   not fixed for each partition, so I think it will not be easy to
   support a multicolumn partition key for list partitions.
 
  I don't understand.  If you want to range partition on columns (a, b),
  you say that, say, tuples with (a, b) values less than (100, 200) go
  here and the rest go elsewhere.  For list partitioning, you say that,
  say, tuples with (a, b) values of EXACTLY (100, 200) go here and the
  rest go elsewhere.  I'm not sure how useful that is but it's not
  illogical.
 

 In case of list partitioning, 100 and 200 would respectively be one of the
 values in lists of allowed values for a and b. I thought his concern is
 whether this list of values for each column in partkey is as convenient to
 store and manipulate as range partvalues.


 Yeah, and also how would the user specify the values? As an example,
 assume that the table is partitioned on monthly_salary, so the partition
 definition would look like:

 PARTITION BY LIST(monthly_salary)
 (
 PARTITION salary_less_than_thousand VALUES(300, 900),
 PARTITION salary_less_than_two_thousand VALUES (500,1000,1500),
 ...
 )

 Now if the user wants to define a multi-column partition based on
 monthly_salary and annual_salary, how do we want him to
 specify the values?  Basically, how do we distinguish which values
 belong to the first column key and which ones belong to the second
 column key?


Perhaps you are talking about syntactic difficulties that I totally missed in 
my other reply to this mail?

Can we rather represent the same data using a subpartitioning scheme? ISTM
the semantics would remain the same.

... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?
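
As an aside on the range case quoted above: SQL row comparison is
lexicographic, which is what makes a multi-column bound such as
(100, 200) well-defined. A quick illustration:

SELECT (99, 500)  < (100, 200);  -- true: 99 < 100 decides
SELECT (100, 199) < (100, 200);  -- true: first columns tie, 199 < 200
SELECT (100, 200) < (100, 200);  -- false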

Thanks,
Amit






Re: [HACKERS] Scaling shared buffer eviction

2014-12-09 Thread Amit Kapila
On Tue, Nov 11, 2014 at 3:00 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-11-11 09:29:22 +, Thom Brown wrote:
  On 26 September 2014 12:40, Amit Kapila amit.kapil...@gmail.com wrote:
 
On Tue, Sep 23, 2014 at 10:31 AM, Robert Haas robertmh...@gmail.com
   wrote:
   
But this gets at another point: the way we're benchmarking this
right
now, we're really conflating the effects of three different things:
   
1. Changing the locking regimen around the freelist and clocksweep.
2. Adding a bgreclaimer process.
3. Raising the number of buffer locking partitions.
  
   First of all, thanks for committing part-1 of these changes; it
   seems you are planning to commit part-3 based on the results of tests
   which Andres is planning to do, and for the remaining part (part-2), today
  
 
  Were parts 2 and 3 committed in the end?

 3 was committed. 2 wasn't, because it's not yet clear how
 beneficial it is generally.


As shown upthread, this patch (as it stands today) depends on
another patch (wait-free LW_SHARED acquisition) which is still not
committed, and some more work is needed to clearly show the
gain from this patch, so I have marked it as Returned with Feedback.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Simon Riggs
On 9 December 2014 at 17:17, Michael Paquier michael.paqu...@gmail.com wrote:

 While re-looking at that, I just found that when selecting the
 relations that are reindexed for a schema we ignore materialized views,
 as the key scan is only done using 'r' as relkind. The patch attached
 fixes that.
 Here is an updated patch that also does the following:
 - Fix the regression test checking if the user has permissions on the
 schema (it was broken)
 - Silence the NOTICE messages of REINDEX by having client_min_messages
 set to WARNING (thoughts about having a plpgsql function doing
 consistency checks of relfilenode before and after reindex?)

ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in
the way it issues NOTICE messages.

I'm inclined to simply remove the NOTICE messages, except when a
REINDEX ... VERBOSE is requested.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Sawada Masahiko
On Tuesday, December 9, 2014, Simon Riggs si...@2ndquadrant.com wrote:

 On 9 December 2014 at 17:17, Michael Paquier michael.paqu...@gmail.com wrote:

  While re-looking at that, I just found that when selecting the
  relations that are reindexed for a schema we ignore materialized views,
  as the key scan is only done using 'r' as relkind. The patch attached
  fixes that.
  Here is an updated patch that also does the following:
  - Fix the regression test checking if the user has permissions on the
  schema (it was broken)
  - Silence the NOTICE messages of REINDEX by having client_min_messages
  set to WARNING (thoughts about having a plpgsql function doing
  consistency checks of relfilenode before and after reindex?)

 ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in
 the way it issues NOTICE messages.

 I'm inclined to simply remove the NOTICE messages, except when a
 REINDEX ... VERBOSE is requested.


+1 to remove the NOTICE messages except when specifying VERBOSE.

It would output a lot of messages if there are many tables in the schema.
If nobody objects to it, I will work on this.

Regards,

--
Sawada Masahiko




Re: [HACKERS] [v9.5] Custom Plan API

2014-12-09 Thread Kouhei Kaigai
Simon,

The sample code is here:
  https://github.com/kaigai/ctidscan

The code itself and the regression tests show how it works and
interacts with the core backend.

However, its source code comments are not updated and the SGML
documentation is not ready yet, because of my schedule in the earlier
half of December. I will try to add the above stuff in a contrib module
patch, but it will take a few more days.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Simon Riggs [mailto:si...@2ndquadrant.com]
 Sent: Tuesday, December 09, 2014 12:24 AM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera; Shigeru
 Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski; Peter
 Eisentraut
 Subject: Re: [HACKERS] [v9.5] Custom Plan API
 
 On 7 December 2014 at 08:21, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
  Please wait for a few days. The ctidscan module is not adjusted for
  the latest interface yet.
 
 I am in many ways a patient man. At this point it is 12 days since my request
 for a working example.
 
 Feedback I am receiving is that the API is unusable. That could be because
 it is impenetrable, or because it is unusable. I'm not sure it matters which.
 
 We need a working example to ensure that the API meets the needs of a wide
 section of users and if it does not, to give other users a chance to request
 changes to the API so that it becomes usable. The window for such feedback
 is approaching zero very quickly now and we need action.
 
 Thanks
 
 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Michael Paquier
On Tue, Dec 9, 2014 at 6:00 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 9 December 2014 at 17:17, Michael Paquier michael.paqu...@gmail.com 
 wrote:

 While re-looking at that, I just found that when selecting the
 relations that are reindexed for a schema we ignore materialized views,
 as the key scan is only done using 'r' as relkind. The patch attached
 fixes that.
 Here is an updated patch that also does the following:
 - Fix the regression test checking if the user has permissions on the
 schema (it was broken)
 - Silence the NOTICE messages of REINDEX by having client_min_messages
 set to WARNING (thoughts about having a plpgsql function doing
 consistency checks of relfilenode before and after reindex?)

 ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in
 the way it issues NOTICE messages.

 I'm inclined to simply remove the NOTICE messages, except when a
 REINDEX ... VERBOSE is requested.
OK. Perhaps that's not worth mentioning in the release notes, but some
users may be used to the old behavior. What about the other issues
(the incorrect regression test for permissions, and matviews)?
-- 
Michael




Re: [HACKERS] Casting issues with domains

2014-12-09 Thread Thomas Reiss
On 08/12/2014 16:18, Tom Lane wrote:
 Thomas Reiss thomas.re...@dalibo.com writes:
 postgres=# explain select * from test2 where a='toto';
 QUERY PLAN
 --
  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
Filter: (((a)::tstdom)::text = 'toto'::text)
 (2 rows)
 
 As you can see, a is cast to tstdom and then again to text. These casts
 prevent the optimizer from choosing an index scan to retrieve the data. The
 casts are however strictly equivalent and should not prevent the
 optimizer from using indexes.
 
 No, they are not equivalent.  The optimizer can't simply drop the
 cast-to-domain, because that cast might result in a runtime error due
 to a domain CHECK constraint violation.  (This is true even if no such
 constraint exists at planning time, unfortunately.  If we had a
 mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe the
 no-constraints case could be handled better, but we don't; and adding
 one would also imply adding more locks around domain usage, so it's not
 all that attractive to do it.)
 
 The short answer is that SQL domains are not zero-cost type aliases.
 Perhaps there would be value in having a feature that *is* a zero-cost
 alias, but it wouldn't be a domain.

I agree regarding the feature for zero-cost aliases. It would ease
access to the catalogs via the information_schema, for example.

Thanks for your answer. There's some room for improvement for sure, but
it is not as easy as it seems.
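
For anyone who wants to experiment with this later, here is a guess at
a setup reproducing the plan shown above (the original DDL was not
posted, so the exact definitions are an assumption):

CREATE DOMAIN tstdom AS text;
CREATE TABLE test1 (a text);
CREATE INDEX ON test1 (a);
CREATE VIEW test2 AS SELECT a::tstdom AS a FROM test1;
EXPLAIN SELECT * FROM test2 WHERE a = 'toto';
--  Seq Scan on test1
--    Filter: (((a)::tstdom)::text = 'toto'::text)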

Regards,
Thomas





Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Michael Paquier
On Tue, Dec 9, 2014 at 6:43 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 OK. Perhaps that's not worth mentioning in the release notes, but some
 users may be used to the old behavior. What about the other issues
 (regression test for permissions incorrect and matviews)?
Here is in any case an updated patch to fix all those things rebased
on latest HEAD (ae4e688).
Thanks,
-- 
Michael
From 4cc862ed78b3537438e257aa24a5d0ee1479eb24 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Tue, 9 Dec 2014 16:40:39 +0900
Subject: [PATCH] Fix two bugs in REINDEX SCHEMA: regression and matviews

The following issues are fixed:
- The key scan was using a filter on the relation's relkind, but that's not
actually necessary, as a filter is applied when building the list of OIDs
to reindex.
- The regression test checking permissions on the reindexed schema was broken.
---
 src/backend/commands/indexcmds.c   |  8 ++--
 src/test/regress/expected/create_index.out | 15 +++
 src/test/regress/sql/create_index.sql  |  8 
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index cf4de72..6711a66 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1867,16 +1867,12 @@ ReindexObject(const char *objectName, ReindexObjectType objectKind)
 	 */
 	if (objectKind == REINDEX_OBJECT_SCHEMA)
 	{
-		scan_keys = palloc(sizeof(ScanKeyData) * 2);
+		scan_keys = palloc(sizeof(ScanKeyData));
 		ScanKeyInit(&scan_keys[0],
 	Anum_pg_class_relnamespace,
 	BTEqualStrategyNumber, F_OIDEQ,
 	ObjectIdGetDatum(objectOid));
-		ScanKeyInit(&scan_keys[1],
-	Anum_pg_class_relkind,
-	BTEqualStrategyNumber, F_CHAREQ,
-	'r');
-		num_keys = 2;
+		num_keys = 1;
 	}
 	else
 		num_keys = 0;
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index eba14e2..0d0a5ca 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2837,24 +2837,23 @@ explain (costs off)
 REINDEX SCHEMA schema_to_reindex; -- failure, schema does not exist
 ERROR:  schema "schema_to_reindex" does not exist
 CREATE SCHEMA schema_to_reindex;
-CREATE TABLE schema_to_reindex.table1(col1 SERIAL PRIMARY KEY);
-CREATE TABLE schema_to_reindex.table2(col1 SERIAL PRIMARY KEY, col2 VARCHAR(100) NOT NULL);
-CREATE INDEX ON schema_to_reindex.table2(col2);
+CREATE TABLE schema_to_reindex."table"(col1 SERIAL PRIMARY KEY);
+CREATE MATERIALIZED VIEW schema_to_reindex.matview AS SELECT col1 FROM schema_to_reindex."table";
+CREATE INDEX ON schema_to_reindex.matview(col1);
 REINDEX SCHEMA schema_to_reindex;
 BEGIN;
 REINDEX SCHEMA schema_to_reindex; -- failure, cannot run in a transaction
 ERROR:  REINDEX SCHEMA cannot run inside a transaction block
 END;
 -- Failure for unauthorized user
-CREATE ROLE reindexuser login;
+CREATE ROLE user_reindex login;
 SET SESSION ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 REINDEX SCHEMA schema_to_reindex;
+ERROR:  must be owner of schema schema_to_reindex
 -- Clean up
 RESET ROLE;
 DROP ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 DROP SCHEMA schema_to_reindex CASCADE;
 NOTICE:  drop cascades to 2 other objects
-DETAIL:  drop cascades to table schema_to_reindex.table1
-drop cascades to table schema_to_reindex.table2
+DETAIL:  drop cascades to table schema_to_reindex."table"
+drop cascades to materialized view schema_to_reindex.matview
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 1cd57da..5b6c9e6 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -970,16 +970,16 @@ explain (costs off)
 --
 REINDEX SCHEMA schema_to_reindex; -- failure, schema does not exist
 CREATE SCHEMA schema_to_reindex;
-CREATE TABLE schema_to_reindex.table1(col1 SERIAL PRIMARY KEY);
-CREATE TABLE schema_to_reindex.table2(col1 SERIAL PRIMARY KEY, col2 VARCHAR(100) NOT NULL);
-CREATE INDEX ON schema_to_reindex.table2(col2);
+CREATE TABLE schema_to_reindex."table"(col1 SERIAL PRIMARY KEY);
+CREATE MATERIALIZED VIEW schema_to_reindex.matview AS SELECT col1 FROM schema_to_reindex."table";
+CREATE INDEX ON schema_to_reindex.matview(col1);
 REINDEX SCHEMA schema_to_reindex;
 BEGIN;
 REINDEX SCHEMA schema_to_reindex; -- failure, cannot run in a transaction
 END;
 
 -- Failure for unauthorized user
-CREATE ROLE reindexuser login;
+CREATE ROLE user_reindex login;
 SET SESSION ROLE user_reindex;
 REINDEX SCHEMA schema_to_reindex;
 
-- 
2.2.0




Re: [HACKERS] committsdesc.c not ignored in contrib/pg_xlogdump

2014-12-09 Thread Alvaro Herrera
Michael Paquier wrote:
 Hi all,
 
 As mentioned in $subject, we are missing an entry in pg_xlogdump's .gitignore.

Thanks, pushed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] On partitioning

2014-12-09 Thread Alvaro Herrera
Josh Berkus wrote:

Hi,

 Pardon me for jumping into this late.  In general, I like Alvaro's
 approach.

Please don't call this Alvaro's approach as I'm not involved in this
anymore.  Amit Langote has taken ownership of it now.  While some
resemblance to what I originally proposed might remain, I haven't kept
track of how this has evolved and this might be a totally different
thing now.  Or not.

Anyway I just wanted to comment on a single point:

 6. Unique Index Problem
 Cannot create a unique index across multiple partitions, which prevents
 the partitioned table from being FK'd.
 Not Addressed
 (but could be addressed in the future)

I think it's unlikely that we will ever create a unique index that spans
all the partitions, actually.  Even if there are some wild ideas on how
to implement such a thing, the number of difficult issues that no one
knows how to attack seems too large.  I would perhaps be thinking in
allowing foreign keys to be defined on column sets that are prefixed by
partition keys; unique indexes must exist on all partitions on the same
columns including the partition keys.  (Perhaps make an extra exception
that if a partition allows a single value for the partition column, that
column need not be part of the unique index.)
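
To make that concrete with a sketch against today's inheritance-based
partitioning (table and column names are invented): when the partition
key is part of the unique key and the partition constraints do not
overlap, per-partition unique indexes already give global uniqueness on
that column set:

CREATE TABLE lineitems (order_date date, order_id int);
CREATE TABLE lineitems_2000 (CHECK (order_date >= '2000-01-01' AND
                                    order_date <  '2001-01-01'))
    INHERITS (lineitems);
CREATE TABLE lineitems_2001 (CHECK (order_date >= '2001-01-01' AND
                                    order_date <  '2002-01-01'))
    INHERITS (lineitems);
-- each index enforces uniqueness within one partition; the
-- non-overlapping CHECK constraints extend it across partitions
CREATE UNIQUE INDEX ON lineitems_2000 (order_date, order_id);
CREATE UNIQUE INDEX ON lineitems_2001 (order_date, order_id);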

 10. Scaling Problem
 Inheritance partitioning becomes prohibitively slow for the planner at
 somewhere between 100 and 500 partitions depending on various factors.
 No idea?

At least it was my intention to make the system scale to a huge number of
partitions, but this requires some forward thinking (such as avoiding
loading the index list of all of them, or even opening all of them, at
the planner stage) and I think that would be defeated if we want to keep
all the generality of the inheritance-based approach.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] On partitioning

2014-12-09 Thread Alvaro Herrera
Amit Kapila wrote:
 On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas robertmh...@gmail.com wrote:
  On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund and...@2ndquadrant.com
 wrote:
   I don't think that's mutually exclusive with the idea of
   partitions-as-tables.  I mean, you can add code to the ALTER TABLE
   path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
   wherever you want.
  
   That'll be a lot of places you'll need to touch. More fundamentally: Why
   should we name something a table that's not one?
 
  Well, I'm not convinced that it isn't one.  And adding a new relkind
  will involve a bunch of code churn, too.  But I don't much care to
  pre-litigate this: when someone has got a patch, we can either agree
  that the approach is OK or argue that it is problematic because X.  I
  think we need to hammer down the design in broad strokes first, and
  I'm not sure we're totally there yet.
 
 That's right. I think at this point defining the top-level behaviour/design
 is very important to proceed; we can decide on the better
 implementation approach afterwards (maybe once an initial patch is ready,
 because it might not be major work to do it either way).  So here's where
 we are on this point so far, as per my understanding: I think that direct
 operations should be prohibited on partitions, you think that they should be
 allowed, and Andres thinks that it might be better to allow direct operations
 on partitions for reads.

FWIW in my original proposal I was rejecting some things that after
further consideration turn out to be possible to allow; for instance
directly referencing individual partitions in COPY.  We could allow
something like

COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
or maybe
COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT

and this would emit the whole partition for year 2000 of table
lineitems, and only that (the value is just computed on the fly to fit
the partitioning constraints for that individual partition).  Then
pg_dump would be able to dump each and every partition separately.

In a similar way we could have COPY FROM allow input into individual
partitions so that such a dump can be restored in parallel for each
partition.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Small TRUNCATE glitch

2014-12-09 Thread Alex Shulgin
Bruce Momjian br...@momjian.us writes:

 Added to TODO:

 o Clear table counters on TRUNCATE

   http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php

Hello,

Attached is a WIP patch for this TODO.

From 97665ef1ca7d1847e90d4dfab38562135f01fb2b Mon Sep 17 00:00:00 2001
From: Alex Shulgin a...@commandprompt.com
Date: Tue, 9 Dec 2014 16:35:14 +0300
Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats.

The n_live_tup and n_dead_tup counters need to be set to 0 after a
TRUNCATE on the relation.  We can't issue a special message to the stats
collector because the xact might later be aborted, so we track the fact
that the relation was truncated during the xact (and reset this xact's
insert/update/delete counters).  When the xact is committed, we use the
`truncated' flag to reset the n_live_tup and n_dead_tup counters.
---
 src/backend/commands/tablecmds.c |  2 ++
 src/backend/postmaster/pgstat.c  | 70 
 src/include/pgstat.h |  3 ++
 3 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index 1e737a0..192d033
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ExecuteTruncate(TruncateStmt *stmt)
*** 1224,1229 
--- 1224,1231 
  			 */
  			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
  		}
+ 
+ 		pgstat_count_heap_truncate(rel);
  	}
  
  	/*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index c7f41a5..7ff66b5
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** typedef struct TwoPhasePgStatRecord
*** 200,208 
  	PgStat_Counter tuples_updated;		/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	bool		t_shared;		/* is it a shared catalog? */
  } TwoPhasePgStatRecord;
  
  /*
   * Info about current snapshot of stats file
   */
--- 200,211 
  	PgStat_Counter tuples_updated;		/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	char		t_flags;		/* see TWOPHASE_PGSTAT_RECORD_*_FLAGs */
  } TwoPhasePgStatRecord;
  
+ #define TWOPHASE_PGSTAT_RECORD_SHARED_FLAG	0x01	/* is it a shared catalog? */
+ #define TWOPHASE_PGSTAT_RECORD_TRUNC_FLAG	0x02	/* was the relation truncated? */
+ 
  /*
   * Info about current snapshot of stats file
   */
*** pgstat_count_heap_delete(Relation rel)
*** 1864,1869 
--- 1867,1896 
  }
  
  /*
+  * pgstat_count_heap_truncate - update tuple counters due to truncate
+  */
+ void
+ pgstat_count_heap_truncate(Relation rel)
+ {
+ 	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ 
+ 	if (pgstat_info != NULL)
+ 	{
+ 		/* We have to log the effect at the proper transactional level */
+ 		int			nest_level = GetCurrentTransactionNestLevel();
+ 
+ 		if (pgstat_info->trans == NULL ||
+ 			pgstat_info->trans->nest_level != nest_level)
+ 			add_tabstat_xact_level(pgstat_info, nest_level);
+ 
+ 		pgstat_info->trans->tuples_inserted = 0;
+ 		pgstat_info->trans->tuples_updated = 0;
+ 		pgstat_info->trans->tuples_deleted = 0;
+ 		pgstat_info->trans->truncated = true;
+ 	}
+ }
+ 
+ /*
   * pgstat_update_heap_dead_tuples - update dead-tuples count
   *
   * The semantics of this are that we are reporting the nontransactional
*** AtEOXact_PgStat(bool isCommit)
*** 1927,1932 
--- 1954,1961 
  			tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
  			if (isCommit)
  			{
+ tabstat->t_counts.t_truncated = trans->truncated;
+ 
  /* insert adds a live tuple, delete removes one */
  tabstat->t_counts.t_delta_live_tuples +=
  	trans->tuples_inserted - trans->tuples_deleted;
*** AtEOSubXact_PgStat(bool isCommit, int ne
*** 1991,1999 
  			{
  if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  {
! 	trans->upper->tuples_inserted += trans->tuples_inserted;
! 	trans->upper->tuples_updated += trans->tuples_updated;
! 	trans->upper->tuples_deleted += trans->tuples_deleted;
  	tabstat->trans = trans->upper;
  	pfree(trans);
  }
--- 2020,2039 
  			{
  if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  {
! 	if (trans->truncated)
! 	{
! 		trans->upper->truncated = true;
! 		/* replace upper xact stats with ours */
! 		trans->upper->tuples_inserted = trans->tuples_inserted;
! 		trans->upper->tuples_updated = trans->tuples_updated;
! 		trans->upper->tuples_deleted = trans->tuples_deleted;
! 	}
! 	else
! 	{
! 		trans->upper->tuples_inserted += trans->tuples_inserted;
! 		trans->upper->tuples_updated += trans->tuples_updated;
! 		trans->upper->tuples_deleted += trans->tuples_deleted;
! 	}
  	tabstat->trans = trans->upper;
  	pfree(trans);
  }
*** 

Re: [HACKERS] No documentation on record *= record?

2014-12-09 Thread Kevin Grittner
Jim Nasby jim.na...@bluetreble.com wrote:

 There doesn't seem to be documentation on *= (or search isn't
 finding it). Is this intentional?

http://www.postgresql.org/docs/9.4/static/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON
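
For a quick illustration of how *= (identical) differs from plain =
(equal, which uses ordinary comparison operators and SQL null
semantics): *= treats nulls as equal and compares the values' internal
representations:

SELECT ROW(1, NULL::int) =  ROW(1, NULL::int);  -- null
SELECT ROW(1, NULL::int) *= ROW(1, NULL::int);  -- true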

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2014-12-09 Thread Robert Haas
On Tue, Dec 9, 2014 at 12:46 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I agree with this.  For a first version, I think it's OK to start a
 worker up for a particular sequential scan and have it help with that
 sequential scan until the scan is completed, and then exit.  It should
 not, as the present version of the patch does, assign a fixed block
 range to each worker; instead, workers should allocate a block or
 chunk of blocks to work on until no blocks remain.  That way, even if
 every worker but one gets stuck, the rest of the scan can still
 finish.

 I will check on this point and see if it is feasible to do something along
 those lines. Basically, currently at the Executor initialization phase we
 set the scan limits and then during the Executor run phase we use
 heap_getnext to fetch the tuples accordingly, but doing it dynamically
 means that at the ExecutorRun phase we need to reset the scan limit for
 which page/pages to scan; I still have to check if there is any problem
 with such an idea.  Do you have any different idea in mind?

Hmm.  Well, it looks like there are basically two choices: you can
either (as you propose) deal with this above the level of the
heap_beginscan/heap_getnext API by scanning one or a few pages at a
time and then resetting the scan to a new starting page via
heap_setscanlimits; or alternatively, you can add a callback to
HeapScanDescData that, if non-NULL, will be invoked to get the next
block number to scan.  I'm not entirely sure which is better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] alter user set local_preload_libraries.

2014-12-09 Thread Robert Haas
On Mon, Dec 8, 2014 at 9:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Barring someone committing to spend the time to improve that situation
 (time that would be poorly invested IMO), I don't think that we want to
 open up ignore_system_indexes as USERSET, or do anything else to encourage
 its use.

 If we're intent on removing PGC_BACKEND then I'd be okay with
 reclassifying ignore_system_indexes as SUSET; but I'm not exactly
 convinced that we should be trying to get rid of PGC_BACKEND.

Well, if you want to discourage its use, I think there's an argument
that marking it as SUSET would be more restrictive than what we have
at present, since it would altogether prohibit non-superuser use.

I'm not wedded to the idea of getting rid of PGC_BACKEND, but I do
like it.  Peter's survey of the landscape seems to show that there's
very little left in that category and the stuff that is there is
somewhat uninspiring.  And simplifying things is always nice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in
 the way it issues NOTICE messages.

 I'm inclined to simply remove the NOTICE messages, except when a
 REINDEX ... VERBOSE is requested.

My recollection is that those other commands do issue messages always,
but at some DEBUGn log level when not VERBOSE.  Ideally REINDEX would
adopt that same behavior.

regards, tom lane




Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Sawada Masahiko
On Tue, Dec 9, 2014 at 11:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in
 the way it issues NOTICE messages.

 I'm inclined to simply remove the NOTICE messages, except when a
 REINDEX ... VERBOSE is requested.

 My recollection is that those other commands do issue messages always,
 but at some DEBUGn log level when not VERBOSE.  Ideally REINDEX would
 adopt that same behavior.


So it seems that the REINDEX command should issue messages at INFO level
when the VERBOSE option is specified.
If it is not specified, it should issue messages at DEBUG2.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Fabrízio de Royes Mello
On Tue, Dec 9, 2014 at 10:21 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Tue, Dec 9, 2014 at 6:43 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  OK. Perhaps that's not worth mentioning in the release notes, but some
  users may be used to the old behavior. What about the other issues
  (regression test for permissions incorrect and matviews)?
 Here is in any case an updated patch to fix all those things rebased
 on latest HEAD (ae4e688).


The patch is fine:
- No compiler warnings
- Regression tests added properly and run OK

There are no changes in the docs. Maybe we can mention matviews in the
REINDEX SCHEMA docs; what do you think?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


[HACKERS] logical column ordering

2014-12-09 Thread Alvaro Herrera
So I've been updating my very old patch to allow logical and physical
column reordering.  Here's a WIP first cut for examination.  There are
plenty of rough edges here; most importantly there is no UI at all for
column reordering other than direct UPDATEs of pg_attribute, which most
likely falls afoul of cache invalidation problems.  For now I'm focusing
on correct processing when columns are moved logically; I haven't yet
started to see how to move columns physically, but I think that part is
much more localized than the logical one.

Just as a reminder, this effort is an implementation of ideas that have
been discussed previously; in particular, see these threads:

http://www.postgresql.org/message-id/20414.1166719...@sss.pgh.pa.us (2006)
http://www.postgresql.org/message-id/6843.1172126...@sss.pgh.pa.us (2007)
http://www.postgresql.org/message-id/23035.1227659...@sss.pgh.pa.us (2008)

To recap, this is based on the idea of having three numbers for each
attribute rather than a single attnum; the first of these is attnum (a
number that uniquely identifies an attribute since its inception and may
or may not have any relationship to its storage position and the place
it expands to through user interaction).  The second is attphysnum,
which indicates where it is stored in the physical structure.  The third
is attlognum, which indicates where it expands in *, where its values
must be placed in COPY or VALUES lists, etc. --- the logical position
as the user sees it.
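
For illustration, the intended effect can be seen through the
direct-UPDATE non-UI mentioned above (a sketch of the semantics, not a
supported interface):

CREATE TABLE quux (a int, b int, c int);
-- swap the logical positions of a and c; attnum (identity) and
-- attphysnum (storage) stay untouched
UPDATE pg_attribute SET attlognum = 3
 WHERE attrelid = 'quux'::regclass AND attname = 'a';
UPDATE pg_attribute SET attlognum = 1
 WHERE attrelid = 'quux'::regclass AND attname = 'c';
SELECT * FROM quux;	-- now expands in the order c, b, a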

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as * (in select *) are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine for tuple descriptors,
TupleDescGetSortedAttrs(), which computes a new Attribute array and
caches it in the TupleDesc for later use; this array points to the
same elements as the normal attribute list but is ordered by attlognum.

Additionally there are a number of places that iterate on such target
lists and use the iterator as the attribute number; those were modified
to have a separate attribute number as attnum within the loop.

Another place that needs tweaking is heapam.c, which must construct a
physical tuple from Datum/nulls arrays (heap_form_tuple).  In some cases
the input arrays are sorted in logical column order.  I have opted to
add a flag that indicates whether the array is in logical order; if it
is the routines compute the correct physical order.  (Actually as I
mentioned above I still haven't made any effort to make sure they work
in the case that attnum differs from attphysnum, but this should be
reasonably contained changes.)


The part where I stopped just before sending the current state is this
error message:

alvherre=# select * from quux where (a,c) in ( select a,c from quux );
select * from quux where (a,c) in ( select a,c from quux );
ERROR:  failed to find unique expression in subplan tlist

I'm going to see about it while I get feedback on the rest of this patch; in
particular, extra test cases that fail to work when columns have been
moved around are welcome, so that I can add them to the regress test.
What I have now is the basics I'm building as I go along.  The
regression tests show examples of some logical column renumbering (which
can be done after the table already contains some data) but none of
physical column renumbering (which can only be done when the table is
completely empty.)  My hunch is that the sample foo, bar, baz, quux
tables should present plenty of opportunities to display brokenness in
the planner and executor.


PS: Phil Currier allegedly had a patch back in 2007-2008 that did this,
or something very similar ... though he never posted a single bit of it,
and then he vanished without a trace.  If he's still available it would
be nice to see his WIP patch, even if outdated, as it might serve as
inspiration and let us know what other places need tweaking.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 18ae318..54473be 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2368,6 +2368,9 @@ get_attnum_pk_pos(int *pkattnums, int pknumatts, int key)
 	return -1;
 }
 
+/*
+ * FIXME this probably needs to be tweaked.
+ */
 static HeapTuple
 get_tuple_of_interest(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals)
 {
diff --git a/contrib/spi/timetravel.c b/contrib/spi/timetravel.c
index a37cbee..d7e6e50 100644
--- a/contrib/spi/timetravel.c
+++ b/contrib/spi/timetravel.c
@@ -314,6 +314,7 @@ timetravel(PG_FUNCTION_ARGS)
 		Oid		   *ctypes;
 		char		sql[8192];
 		char		separ = ' ';
+		Form_pg_attribute *attrs;
 
 		/* allocate ctypes for preparation */
 		ctypes = (Oid *) palloc(natts * sizeof(Oid));
@@ -322,10 +323,11 @@ timetravel(PG_FUNCTION_ARGS)
 		 * Construct query: INSERT INTO _relation_ 

Re: [HACKERS] On partitioning

2014-12-09 Thread Josh Berkus
On 12/09/2014 12:17 AM, Amit Langote wrote:
 Now if user wants to define multi-column Partition based on
  monthly_salary and annual_salary, how do we want him to
  specify the values.  Basically how to distinguish which values
  belong to first column key and which one's belong to second
  column key.
 
 Perhaps you are talking about syntactic difficulties that I totally missed 
 in my other reply to this mail?
 
 Can we represent the same data by rather using a subpartitioning scheme? 
 ISTM, semantics would remain the same.
 
 ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?

... or just use arrays.

PARTITION BY LIST ( monthly_salary, annual_salary )
PARTITION salary_small VALUES ({[300,400],[5000,6000]})
) 

... but that begs the question of how partition by list over two columns
(or more) would even work?  You'd need an a*b number of partitions, and
the user would be pretty much certain to miss a few value combinations.
 Maybe we should just restrict list partitioning to a single column for
a first release, and wait and see if people ask for more?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] moving from contrib to bin

2014-12-09 Thread Josh Berkus

 Here are the contrib programs:
 
 oid2name
 pg_archivecleanup
 pg_standby
 pg_test_fsync
 pg_test_timing
 pg_upgrade
 pg_xlogdump
 pgbench
 vacuumlo
 
 The proposal would basically be to mv contrib/$x src/bin/$x and also
 move the reference pages in the documentation.

+1

Considering that all of the above have been around for a while, it's
kind of silly that they're still in contrib.  Mostly that just
guarantees that nobody will use them, even when it's appropriate.

The one exception I might make above is pg_standby.  What do we need
this for today, exactly?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] moving from contrib to bin

2014-12-09 Thread Robert Haas
On Mon, Dec 8, 2014 at 10:26 PM, Peter Eisentraut pete...@gmx.net wrote:
 Let's take another crack at moving stuff out of contrib.  Nobody likes
 contrib.  The task is only finding something that most people like better.

I like contrib fine.  It's a great place for things to live that are
not quite baked enough for core, but still worth distributing.

 oid2name
 pg_archivecleanup
 pg_standby
 pg_test_fsync
 pg_test_timing
 pg_upgrade
 pg_xlogdump
 pgbench
 vacuumlo

 The proposal would basically be to mv contrib/$x src/bin/$x and also
 move the reference pages in the documentation.

I think pg_archivecleanup, pg_upgrade, and possibly pgbench have
enough general utility to justify putting them in src/bin, but not the
rest.  oid2name, pg_standby, and vacuumlo are probably closer to being
candidates for removal than promotion in my book.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Testing DDL deparsing support

2014-12-09 Thread Bruce Momjian
On Mon, Dec  8, 2014 at 12:43:36PM -0500, Robert Haas wrote:
 On Sat, Dec 6, 2014 at 10:43 PM, Bruce Momjian br...@momjian.us wrote:
  This causes creation DDL to be checked if it is used in the regression
  database, but what about ALTER and DROP?  pg_dump doesn't issue those,
  except in special cases like inheritance.
 
 The proposed testing mechanism should cover any ALTER commands that
 are in the regression tests provided that those objects are not
 subsequently dropped -- because if the ALTER commands aren't replayed
 properly, then the later pg_dump won't produce the same output.
 
 There probably are some gaps in our current regression tests in this
 area, but that's probably a good thing to fix regardless of this.

OK, I understand now that the ALTERs are being passed to the slave and
we can then test that against pg_dump --- sounds good.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] WIP: multivariate statistics / proof of concept

2014-12-09 Thread Tomas Vondra
On 8.12.2014 02:01, Michael Paquier wrote:
 On Sun, Nov 16, 2014 at 3:35 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Thanks for the link. I've been looking for a good dataset with such
 data, and this one is by far the best one.

 The current version of the patch supports only data types passed by
 value (i.e. no varlena types: text, ...), which means it's impossible to
 build multivariate stats on some of the interesting columns (state,
 city, ...).

 I guess it's time to start working on removing this limitation.
 Tomas, what's your status on this patch? Are you planning to make it
 more complicated than it is? For now I have switched it to a Needs
 Review state because even your first version did not get advanced
 review (that's quite big btw). I guess that we should switch it to the
 next CF.

Hello Michael,

I agree with moving the patch to the next CF - I'm working on the patch,
but I will take a bit more time to submit a new version and I can do
that in the next CF.

regards
Tomas




Re: [HACKERS] advance local xmin more aggressively

2014-12-09 Thread Robert Haas
On Mon, Dec 8, 2014 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 8, 2014 at 4:56 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I don't immediately see the problem either, but I have to say that
 grovelling through all the resource owners seems ugly anyway. Resource
 owners are not meant to be traversed like that. And there could be a lot of
 them, and you have to visit every one of them. That could get expensive if
 there are a lot of resource owners.

 1. I don't really see why resource owners shouldn't be traversed like
 that.  They are clearly intended to form a hierarchy, and there's
 existing code that recurses through the hierarchy from a given level
 downward.  What's ugly about that?

Here's a patch.  I looked at the issue of tracking parent-less
resource owners a bit more closely.  It turns out that resource owners
are always created in TopMemoryContext since they should only be
freed explicitly (cf. resowner.c).  I was a bit worried about that,
because it would be bad to keep a list of all parent-less resource
owners if list elements could vanish in a context reset.  But that
doesn't seem to be a concern.  So I stole the nextchild links of
parent-less resource owners, which are not used for anything
currently, to keep a list of such resource owners.

It occurred to me that there's probably not much value in recomputing
xmin when the active snapshot stack is non-empty.  It's not impossible
that a PL/pgsql function could close a cursor with an old xmin and
then do lots of other work (or just sleep for a long time) before
returning to the top-level, but it is pretty unlikely.  So the
attached patch only considers recomputing the advertised xmin when the
active snapshot stack is empty.  That'll happen at the end of each
statement, which seems soon enough.
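
To make the intended effect concrete, this is the kind of scenario the
patch helps (a sketch; pg_stat_activity.backend_xmin shows the xmin a
backend advertises, and some_table is any table):

BEGIN;
DECLARE c1 CURSOR FOR SELECT * FROM some_table;	-- older snapshot
-- ... time passes, other transactions commit ...
DECLARE c2 CURSOR FOR SELECT * FROM some_table;	-- newer snapshot
CLOSE c1;
-- previously the advertised xmin stayed pinned at c1's snapshot until
-- every snapshot was gone; with the patch it can advance to c2's xmin
-- at the next statement boundary
SELECT backend_xmin FROM pg_stat_activity WHERE pid = pg_backend_pid();
COMMIT;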

Upthread, I suggested keeping a tally of the number of snapshots with
the advertised xmin and recomputing the xmin to advertise only when it
reaches 0.  This patch doesn't implement that optimization, but
it does have code that aborts the traversal of the resource owner
hierarchy as soon as we see an xmin that will preclude advancing our
advertised xmin.  Releasing N resource owners could therefore cost
O(N^2) in the worst case, but note that releasing N resource owners is
*already* an O(N^2) operation in the worst case, because the list of
children of a particular parent resource owner is singly linked, and
thus deleting a resource owner is O(N).  It's been that way for an
awfully long time without anyone complaining, probably because (a)
it's not particularly common to have large numbers of cursors open
simultaneously and (b) even if you do have that, the constant factor
is pretty low.

Following Heikki's previous suggestion, the patch contains checks to
make sure that we find exactly the number of registered snapshots that
we expect to find.  We could consider demoting these to asserts or
something, but this is more likely to catch bugs if there are any.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 0955bcc..186fa91 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -113,6 +113,7 @@ typedef struct ResourceOwnerData
 ResourceOwner CurrentResourceOwner = NULL;
 ResourceOwner CurTransactionResourceOwner = NULL;
 ResourceOwner TopTransactionResourceOwner = NULL;
+ResourceOwner FirstRootResourceOwner = NULL;
 
 /*
  * List of add-on callbacks for resource releasing
@@ -167,6 +168,11 @@ ResourceOwnerCreate(ResourceOwner parent, const char *name)
  		owner->nextchild = parent->firstchild;
  		parent->firstchild = owner;
 	}
+	else
+	{
+		owner->nextchild = FirstRootResourceOwner;
+		FirstRootResourceOwner = owner;
+	}
 
 	return owner;
 }
@@ -442,6 +448,8 @@ ResourceOwnerDelete(ResourceOwner owner)
 	 * the owner tree.  Better a leak than a crash.
 	 */
 	ResourceOwnerNewParent(owner, NULL);
+	Assert(FirstRootResourceOwner == owner);
+	FirstRootResourceOwner = owner->nextchild;
 
 	/* And free the object. */
  	if (owner->buffers)
@@ -502,6 +510,14 @@ ResourceOwnerNewParent(ResourceOwner owner,
 			}
 		}
 	}
+	else
+	{
+		ResourceOwner *link = &FirstRootResourceOwner;
+
+		while (*link != owner)
+			link = &((*link)->nextchild);
+		*link = owner->nextchild;
+	}
 
 	if (newparent)
 	{
@@ -513,7 +529,8 @@ ResourceOwnerNewParent(ResourceOwner owner,
 	else
 	{
  		owner->parent = NULL;
-		owner->nextchild = NULL;
+		owner->nextchild = FirstRootResourceOwner;
+		FirstRootResourceOwner = owner;
 	}
 }
 
@@ -1161,6 +1178,50 @@ ResourceOwnerForgetSnapshot(ResourceOwner owner, Snapshot snapshot)
 }
 
 /*
+ * Invoke a caller-supplied function on every snapshot registered with this
+ * resource owner or any descendent resource owner.  The callback can abort
+ * the traversal by returning true.
+ */
+bool
+ResourceOwnerWalkSnapshots(ResourceOwner owner,
+	

Re: [HACKERS] moving from contrib to bin

2014-12-09 Thread Alvaro Herrera
Peter Eisentraut wrote:

 Here are the contrib programs:
 
 oid2name
 pg_archivecleanup
 pg_standby
 pg_test_fsync
 pg_test_timing
 pg_upgrade
 pg_xlogdump
 pgbench
 vacuumlo
 
 The proposal would basically be to mv contrib/$x src/bin/$x and also
 move the reference pages in the documentation.

Maybe it makes sense to have a distinction between client programs and
server programs.  Can we have src/sbin/ and move stuff that involves the
server side in there?  I think that'd be pg_xlogdump, pg_archivecleanup,
pg_upgrade, pg_test_timing, pg_test_fsync.  (If we were feeling bold we
could also move pg_resetxlog, pg_controldata and initdb there.)

(For pg_upgrade you also need to do something about pg_upgrade_support,
which is good because that is one very ugly crock.)

I agree that oid2name and vacuumlo need to be in a better state to
deserve their promotion to src/bin, if we keep them at all.

In any case I support the move out of contrib of everything except
vacuumlo and oid2name, for reasons already stated by others in the
thread.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Small TRUNCATE glitch

2014-12-09 Thread Alex Shulgin

Alex Shulgin a...@commandprompt.com writes:

 Bruce Momjian br...@momjian.us writes:

 Added to TODO:

 o Clear table counters on TRUNCATE

   http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php

 Hello,

 Attached is a WIP patch for this TODO.

This part went as an attachment, which wasn't my intent:


It does the trick by tracking if a TRUNCATE command was issued under a
(sub)transaction and uses this knowledge to reset the live/dead tuple
counters later if the transaction was committed.  Testing in simple
cases shows that this clears the counters correctly, including use of
savepoints.
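
To make that concrete, here is the behavior the patch is aiming for (a
sketch; the table name is made up, and the pg_stat numbers can lag a
little until the stats collector catches up):

CREATE TABLE trunc_stats_test (i int);
INSERT INTO trunc_stats_test SELECT generate_series(1, 1000);

BEGIN;
TRUNCATE trunc_stats_test;
ROLLBACK;   -- aborted: n_live_tup should still reflect the 1000 rows

BEGIN;
TRUNCATE trunc_stats_test;
COMMIT;     -- committed: the live/dead tuple counters should reset to 0

SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'trunc_stats_test';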

The 2PC part requires extending the bool flag to fit the trunc flag; is this
approach sane?  Given that a 2PC transaction should survive server
restart, it's reasonable to expect it to also survive an upgrade, so I
see no clean way of adding another bool field to the
TwoPhasePgStatRecord struct (unless someone would like to add checks on
record length, etc.).

I'm going to add some regression tests, but I'm not sure what the best
location for them would be.  truncate.sql seems like the natural choice, but
stats are not updated in real time, so I'd need to borrow some tricks from
stats.sql; or would it be better to put the new tests in stats.sql itself?

--
Alex




Re: [HACKERS] moving from contrib to bin

2014-12-09 Thread Bruce Momjian
On Tue, Dec  9, 2014 at 06:10:02PM -0300, Alvaro Herrera wrote:
 (For pg_upgrade you also need to do something about pg_upgrade_support,
 which is good because that is one very ugly crock.)

FYI, pg_upgrade_support was segregated from pg_upgrade only because we
wanted separate binary and shared object build/install targets.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] GSSAPI, SSPI - include_realm default

2014-12-09 Thread Peter Eisentraut
On 12/5/14 1:06 PM, Stephen Frost wrote:
 I suggest we also backpatch some documentation suggesting that people
  manually change the include_realm parameter (perhaps also with a note
  saying that the default will change in 9.5).
 I'll work on a patch for back-branches if everyone is alright with this
 patch against master.

I don't think backpatching this is necessary or appropriate.

First of all, this isn't even released, and it might very well change
again later.  The right time to publicly notify about this change is not
before 9.5 is released.

Also, it's not like people keep re-reading the old documentation in
order to get updated advice.  It might very well be confusing if stable
documentation changes because of future events.  Users who are
interested in knowing about changes in future releases should read the
release notes of those future releases.

My comment that include_realm is supported back to 8.4 was because there
is an expectation that a pg_hba.conf file can be used unchanged across
several major releases.  So when 9.5 comes out and people update their
pg_hba.conf files for 9.5, those files will still work in old releases.
 But the time to do those updates is then, not now.





Re: [HACKERS] GSSAPI, SSPI - include_realm default

2014-12-09 Thread Magnus Hagander
On Dec 9, 2014 10:52 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 12/5/14 1:06 PM, Stephen Frost wrote:
  I suggest we also backpatch some documentation suggesting that people
   manually change the include_realm parameter (perhaps also with a note
   saying that the default will change in 9.5).
  I'll work on a patch for back-branches if everyone is alright with this
  patch against master.

 I don't think backpatching this is necessary or appropriate.

 First of all, this isn't even released, and it might very well change
 again later.  The right time to publicly notify about this change is not
 before 9.5 is released.

 Also, it's not like people keep re-reading the old documentation in
 order to get updated advice.  It might very well be confusing if stable
 documentation changes because of future events.  Users who are
 interested in knowing about changes in future releases should read the
 release notes of those future releases.

 My comment that include_realm is supported back to 8.4 was because there
 is an expectation that a pg_hba.conf file can be used unchanged across
 several major releases.  So when 9.5 comes out and people update their
 pg_hba.conf files for 9.5, those files will still work in old releases.
  But the time to do those updates is then, not now.


I thought the idea was to backpatch documentation saying "it's a good idea
to change this value to x because of y", not actually referring to the
upcoming change directly. And I still think that part is a good idea, as it
helps people avoid potential security pitfalls.

So not really a backpatch as such, rather a separate patch for the back
branches. (And people definitely reread the docs, since they deploy new
systems on the existing versions...)

/Magnus


[HACKERS] operator does not exist: character varying[] character[]

2014-12-09 Thread Jim Nasby

Is there any particular reason we don't allow comparing char and varchar 
arrays? If not I'll submit a patch.
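
For reference, a minimal reproduction of what I'm presumably hitting
(the scalar case is fine thanks to the implicit coercions to text):

SELECT 'a'::varchar = 'a'::char;                    -- works
SELECT ARRAY['a']::varchar[] = ARRAY['a']::char[];
-- ERROR:  operator does not exist: character varying[] = character[]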
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] operator does not exist: character varying[] character[]

2014-12-09 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 12/9/14, 4:19 PM, Jim Nasby wrote:
 Is there any particular reason we don't allow comparing char and varchar 
 arrays? If not I'll submit a patch.

 We're also missing operators on text and varchar arrays.

Adding operators would be an incorrect fix.

regards, tom lane




Re: [HACKERS] On partitioning

2014-12-09 Thread Jim Nasby

On 12/8/14, 5:19 PM, Josh Berkus wrote:

On 12/08/2014 02:12 PM, Jim Nasby wrote:

On 12/8/14, 12:26 PM, Josh Berkus wrote:

4. Creation Locking Problem
high probability of lock pile-ups whenever a new partition is created on
demand due to multiple backends trying to create the partition at the
same time.
Not Addressed?


Do users actually try and create new partitions during DML? That sounds
doomed to failure in pretty much any system...


There is no question that it would be easier for users to create
partitions on demand automatically.  Particularly if you're partitioning
by something other than time.  For a particular case, consider users on
RDS, which has no cron jobs for creating new partitions; it's on-demand
or manual.

It's quite possible that there is no good way to work out the locking
for on-demand partitions, but *if* we're going to have a 2nd
partition system, I think it's important to at least discuss the
problems with on-demand creation.


Yeah, we should discuss it. Perhaps the right answer here is our own job
scheduler, something a lot of folks want anyway.
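
For anyone not familiar with it, the on-demand pattern in question looks
roughly like this under today's inheritance scheme (a sketch; the table,
column, function, and trigger names are made up):

CREATE OR REPLACE FUNCTION measurement_route() RETURNS trigger AS $$
DECLARE
    part text := 'measurement_' || to_char(NEW.logdate, 'YYYYMM');
BEGIN
    BEGIN
        -- Two backends hitting a brand-new month will both run this,
        -- which is exactly where the creation-locking pile-ups come from.
        EXECUTE format('CREATE TABLE IF NOT EXISTS %I () INHERITS (measurement)',
                       part);
    EXCEPTION WHEN duplicate_table OR unique_violation THEN
        NULL;  -- another backend created it concurrently
    END;
    EXECUTE format('INSERT INTO %I SELECT ($1).*', part) USING NEW;
    RETURN NULL;  -- the row went to the child, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_route();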


11. Hash Partitioning
Some users would prefer to partition into a fixed number of
hash-allocated partitions.
Not Addressed.


Though, you should be able to do that in either system if you bother to
define your own hash in a BEFORE trigger...


That doesn't do you any good with the SELECT query, unless you change
your middleware to add a hash(column) to every query.  Which would be
really hard to do for joins.


A. COPY/ETL then attach
In inheritance partitioning, you can easily build a partition outside
the master and then attach it, allowing for minimal disturbance of
concurrent users.  Could be addressed in the future.


How much of the desire for this is because our current row routing
solutions are very slow? I suspect that's the biggest reason, and
hopefully Alvaro's proposal mostly eliminates it.


That doesn't always work, though.  In some cases the partition is being
built using some fairly complex logic (think of partitions which are
based on matviews) and there's no fast way to create the new data.
Again, this is an acceptable casualty of an improved design, but if it
will be so, we should consciously decide that.


Is there an example you can give here? If the scheme is that complicated I'm 
failing to see how you're supposed to do things like partition elimination.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] GSSAPI, SSPI - include_realm default

2014-12-09 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
 On 12/5/14 1:06 PM, Stephen Frost wrote:
  I suggest we also backpatch some documentation suggesting that people
   manually change the include_realm parameter (perhaps also with a note
   saying that the default will change in 9.5).
  I'll work on a patch for back-branches if everyone is alright with this
  patch against master.
 
 I don't think backpatching this is necessary or appropriate.

Sorry if that wasn't clear but the idea was to *just* backpatch the
documentation comments, not to change the default in back-branches.

 First of all, this isn't even released, and it might very well change
 again later.  The right time to publicly notify about this change is not
 before 9.5 is released.
 
 Also, it's not like people keep re-reading the old documentation in
 order to get updated advice.  It might very well be confusing if stable
 documentation changes because of future events.  Users who are
 interested in knowing about changes in future releases should read the
 release notes of those future releases.
 
 My comment that include_realm is supported back to 8.4 was because there
 is an expectation that a pg_hba.conf file can be used unchanged across
 several major releases.  So when 9.5 comes out and people update their
 pg_hba.conf files for 9.5, those files will still work in old releases.
  But the time to do those updates is then, not now.

The back-branches are being patched to discourage using the default
because it's not a secure approach.  New users start using PG all the
time and so changing the existing documentation is worthwhile to ensure
those new users understand.  A note in the release notes for whichever
minor release the change to the documentation shows up in would be a
good way to make existing users aware of the change and hopefully
encourage them to review their configuration.
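
To be clear about what the docs would steer people toward, the explicit
setting is just one option on the auth line (a hypothetical pg_hba.conf
entry; the connection fields and map name are placeholders):

host    all    all    0.0.0.0/0    gss include_realm=1 map=gssmap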

If we don't agree that the change should be made then we can discuss
that, but everyone commenting so far has agreed on the change.

Thanks,

Stephen




Re: [HACKERS] operator does not exist: character varying[] character[]

2014-12-09 Thread Jim Nasby

On 12/9/14, 4:19 PM, Jim Nasby wrote:

Is there any particular reason we don't allow comparing char and varchar 
arrays? If not I'll submit a patch.


We're also missing operators on text and varchar arrays.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] GSSAPI, SSPI - include_realm default

2014-12-09 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 On Dec 9, 2014 10:52 PM, Peter Eisentraut pete...@gmx.net wrote:
 
  On 12/5/14 1:06 PM, Stephen Frost wrote:
   I suggest we also backpatch some documentation suggesting that people
manually change the include_realm parameter (perhaps also with a note
saying that the default will change in 9.5).
   I'll work on a patch for back-branches if everyone is alright with this
   patch against master.
 
  I don't think backpatching this is necessary or appropriate.
 
  First of all, this isn't even released, and it might very well change
  again later.  The right time to publicly notify about this change is not
  before 9.5 is released.
 
  Also, it's not like people keep re-reading the old documentation in
  order to get updated advice.  It might very well be confusing if stable
  documentation changes because of future events.  Users who are
  interested in knowing about changes in future releases should read the
  release notes of those future releases.
 
  My comment that include_realm is supported back to 8.4 was because there
  is an expectation that a pg_hba.conf file can be used unchanged across
  several major releases.  So when 9.5 comes out and people update their
  pg_hba.conf files for 9.5, those files will still work in old releases.
   But the time to do those updates is then, not now.
 
 
 I thought the idea was to backpatch documentation saying "it's a good idea
 to change this value to x because of y", not actually referring to the
 upcoming change directly. And I still think that part is a good idea, as it
 helps people avoid potential security pitfalls.

I agree with this, but I don't really see why we wouldn't say "hey, this
is going to change in 9.5".  Peter's argument sounds like he'd rather we
not make any changes to the existing documentation, and I don't agree
with that; and if we're making changes then, imv, we might as well
mention that the default will change in 9.5.

 So not really a backpatch as such, rather a separate patch for the back
 branches. (And people definitely reread the docs, since they deploy new
 systems on the existing versions...)

Yes, I was going to write a different patch for the back-branches,
apologies if that wasn't clear.  I'll see about drafting something up
soon, as there doesn't seem to be any argument about the substance of the
proposed patch for master.

Thanks,

Stephen




Re: [HACKERS] logical column ordering

2014-12-09 Thread Jim Nasby

On 12/9/14, 11:41 AM, Alvaro Herrera wrote:

I'm going to see about it while I get feedback on the rest of this patch; in
particular, extra test cases that fail to work when columns have been
moved around are welcome, so that I can add them to the regress test.
What I have now is the basics I'm building as I go along.  The
regression tests show examples of some logical column renumbering (which
can be done after the table already contains some data) but none of
physical column renumbering (which can only be done when the table is
completely empty.)  My hunch is that the sample foo, bar, baz, quux
tables should present plenty of opportunities to display brokenness in
the planner and executor.


The ideal case would be to do something like randomizing logical and physical 
ordering as tables are created throughout the entire test suite (presumably as 
an option). That should work for physical ordering, but presumably it would 
pointlessly blow up on logical ordering because the expected output is 
hard-coded.

Perhaps instead of randomizing logical ordering we could force that to be the 
same ordering in which fields were supplied and actually randomize attnum?

In particular, I'm thinking that in DefineRelation we can randomize 
stmt->tableElts before merging in inheritance attributes.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] operator does not exist: character varying[] character[]

2014-12-09 Thread Jim Nasby

On 12/9/14, 4:30 PM, Tom Lane wrote:

Jim Nasby jim.na...@bluetreble.com writes:

On 12/9/14, 4:19 PM, Jim Nasby wrote:

Is there any particular reason we don't allow comparing char and varchar 
arrays? If not I'll submit a patch.



We're also missing operators on text and varchar arrays.


Adding operators would be an incorrect fix.


Right, I'm assuming this is a problem somewhere else (haven't looked into it 
yet).

I just wanted confirmation that this is unexpected before I try and fix it. 
I'll take your silence on that point as confirmation that this is a bug. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] logical column ordering

2014-12-09 Thread Josh Berkus
On 12/09/2014 09:41 AM, Alvaro Herrera wrote:
 The first thing where this matters is tuple descriptor expansion in
 parse analysis; at this stage, things such as * (in select *) are
 turned into a target list, which must be sorted according to attlognum.
 To achieve this I added a new routine to tupledescs,

The two other major cases are:

INSERT INTO table SELECT|VALUES ...

COPY table FROM|TO ...

... although copy should just be a subclass of SELECT.

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order.  If you're doing COPY to
create CSV files for output, then you want the columns in logical order.
 If you're doing COPY for pg_dump, then you want them in physical order
for faster dump/reload.  So we're almost certainly going to need to have
an option for COPY.
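
(For what it's worth, COPY already accepts an explicit column list, so a
chosen order can be forced today regardless of whatever default we pick;
names here are illustrative:

COPY mytab (col_b, col_a, col_c) TO STDOUT WITH (FORMAT csv);

The question is only what unadorned "COPY mytab" should do.)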


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-12-09 Thread Michael Paquier
On Wed, Dec 10, 2014 at 1:37 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 On Tue, Dec 9, 2014 at 10:21 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Tue, Dec 9, 2014 at 6:43 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  OK. Perhaps that's not worth mentioning in the release notes, but some
  users may be used to the old behavior. What about the other issues
  (regression test for permissions incorrect and matviews)?
 Here is in any case an updated patch to fix all those things rebased
 on latest HEAD (ae4e688).


 The patch is fine:
 - No compiler warnings
 - Regression tests added properly and they run OK

 There are no changes in the docs. Maybe we can mention matviews on REINDEX
 SCHEMA docs, what do you think?
The current documentation says that all the indexes in the schema are
reindexed; only matviews and plain relations can have indexes, so that's
implicitly covered, IMO.  If in the future we add support for another
relkind that can have indexes, we won't need to update the docs
either.
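
For reference, the matview case the fix covers is trivial to exercise
(a sketch with made-up names):

CREATE SCHEMA reindex_mv_test;
CREATE MATERIALIZED VIEW reindex_mv_test.mv AS SELECT 1 AS x;
CREATE UNIQUE INDEX mv_x_idx ON reindex_mv_test.mv (x);
REINDEX SCHEMA reindex_mv_test;  -- now rebuilds mv_x_idx as well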
-- 
Michael




Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-09 Thread Richard Frith-Macdonald
On 5 Dec 2014, at 22:41, Jim Nasby jim.na...@bluetreble.com wrote:
 
 
 Perhaps we should also officially recommend production servers be set up to 
 create core files. AFAIK the only downside is the time it would take to write 
 a core that's huge because of shared buffers, but perhaps there's some way to 
 avoid writing those? (That means the core won't help if the bug is due to 
 something in a buffer, but that seems unlikely enough that the tradeoff is 
 worth it...)

Good idea.  It seems the madvise() system call (with MADV_DONTDUMP) is exactly 
what's needed to avoid dumping shared buffers.



Re: [HACKERS] logical column ordering

2014-12-09 Thread Andrew Dunstan


On 12/09/2014 06:19 PM, Josh Berkus wrote:

On 12/09/2014 09:41 AM, Alvaro Herrera wrote:

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as * (in select *) are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine to tupledescs,

The two other major cases are:

INSERT INTO table SELECT|VALUES ...

COPY table FROM|TO ...

... although copy should just be a subclass of SELECT.

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order.  If you're doing COPY to
create CSV files for output, then you want the columns in logical order.
  If you're doing COPY for pg_dump, then you want them in physical order
for faster dump/reload.  So we're almost certainly going to need to have
an option for COPY.





I seriously doubt it, although I could be wrong. Unless someone can show 
a significant performance gain from using physical order, which would be 
a bit of a surprise to me, I would just stick with logical ordering as 
the default.


cheers

andrew





Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-12-09 Thread Peter Geoghegan
On Sun, Nov 23, 2014 at 3:59 PM, Andres Freund and...@2ndquadrant.com wrote:
  The heavyweight locking issue is really killing me, though.
 
  I don't really understand why you're not content with just detecting
  deadlocks for now. Everything else seems like bells and whistles for
  later.

 I don't think it's OK for a parallel query to just randomly deadlock.

 I think that should be the end goal, yes.

 It's OK for parallel query to excuse itself politely and retire from
 the field, but I don't think it's OK for it to say, oh, sure, I can
 parallelize that for you, and then fail by deadlocking with itself.

 But I also think that right now it doesn't necessarily need to be the
 focus immediately.

I am in strong agreement with Robert that unprincipled deadlocks are
not okay. Now, what that actually means in practice might be kind of
hard to delineate.

Take my ON CONFLICT UPDATE patch. The locking there is optimistic, and
so in theory there are risks of lock starvation [1]. There is no
axiomatic reason why a single backend cannot be starved of a lock in
the event of a perfect storm of conflicts. Similarly, the Lehman and
Yao paper makes what can only be called a plausibility argument around
how livelocks are not a concern in practice [2] when an indefinite
number of move right operations are required after a page split;
having written a bunch of theorems around the correctness of other
aspects of their algorithm, the argument here is more or less: "Come
on, that'll never happen!". Concurrency is hard.

If you want another example of this kind of thing, then there's how
Oracle deals with READ COMMITTED row conflicts -- the statement is
rolled back and restarted. Who is to say with certainty that it'll
make any progress the second time? Or the third or fourth?

Note that the optimistic locking stuff was actually my solution to the
unprincipled deadlocks problem, a solution that, as I say, also
implies the theoretical risk of lock starvation (but not livelock). I
think I would also have found it acceptable if there was a solution
that reduced the original risk, the risk of *deadlocks* to merely a
theoretical one, as opposed to a very real one (that could be shown
with a simple testcase).

 This is a topic that I think will be much easier to
 approach if some demonstration of actual parallel users would be
 available. Doesn't have to be committable or such, but I think it'll
 help to shape how this has to look.  I think unrecognized deadlocks will
 make things too annoying to use, but I don't think it needs to be
 guaranteed deadlock free or such.

I couldn't agree more. The greater point about livelock, which I
believe applies here as well, is that there are different standards by
which we might consider that unprincipled deadlocks won't happen.
Basically, it seems inevitable that a practical standard sometimes
must be applied. If a well informed person, say myself or Andres,
cannot make any given practical implementation deadlock (or
alternatively be starved for locks, if we're talking about an
optimistic locking work-around to unprincipled deadlocks that
introduces that risk), then I guess that means that there are no
unprincipled deadlocks/livelocks, and I for one will probably find
that acceptable given enough testing.

I realize that that might sound terribly loose (I probably forgot a
few caveats), but what it boils down to is that we *don't* have the
tools to analyze certain types of problems like this, or at least I
don't think we do, because no one does. The closest thing we have is
some well written custom stress testing, and lots of scrutiny. Of
course I'd prefer a solution that is provably correct from first
principles (in other words, I'd prefer to use algorithms that are in
some formal sense non-blocking [3] or something along those lines),
but that will often be cost prohibitive or otherwise elusive when
designing concurrent algorithms.

I think I agree with you both.

[1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards
[2] http://www.cs.cornell.edu/courses/cs4411/2009sp/blink.pdf ,
Section 6.4 Livelock
[3] https://en.wikipedia.org/wiki/Non-blocking_algorithm
-- 
Peter Geoghegan




[HACKERS] thinko in convertToJsonb()

2014-12-09 Thread Mark Dilger
The call:

reserveFromBuffer(&buffer, sizeof(VARHDRSZ))

is assuming that the size of the varlena header is the same
as the size of the type used to represent that size, which happens
to be so, but someone could easily change that macro
to:

#define VARHDRSZ ((int64) sizeof(int32))

And you'd want to reserve sizeof(int32), not sizeof(int64)
in convertToJsonb.

Perhaps the code really meant to say:

reserveFromBuffer(&buffer, VARHDRSZ)


mark




Re: [HACKERS] logical column ordering

2014-12-09 Thread Alvaro Herrera
Andrew Dunstan wrote:
 
 On 12/09/2014 06:19 PM, Josh Berkus wrote:
 On 12/09/2014 09:41 AM, Alvaro Herrera wrote:
 The first thing where this matters is tuple descriptor expansion in
 parse analysis; at this stage, things such as * (in select *) are
 turned into a target list, which must be sorted according to attlognum.
 To achieve this I added a new routine to tupledescs,
 The two other major cases are:
 
 INSERT INTO table SELECT|VALUES ...
 
 COPY table FROM|TO ...

Yes, both are covered.

 ... although copy should just be a subclass of SELECT.

It is not.  There's one part of COPY that goes through SELECT
processing, but only when the table being copied is a subselect.
Normal COPY does not use the same code path.

 Question on COPY, though: there's reasons why people would want COPY to
 dump in either physical or logical order.  If you're doing COPY to
 create CSV files for output, then you want the columns in logical order.
   If you're doing COPY for pg_dump, then you want them in physical order
 for faster dump/reload.  So we're almost certainly going to need to have
 an option for COPY.
 
 I seriously doubt it, although I could be wrong. Unless someone can show a
 significant performance gain from using physical order, which would be a bit
 of a surprise to me, I would just stick with logical ordering as the
 default.

Well, we have an optimization that avoids a projection step IIRC by
using the physical tlist instead of having to build a tailored one.  I
guess the reason that's there is because somebody did measure an
improvement.  Maybe it *is* worth having as an option for pg_dump ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Small TRUNCATE glitch

2014-12-09 Thread Alvaro Herrera
Alex Shulgin wrote:

 The 2PC part requires extending the bool flag to fit the trunc flag; is this
 approach sane?  Given that a 2PC transaction should survive server
 restart, it's reasonable to expect it to also survive an upgrade, so I
 see no clean way of adding another bool field to the
 TwoPhasePgStatRecord struct (unless someone would like to add checks on
 record length, etc.).

I don't think we need to have 2PC files survive a pg_upgrade.  It seems
perfectly okay to remove them from the new cluster at some appropriate
time, *if* they are copied from the old cluster at all (I don't think
they should be.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-09 Thread Peter Geoghegan
There is an interesting thread about strcoll() overhead over on -general:

http://www.postgresql.org/message-id/cab25xexnondrmc1_cy3jvmb0tmydm38ef9q2d7xla0rbncj...@mail.gmail.com

My guess was that this person experienced a rather unexpected downside
of spilling to disk when sorting on a text attribute: System
throughput becomes very CPU bound, because tapesort tends to result in
more comparisons [1].  With abbreviated keys, tapesort can actually
compete with quicksort in certain cases [2]. Tapesorts of text
attributes are especially bad on released versions of Postgres, and
will perform very little actual I/O.
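
An easy way to check whether a given text sort is spilling (relation and
column names here are made up):

SET work_mem = '1MB';
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM msgs ORDER BY subject;
-- look for "Sort Method: external merge  Disk: ...kB" rather than
-- "Sort Method: quicksort  Memory: ...kB" in the output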

In all seriousness, I wonder if we should add a release note item
stating that when using Postgres 9.5, due to the abbreviated key
optimization, external sorts can be much more I/O bound than in
previous releases...

[1] 
http://www.postgresql.org/message-id/20140806035512.ga91...@tornado.leadboat.com
[2] 
http://www.postgresql.org/message-id/CAM3SWZQiGvGhMB4TMbEWoNjO17=ysb5b5y5mgqjsanq4uwt...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] inherit support for foreign tables

2014-12-09 Thread Etsuro Fujita

Hi Ashutosh,

Thanks for the review!

(2014/11/28 18:14), Ashutosh Bapat wrote:

On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote:
(2014/11/17 17:55), Ashutosh Bapat wrote:
Here are my review comments for patch fdw-inh-3.patch.



Tests
---
1. It seems like you have copied from testcase inherit.sql to
postgres_fdw testcase. That's a good thing, but it makes the
test quite
long. May be we should have two tests in postgres_fdw contrib
module,
one for simple cases, and other for inheritance. What do you say?



IMO, the test is not so time-consuming, so it doesn't seem worth
splitting it into two.



I am not worried about the timing but I am worried about the length of
the file and hence ease of debugging in case we find any issues there.
We will leave that for the committer to decide.


OK


Documentation

1. The change in ddl.sgml
-We will refer to the child tables as partitions, though
they
-are in every way normal productnamePostgreSQL/ tables.
+We will refer to the child tables as partitions, though
we assume
+that they are normal productnamePostgreSQL/ tables.

adds the phrase "we assume that", which confuses the intention behind
the sentence. The original text is intended to highlight the
equivalence between partition and normal table, whereas the addition,
esp. the word "assume", weakens that equivalence. Instead now we have
to highlight the equivalence between partition and normal or foreign
table. Wording similar to "though they are either normal or foreign
tables" should be used there.



You are right, but I feel that there is something out of place in
saying that there (5.10.2. Implementing Partitioning) because the
procedure there has been written based on normal tables only.  Put
another way, if we say that, I think we'd need to add more docs,
describing the syntax and/or the corresponding examples for
foreign-table cases.  But I'd like to leave that for another patch.
So, how about the wording "we assume *here* that", instead of "we
assume that", as I added the following notice in the previous
section (5.10.1. Overview)?

@@ -2650,7 +2669,10 @@ VALUES ('Albany', NULL, NULL, 'NY');
  table of a single parent table.  The parent table itself is
normally
  empty; it exists just to represent the entire data set.  You
should be
  familiar with inheritance (see xref linkend=ddl-inherit)
before
-attempting to set up partitioning.
+attempting to set up partitioning.  (The setup and management of
+partitioned tables illustrated in this section assume that each
+partition is a normal table.  However, you can do that in a
similar way
+for cases where some or all partitions are foreign tables.)



This looks ok, though I would like to see the final version of the
document. But I think we will leave that for the committer to handle.


OK


2. The wording "some kind of optimization" gives a vague picture.
Maybe it can be worded as "Since the constraints are assumed to be
true, they are used in constraint-based query optimization like
constraint exclusion for partitioned tables.".
+Those constraints are used in some kind of query
optimization such
+as constraint exclusion for partitioned tables (see
+xref linkend=ddl-partitioning).



Will follow your revision.


Done.


Code
---
1. In the following change
+/*
* acquire_inherited_sample_rows -- acquire sample rows from
inheritance tree
*
* This has the same API as acquire_sample_rows, except that
rows are
* collected from all inheritance children as well as the
specified table.
- * We fail and return zero if there are no inheritance children.
+ * We fail and return zero if there are no inheritance children or
there are
+ * inheritance children that foreign tables.

The addition should be "there are inheritance children that *are all*
foreign tables". Note the addition "are all".



Sorry, I incorrectly wrote the comment.  What I tried to write is
"We fail and return zero if there are no inheritance children or if
we are not in the VAC_MODE_SINGLE case and the inheritance tree
contains at least one foreign table.".



You might want to use an English description of VAC_MODE_SINGLE instead
of that macro in the comment, so that the reader doesn't have to look up
VAC_MODE_SINGLE. But I think we will leave this for the committer.


I corrected the comments and 

Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-09 Thread Michael Paquier
On Thu, Dec 4, 2014 at 9:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 12/04/2014 01:47 PM, Petr Jelinek wrote:

 On 04/12/14 12:26, Heikki Linnakangas wrote:

 On 12/04/2014 01:16 PM, Petr Jelinek wrote:

 On 04/12/14 10:42, Heikki Linnakangas wrote:

 On 12/03/2014 04:54 PM, Alvaro Herrera wrote:

 ir commit timestamp directly as they commit,
 or an external transaction c


 Sorry, I'm late to the party, but here's some random comments on this
 after a quick review:

 * The whole concept of a node ID seems to be undocumented, and unused.
 No-one calls CommitTsSetDefaultNodeId(). The whole SET_TS record type
 is
 dead code too, when a non-default node_id is not set. Please remove, or
 explain how all that's supposed to work.


 That's API meant to be used by extensions, maybe it would be preferable
 if there was SQL API exposed also?

 (It might also make more sense once replication identifiers are
 submitted.)


 Maybe. Hard to tell without any documentation or comments on how it's
  supposed to work. It's unacceptable in its current state.

 I would prefer to remove it now, and add it back later together with the
 code that actually uses it. I don't like having unused internal
 functions and APIs sitting the source tree in the hope that someone will
 find them useful in the future. It's more likely that it will bitrot, or
 not actually work as intended, when someone later tries to use it.


  The thing is, I already have an extension for 9.4 where I would use this API
  for conflict detection if it were available, so it's not about hoping for
  somebody to find it useful. There was discussion about this during the
 review process because the objection was raised already then.


 Yeah, it was raised. I don't think it was really addressed. There was
 lengthy discussion on whether to include LSN, node id, and/or some other
 information, but there was no discussion on:

 * What is a node ID?
 * How is it used?
 * Who assigns it?

 etc.

 None of those things are explained in the documentation nor code comments.

 The node ID sounds suspiciously like the replication identifiers that have
 been proposed a couple of times. I don't remember if I like replication
 identifiers or not, but I'd rather discuss that issue explicitly and
 separately. I don't want the concept of a replication/node ID to fly under
 the radar like this.

 What would the SQL API look like, and what would it be used for?


  It would probably mirror the C one; from my POV it's not needed, but
  maybe some replication solution would prefer to use this without having
 to write C components...


 I can't comment on that, because without any documentation, I don't even
 know what the C API is.

 BTW, why is it OK that the node-ID of a commit is logged as a separate WAL
 record, after the commit record? That means that it's possible that a
 transaction commits, but you crash just before writing the SETTS record, so
 that after replay the transaction appears committed but with the default
 node ID.

 (Maybe that's OK given the intended use case, but it's hard to tell without
 any documentation. See a pattern here? ;-) )

Could we get a refresh on that before it bitrots too much, or we'll
simply forget about it?  In the current state, there is no documentation
explaining what the point of the nodeid stuff is, how to use it, and
what use cases it is aimed at.  The API in committs.c should be part of
it as well.  If that's not done, I think it would be fair to remove this
portion and simply WAL-log the commit timestamp when its GUC is
activated.
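
For reference, the part that is usable and documented today is just the
GUC plus the SQL-level functions, along these lines (the table name is a
placeholder; the GUC needs a server restart):

-- postgresql.conf: track_commit_timestamp = on
SELECT pg_xact_commit_timestamp(xmin) FROM mytab LIMIT 1;
SELECT * FROM pg_last_committed_xact();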
Thanks,
-- 
Michael




Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Kapila
On Tue, Dec 9, 2014 at 11:44 PM, Josh Berkus j...@agliodbs.com wrote:
 On 12/09/2014 12:17 AM, Amit Langote wrote:
  Now if user wants to define multi-column Partition based on
   monthly_salary and annual_salary, how do we want him to
   specify the values.  Basically how to distinguish which values
   belong to first column key and which one's belong to second
   column key.
  
  Perhaps you are talking about syntactic difficulties that I totally
missed in my other reply to this mail?
 
  Can we represent the same data by rather using a subpartitioning
scheme? ISTM, semantics would remain the same.
 
  ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?


Using SUBPARTITION is not the answer for multi-column partitioning.
I think if we have to support it for List partitioning, then something
along the lines of what Josh has mentioned below could work out, but I
don't think it is important to support multi-column partitioning for
List at this stage.

 ... or just use arrays.

 PARTITION BY LIST ( monthly_salary, annual_salary )
 PARTITION salary_small VALUES ({[300,400],[5000,6000]})
 ) 

 ... but that begs the question of how partition by list over two columns
 (or more) would even work?  You'd need an a*b number of partitions, and
 the user would be pretty much certain to miss a few value combinations.
  Maybe we should just restrict list partitioning to a single column for
 a first release, and wait and see if people ask for more?


I also think we should not support multi-column list partitioning in
the first release.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Kapila
On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Amit Kapila wrote:
  On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas robertmh...@gmail.com
wrote:
   On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund and...@2ndquadrant.com
  wrote:
I don't think that's mutually exclusive with the idea of
partitions-as-tables.  I mean, you can add code to the ALTER TABLE
path that says if (i_am_not_the_partitioning_root) ereport(ERROR,
...)
wherever you want.
   
That'll be a lot of places you'll need to touch. More
fundamentally: Why
should we name something a table that's not one?
  
   Well, I'm not convinced that it isn't one.  And adding a new relkind
   will involve a bunch of code churn, too.  But I don't much care to
   pre-litigate this: when someone has got a patch, we can either agree
   that the approach is OK or argue that it is problematic because X.  I
   think we need to hammer down the design in broad strokes first, and
   I'm not sure we're totally there yet.
 
  That's right, I think at this point defining the top level
behaviour/design
  is very important to proceed, we can decide about the better
  implementation approach afterwards (may be once initial patch is ready,
  because it might not be a major work to do it either way).  So here's
where
  we are on this point till now as per my understanding, I think that
direct
  operations should be prohibited on partitions, you think that they
should be
  allowed and Andres think that it might be better to allow direct
operations
  on partitions for Read.

 FWIW in my original proposal I was rejecting some things that after
 further consideration turn out to be possible to allow; for instance
 directly referencing individual partitions in COPY.  We could allow
 something like

 COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
 or maybe
 COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT

or
COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01'  TO STDOUT
COPY [TABLE] lineitems PARTITION part_1,part_2,  TO STDOUT

I think we should try to support operations on partitions via the main
table wherever it is required.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] thinko in convertToJsonb()

2014-12-09 Thread Michael Paquier
On Tue, Dec 9, 2014 at 11:11 AM, Mark Dilger m...@port25.com wrote:
 The call:

 reserveFromBuffer(&buffer, sizeof(VARHDRSZ))

 is assuming that the size of varlena header is the same
 size as the type used to return that size, which happens
 to be so, but someone could easily change that macro
 to:

 #define VARHDRSZ ((int64) sizeof(int32))

 And you'd want to reserve sizeof(int32), not sizeof(int64)
 in convertToJsonb.

 Perhaps the code really meant to say:

 reserveFromBuffer(&buffer, VARHDRSZ)

Good catch! The code is indeed incorrect. Attached is a one-line patch
addressing that; I guess someone is going to pick it up sooner or
later.
-- 
Michael


20141210_jsonb_varlena_header.patch
Description: Binary data



Re: [HACKERS] logical column ordering

2014-12-09 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Question on COPY, though: there's reasons why people would want COPY to
 dump in either physical or logical order.  If you're doing COPY to
 create CSV files for output, then you want the columns in logical order.
  If you're doing COPY for pg_dump, then you want them in physical order
 for faster dump/reload.  So we're almost certainly going to need to have
 an option for COPY.

This is complete nonsense, Josh, or at least it is until you can provide
some solid evidence to believe that column ordering would make any
noticeable performance difference in COPY.  I know of no reason to believe
that the existing user-defined-column-ordering option makes any difference.

regards, tom lane




Re: [HACKERS] logical column ordering

2014-12-09 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Andrew Dunstan wrote:
 I seriously doubt it, although I could be wrong. Unless someone can show a
 significant performance gain from using physical order, which would be a bit
 of a surprise to me, I would just stick with logical ordering as the
 default.

 Well, we have an optimization that avoids a projection step IIRC by
 using the physical tlist instead of having to build a tailored one.  I
 guess the reason that's there is because somebody did measure an
 improvement.  Maybe it *is* worth having as an option for pg_dump ...

The physical tlist thing is there because it's demonstrable that
ExecProject() takes nonzero time.  COPY does not go through ExecProject
though.  What's more, it already has code to deal with a user-specified
column order, and nobody's ever claimed that that code imposes a
measurable performance overhead.

regards, tom lane




Re: [HACKERS] inherit support for foreign tables

2014-12-09 Thread Ashutosh Bapat
We haven't heard anything from Horiguchi-san and Hanada-san for almost a
week. So, I am fine marking it as ready for committer. What do you say?


On Wed, Dec 10, 2014 at 8:48 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
wrote:

 Hi Ashutosh,

 Thanks for the review!

 (2014/11/28 18:14), Ashutosh Bapat wrote:

 On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote:
 (2014/11/17 17:55), Ashutosh Bapat wrote:
 Here are my review comments for patch fdw-inh-3.patch.


  Tests
 ---
 1. It seems like you have copied from testcase inherit.sql to
 postgres_fdw testcase. That's a good thing, but it makes the
 test quite
 long. May be we should have two tests in postgres_fdw contrib
 module,
 one for simple cases, and other for inheritance. What do you say?


  IMO, the test is not so time-consuming, so it doesn't seem worth
 splitting it into two.


  I am not worried about the timing but I am worried about the length of
 the file and hence ease of debugging in case we find any issues there.
 We will leave that for the committer to decide.


 OK


  Documentation
 
 1. The change in ddl.sgml
 -We will refer to the child tables as partitions, though
 they
 -are in every way normal productnamePostgreSQL/
 tables.
 +We will refer to the child tables as partitions, though
 we assume
 +that they are normal productnamePostgreSQL/ tables.

 adds phrase we assume that, which confuses the intention
 behind the
 sentence. The original text is intended to highlight the
 equivalence
 between partition and normal table, where as the addition
 esp. the
 word assume weakens that equivalence. Instead now we have to
 highlight
 the equivalence between partition and normal or foreign
 table. The
 wording similar to though they are either normal or foreign
 tables
 should be used there.


  You are right, but I feel that there is something out of place in
 saying that there (5.10.2. Implementing Partitioning) because the
 procedure there has been written based on normal tables only.  Put
 another way, if we say that, I think we'd need to add more docs,
 describing the syntax and/or the corresponding examples for
 foreign-table cases.  But I'd like to leave that for another patch.
 So, how about the wording we assume *here* that, instead of we
 assume that, as I added the following notice in the previous
 section (5.10.1. Overview)?

 @@ -2650,7 +2669,10 @@ VALUES ('Albany', NULL, NULL, 'NY');
   table of a single parent table.  The parent table itself is
 normally
   empty; it exists just to represent the entire data set.  You
 should be
   familiar with inheritance (see xref linkend=ddl-inherit)
 before
 -attempting to set up partitioning.
 +attempting to set up partitioning.  (The setup and management of
 +partitioned tables illustrated in this section assume that each
 +partition is a normal table.  However, you can do that in a
 similar way
 +for cases where some or all partitions are foreign tables.)


  This looks ok, though, I would like to see final version of the
 document. But I think, we will leave that for committer to handle.


 OK

  2. The wording some kind of optimization gives vague picture.
 May be
 it can be worded as Since the constraints are assumed to be
 true, they
 are used in constraint-based query optimization like constraint
 exclusion for partitioned tables..
 +Those constraints are used in some kind of query
 optimization such
 +as constraint exclusion for partitioned tables (see
 +xref linkend=ddl-partitioning).


  Will follow your revision.


 Done.

  Code
 ---
 1. In the following change
 +/*
 * acquire_inherited_sample_rows -- acquire sample rows from
 inheritance tree
 *
 * This has the same API as acquire_sample_rows, except that
 rows are
 * collected from all inheritance children as well as the
 specified table.
 - * We fail and return zero if there are no inheritance children.
 + * We fail and return zero if there are no inheritance children
 or
 there are
 + * inheritance children that foreign tables.

 The addition should be there are inheritance children that *are
 all
 *foreign tables. Note the addition are all.


  Sorry, I incorrectly wrote the comment.  What I tried to write is
 We fail and return zero if there are no inheritance children or if
 we are 

Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Langote


On Wed, Dec 10, 2014 at 12:33 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Dec 9, 2014 at 11:44 PM, Josh Berkus j...@agliodbs.com wrote:
 On 12/09/2014 12:17 AM, Amit Langote wrote:
  Now if user wants to define multi-column Partition based on
   monthly_salary and annual_salary, how do we want him to
   specify the values.  Basically how to distinguish which values
   belong to first column key and which one's belong to second
   column key.
  
  Perhaps you are talking about syntactic difficulties that I totally
  missed in my other reply to this mail?
 
  Can we represent the same data by rather using a subpartitioning scheme?
  ISTM, semantics would remain the same.
 
  ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?


 Using SUBPARTITION is not the answer for multi-column partitioning.
 I think if we have to support it for List partitioning, then something
 along the lines of what Josh has mentioned below could work out, but I
 don't think it is important to support multi-column partitioning for
 List at this stage.


Yeah, I realize multicolumn list partitioning and list-list composite 
partitioning are different things in many respects. And given how awkward 
multicolumn list partitioning is looking to implement, I also think we 
should only allow a single column in a list partition key.

 ... or just use arrays.

 PARTITION BY LIST ( monthly_salary, annual_salary )
 PARTITION salary_small VALUES ({[300,400],[5000,6000]})
 ) 

 ... but that begs the question of how partition by list over two columns
 (or more) would even work?  You'd need an a*b number of partitions, and
 the user would be pretty much certain to miss a few value combinations.
  Maybe we should just restrict list partitioning to a single column for
 a first release, and wait and see if people ask for more?


 I also think we should not support multi-column list partitioning in
 the first release.


Yes.

Thanks,
Amit






Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Langote


On Wed, Dec 10, 2014 at 12:46 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

 Amit Kapila wrote:
  On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas robertmh...@gmail.com
  wrote:
   On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund and...@2ndquadrant.com
  wrote:
I don't think that's mutually exclusive with the idea of
partitions-as-tables.  I mean, you can add code to the ALTER TABLE
path that says if (i_am_not_the_partitioning_root) ereport(ERROR,
...)
wherever you want.
   
That'll be a lot of places you'll need to touch. More fundamentally:
Why
should we name something a table that's not one?
  
   Well, I'm not convinced that it isn't one.  And adding a new relkind
   will involve a bunch of code churn, too.  But I don't much care to
   pre-litigate this: when someone has got a patch, we can either agree
   that the approach is OK or argue that it is problematic because X.  I
   think we need to hammer down the design in broad strokes first, and
   I'm not sure we're totally there yet.
 
  That's right, I think at this point defining the top level
  behaviour/design
  is very important to proceed, we can decide about the better
  implementation approach afterwards (may be once initial patch is ready,
  because it might not be a major work to do it either way).  So here's
  where
  we are on this point till now as per my understanding, I think that
  direct
  operations should be prohibited on partitions, you think that they
  should be
  allowed and Andres think that it might be better to allow direct
  operations
  on partitions for Read.

 FWIW in my original proposal I was rejecting some things that after
 further consideration turn out to be possible to allow; for instance
 directly referencing individual partitions in COPY.  We could allow
 something like

 COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
 or maybe
 COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT

 or
 COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01'  TO STDOUT
 COPY [TABLE] lineitems PARTITION part_1,part_2,  TO STDOUT

 I think we should try to support operations on partitions via the main
 table wherever it is required.


We can also allow explicitly naming a partition:

COPY [TABLE ] lineitems PARTITION lineitems_2001 TO STDOUT;

Thanks,
Amit



