Re: [HACKERS] contrib modules and relkind check

2017-03-08 Thread Amit Langote
On 2017/03/08 16:47, Michael Paquier wrote:
> On Tue, Mar 7, 2017 at 4:15 PM, Amit Langote wrote:
> +++ b/contrib/pg_visibility/expected/pg_visibility.out
> @@ -0,0 +1,85 @@
> +CREATE EXTENSION pg_visibility;
> +--
> +-- check that using the module's functions with unsupported relations will 
> fail
> +--
> [...]
> +select count(*) > 0 from pg_visibility('regular_table');
> + ?column?
> +--
> + t
> +(1 row)
> Only regular tables are tested as valid objects. Testing toast tables
> is not worth the complication. Could you add as well a matview?

Done in the attached updated patch.

> 
> Except for this small issue the patch looks good to me.

Thanks.

Regards,
Amit
>From 9679618d65b00d65effdd5de019410998e10b3d9 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 24 Jan 2017 13:22:17 +0900
Subject: [PATCH] Add relkind checks to certain contrib modules

Contrib functions such as pg_visibility and pg_relpages are only applicable
to certain relkinds; error out otherwise.

Also, add RELKIND_PARTITIONED_TABLE to pageinspect's and pgstattuple's
lists of unsupported relkinds, where it was previously missing.

Add tests for the newly added checks.
---
 contrib/pageinspect/expected/page.out|   5 ++
 contrib/pageinspect/rawpage.c|   5 ++
 contrib/pageinspect/sql/page.sql |   5 ++
 contrib/pg_visibility/.gitignore |   4 +
 contrib/pg_visibility/Makefile   |   2 +
 contrib/pg_visibility/expected/pg_visibility.out | 103 +++
 contrib/pg_visibility/pg_visibility.c|  44 +++---
 contrib/pg_visibility/sql/pg_visibility.sql  |  66 +++
 contrib/pgstattuple/expected/pgstattuple.out |  29 +++
 contrib/pgstattuple/pgstatindex.c|  30 +++
 contrib/pgstattuple/pgstattuple.c|   3 +
 contrib/pgstattuple/sql/pgstattuple.sql  |  24 ++
 12 files changed, 306 insertions(+), 14 deletions(-)
 create mode 100644 contrib/pg_visibility/.gitignore
 create mode 100644 contrib/pg_visibility/expected/pg_visibility.out
 create mode 100644 contrib/pg_visibility/sql/pg_visibility.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 13964cd878..85b183db4d 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -71,3 +71,8 @@ SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
 (1 row)
 
 DROP TABLE test1;
+-- check that using any of these functions with a partitioned table would fail
+create table test_partitioned (a int) partition by range (a);
+select get_raw_page('test_partitioned', 0);
+ERROR:  cannot get raw page from partitioned table "test_partitioned"
+drop table test_partitioned;
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 102f360c6f..1ccc3ff320 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -123,6 +123,11 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot get raw page from foreign table \"%s\"",
 		RelationGetRelationName(rel))));
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot get raw page from partitioned table \"%s\"",
+		RelationGetRelationName(rel))));
 
 	/*
 	 * Reject attempts to read non-local temporary relations; we would be
diff --git a/contrib/pageinspect/sql/page.sql b/contrib/pageinspect/sql/page.sql
index 97eef9829a..fa3533f2af 100644
--- a/contrib/pageinspect/sql/page.sql
+++ b/contrib/pageinspect/sql/page.sql
@@ -28,3 +28,8 @@ SELECT tuple_data_split('test1'::regclass, t_data, t_infomask, t_infomask2, t_bi
 SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
 
 DROP TABLE test1;
+
+-- check that using any of these functions with a partitioned table would fail
+create table test_partitioned (a int) partition by range (a);
+select get_raw_page('test_partitioned', 0);
+drop table test_partitioned;
diff --git a/contrib/pg_visibility/.gitignore b/contrib/pg_visibility/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/pg_visibility/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_visibility/Makefile b/contrib/pg_visibility/Makefile
index 379591a098..bc42944426 100644
--- a/contrib/pg_visibility/Makefile
+++ b/contrib/pg_visibility/Makefile
@@ -7,6 +7,8 @@ EXTENSION = pg_visibility
 DATA = pg_visibility--1.1.sql pg_visibility--1.0--1.1.sql
 PGFILEDESC = "pg_visibility - page visibility information"
 
+REGRESS = pg_visibility
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
new file mode 100644
index 00..699d2131b9
--- /dev/null
+++ b/contrib/pg_vi

[HACKERS] Removing #include "postgres.h" from a couple of headers

2017-03-08 Thread Thomas Munro
Hi,

Over in another thread it was pointed out that a patch I submitted
broke a project rule by including "postgres.h" in a header.  Here is a
patch to remove it from dsa.h where I made the same mistake, and also
a case I found in bufmask.h by grepping.

There are also instances in regcustom.h and snowball's header.h -- are
those special cases?

-- 
Thomas Munro
http://www.enterprisedb.com


remove-postgres.h-from-headers.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing #include "postgres.h" from a couple of headers

2017-03-08 Thread Michael Paquier
On Wed, Mar 8, 2017 at 5:55 PM, Thomas Munro
 wrote:
> Over in another thread it was pointed out that a patch I submitted
> broke a project rule by including "postgres.h" in a header.  Here is a
> patch to remove it from dsa.h where I made the same mistake, and also
> a case I found in bufmask.h by grepping.
>
> There are also instances in regcustom.h and snowball's header.h -- are
> those special cases?

--- a/src/include/access/bufmask.h
+++ b/src/include/access/bufmask.h
@@ -17,7 +17,6 @@
 #ifndef BUFMASK_H
 #define BUFMASK_H

-#include "postgres.h"
 #include "storage/block.h"
 #include "storage/bufmgr.h"
Oops. This really escaped me...
-- 
Michael




Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-08 Thread Kuntal Ghosh
On Fri, Mar 3, 2017 at 9:44 AM, Amit Kapila  wrote:
> On Tue, Feb 28, 2017 at 11:06 AM, Kuntal Ghosh
>  wrote:
>> Hello everyone,
>>
>> I've attached a patch which implements WAL consistency checking for
>> hash indexes. This feature is going to be useful for developing and
>> testing of WAL logging for hash index.
>>
>
> 2.
> + else if ((opaque->hasho_flag & LH_BUCKET_PAGE) ||
> + (opaque->hasho_flag & LH_OVERFLOW_PAGE))
> + {
> + /*
> + * In btree bucket and overflow pages, it is possible to modify the
> + * LP_FLAGS without emitting any WAL record. Hence, mask the line
> + * pointer flags.
> + * See hashgettuple() for details.
> + */
> + mask_lp_flags(page);
> + }
>
> Again, this mechanism is also modified by patch "Microvacuum support
> for hash index", so above changes needs to be adjusted accordingly.
> Comment referring to btree is wrong, you need to refer hash.
I've corrected the text in the comment and re-based the patch on the
latest hash index patch for WAL logging[1]. As discussed in the
thread, Microvacuum patch can be re-based on top of this patch.


[1] 
https://www.postgresql.org/message-id/CAA4eK1%2BmvCucroWQwX3S7aBR%3D0yBJGF%2BjQz4x4Cx9QVsMFTZUw%40mail.gmail.com
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-wal_consistency_checking-for-hash-index_v1.patch
Description: binary/octet-stream



Re: [HACKERS] PATCH: psql show index with type info

2017-03-08 Thread Ashutosh Bapat
On Mon, Mar 6, 2017 at 7:54 PM, Amos Bird  wrote:
>
> Hello Stephen,
>
> Well, the prefix is used to differentiate other \d commands, like
> this,
>
> amos=# \ditv
>   List of relations
>  Schema |Name| Type | Owner |  Table
> ++--+---+-
>  public | i  | table| amos  |
>  public | ii | index: gist  | amos  | i
>  public | j  | table| amos  |
>  public | jj | index: gin   | amos  | i
>  public | jp | index: btree | amos  | i
>  public | js | index: brin  | amos  | i
>  public | numbers| table| amos  |
>  public | numbers_mod2   | index: gin   | amos  | numbers
>  public | numbers_mod2_btree | index: btree | amos  | numbers
>  public | ts | table| amos  |
> (10 rows)
>

The header for this table is "List of relations", so the Type column reads
as the type of the relation, and "index: gin" doesn't sound great as a
relation type. Instead we might want to add a separate "Access method"
column and show the access method used by each relation. But then, per
pg_class.h, only indexes seem to have access methods.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Adding the optional clause 'AS' in CREATE TRIGGER

2017-03-08 Thread Okano, Naoki
Peter Eisentraut wrote:
> I have a feeling that this was proposed a few times in the ancient past
> but did not go through because of locking issues.  I can't find any
> emails about it through.  Does anyone remember?  Have you thought about
> locking issues?
Is this the e-mail you are thinking of?
https://www.postgresql.org/message-id/20140916124537.GH25887%40awork2.anarazel.de

I am considering adding an 'OR REPLACE' clause as a first step.
At least, I think there is no need to change the locking level when replacing a
trigger's 'EXECUTE PROCEDURE' clause.
PostgreSQL currently takes ShareRowExclusiveLock on the relation on which the
trigger is created; that is enough to replace a trigger. It also takes
RowExclusiveLock on pg_trigger, which is likewise enough to replace a trigger.

Regards,
Okano Naoki
Fujitsu




Re: [HACKERS] dropping partitioned tables without CASCADE

2017-03-08 Thread Ashutosh Bapat
>
> About the other statement you changed, I just realized that we should
> perhaps do one more thing.  Show the Number of partitions, even if it's 0.
>  In case of inheritance, the parent table stands on its own when there are
> no child tables, but a partitioned table doesn't in the same sense.  I
> tried to implement that in attached patch 0002.  Example below:
>
> create table p (a int) partition by list (a);
> \d p
> 
> Partition key: LIST (a)
> Number of partitions: 0
>
> \d+ p
> 
> Partition key: LIST (a)
> Number of partitions: 0
>
> create table p1 partition of p for values in (1);
> \d p
> 
> Partition key: LIST (a)
> Number of partitions: 1 (Use \d+ to list them.)
>
> \d+ p
> 
> Partition key: LIST (a)
> Partitions: p1 FOR VALUES IN (1)

I liked that. PFA 0002 updated. I changed one of the \d outputs to \d+ to
better test partitioned tables without partitions in both verbose and
non-verbose mode. Also, I refactored your code to reduce the number of
conditions. Please let me know if it looks good.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


0001-Improve-d-output-of-a-partitioned-table.patch
Description: Binary data


0002-Number-of-partitions-for-a-partitioned-table.patch
Description: Binary data



Re: [HACKERS] Logical replication existing data copy

2017-03-08 Thread Petr Jelinek
On 07/03/17 23:30, Erik Rijkers wrote:
> On 2017-03-06 11:27, Petr Jelinek wrote:
> 
>> 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
>> 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
>> 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
>> 0005-Skip-unnecessary-snapshot-builds.patch+
>> 0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 
> I use three different machines (2 desktop, 1 server) to test logical
> replication, and all three have now at least once failed to correctly
> synchronise a pgbench session (amidst many successful runs, of course)
> 
> I attach an output-file from the test-program, with the 2 logfiles
> (master+replica) of the failed run.  The outputfile
> (out_20170307_1613.txt) contains the output of 5 runs of
> pgbench_derail2.sh.  The first run failed, the next 4 were ok.
> 
> But that's probably not very useful; perhaps is pg_waldump more useful? 
> From what moment, or leading up to what moment, or period, is a
> pg_waldump(s) useful?  I can run it from the script, repeatedly, and
> only keep the dumped files when things go awry. Would that make sense?

Hi,

Yes, a waldump would be useful; the last segment should be enough, but
possibly all segments mentioned in the log.

The other useful thing would be to turn on log_connections and
log_replication_commands.

And finally if you could dump the contents of pg_subscription_rel,
pg_replication_origin_status on subscriber and pg_replication_slots on
publisher at the end of the failed run that would also help.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Supporting huge pages on Windows

2017-03-08 Thread Ashutosh Sharma
Hi,

I tried to test the v8 version of the patch. First, I was able to start the
postgresql server process with 'huge_pages' set to on. I had to follow the
instructions given in MSDN[1] to enable the "Lock Pages in Memory" option,
and also had to start the postgresql server process as an admin user.

test=# show huge_pages;
 huge_pages

 on
(1 row)

To start with, I ran the regression test suite and didn't find any
failures. That said, I am not sure whether huge pages were actually being
used; checking the huge_pages setting showed it as 'on'. I am assuming
that if huge pages could not be used due to a shortage of large pages,
the server would have fallen back to non-huge pages.

I also ran the pgbench tests on read-only workload and here are the
results I got.

pgbench -c 4 -j 4 -T 600 bench

huge_pages=on, TPS = 21120.768085
huge_pages=off, TPS = 20606.288995

[1] - https://msdn.microsoft.com/en-IN/library/ms190730.aspx

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Thu, Feb 23, 2017 at 12:59 PM, Tsunakawa, Takayuki
 wrote:
> From: Amit Kapila [mailto:amit.kapil...@gmail.com]
>> > Hmm, the large-page requires contiguous memory for each page, so this
>> error could occur on a long-running system where the memory is heavily
>> fragmented.  For example, please see the following page and check the memory
>> with RAMMap program referred there.
>> >
>>
>> I don't have RAMMap and it might take some time to investigate what is going
>> on, but I think in such a case even if it works we should keep the default
>> value of huge_pages as off on Windows.  I request somebody else having
>> access to Windows m/c to test this patch and if it works then we can move
>> forward.
>
> You are right.  I modified the patch so that the code falls back to the 
> non-huge page when CreateFileMapping() fails due to the shortage of large 
> pages.  That's what the Linux version does.
>
> The other change is to parameterize the Win32 function names in the messages 
> in EnableLockPagePrivileges().  This is to avoid adding almost identical 
> messages unnecessarily.  I followed Alvaro's comment.  I didn't touch the two 
> existing sites that embed Win32 function names.  I'd like to leave it up to 
> the committer to decide whether to change as well, because changing them 
> might make it a bit harder to apply some bug fixes to earlier releases.
>
> FYI, I could reproduce the same error as Amit on 32-bit Win7, where the total 
> RAM is 3.5 GB and available RAM is 2 GB.  I used the attached largepage.c.  
> Immediately after the system boot, I could only allocate 8 large pages.  When 
> I first tried to allocate 32 large pages, the test program produced:
>
> large page size = 2097152
> allocating 32 large pages...
> CreateFileMapping failed: error code = 1450
>
> You can build the test program as follows:
>
> cl largepage.c advapi32.lib
>
> Regards
> Takayuki Tsunakawa
>
>
>
>
>
>




Re: [HACKERS] dropping partitioned tables without CASCADE

2017-03-08 Thread Amit Langote
On 2017/03/08 18:27, Ashutosh Bapat wrote:
>>
>> About the other statement you changed, I just realized that we should
>> perhaps do one more thing.  Show the Number of partitions, even if it's 0.
>>  In case of inheritance, the parent table stands on its own when there are
>> no child tables, but a partitioned table doesn't in the same sense.  I
>> tried to implement that in attached patch 0002.  Example below:
>>
>> create table p (a int) partition by list (a);
>> \d p
>> 
>> Partition key: LIST (a)
>> Number of partitions: 0
>>
>> \d+ p
>> 
>> Partition key: LIST (a)
>> Number of partitions: 0
>>
>> create table p1 partition of p for values in (1);
>> \d p
>> 
>> Partition key: LIST (a)
>> Number of partitions: 1 (Use \d+ to list them.)
>>
>> \d+ p
>> 
>> Partition key: LIST (a)
>> Partitions: p1 FOR VALUES IN (1)
> 
> I liked that. PFA 0002 updated. I changed one of the \d outputs to \d+ to
> better test partitioned tables without partitions in both verbose and
> non-verbose mode. Also, I refactored your code to reduce the number of
> conditions. Please let me know if it looks good.

Thanks, looks good.

Regards,
Amit






Re: [HACKERS] Supporting huge pages on Windows

2017-03-08 Thread Thomas Munro
On Thu, Feb 23, 2017 at 8:29 PM, Tsunakawa, Takayuki
 wrote:
> [win_large_pages_v8.patch]

+Huge pages are known as large pages on Windows.  To use them,
you need to
+assign the user right Lock Pages in Memory to the Windows user account
+that runs PostgreSQL.
+Huge pages are known as large pages on Windows.  To use them,
you need to
+To start the database server on the command prompt as a
standalone process,
+not as a Windows service, run the command prompt as an administrator or
+disable the User Access Control (UAC). When the UAC is
enabled, the normal
+command prompt revokes the user right Lock Pages in Memory.

The line beginning 'Huge pages are known as...' has been accidentally
duplicated.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> On linux/gcc the patch generates a warning in nodeAgg.c that is
 Mark> fairly easy to fix.  Using -Werror to make catching the error
 Mark> easier, I get:

what gcc version is this exactly?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Partitioned tables and relfilenode

2017-03-08 Thread Amit Langote
On 2017/03/08 2:27, Robert Haas wrote:
> On Tue, Mar 7, 2017 at 12:11 AM, Ashutosh Bapat
>  wrote:
>> I see that all the changes by Amit and myself to what was earlier 0003
>> patch are now part of 0002 patch. The patch looks ready for committer.
> 
> Reviewing 0002:

Thanks for the review.

> 
> This patch seems to have falsified the header comments for
> expand_inherited_rtentry.

Oops, updated.

> I don't quite understand the need for the change to set_rel_size().
> The rte->inh case is handled before reaching the new code, and IIUC
> the !rte->inh case should never happen given the changes to
> expand_inherited_rtentry.  Or am I confused?

In expand_inherited_rtentry(), the patch only changes the rule about the
minimum number of child RTEs required to keep rte->inh true.  In the
traditional case, it is 2 (the table itself and at least one child).  For
a partitioned table, since we don't want to scan the table itself, that
becomes 1 (at least one leaf partition).

So, it's still possible for rte->inh to become false if the required
minimum is not met, even for partitioned tables.

> If not, I think we
> should just add an Assert() to the "plain relation" case that this is
> not RELKIND_PARTITIONED_TABLE (with a comment explaining why it can't
> happen).

How about adding Assert(rte->relkind != RELKIND_PARTITIONED_TABLE) at the
beginning of the following functions as the updated patch does:

set_plain_rel_size()
set_tablesample_rel_size()
set_plain_rel_pathlist()
set_tablesample_rel_pathlist()

> In inheritance_planner, this comment addition does not seem adequate:
> 
> + * If the parent is a partitioned table, we already set the nominal
> + * relation.
> 
> That basically contradicts the previous paragraph.  I think you need
> to adjust the previous paragraph instead of adding a new one, probably
> in multiple places.  For example, the parenthesized bit now only
> applies in the non-partitioned case.

Hmm, that's true.  I rewrote the entire comment.

> 
> -rel = mtstate->resultRelInfo->ri_RelationDesc;
> +nominalRTE = rt_fetch(node->nominalRelation, estate->es_range_table);
> +nominalRel = heap_open(nominalRTE->relid, NoLock);
> 
> No lock?

Hmm, I think I missed that a partitioned parent table would not be locked
by the *executor* at all after applying this patch.  As of now, InitPlan()
takes care of locking all the result relations in the
PlannedStmt.resultRelations list.  This patch removes partitioned
parent(s) from appearing in this list.  But that makes me wonder - aren't
the locks taken by expand_inherited_rtentry() kept long enough?  Why does
InitPlan need to acquire them again?  I see this comment in
expand_inherited_rtentry:

   * If the parent relation is the query's result relation, then we need
   * RowExclusiveLock.  Otherwise, if it's accessed FOR UPDATE/SHARE, we
   * need RowShareLock; otherwise AccessShareLock.  We can't just grab
   * AccessShareLock because then the executor would be trying to upgrade
   * the lock, leading to possible deadlocks.  (This code should match the
   * parser and rewriter.)

So, I assumed all the necessary locking is being taken care of here.

Anyway, I changed NoLock to RowExclusiveLock here for now, but maybe it's
either not necessary or a way should be found to do that in InitPlan along
with other result relations.

> Another thing that bothers me about this is that, apparently, the use
> of nominalRelation is no longer strictly for EXPLAIN, as the comments
> in plannodes.h/relation.h still claim that it is.  I'm not sure how to
> adjust that exactly; there's not a lot of room in those comments to
> give all the details.  Maybe they should simply say something like /*
> Parent RT index */ instead of /* Parent RT index for use of EXPLAIN
> */.  But we can't just go around changing the meaning of things
> without updating the comments accordingly.

It seems that this comment and one more need to be updated.  The other
comment change is in explain.c as follows:

  * Adds the relid of each referenced RTE to *rels_used.  The result controls
  * which RTEs are assigned aliases by select_rtable_names_for_explain.
  * This ensures that we don't confusingly assign un-suffixed aliases to RTEs
- * that never appear in the EXPLAIN output (such as inheritance parents).
+ * that never appear in the EXPLAIN output (such as non-partitioned
+ * inheritance parents).
  */
 static bool
 ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)

I studied the EXPLAIN code a bit to see whether the problem cited for
using inheritance parent RTE as nominalRelation (for example, in comments
in inheritance_planner) apply to the partitioned parent RTE case, which it
doesn't.  The partitioned parent RTE won't appear anywhere else in the
plan.  So, different aliases for what's the same table/RTE won't happen in
the partitioned parent case.

> A related question is
> whether we should really be using nominalRelation for this or whether
> there's some o

Re: [HACKERS] wait events for disk I/O

2017-03-08 Thread Rushabh Lathia
On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas  wrote:

> On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila 
> wrote:
> > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas 
> wrote:
> >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila 
> wrote:
> >>> Sure, if you think both Writes and Reads at OS level can have some
> >>> chance of blocking in obscure cases, then we should add a wait event
> >>> for them.
> >>
> >> I think writes have a chance of blocking even in cases that are not
> >> very obscure at all.
> >
> > Point taken for writes, but I think in general we should have some
> > criteria based on which we can decide whether to have a wait event for
> > a particular call. It should not happen that we have tons of wait
> > events and out of which, only a few are helpful in most of the cases
> > in real-world scenarios.
>
> Well, the problem is that if you pick and choose which wait events to
> add based on what you think will be common, you're actually kind of
> hosing yourself. Because now when something uncommon happens, suddenly
> you don't get any wait event data and you can't tell what's happening.
> I think the number of new wait events added by Rushabh's patch is
> wholly reasonable.  Yeah, some of those are going to be a lot more
> common than others, but so what?  We add wait events so that we can
> find out what's going on.  I don't want to sometimes know when a
> backend is blocked on an I/O.  I want to ALWAYS know.
>
>
Yes, I agree with Robert. It is difficult to judge up front what we
will and won't want to know. Deciding that some piece of information is
not useful, and later ending up in a situation where we have to rethink
and add the missing events, is not good. Having more information about
the system, especially for monitoring purposes, is always good.

I am attaching another version of the patch, as I found a silly mistake
in the earlier version: I had missed initializing the first value of the
WaitEventIO enum. Also, the earlier version no longer applied cleanly to
the current sources.



-- 
Rushabh Lathia
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


wait_event_for_disk_IO_v2.patch
Description: application/download



Re: [HACKERS] Parallel seq. plan is not coming against inheritance or partition table

2017-03-08 Thread Amit Kapila
On Wed, Mar 8, 2017 at 8:28 AM, Robert Haas  wrote:
> On Tue, Mar 7, 2017 at 9:24 PM, Amit Kapila  wrote:
>
>> I think it can give us benefit in
>> such cases as well (especially when we have to discard rows based heap
>> rows).  Now, consider it from another viewpoint, what if there are
>> enough index pages (> min_parallel_index_scan_size) but not sufficient
>> heap pages.  I think in such cases parallel index (only) scans will be
>> beneficial especially because in the parallel index only scans
>> heap_pages could be very less or possibly could be zero.
>
> That's a separate problem.  I think we ought to consider having an
> index-only scan pass -1 for the number of heap pages, so that the
> degree of parallelism in that case is limited only by the number of
> index pages.
>

Sure, that sounds sensible, but even after that, I am not sure if for
plain index scans it is a good idea to not choose parallelism if the
number of heap pages is lesser than min_parallel_table_scan_size even
though the number of index pages is greater than
min_parallel_index_scan_size.  I think we can try out some tests
(maybe TPC-H queries where parallel index scan gets picked up) to see
what is right here.  Do you think it would be bad if we just commit your
patch without this change and then consider changing it separately?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

2017-03-08 Thread Masahiko Sawada
On Wed, Feb 8, 2017 at 12:04 AM, Fujii Masao  wrote:
> On Tue, Feb 7, 2017 at 2:13 AM, Petr Jelinek
>  wrote:
>> On 06/02/17 17:33, Fujii Masao wrote:
>>> On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek
>>>  wrote:
 On 03/02/17 19:38, Fujii Masao wrote:
> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao  
> wrote:
>> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI
>>  wrote:
>>> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao  
>>> wrote in 
>>> 
 On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquier
  wrote:
> On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane  wrote:
>> Kyotaro HORIGUCHI  writes:
>>> Then, the reason for the TRY-CATCH cluase is that I found that
>>> some functions called from there can throw exceptions.
>>
>> Yes, but all LWLocks should be released by normal error recovery.
>> It should not be necessary for this code to clean that up by hand.
>> If it were necessary, there would be TRY-CATCH around every single
>> LWLockAcquire in the backend, and we'd have an unreadable and
>> unmaintainable system.  Please don't add a TRY-CATCH unless it's
>> *necessary* -- and you haven't explained why this one is.

 Yes.
>>>
>>> Thank you for the suggestion. I misunderstood that.
>>>
> Putting hands into the code and at the problem, I can see that
> dropping a subscription on a node makes it unresponsive in case of a
> stop. And that's just because calls to LWLockRelease are missing as in
> the patch attached. A try/catch problem should not be necessary.

 Thanks for the patch!

 With the patch, LogicalRepLauncherLock is released at the end of
 DropSubscription(). But ISTM that the lock should be released just 
 after
 logicalrep_worker_stop() and there is no need to protect the removal of
 replication slot with the lock.
>>>
>>> That's true. logicalrep_worker_stop returns after confirming that
>>> worker->proc is cleared, so no false relaunch can be caused.
>>> After all, logicalrep_worker_stop is surrounded by an
>>> LWLockAcquire/Release pair, so the locking can be moved into the
>>> function, making the locked section narrower.
>
> If we do this, Assert(LWLockHeldByMe(LogicalRepLauncherLock)) should be
> removed and the comment for logicalrep_worker_stop() should be updated.
>
> Your approach may cause the deadlock. The launcher takes 
> LogicalRepWorkerLock
> while holding LogicalRepLauncherLock. OTOH, with your approach,
> logicalrep_worker_stop() takes LogicalRepLauncherLock while holding
> LogicalRepWorkerLock.
>
> Therefore I pushed the simple patch which adds LWLockRelease() just after
> logicalrep_worker_stop().
>
> Another problem that I found while reading the code is that the launcher
> can start up a worker with the subscription that DROP SUBSCRIPTION just
> removed.  That is, DROP SUBSCRIPTION removes the target entry from
> pg_subscription, but the launcher can still see it and start a new worker
> until the transaction for DROP has been committed.
>

 That was the reason why DropSubscription didn't release the lock in the
 first place. It was supposed to be released at the end of the
 transaction though.
>>>
>>> OK, I understood why you used the lock in that way. But using LWLock
>>> for that purpose is not valid.
>>>
>>
>> Yeah, I just tried to avoid what we are doing now really hard :)
>>
> To fix this issue, I think that DROP SUBSCRIPTION should take
> AccessExclusiveLock on pg_subscription, instead of RowExclusiveLock,
> so that the launcher cannot see the entry to be being removed.
>

 The whole point of having LogicalRepLauncherLock is to avoid having to
 do this, so if we do this we could probably get rid of it.
>>>
>>> Yes, let's remove LogicalRepLauncherLock and lock pg_subscription
>>> with AccessExclusive mode at the beginning of DROP SUBSCRIPTION.
>>> Attached patch does this.
>>>
>>
>> Okay, looks reasonable to me.
>
> Thanks for the review!
> But ISTM that I should suspend committing the patch until we fix the issue
> that Sawada reported in another thread. That bugfix may change the related
> code and design very much.
> https://www.postgresql.org/message-id/cad21aod+vo93zz4zqtzqb-jz_wmko3oggdx1mxo4t+8q_zh...@mail.gmail.com
>

That patch has been committed. And this issue still happens. Should we
add this to the open item list so it doesn't get missed?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-08 Thread Amit Kapila
On Wed, Mar 8, 2017 at 5:11 AM, Robert Haas  wrote:
>
> Here are some initial review thoughts on 0003 based on a first read-through.
>
>
> +/*
> + * we need to release and reacquire the lock on overflow buffer to ensure
> + * that standby shouldn't see an intermediate state of it.
> + */
> +LockBuffer(ovflbuf, BUFFER_LOCK_UNLOCK);
> +LockBuffer(ovflbuf, BUFFER_LOCK_EXCLUSIVE);
>
> I still think this is a bad idea.  Releasing and reacquiring the lock
> on the master doesn't prevent the standby from seeing intermediate
> states; the comment, if I understand correctly, is just plain wrong.
>

I can remove this release and reacquire stuff, but first, let me try
to explain what intermediate state I am talking about here.  If we don't
have a release/reacquire of the lock here, the standby could see an empty
overflow page at that location, whereas the master will be able to see the
new page only after the insertion of tuple(s) in that bucket is
finished.  I understand that there is not much functional impact of
this change, but I still wanted to make you understand why I have added
this code.  I am okay with removing this code if you still feel this
is just a marginal case and we should not bother about it.  Let me know
what you think.

>
> + * Initialise the freed overflow page, here we can't complete zeroed the
>
> Don't use British spelling, and don't use a comma to join two
> sentences.  Also "here we can't complete zeroed" doesn't seem right; I
> don't know what that means.

What the comment means to say is that we can't initialize the page as zero
(something like below):
MemSet(ovflpage, 0, BufferGetPageSize(ovflbuf));

There is a typo in the sentence:
/complete/completely

>
> + * To replay meta page changes, we can log the entire metapage which
> + * doesn't seem advisable considering size of hash metapage or we can
> + * log the individual updated values which seems doable, but we prefer
> + * to perform exact operations on metapage during replay as are done
> + * during actual operation.  That looks straight forward and has an
> + * advantage of using lesser space in WAL.
>
> This sounds like it is describing two alternatives that were not
> selected and then the one that was selected, but I don't understand
> the difference between the second non-selected alternative and what
> was actually picked.
>

Actually, the second alternative was picked.

> I'd just write "We need only log the one field
> from the metapage that has changed." or maybe drop the comment
> altogether.
>

I have added the comment so that others can understand why we don't
want to log the entire meta page (I think in some other indexes, we
always log the entire meta page, so such a question can arise).
However, after reading your comment, I think it is overkill and I
prefer to drop the entire comment unless you feel differently.

>
> +/*
> + * We need to release and if required reacquire the lock on
> + * rbuf to ensure that standby shouldn't see an intermediate
> + * state of it.  If we don't release the lock, after replay of
> + * XLOG_HASH_MOVE_PAGE_CONTENTS on standby users will be able to
> + * view the results of partial deletion on rblkno.
> + */
> +LockBuffer(rbuf, BUFFER_LOCK_UNLOCK);
>
> If you DO release the lock, this will STILL be true, because what
> matters on the standby is what the redo code does.
>

That's right; what I intend to do here is to release the lock in the
master where it will be released in the standby.  In this case, we can't
ensure that what a user can see in the master is the same as in the
standby after this WAL record is replayed, because in the master we have
an exclusive lock on the write buf, so no one can read the contents of
the read buf (the location of the read buf will be after the write buf),
whereas in the standby it will be possible to read the contents of the
read buf.  I think this is not a correctness issue, so we might want to
leave it as it is - what do you say?

>
> + * We can log the exact changes made to meta page, however as no
> + * concurrent operation could see the index during the replay of this
> + * record, we can perform the operations during replay as they are
> + * done here.
>
> Don't use a comma to join two sentences.  Also, I don't understand
> anything up to the "however".
>

I mean the below changes made to meta buf (refer existing code):
metap->hashm_mapp[metap->hashm_nmaps] = num_buckets + 1;

metap->hashm_nmaps++;

I think it will be clear if you look at both the DO and REDO operations of
XLOG_HASH_INIT_BITMAP_PAGE.  Let me know if you still think that the
comment needs to be changed.

>
> + * Set pd_lower just past the end of the metadata.  This is to log
> + * full_page_image of metapage in xloginsert.c.
>
> Why the underscores?
>
> + * won't be a leak on standby, as next split will consume this space.
>

No 

Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins

2017-03-08 Thread David Rowley
On 7 March 2017 at 01:22, Ashutosh Bapat
 wrote:
> On Mon, Mar 6, 2017 at 1:29 PM, David Rowley
>  wrote:
>> On 6 March 2017 at 18:51, Etsuro Fujita  wrote:
>>> On 2017/03/06 11:05, David Rowley wrote:
>> It seems like a much better idea to keep the server option processing
>> in one location, which is what I did.
>
> I agree with this. However,
> 1. apply_server_options() is parsing the options strings again and
> again, which seems a waste of CPU cycles. It should probably pick up
> the options from one of the joining relations. Also, the patch calls
> GetForeignServer(), which is not needed; in foreign_join_ok(), it will
> copy it from the outer relation anyway.
> 2. Some server options like use_remote_estimate and fetch_size are
> overridden by corresponding table level options. For a join relation
> the values of these options are derived by some combination of
> table-level options.

This seems much more sane. I'd failed to find the code which takes the
largest fetch_size.

> I think we should write a separate function
> apply_fdw_options_for_joinrel() and pass the fpinfos of joinrel, outer
> and inner relation. The function will copy the values of server level
> options and derive values for table level options. We would add a note
> there to keep this function in sync with apply_*_options(). I don't
> think there's any better way to keep the options in sync for base
> relations and join relations.
>
> Here's the patch attached.

Agreed. It seems like a good idea to keep that logic in a single location.

I've beaten your patch around a bit and come up with the attached.

The changes from yours are mostly cosmetic, but I've also added a
regression test.

What do you think?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


foreign_outerjoin_pushdown_fix_v2.patch
Description: Binary data



Re: [HACKERS] wait events for disk I/O

2017-03-08 Thread Rajkumar Raghuwanshi
On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia 
wrote:

> I am attaching another version of the patch, as I found a stupid mistake
> in the earlier version, where I missed initializing the initial value of
> the WaitEventIO enum.  Also, the earlier version did not apply cleanly on
> the current version of the sources.
>
>

I applied the attached patch, set shared_buffers to 128kB, and ran pgbench;
I am able to see the distinct IO events below.

--with ./pgbench -i -s 500 postgres
 application_name | wait_event_type | wait_event      | query
------------------+-----------------+-----------------+----------------------
 pgbench          | IO              | ExtendDataBlock | copy pgbench_account
 pgbench          | IO              | WriteXLog       | copy pgbench_account
 pgbench          | IO              | WriteDataBlock  | copy pgbench_account
 pgbench          | IO              | ReadDataBlock   | vacuum analyze pgben
 pgbench          | IO              | ReadBuffile     | alter table pgbench_

--with ./pgbench -T 600 postgres (readwrite)
 application_name | wait_event_type | wait_event      | query
------------------+-----------------+-----------------+----------------------
 pgbench          | IO              | ReadDataBlock   | UPDATE pgbench_accou
 pgbench          | IO              | WriteDataBlock  | UPDATE pgbench_telle
                  | IO              | SyncDataBlock   |
 pgbench          | IO              | SyncDataBlock   | UPDATE pgbench_telle
                  | IO              | SyncDataBlock   | autovacuum: VACUUM A
 pgbench          | IO              | WriteXLog       | END;
 pgbench          | IO              | ExtendDataBlock | copy pgbench_account

--with ./pgbench -T 600 -S postgres (select only)
 application_name | wait_event_type | wait_event    | query
------------------+-----------------+---------------+----------------------
 pgbench          | IO              | ReadDataBlock | SELECT abalance FROM

Attached excel with all IO event values.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


wait_event_for_disk_io.ods
Description: application/vnd.oasis.opendocument.spreadsheet



Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 1:49 AM, Dilip Kumar  wrote:
> On Tue, Mar 7, 2017 at 10:10 PM, Dilip Kumar  wrote:
>>> It's not about speed.  It's about not forgetting to prefetch.  Suppose
>>> that worker 1 becomes the prefetch worker but then doesn't return to
>>> the Bitmap Heap Scan node for a long time because it's busy in some
>>> other part of the plan tree.  Now you just stop prefetching; that's
>>> bad.  You want prefetching to continue regardless of which workers are
>>> busy doing what; as long as SOME worker is executing the parallel
>>> bitmap heap scan, prefetching should continue as needed.
>>
>> Right, I missed this part. I will fix this.
> I have fixed this part.  After doing that, I realised that if multiple
> processes are prefetching, then in boundary cases (e.g. suppose
> prefetch_target is 3 and prefetch_pages is at 2) there may be some extra
> prefetching, but those prefetched blocks will eventually be used.
> Another solution to this problem is to increase prefetch_pages in
> advance and then call tbm_shared_iterate; this will avoid the extra
> prefetch.  But I am not sure what will be best here.

I don't think I understand exactly why this system might be prone to a
little bit of extra prefetching - can you explain further? I don't
think it will hurt anything as long as we are talking about a small
number of extra prefetches here and there.

> Fixed

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

2017-03-08 Thread Stephen Frost
* Pavan Deolasee (pavan.deola...@gmail.com) wrote:
> On Wed, Mar 8, 2017 at 7:33 AM, Robert Haas  wrote:
> 
> > On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost  wrote:
> > > Right, that's what I thought he was getting at and my general thinking
> > > was that we would need a way to discover if a CIC is ongoing on the
> > > relation and therefore heap_page_prune(), or anything else, would know
> > > that it can't twiddle the bits in the VM due to the ongoing CIC.
> > > Perhaps a lock isn't the right answer there, but it would have to be
> > > some kind of cross-process communication that operates at a relation
> > > level..
> >
> > Well, I guess that's one option.  I lean toward the position already
> > taken by Andres and Peter, namely, that it's probably not a great idea
> > to pursue this optimization.
> 
Fair point. I'm not going to "persist" with the idea too long. It seemed
like a good, low-risk feature to me which can benefit certain use cases
quite reasonably. It's not uncommon to create indexes (or reindex existing
indexes to remove index bloat) on extremely large tables, and avoiding a
second heap scan can hugely benefit such cases. Holding up the patch for
something for which we don't even have a proposal yet seemed a bit strange
at first, but I see the point.

I'm not really sure that I do.  CIC's will not be so frequent that not
allowing VM updates during their operation should really result in a
huge drag on the system, imv.  Of course, if the only way to realize a
CIC is happening on the table is to perform some expensive operation and
we have to do that every time then it's not going to be worth it, but
I'm not sure that's really the case here.

The issue here, as I understand it at least, is to come up with a way
that we can make sure to not do anything which would screw up the CIC
while it's running, that we can detect very cheaply so we don't slow
things down during normal non-CIC-running periods.  I suggested a new
lock, in part because anything that's updating the VM is going to have
to have some kind of lock anyway, and perhaps going for the "heavier"
lock that would conflict with the CIC, in an optimistic manner, would
make the normal non-CIC-running case essentially the same speed.  If a
CIC was running then the attempt to acquire the lock would fail and
extra time would be spent acquiring a lower-weight lock, of course, but
that's going to happen relatively rarely.

> Anyways, for a recently vacuumed table of twice the size of RAM and on a
> machine with SSDs, the patched CIC version runs about 25% faster. That's
> probably the best case scenario.

I agree, that certainly seems like a very nice performance improvement.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Removing #include "postgres.h" from a couple of headers

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 3:55 AM, Thomas Munro
 wrote:
> Over in another thread it was pointed out that a patch I submitted
> broke a project rule by including "postgres.h" in a header.  Here is a
> patch to remove it from dsa.h where I made the same mistake, and also
> a case I found in bufmask.h by grepping.

Committed.

> There are also instances in regcustom.h and snowball's header.h -- are
> those special cases?

I will leave this question to someone wiser (or more self-assured) than I.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding the optional clause 'AS' in CREATE TRIGGER

2017-03-08 Thread Peter Eisentraut
On 3/8/17 04:12, Okano, Naoki wrote:
> Peter Eisentraut wrote:
>> I have a feeling that this was proposed a few times in the ancient past
>> but did not go through because of locking issues.  I can't find any
>> emails about it through.  Does anyone remember?  Have you thought about
>> locking issues?
> Is this the e-mail you are looking for?
> https://www.postgresql.org/message-id/20140916124537.GH25887%40awork2.anarazel.de

No, that's not the one I had in mind.

> I am considering adding an 'OR REPLACE' clause as a first step.
> At least, I think there is no need to change the locking level when
> replacing a trigger with an 'EXECUTE PROCEDURE' clause.
> In PostgreSQL, we currently take ShareRowExclusiveLock on the relation on
> which the trigger is created. ShareRowExclusiveLock is enough to replace a
> trigger.
> Also, we currently take RowExclusiveLock on pg_trigger. RowExclusiveLock is
> enough to replace a trigger, too.

I'm not saying it's not correct.  I was just wondering.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC

2017-03-08 Thread Stephen Frost
* Andres Freund (and...@anarazel.de) wrote:
> On 2017-03-07 21:38:40 -0500, Robert Haas wrote:
> > > I wonder, however, if careful snapshot management couldn't solve this as
> > > well.  I have *not* thought a lot about this, but afaics we can easily
> > > prevent all-visible from being set in cases it'd bother us by having an
> > > "appropriate" xmin / registered snapshot.
> > 
> > Yeah, but that's a tax on the whole system.
> 
> I'm not sure I can buy that argument. CIC *already* holds a snapshot
> during each of the two scans. By reducing the amount of time that's held
> in the second scan, the systemwide impact is reduced, because it's held
> for a shorter amount of time.  We need to hold a slightly "more
> aggressive" snapshot, that's true, but if you have high xid throughput
> those effects should roughly balance each other.

For my 2c, at least, this certainly sounds like a potentially promising
approach too and I tend to agree with Andres on it not really being an
issue to have a more aggressive snapshot for the duration of the CIC.

Thanks!

Stephen




Re: [HACKERS] Parallel seq. plan is not coming against inheritance or partition table

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 7:04 AM, Amit Kapila  wrote:
> On Wed, Mar 8, 2017 at 8:28 AM, Robert Haas  wrote:
>> On Tue, Mar 7, 2017 at 9:24 PM, Amit Kapila  wrote:
>>> I think it can give us benefit in
>>> such cases as well (especially when we have to discard rows based on
>>> heap rows).  Now, consider it from another viewpoint: what if there are
>>> enough index pages (> min_parallel_index_scan_size) but not sufficient
>>> heap pages.  I think in such cases parallel index (only) scans will be
>>> beneficial especially because in the parallel index only scans
>>> heap_pages could be very less or possibly could be zero.
>>
>> That's a separate problem.  I think we ought to consider having an
>> index-only scan pass -1 for the number of heap pages, so that the
>> degree of parallelism in that case is limited only by the number of
>> index pages.
>
> Sure, that sounds sensible, but even after that, I am not sure if for
> plain index scans it is a good idea to not choose parallelism if the
> number of heap pages is less than min_parallel_table_scan_size even
> though the number of index pages is greater than
> min_parallel_index_scan_size.  I think we can try out some tests
> (maybe TPC-H queries where parallel index scan gets picked up) to see
> what is right here.  Do you think it would be bad to just commit your
> patch without this change and then consider changing it separately?

No, I think that would be fine.  I think that we need to commit
something like what I proposed because the earlier commit broke
something that used to work.  That's got to get fixed.  Now if we
subsequently find out that what we've implemented can be improved in
some way, we can consider those changes then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: psql show index with type info

2017-03-08 Thread Stephen Frost
Ashutosh,

* Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote:
> On Mon, Mar 6, 2017 at 7:54 PM, Amos Bird  wrote:
> > Well, the prefix is used to differentiate other \d commands, like
> > this,
> >
> > amos=# \ditv
> >                  List of relations
> >  Schema |        Name        |     Type     | Owner |  Table
> > --------+--------------------+--------------+-------+---------
> >  public | i                  | table        | amos  |
> >  public | ii                 | index: gist  | amos  | i
> >  public | j                  | table        | amos  |
> >  public | jj                 | index: gin   | amos  | i
> >  public | jp                 | index: btree | amos  | i
> >  public | js                 | index: brin  | amos  | i
> >  public | numbers            | table        | amos  |
> >  public | numbers_mod2       | index: gin   | amos  | numbers
> >  public | numbers_mod2_btree | index: btree | amos  | numbers
> >  public | ts                 | table        | amos  |
> > (10 rows)
> 
> The header for this table is "list of relations", so the type gets
> associated with the relation, indicating the type of relation. "index: gin"
> as a type of relation doesn't sound really great.

The type is 'index', we're just adding a sub-type here to clarify the
kind of index it is.

> Instead we might want to
> add another column "access method" and specify the access method used
> for that relation. But then only indexes seem to have access methods
> per pg_class.h.

Right, I don't think having an extra column which is going to be NULL a
large amount of the time is good.  The approach taken by Amos seems like
a good one to me, to have the type still be 'index' but with a sub-type
indicating the access method.

Thanks!

Stephen




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-08 Thread Peter Eisentraut
On 3/6/17 09:38, Peter Eisentraut wrote:
> On 3/4/17 01:45, Petr Jelinek wrote:
>> If that's the case, the attached should fix it, but I have no way of
>> testing it on windows, I can only say that it still works on my machine
>> so at least it hopefully does not make things worse.
> 
> Committed that.  Let's see how it goes.

So that didn't work.  Now we probably need someone to dig into that host
directly.

Andrew, could you help?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Partitioned tables and relfilenode

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 5:36 AM, Amit Langote
 wrote:
>> I don't quite understand the need for the change to set_rel_size().
>> The rte->inh case is handled before reaching the new code, and IIUC
>> the !rte->inh case should never happen given the changes to
>> expand_inherited_rtentry.  Or am I confused?
>
> In expand_inherited_rtentry(), patch only changes the rule about the
> minimum number of child RTEs required to keep rte->inh true.  In the
> traditional case, it is 2 (the table itself and at least one child).  For
> a partitioned table, since we don't want to scan the table itself, that
> becomes 1 (at least one leaf partition).
>
> So, it's still possible for rte->inh to become false if the required
> minimum is not met, even for partitioned tables.

OK.

>> -rel = mtstate->resultRelInfo->ri_RelationDesc;
>> +nominalRTE = rt_fetch(node->nominalRelation, estate->es_range_table);
>> +nominalRel = heap_open(nominalRTE->relid, NoLock);
>>
>> No lock?
>
> Hmm, I think I missed that a partitioned parent table would not be locked
> by the *executor* at all after applying this patch.  As of now, InitPlan()
> takes care of locking all the result relations in the
> PlannedStmt.resultRelations list.  This patch removes partitioned
> parent(s) from appearing in this list.  But that makes me wonder - aren't
> the locks taken by expand_inherited_rtentry() kept long enough?  Why does
> InitPlan need to acquire them again?  I see this comment in
> expand_inherited_rtentry:

Parse-time locks, plan-time locks, and execution-time locks are all
separate.  See the header comments for AcquireRewriteLocks in
rewriteHandler.c; think about a view, where the parsed query has been
stored and is later rewritten and planned.  Similarly, a plan can be
reused even after the transaction that created that plan has
committed; see AcquireExecutorLocks in plancache.c.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-03-08 Thread Robert Haas
On Tue, Mar 7, 2017 at 10:01 PM, Michael Paquier
 wrote:
> RFC5802 (https://tools.ietf.org/html/rfc5802), which describes the
> implementation of SCRAM, requires passwords to be normalized as per
> RFC4013 (https://tools.ietf.org/html/rfc4013) using SASLprep, which is
> actually NFKC. I have previously described what happens in this case
> here:
> https://www.postgresql.org/message-id/CAB7nPqScgwh6Eu4=c-0l7-cufzru5rultsdpmoowiz1yfvg...@mail.gmail.com
> This way, we can be sure that two UTF-8 strings are considered
> equivalent in a SASL exchange; in our case we care about the password
> string (we should care about the username as well). Without SASLprep,
> our implementation of SCRAM may fail with other third-party tools if
> passwords are not made only of ASCII characters. And that sucks.

Agreed.  I am not sure this quite rises to the level of a stop-ship
issue; we could document the restriction.  However, that's pretty
ugly; it would be a whole lot better to get this fixed.

I kinda hope Heikki is going to step up to the plate here, because I
think he understands most of it already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] port of INSTALL file generation to XSLT

2017-03-08 Thread Peter Eisentraut
On 3/1/17 20:23, Magnus Hagander wrote:
> On Wed, Mar 1, 2017 at 3:58 AM, Peter Eisentraut
>  > wrote:
> 
> On 2/28/17 08:57, Magnus Hagander wrote:
> > It appears we need pandoc 1.13 to get the good output.  This won't be
> > available until Debian stretch.
> >
> > So for PG10, I propose the attached revised patch which keeps using lynx
> > but uses xsltproc instead of jade.
> >
> >
> > It is available for people not using debian though? Might it be
> > worthwhile to make it dependent on the version of pandoc -- so use that
> > method if people have the newer pandoc and fall back to lynx if they
> > don't?
> 
> Well, this really only runs once every couple of months on borka and
> here or there for those building their own snapshot tarballs.  I don't
> think we need to cater to a wide audience here.  In fact, variety could
> be bad here:  We don't want to find out that a tarball was rolled with
> the wrong variant.
> 
> 
> Good point. Let 's just go for it then.

Committed that way.  Thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Dilip Kumar
On Wed, Mar 8, 2017 at 6:42 PM, Robert Haas  wrote:
> I don't think I understand exactly why this system might be prone to a
> little bit of extra prefetching - can you explain further?
Let me explain with an example.  Suppose there are 2 workers
prefetching jointly, and assume prefetch_target is 3.  When both
workers have prefetched 1 page each, the prefetch_pages count will be 2,
so both workers will go for the next prefetch because prefetch_pages is
still less than prefetch_target; thus both workers will prefetch again,
and in total they will prefetch 4 pages.

> I don't think it will hurt anything as long as we are talking about a small
> number of extra prefetches here and there.

I completely agree with this point; I mentioned it in the mail so that
it doesn't go unnoticed.

And whatever extra pages we have prefetched will be used anyway, unless
we stop the execution because of some LIMIT clause.

>
>> Fixed
>
> Committed 0001.

Thanks


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Removing #include "postgres.h" from a couple of headers

2017-03-08 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 8, 2017 at 3:55 AM, Thomas Munro
>  wrote:
>> There are also instances in regcustom.h and snowball's header.h -- are
>> those special cases?

> I will leave this question to someone wiser (or more self-assured) than I.

I'm pretty sure I'm to blame for both of those special cases.  The genesis
of both is that we are including these headers from externally-generated
.c files, and it seemed like modifying the .c files would be a bigger
problem than violating the policy.  I am not sure if I hold that position
anymore for the regexp library; our copy has diverged substantially from
Tcl's anyway.  It's still an issue for Snowball, because those .c files
are actually machine-generated by a Snowball-to-C compiler.  We haven't
modified them and probably shouldn't.

If we don't change the code layout, we should probably at least add
comments near these postgres.h inclusions explaining why they're violating
policy.

regards, tom lane




Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Robert Haas
Reviewing 0003:

How about adding a regression test?

bitmap_subplan_mark_shared could use castNode(), which seems like it
would be better style.  Maybe some other places, too.

+ ParallelBitmapPopulate
+ Waiting for the leader to populate the TidBitmap.
+
+

If you build the documentation, you'll find that this doesn't come out
right; you need to add 1 to the value of the nearest preceding
"morerows".  (I fixed a similar issue with 0001 while committing.)

+/*---
+ * Check the current state
+ * If state is
+ * BM_INITIAL : We become the leader and set it to BM_INPROGRESS
+ * BM_INPROGRESS : We need to wait till leader creates bitmap
+ * BM_FINISHED   : bitmap is ready so no need to wait
+ *---

The formatting of this comment is slightly off - the comment for
BM_INITIAL isn't aligned the same as the others.  But I would just
delete the whole comment, since more or less it recapitulates the
function header comment anyway.

I wonder if BitmapShouldInitializeSharedState couldn't be written a
little more compactly overall, like this:

{
SharedBitmapState   state;

while (1)
{
SpinLockAcquire(&pstate->mutex);
state = pstate->state;
if (pstate->state == BM_INITIAL)
pstate->state = BM_INPROGRESS;
SpinLockRelease(&pstate->mutex);

/* If we are leader or leader has already created a TIDBITMAP */
if (state != BM_INPROGRESS)
break;

/* Sleep until leader finishes creating the bitmap */
ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_BITMAP_SCAN);
}

ConditionVariableCancelSleep();

return (state == BM_INITIAL);
}

+/*
+ * By this time we have already populated the TBM and
+ * initialized the shared iterators so set the state to
+ * BM_FINISHED and wake up others.
+ */
+SpinLockAcquire(&pstate->mutex);
+pstate->state = BM_FINISHED;
+SpinLockRelease(&pstate->mutex);
+ConditionVariableBroadcast(&pstate->cv);

I think it would be good to have a function for this, like
BitmapDoneInitializingSharedState(), and just call that function here.

+SpinLockAcquire(&pstate->mutex);
+
+/*
+ * Recheck under the mutex, If some other process has already done
+ * the enough prefetch then we need not to do anything.
+ */
+if (pstate->prefetch_pages >= pstate->prefetch_target)
+SpinLockRelease(&pstate->mutex);
+return;
+SpinLockRelease(&pstate->mutex);

I think it would be clearer to write this as:

SpinLockAcquire(&pstate->mutex);
do_prefetch = (pstate->prefetch_pages >= pstate->prefetch_target);
SpinLockRelease(&pstate->mutex);
if (!do_prefetch)
   return;

Then it's more obvious what's going on with the spinlock.  But
actually what I would do is also roll in the increment to prefetch
pages in there, so that you don't have to reacquire the spinlock after
calling PrefetchBuffer:

bool do_prefetch = false;
SpinLockAcquire(&pstate->mutex);
if (pstate->prefetch_pages < pstate->prefetch_target)
{
pstate->prefetch_pages++;
do_prefetch = true;
}
SpinLockRelease(&pstate->mutex);

That seems like it will reduce the amount of excess prefetching
considerably, and also simplify the code and cut the spinlock
acquisitions by 50%.

Overall I think this is in pretty good shape.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Removing #include "postgres.h" from a couple of headers

2017-03-08 Thread Tom Lane
... BTW, a bit of grepping shows that there are still significant
violations of this policy with respect to header-driven inclusion of
postgres_fe.h and c.h.  Also, plpgsql is doing it in the unapproved
fashion.  Cleaning these up will take a bit more work, since we'll have
to actually add #includes to some .c files that are currently relying on
those headers to get the core header.  However, if I believe my own
argument, then that's a good thing and we'd better go do it.

Also, ecpglib.h seems like a complete mess: it's relying on symbols
like ENABLE_NLS but there's no certainty as to whether pg_config.h
has been included first.  It won't have been in the case where this
header is read from an ecpg-generated .c file.  So this header will
in fact be interpreted differently in ecpg-generated programs than
in ecpglib and ecpg itself.  Maybe this is okay but it sure smells
like trouble waiting to happen.  I have no desire to try to fix it
myself though.

regards, tom lane




Re: [HACKERS] foreign partition DDL regression tests

2017-03-08 Thread Robert Haas
On Tue, Mar 7, 2017 at 11:18 PM, Ashutosh Bapat wrote:
>> I agree that we could do that, but what value would it have?  It just
>> forces the user to spend two SQL commands doing what could otherwise
>> be done in one.
>
> I don't think it's always going to be two commands. A user who wants
> to attach a foreign table as a partition "knows" that the data on the
> foreign server honours the partitioning bounds. If s/he knows that,
> s/he probably added the constraint on the foreign table so that the
> planner could make use of it. Remember, this is an existing foreign
> table. If s/he is not aware that the data on the foreign server doesn't
> honour the partition bounds, adding it as a partition would be a
> problem. I think this step gives the user a chance to make a conscious
> decision.

I think attaching the foreign table as a partition constitutes a
sufficiently-conscious decision.
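For reference, attaching an existing foreign table as a partition is a
single command. The following sketch uses illustrative object names (not
ones from this thread), and assumes the remote data really does honour
the declared bounds, since PostgreSQL does not verify that on the
foreign server:

```sql
-- Illustrative names only; remote_2017_q1 is assumed to be an
-- existing foreign table whose remote data falls within the bounds.
CREATE TABLE measurement (logdate date, reading int)
    PARTITION BY RANGE (logdate);

ALTER TABLE measurement
    ATTACH PARTITION remote_2017_q1
    FOR VALUES FROM ('2017-01-01') TO ('2017-04-01');
```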

> At least we need to update the documentation.

Got a proposal?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger

> On Mar 8, 2017, at 2:30 AM, Andrew Gierth  wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> On linux/gcc the patch generates a warning in nodeAgg.c that is
> Mark> fairly easy to fix.  Using -Werror to make catching the error
> Mark> easier, I get:
> 
> what gcc version is this exactly?
> 

Linux version 2.6.32-573.18.1.el6.x86_64 (mockbu...@c6b8.bsys.dev.centos.org) 
(gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Tue Feb 9 
22:46:17 UTC 2016





Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-03-08 Thread Andreas Karlsson

On 03/08/2017 03:48 AM, Robert Haas wrote:

On Sun, Mar 5, 2017 at 7:13 PM, Andreas Karlsson  wrote:

And I would argue that this feature is useful for quite many, based on my
experience running a semi-large database. Index bloat happens, and without
REINDEX CONCURRENTLY it can be really annoying to solve, especially for
primary keys. Certainly more people have problems with index bloat than the
number of people who store index OIDs in their database.


Yeah, but that's not the only wart, I think.


The only two potential issues I see with the patch are:

1) That the index oid changes visibly to external users.

2) That the code for moving the dependencies will need to be updated 
when adding new things which refer to an index oid.


Given how useful I find REINDEX CONCURRENTLY, I think these warts are 
worth it since the impact is quite limited. I am of course biased: if I 
did not believe this I would not pursue this solution in the first place.



For example, I believe
(haven't looked at this patch series in a while) that the patch takes
a lock and later escalates the lock level.  If so, that could lead to
doing a lot of work to build the index and then getting killed by the
deadlock detector.


This version of the patch no longer does that. For my use case, 
escalating the lock would make this patch much less interesting. The 
highest lock level taken is the same as the initial one (SHARE UPDATE 
EXCLUSIVE). On a high level (very simplified), the current patch does this:


1. CREATE INDEX CONCURRENTLY ind_new;
2. Atomically move all dependencies from ind to ind_new, rename ind to 
ind_old, and rename ind_new to ind.

3. DROP INDEX CONCURRENTLY ind_old;

The actual implementation is a bit more complicated in reality, but no 
part escalates the lock level over what would be required by the steps 
for creating and dropping indexes concurrently.
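Expressed as plain SQL, the simplified steps look roughly like this.
Object names are illustrative; note that the dependency move in step 2
is internal to the patch and has no SQL equivalent, and the renames are
only atomic when the patch performs them:

```sql
-- Step 1: build a replacement index without blocking writes.
CREATE INDEX CONCURRENTLY ind_new ON some_table (some_column);

-- Step 2 (done atomically by the patch, together with the dependency
-- move): swap the names so "ind" refers to the freshly built index.
ALTER INDEX ind RENAME TO ind_old;
ALTER INDEX ind_new RENAME TO ind;

-- Step 3: drop the bloated index without blocking readers.
DROP INDEX CONCURRENTLY ind_old;
```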



Also, if by any chance you think (or use any
software that thinks) that OIDs for system objects are a stable
identifier, this will be the first case where that ceases to be true.
If the system is shut down or crashes or the session is killed, you'll
be left with stray objects with names that you've never typed into the
system.  I'm sure you're going to say "don't worry, none of that is
any big deal" and maybe you're right.


Hm, I cannot think of any real life scenario where this will be an issue 
based on my personal experience with PostgreSQL, but if you can think of 
one please provide it. I will try to ponder some more on this myself.


Andreas




Re: [HACKERS] Explicit subtransactions for PL/Tcl

2017-03-08 Thread Pavel Stehule
Hi

2017-01-08 18:57 GMT+01:00 Victor Wagner :

>
> Collegues!
>
> Recently I found out that PL/Python has a very nice feature - an explicit
> subtransaction object, which allows executing a block of code in the
> context of a subtransaction.
>
> I was quite surprised that the other PL languages shipped with PostgreSQL
> do not have such a useful construct.
>
> While it might require considerable trickery to add such functionality to
> PL/Perl, Tcl allows adding new control structures very easily.
>
> I'm attaching a patch which implements a subtransaction command for
> PL/Tcl that does almost the same as the PL/Python plpy.subtransaction
> context manager object: it executes a block of Tcl code in the context
> of a subtransaction, rolls the subtransaction back if an error occurs,
> and commits it otherwise.
>
> It looks like
>
> subtransaction {
>  ...some Tcl code...
> }
>
> Typically one would use it inside a Tcl catch statement:
>
> if [catch {
> subtransaction {
> spi_exec "insert into..."
> ...
> }
> } errormsg] {
># Handle an error
> }
>
> See documentation and tests included in the patch for more complete
> examples.
>
> Just like the corresponding Python construct, this command doesn't
> replace the language's builtin exception handling; it just adds
> subtransaction support to it.
>
> The patch includes significantly fewer tests than the Python
> subtransaction tests, because the Tcl implementation is much simpler
> than the Python one and doesn't have syntactic variations that depend
> on the language version.
>
> Also, entering and exiting a subtransaction happen in the same piece of
> code rather than in separate __enter__ and __exit__ methods as in
> Python, so there is no possibility of calling exit without enter or
> vice versa.
>

I did a review of this patch

1. This functionality makes sense and nobody was against this feature.

2. This patch does what is proposed - it introduces a new Tcl function
that wraps a PostgreSQL subtransaction.

3. This patch is really simple because subtransactions are already used
extensively - every SPI call from Tcl is wrapped in a subtransaction.

4. The documentation is good - although I am not sure if it is well
structured - is  level necessary? Probably there will not be any
other similar command.

5. There are basic regression tests, and all tests pass, but I miss a
path where the subtransaction is committed - currently it is rolled
back every time.

6. The code has some whitespace issues.

7. I don't understand why TopMemoryContext is used there. Maybe an
already-used context should be used instead.

+<->BeginInternalSubTransaction(NULL);
+<->MemoryContextSwitchTo(TopTransactionContext);
+<->


Regards

Pavel



>
> --
>Victor Wagner 
>


Re: [HACKERS] new gcc 7.0.1 warnings

2017-03-08 Thread Peter Eisentraut
On 2/18/17 02:08, Pavel Stehule wrote:
> I am checking new Fedora 26, where new gcc compiler is used.
> 
> float.c: In function ‘float4out’:
> float.c:382:41: warning: ‘%.*g’ directive output may be truncated
> writing between 1 and 310 bytes into a region of size 65
> [-Wformat-truncation=]
>  snprintf(ascii, MAXFLOATWIDTH + 1, "%.*g", ndig, num);

It appears these warnings no longer happen with a newer gcc-7 snapshot.
I'm using

gcc-7 (Debian 7-20170302-1) 7.0.1 20170302 (experimental) [trunk
revision 245832]

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] patch: function xmltable

2017-03-08 Thread Alvaro Herrera
Pavel Stehule wrote:
> Hi
> 
> I used your idea about special columns when COLUMNS are not explicitly
> defined.
> 
> All lines that you are dislike removed.

I just pushed XMLTABLE, after some additional changes.  Please test it
thoroughly and report any problems.

I didn't add the change you proposed here to keep COLUMNS optional;
instead, I just made COLUMNS mandatory.  I think what you propose here
is not entirely out of the question, but you left out ruleutils.c
support for it, so I decided to leave it aside for now so that I could
get this patch out of my plate once and for all.  If you really want
that feature, you can submit another patch for it and discuss with the
RMT whether it belongs in PG10 or not.

Some changes I made:
* I added some pg_stat_statements support.  It works fine for simple
tests, but deeper testing of it would be appreciated.

* I removed the "buildercxt" memory context.  It seemed mostly
pointless, and I was disturbed by the MemoryContextResetOnly().
Per-value memory still uses the per-value memory context, but the rest
of the stuff is in the per-query context, which should be pretty much
the same.

* Desultory stylistic changes
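For anyone wanting to test the new feature, a minimal XMLTABLE query
looks like this (the example data is made up here, not taken from the
patch):

```sql
SELECT xt.*
FROM (VALUES ('<rows>
                 <row id="1"><name>one</name></row>
                 <row id="2"><name>two</name></row>
               </rows>'::xml)) AS d(doc),
     XMLTABLE('/rows/row' PASSING d.doc
              COLUMNS id   int  PATH '@id',
                      name text PATH 'name') AS xt;
-- yields two rows: (1, 'one') and (2, 'two')
```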

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger

> On Mar 8, 2017, at 5:47 AM, Andrew Gierth  wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> On my MacBook, `make check-world` gives differences in the
> Mark> contrib modules:
> 
> Thanks! Latest cleaned up version of patch is attached.

This fixes both the warning and the contrib tests on linux and mac.





Re: [HACKERS] patch: function xmltable

2017-03-08 Thread Pavel Stehule
2017-03-08 17:01 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > Hi
> >
> > I used your idea about special columns when COLUMNS are not explicitly
> > defined.
> >
> > All lines that you are dislike removed.
>
> I just pushed XMLTABLE, after some additional changes.  Please test it
> thoroughly and report any problems.
>

Thank you

>
> I didn't add the change you proposed here to keep COLUMNS optional;
> instead, I just made COLUMNS mandatory.  I think what you propose here
> is not entirely out of the question, but you left out ruleutils.c
> support for it, so I decided to leave it aside for now so that I could
> get this patch out of my plate once and for all.  If you really want
> that feature, you can submit another patch for it and discuss with the
> RMT whether it belongs in PG10 or not.
>

It is an interesting feature because it replaces the XPATH function, but
it is not important enough.

For daily work the default schema support is much more interesting.


>
> Some changes I made:
> * I added some pg_stat_statements support.  It works fine for simple
> tests, but deeper testing of it would be appreciated.
>
> * I removed the "buildercxt" memory context.  It seemed mostly
> pointless, and I was disturbed by the MemoryContextResetOnly().
> Per-value memory still uses the per-value memory context, but the rest
> of the stuff is in the per-query context, which should be pretty much
> the same.
>
> * Desultory stylistic changes
>

ok

Regards

Pavel

>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] new gcc 7.0.1 warnings

2017-03-08 Thread Pavel Stehule
2017-03-08 16:59 GMT+01:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:

> On 2/18/17 02:08, Pavel Stehule wrote:
> > I am checking new Fedora 26, where new gcc compiler is used.
> >
> > float.c: In function ‘float4out’:
> > float.c:382:41: warning: ‘%.*g’ directive output may be truncated
> > writing between 1 and 310 bytes into a region of size 65
> > [-Wformat-truncation=]
> >  snprintf(ascii, MAXFLOATWIDTH + 1, "%.*g", ndig, num);
>
> It appears these warnings no longer happen with a newer gcc-7 snapshot.
> I'm using
>
> gcc-7 (Debian 7-20170302-1) 7.0.1 20170302 (experimental) [trunk
> revision 245832]
>
>
gcc (GCC) 7.0.1 20170225 is still what Fedora ships :(

Regards

Pavel




> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-08 Thread Joe Conway
On 03/07/2017 08:29 PM, Tom Lane wrote:
> Michael Paquier  writes:
>> here is a separate thread dedicated to the following extension for
>> CREATE/ALTER ROLE: PASSWORD ('value' USING 'method').
> 
> The parentheses seem weird ... do we really need those?

+1

> +If you do not plan to use password authentication you can omit this
> +option. The methods supported are md5 to enforce a password
> +to be MD5-encrypted, scram for a SCRAM-encrypted password
> +and plain for an unencrypted password.  If the password

Can we please stop calling this encryption? What is being done is a form
of cryptographic hashing, not encryption.
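For context, the syntax being proposed in the quoted patch would be used
like this (a sketch of the proposal under discussion, not committed
syntax; role name and passwords are made up):

```sql
CREATE ROLE alice LOGIN PASSWORD ('secret' USING 'scram');
ALTER ROLE alice PASSWORD ('newsecret' USING 'md5');
```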

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Dilip Kumar
On Wed, Mar 8, 2017 at 8:28 PM, Robert Haas  wrote:
> How about adding a regression test?

Added
>
> bitmap_subplan_mark_shared could use castNode(), which seems like it
> would be better style.  Maybe some other places, too.
>
> + ParallelBitmapPopulate
> + Waiting for the leader to populate the TidBitmap.
> +
> +
>
> If you build the documentation, you'll find that this doesn't come out
> right; you need to add 1 to the value of the nearest preceding
> "morerows".  (I fixed a similar issue with 0001 while committing.)

Fixed
>
> +/*---
> + * Check the current state
> + * If state is
> + * BM_INITIAL : We become the leader and set it to BM_INPROGRESS
> + * BM_INPROGRESS : We need to wait till leader creates bitmap
> + * BM_FINISHED   : bitmap is ready so no need to wait
> + *---
>
> The formatting of this comment is slightly off - the comment for
> BM_INITIAL isn't aligned the same as the others.  But I would just
> delete the whole comment, since more or less it recapitulates the
> function header comment anyway.

Removed.
>
> I wonder if BitmapShouldInitializeSharedState couldn't be written a
> little more compactly overall, like this:
>
> {
> SharedBitmapState   state;
>
> while (1)
> {
> SpinLockAcquire(&pstate->mutex);
> state = pstate->state;
> if (pstate->state == BM_INITIAL)
> pstate->state = BM_INPROGRESS;
> SpinLockRelease(&pstate->mutex);
>
> /* If we are leader or leader has already created a TIDBITMAP */
> if (state != BM_INPROGRESS)
> break;
>
> /* Sleep until leader finishes creating the bitmap */
> ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_BITMAP_SCAN);
> }
>
> ConditionVariableCancelSleep();
>
> return (state == BM_INITIAL);
> }

This looks good, done this way

>
> +/*
> + * By this time we have already populated the TBM and
> + * initialized the shared iterators so set the state to
> + * BM_FINISHED and wake up others.
> + */
> +SpinLockAcquire(&pstate->mutex);
> +pstate->state = BM_FINISHED;
> +SpinLockRelease(&pstate->mutex);
> +ConditionVariableBroadcast(&pstate->cv);
>
> I think it would be good to have a function for this, like
> BitmapDoneInitializingSharedState(), and just call that function here.

Done
>
> +SpinLockAcquire(&pstate->mutex);
> +
> +/*
> + * Recheck under the mutex, If some other process has already done
> + * the enough prefetch then we need not to do anything.
> + */
> +if (pstate->prefetch_pages >= pstate->prefetch_target)
> +SpinLockRelease(&pstate->mutex);
> +return;
> +SpinLockRelease(&pstate->mutex);
>
> I think it would be clearer to write this as:
>
> SpinLockAcquire(&pstate->mutex);
> do_prefetch = (pstate->prefetch_pages >= pstate->prefetch_target);
> SpinLockRelease(&pstate->mutex);
> if (!do_prefetch)
>return;
>
> Then it's more obvious what's going on with the spinlock.  But
> actually what I would do is also roll in the increment to prefetch
> pages in there, so that you don't have to reacquire the spinlock after
> calling PrefetchBuffer:
>
> bool do_prefetch = false;
> SpinLockAcquire(&pstate->mutex);
> if (pstate->prefetch_pages < pstate->prefetch_target)
> {
> pstate->prefetch_pages++;
> do_prefetch = true;
> }
> SpinLockRelease(&pstate->mutex);
>
> That seems like it will reduce the amount of excess prefetching
> considerably, and also simplify the code and cut the spinlock
> acquisitions by 50%.
>
Right, done that way
> Overall I think this is in pretty good shape.

Thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0003-parallel-bitmap-heapscan-v9.patch
Description: Binary data



Re: [HACKERS] foreign partition DDL regression tests

2017-03-08 Thread Robert Haas
On Tue, Feb 21, 2017 at 8:40 PM, Amit Langote wrote:
> Ashutosh Bapat pointed out [0] that regression tests are missing for the
> foreign partition DDL commands.  Attached patch takes care of that.

Committed.

I didn't do anything about Ashutosh's comment that we could use ALTER
FOREIGN TABLE rather than ALTER TABLE someplace; that didn't seem
critical.

Also, the names of the objects in this test are kinda generic (pt2 et
al.) but they match the existing names in the same file (pt1, foo).
If we're going to start differentiating those a little better, we
should probably change them all, and as a separate commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] patch: function xmltable

2017-03-08 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-03-08 17:01 GMT+01:00 Alvaro Herrera :

> > I didn't add the change you proposed here to keep COLUMNS optional;
> > instead, I just made COLUMNS mandatory.  I think what you propose here
> > is not entirely out of the question, but you left out ruleutils.c
> > support for it, so I decided to leave it aside for now so that I could
> > get this patch out of my plate once and for all.  If you really want
> > that feature, you can submit another patch for it and discuss with the
> > RMT whether it belongs in PG10 or not.
> 
> It is interesting feature - because it replaces XPATH function, but not
> important enough.

OK.

> For daily work the default schema support is much more interesting.

Let's see that one, then.  It was part of the original submission, so
depending on how the patch looks we can still cram it in.  But other
patches have priority for me now.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] new gcc 7.0.1 warnings

2017-03-08 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-02-18 18:35 GMT+01:00 Tom Lane :
> 
> > Pavel Stehule  writes:

> > Do the warnings go away if you add some explicit guard to the precision
> > variable, say like this:
> >
> > {
> > intndig = DBL_DIG + extra_float_digits;
> >
> > if (ndig < 1)
> > ndig = 1;
> > +   if (ndig > 50)
> > +   ndig = 50;
> 
> This fix doesn't help

Ahh, so this is why you had this change in the xmltable patch once!
Heh.  Please be more careful.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] adding an immutable variant of to_date

2017-03-08 Thread Andreas Karlsson

On 03/07/2017 09:56 PM, Sven R. Kunze wrote:

On 07.03.2017 03:21, Andreas Karlsson wrote:

1) I do not think we currently allow setting the locale like this
anywhere, so this will introduce a new concept to PostgreSQL. And you
will probably need to add support for caching per locale.


Good to know. Could you explain what you mean by "caching per locale"?


On the first call, the current code for to_char builds arrays with the 
localized names of the week days and the months. I suspect that you may 
need to build something similar, but with a set of arrays per locale.


See the DCH_to_char function and its call to cache_locale_time.
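The localized names in question are the ones to_char produces for
TM-prefixed format patterns, which depend on lc_time. For example
(assumes the de_DE locale is installed on the system):

```sql
SET lc_time = 'de_DE.UTF-8';
-- TMDay/TMMonth make to_char use the locale's day and month names,
-- which DCH_to_char caches via cache_locale_time on first use.
SELECT to_char(date '2017-03-08', 'TMDay, DD. TMMonth YYYY');
```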

Andreas




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger
Hi Andrew,

Reviewing the patch a bit more, I find it hard to understand the comment about
passing -1 as a flag for finalize_aggregates.  Any chance you can spend a bit
more time word-smithing that code comment?


@@ -1559,7 +1647,9 @@ prepare_projection_slot(AggState *aggstate, 
TupleTableSlot *slot, int currentSet
 /*
  * Compute the final value of all aggregates for one group.
  *
- * This function handles only one grouping set at a time.
+ * This function handles only one grouping set at a time.  But in the hash
+ * case, it's the caller's responsibility to have selected the set already, and
+ * we pass in -1 here to flag that and to control the indexing into pertrans.
  *
  * Results are stored in the output econtext aggvalues/aggnulls.
  */
@@ -1575,10 +1665,11 @@ finalize_aggregates(AggState *aggstate,
int aggno;
int transno;

-   Assert(currentSet == 0 ||
-  ((Agg *) aggstate->ss.ps.plan)->aggstrategy != AGG_HASHED);
-
-   aggstate->current_set = currentSet;
+   /* ugly hack for hash */
+   if (currentSet >= 0)
+   select_current_set(aggstate, currentSet, false);
+   else
+   currentSet = 0;


> On Mar 8, 2017, at 8:00 AM, Mark Dilger  wrote:
> 
> 
>> On Mar 8, 2017, at 5:47 AM, Andrew Gierth  
>> wrote:
>> 
>>> "Mark" == Mark Dilger  writes:
>> 
>> Mark> On my MacBook, `make check-world` gives differences in the
>> Mark> contrib modules:
>> 
>> Thanks! Latest cleaned up version of patch is attached.
> 
> This fixes both the warning and the contrib tests on linux and mac.
> 





Re: [HACKERS] patch: function xmltable

2017-03-08 Thread Pavel Stehule
2017-03-08 17:32 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > 2017-03-08 17:01 GMT+01:00 Alvaro Herrera :
>
> > > I didn't add the change you proposed here to keep COLUMNS optional;
> > > instead, I just made COLUMNS mandatory.  I think what you propose here
> > > is not entirely out of the question, but you left out ruleutils.c
> > > support for it, so I decided to leave it aside for now so that I could
> > > get this patch out of my plate once and for all.  If you really want
> > > that feature, you can submit another patch for it and discuss with the
> > > RMT whether it belongs in PG10 or not.
> >
> > It is interesting feature - because it replaces XPATH function, but not
> > important enough.
>
> OK.
>
> > For daily work the default schema support is much more interesting.
>
> Let's see that one, then.  It was part of the original submission so
> depending on how the patch we looks can still cram it in.  But other
> patches have priority for me now.
>

It is a topic for PostgreSQL 11.

Thank you very much

Pavel


>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] new gcc 7.0.1 warnings

2017-03-08 Thread Pavel Stehule
2017-03-08 17:33 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > 2017-02-18 18:35 GMT+01:00 Tom Lane :
> >
> > > Pavel Stehule  writes:
>
> > > Do the warnings go away if you add some explicit guard to the precision
> > > variable, say like this:
> > >
> > > {
> > > intndig = DBL_DIG + extra_float_digits;
> > >
> > > if (ndig < 1)
> > > ndig = 1;
> > > +   if (ndig > 50)
> > > +   ndig = 50;
> >
> > This fix doesn't help
>
> Ahh, so this is why you had this change in the xmltable patch once!
> Heh.  Please be more careful.
>

grr :(

I am sorry

Pavel


>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

2017-03-08 Thread Fujii Masao
On Wed, Mar 8, 2017 at 9:05 PM, Masahiko Sawada  wrote:
> On Wed, Feb 8, 2017 at 12:04 AM, Fujii Masao  wrote:
>> On Tue, Feb 7, 2017 at 2:13 AM, Petr Jelinek
>>  wrote:
>>> On 06/02/17 17:33, Fujii Masao wrote:
 On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek
  wrote:
> On 03/02/17 19:38, Fujii Masao wrote:
>> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao  
>> wrote:
>>> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI
>>>  wrote:
 At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao  
 wrote in 
 
> On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquier
>  wrote:
>> On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane  wrote:
>>> Kyotaro HORIGUCHI  writes:
 Then, the reason for the TRY-CATCH cluase is that I found that
 some functions called from there can throw exceptions.
>>>
>>> Yes, but all LWLocks should be released by normal error recovery.
>>> It should not be necessary for this code to clean that up by hand.
>>> If it were necessary, there would be TRY-CATCH around every single
>>> LWLockAcquire in the backend, and we'd have an unreadable and
>>> unmaintainable system.  Please don't add a TRY-CATCH unless it's
>>> *necessary* -- and you haven't explained why this one is.
>
> Yes.

 Thank you for the suggestion. I misunderstood that.

>> Putting hands into the code and at the problem, I can see that
>> dropping a subscription on a node makes it unresponsive in case of a
>> stop. And that's just because calls to LWLockRelease are missing as 
>> in
>> the patch attached. A try/catch problem should not be necessary.
>
> Thanks for the patch!
>
> With the patch, LogicalRepLauncherLock is released at the end of
> DropSubscription(). But ISTM that the lock should be released just 
> after
> logicalrep_worker_stop() and there is no need to protect the removal 
> of
> replication slot with the lock.

 That's true. logicalrep_worker_stop returns after confirming that
 worker->proc is cleared, so no false relaunch can be caused.
 After all, logicalrep_worker_stop is surrounded by an
 LWLockAcquire/Release pair, so it can be moved into the function
 and the lock section made narrower.
>>
>> If we do this, Assert(LWLockHeldByMe(LogicalRepLauncherLock)) should be
>> removed and the comment for logicalrep_worker_stop() should be updated.
>>
>> Your approach may cause the deadlock. The launcher takes 
>> LogicalRepWorkerLock
>> while holding LogicalRepLauncherLock. OTOH, with your approach,
>> logicalrep_worker_stop() takes LogicalRepLauncherLock while holding
>> LogicalRepWorkerLock.
>>
>> Therefore I pushed the simple patch which adds LWLockRelease() just after
>> logicalrep_worker_stop().
>>
>> Another problem that I found while reading the code is that the launcher
>> can start up the worker with the subscription that DROP SUBSCRIPTION just
>> removed.  That is, DROP SUBSCRIPTION removes the target entry from
>> pg_subscription, but the launcher can see it and start a new worker until
>> the transaction for DROP has been committed.
>>
>
> That was the reason why DropSubscription didn't release the lock in the
> first place. It was supposed to be released at the end of the
> transaction though.

 OK, I understood why you used the lock in that way. But using LWLock
 for that purpose is not valid.

>>>
>>> Yeah, I just tried to avoid what we are doing now really hard :)
>>>
>> To fix this issue, I think that DROP SUBSCRIPTION should take
>> AccessExclusiveLock on pg_subscription, instead of RowExclusiveLock,
>> so that the launcher cannot see the entry to be being removed.
>>
>
> The whole point of having LogicalRepLauncherLock is to avoid having to
> do this, so if we do this we could probably get rid of it.

 Yes, let's remove LogicalRepLauncherLock and lock pg_subscription
 with AccessExclusive mode at the beginning of DROP SUBSCRIPTION.
 Attached patch does this.

>>>
>>> Okay, looks reasonable to me.
>>
>> Thanks for the review!
>> But ISTM that I should suspend committing the patch until we fix the issue
>> that Sawada reported in the other thread. That bugfix may change the related
>> code and design very much.
>> https://www.postgresql.org/message-id/cad21aod+vo93zz4zqtzqb-jz_wmko3oggdx1mxo4t+8q_zh...@mail.gmail.com
>>
>
> That patch has been committed. And this issue still happens. Should we
> add this to the open item list so it doesn't get missed?

Thanks for ping. Pushed the patch.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-08 Thread Alvaro Herrera
Here's a rebased set of patches.  This is the same Pavan posted; I only
fixed some whitespace and a trivial conflict in indexam.c, per 9b88f27cb42f.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] patch: function xmltable

2017-03-08 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-03-08 17:32 GMT+01:00 Alvaro Herrera :

> > > For daily work the default schema support is much more interesting.
> >
> > Let's see that one, then.  It was part of the original submission so
> > depending on how the patch we looks can still cram it in.  But other
> > patches have priority for me now.
> 
> It is theme for 11

Ah, great.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 11:20 AM, Dilip Kumar  wrote:
> Right, done that way

This didn't compile because you bobbled some code in
src/backend/nodes, but it was a trivial mistake so I fixed it.

Committed with that fix and a bunch of minor cosmetic changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 12:00 PM, Alvaro Herrera
 wrote:
> Here's a rebased set of patches.  This is the same Pavan posted; I only
> fixed some whitespace and a trivial conflict in indexam.c, per 9b88f27cb42f.

No attachments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Poor memory context performance in large hash joins

2017-03-08 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-07 13:06:39 -0500, Tom Lane wrote:
>> The hashjoin issue is certainly new, and the reorderbuffer issue can't
>> go back further than 9.4.  So my inclination is to patch back to 9.4
>> and call it good.

> That works for me.  If we find further cases later, we can easily enough
> backpatch it then.

Done like that.

regards, tom lane




Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Jeff Janes
On Wed, Mar 8, 2017 at 9:08 AM, Robert Haas  wrote:

> On Wed, Mar 8, 2017 at 11:20 AM, Dilip Kumar 
> wrote:
> > Right, done that way
>
> This didn't compile because you bobbled some code in
> src/backend/nodes, but it was a trivial mistake so I fixed it.
>
> Committed with that fix and a bunch of minor cosmetic changes.
>


I'm getting some compiler warnings in gcc version 4.4.7 20120313 (Red Hat
4.4.7-17) (GCC):

nodeBitmapHeapscan.c: In function 'BitmapHeapNext':
nodeBitmapHeapscan.c:79: warning: 'tbmiterator' may be used uninitialized
in this function
nodeBitmapHeapscan.c:80: warning: 'shared_tbmiterator' may be used
uninitialized in this function

Cheers,

Jeff


Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Hi Andrew,

 Mark> Reviewing the patch a bit more, I find it hard to understand the
 Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
 Mark> chance you can spend a bit more time word-smithing that code
 Mark> comment?

Sure.

How does this look for wording (I'll incorporate it into an updated
patch later):

/*
 * Compute the final value of all aggregates for one group.
 *
 * This function handles only one grouping set at a time.  But in the hash
 * case, it's the caller's responsibility to have selected the set already, and
 * we pass in -1 as currentSet here to flag that; this also changes how we
 * handle the indexing into AggStatePerGroup as explained below.
 *
 * Results are stored in the output econtext aggvalues/aggnulls.
 */
static void
finalize_aggregates(AggState *aggstate,
					AggStatePerAgg peraggs,
					AggStatePerGroup pergroup,
					int currentSet)
{
	ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
	Datum	   *aggvalues = econtext->ecxt_aggvalues;
	bool	   *aggnulls = econtext->ecxt_aggnulls;
	int			aggno;
	int			transno;

	/*
	 * If currentSet >= 0, then we're doing sorted grouping, and pergroup is
	 * an array of size numTrans*numSets which we have to index into using
	 * currentSet in addition to transno. The caller may not have selected
	 * the set, so we do that.
	 *
	 * If currentSet < 0, then we're doing hashed grouping, and pergroup is
	 * an array of only numTrans items (since for hashed grouping, each
	 * grouping set is in a separate hashtable).  We rely on the caller
	 * having done select_current_set, and we fudge currentSet to 0 in order
	 * to make the same indexing calculations work as for the grouping case.
	 */
	if (currentSet >= 0)
		select_current_set(aggstate, currentSet, false);
	else
		currentSet = 0;


-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Example Custom Scan Provider Implementation?

2017-03-08 Thread Eric Ridge
On Tue, Mar 7, 2017 at 6:39 PM Amit Langote 
wrote:

>
> Here you go: https://github.com/kaigai/ctidscan


Thanks for the link, Amit!  I think that'll get me bootstrapped!

> This was proposed originally [1] to go into contrib/, but that didn't
> happen somehow.

Too bad.  :(

eric


Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Tom Lane
Jeff Janes  writes:
> I'm getting some compiler warnings in gcc version 4.4.7 20120313 (Red Hat
> 4.4.7-17) (GCC):

Me too.  Fix pushed.

regards, tom lane




[HACKERS] updating parallel.sgml

2017-03-08 Thread Robert Haas
Here's a patch which updates parallel.sgml for the new parallel scan
types which have recently been committed, and also expands the
explanation of parallel joins slightly.  I hope everyone will find
this useful in understanding what new capabilities we now have, and
what remains to be done in the future.

Barring objections, I plan to commit this tomorrow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


parallel-doc-update.patch
Description: Binary data



Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 12:45 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> I'm getting some compiler warnings in gcc version 4.4.7 20120313 (Red Hat
>> 4.4.7-17) (GCC):
>
> Me too.  Fix pushed.

Thanks.  Sorry for the hassle; my compiler isn't as picky about this
as I would like, and apparently Dilip's isn't either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GSoC] Push-based query executor discussion

2017-03-08 Thread Robert Haas
On Mon, Mar 6, 2017 at 11:20 AM, Arseny Sher  wrote:
> I would like to work on push-based executor [1] during GSoC, so I'm
> writing to introduce myself and start the discussion of the project. I
> think I should mention beforehand that the subject is my master's
> thesis topic, and I have already started working on it. This letter is
> not (obviously) a ready proposal but rather initial point to talk over
> the concept. Below you can see a short review of the idea, description
> of benefits for the community, details, related work and some info
> about me.

While I admire your fearlessness, I think the chances of you being
able to bring a project of this type to a successful conclusion are
remote.  Here is what I said about this topic previously:

http://postgr.es/m/CA+Tgmoa=kzhj+twxyq+vku21nk3prkrjsdbhjubn7qvc8uk...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 8, 2017 at 12:45 PM, Tom Lane  wrote:
>> Jeff Janes  writes:
>>> I'm getting some compiler warnings in gcc version 4.4.7 20120313 (Red Hat
>>> 4.4.7-17) (GCC):

>> Me too.  Fix pushed.

> Thanks.  Sorry for the hassle; my compiler isn't as picky about this
> as I would like, and apparently Dilip's isn't either.

Might be interesting to see whether -O level affects it.  In principle,
whether you get the warning should depend on how much the compiler has
analyzed the logic flow ...

regards, tom lane




Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-08 Thread Robert Haas
On Tue, Mar 7, 2017 at 11:23 PM, Andres Freund  wrote:
> Personally that's not addressing my main concern, which is that the
> latency of getting done with some patch/topic takes a long while. If I
> have to wait for the buildfarm to check some preliminary patch, I still
> have to afterwards work on pushing it to master.  And very likely my
> local check would finish a lot faster than a bunch of buildfarm animals
> - I have after all a plenty powerfull machine, lots of cores, fast ssd,
> lots of memory, ...
>
> So I really want faster end-to-end test, not less cpu time spent on my
> own machine.

Yeah.  I think the buildfarm-for-test-commits or maybe
buildfarm-for-approved-branches-belonging-to-people-we-basically-trust
idea isn't a bad one, but it's not a substitute for $SUBJECT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Mark Dilger

> On Mar 8, 2017, at 9:40 AM, Andrew Gierth  wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> Hi Andrew,
> 
> Mark> Reviewing the patch a bit more, I find it hard to understand the
> Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
> Mark> chance you can spend a bit more time word-smithing that code
> Mark> comment?
> 
> Sure.
> 
> How does this look for wording (I'll incorporate it into an updated
> patch later):

Much better.  Thanks!

> /*
> * Compute the final value of all aggregates for one group.
> *
> * This function handles only one grouping set at a time.  But in the hash
> * case, it's the caller's responsibility to have selected the set already, and
> * we pass in -1 as currentSet here to flag that; this also changes how we
> * handle the indexing into AggStatePerGroup as explained below.
> *
> * Results are stored in the output econtext aggvalues/aggnulls.
> */
> static void
> finalize_aggregates(AggState *aggstate,
>   AggStatePerAgg peraggs,
>   AggStatePerGroup pergroup,
>   int currentSet)
> {
>   ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
>   Datum  *aggvalues = econtext->ecxt_aggvalues;
>   bool   *aggnulls = econtext->ecxt_aggnulls;
>   int aggno;
>   int transno;
> 
>   /*
>* If currentSet >= 0, then we're doing sorted grouping, and pergroup is an
>* array of size numTrans*numSets which we have to index into using
>* currentSet in addition to transno. The caller may not have selected the
>* set, so we do that.
>*
>* If currentSet < 0, then we're doing hashed grouping, and pergroup is an
>* array of only numTrans items (since for hashed grouping, each grouping
>* set is in a separate hashtable).  We rely on the caller having done
>* select_current_set, and we fudge currentSet to 0 in order to make the
>* same indexing calculations work as for the grouping case.
>*/
>   if (currentSet >= 0)
>   select_current_set(aggstate, currentSet, false);
>   else
>   currentSet = 0;
> 
> 
> -- 
> Andrew (irc:RhodiumToad)





Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 12:53 PM, Tom Lane  wrote:
>> Thanks.  Sorry for the hassle; my compiler isn't as picky about this
>> as I would like, and apparently Dilip's isn't either.
>
> Might be interesting to see whether -O level affects it.  In principle,
> whether you get the warning should depend on how much the compiler has
> analyzed the logic flow ...

What I'm using is:

Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix

While I haven't experimented with this too extensively, my general
impression is that this thing is extremely tolerant of uninitialized
variables.  I just tried compiling nodeBitmapHeapscan.c with -Wall
-Werror and each of -O0, -O1, -O2, and -O3, and none of those produced
any warnings.  I've been reluctant to go to the hassle of installing a
different compiler...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Explicit subtransactions for PL/Tcl

2017-03-08 Thread Victor Wagner
On Wed, 8 Mar 2017 16:49:33 +0100
Pavel Stehule  wrote:


> 
> I did a review of this patch

First, thank you for your effort. I had already begun to think that nobody
is really interested in PL/Tcl stuff. 

> 1. This functionality has sense and nobody was against this feature.
> 
> 2. This patch does what is proposed - it introduce new TCL function
> that wraps PostgreSQL subtransaction
> 
> 3. This patch is really simple due to the massive use of subtransactions
> already - every SPI call from TCL is wrapped in a subtransaction.
> 
> 4. A documentation is good - although I am not sure if it is well
> structured - is  level necessary? Probably there will not be
> any other similar command.

You are right. At least sect2 can be added later whenever this second
command will appear.

> 5. There are basic regression tests, and all tests passed, but I miss a
> path where the subtransaction is committed - now it is rolled back every time

Really, I haven't found such tests in the PL/Python suite, so I haven't
translated them to Tcl. 

It might be a good idea to add such a test.

> 6. The code has some issues with whitespace
> 
> 7. I don't understand why TopMemoryContext is used there? Maybe some
> already used context should be there.
> 
+    BeginInternalSubTransaction(NULL);
+    MemoryContextSwitchTo(TopTransactionContext);
+

I've copied this code from the PL/Python subtransaction object, and
it has the following comment:
   /* Be sure that cells of explicit_subtransactions list are long-lived */

But in the Tcl case we don't have to maintain our own stack in the form of
a list. We use the C-language stack and keep our data in local variables.
So, probably, changing memory contexts is not necessary at all.

But that requires a bit more investigation and testing.

With best regards, Victor


-- 
   Victor Wagner 




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Hi Andrew,

 Mark> Reviewing the patch a bit more, I find it hard to understand the
 Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
 Mark> chance you can spend a bit more time word-smithing that code
 Mark> comment?

Actually, ignore that prior response, I'll refactor it a bit to remove
the need for the comment at all.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-08 Thread Oleg Bartunov
On Wed, Mar 8, 2017 at 12:43 AM, Sven R. Kunze  wrote:

> Hi,
>
> about the datetime issue: as far as I know, JSON does not define a
> serialization format for dates and timestamps.
>
> On the other hand, YAML (as a superset of JSON) already supports a
> language-independent date(time) serialization format (
> http://yaml.org/type/timestamp.html).
>
> I haven't had a glance into the SQL/JSON standard yet and a quick search
> didn't reveal anything. However, reading your test case here
> https://github.com/postgrespro/sqljson/blob/5a8a241/src/test/regress/sql/sql_json.sql#L411
> it seems as if you intend to parse all strings in the form of
> "YYYY-MM-DD" as dates. This is problematic in case a string happens to
> look like this but is not intended to be a date.
>

SQL/JSON defines methods in jsonpath; in particular:

| datetime <left paren> [ <JSON datetime template> ] <right paren>
| keyvalue <left paren> <right paren>

<JSON datetime template> ::=
<character string literal>

The datetime template is also specified in the standard (it is very rich):

<datetime template> ::=
  { <datetime template part> }...
<datetime template part> ::=
  <datetime template field>
  | <datetime template delimiter>
<datetime template field> ::=
  <datetime template year>
  | <datetime template rounded year>
  | <datetime template month>
  | <datetime template day of month>
  | <datetime template day of year>
  | <datetime template 12-hour>
  | <datetime template 24-hour>
  | <datetime template minute>
  | <datetime template second of minute>
  | <datetime template second of day>
  | <datetime template fraction>
  | <datetime template am/pm>
  | <datetime template time zone hour>
  | <datetime template time zone minute>
<datetime template delimiter> ::=
  <minus sign>
  | <period>
  | <solidus>
  | <comma>
  | <apostrophe>
  | <semicolon>
  | <colon>
  | <space>
<datetime template year> ::=
  YYYY | YYY | YY | Y
<datetime template rounded year> ::=
  RRRR | RR
<datetime template month> ::=
  MM
<datetime template day of month> ::=
  DD
<datetime template day of year> ::=
  DDD
<datetime template 12-hour> ::=
  HH | HH12
<datetime template 24-hour> ::=
  HH24
<datetime template minute> ::=
  MI
<datetime template second of minute> ::=
  SS
<datetime template second of day> ::=
  SSSSS
<datetime template fraction> ::=
  FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
<datetime template am/pm> ::=
  A.M. | P.M.
<datetime template time zone hour> ::=
  TZH
<datetime template time zone minute> ::=
  TZM



> Just for the sake of completeness: YAML solves this issue by omitting the
> quotation marks around the date string (just as JSON integers have no
> quotations marks around them).
>

Interesting idea, but we need to dig into the standard first.


>
> Regards,
> Sven
>


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 12:14 PM, Alvaro Herrera
 wrote:
> Alvaro Herrera wrote:
>> Here's a rebased set of patches.  This is the same Pavan posted; I only
>> fixed some whitespace and a trivial conflict in indexam.c, per 9b88f27cb42f.
>
> Jaime noted that I forgot the attachments.  Here they are

If I recall correctly, the main concern about 0001 was whether it
might negatively affect performance, and testing showed that, if
anything, it was a little better. Does that sound right?

Regarding 0002, I think this could use some documentation someplace
explaining the overall theory of operation.  README.HOT, maybe?

+ * Most often and unless we are dealing with a pg-upgraded cluster, the
+ * root offset information should be cached. So there should not be too
+ * much overhead of fetching this information. Also, once a tuple is
+ * updated, the information will be copied to the new version. So it's not
+ * as if we're going to pay this price forever.

What if a tuple is updated -- presumably clearing the
HEAP_LATEST_TUPLE on the tuple at the end of the chain -- and then the
update aborts?  Then we must be back to not having this information.

One overall question about this patch series is how we feel about
using up this many bits.  0002 uses a bit from infomask, and 0005 uses
a bit from infomask2.  I'm not sure if that's everything, and then I
think we're steeling some bits from the item pointers, too.  While the
performance benefits of the patch sound pretty good based on the test
results so far, this is definitely the very last time we'll be able to
implement a feature that requires this many bits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Tom Lane
Robert Haas  writes:
> What I'm using is:

> Configured with:
> --prefix=/Applications/Xcode.app/Contents/Developer/usr
> --with-gxx-include-dir=/usr/include/c++/4.2.1
> Apple LLVM version 7.0.2 (clang-700.1.81)
> Target: x86_64-apple-darwin14.5.0
> Thread model: posix

Hm.  I noticed that longfin didn't spit up on it either, despite having
-Werror turned on.  That's a slightly newer version, but still Apple's
clang:

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.4.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

regards, tom lane




Re: [HACKERS] Parallel bitmap heap scan

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 1:18 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> What I'm using is:
>
>> Configured with:
>> --prefix=/Applications/Xcode.app/Contents/Developer/usr
>> --with-gxx-include-dir=/usr/include/c++/4.2.1
>> Apple LLVM version 7.0.2 (clang-700.1.81)
>> Target: x86_64-apple-darwin14.5.0
>> Thread model: posix
>
> Hm.  I noticed that longfin didn't spit up on it either, despite having
> -Werror turned on.  That's a slightly newer version, but still Apple's
> clang:
>
> $ gcc -v
> Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
> --with-gxx-include-dir=/usr/include/c++/4.2.1
> Apple LLVM version 8.0.0 (clang-800.0.42.1)
> Target: x86_64-apple-darwin16.4.0
> Thread model: posix
> InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Yeah.  I think on my previous MacBook Pro, you could do this without
generating a warning:

int x;
printf("%d\n", x);

The compiler on this one detects that case, but that seems to be about
as far as it goes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-08 Thread Oleg Bartunov
On Wed, Mar 8, 2017 at 7:05 AM, David Steele  wrote:

> On 3/7/17 11:38 AM, Andres Freund wrote:
>
> <...>
>
>>> We have plenty of time, and we dedicate one full-time developer to
>>> this project.
>>>
>>
>> How about having that, and perhaps others, developer participate in
>> reviewing patches and getting to the bottom of the commitfest?  Should
>> we end up being done early, we can look at this patch...  There's not
>> been review activity corresponding to the amount of submissions from
>> pgpro...
>>
>
> This patch has been moved to CF 2017-07.
>

Yes; after committing XMLTABLE, we will need to extend its infrastructure
anyway to support JSON_TABLE.


>
> --
> -David
> da...@pgmasters.net
>


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 7:45 AM, Amit Kapila  wrote:
>> I still think this is a bad idea.  Releasing and reacquiring the lock
>> on the master doesn't prevent the standby from seeing intermediate
>> states; the comment, if I understand correctly, is just plain wrong.
> I can remove this release and reacquire stuff, but first, let me try
> to explain what intermediate state I am talking here.  If we don't
> have a release/reacquire lock here, standby could see an empty
> overflow page at that location whereas master will be able to see the
> new page only after the insertion of tuple/'s in that bucket is
> finished.

That's true, but you keep speaking as if the release and reacquire
prevents the standby from seeing that state.  It doesn't.  Quite the
opposite: it allows the master to see that intermediate state as well.
I don't think there is a problem with the standby seeing an
intermediate state that can't be seen on the master, provided we've
set up the code to handle that intermediate state properly, which
hopefully we have.

>> + * Initialise the freed overflow page, here we can't complete zeroed the
>>
>> Don't use British spelling, and don't use a comma to join two
>> sentences.  Also "here we can't complete zeroed" doesn't seem right; I
>> don't know what that means.
>
> What the comment means to say is that we can't initialize page as zero
> (something like below)
> MemSet(ovflpage, 0, BufferGetPageSize(ovflbuf));
>
> typo in sentence
> /complete/completely

I think you can just say we initialize the page.  Saying we can't
initialize it by zeroing it seems pretty obvious, both from the code
and from the general theory of how stuff works.  If you did want to
explain it, I think you'd write something like "Just zeroing the page
won't work, because ."

>> +/*
>> + * We need to release and if required reacquire the lock on
>> + * rbuf to ensure that standby shouldn't see an intermediate
>> + * state of it.  If we don't release the lock, after replay 
>> of
>> + * XLOG_HASH_MOVE_PAGE_CONTENTS on standby users will
>> be able to
>> + * view the results of partial deletion on rblkno.
>> + */
>> +LockBuffer(rbuf, BUFFER_LOCK_UNLOCK);
>>
>> If you DO release the lock, this will STILL be true, because what
>> matters on the standby is what the redo code does.
>
> That's right, what I intend to do here is to release the lock in
> master where it will be released in standby.  In this case, we can't
> ensure what user can see in master is same as in standby after this
> WAL record is replayed, because in master we have exclusive lock on
> write buf, so no one can read the contents of read buf (the location
> of read buf will be after write buf) whereas, in standby, it will be
> possible to read the contents of the read buf.  I think this is not a
> correctness issue, so we might want to leave it as is.  What do you
> say?

Well, as in the case above, you're doing extra work to make sure that
every state which can be observed on the standby can also be observed
on the master.  But that has no benefit, and it does have a cost: you
have to release and reacquire a lock that you could otherwise retain,
saving CPU cycles and code complexity.  So I think the way you've got
it is not good.

>> + * We can log the exact changes made to meta page, however as no
>> + * concurrent operation could see the index during the replay of 
>> this
>> + * record, we can perform the operations during replay as they are
>> + * done here.
>>
>> Don't use a comma to join two sentences.  Also, I don't understand
>> anything up to the "however".
>>
>
> I mean the below changes made to meta buf (refer existing code):
> metap->hashm_mapp[metap->hashm_nmaps] = num_buckets + 1;
>
> metap->hashm_nmaps++;
>
> I think it will be clear if you look both the DO and REDO operation of
> XLOG_HASH_INIT_BITMAP_PAGE.  Let me know if you still think that the
> comment needs to be changed?

I think it's not very clear.  I think what this boils down to is that
this is another place where you've got a comment to explain that you
didn't log the entire metapage, but I don't think anyone would expect
that anyway, so it doesn't seem like a very important thing to
explain.  If you want a comment here, maybe something like /* This is
safe only because nobody else can be modifying the index at this
stage; it's only visible to the transaction that is creating it */

>> + * Set pd_lower just past the end of the metadata.  This is to log
>> + * full_page_image of metapage in xloginsert.c.
>>
>> Why the underscores?
>>
>> + * won't be a leak on standby, as next split will consume this 
>> space.
>
> No specific reason, just trying to resemble full_page_writes, but can
> change if you feel it doesn't make much sense.

Well, usually I don't separate_words_in_a_sentence_with_underscores.
It mak

Re: [HACKERS] background sessions

2017-03-08 Thread Pavel Stehule
Hi

2017-03-01 3:35 GMT+01:00 Peter Eisentraut :

> > For additional entertainment, I include patches that integrate
> > background sessions into dblink.  So dblink can open a connection to a
> > background session, and then you can use the existing dblink functions
> > to send queries, read results, etc.  People use dblink to make
> > self-connections to get autonomous subsessions, so this would directly
> > address that use case.  The 0001 patch is some prerequisite refactoring
> > to remove an ugly macro mess, which is useful independent of this.  0002
> > is the actual patch.
>
> Updated patch, mainly with improved error handling and some tidying up.
>

I am checking this patch. I have a few questions (I didn't find answers in
the docs):

1. Will the background session process be closed automatically when the
parent process is closed?

2. What timeouts are valid for this process - statement timeout, idle in
transaction timeout?

I see a significant risk of leaking sessions.

There could be more documentation and examples in the PL/Python docs, since
it will be the main interface for this feature - mainly about session
processing.

Regards

Pavel

P.S. It is great functionality, and the patch looks very good.





> Related to this is also the patch in
>  693b-cdc6d16b9...@2ndquadrant.com>
> as a resource control mechanism.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>
>


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-08 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Mar 8, 2017 at 12:14 PM, Alvaro Herrera
>  wrote:
> > Alvaro Herrera wrote:
> >> Here's a rebased set of patches.  This is the same Pavan posted; I only
> >> fixed some whitespace and a trivial conflict in indexam.c, per 
> >> 9b88f27cb42f.
> >
> > Jaime noted that I forgot the attachments.  Here they are
> 
> If I recall correctly, the main concern about 0001 was whether it
> might negatively affect performance, and testing showed that, if
> anything, it was a little better. Does that sound right?

Not really -- it's a bit slower actually in a synthetic case measuring
exactly the slowed-down case.  See
https://www.postgresql.org/message-id/cad__ougk12zqmwwjzim-yyud1y8jmmy6x9yectnif3rpp6h...@mail.gmail.com
I bet in normal cases it's unnoticeable.  If WARM flies, then it's going
to provide a larger improvement than is lost to this.

> Regarding 0002, I think this could use some documentation someplace
> explaining the overall theory of operation.  README.HOT, maybe?

Hmm.  Yeah, we should have something to that effect.  0005 includes
README.WARM, but I think there should be some place unified that
explains the whole thing.

> + * Most often and unless we are dealing with a pg-upgraded cluster, the
> + * root offset information should be cached. So there should not be too
> + * much overhead of fetching this information. Also, once a tuple is
> + * updated, the information will be copied to the new version. So it's 
> not
> + * as if we're going to pay this price forever.
> 
> What if a tuple is updated -- presumably clearing the
> HEAP_LATEST_TUPLE on the tuple at the end of the chain -- and then the
> update aborts?  Then we must be back to not having this information.

I will leave this question until I have grokked how this actually works.

> One overall question about this patch series is how we feel about
> using up this many bits.  0002 uses a bit from infomask, and 0005 uses
> a bit from infomask2.  I'm not sure if that's everything, and then I
> think we're steeling some bits from the item pointers, too.  While the
> performance benefits of the patch sound pretty good based on the test
> results so far, this is definitely the very last time we'll be able to
> implement a feature that requires this many bits.

Yeah, this patch series uses a lot of bits.  At some point we should
really add the "last full-scanned by version X" we discussed a long time
ago, and free the MOVED_IN / MOVED_OFF bits that have been unused for so
long.  Sadly, once we add that, we need to wait one more release before
we can use the bits anyway.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-08 Thread Peter van Hardenberg
Small point of order: YAML is not strictly a super-set of JSON.

Editorializing slightly, I have not seen much interest in the world for
YAML support though I'd be interested in evidence to the contrary.
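The date-detection ambiguity Sven raises in the quoted mail below can be sketched quickly. This is a hypothetical illustration (the key names are invented, and no SQL/JSON code is involved): JSON simply has no date type, so after parsing, an intended date and an unrelated identifier of the same shape are indistinguishable.

```python
import json

# JSON has no date type: a value that looks like a date parses as a plain
# string, identical in type and content to any other text of that shape.
doc = json.loads('{"released": "2017-03-08", "part_no": "2017-03-08"}')

assert isinstance(doc["released"], str)   # no date object, just text
assert doc["released"] == doc["part_no"]  # date vs. identifier: same value
```

Any "parse YYYY-MM-DD strings as dates" rule therefore has to guess at intent, which is exactly the problem described below.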

On Tue, Mar 7, 2017 at 1:43 PM, Sven R. Kunze  wrote:

> Hi,
>
> about the datetime issue: as far as I know, JSON does not define a
> serialization format for dates and timestamps.
>
> On the other hand, YAML (as a superset of JSON) already supports a
> language-independent date(time) serialization format (
> http://yaml.org/type/timestamp.html).
>
> I haven't had a glance into the SQL/JSON standard yet and a quick search
> didn't reveal anything. However, reading your test case here
> https://github.com/postgrespro/sqljson/blob/5a8a241/src/test/regress/sql/sql_json.sql#L411
> it seems as if you intend to parse all strings in the form of "YYYY-MM-DD"
> as dates. This is problematic in case a string happens to look like this
> but is not intended to be a date.
>
> Just for the sake of completeness: YAML solves this issue by omitting the
> quotation marks around the date string (just as JSON integers have no
> quotations marks around them).
>
> Regards,
> Sven
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-08 Thread Magnus Hagander
On Wed, Mar 8, 2017 at 11:48 AM, Peter van Hardenberg  wrote:

> Small point of order: YAML is not strictly a super-set of JSON.
>
> Editorializing slightly, I have not seen much interest in the world for
> YAML support though I'd be interested in evidence to the contrary.
>
>
The world of configuration management seems, for some reason, to run on
YAML, but those are the only places I've seen it recently (Ansible, Puppet,
etc.).

That said, if we're introducing something new, it's usually better to copy
from another format than to invent your own.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-08 Thread Robert Haas
On Tue, Mar 7, 2017 at 2:38 PM, Andres Freund  wrote:
> On 2017-03-07 12:21:59 +0300, Oleg Bartunov wrote:
>> On 2017-03-03 15:49:38 -0500, David Steele wrote:
>> > I propose we move this patch to the 2017-07 CF so further development
>> > and review can be done without haste and as the standard becomes more
>> > accessible.
>
> +1

I agree that this should not go into v10.  February 28th is not the
right time for a large, never-before-seen patch to show up with
expectations of getting committed for the current cycle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting planning time with EXPLAIN

2017-03-08 Thread Stephen Frost
Ashutosh,

* Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote:
> On Tue, Mar 7, 2017 at 9:23 PM, Stephen Frost  wrote:
> > I'm going through these with an eye towards committing them soon.  I've
> > already adjusted some of the documentation and comments per our earlier
> > discussion
> 
> Thanks a lot.

I've pushed this with the editorialization of the documentation we discussed
up-thread, along with some improvements to the comments and your latest
variable name suggestions and changes.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-08 Thread Peter Eisentraut
On 3/6/17 19:53, Andres Freund wrote:
> I'm just not quite sure what the best way is to make it easier to run
> tests in parallel within the tree.

make check-world -j2 seems to run fine for me.

With higher -j I appear to be running out of memory or disk space, so I
haven't checked that any further, but it seems possible.

You can also run prove with a -j option.

And we could parallelize some of the contrib/pl tests, e.g., plpython.

(The problem is that parallel make and parallel tests together might
explode a bit, so we might want some way to control which aspect we
parallelize.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-08 Thread Robert Haas
On Tue, Mar 7, 2017 at 6:41 PM, Robert Haas  wrote:
>>> Great, thanks.  0001 looks good to me now, so committed.
>>
>> Committed 0002.
>
> Here are some initial review thoughts on 0003 based on a first read-through.

More thoughts on the main patch:

The text you've added to the hash index README throws around the term
"single WAL entry" pretty freely.  For example: "An insertion that
causes an addition of an overflow page is logged as a single WAL entry
preceded by a WAL entry for new overflow page required to insert a
tuple."  When you say a single WAL entry, you make it sound like
there's only one, but because it's preceded by another WAL entry,
there are actually two.  I would rephase this to just avoid using the
word "single" this way.  For example: "If an insertion causes the
addition of an overflow page, there will be one WAL entry for the new
overflow page and a second entry for the insert itself."

+mode anyway).  It would seem natural to complete the split in VACUUM, but since
+splitting a bucket might require allocating a new page, it might fail if you
+run out of disk space.  That would be bad during VACUUM - the reason for
+running VACUUM in the first place might be that you run out of disk space,
+and now VACUUM won't finish because you're out of disk space.  In contrast,
+an insertion can require enlarging the physical file anyway.

But VACUUM actually does try to complete the splits, so this is wrong, I think.

+Squeeze operation moves tuples from one of the buckets later in the chain to

A squeeze operation

+As Squeeze operation involves writing multiple atomic operations, it is

As a squeeze operation involves multiple atomic operations

+if (!xlrec.is_primary_bucket_page)
+XLogRegisterBuffer(0, bucket_buf, REGBUF_STANDARD);

So, in hashbucketcleanup, the page is registered and therefore an FPI
will be taken if this is the first reference to this page since the
checkpoint, but hash_xlog_delete won't restore the FPI.  I think that
exposes us to a torn-page hazard.
_hash_freeovflpage/hash_xlog_squeeze_page seems to have the same
disease, as does _hash_squeezebucket/hash_xlog_move_page_contents.
Not sure exactly what the fix should be.

+/*
+ * As bitmap page doesn't have standard page layout, so this will
+ * allow us to log the data.
+ */
+XLogRegisterBuffer(2, mapbuf, REGBUF_STANDARD);
+XLogRegisterBufData(2, (char *) &bitmap_page_bit, sizeof(uint32));

If the page doesn't have a standard page layout, why are we passing
REGBUF_STANDARD?  But I think you've hacked things so that bitmap
pages DO have a standard page layout, per the change to
_hash_initbitmapbuffer, in which case the comment seems wrong.

(The comment is also a little confusing grammatically, but let's iron
out the substantive issue first.)

+recptr = XLogInsert(RM_HASH_ID, XLOG_HASH_ADD_OVFL_PAGE);
+
+PageSetLSN(BufferGetPage(ovflbuf), recptr);
+PageSetLSN(BufferGetPage(buf), recptr);
+
+if (BufferIsValid(mapbuf))
+PageSetLSN(BufferGetPage(mapbuf), recptr);
+
+if (BufferIsValid(newmapbuf))
+PageSetLSN(BufferGetPage(newmapbuf), recptr);

I think you forgot to call PageSetLSN() on metabuf.  (There are 5
block references in the write-ahead log record but only 4 calls to
PageSetLSN ... surely that can't be right!)

+/*
+ * We need to release the locks once the prev pointer of overflow bucket
+ * and next of left bucket are set, otherwise concurrent read might skip
+ * the bucket.
+ */
+if (BufferIsValid(leftbuf))
+UnlockReleaseBuffer(leftbuf);
+UnlockReleaseBuffer(ovflbuf);

I don't believe the comment.  Postponing the lock release wouldn't
cause the concurrent read to skip the bucket.  It might cause it to be
delayed unnecessarily, but the following comment already addresses
that point.  I think you can just nuke this comment.

+xlrec.is_prim_bucket_same_wrt = (wbuf == bucketbuf) ? true : false;
+xlrec.is_prev_bucket_same_wrt = (wbuf == prevbuf) ? true : false;

Maybe just xlrec.is_prim_bucket_same_wrt = (wbuf == bucketbuf); and similar?

+/*
+ * We release locks on writebuf and bucketbuf at end of replay operation
+ * to ensure that we hold lock on primary bucket page till end of
+ * operation.  We can optimize by releasing the lock on write buffer as
+ * soon as the operation for same is complete, if it is not same as
+ * primary bucket page, but that doesn't seem to be worth complicating the
+ * code.
+ */
+if (BufferIsValid(writebuf))
+UnlockReleaseBuffer(writebuf);
+
+if (BufferIsValid(bucketbuf))
+UnlockReleaseBuffer(bucketbuf);

Here in hash_xlog_squeeze_page(), you wait until the very end to
release these locks, but in e.g. hash_xlog_addovflpage you release
them considerably earlier for reasons that seem like they would also
be valid here.  I 

Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-08 Thread Andres Freund
On 2017-03-08 16:32:49 -0500, Peter Eisentraut wrote:
> On 3/6/17 19:53, Andres Freund wrote:
> > I'm just not quite sure what the best way is to make it easier to run
> > tests in parallel within the tree.
> 
> make check-world -j2 seems to run fine for me.

Hm, I at least used to get a lot of spurious failures with this. I
e.g. don't think the free port selection is race free.  Also that
doesn't solve the issue of the time spent in repetitive initdbs - but
that's mainly in contrib, and we probably solve that there by having a
special target in contrib/ building a cluster once.


> You can also run prove with a -j option.

Ah, interesting.


> (The problem is that parallel make and parallel tests together might
> explode a bit, so we might want some way to control which aspect we
> parallelize.)

Yea, I indeed see that. Probably could do a top-level prerequisite for
check on a check-world subset that includes docs?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-08 Thread Thomas Munro
On Wed, Mar 8, 2017 at 1:15 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> +++ b/src/include/storage/barrier.h
>> +#include "postgres.h"
>
>> Huh, that normally shouldn't be in a header.  I see you introduced that
>> in a bunch of other places too - that really doesn't look right to me.
>
> That is absolutely not project style and is not acceptable.
>
> The core reason why not is that postgres.h/postgres_fe.h/c.h have to be
> the *first* inclusion in every compilation, for arcane portability reasons
> you really don't want to know about.  (Suffice it to say that on some
> platforms, stdio.h isn't all that std.)  Our coding rule for that is that
> we put the appropriate one of these first in every .c file, while .h files
> always assume that it's been included already.  As soon as you break that
> convention, it becomes unclear from looking at a .c file whether the
> ordering requirement has been satisfied.  Also, since now you've moved
> the must-be-first requirement to some other header file(s), you risk
> breakage when somebody applies another project convention about
> alphabetizing #include references for all headers other than those magic
> ones.

Thanks for the explanation.  Will post a new series addressing this
and other complaints from Andres shortly.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-08 Thread Michael Paquier
On Thu, Mar 9, 2017 at 6:49 AM, Andres Freund  wrote:
> On 2017-03-08 16:32:49 -0500, Peter Eisentraut wrote:
>> On 3/6/17 19:53, Andres Freund wrote:
>> > I'm just not quite sure what the best way is to make it easier to run
>> > tests in parallel within the tree.
>>
>> make check-world -j2 seems to run fine for me.
>
> Hm, I at least used to get a lot of spurious failures with this. I
> e.g. don't think the free port selection is race free.

pg_regress is running each Postgres instance in a separate Unix socket
directory, so the port selection is not a problem. PostgresNode.pm is
also careful about that.

> Also that
> doesn't solve the issue of the time spent in repetitive initdbs - but
> that's mainly in contrib, and we probably solve that there by having a
> special target in contrib/ building a cluster once.

Yeah, you can count the growing src/test/modules in that as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign partition DDL regression tests

2017-03-08 Thread Amit Langote
On 2017/03/09 1:28, Robert Haas wrote:
> On Tue, Feb 21, 2017 at 8:40 PM, Amit Langote
>  wrote:
>> Ashutosh Bapat pointed out [0] that regression tests are missing for the
>> foreign partition DDL commands.  Attached patch takes care of that.
> 
> Committed.

Thanks.

> I didn't do anything about Ashutosh's comment that we could use ALTER
> FOREIGN TABLE rather than ALTER TABLE someplace; that didn't seem
> critical.

Attached is a patch to fix that, just in case.

> Also, the names of the objects in this test are kinda generic (pt2 et.
> al.) but they match the existing names in the same file (pt1, foo).
> If we're going to start differentiating those a little better, we
> should probably change them all, and as a separate commit.

Agreed.

Thanks,
Amit
>From f0467b30b74d2af72480bb20867164ef030f3c56 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 9 Mar 2017 09:51:39 +0900
Subject: [PATCH] Use ALTER FOREIGN TABLE with foreign table in tests

---
 src/test/regress/expected/foreign_data.out | 8 
 src/test/regress/sql/foreign_data.sql  | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index a0f969f3e5..ea197c5e4f 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -1851,11 +1851,11 @@ Server: s0
 FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
 
 -- cannot add column to a partition
-ALTER TABLE pt2_1 ADD c4 char;
+ALTER FOREIGN TABLE pt2_1 ADD c4 char;
 ERROR:  cannot add column to a partition
 -- ok to have a partition's own constraints though
-ALTER TABLE pt2_1 ALTER c3 SET NOT NULL;
-ALTER TABLE pt2_1 ADD CONSTRAINT p21chk CHECK (c2 <> '');
+ALTER FOREIGN TABLE pt2_1 ALTER c3 SET NOT NULL;
+ALTER FOREIGN TABLE pt2_1 ADD CONSTRAINT p21chk CHECK (c2 <> '');
 \d+ pt2
 Table "public.pt2"
  Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
@@ -1880,7 +1880,7 @@ Server: s0
 FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
 
 -- cannot drop inherited NOT NULL constraint from a partition
-ALTER TABLE pt2_1 ALTER c1 DROP NOT NULL;
+ALTER FOREIGN TABLE pt2_1 ALTER c1 DROP NOT NULL;
 ERROR:  column "c1" is marked NOT NULL in parent table
 -- partition must have parent's constraints
 ALTER TABLE pt2 DETACH PARTITION pt2_1;
diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql
index c13d5ffbe9..8c5fcb8b35 100644
--- a/src/test/regress/sql/foreign_data.sql
+++ b/src/test/regress/sql/foreign_data.sql
@@ -720,16 +720,16 @@ ALTER TABLE pt2 ATTACH PARTITION pt2_1 FOR VALUES IN (1);
 \d+ pt2_1
 
 -- cannot add column to a partition
-ALTER TABLE pt2_1 ADD c4 char;
+ALTER FOREIGN TABLE pt2_1 ADD c4 char;
 
 -- ok to have a partition's own constraints though
-ALTER TABLE pt2_1 ALTER c3 SET NOT NULL;
-ALTER TABLE pt2_1 ADD CONSTRAINT p21chk CHECK (c2 <> '');
+ALTER FOREIGN TABLE pt2_1 ALTER c3 SET NOT NULL;
+ALTER FOREIGN TABLE pt2_1 ADD CONSTRAINT p21chk CHECK (c2 <> '');
 \d+ pt2
 \d+ pt2_1
 
 -- cannot drop inherited NOT NULL constraint from a partition
-ALTER TABLE pt2_1 ALTER c1 DROP NOT NULL;
+ALTER FOREIGN TABLE pt2_1 ALTER c1 DROP NOT NULL;
 
 -- partition must have parent's constraints
 ALTER TABLE pt2 DETACH PARTITION pt2_1;
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC] Personal presentation and request for clarification

2017-03-08 Thread Robert Haas
On Thu, Mar 2, 2017 at 6:15 PM, João Miguel Afonso
 wrote:
> The project that most caught my eye was on "Implementing push-based query
> executor".
> Although it completely fits my capabilities and current research, I have
> some concerns on "The ability to understand and modify PostgreSQL executor
> code" as I had not enough time to understand the dimension of the referred
> changes.

They are formidable.

https://www.postgresql.org/message-id/CA%2BTgmoaf_uR_wVMj53MVvyEQ_wRx62MM3QQwR6aPZe0Lbr%2BJew%40mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-08 Thread Robert Haas
On Wed, Mar 1, 2017 at 3:56 AM, Ashutosh Bapat
 wrote:
>> 2. If the PartitionJoinPath emerges as the best path, we create paths
>> for each of the remaining child-joins. Then we collect paths with
>> properties same as the given PartitionJoinPath, one from each
>> child-join. These paths are converted into plans and a Merge/Append
>> plan is created combing these plans. The paths and plans for
>> child-join are created in a temporary memory context. The final plan
>> for each child-join is copied into planner's context and the temporary
>> memory context is reset.
>>
>
> Robert and I discussed this in more detail. Path creation code may
> allocate objects other than paths. postgres_fdw, for example,
> allocates character array to hold the name of relation being
> pushed-down. When the temporary context gets zapped after creating
> paths for a given child-join, those other objects also gets thrown
> away. Attached patch has implemented the idea that came out of the
> discussion.
>
> We create a memory context for holding paths at the time of creating
> PlannerGlobal and save it in PlannerGlobal. The patch introduces a new
> macro makePathNode() which allocates the memory for given type of path
> from this context. Every create_*_path function has been changed to
> use this macro instead of makeNode(). In standard_planner(), at the
> end of planning we destroy the memory context freeing all the paths
> allocated. While creating a plan node, planner copies everything
> required by the plan from the path, so the path is not needed any
> more. So, freeing corresponding memory should not have any adverse
> effects.
>
> Most of the create_*_path() functions accept root as an argument, thus
> the temporary path context is available through root->glob everywhere.
> An exception is create_append_path() which does not accept root as an
> argument. The patch changes create_append_path() and its callers like
> set_dummy_rel_pathlist(), mark_dummy_rel() to accept root as an
> argument. Ideally paths are not required after creating plan, so we
> should be
> able to free the context right after the call to create_plan(). But we
> need dummy paths while creating flat rtable in
> set_plan_references()->add_rtes_to_flat_rtable(), so we free the path
> context at the end of the planning cycle. Now that we are
> allocating all the paths in a different memory context, it doesn't
> make sense to switch context in mark_dummy_rel().
>
> 0001 patch implements the idea described above.
> 0002 patch adds instrumentation to measure memory consumed in
> standard_planner() call.
> 0003 patch adds a GUC zap_paths to enable/disable destroying path context.
> The last two patches are for testing only.
>
> Attached also find the SQL script and its output showing the memory
> saved. For a 5 way self-join of pg_class, the total memory consumed in
> standard_planner() is 760K without patch and with patch it comes down
> to 713K, saving 47K memory otherwise occupied by paths. It looks like
> something useful even without partition-wise joins.

Of course, that's not a lot, but the savings will be a lot better for
partition-wise joins.  Do you have a set of patches for that feature
that apply on top of 0001?
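The pattern Ashutosh describes - allocate every path node from one dedicated context, have plan creation copy out what it needs, then drop the whole context at once - can be sketched in a toy form. This is an illustration only, not the PostgreSQL MemoryContext API; `PathContext` and `make_path_node` are invented stand-ins for `root->glob`'s path context and `makePathNode()`:

```python
class PathContext:
    """Toy arena: owns every path node created through it."""
    def __init__(self):
        self._nodes = []

    def make_path_node(self, **fields):   # stand-in for makePathNode()
        node = dict(fields)
        self._nodes.append(node)
        return node

    def destroy(self):                    # free all paths wholesale
        self._nodes.clear()

ctx = PathContext()                       # created alongside PlannerGlobal
path = ctx.make_path_node(kind="SeqScan", total_cost=42.0)
plan_cost = path["total_cost"]            # create_plan() copies what it needs
ctx.destroy()                             # safe: the plan holds only copies
assert plan_cost == 42.0 and not ctx._nodes
```

The key design point is the one made in the quoted mail: freeing is safe only because plan creation copies everything it needs out of the paths first.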

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management

2017-03-08 Thread Robert Haas
On Mon, Mar 6, 2017 at 7:33 PM, Peter Geoghegan  wrote:
> Recap
> =
>
> A design goal of parallel tuplesort is that the parallel case be as
> close as possible to the serial case. Very little new code
> is needed in tuplesort.c and logtape.c, most of which is about using a
> new lower-level facility which is very well encapsulated. And, even
> buffile.c "unification", the guts of everything, doesn't have many new
> special cases.
>
> So, workers start with "unifiable" BufFiles, which have very little
> difference with conventional temp BufFiles -- they just create
> filenames in a deterministic fashion, making them discoverable to the
> leader process, through a process of unification (plus you have the
> refcount state in shared memory, though very little). In principle,
> workers could decide to not go through with unification long after the
> fact, and close their unifiable temp files without doing anything for
> the leader, and that would be just fine. By the same token, the
> unified BufFile owned by the leader can be extended if needs be, in a
> way that is virtually indistinguishable from just extending the end of
> any other BufFile. logtape.c recycling for future randomAccess cases
> works just the same as before.

I think something like 0007-hj-shared-buf-file-v6.patch from
https://www.postgresql.org/message-id/CAEepm=3g=dMG+84083fkFzLvgMJ7HdhbGB=aezabnukbzm3...@mail.gmail.com
is probably a good approach to this problem.  In essence, it dodges
the problem of trying to transfer ownership by making ownership be
common from the beginning.  That's what I've been recommending, or
trying to, anyway.
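The "discoverable via deterministic filenames" idea from Peter's recap can be sketched as follows. This is an illustration only, not buffile.c; the naming scheme and `unifiable_name` helper are invented for the example. The point is that the leader can reconstruct every worker's file name from shared state (here, just an operation id), with no explicit name handoff:

```python
import os
import tempfile

def unifiable_name(tempdir, op_id, worker):
    # hypothetical deterministic naming scheme: shared op id + worker number
    return os.path.join(tempdir, f"sort.{op_id}.{worker}")

with tempfile.TemporaryDirectory() as d:
    # each "worker" writes its sorted run under its deterministic name
    for w in range(3):
        with open(unifiable_name(d, 7, w), "w") as f:
            f.write(f"run from worker {w}\n")

    # the "leader" discovers the files purely from op_id = 7
    runs = [open(unifiable_name(d, 7, w)).read() for w in range(3)]

assert runs == [f"run from worker {w}\n" for w in range(3)]
```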

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Print correct startup cost for the group aggregate.

2017-03-08 Thread Robert Haas
On Sun, Mar 5, 2017 at 11:32 PM, Ashutosh Bapat
 wrote:
>> I think there have been
>> previous discussions of switching over to the practice for which you
>> are advocating here, but my impression (without researching) is that
>> the current practice is more like what Rushabh did.
>
> I am not sure Rushabh's approach is correct. Here's the excerpt from my mail.
>
>>> The reason why startup_cost = input_startup_cost and not
>>> input_total_cost for aggregation by sorting is that we don't need the
>>> whole input before the Group/Agg plan can produce the first row.
>
> With Rushabh's approach the startup cost of aggregation by sorting
> would be way higher than it is right now. Secondly, it would match that
> of hash aggregation, thus forcing hash aggregation to be chosen in
> almost all cases.

You're right.  I'm wrong.  I take it all back.
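The distinction Ashutosh restates above can be reduced to a small sketch (an illustration, not planner code; the function names are invented): sorted-input aggregation can emit its first group before exhausting the input, hash aggregation cannot.

```python
def sorted_agg_startup(input_startup_cost, input_total_cost):
    # first group is complete as soon as the input starts producing rows
    return input_startup_cost

def hash_agg_startup(input_startup_cost, input_total_cost):
    # every input row must be hashed before any group is final
    return input_total_cost

# with any nontrivial input cost, sorted aggregation starts up cheaper,
# which is why equating the two would unfairly favor hash aggregation
assert sorted_agg_startup(1.0, 100.0) < hash_agg_startup(1.0, 100.0)
```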

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-08 Thread Andrew Dunstan


On 03/08/2017 08:33 AM, Peter Eisentraut wrote:
> On 3/6/17 09:38, Peter Eisentraut wrote:
>> On 3/4/17 01:45, Petr Jelinek wrote:
>>> If that's the case, the attached should fix it, but I have no way of
>>> testing it on windows, I can only say that it still works on my machine
>>> so at least it hopefully does not make things worse.
>> Committed that.  Let's see how it goes.
> So that didn't work.  Now we probably need someone to dig into that host
> directly.
>
> Andrew, could you help?
>


I'll take a look when I get a chance. Might be a few days.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Robert Haas
On Sun, Mar 5, 2017 at 7:14 PM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan  wrote:
>> So, I agree with Robert that we should actually use heap size for the
>> main, initial determination of # of workers to use, but we still need
>> to estimate the size of the final index [1], to let the cost model cap
>> the initial determination when maintenance_work_mem is just too low.
>> (This cap will rarely be applied in practice, as I said.)
>>
>> [1] 
>> https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
>
> Having looked at it some more, this no longer seems worthwhile. In the
> next revision, I will add a backstop that limits the use of
> parallelism based on a lack of maintenance_work_mem in a simpler
> manner. Namely, the worker will have to be left with a
> maintenance_work_mem/nworkers share of no less than 32MB in order for
> parallel CREATE INDEX to proceed. There doesn't seem to be any great
> reason to bring the volume of data to be sorted into it.

+1.
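Peter's proposed backstop - each worker must be left with at least a 32MB share of maintenance_work_mem - could look roughly like the sketch below. This is a hypothetical helper for illustration, not PostgreSQL source; the function name and exact capping loop are assumptions:

```python
def cap_workers_by_mem(requested_workers, maintenance_work_mem_mb, floor_mb=32):
    # shrink the planned worker count until each worker's share of
    # maintenance_work_mem reaches the floor; 1 means fall back to serial
    workers = max(requested_workers, 1)
    while workers > 1 and maintenance_work_mem_mb / workers < floor_mb:
        workers -= 1
    return workers

assert cap_workers_by_mem(8, 64) == 2     # 64/2 = 32MB per worker
assert cap_workers_by_mem(4, 256) == 4    # plenty of memory; no cap applied
assert cap_workers_by_mem(8, 16) == 1     # too little memory; go serial
```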

> I expect the cost model to be significantly simplified in the next
> revision in other ways, too. There will be no new index storage
> parameter, nor a disable_parallelddl GUC. compute_parallel_worker()
> will be called in a fairly straightforward way within
> plan_create_index_workers(), using heap blocks, as agreed to already.

+1.

> pg_restore will avoid parallelism (that will happen by setting
> "max_parallel_workers_maintenance  = 0" when it runs), not because it
> cannot trust the cost model, but because it prefers to parallelize
> things its own way (with multiple restore jobs), and because execution
> speed may not be the top priority for pg_restore, unlike a live
> production system.

This part I'm not sure about.  I think people care quite a lot about
pg_restore speed, because they are often down when they're running it.
And they may have oodles more CPUs that parallel restore can use
without help from parallel query.  I would be inclined to leave
pg_restore alone and let the chips fall where they may.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-03-08 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ashutosh Sharma
> To start with, I ran the regression test-suite and didn't find any failures.
> But then I am not sure whether huge pages are actually getting used.
> However, upon checking the huge_pages setting, I found it set to 'on'. I
> am assuming that if huge pages cannot be used due to a shortage of large
> pages, the server should have fallen back to non-huge pages.

You are right, the server falls back to non-huge pages when large pages run
short.

> I also ran the pgbench tests on read-only workload and here are the results
> I got.
> 
> pgbench -c 4 -j 4 -T 600 bench
> 
> huge_pages=on, TPS = 21120.768085
> huge_pages=off, TPS = 20606.288995

Thanks.  It's about 2% improvement, which is the same as what I got.


From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
> The line beginning 'Huge pages are known as...' has been accidentally
> duplicated.

Oops, how careless of me.  Fixed.  As Ashutosh mentioned, I added a very simple 
suggestion to use the Windows Group Policy tool.

Regards
Takayuki Tsunakawa



win_large_pages_v9.patch
Description: win_large_pages_v9.patch



Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:33 PM, Robert Haas  wrote:
>> pg_restore will avoid parallelism (that will happen by setting
>> "max_parallel_workers_maintenance  = 0" when it runs), not because it
>> cannot trust the cost model, but because it prefers to parallelize
>> things its own way (with multiple restore jobs), and because execution
>> speed may not be the top priority for pg_restore, unlike a live
>> production system.
>
> This part I'm not sure about.  I think people care quite a lot about
> pg_restore speed, because they are often down when they're running it.
> And they may have oodles more CPUs that parallel restore can use
> without help from parallel query.  I would be inclined to leave
> pg_restore alone and let the chips fall where they may.

I thought that we might want to err on the side of preserving the
existing behavior, but arguably that's actually what I failed to do.
That is, since we don't currently have a pg_restore flag that controls
the maintenance_work_mem used by pg_restore, "let the chips fall where
they may" is arguably the standard that I didn't uphold.

It might still make sense to take a leaf out of the parallel query
book on this question. That is, add an open item along the lines of
"review behavior of pg_restore with parallel CREATE INDEX" that we
plan to deal with close to the release of Postgres 10.0, when feedback
from beta testing is in. There are a number of options, none of which
are difficult to write code for. The hard part is determining what
makes most sense for users on balance.

-- 
Peter Geoghegan




Re: [HACKERS] Parallel Append implementation

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 2:00 AM, Amit Khandekar  wrote:
> Yeah, that seems to be right in most of the cases. The only cases
> where your formula seems to give too few workers is for something like
> : (2, 8, 8). For such subplans, we should at least allocate 8 workers.
> It turns out that in most of the cases in my formula, the Append
> workers allocated is just 1 worker more than the max per-subplan
> worker count. So in (2, 1, 1, 8), it will be a fraction more than 8.
> So in the patch, in addition to the log2() formula you proposed, I
> have made sure that it allocates at least equal to max(per-subplan
> parallel_workers values).

Yeah, I agree with that.
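The combined rule described above can be sketched as follows. This is a hypothetical, self-contained illustration: the function name `append_parallel_workers` and the exact log2 term (`nplans + 1` here) are assumptions, not the patch's actual code; the point is only that the log2-based estimate is clamped to at least the largest per-subplan worker count, so (2, 8, 8) gets at least 8 workers.

```c
#include <assert.h>

/* Smallest r such that 2^r >= n, i.e. ceil(log2(n)) for n >= 1 */
static int
ceil_log2(int n)
{
    int r = 0, v = 1;

    while (v < n)
    {
        v <<= 1;
        r++;
    }
    return r;
}

/*
 * Sketch of the combined rule: take a log2-based estimate that grows
 * slowly with the number of subplans, but never allocate fewer workers
 * than the largest per-subplan parallel_workers value.
 */
static int
append_parallel_workers(const int *subplan_workers, int nplans)
{
    int max_per_subplan = 0;
    int log2_part;
    int i;

    for (i = 0; i < nplans; i++)
        if (subplan_workers[i] > max_per_subplan)
            max_per_subplan = subplan_workers[i];

    log2_part = ceil_log2(nplans + 1);

    return (log2_part > max_per_subplan) ? log2_part : max_per_subplan;
}
```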

Some review:

+typedef struct ParallelAppendDescData
+{
+    slock_t            pa_mutex;    /* mutual exclusion to choose next subplan */
+    ParallelAppendInfo pa_info[FLEXIBLE_ARRAY_MEMBER];
+} ParallelAppendDescData;

Instead of having ParallelAppendInfo, how about just int
pa_workers[FLEXIBLE_ARRAY_MEMBER]?  The second structure seems like
overkill, at least for now.
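The suggested simplification could look something like the sketch below. `slock_t` and `FLEXIBLE_ARRAY_MEMBER` are stubbed here so the fragment stands alone; in PostgreSQL they come from storage/s_lock.h and c.h, and the allocation helper is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int slock_t;            /* stub for illustration only */
#define FLEXIBLE_ARRAY_MEMBER   /* C99 flexible array member */

/* The simplified descriptor: a bare int array instead of a struct array */
typedef struct ParallelAppendDescData
{
    slock_t pa_mutex;           /* protects pa_workers[] */
    int     pa_workers[FLEXIBLE_ARRAY_MEMBER];
} ParallelAppendDescData;

/* Allocate the descriptor with room for nplans worker counts, zeroed */
static ParallelAppendDescData *
make_parallel_append_desc(int nplans)
{
    return calloc(1, offsetof(ParallelAppendDescData, pa_workers) +
                     sizeof(int) * nplans);
}
```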

+static inline void
+exec_append_scan_first(AppendState *appendstate)
+{
+    appendstate->as_whichplan = 0;
+}

I don't think this is buying you anything, and suggest backing it out.

+/* Backward scan is not supported by parallel-aware plans */
+Assert(!ScanDirectionIsBackward(appendstate->ps.state->es_direction));

I think you could assert ScanDirectionIsForward, couldn't you?
NoMovement, I assume, is right out.
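Asserting the positive condition is indeed stricter: with the definitions from access/sdir.h (reproduced below for illustration), `ScanDirectionIsForward` rejects both Backward and NoMovement, whereas `!ScanDirectionIsBackward` would let NoMovement through.

```c
#include <assert.h>

/*
 * ScanDirection and its test macros, as defined in PostgreSQL's
 * access/sdir.h; the three values are part of the backend API.
 */
typedef enum ScanDirection
{
    BackwardScanDirection = -1,
    NoMovementScanDirection = 0,
    ForwardScanDirection = 1
} ScanDirection;

#define ScanDirectionIsForward(direction)  ((direction) == ForwardScanDirection)
#define ScanDirectionIsBackward(direction) ((direction) == BackwardScanDirection)
```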

+elog(DEBUG2, "ParallelAppend : pid %d : all plans already finished",
+     MyProcPid);

Please remove (and all similar cases also).

+ sizeof(*node->as_padesc->pa_info) * node->as_nplans);

I'd use the type name instead.

+for (i = 0; i < node->as_nplans; i++)
+{
+    /*
+     * Just setting all the number of workers to 0 is enough. The logic
+     * of choosing the next plan in workers will take care of everything
+     * else.
+     */
+    padesc->pa_info[i].pa_num_workers = 0;
+}

Here I'd use memset.
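A single memset over the array does the same zeroing in one call. The stub struct below stands in for the patch's ParallelAppendInfo so the fragment is self-contained; it is an illustration, not the actual patch code.

```c
#include <assert.h>
#include <string.h>

/* Stub of the patch's per-subplan struct, for illustration only */
typedef struct ParallelAppendInfo
{
    int pa_num_workers;
} ParallelAppendInfo;

/* One memset replaces the element-by-element zeroing loop */
static void
reset_worker_counts(ParallelAppendInfo *pa_info, int nplans)
{
    memset(pa_info, 0, sizeof(ParallelAppendInfo) * nplans);
}
```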

+return (min_whichplan == PA_INVALID_PLAN ? false : true);

Maybe just return (min_whichplan != PA_INVALID_PLAN);

-  childrel->cheapest_total_path);
+                                  childrel->cheapest_total_path);

Unnecessary.

+{
     partial_subpaths = accumulate_append_subpath(partial_subpaths,
                                      linitial(childrel->partial_pathlist));
+}

Don't need to add braces.

+/*
+ * Extract the first unparameterized, parallel-safe one among the
+ * child paths.
+ */

Can we use get_cheapest_parallel_safe_total_inner for this, from
a71f10189dc10a2fe422158a2c9409e0f77c6b9e?

+if (rel->partial_pathlist != NIL &&
+(Path *) linitial(rel->partial_pathlist) == subpath)
+partial_subplans_set = bms_add_member(partial_subplans_set, i);

This seems like a scary way to figure this out.  What if we wanted to
build a parallel append subpath with some path other than the
cheapest, for some reason?  I think you ought to record the decision
that set_append_rel_pathlist makes about whether to use a partial path
or a parallel-safe path, and then just copy it over here.

-create_append_path(grouped_rel,
-   paths,
-   NULL,
-   0);
+create_append_path(grouped_rel, paths, NULL, 0);

Unnecessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 8:45 PM, Peter Geoghegan  wrote:
>> This part I'm not sure about.  I think people care quite a lot about
>> pg_restore speed, because they are often down when they're running it.
>> And they may have oodles more CPUs that parallel restore can use
>> without help from parallel query.  I would be inclined to leave
>> pg_restore alone and let the chips fall where they may.
>
> I thought that we might want to err on the side of preserving the
> existing behavior, but arguably that's actually what I failed to do.
> That is, since we don't currently have a pg_restore flag that controls
> the maintenance_work_mem used by pg_restore, "let the chips fall where
> they may" is arguably the standard that I didn't uphold.
>
> It might still make sense to take a leaf out of the parallel query
> book on this question. That is, add an open item along the lines of
> "review behavior of pg_restore with parallel CREATE INDEX" that we
> plan to deal with close to the release of Postgres 10.0, when feedback
> from beta testing is in. There are a number of options, none of which
> are difficult to write code for. The hard part is determining what
> makes most sense for users on balance.

I like to err on the side of the approach that requires fewer changes.
That is, if the question is "does pg_restore need to treat this issue
specially?" and the answer is unclear, I like to assume it probably
doesn't until some contrary evidence emerges.

I mean, sometimes it is clear that you are going to need special
handling someplace, and then you have to do it.  But I don't see that
this is one of those cases, necessarily.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] contrib modules and relkind check

2017-03-08 Thread Michael Paquier
On Wed, Mar 8, 2017 at 5:10 PM, Amit Langote
 wrote:
> On 2017/03/08 16:47, Michael Paquier wrote:
>> Only regular tables are tested as valid objects. Testing toast tables
>> is not worth the complication. Could you add as well a matview?
>
> Done in the attached updated patch.

+select count(*) > 0 from pg_visibility('matview');
+ ?column?
+----------
+ f
+(1 row)
That's quite a generic name :) You may want to use less common names
in your tests.

OK, I am marking that as ready for committer.
-- 
Michael



