Re: [HACKERS] synchronous_commit option is not visible after pressing TAB

2017-05-16 Thread Masahiko Sawada
On Wed, May 17, 2017 at 11:21 AM, Peter Eisentraut
 wrote:
> On 5/16/17 19:45, Michael Paquier wrote:
>> On Tue, May 16, 2017 at 8:18 PM, tushar  
>> wrote:
>>> While creating a subscription, if we press the TAB key to see the
>>> available parameters, the synchronous_commit parameter is not visible.
>>>
>>> postgres=# CREATE SUBSCRIPTION sub123 CONNECTION 'dbname=postgres port=5000'
>>> PUBLICATION pub WITH (
>>> CONNECT  COPY_DATACREATE_SLOT  ENABLED  SLOT_NAME
>>>
>>> synchronous_commit option is not visible
>>
>> Yes, and the items should be ordered alphabetically.
>
> Fixed.  There were a few more things that needed tweaking, too.
>

In addition to that, the attached patch fixes two tab completion issues.

* ALTER SUBSCRIPTION hoge_sub SET PUBLICATION hoge_pub [TAB]
  -> Complete with "REFRESH ..." or "SKIP REFRESH".
* CREATE PUBLICATION hoge_pub FOR ALL TABLES [TAB] and CREATE
PUBLICATION hoge_pub FOR TABLE hoge_table [TAB]
  -> Complete with "WITH ("
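
For illustration, the new completions would look roughly like this
(output is illustrative, not copied from a build with the patch):

postgres=# ALTER SUBSCRIPTION hoge_sub SET PUBLICATION hoge_pub [TAB]
REFRESH        SKIP REFRESH
postgres=# CREATE PUBLICATION hoge_pub FOR ALL TABLES [TAB]
WITH (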

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_tab_completion.patch
Description: Binary data



Re: [HACKERS] UPDATE of partition key

2017-05-16 Thread Dilip Kumar
On Fri, May 12, 2017 at 4:17 PM, Amit Khandekar  wrote:
> Option 3
> 
>
> BR, AR delete triggers on source partition
> BR, AR insert triggers on destination partition.
>
> Rationale :
> Since the update is converted to delete+insert, just skip the update
> triggers completely.

+1 to option3
Generally, BR triggers are used for updating the ROW value, and AR
triggers for validating the row or modifying other tables.  So it
seems reasonable to fire the triggers that match the actual operation
happening at the partition level.

For the source partition it's only a delete operation (no update
happened), so we fire delete triggers; for the destination it's only
an insert operation, so we fire only insert triggers.  That keeps
things simple, and it is also in sync with the actual partition-level
delete/insert operations.

One may argue that a user might have declared only update triggers,
and since he executed an update he may expect those triggers to fire.
But I think this behaviour can be documented, with the rationale that
if the user is updating the partition key then he must also be
prepared with delete/insert triggers; he cannot rely on update-level
triggers alone.

Earlier I thought option 1 was better, but now I think it could
complicate the situation: we would fire first the BR update and then
the BR delete, which can change the row multiple times, and defining
such behaviour would be complicated.
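
To illustrate option 3 concretely, here is a hypothetical sketch (the
table, the triggers, and the firing order are illustrative only; this
is not taken from any patch):

create table t (a int, b text) partition by range (a);
create table t1 partition of t for values from (1) to (100);
create table t2 partition of t for values from (100) to (200);
-- assume row-level BR/AR DELETE triggers on t1 and row-level
-- BR/AR INSERT triggers on t2
update t set a = 150 where a = 50;  -- row moves from t1 to t2
-- proposed firing: BR delete on t1, BR insert on t2, then AR delete
-- on t1 and AR insert on t2; no UPDATE triggers fire on either side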

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-16 Thread Michael Paquier
On Tue, May 16, 2017 at 11:26 PM, Ildus Kurbangaliev
 wrote:
> On Tue, 16 May 2017 21:36:11 +0900
> Etsuro Fujita  wrote:
>> On 2017/05/16 21:11, Ashutosh Bapat wrote:
>> > On Tue, May 16, 2017 at 5:35 PM, Ildus Kurbangaliev
>> >  wrote:
>>
>> >> I agree. Maybe this issue should be added to Postgresql Open Items?
>> >> I think there should be some complex solution that fixes not only
>> >> triggers for foreign tables at table partitioning, but covers other
>> >> possible not working cases.
>>
>> > I doubt if this is an open item, since DMLs on foreign tables are
>> > supported since 9.3 and support to add foreign tables to inheritance
>> > was added back in 9.5.
>>
>> I think this issue was introduced by the latter, so that was my fault.
>>
>> One approach I came up with to fix this issue is to rewrite the
>> targetList entries of an inherited UPDATE/DELETE properly using
>> rewriteTargetListUD, when generating a plan for each child table in
>> inheritance_planner.  Attached is a WIP patch for that.  Maybe I am
>> missing something, though.

Could this patch include some regression tests to see to what extent
it has been tested? We surely don't want to see this broken again in
the future. (Nit: I have not looked at the patch in detail yet.)

> I tested the patch, looks good.

What kind of tests did you do?

  junkfilter = resultRelInfo->ri_junkFilter;
+ tupleid = NULL;
  estate->es_result_relation_info = resultRelInfo;
Er, what is that?
-- 
Michael




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-16 Thread Michael Paquier
On Wed, May 17, 2017 at 7:56 AM, Bossart, Nathan  wrote:
> I think this issue already exists, as this comment in get_rel_oids(…) seems 
> to indicate:
>
> /*
>  * Since we don't take a lock here, the relation might be gone, or the
>  * RangeVar might no longer refer to the OID we look up here.  In the
>  * former case, VACUUM will do nothing; in the latter case, it will
>  * process the OID we looked up here, rather than the new one. Neither
>  * is ideal, but there's little practical alternative, since we're
>  * going to commit this transaction and begin a new one between now
>  * and then.
>  */
>  relid = RangeVarGetRelid(vacrel, NoLock, false);
>
> With the patch applied, I believe this statement still holds true.  So if the 
> relation disappears before we get to vacuum_rel(…), we will simply skip it 
> and move on to the next one.  The vacuum_rel(…) code provides a WARNING in 
> many cases (e.g. the user does not have privileges to VACUUM the table), but 
> we seem to silently skip the table when it disappears before the call to 
> vacuum_rel(…).

Yes, those are the bits using try_relation_open(), which returns NULL
if the relation is gone. I don't think that we want VACUUM to be noisy
about that when running it on a database.

> If we added a WARNING to the effect of “skipping vacuum of  — 
> relation no longer exists” for this case, I think what you are suggesting 
> would be satisfied.

We would do the user no favor by reporting nothing. Without any
information, the user triggering a manual vacuum may believe that the
relation has been vacuumed because it was listed, but that would not
be the case.

> However, ANALYZE has a slight caveat.  While analyze_rel(…) silently skips 
> the relation if it no longer exists like vacuum_rel(…) does, we do not 
> pre-validate the columns list at all.  So, in an ANALYZE statement with 
> multiple tables and columns specified, it’ll only fail once we get to the 
> undefined column.  To fix this, we could add a check for the column lists 
> near get_rel_oids(…) and adjust do_analyze_rel(…) to emit a WARNING and skip 
> any columns that vanish in the meantime.
>
> Does this seem like a sane approach?
>
>   1. Emit WARNING when skipping if relation disappears before we get to it.
>   2. Early in vacuum(…), check that the specified columns exist.

And issue an ERROR, right?

>   3. Emit WARNING and skip any specified columns that vanish before 
> processing.

This looks like a sensible approach to me.
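
To make the proposed behaviour concrete, a session might look like
this (the messages are illustrative only, not taken from the patch):

VACUUM foo, bar;  -- bar is dropped by a concurrent session
WARNING:  skipping vacuum of "bar" --- relation no longer exists
ANALYZE foo (a, no_such_column);  -- column list checked early, per point 2
ERROR:  column "no_such_column" of relation "foo" does not exist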

+   RelationAndColumns *relinfo = (RelationAndColumns *) lfirst(relation);
+   int per_table_opts = options | relinfo->options;  /* VACOPT_ANALYZE may be set per-table */
+   ListCell *oid;
I have just noticed this bit in your patch. So you can basically do
something like this:
VACUUM (ANALYZE) foo, (FULL) bar;
Do we really want to go down to this level of control? I would keep
things a bit more restricted, as users may be confused by the different
lock levels involved in each operation, and use the same options for
all the relations listed. Opinions from others are welcome.
-- 
Michael




Re: [HACKERS] Hash Functions

2017-05-16 Thread Ashutosh Bapat
On Tue, May 16, 2017 at 8:40 PM, Jeff Davis  wrote:
> On Mon, May 15, 2017 at 1:04 PM, David Fetter  wrote:
>> As the discussion has devolved here, it appears that there are, at
>> least conceptually, two fundamentally different classes of partition:
>> public, which is to say meaningful to DB clients, and "private", used
>> for optimizations, but otherwise opaque to DB clients.
>>
>> Mashing those two cases together appears to cause more problems than
>> it solves.
>
> I concur at this point. I originally thought hash functions might be
> made portable, but I think Tom and Andres showed that to be too
> problematic -- the issue with different encodings is the real killer.
>
> But I also believe hash partitioning is important and we shouldn't
> give up on it yet.
>
> That means we need to have a concept of hash partitions that's
> different from range/list partitioning. The terminology
> "public"/"private" does not seem appropriate. Logical/physical or
> external/internal might be better.
>
> With hash partitioning:
> * User only specifies number of partitions of the parent table; does
> not specify individual partition properties (modulus, etc.)

A well-distributed integer column doesn't even need to be hashed; a
simple modulo works for it. If we are going towards "implicit" (yet
another name) partitioning, we could choose the strategy based on the
data type of the partition key rather than always hashing, although we
might end up hashing in most cases.

> * Dump/reload goes through the parent table (though we may provide
> options so pg_dump/restore can optimize this)

You probably mean an immediate hash-partitioned parent, but let me
clarify a bit. We support multi-level partitioning, with each
partitioned table anywhere in the partitioning hierarchy choosing any
partitioning scheme. So we can have a range-partitioned table as a
partition of a hash-partitioned table, or for that matter a non-hash
partitioned table somewhere in the hierarchy rooted at the
hash-partitioned table. So, for range/list and even hash partitions
that are grandchildren of a hash-partitioned table, we will need to
route dump/reload through that hash-partitioned table, i.e. through
the topmost hash-partitioned table.

> * We could provide syntax to adjust the number of partitions, which
> would be expensive but still useful sometimes.
> * All DDL should be on the parent table, including check constraints,
> FKs, unique constraints, exclusion constraints, indexes, etc.

i.e. on the topmost hash-partitioned table, as explained above.

>   - Unique and exclusion constraints would only be permitted if the
> keys are a superset of the partition keys.

Do you think this restriction applies even after we support global
indexes? Isn't it applicable to all the partitioning strategies?

>   - FKs would only be permitted if the two table's partition schemes
> match and the keys are members of the same hash opfamily (this could
> be relaxed slightly, but it gets a little confusing if so)
> * No attach/detach of partitions

It would be good if we could support this for maintenance purposes. If
a partition goes bad, we could replace it with a copy kept elsewhere,
using attach and detach, without affecting the whole table. Does that
mean we will need to support some form of pg_dump/copy with a special
flag to create copies of individual partitions? I think that proposal
has already been floated.

> * All partitions have the same permissions

Why's that?

>
> The only real downside is that it could surprise users -- why can I
> add a CHECK constraint on my range-partitioned table but not the
> hash-partitioned one? We should try to document this so users don't
> find that out too far along. As long as they aren't surprised, I think
> users will understand why these aren't quite the same concepts.
>

For a transparent hash (non-transparent in the sense of what Mark
Dilger proposed), any constraint other than the implicit partitioning
constraint either applies to the whole table or doesn't apply at all,
so it's better if the user adds it on the parent hash table. So yes,
with this reasoning, we could document this fact.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-16 Thread Amit Langote
On 2017/05/17 11:22, Thomas Munro wrote:
> On Wed, May 17, 2017 at 10:13 AM, Kevin Grittner  wrote:
>> On Tue, May 16, 2017 at 4:50 PM, Thomas Munro
>>  wrote:
>>>
>>> I'm about to post a much simpler patch that collects uniform tuples
>>> from children, addressing the reported bug, and simply rejects
>>> transition tables on row-triggers on tables that are in either kind of
>>> inheritance hierarchy.  More soon...
>>
>> I agree that we can safely go that far, but not farther.  Thanks!
> 
> Here is that patch.  Thoughts?

I looked at the patch and noticed that there might be some confusion about
what's in the EState.es_root_result_relations array.

If you see the following block of code in ExecInitModifyTable():

    /* If modifying a partitioned table, initialize the root table info */
    if (node->rootResultRelIndex >= 0)
        mtstate->rootResultRelInfo = estate->es_root_result_relations +
            node->rootResultRelIndex;

You might be able to see that node->rootResultRelIndex is used as an
offset into es_root_result_relations, pointing to the root of the
partitioned table being modified by this node.  The entries in
es_root_result_relations correspond to the partitioned table roots
referenced as targets in different parts of the query (for example, a
WITH query might have its own target partitioned tables).

So, in ExecSetupTriggerTransitionState() added by the patch, the following
code needs to be changed:

    /*
     * Find the ResultRelInfo corresponding to the relation that was
     * explicitly named in the statement.
     */
    if (estate->es_num_root_result_relations > 0)
    {
        /* Partitioned table.  The named relation is the first root. */
        targetRelInfo = &estate->es_root_result_relations[0];
    }

targetRelInfo should instead be set to mtstate->rootResultRelInfo that was
set in ExecInitModifyTable() as described above, IOW, as follows:

    /* Partitioned table. */
    if (mtstate->rootResultRelInfo != NULL)
        targetRelInfo = mtstate->rootResultRelInfo;

I guess that's what you intend to do here.

Sorry if the comments that I wrote about es_root_result_relations in what
got committed as e180c8aa8c were not clear enough.

Thanks,
Amit





Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-16 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 7:24 AM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Tue, 16 May 2017 12:45:39 -0400, Tom Lane  wrote in 
> <22556.1494953...@sss.pgh.pa.us>
>> Robert Haas  writes:
>> > Concretely, I think we should replace the abort_cleanup_incomplete
>> > flag from my previous patch with a changing_xact_state flag and set
>> > that flag around all transaction state changes, clearing it when such
>> > changes have succeeded.  On error, the flag remains set, so we know
>> > that the state of that connection is unknown and that we must close it
>> > (killing outer transaction levels as needed).
>>
>> > Thoughts?
>>
>> Sounds plausible.
>
> I think that the current issue is the usability of the current
> connection. Even if any other command fails, we can
> continue using the connection as long as ABORT TRANSACTION
> succeeds. Failure of ABORT immediately means that the connection
> is no longer available. If this discussion is reasonable,
> changing_xact_state might be too much.

pgfdw_get_result() raises an ERROR, which will end up in
pgfdw_xact_callback() with ABORT. If we find the connection in a bad
state there, we will slam it shut. The problem seems to be that none
of the waits for transaction-related commands are interruptible, both
while sending the command and while receiving its result. Robert's
suggestion is to fix the latter for transaction commands, so it looks
good to me. Can you please elaborate on what's too much about
changing_xact_state?
>
> By the way, if an fdw connection is stalled in do_sql_command
> waiting for a result, a cancel request doesn't work (in other words
> pgsql_xact_callback is not called) since EINTR is just ignored in
> libpq. Maybe we should teach libpq to error out on EINTR for
> some additional reasons. PQsetEINTRcallback() or something?

Right, do_sql_command() is uninterruptible, and so is the PQexec()
used for sending "ABORT TRANSACTION" and "ROLLBACK TO SAVEPOINT".
Robert's suggestion seems to be that we make them interruptible with
changing_xact_state, so that we could notice a previously failed
transaction-related command in pgfdw_xact_callback(). So his approach
scores on that front as well.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 12:04 AM, amul sul  wrote:
> On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar  wrote:
>> On Tue, May 16, 2017 at 4:22 PM, amul sul  wrote:
>>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>>> attached version.
>>>
>>> Now partition oids will be arranged in the ascending order of hash
>>> partition bound  (i.e. modulus and remainder sorting order)
>>
>> Thanks for the update patch. I have some more comments.
>>
>> 
>> + if (spec->remainder < 0)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> +  errmsg("hash partition remainder must be less than modulus")));
>>
>> I think this error message is not correct, you might want to change it
>> to "hash partition remainder must be non-negative integer"
>>
>
> Fixed in the attached version;  used "hash partition remainder must be
> greater than or equal to 0" instead.

I would suggest "non-zero positive", since that's what we are using in
the documentation.

>
>> ---
>>
>> + The table is partitioned by specifying remainder and modulus for 
>> each
>> + partition. Each partition holds rows for which the hash value of
>>
>> Wouldn't it be better to say "modulus and remainder" instead of
>> "remainder and modulus" then it will be consistent?
>>
>
> You are correct, fixed in the attached version.
>
>> ---
>> +   An UPDATE that causes a row to move from one partition to
>> +   another fails, because
>>
>> fails, because -> fails because
>>
>
> This hunk no longer exists in the attached patch; it was mistakenly
> copied, sorry about that.
>
>> ---
>>
>> Wouldn't it be a good idea to document how to increase the number of
>> hash partitions, I think we can document it somewhere with an example,
>> something like Robert explained upthread?
>>
>> create table foo (a integer, b text) partition by hash (a);
>> create table foo1 partition of foo with (modulus 2, remainder 0);
>> create table foo2 partition of foo with (modulus 2, remainder 1);
>>
>> You can detach foo1, create two new partitions with modulus 4 and
>> remainders 0 and 2, and move the data over from the old partition
>>
>> I think it will be good information for a user to have? or it's
>> already documented and I missed it?
>>

This is already part of the documentation contained in the patch.
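
For reference, the steps Robert described might look like this in the
patch's syntax (a sketch only; the new partition names are
illustrative):

alter table foo detach partition foo1;
create table foo3 partition of foo with (modulus 4, remainder 0);
create table foo4 partition of foo with (modulus 4, remainder 2);
insert into foo select * from foo1;  -- rows re-route to foo3/foo4
drop table foo1;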

Here are some more comments
@@ -3296,6 +3311,14 @@ ALTER TABLE measurement ATTACH PARTITION
measurement_y2008m02
not the partitioned table.
   
  
+
+ 
+  
+   An UPDATE that causes a row to move from one partition to
+   another fails, because the new value of the row fails to satisfy the
+   implicit partition constraint of the original partition.
+  
+ 
 
 
 
The description in this chunk is applicable to all the kinds of partitioning.
Why should it be part of a patch implementing hash partitioning?

+Declarative partitioning only supports hash, list and range
+partitioning, whereas table inheritance allows data to be
+divided in a manner of the user's choosing.  (Note, however,
+that if constraint exclusion is unable to prune partitions
+effectively, query performance will be very poor.)
Looks like the line width is less than 80 characters.

In partition_bounds_equal(), please add comments explaining why it is safe to
check just the indexes. Maybe we should add code, under an assertion, to make
sure that the datums are equal as well. The comment could be something
like, "If two partitioned tables have different greatest moduli, their
partition schemes don't match. If they have same greatest moduli, and
all remainders have different indexes, they all have same modulus
specified and the partitions are ordered by remainders, thus indexes
array will be an identity i.e. index[i] = i. If the partition
corresponding to a given remainder exists, it will have same index
entry for both partitioned tables or if it's missing it will be -1.
Thus if indexes array matches, corresponding datums array matches. If
there are multiple remainders corresponding to a given partition,
their partitions are ordered by the lowest of the remainders, thus if
indexes array matches, both of the tables have same indexes arrays, in
both the tables remainders corresponding to multiple partitions all
have same indexes and thus same modulus. Thus again if the indexes are
same, datums are same.".

In the same function
if (key->strategy == PARTITION_STRATEGY_HASH)
{
int         greatest_modulus;

/*
 * Compare greatest modulus of hash partition bound which
 * is the last element of datums array.
 */
if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
return false;

/* Compare indexes */
greatest_modulus = DatumGetInt32(b1->datums[b1->ndatums - 1][0]);
for (i = 0; i < greatest_modulus; i++)
if (b1->indexes[i] != b2->indexes[i])

Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Thomas Munro
On Wed, May 17, 2017 at 2:39 PM, Tom Lane  wrote:
> Vaishnavi Prabakaran  writes:
>>> Tom Lane wrote:
 BTW, it would be a good idea for somebody to check this out on Windows,
 assuming there's a way to generate a keyboard EOF signal there.
>
>> Ctrl-Z + Enter in windows generates EOF signal. I verified this issue and
>> it is not reproducible in windows.
>
> Thanks for checking.  So that's two major platforms where it works "as
> expected" already.

Ah... the reason this is happening is that BSD-derived fread()
implementations return immediately if the EOF flag is set[1], but
others do not.  At a guess, not doing that is probably more conformant
with POSIX ("... less than nitems only if a read error or end-of-file
is *encountered*", which seems to refer to the underlying condition
and not the user-clearable EOF flag).

We are neither clearing nor checking the EOF flag explicitly, and that
only works out OK on fread implementations that also ignore it.

Tom Lane  wrote (further upstream):
> If we're going
> to go out of our way to make it work, should we mention it in psql-ref?

I went looking for the place to put that and found that it already says:

For \copy ... from stdin, data rows are read from the same
source that issued the command, continuing until \.
is read or the stream reaches EOF.

That's probably referring to the "outer" stream, such as a file that
contains the COPY ... FROM STDIN command, but doesn't it seem like a
general enough statement to cover ^d in the interactive case too?

Here's a version incorporating your other suggestions and a comment to explain.

[1] 
https://github.com/freebsd/freebsd/blob/afbef1895e627cd1993428a252d39b505cf6c085/lib/libc/stdio/refill.c#L79

-- 
Thomas Munro
http://www.enterprisedb.com


clear-copy-stream-eof-v2.patch
Description: Binary data



[HACKERS] Fix refresh_option syntax of ALTER SUBSCRIPTION in document

2017-05-16 Thread Masahiko Sawada
Hi,

While reading the documentation I found that the refresh_option syntax
of ALTER SUBSCRIPTION is not correct.

ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (refresh_option value [, ...])
should be changed to
ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (refresh_option [= value] [, ...])

Attached patch fixes this.
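
For example, under the corrected synopsis both of the following forms
should be accepted (assuming, as for other boolean options, that
omitting the value means true; copy_data is the refresh_option in
question):

ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (copy_data = false);
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (copy_data);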

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_document_alter_subscription.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 9:38 AM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Tue, May 16, 2017 at 6:50 PM, Peter Eisentraut
>>  wrote:
>>> On 5/15/17 23:45, Ashutosh Bapat wrote:
 +1. We should throw an error and add a line in documentation that
 collation should not be specified for hash partitioned table.
>
>>> Why is it even allowed in the parser then?
>
>> That grammar is common to all the partitioning strategies. It looks
>> like it's easier to handle collation for hash partitions in
>> transformation than in the grammar. But if we could handle it in the
>> grammar, I don't have any objection to that.
>
> If you disallow something in the grammar, the error message is unlikely to
> be better than "syntax error".  That's not very desirable.

Right +1.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Tom Lane
Ashutosh Bapat  writes:
> On Tue, May 16, 2017 at 6:50 PM, Peter Eisentraut
>  wrote:
>> On 5/15/17 23:45, Ashutosh Bapat wrote:
>>> +1. We should throw an error and add a line in documentation that
>>> collation should not be specified for hash partitioned table.

>> Why is it even allowed in the parser then?

> That grammar is common to all the partitioning strategies. It looks
> like it's easier to handle collation for hash partitions in
> transformation than in the grammar. But if we could handle it in the
> grammar, I don't have any objection to that.

If you disallow something in the grammar, the error message is unlikely to
be better than "syntax error".  That's not very desirable.

regards, tom lane




Re: [HACKERS] Re: [doc fix] PG10: wrong description on connect_timeout when multiple hosts are specified

2017-05-16 Thread Michael Paquier
On Wed, May 17, 2017 at 11:49 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
>> pqWait is internal to libpq, so we are free to set up what we want here.
>> Still I think that we should be consistent with what pqSocketCheck returns:
>
> Please leave this as it is now, for the same reason Robert mentioned.
>
>> +    int         ret = 0;
>> +    int         timeout = 0;
>> The declaration of "ret" should be declared inside the for(;;) loop.
>
> Done.
>
>> +   /* Attempt connection to the next host, starting the
>> connect_timeout timer */
>> +   pqDropConnection(conn, true);
>> +   conn->addr_cur = conn->connhost[conn->whichhost].addrlist;
>> +   conn->status = CONNECTION_NEEDED;
>> +   finish_time = time(NULL) + timeout;
>> +   }
>> I think that it would be safer to not set finish_time if
>> conn->connect_timeout is NULL. I agree that your code works because
>> pqWaitTimed() will never complain on timeout reached if finish_time is -1.
>> That's for robustness sake.
>
> Done, but I'm not sure how this contributes to the robustness.  I guess you 
> were concerned just in case pqWaitTimed() returned 0 (timeout) even when it 
> should not.

Thanks for the updated patch. This looks good to me.
-- 
Michael




Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
On Tue, May 16, 2017 at 6:50 PM, Peter Eisentraut
 wrote:
> On 5/15/17 23:45, Ashutosh Bapat wrote:
>> +1. We should throw an error and add a line in documentation that
>> collation should not be specified for hash partitioned table.
>
> Why is it even allowed in the parser then?

That grammar is common to all the partitioning strategies. It looks
like it's easier to handle collation for hash partitions in
transformation than in the grammar. But if we could handle it in the
grammar, I don't have any objection to that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] If subscription to foreign table valid ?

2017-05-16 Thread Peter Eisentraut
On 5/14/17 23:55, Neha Khatri wrote:
> With this patch the error will be like this:
> 
>   logical replication target relation public.t is not a table
> 
> But it is possible that the referred table is Foreign Table of
> Partitioned table (so actually the referred object is indeed a table).
> Would it make sense to specify in the message that the table is not a
> normal table or something in that line? 

This is how we phrase the message in other places where we check the
relkind.  We should keep that consistent.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] If subscription to foreign table valid ?

2017-05-16 Thread Peter Eisentraut
On 5/12/17 09:58, Petr Jelinek wrote:
> So I moved the relkind check to single function and call it from all the
> necessary places. See the attached

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [doc fix] PG10: wrong description on connect_timeout when multiple hosts are specified

2017-05-16 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> pqWait is internal to libpq, so we are free to set up what we want here.
> Still I think that we should be consistent with what pqSocketCheck returns:

Please leave this as it is now, for the same reason Robert mentioned.

> +    int         ret = 0;
> +    int         timeout = 0;
> The declaration of "ret" should be declared inside the for(;;) loop.

Done.


> +   /* Attempt connection to the next host, starting the
> connect_timeout timer */
> +   pqDropConnection(conn, true);
> +   conn->addr_cur = conn->connhost[conn->whichhost].addrlist;
> +   conn->status = CONNECTION_NEEDED;
> +   finish_time = time(NULL) + timeout;
> +   }
> I think that it would be safer to not set finish_time if
> conn->connect_timeout is NULL. I agree that your code works because
> pqWaitTimed() will never complain on timeout reached if finish_time is -1.
> That's for robustness sake.

Done, but I'm not sure how this contributes to the robustness.  I guess you 
were concerned just in case pqWaitTimed() returned 0 (timeout) even when it 
should not.

Regards
Takayuki Tsunakawa




libpq-retry-connect-on-timeout_v2.patch
Description: libpq-retry-connect-on-timeout_v2.patch



Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Tom Lane
Vaishnavi Prabakaran  writes:
>> Tom Lane wrote:
>>> BTW, it would be a good idea for somebody to check this out on Windows,
>>> assuming there's a way to generate a keyboard EOF signal there.

> Ctrl-Z + Enter in windows generates EOF signal. I verified this issue and
> it is not reproducible in windows.

Thanks for checking.  So that's two major platforms where it works "as
expected" already.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Tag refs/tags/REL_10_BETA1 was created

2017-05-16 Thread Tom Lane
Peter Eisentraut  writes:
> On 5/16/17 18:14, pg...@postgresql.org wrote:
>> Tag refs/tags/REL_10_BETA1 was created.

> Was this change in naming pattern intentional?

Yes, it was.  Andrew Dunstan suggested[1] during the
two-part-version-number discussion that we should start including a "_"
after REL in tag and branch names for v10 and later, so that those names
would sort correctly compared to the tag/branch names for earlier branches
(at least when using C locale).  I believe his main concern was some logic
in the buildfarm, but it seems like a good idea in general.

When we get to v100, we'll need some other hack to make it work ...
but I plan to be safely dead by then.

BTW, I now remember having wondered[2] if we should make any other changes
in version-number formatting while we're at it, like maybe "10beta1"
should be "10.beta1".  It's a bit late to have remembered it for beta1,
but is anyone hot to change anything else about these labels?

regards, tom lane

[1] https://www.postgresql.org/message-id/57364c11.4040...@dunslane.net
[2] https://www.postgresql.org/message-id/20780.1463176901%40sss.pgh.pa.us




Re: [HACKERS] Improvement in log message of logical replication worker

2017-05-16 Thread Masahiko Sawada
On Wed, May 17, 2017 at 4:08 AM, Robert Haas  wrote:
> On Fri, May 12, 2017 at 12:30 AM, Masahiko Sawada  
> wrote:
>> The attached small patch adds relid to these log messages if it's valid.
>> I'd like to propose it for PG10 if possible, but since it's not a bug
>> and not an open item we can add it to the next CF.
>
> To me, it seems completely reasonable to add this as an open item.
>

Okay, added this to the open items.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-16 Thread Thomas Munro
On Wed, May 17, 2017 at 10:13 AM, Kevin Grittner  wrote:
> On Tue, May 16, 2017 at 4:50 PM, Thomas Munro
>  wrote:
>>
>> I'm about to post a much simpler patch that collects uniform tuples
>> from children, addressing the reported bug, and simply rejects
>> transition tables on row-triggers on tables that are in either kind of
>> inheritance hierarchy.  More soon...
>
> I agree that we can safely go that far, but not farther.  Thanks!

Here is that patch.  Thoughts?

On Wed, May 10, 2017 at 3:41 PM, Robert Haas  wrote:
> Hmm.  What if the partitioning hierarchy contains foreign tables?

No tuples are collected from partitions that are foreign tables.  See
the attached demonstration.  I wasn't sure whether and if so where to
include that in the regression tests because it needs a contrib
module.

Robert and I discussed this off-list and came up with some options:
(1) document that as the current behaviour (where?), (2) figure out
how to prevent that situation from arising, (3) raise some kind of
runtime error if foreign transition tuples need to be collected.

Option 2 would seem to require us to lock the whole chain of ancestors
to check for statement-level triggers with transition tables, which
seems unpleasant, and option 3 is conceptually similar to the
execution-time insertion failure.  It's debatable whether 3 or 1 is
more surprising or inconvenient to users.  I vote for option 1 as a
stop-gap measure (and I hope that someone will soon fix transition
tuple capture for foreign tables generally).  However, it's a bit
inconsistent that we explicitly reject triggers with transition tables
on foreign tables directly, but let them silently fail to capture
anything when the foreign tables are indirectly accessed via a parent
relation.  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tables-vs-foreign-partitions.sql
Description: Binary data


transition-tuples-from-child-tables-v4.patch
Description: Binary data



Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-16 Thread Masahiko Sawada
On Tue, May 16, 2017 at 10:06 PM, tushar  wrote:
> On 05/16/2017 06:35 AM, Masahiko Sawada wrote:
>>
>> I've updated Kuntal's patch, added regression test for option
>> combination and updated documentation.
>
> While testing the patch I found that after dump/restore, we are getting
> an error in the log file once we enable the subscription.
>
> \\create subscription
>
> postgres=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 '
> PUBLICATION qdd WITH (slot_name='none');
> NOTICE:  synchronized table states
> CREATE SUBSCRIPTION
>
> \\take the dump
> [centos@centos-cpula bin]$ ./pg_dump -Fp  -p 9000 postgres > /tmp/d.c
> \\check the syntax
> [centos@centos-cpula bin]$ cat /tmp/d.c |grep 'create subsc*' -i
> CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 ' PUBLICATION
> qdd WITH (connect = false, slot_name = '');
> \\execute this same syntax against a new database
> postgres=# create database  test;
> CREATE DATABASE
> postgres=# \c test
> You are now connected to database "test" as user "centos".
> test=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 '
> PUBLICATION qdd WITH (connect = false, slot_name = '');
> WARNING:  tables were not subscribed, you will have to run ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
> CREATE SUBSCRIPTION
>
> test=# alter subscription m1 refresh publication ;
> ERROR:  ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled
> subscriptions
> test=# alter subscription m1 enable ;
> ALTER SUBSCRIPTION
>
> Check the message in  log file
>
> 2017-05-16 14:04:48.373 BST [18219] LOG:  logical replication apply for
> subscription m1 started
> 2017-05-16 14:04:48.381 BST [18219] ERROR:  could not start WAL streaming:
> ERROR:  replication slot name "" is too short
> 2017-05-16 14:04:48.382 BST [17843] LOG:  worker process: logical
> replication worker for subscription 16386 (PID 18219) exited with exit code
> 1
> 2017-05-16 14:04:53.388 BST [17850] LOG:  starting logical replication
> worker for subscription "m1"
> 2017-05-16 14:04:53.396 BST [18224] LOG:  logical replication apply for
> subscription m1 started
> 2017-05-16 14:04:53.403 BST [18224] ERROR:  could not start WAL streaming:
> ERROR:  replication slot name "" is too short
>
> Is this error message (ERROR:  replication slot name "" is too short)
> expected now?
>

I think there are two bugs: pg_dump should dump slot_name = NONE
instead of '', and a subscription should not be created if the given
slot name is invalid. The validation check for the replication slot
name is currently done when actually creating the slot, but I think
it's safer to check it at CREATE SUBSCRIPTION time. The bug related to
slot_name = NONE should be fixed by the attached 001 patch, and the
002 patch prevents an invalid replication slot name from being
specified in CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... SET.
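
With the 001 patch, the dumped command should presumably become
(a sketch):

CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 '
    PUBLICATION qdd WITH (connect = false, slot_name = NONE);

which restores without leaving the subscription pointing at an
invalid, empty slot name.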

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


001_fix_bug_of_slot_name_none_v3.patch
Description: Binary data


002_disallow_invalid_slot_name.patch
Description: Binary data



Re: [HACKERS] synchronous_commit option is not visible after pressing TAB

2017-05-16 Thread Peter Eisentraut
On 5/16/17 19:45, Michael Paquier wrote:
> On Tue, May 16, 2017 at 8:18 PM, tushar  wrote:
>> While creating a subscription, if we press the TAB key to see the
>> available parameters, the synchronous_commit parameter is not visible.
>>
>> postgres=# CREATE SUBSCRIPTION sub123 CONNECTION 'dbname=postgres port=5000'
>> PUBLICATION pub WITH (
>> CONNECT  COPY_DATACREATE_SLOT  ENABLED  SLOT_NAME
>>
>> synchronous_commit option is not visible
> 
> Yes, and the items should be ordered alphabetically.

Fixed.  There were a few more things that needed tweaking, too.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Hash Functions

2017-05-16 Thread Amit Langote
On 2017/05/17 5:25, Jeff Davis wrote:
> On Tuesday, May 16, 2017, Robert Haas  wrote:
>> I don't really find this a very practical design.  If the table
>> partitions are spread across different relfilenodes, then those
>> relfilenodes have to have separate pg_class entries and separate
>> indexes, and those indexes also need to have separate pg_class
>> entries.  Otherwise, nothing works.  And if they do have separate
>> pg_class entries, then the partitions have to have their own names,
>> and likewise for their indexes, and a dump-and-reload has to preserve
>> those names.  If it doesn't, and those objects get new system-assigned
>> names after the dump-and-reload, then dump restoration can fail when a
>> system-assigned name collides with an existing name that is first
>> mentioned later in the dump.
> 
> Why can't hash partitions be stored in tables the same way as we do TOAST?
> That should take care of the naming problem.

There is only one TOAST table per relation though, containing all of the
relation's TOASTed data.  Doesn't the multiplicity of hash partitions pose
a problem?  Will a hash partition of given name end up with the same
subset of data in the target database as it did in the source database?  I
suppose it won't matter though if we make hash partitions an
implementation detail of hash partitioned tables to the extent you
described in your earlier email [1], whereby it doesn't matter to an
application which partition contains what subset of the table's data.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAMp0ubcQ3VYdU1kNUCOmpj225U4hk6ZEoaUVeReP8h60p%2Bmv1Q%40mail.gmail.com





Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-16 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 16 May 2017 12:45:39 -0400, Tom Lane  wrote in 
<22556.1494953...@sss.pgh.pa.us>
> Robert Haas  writes:
> > Concretely, I think we should replace the abort_cleanup_incomplete
> > flag from my previous patch with a changing_xact_state flag and set
> > that flag around all transaction state changes, clearing it when such
> > changes have succeeded.  On error, the flag remains set, so we know
> > that the state of that connection is unknown and that we must close it
> > (killing outer transaction levels as needed).
> 
> > Thoughts?
> 
> Sounds plausible.

I think that the current issue is the usability of the current
connection. Even if any other command fails, we can
continue using the connection as long as ABORT TRANSACTION
succeeds. Failure of ABORT immediately means that the connection
is no longer available. If this discussion is reasonable,
changing_xact_state might be too much.

By the way, if an fdw connection is stalled in do_sql_command
waiting for a result, a cancel request doesn't work (in other words
pgsql_xact_callback is not called) since EINTR is just ignored in
libpq. Maybe we should teach libpq to error out on EINTR for
some additional reasons. PQsetEINTRcallback() or something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] PROVE_FLAGS

2017-05-16 Thread Andrew Dunstan


On 05/16/2017 07:44 PM, Peter Eisentraut wrote:
> On 5/3/17 15:14, Andrew Dunstan wrote:
>> Can someone please explain to me why we have this in Makefile.global.in?
>> (from commit e9c81b60 )
>>
>> PROVE_FLAGS =
>>
>> ISTM it's unnecessary, and prevents us from using the same named value
>> in the environment. I want to be able to use the environment in
>> vcregress.pl, and I'd like the Make files to work the same way.
> I think relying on environment variables like this is a bad design.
> Someone could have something left in the environment and it changes
> things in mysterious ways.  For all other *FLAGS variables, we
> explicitly set them in Makefile.global, sometimes to empty values, as
> the case may be.
>
> Note that specifying a variable on the make command line overrides
> settings in the makefile under certain circumstances, which is a useful
> feature.
>
> So I think the above setting was correct, the new behavior is
> inconsistent and incorrect, and I think this change should be reverted.
>


It would have been nice if you'd spoken up when the topic was raised.

This doesn't rely on the environment variable; it just enables it. You
can still say "make PROVE_FLAGS=--whatever" and it will override the
environment.

Inheriting variables from the environment is a part of make by design.
We have PG_PROVE_FLAGS for our own forced settings.

Note that the previous setting was done without any thought being given
to how this works with vcregress.pl. I have been trying to make the two
systems more consistent. This was a part of that effort.

cheers

andrew

-- 
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Bruce Momjian
On Tue, May 16, 2017 at 06:06:35PM -0700, Andres Freund wrote:
> On 2017-05-16 21:02:19 -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > OK, I assume we are good to go for Wednesday afternoon, UTC.  Thanks for
> > > the research.
> > 
> > Yeah, I think we're ready, unless anyone has a large patch they want
> > to stick in first ...
> 
> I've this pending JIT support...

That went in April 1.  ;-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Andres Freund
On 2017-05-16 21:02:19 -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > OK, I assume we are good to go for Wednesday afternoon, UTC.  Thanks for
> > the research.
> 
> Yeah, I think we're ready, unless anyone has a large patch they want
> to stick in first ...

I've this pending JIT support...




Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Vaishnavi Prabakaran
On Wed, May 17, 2017 at 3:52 AM, Alvaro Herrera 
wrote:

> Tom Lane wrote:
>
> > BTW, it would be a good idea for somebody to check this out on Windows,
> > assuming there's a way to generate a keyboard EOF signal there.
>
> I last used a Windows command line almost two decades ago now, but
> Ctrl-Z used to do it.
>

Ctrl-Z + Enter in windows generates EOF signal. I verified this issue and
it is not reproducible in windows.

Thanks & Regards,
Vaishnavi
Fujitsu Australia.


Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Tom Lane
Bruce Momjian  writes:
> OK, I assume we are good to go for Wednesday afternoon, UTC.  Thanks for
> the research.

Yeah, I think we're ready, unless anyone has a large patch they want
to stick in first ...

regards, tom lane




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Bruce Momjian
On Tue, May 16, 2017 at 08:39:36PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > There we go:
> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=calliphoridae&dt=2017-05-16%2023:16:53&stg=typedefs
> 
> Yup, looks good now.  Thanks!
> 
> BTW, comparing the typedef list to what I got a few hours ago, I see
> that "Function" is now a known type name, along with the UWhatever
> symbols from ICU.  I wonder where that came from?  A quick grep
> suggests that it's not going to mess anything up too badly, but it
> sure seems like a poor choice for a globally visible typedef name.

OK, I assume we are good to go for Wednesday afternoon, UTC.  Thanks for
the research.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Andrew Dunstan


On 05/16/2017 08:39 PM, Tom Lane wrote:
> Andres Freund  writes:
>> There we go:
>> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=calliphoridae&dt=2017-05-16%2023:16:53&stg=typedefs
> Yup, looks good now.  Thanks!
>
> BTW, comparing the typedef list to what I got a few hours ago, I see
> that "Function" is now a known type name, along with the UWhatever
> symbols from ICU.  I wonder where that came from?  A quick grep
> suggests that it's not going to mess anything up too badly, but it
> sure seems like a poor choice for a globally visible typedef name.
>
>   


There is a place in pgindent, around line 139, where we filter out
typedefs we don't want. You could add it there if you were so inclined.
I agree this seems like a remarkably bad choice of name.

cheers

andrew

-- 
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Re: [doc fix] PG10: wrong description on connect_timeout when multiple hosts are specified

2017-05-16 Thread Michael Paquier
On Wed, May 17, 2017 at 12:47 AM, Robert Haas  wrote:
> On Mon, May 15, 2017 at 9:59 PM, Michael Paquier
>  wrote:
>> + *
>> + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if
>> it timed out.
>>   */
>> pqWait is internal to libpq, so we are free to set up what we want
>> here. Still I think that we should be consistent with what
>> pqSocketCheck returns:
>> * >0 means that the socket is readable/writable, counting things.
>> * 0 is for timeout.
>> * -1 on failure.
>
> That would imply a lot more change, though.  The way that the patch
> currently does it, none of the other callers of pqWait() or
> pqWaitTimed() need to be adjusted.  So I prefer the way that Tsunakawa
> Takayuki currently has this over your proposal.

Consistency in APIs matters, but I won't fight hard in favor of this
item either. In short, I am fine with discarding this comment.
-- 
Michael




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Tom Lane
Andres Freund  writes:
> There we go:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=calliphoridae&dt=2017-05-16%2023:16:53&stg=typedefs

Yup, looks good now.  Thanks!

BTW, comparing the typedef list to what I got a few hours ago, I see
that "Function" is now a known type name, along with the UWhatever
symbols from ICU.  I wonder where that came from?  A quick grep
suggests that it's not going to mess anything up too badly, but it
sure seems like a poor choice for a globally visible typedef name.

regards, tom lane




Re: [HACKERS] NOT NULL constraints on range partition key columns

2017-05-16 Thread Amit Langote
On 2017/05/16 21:16, Amit Kapila wrote:
> On Tue, May 16, 2017 at 3:26 PM, Amit Langote
>  wrote:
>> On 2017/05/16 4:29, Robert Haas wrote:
>>> Yeah, that's exactly why I think we should make the change Amit is
>>> proposing here.  If we don't, then we won't be able to accept NULL
>>> values even after we have the default partitioning stuff.
>>
>> Attached is a patch for consideration.  There are 2 actually, but maybe
>> they should be committed together if we decide do go with this.
>>
> 
> After your changes in get_qual_for_range(), below comment in that
> function needs an update and I feel it is better to update the
> function header as well.
> 
> /*
> * A range-partitioned table does not allow partition keys to be null. For
> * simple columns, their NOT NULL constraint suffices for the enforcement
> * of non-nullability.  But for the expression keys, which are still
> * nullable, we must emit a IS NOT NULL expression.  Collect them in
> * result first.
> */

Thanks for the review.  I updated the comments.
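
For context, the intended behaviour after the patch can be sketched
like this (illustrative):

create table r (a int) partition by range (a);
create table r1 partition of r for values from (1) to (10);
-- r1.a no longer carries an explicit NOT NULL constraint; nulls are
-- instead rejected by an implicit (a IS NOT NULL) generated in r1's
-- partition constraint by get_qual_for_range()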

Thanks,
Amit
From 789d35765a1c00bd5ae3ab2956c4b0292f36c9b1 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 16 May 2017 18:32:31 +0900
Subject: [PATCH 1/2] No explicit NOT NULL constraints on range partition keys

Instead generate them implicitly in get_qual_for_range().
---
 doc/src/sgml/ref/create_table.sgml |  5 
 src/backend/catalog/partition.c| 41 +-
 src/backend/commands/tablecmds.c   | 31 +-
 src/test/regress/expected/create_table.out | 40 ++---
 src/test/regress/expected/insert.out   |  2 +-
 src/test/regress/sql/create_table.sql  |  5 
 6 files changed, 39 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 484f81898b..0478e40447 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -454,11 +454,6 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
   these constraints on individual partitions.
  
 
- 
-  When using range partitioning, a NOT NULL constraint
-  is added to each non-expression column in the partition key.
- 
-
 

 
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 832049c1ee..9de6511e59 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1484,7 +1484,7 @@ get_range_key_properties(PartitionKey key, int keynum,
  *
  * If all values of both lower and upper bounds are UNBOUNDED, the partition
  * does not really have a constraint, except the IS NOT NULL constraint for
- * any expression keys.
+ * partition keys.
  *
  * If we end up with an empty result list, we append return a single-member
  * list containing a constant-true expression in that case, because callers
@@ -1520,32 +1520,37 @@ get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
 	num_or_arms = key->partnatts;
 
 	/*
-	 * A range-partitioned table does not allow partition keys to be null. For
-	 * simple columns, their NOT NULL constraint suffices for the enforcement
-	 * of non-nullability.  But for the expression keys, which are still
-	 * nullable, we must emit a IS NOT NULL expression.  Collect them in
-	 * result first.
+	 * A range-partitioned table does not currently allow partition keys to
+	 * be null, so emit an IS NOT NULL expression for each key column.
 	 */
 	partexprs_item = list_head(key->partexprs);
 	for (i = 0; i < key->partnatts; i++)
 	{
-		if (key->partattrs[i] == 0)
-		{
-			Expr	   *keyCol;
+		Expr	   *keyCol;
 
+		if (key->partattrs[i] != 0)
+		{
+			keyCol = (Expr *) makeVar(1,
+	  key->partattrs[i],
+	  key->parttypid[i],
+	  key->parttypmod[i],
+	  key->parttypcoll[i],
+	  0);
+		}
+		else
+		{
 			if (partexprs_item == NULL)
 elog(ERROR, "wrong number of partition key expressions");
-			keyCol = lfirst(partexprs_item);
+			keyCol = copyObject(lfirst(partexprs_item));
 			partexprs_item = lnext(partexprs_item);
-			Assert(!IsA(keyCol, Var));
-
-			nulltest = makeNode(NullTest);
-			nulltest->arg = keyCol;
-			nulltest->nulltesttype = IS_NOT_NULL;
-			nulltest->argisrow = false;
-			nulltest->location = -1;
-			result = lappend(result, nulltest);
 		}
+
+		nulltest = makeNode(NullTest);
+		nulltest->arg = keyCol;
+		nulltest->nulltesttype = IS_NOT_NULL;
+		nulltest->argisrow = false;
+		nulltest->location = -1;
+		result = lappend(result, nulltest);
 	}
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e259378051..e4d2bfb6b0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -805,13 +805,11 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (stmt->partspec)
 	{
 		char		strategy;
-		int			partnatts,
-	i;
+		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partop

Re: [HACKERS] Increasing parallel workers at runtime

2017-05-16 Thread Haribabu Kommi
On Wed, May 17, 2017 at 2:35 AM, Robert Haas  wrote:

> On Tue, May 16, 2017 at 8:18 AM, Haribabu Kommi
>  wrote:
> > In the current state of the patch, the main backend tries to start the
> > extra workers only when there are no tuples available from the
> > available workers. I feel that the invocation for more workers doesn't
> > happen for every tuple.
>
> Well, the point is, the frequency with which the leader will try to
> acquire more workers is going to be highly variable with your patch,
> and might sometimes be extremely frequent.  It depends on the timing
> of the workers and of the leader, and I don't think that's something
> we want to depend on here.
>

Ok.


> > Another background process logic can produce a fair distribution of
> > workers to the parallel queries. In this case also, the backend should
> > advertise only when the allotted workers are not enough, because there
> > may be a case where the planned workers may be 5, but because of
> > another part of the query the main backend is fed tuples by just
> > 2 workers, and then there is no need to provide extra workers.
> That's true, but I think taking it into account might be difficult.
>
> > The other background process approach, of waiting an interval before
> > reassigning more workers, doesn't work for queries that finish before
> > the configured wait time. Maybe we can ignore those scenarios?
>
> I think we can ignore that.  We can still help a lot of cases, and
> queries with very short run times obviously aren't being seriously
> hurt by the lack of full parallelism.


Ok. In this approach, we need to share some of the gatherstate structure
members in shared memory so that they can be accessed by the other
background process, or find some better logic to update these details
whenever a new worker gets allotted.

I will give a try and see how easy to implement the same.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] synchronous_commit option is not visible after pressing TAB

2017-05-16 Thread Michael Paquier
On Tue, May 16, 2017 at 8:18 PM, tushar  wrote:
> While creating subscription - if we press TAB button to see the available
> parameters , synchronous_commit parameter is not visible.
>
> postgres=# CREATE SUBSCRIPTION sub123 CONNECTION 'dbname=postgres port=5000'
> PUBLICATION pub WITH (
> CONNECT  COPY_DATACREATE_SLOT  ENABLED  SLOT_NAME
>
> synchronous_commit option is not visible

Yes, and the items should be ordered alphabetically.
-- 
Michael


create-sub-tab.patch
Description: Binary data



Re: [HACKERS] PROVE_FLAGS

2017-05-16 Thread Peter Eisentraut
On 5/3/17 15:14, Andrew Dunstan wrote:
> Can someone please explain to me why we have this in Makefile.global.in?
> (from commit e9c81b60 )
> 
> PROVE_FLAGS =
> 
> ISTM it's unnecessary, and prevents us from using the same named value
> in the environment. I want to be able to use the environment in
> vcregress.pl, and I'd like the Make files to work the same way.

I think relying on environment variables like this is a bad design.
Someone could have something left in the environment and it changes
things in mysterious ways.  For all other *FLAGS variables, we
explicitly set them in Makefile.global, sometimes to empty values, as
the case may be.

Note that specifying a variable on the make command line overrides
settings in the makefile under certain circumstances, which is a useful
feature.

So I think the above setting was correct, the new behavior is
inconsistent and incorrect, and I think this change should be reverted.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Andrew Dunstan


On 05/16/2017 06:54 PM, Andres Freund wrote:
> On 2017-05-16 18:43:22 -0400, Tom Lane wrote:
>> Specifically, we don't seem to have entries for any of the typedefs
>> associated with the ICU code, eg UChar.  This is not terribly
>> surprising since none of the buildfarm critters contributing typedef
>> lists are building with --with-icu.  It looks like only these
>> critters are building with ICU:
>>
>>  calliphoridae
>>  culicidae
>>  mylodon
>>  prion
>>  skink
>>
>> Perhaps one of them could enable typedef collection?
> I tried to enable it on calliphoridae, not sure how the min_hours_since
> logic works, and forced a run on HEAD.  Chose calliphoridae because runs
> are reasonably quick and -DCOPY_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST
> shouldn't affect typedef results.
>
> Shouldn't take too long.
>
> No clue if there's some switch that needs to be toggled on the buildfarm
> side to accept the typedefs, I've never looked at that side of things.



prion has done a run. UChar is now in the list.

cheers

andrew




-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Hash Functions

2017-05-16 Thread Peter Eisentraut
On 5/16/17 11:10, Jeff Davis wrote:
> I concur at this point. I originally thought hash functions might be
> made portable, but I think Tom and Andres showed that to be too
> problematic -- the issue with different encodings is the real killer.

I think it would be OK that if you want to move a hash-partitioned table
to a database with a different encoding, you have to do dump/restore
through the parent table.  This is quite similar to what you have to do
now if you want to move a range-partitioned table to a database with a
different locale.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Andres Freund
On 2017-05-16 18:56:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > No clue if there's some switch that needs to be toggled on the buildfarm
> > side to accept the typedefs, I've never looked at that side of things.
> 
> AFAIK, not; I think it just takes any typedef reports that aren't too
> stale.

Cool.

There we go:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=calliphoridae&dt=2017-05-16%2023:16:53&stg=typedefs




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Tom Lane
Andres Freund  writes:
> No clue if there's some switch that needs to be toggled on the buildfarm
> side to accept the typedefs, I've never looked at that side of things.

AFAIK, not; I think it just takes any typedef reports that aren't too
stale.

regards, tom lane




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Andres Freund
On 2017-05-16 18:43:22 -0400, Tom Lane wrote:
> Specifically, we don't seem to have entries for any of the typedefs
> associated with the ICU code, eg UChar.  This is not terribly
> surprising since none of the buildfarm critters contributing typedef
> lists are building with --with-icu.  It looks like only these
> critters are building with ICU:
> 
>  calliphoridae
>  culicidae
>  mylodon
>  prion
>  skink
> 
> Perhaps one of them could enable typedef collection?

I tried to enable it on calliphoridae, not sure how the min_hours_since
logic works, and forced a run on HEAD.  Chose calliphoridae because runs
are reasonably quick and -DCOPY_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST
shouldn't affect typedef results.

Shouldn't take too long.

No clue if there's some switch that needs to be toggled on the buildfarm
side to accept the typedefs, I've never looked at that side of things.

- Andres




Re: [HACKERS] PG10 pgindent run

2017-05-16 Thread Tom Lane
Bruce Momjian  writes:
> I would like to run pgindent on the head source tree this Wednesday
> afternoon, UTC.   Is that OK for everyone?

I've been doing some preliminary checking on what pgindent will do,
and I notice that some typedef names are getting misindented because
they are not in the current buildfarm typedef list.

Specifically, we don't seem to have entries for any of the typedefs
associated with the ICU code, eg UChar.  This is not terribly
surprising since none of the buildfarm critters contributing typedef
lists are building with --with-icu.  It looks like only these
critters are building with ICU:

 calliphoridae
 culicidae
 mylodon
 prion
 skink

Perhaps one of them could enable typedef collection?

If that can be spun up quickly, I'd recommend holding off the
pgindent run till we get results.

regards, tom lane




Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-16 Thread Kevin Grittner
On Tue, May 16, 2017 at 4:50 PM, Thomas Munro
 wrote:
> On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner  wrote:
>> On Wed, May 10, 2017 at 7:02 AM, Thomas Munro
>>  wrote:
>>> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro 
>>>  wrote:
 2.  If you attach a row-level trigger with transition tables to any
 inheritance child, it will see transition tuples from all tables in
 the inheritance hierarchy at or below the directly named table that
 were modified by the same statement, sliced so that they appear as
 tuples from the directly named table.
>>>
>>> Of course that's a bit crazy, not only for trigger authors to
>>> understand and deal with, but also for plan caching: it just doesn't
>>> really make sense to have a database object, even an ephemeral one,
>>> whose type changes depending on how the trigger was invoked, because
>>> the plans stick around.
>>
>> The patch to add transition tables changed caching of a trigger
>> function to key on the combination of the function and the target
>> relation, rather than having one cache entry regardless of the
>> target table.
>
> Right.  That works as long as triggers always see tuples in the format
> of the relation that they're attached to, and I think that's the only
> sensible choice.  The problem I was thinking about was this:  We have
> only one pair of tuplestores and in the current design it holds tuples
> of a uniform format, yet it may (eventually) need to be scanned by a
> statement trigger attached to the named relation AND any number of
> row triggers attached to children with potentially different formats.
> That implies some extra conversions are required at scan time.  I had
> a more ambitious patch that would deal with some of that by tracking
> storage format and scan format separately (next time your row trigger
> is invoked the scan format will be the same but the storage format may
> be different depending on which relation you named in a query), but
> I'm putting that to one side for this release.  That was a bit of a
> rabbit hole, and there are some tricky design questions about tuple
> conversions (to behave like DB2 with subtables may even require
> tuplestore with per-tuple type information) and also the subset of
> rows that each row trigger should see (which may require extra tuple
> origin metadata or separate tuplestores).

Yeah, I wish this had surfaced far earlier, but I missed it and none
of the reviews prior to commit caught it, either.  :-(  It seems too
big to squeeze into v10 now.  I just want to have that general
direction in mind and not paint ourselves into any corner that makes
it hard to get there in the next release.

> I'm about to post a much simpler patch that collects uniform tuples
> from children, addressing the reported bug, and simply rejects
> transition tables on row-triggers on tables that are in either kind of
> inheritance hierarchy.  More soon...

I agree that we can safely go that far, but not farther.  Thanks!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/




Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-16 Thread Thomas Munro
On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner  wrote:
> On Wed, May 10, 2017 at 7:02 AM, Thomas Munro
>  wrote:
>> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro 
>>  wrote:
>>> 2.  If you attach a row-level trigger with transition tables to any
>>> inheritance child, it will see transition tuples from all tables in
>>> the inheritance hierarchy at or below the directly named table that
>>> were modified by the same statement, sliced so that they appear as
>>> tuples from the directly named table.
>>
>> Of course that's a bit crazy, not only for trigger authors to
>> understand and deal with, but also for plan caching: it just doesn't
>> really make sense to have a database object, even an ephemeral one,
>> whose type changes depending on how the trigger was invoked, because
>> the plans stick around.
>
> The patch to add transition tables changed caching of a trigger
> function to key on the combination of the function and the target
> relation, rather than having one cache entry regardless of the
> target table.

Right.  That works as long as triggers always see tuples in the format
of the relation that they're attached to, and I think that's the only
sensible choice.  The problem I was thinking about was this:  We have
only one pair of tuplestores and in the current design it holds tuples
of a uniform format, yet it may (eventually) need to be scanned by a
statement trigger attached to the named relation AND any number of
row triggers attached to children with potentially different formats.
That implies some extra conversions are required at scan time.  I had
a more ambitious patch that would deal with some of that by tracking
storage format and scan format separately (next time your row trigger
is invoked the scan format will be the same but the storage format may
be different depending on which relation you named in a query), but
I'm putting that to one side for this release.  That was a bit of a
rabbit hole, and there are some tricky design questions about tuple
conversions (to behave like DB2 with subtables may even require
tuplestore with per-tuple type information) and also the subset of
rows that each row trigger should see (which may require extra tuple
origin metadata or separate tuplestores).

I'm about to post a much simpler patch that collects uniform tuples
from children, addressing the reported bug, and simply rejects
transition tables on row-triggers on tables that are in either kind of
inheritance hierarchy.  More soon...
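
For readers following along, the per-child conversion being described would
presumably be built with the existing tupconvert.c API, roughly like this
(a sketch only, with hypothetical variable names; the real patch may
structure it differently):

    #include "access/tupconvert.h"
    #include "utils/rel.h"

    /*
     * Sketch: bring a tuple in a child table's format into the rowtype of
     * the relation named in the query before storing it in the transition
     * tuplestore.
     */
    static HeapTuple
    convert_to_named_rel_format(Relation childrel, Relation namedrel,
                                HeapTuple tuple)
    {
        TupleConversionMap *map;

        map = convert_tuples_by_name(RelationGetDescr(childrel),
                                     RelationGetDescr(namedrel),
                                     gettext_noop("could not convert row type"));
        if (map != NULL)
            tuple = do_convert_tuple(tuple, map);
        return tuple;
    }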

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-16 Thread Kevin Grittner
On Wed, May 10, 2017 at 7:02 AM, Thomas Munro
 wrote:
> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro 
>  wrote:
>> 2.  If you attach a row-level trigger with transition tables to any
>> inheritance child, it will see transition tuples from all tables in
>> the inheritance hierarchy at or below the directly named table that
>> were modified by the same statement, sliced so that they appear as
>> tuples from the directly named table.
>
> Of course that's a bit crazy, not only for trigger authors to
> understand and deal with, but also for plan caching: it just doesn't
> really make sense to have a database object, even an ephemeral one,
> whose type changes depending on how the trigger was invoked, because
> the plans stick around.

The patch to add transition tables changed caching of a trigger
function to key on the combination of the function and the target
relation, rather than having one cache entry regardless of the
target table.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/




Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar  wrote:
> On Tue, May 16, 2017 at 4:22 PM, amul sul  wrote:
>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>> attached version.
>>
>> Now partition oids will be arranged in the ascending order of hash
>> partition bound  (i.e. modulus and remainder sorting order)
>
> Thanks for the update patch. I have some more comments.
>
> 
> + if (spec->remainder < 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +  errmsg("hash partition remainder must be less than modulus")));
>
> I think this error message is not correct, you might want to change it
> to "hash partition remainder must be non-negative integer"
>

Fixed in the attached version;  used "hash partition remainder must be
greater than or equal to 0" instead.
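
For context, the validation presumably ends up as a set of checks along
these lines (a sketch, not the committed code; the modulus check is an
assumption):

    if (spec->modulus <= 0)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                 errmsg("hash partition modulus must be a positive integer")));
    if (spec->remainder < 0)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                 errmsg("hash partition remainder must be greater than or equal to 0")));
    if (spec->remainder >= spec->modulus)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                 errmsg("hash partition remainder must be less than modulus")));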

> ---
>
> + The table is partitioned by specifying remainder and modulus for 
> each
> + partition. Each partition holds rows for which the hash value of
>
> Wouldn't it be better to say "modulus and remainder" instead of
> "remainder and modulus" then it will be consistent?
>

You are correct, fixed in the attached version.

> ---
> +   An UPDATE that causes a row to move from one partition to
> +   another fails, because
>
> fails, because -> fails because
>

This hunk no longer exists in the attached patch; it was mistakenly
copied, sorry about that.

> ---
>
> Wouldn't it be a good idea to document how to increase the number of
> hash partitions, I think we can document it somewhere with an example,
> something like Robert explained upthread?
>
> create table foo (a integer, b text) partition by hash (a);
> create table foo1 partition of foo with (modulus 2, remainder 0);
> create table foo2 partition of foo with (modulus 2, remainder 1);
>
> You can detach foo1, create two new partitions with modulus 4 and
> remainders 0 and 2, and move the data over from the old partition
>
> I think it will be good information for a user to have. Or is it
> already documented and I missed it?
>

I think we should, but I'm not sure about it.

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v8.patch
Description: Binary data



Re: [HACKERS] Hash Functions

2017-05-16 Thread Jeff Davis
On Tuesday, May 16, 2017, Robert Haas  wrote:
> I don't really find this a very practical design.  If the table
> partitions are spread across different relfilenodes, then those
> relfilenodes have to have separate pg_class entries and separate
> indexes, and those indexes also need to have separate pg_class
> entries.  Otherwise, nothing works.  And if they do have separate
> pg_class entries, then the partitions have to have their own names,
> and likewise for their indexes, and a dump-and-reload has to preserve
> those names.  If it doesn't, and those objects get new system-assigned
> names after the dump-and-reload, then dump restoration can fail when a
> system-assigned name collides with an existing name that is first
> mentioned later in the dump.

Why can't hash partitions be stored in tables the same way as we do TOAST?
That should take care of the naming problem.

> If Java has portable hash functions, why can't we?

Java standardizes on a particular Unicode encoding (UTF-16). Are you
suggesting that we do the same? Or is there another solution that I am
missing?

Regards,
   Jeff Davis


Re: [HACKERS] Improvement in log message of logical replication worker

2017-05-16 Thread Robert Haas
On Fri, May 12, 2017 at 12:30 AM, Masahiko Sawada  wrote:
> Attached small patch adds relid to these log messages if it's valid.
> I'd like to propose it for PG10 if possible, but since It's not a bug
> and not an open item we can add it to next CF.

To me, it seems completely reasonable to add this as an open item.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH v2] Progress command to monitor progression of long running SQL queries

2017-05-16 Thread Magnus Hagander
On Tue, May 16, 2017 at 7:26 PM, Robert Haas  wrote:

> On Tue, May 16, 2017 at 2:17 AM, Michael Paquier
>  wrote:
> > Perhaps DSM? It is not user-friendly to fail sporadically...
>
> Yeah.  I've been thinking we might want to give each backend a
> backend-lifetime DSA that is created on first use.  That could be
> useful for some parallel query stuff and maybe for this as well.
> Other backends could attach to it to read data from it, and it would
> go away on last detach (which would normally be the detach by the
> creating backend, but not if somebody else happens to be reading at
> the moment the backend exits).
>

That seems like a pretty good idea. I've been considering something similar
for the use case of being able to view the "collected but not sent yet"
statistics inside a backend (which tables have been accessed, number of
reads, etc.), and this seems like it could be used for that as well.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Alvaro Herrera
Tom Lane wrote:

> BTW, it would be a good idea for somebody to check this out on Windows,
> assuming there's a way to generate a keyboard EOF signal there.

I last used a Windows command line almost two decades ago now, but
Ctrl-Z used to do it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 16, 2017 at 1:29 PM, Tom Lane  wrote:
>>> I had been supposing that this was a feature addition and should be left
>>> for the next commitfest.  But given that it already works as-expected on
>>> popular platform(s), the fact that it doesn't work the same on some other
>>> platforms seems like a portability bug rather than a missing feature.
>>> Now I'm inclined to treat it as a bug and back-patch.

>> BTW, the main argument for considering it a new feature is that we don't
>> suggest anywhere in our code or docs that this will work.  If we're going
>> to go out of our way to make it work, should we mention it in psql-ref?
>> And what about changing the interactive prompt, along the lines of
>> End with a backslash and a period on a line by itself, or an EOF signal.

> Well, the current behavior is so wonky and inconsistent that it's hard
> for me to view it as anything but a bug.  I mean, one can argue about
> exactly what an EOF should do in any given situation, but surely it
> can't be right for it to do one thing on one platform and something
> else on a different platform.

Oh, I still believe it's a bug.  I'm just saying that if we're going
to fix it, we should do more than just make a minimal code change.

BTW, it would be a good idea for somebody to check this out on Windows,
assuming there's a way to generate a keyboard EOF signal there.

regards, tom lane




Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 1:29 PM, Tom Lane  wrote:
> I wrote:
>> I had been supposing that this was a feature addition and should be left
>> for the next commitfest.  But given that it already works as-expected on
>> popular platform(s), the fact that it doesn't work the same on some other
>> platforms seems like a portability bug rather than a missing feature.
>> Now I'm inclined to treat it as a bug and back-patch.
>
> BTW, the main argument for considering it a new feature is that we don't
> suggest anywhere in our code or docs that this will work.  If we're going
> to go out of our way to make it work, should we mention it in psql-ref?
> And what about changing the interactive prompt, along the lines of
>
> End with a backslash and a period on a line by itself, or an EOF signal.

Well, the current behavior is so wonky and inconsistent that it's hard
for me to view it as anything but a bug.  I mean, one can argue about
exactly what an EOF should do in any given situation, but surely it
can't be right for it to do one thing on one platform and something
else on a different platform.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relcache leak when row triggers on partitions are fired by COPY

2017-05-16 Thread Robert Haas
On Mon, May 15, 2017 at 9:17 PM, Amit Langote
 wrote:
>> Ok, here's a patch like that.
>
> Thanks, looks good to me.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Tom Lane
I wrote:
> I had been supposing that this was a feature addition and should be left
> for the next commitfest.  But given that it already works as-expected on
> popular platform(s), the fact that it doesn't work the same on some other
> platforms seems like a portability bug rather than a missing feature.
> Now I'm inclined to treat it as a bug and back-patch.

BTW, the main argument for considering it a new feature is that we don't
suggest anywhere in our code or docs that this will work.  If we're going
to go out of our way to make it work, should we mention it in psql-ref?
And what about changing the interactive prompt, along the lines of

End with a backslash and a period on a line by itself, or an EOF signal.

regards, tom lane




Re: [HACKERS] [PATCH v2] Progress command to monitor progression of long running SQL queries

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 2:17 AM, Michael Paquier
 wrote:
> Perhaps DSM? It is not user-friendly to fail sporadically...

Yeah.  I've been thinking we might want to give each backend a
backend-lifetime DSA that is created on first use.  That could be
useful for some parallel query stuff and maybe for this as well.
Other backends could attach to it to read data from it, and it would
go away on last detach (which would normally be the detach by the
creating backend, but not if somebody else happens to be reading at
the moment the backend exits).
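
A minimal sketch of what "created on first use" might look like with the
existing DSA API (names here are made up for illustration; a real design
would also need to publish the handle somewhere, e.g. in the PGPROC array,
so that other backends can dsa_attach() to it):

    #include "utils/dsa.h"

    static dsa_area *backend_dsa = NULL;

    /* Hypothetical tranche id, registered elsewhere for this purpose. */
    static int backend_dsa_tranche_id;

    static dsa_area *
    get_backend_lifetime_dsa(void)
    {
        if (backend_dsa == NULL)
        {
            backend_dsa = dsa_create(backend_dsa_tranche_id);
            /* Keep the mapping for the rest of this backend's life. */
            dsa_pin_mapping(backend_dsa);
        }
        return backend_dsa;
    }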

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 3:19 AM, Ashutosh Bapat
 wrote:
> While earlier, I thought the same, I am wondering whether this is
> true. Don't different collations deem different strings equal e.g one
> collation may deem 'aa' and 'AA' as same but other may not.

No, that's not allowed.  This has been discussed many times on this
mailing list.  See varstr_cmp(), which you will notice refuses to
return 0 unless the strings are bytewise identical.

> Or is that
> encoding problem being discussed in hash functions thread?

No, that's something else entirely.
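
For reference, the defensive pattern inside varstr_cmp() is roughly the
following (a paraphrase of the code in varlena.c, reduced to one branch;
the real function has several platform- and collation-specific paths):

    #include <string.h>
    #include <locale.h>

    /* Even when the locale-aware comparison reports equality, fall back
     * to a bytewise comparison, so that 0 is returned only for strings
     * that are bytewise identical. */
    static int
    collate_then_tiebreak(const char *a1p, const char *a2p, locale_t loc)
    {
        int result = strcoll_l(a1p, a2p, loc);

        if (result == 0)
            result = strcmp(a1p, a2p);
        return result;
    }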

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Functions

2017-05-16 Thread David Fetter
On Tue, May 16, 2017 at 08:10:39AM -0700, Jeff Davis wrote:
> On Mon, May 15, 2017 at 1:04 PM, David Fetter  wrote:
> > As the discussion has devolved here, it appears that there are, at
> > least conceptually, two fundamentally different classes of partition:
> > public, which is to say meaningful to DB clients, and "private", used
> > for optimizations, but otherwise opaque to DB clients.
> >
> > Mashing those two cases together appears to cause more problems than
> > it solves.
> 
> I concur at this point. I originally thought hash functions might be
> made portable, but I think Tom and Andres showed that to be too
> problematic -- the issue with different encodings is the real killer.
> 
> But I also believe hash partitioning is important and we shouldn't
> give up on it yet.
> 
> That means we need to have a concept of hash partitions that's
> different from range/list partitioning. The terminology
> "public"/"private" does not seem appropriate. Logical/physical or
> external/internal might be better.

I'm not attached to any particular terminology.

> With hash partitioning:
> * User only specifies number of partitions of the parent table; does
> not specify individual partition properties (modulus, etc.)

Maybe this is over-thinking it, but I'm picturing us ending up with
something along the lines of:

PARTITION BY INTERNAL(EXPRESSION)
[ WITH ( [PARAMETERS] [, AS, APPROPRIATE] ) ]

i.e. it's not clear that we should wire in "number of partitions" as a
parameter.

In a not that distant future, ANALYZE and similar could have a say in
determining both the "how" and the "whether" of partitioning.

> * Dump/reload goes through the parent table (though we may provide
> options so pg_dump/restore can optimize this)

Would it be simplest to default to routing through the immediate
ancestor for now?

It occurs to me that with the opaque partition system we're designing
here, internal partitions would necessarily be leaves in the tree.

> * We could provide syntax to adjust the number of partitions, which
> would be expensive but still useful sometimes.

Yep.  I suspect that techniques for this are described in literature,
and possibly even in code bases.  Any pointers?

> * All DDL should be on the parent table, including check constraints,
> FKs, unique constraints, exclusion constraints, indexes, etc.

Necessarily.

>   - Unique and exclusion constraints would only be permitted if the
> keys are a superset of the partition keys.

"Includes either all of the partition expression or none of it,"
maybe?

>   - FKs would only be permitted if the two table's partition schemes
> match and the keys are members of the same hash opfamily (this could
> be relaxed slightly, but it gets a little confusing if so)

Relaxing sounds like a not-in-the-first-cut feature, and subtle.

> * No attach/detach of partitions

Since they're opaque, this is the only sane thing.

> * All partitions have the same permissions

Since they're opaque, this is the only sane thing.

> * Individual partitions would only be individually-addressable for
> maintenance (like reindex and vacuum), but not for arbitrary queries

Since they're opaque, this is the only sane thing.

>   - perhaps also COPY for bulk loading/dumping, in case we get clients
> smart enough to do their own hashing.

This is appealing from a resource allocation point of view in the
sense of deciding where the hash computing resources are spent.  Do we
want something like the NOT VALID/VALIDATE infrastructure to support
it?

> The only real downside is that it could surprise users -- why can I
> add a CHECK constraint on my range-partitioned table but not the
> hash-partitioned one? We should try to document this so users don't
> find that out too far along. As long as they aren't surprised, I think
> users will understand why these aren't quite the same concepts.

+1

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-16 Thread Tom Lane
Robert Haas  writes:
> Concretely, I think we should replace the abort_cleanup_incomplete
> flag from my previous patch with a changing_xact_state flag and set
> that flag around all transaction state changes, clearing it when such
> changes have succeeded.  On error, the flag remains set, so we know
> that the state of that connection is unknown and that we must close it
> (killing outer transaction levels as needed).

> Thoughts?

Sounds plausible.

regards, tom lane




Re: [HACKERS] Increasing parallel workers at runtime

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 8:18 AM, Haribabu Kommi
 wrote:
> In the current state of the patch, the main backend tries to start the
> extra workers only when there are no tuples available from the
> available workers. I feel that the invocation for more workers doesn't
> happen for every tuple.

Well, the point is, the frequency with which the leader will try to
acquire more workers is going to be highly variable with your patch,
and might sometimes be extremely frequent.  It depends on the timing
of the workers and of the leader, and I don't think that's something
we want to depend on here.

> Another background process logic can produce a fair distribution of
> workers to the parallel queries. In this case also, the backend should
> advertise only when the allotted workers are not enough, because there
> may be a case where the planned workers may be 5, but because of
> another part of the query the main backend is fed tuples by just
> 2 workers, and then there is no need to provide extra workers.

That's true, but I think taking it into account might be difficult.

> The other background process approach, of waiting an interval before
> reassigning more workers, doesn't work for queries that finish before
> the configured wait time. Maybe we can ignore those scenarios?

I think we can ignore that.  We can still help a lot of cases, and
queries with very short run times obviously aren't being seriously
hurt by the lack of full parallelism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Dilip Kumar
On Tue, May 16, 2017 at 4:22 PM, amul sul  wrote:
> v6 patch has a bug in partition oid mapping and indexing, fixed in the
> attached version.
>
> Now partition oids will be arranged in the ascending order of hash
> partition bound  (i.e. modulus and remainder sorting order)

Thanks for the update patch. I have some more comments.


+ if (spec->remainder < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+  errmsg("hash partition remainder must be less than modulus")));

I think this error message is not correct, you might want to change it
to "hash partition remainder must be non-negative integer"

---

+ The table is partitioned by specifying remainder and modulus for each
+ partition. Each partition holds rows for which the hash value of

Wouldn't it be better to say "modulus and remainder" instead of
"remainder and modulus" then it will be consistent?

---
+   An UPDATE that causes a row to move from one partition to
+   another fails, because

fails, because -> fails because

---

Wouldn't it be a good idea to document how to increase the number of
hash partitions, I think we can document it somewhere with an example,
something like Robert explained upthread?

create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 2, remainder 0);
create table foo2 partition of foo with (modulus 2, remainder 1);

You can detach foo1, create two new partitions with modulus 4 and
remainders 0 and 2, and move the data over from the old partition

I think it will be good information for a user to have. Or is it
already documented and I missed it?



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Tom Lane
Thomas Munro  writes:
> On Tue, May 16, 2017 at 6:29 PM, Vaishnavi Prabakaran
>  wrote:
>> Hi, I could not reproduce this issue. Even after Ctrl+D, subsequent COPY
>> FROM commands read the input properly. Is there any specific step you
>> followed, or can you share a sample testcase?

> Hmm.  Doesn't happen on GNU/Linux, does happen on macOS and FreeBSD.

Hah, confirmed here.  Thinking about it, it seems like glibc must be
exceeding its authority to make this work on Linux; I don't know of
anything in POSIX saying that the eof indicator should go away without
a clearerr() call.  In fact, it seems like glibc is violating its own
documentation --- "man feof" saith

   The function feof() tests the  end-of-file  indicator  for  the  stream
   pointed to by stream, returning non-zero if it is set.  The end-of-file
   indicator can only be cleared by the function clearerr().

I had been supposing that this was a feature addition and should be left
for the next commitfest.  But given that it already works as-expected on
popular platform(s), the fact that it doesn't work the same on some other
platforms seems like a portability bug rather than a missing feature.
Now I'm inclined to treat it as a bug and back-patch.
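
A minimal standalone demonstration of the stdio behavior at issue (per
POSIX, the EOF indicator persists until clearerr(); glibc evidently lets
reads resume anyway, which matches the Linux behavior reported above):

    #include <stdio.h>

    int
    main(void)
    {
        int c;

        /* Type some input, then the EOF character (control-D). */
        while ((c = getchar()) != EOF)
            putchar(c);
        printf("feof(stdin) after EOF: %d\n", feof(stdin) != 0);
        clearerr(stdin);
        printf("feof(stdin) after clearerr: %d\n", feof(stdin) != 0);
        return 0;
    }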

Reviewing the actual patch, though ... there seem to be paths through
handleCopyIn that would not hit this, particularly the sigsetjmp path.
It's unlikely that we'd get control-C before we finish servicing
control-D, but maybe not impossible.  Wouldn't it be better just to do
an unconditional clear at the end, maybe

 copyin_cleanup:
 
+   /* In case input is from a tty, reset the EOF indicator. */
+   clearerr(copystream);
+ 
/*
 * Check command status and return to normal libpq state.
 *

(I'd be inclined to make the comment significantly more verbose than
that, btw.)
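
For instance, something along these lines (just a suggested wording):

    copyin_cleanup:

        /*
         * In interactive mode, the user can end COPY data entry by typing
         * the terminal EOF character (usually control-D).  That sets the
         * stdio EOF indicator on copystream, and on some platforms the
         * indicator stays set until explicitly cleared, so any later COPY
         * FROM STDIN in the same session would see EOF immediately.  Reset
         * it unconditionally; this is harmless for non-tty input.
         */
        clearerr(copystream);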

regards, tom lane




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-05-16 Thread Robert Haas
On Sat, May 13, 2017 at 7:27 AM, Amit Kapila  wrote:
> On Fri, May 12, 2017 at 9:14 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada  
>>> wrote:
 ... I'd like to propose to change relation
 extension lock management so that it works using LWLock instead.
>>
>>> That's not a good idea because it'll make the code that executes while
>>> holding that lock noninterruptible.
>>
>> Is that really a problem?  We typically only hold it over one kernel call,
>> which ought to be noninterruptible anyway.
>
> During parallel bulk load operations, I think we hold it over multiple
> kernel calls.

We do.  Also, RelationGetNumberOfBlocks() is not necessarily only one
kernel call, no?  Nor is vm_extend.

Also, it's not just the backend doing the filesystem operation that's
non-interruptible, but also any waiters, right?

Maybe this isn't a big problem, but it does seem to me that it would
be better to avoid it if we can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PoC: full merge join on comparison clause

2017-05-16 Thread Alexander Kuzmenkov

On 16.05.2017 18:57, Robert Haas wrote:
> Interesting.  I suggest adding this to the next CommitFest.

Thank you, added: https://commitfest.postgresql.org/14/1141/

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-16 Thread Robert Haas
On Sun, May 7, 2017 at 11:54 AM, Robert Haas  wrote:
> I'm having second thoughts based on some more experimentation I did
> this morning.  I'll update again once I've had a bit more time to poke
> at it.

So the issue that I noticed here is that this problem really isn't
confined to abort processing.  If we ROLLBACK TO SAVEPOINT or ABORT
TRANSACTION on an open connection, we really do not know what the
state of that connection is until we get an acknowledgement that the
command completed successfully.  The patch handles that.  However, the
same is true if we're sending a SAVEPOINT or RELEASE SAVEPOINT
command, and the patch that I posted doesn't do anything about those
cases.  I think it would be best to fix all transaction control
commands in a symmetric manner.

Concretely, I think we should replace the abort_cleanup_incomplete
flag from my previous patch with a changing_xact_state flag and set
that flag around all transaction state changes, clearing it when such
changes have succeeded.  On error, the flag remains set, so we know
that the state of that connection is unknown and that we must close it
(killing outer transaction levels as needed).
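
In code form, the pattern would presumably look something like this at
each call site (a sketch based on the description above; the helper names
follow postgres_fdw's connection.c):

    /* State is unknown if an error escapes before we clear the flag. */
    entry->changing_xact_state = true;
    do_sql_command(entry->conn, "RELEASE SAVEPOINT s1");
    entry->changing_xact_state = false;

    /* ... and during transaction cleanup: */
    if (entry->changing_xact_state)
        disconnect_pg_server(entry);    /* connection state unknown: drop it */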

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PoC: full merge join on comparison clause

2017-05-16 Thread Robert Haas
On Fri, May 12, 2017 at 7:09 AM, Alexander Kuzmenkov
 wrote:
> As you know, at this time Postgres cannot perform a full join on a
> comparison clause. For example, if we have two tables with numeric columns
> and run a query like 'select * from t1 full join t2 on t1.a > t2.a', we get
> an error: "FULL JOIN is only supported with merge-joinable or hash-joinable
> join conditions". Such queries are legitimate SQL and sometimes arise when
> porting applications from different DBMS, so it would be good to support
> them in Postgres. They can be rewritten as union of right and left joins,
> but it requires manual work and runs somewhat slower (see the benchmark at
> the end of the letter). This proof-of-concept patch explores the possibility
> of performing such queries as merge joins.

Interesting.  I suggest adding this to the next CommitFest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch to fix documentation about AFTER triggers

2017-05-16 Thread Robert Haas
On Mon, May 15, 2017 at 7:00 AM, Ashutosh Bapat
 wrote:
> On Sat, May 13, 2017 at 2:45 AM, Paul Jungwirth
>  wrote:
>> Here is a patch to amend the docs here:
>>
>> https://www.postgresql.org/docs/devel/static/plpgsql-trigger.html
>>
>> In the example for an AFTER trigger, you see this code:
>>
>> --
>> -- Create a row in emp_audit to reflect the operation performed on emp,
>> -- make use of the special variable TG_OP to work out the operation.
>> --
>> IF (TG_OP = 'DELETE') THEN
>> INSERT INTO emp_audit SELECT 'D', now(), user, OLD.*;
>> RETURN OLD;
>>  ELSIF (TG_OP = 'UPDATE') THEN
>> INSERT INTO emp_audit SELECT 'U', now(), user, NEW.*;
>> RETURN NEW;
>> ELSIF (TG_OP = 'INSERT') THEN
>> INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
>> RETURN NEW;
>> END IF;
>> RETURN NULL; -- result is ignored since this is an AFTER trigger
>>
>> What are all those RETURNs doing in there? The comment on the final RETURN
>> is correct, so returning NEW or OLD above seems confusing, and likely a
>> copy/paste error.
>>
>> This patch just removes those three lines from the example code.
>
> https://www.postgresql.org/docs/devel/static/trigger-definition.html says
> "The return value is ignored for row-level triggers fired after an
> operation, and so they can return NULL.". There's nothing wrong with
> the example, returning OLD or NEW, but as you have pointed out it's
> confusing. So, +1 for this change.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified

2017-05-16 Thread Robert Haas
On Mon, May 15, 2017 at 9:59 PM, Michael Paquier
 wrote:
> + *
> + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if
> + * it timed out.
>   */
> pqWait is internal to libpq, so we are free to set up what we want
> here. Still I think that we should be consistent with what
> pqSocketCheck returns:
> * >0 means that the socket is readable/writable, counting things.
> * 0 is for timeout.
> * -1 on failure.

That would imply a lot more change, though.  The way that the patch
currently does it, none of the other callers of pqWait() or
pqWaitTimed() need to be adjusted.  So I prefer the way that Tsunakawa
Takayuki currently has this over your proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 8:57 AM, Jeevan Ladhe
 wrote:
> I have fixed the crash in the attached patch.
> Also, the patch needed a bit of adjustment due to a recent commit.
> I have re-based the patch on the latest commit.

+	bool		has_default;	/* Is there a default partition?
+					 * Currently false for a range
+					 * partitioned table */
+	int		default_index;	/* Index of the default list
+					 * partition. -1 for range
+					 * partitioned tables */

Why do we need both has_default and default_index?  If default_index
== -1 means that there is no default, we don't also need a separate
bool to record the same thing, do we?
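
In other words, the -1 sentinel already encodes the boolean, e.g. (a
sketch):

    /* default_index == -1 already means "no default partition". */
    static inline bool
    partition_bound_has_default(PartitionBoundInfo boundinfo)
    {
        return boundinfo->default_index != -1;
    }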

get_qual_for_default() still returns a list of things that are not
quals.  I think that this logic is all pretty poorly organized.  The
logic to create a partitioning constraint for a list partition should
be part of get_qual_for_list(), whether or not it is a default.  And
when we have range partitions, the logic to create a default range
partitioning constraint should be part of get_qual_for_range().  The
code the way it's organized today makes it look like there are three
kinds of partitions: list, range, and default.  But that's not right
at all.  There are two kinds: list and range.  And a list partition
might or might not be a default partition, and similarly for range.

+		ereport(ERROR, (errcode(ERRCODE_CHECK_VIOLATION),
+				errmsg("DEFAULT partition cannot be used"
+					   " without negator of operator  %s",
+					   get_opname(operoid))));

I don't think ERRCODE_CHECK_VIOLATION is the right error code here,
and we have a policy against splitting message strings like this.
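
That is, the string should stay in one piece, and a more suitable error
code used, perhaps something like this (a sketch; the errcode choice here
is an assumption, not a recommendation from this thread):

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
             errmsg("default partition cannot be used without negator of operator %s",
                    get_opname(operoid))));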

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Functions

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 11:10 AM, Jeff Davis  wrote:
> With hash partitioning:
> * User only specifies number of partitions of the parent table; does
> not specify individual partition properties (modulus, etc.)
> * Dump/reload goes through the parent table (though we may provide
> options so pg_dump/restore can optimize this)
> * We could provide syntax to adjust the number of partitions, which
> would be expensive but still useful sometimes.
> * All DDL should be on the parent table, including check constraints,
> FKs, unique constraints, exclusion constraints, indexes, etc.
>   - Unique and exclusion constraints would only be permitted if the
> keys are a superset of the partition keys.
>   - FKs would only be permitted if the two table's partition schemes
> match and the keys are members of the same hash opfamily (this could
> be relaxed slightly, but it gets a little confusing if so)
> * No attach/detach of partitions
> * All partitions have the same permissions
> * Individual partitions would only be individually-addressable for
> maintenance (like reindex and vacuum), but not for arbitrary queries
>   - perhaps also COPY for bulk loading/dumping, in case we get clients
> smart enough to do their own hashing.

I don't really find this a very practical design.  If the table
partitions are spread across different relfilenodes, then those
relfilenodes have to have separate pg_class entries and separate
indexes, and those indexes also need to have separate pg_class
entries.  Otherwise, nothing works.  And if they do have separate
pg_class entries, then the partitions have to have their own names,
and likewise for their indexes, and a dump-and-reload has to preserve
those names.  If it doesn't, and those objects get new system-assigned
names after the dump-and-reload, then dump restoration can fail when a
system-assigned name collides with an existing name that is first
mentioned later in the dump.

If we had the ability to have anonymous pg_class entries -- relations
that have no names -- then maybe it would be possible to make
something like what you're talking about work.  But that does not seem
easy to do.  There's a unique index on (relname, relnamespace) for
good reason, and we can't make it partial on a system catalog.  We
could make the relname column allow nulls, but that would add overhead
to any code that needs to access the relation name, and there's a fair
amount of that.

Similarly, if we had the ability to associate multiple relfilenodes
with a single relation, and if index entries could point to
(relfilenode, block, offset) rather than just (block, offset),
then we could also make this work.  But either of those things would
require significant re-engineering and would have downsides in other
cases.

If Java has portable hash functions, why can't we?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-05-16 Thread Robert Haas
On Wed, May 10, 2017 at 9:48 PM, Michael Paquier
 wrote:
> I had my eyes on the WAL sender code this morning, and I have noticed
> that walsender.c is not completely consistent with the PID lookups it
> does in walsender.c. In two code paths, the PID value is checked
> without holding the WAL sender spin lock (WalSndRqstFileReload and
> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> doing for ages.

Well, it's probably worth changing for consistency, but I'm not sure
that it rises to the level of "a very bad idea".  It actually seems
almost entirely harmless.  Spuriously setting the needreload flag on a
just-deceased WAL sender will just result in some future WAL sender
doing a bit of unnecessary work, but I don't think it breaks anything
and the probability is vanishingly low.  The other change could result
in a bogus 0 PID in pg_stat_get_wal_senders output, but I bet you
couldn't reproduce that more than once in a blue moon even with a test
rig designed to provoke it, and if it does happen it isn't really
anything more than a trivial annoyance.

So I'm in favor of committing this and maybe even back-patching it,
but I also don't think it's a big deal.
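
For reference, the consistent pattern being proposed looks roughly like
this (a sketch based on the description; compare what InitWalSenderSlot()
already does):

    /* Test and act on walsnd->pid only while holding the slot's spinlock. */
    WalSnd *walsnd = &WalSndCtl->walsnds[i];

    SpinLockAcquire(&walsnd->mutex);
    if (walsnd->pid != 0)
        walsnd->needreload = true;
    SpinLockRelease(&walsnd->mutex);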

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Functions

2017-05-16 Thread Jeff Davis
On Mon, May 15, 2017 at 1:04 PM, David Fetter  wrote:
> As the discussion has devolved here, it appears that there are, at
> least conceptually, two fundamentally different classes of partition:
> public, which is to say meaningful to DB clients, and "private", used
> for optimizations, but otherwise opaque to DB clients.
>
> Mashing those two cases together appears to cause more problems than
> it solves.

I concur at this point. I originally thought hash functions might be
made portable, but I think Tom and Andres showed that to be too
problematic -- the issue with different encodings is the real killer.

But I also believe hash partitioning is important and we shouldn't
give up on it yet.

That means we need to have a concept of hash partitions that's
different from range/list partitioning. The terminology
"public"/"private" does not seem appropriate. Logical/physical or
external/internal might be better.

With hash partitioning:
* User only specifies number of partitions of the parent table; does
not specify individual partition properties (modulus, etc.)
* Dump/reload goes through the parent table (though we may provide
options so pg_dump/restore can optimize this)
* We could provide syntax to adjust the number of partitions, which
would be expensive but still useful sometimes.
* All DDL should be on the parent table, including check constraints,
FKs, unique constraints, exclusion constraints, indexes, etc.
  - Unique and exclusion constraints would only be permitted if the
keys are a superset of the partition keys.
  - FKs would only be permitted if the two tables' partition schemes
match and the keys are members of the same hash opfamily (this could
be relaxed slightly, but it gets a little confusing if so)
* No attach/detach of partitions
* All partitions have the same permissions
* Individual partitions would only be individually-addressable for
maintenance (like reindex and vacuum), but not for arbitrary queries
  - perhaps also COPY for bulk loading/dumping, in case we get clients
smart enough to do their own hashing.

The only real downside is that it could surprise users -- why can I
add a CHECK constraint on my range-partitioned table but not the
hash-partitioned one? We should try to document this so users don't
find that out too far along. As long as they aren't surprised, I think
users will understand why these aren't quite the same concepts.
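
As a footnote to the list above, a tiny sketch (illustrative only, not
proposed code) of how little the routing would depend on user-visible
bounds under this design -- only the partition count is user-specified:

#include <stdint.h>

/* The system picks the partition from the key's hash alone. */
static int
hash_partition_index(uint32_t hash_value, int npartitions)
{
	return (int) (hash_value % (uint32_t) npartitions);
}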

Regards,
 Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-16 Thread Ildus Kurbangaliev
On Tue, 16 May 2017 21:36:11 +0900
Etsuro Fujita  wrote:

> On 2017/05/16 21:11, Ashutosh Bapat wrote:
> > On Tue, May 16, 2017 at 5:35 PM, Ildus Kurbangaliev
> >  wrote:  
> 
> >> I agree. Maybe this issue should be added to Postgresql Open Items?
> >> I think there should be some complex solution that fixes not only
> >> triggers for foreign tables at table partitioning, but covers other
> >> possible not working cases.  
> 
> > I doubt if this is an open item, since DMLs on foreign tables are
> > supported since 9.3 and support to add foreign tables to inheritance
> > was added back in 9.5.  
> 
> I think this issue was introduced by the latter, so that was my fault.
> 
> One approach I came up with to fix this issue is to rewrite the 
> targetList entries of an inherited UPDATE/DELETE properly using 
> rewriteTargetListUD, when generating a plan for each child table in 
> inheritance_planner.  Attached is a WIP patch for that.  Maybe I am 
> missing something, though.
> 
> Best regards,
> Etsuro Fujita

I tested the patch; it looks good.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Peter Eisentraut
On 5/16/17 03:19, Ashutosh Bapat wrote:
> On Tue, May 16, 2017 at 10:03 AM, amul sul  wrote:
>> On Mon, May 15, 2017 at 9:13 PM, Robert Haas  wrote:
> Collation is only relevant for ordering, not equality.
> 
> While earlier I thought the same, I am now wondering whether this is
> true. Don't different collations deem different strings equal, e.g. one
> collation may deem 'aa' and 'AA' the same but another may not? Or is that
> the encoding problem being discussed in the hash functions thread?

The collations we currently support don't do that, unless someone made a
custom one.  However, we might want to support that in the future.

Also, text/varchar comparisons always use strcmp() as a tie-breaker.
Again, this might be something to review at some point.

We do currently have the citext type, which would indeed consider 'aa'
and 'AA' equal.  But citext also has a hash function in its hash
operator class that handles that.  So you could look into using that
approach.
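
A rough sketch of the invariant that makes that approach work -- the
hash must be computed on the same canonical form that equality uses
(the real citext code lower-cases the string and then applies the
regular text hash):

#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hash the lower-cased bytes so that 'aa' and 'AA' hash identically,
 * matching a case-insensitive notion of equality.
 */
static uint32_t
ci_hash_bytes(const unsigned char *s, size_t len)
{
	uint32_t	h = 0;
	size_t		i;

	for (i = 0; i < len; i++)
		h = h * 31 + (unsigned char) tolower(s[i]);
	return h;
}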

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bumping HASH_VERSION to 3

2017-05-16 Thread Amit Kapila
On Tue, May 16, 2017 at 5:14 PM, Robert Haas  wrote:
> On Tue, May 16, 2017 at 7:31 AM, Amit Kapila  wrote:
>
>> I will send an updated patch once we agree on above points.
>
> Sounds good.
>

Attached patch addresses all the comments as discussed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pg_upgrade_invalidate_old_hash_index_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Peter Eisentraut
On 5/15/17 23:45, Ashutosh Bapat wrote:
> +1. We should throw an error and add a line in documentation that
> collation should not be specified for hash partitioned table.

Why is it even allowed in the parser then?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-16 Thread tushar

On 05/16/2017 06:35 AM, Masahiko Sawada wrote:

I've updated Kuntal's patch, added regression test for option
combination and updated documentation.
While testing the patch, I found that after dump/restore we are
getting an error in the log file once we enable the subscription:


\\create subscription

postgres=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 
' PUBLICATION qdd WITH (slot_name='none');

NOTICE:  synchronized table states
CREATE SUBSCRIPTION

\\take the dump
[centos@centos-cpula bin]$ ./pg_dump -Fp  -p 9000 postgres > /tmp/d.c
\\check the syntax
[centos@centos-cpula bin]$ cat /tmp/d.c |grep 'create subsc*' -i
CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 ' 
PUBLICATION qdd WITH (connect = false, slot_name = '');

\\execute this same syntax against a new database
postgres=# create database  test;
CREATE DATABASE
postgres=# \c test
You are now connected to database "test" as user "centos".
test=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 ' 
PUBLICATION qdd WITH (connect = false, slot_name = '');
WARNING:  tables were not subscribed, you will have to run ALTER 
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables

CREATE SUBSCRIPTION

test=# alter subscription m1 refresh publication ;
ERROR:  ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled 
subscriptions

test=# alter subscription m1 enable ;
ALTER SUBSCRIPTION

Check the message in  log file

2017-05-16 14:04:48.373 BST [18219] LOG:  logical replication apply for 
subscription m1 started
2017-05-16 14:04:48.381 BST [18219] ERROR:  could not start WAL 
streaming: ERROR:  replication slot name "" is too short
2017-05-16 14:04:48.382 BST [17843] LOG:  worker process: logical 
replication worker for subscription 16386 (PID 18219) exited with exit 
code 1
2017-05-16 14:04:53.388 BST [17850] LOG:  starting logical replication 
worker for subscription "m1"
2017-05-16 14:04:53.396 BST [18224] LOG:  logical replication apply for 
subscription m1 started
2017-05-16 14:04:53.403 BST [18224] ERROR:  could not start WAL 
streaming: ERROR:  replication slot name "" is too short


Is this error message (ERROR:  replication slot name "" is too short)
expected now?


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GCC 7 warnings

2017-05-16 Thread Peter Eisentraut
On 5/4/17 00:21, Tom Lane wrote:
> But I'd suggest waiting till after next week's releases.  If there
> are any problems induced by this, we'd be more likely to find them
> with another three months' time before it hits the wild.

done

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-16 Thread Jeevan Ladhe
On Fri, May 12, 2017 at 7:34 PM, Beena Emerson 
wrote:

>
> Thank you for the updated patch. However, now I cannot create a partition
> after default.
>
> CREATE TABLE list1 (
> a int,
> b int
> ) PARTITION BY LIST (a);
>
> CREATE TABLE list1_1 (LIKE list1);
> ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
> CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
> CREATE TABLE list1_5 PARTITION OF list1 FOR VALUES IN (3);
>
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>

Hi,

I have fixed the crash in attached patch.
Also, the patch needed a bit of adjustment due to a recent commit.
I have rebased the patch on the latest commit.

PFA.

Regards,
Jeevan Ladhe


default_partition_v12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 3:30 PM, amul sul  wrote:
> On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat
>  wrote:
>  [...]

 +if (key->strategy == PARTITION_STRATEGY_HASH)
 +{
 +ndatums = nparts;
 +hbounds = (PartitionHashBound **) palloc(nparts *
 +
 sizeof(PartitionHashBound *));
 +i = 0;
 +foreach (cell, boundspecs)
 +{
 +PartitionBoundSpec *spec = lfirst(cell);
 +
 [ clipped ]
 +hbounds[i]->index = i;
 +i++;
 +}
 For list and range partitioned table we order the bounds so that two
 partitioned tables have them in the same order irrespective of order in 
 which
 they are specified by the user and hence stored in the catalogs. The 
 partitions
 then get indexes according to the order in which their bounds appear in 
 ordered
 arrays of bounds. Thus any two partitioned tables with same partition
 specification always have same PartitionBoundInfoData. This helps in
 partition-wise join to match partition bounds of two given tables.  Above 
 code
 assigns the indexes to the partitions as they appear in the catalogs. This
 means that two partitioned tables with same partition specification but
 different order for partition bound specification will have different
 PartitionBoundInfoData representation.

 If we do that, probably partition_bounds_equal() would reduce to just 
 matching
 indexes and the last element of datums array i.e. the greatest modulus 
 datum.
 If ordered datums array of two partitioned table do not match exactly, the
 mismatch can be because missing datums or different datums. If it's a 
 missing
 datum it will change the greatest modulus or have corresponding entry in
 indexes array as -1. If the entry differs it will cause mismatching 
 indexes in
 the index arrays.

>>> Make sense, will fix this.
>>
>> I don't see this being addressed in the patches attached in the reply to 
>> Dilip.
>>
>
> Fixed in the attached version.
>

The v6 patch had a bug in partition OID mapping and indexing; it is
fixed in the attached version.

Now the partition OIDs will be arranged in the ascending order of the
hash partition bounds (i.e. sorted by modulus and remainder).
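
For reference, a sketch of the comparison that ordering implies (the
patch's actual data structures may differ):

/*
 * Order hash partition bounds by (modulus, remainder), both ascending,
 * so two tables with the same specification get the same canonical
 * ordering regardless of the order the bounds were specified in.
 */
static int
hash_bound_cmp(int modulus_a, int remainder_a,
			   int modulus_b, int remainder_b)
{
	if (modulus_a != modulus_b)
		return (modulus_a < modulus_b) ? -1 : 1;
	if (remainder_a != remainder_b)
		return (remainder_a < remainder_b) ? -1 : 1;
	return 0;
}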

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-16 Thread Etsuro Fujita

On 2017/05/16 21:11, Ashutosh Bapat wrote:

On Tue, May 16, 2017 at 5:35 PM, Ildus Kurbangaliev
 wrote:



I agree. Maybe this issue should be added to Postgresql Open Items?
I think there should be some complex solution that fixes not only
triggers for foreign tables at table partitioning, but covers other
possible not working cases.



I doubt if this is an open item, since DMLs on foreign tables are
supported since 9.3 and support to add foreign tables to inheritance
was added back in 9.5.


I think this issue was introduced by the latter, so that was my fault.

One approach I came up with to fix this issue is to rewrite the 
targetList entries of an inherited UPDATE/DELETE properly using 
rewriteTargetListUD, when generating a plan for each child table in 
inheritance_planner.  Attached is a WIP patch for that.  Maybe I am 
missing something, though.


Best regards,
Etsuro Fujita
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
***
*** 1469,1474  ExecModifyTable(ModifyTableState *node)
--- 1469,1475 
  resultRelInfo++;
  subplanstate = node->mt_plans[node->mt_whichplan];
  junkfilter = resultRelInfo->ri_junkFilter;
+ tupleid = NULL;
  estate->es_result_relation_info = resultRelInfo;
  EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
  	node->mt_arowmarks[node->mt_whichplan]);
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***
*** 47,52 
--- 47,53 
  #include "optimizer/tlist.h"
  #include "parser/parse_coerce.h"
  #include "parser/parsetree.h"
+ #include "rewrite/rewriteHandler.h"
  #include "utils/lsyscache.h"
  #include "utils/rel.h"
  #include "utils/selfuncs.h"
***
*** 110,115  static Node *adjust_appendrel_attrs_mutator(Node *node,
--- 111,117 
  static Relids adjust_relid_set(Relids relids, Index oldrelid, Index newrelid);
  static List *adjust_inherited_tlist(List *tlist,
  	   AppendRelInfo *context);
+ static void rewrite_inherited_tlist(Query *parse, RangeTblEntry *target_rte);
  
  
  /*
***
*** 1806,1811  adjust_appendrel_attrs(PlannerInfo *root, Node *node, AppendRelInfo *appinfo)
--- 1808,1826 
  newnode->targetList =
  	adjust_inherited_tlist(newnode->targetList,
  		   appinfo);
+ 			/* And fix UPDATE/DELETE junk tlist entries too, if needed */
+ 			if (newnode->commandType == CMD_UPDATE ||
+ newnode->commandType == CMD_DELETE)
+ 			{
+ RangeTblEntry *childrte;
+ RangeTblEntry *parentrte;
+ 
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
+ parentrte = planner_rt_fetch(appinfo->parent_relid, root);
+ if (childrte->relkind == RELKIND_FOREIGN_TABLE ||
+ 	parentrte->relkind == RELKIND_FOREIGN_TABLE)
+ 	rewrite_inherited_tlist(newnode, childrte);
+ 			}
  		}
  		result = (Node *) newnode;
  	}
***
*** 2139,2144  adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
--- 2154,2196 
  }
  
  /*
+  * Rewrite the targetlist entries of an inherited UPDATE/DELETE operation
+  */
+ static void
+ rewrite_inherited_tlist(Query *parse, RangeTblEntry *target_rte)
+ {
+ 	ListCell   *tl;
+ 	List	   *new_tlist = NIL;
+ 	Relation	target_relation;
+ 
+ 	/*
+ 	 * Open the target relation; we need not lock it since it was already
+ 	 * locked by expand_inherited_rtentry().
+ 	 */
+ 	target_relation = heap_open(target_rte->relid, NoLock);
+ 
+ 	/* Create a new tlist without junk tlist entries */
+ 	foreach(tl, parse->targetList)
+ 	{
+ 		TargetEntry *tle = (TargetEntry *) lfirst(tl);
+ 
+ 		if (tle->resjunk)
+ 			continue;			/* don't need junk items */
+ 
+ 		new_tlist = lappend(new_tlist, tle);
+ 	}
+ 
+ 	/* Replace the targetList with the new one */
+ 	parse->targetList = new_tlist;
+ 
+ 	/* And add junk tlist entries needed for UPDATE/DELETE */
+ 	rewriteTargetListUD(parse, target_rte, target_relation);
+ 
+ 	/* Close the target relation, but keep the lock */
+ 	heap_close(target_relation, NoLock);
+ }
+ 
+ /*
   * adjust_appendrel_attrs_multilevel
   *	  Apply Var translations from a toplevel appendrel parent down to a child.
   *
*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
***
*** 72,79  static TargetEntry *process_matched_tle(TargetEntry *src_tle,
  static Node *get_assignment_input(Node *node);
  static void rewriteValuesRTE(RangeTblEntry *rte, Relation target_relation,
   List *attrnos);
- static void rewriteTargetListUD(Query *parsetree, RangeTblEntry *target_rte,
- 	Relation target_relation);
  static void markQueryForLocking(Query *qry, Node *jtnode,
  	LockClauseStrength strength, LockWaitPolicy waitPolicy,
  	bool pushedDown);
--- 72,77 
***
*** 1269,1275  rewriteValuesRTE(RangeTblEntry *rte, Relation target_relation, List *attrnos)
   * For UPDATE queries, this is applied after rewriteTargetListIU.  The
   * ord

Re: [HACKERS] Increasing parallel workers at runtime

2017-05-16 Thread Haribabu Kommi
On Tue, May 16, 2017 at 1:53 AM, Robert Haas  wrote:

> On Mon, May 15, 2017 at 10:06 AM, Haribabu Kommi
>  wrote:
> > This still needs some adjustments to fix for the cases where
> > the main backend also does the scan instead of waiting for
> > the workers to finish the job. As increasing the workers logic
> > shouldn't add an overhead in this case.
>
> I think it would be pretty crazy to try relaunching workers after
> every tuple, as this patch does.  The overhead of that will be very
> high for queries where the number of tuples passing through the Gather
> is large, whereas when the number of tuples passing through Gather is
> small, or where tuples are sent all at once at the end of processing,
> it will not actually be very effective at getting hold of more
> workers.


In the current state of the patch, the main backend tries to start the
extra workers only when no tuples are available from the existing
workers. So I feel that the attempt to launch more workers doesn't
happen for every tuple.

1. When a large number of tuples are being transferred from the workers,
there is very little chance that the backend is free to start more
workers, because the backend itself may not even need to execute the
plan locally.

2. When tuples are transferred at the end of the plan, for cases
involving a sort node or an aggregate and so on, the backend is either
waiting for the tuples to arrive or executing the plan itself along with
the workers, after having tried to extend the number of workers once.

3. When a small number of tuples are being transferred, there is a
chance that extra workers get requested more often than in the other
scenarios. But the low tuple transfer rate is probably due to a complex
filter condition that takes time and filters out more records. So in
this case too, once the backend has tried to extend the number of
workers, it also participates in executing the plan, and it takes time
to get a tuple by executing the plan locally. By that time there is a
good chance that the workers already have tuples ready.

The problem of repeatedly asking for more workers can arise when only
one worker has been allotted to the query execution.

Am I missing something?


>   A different idea is to have an area in shared memory where
> queries can advertise that they didn't get all of the workers they
> wanted, plus a background process that periodically tries to launch
> workers to help those queries as parallel workers become available.
> It can recheck for available workers after some interval, say 10s.
> There are some problems there -- the process won't have bgw_notify_pid
> pointing at the parallel leader -- but I think it might be best to try
> to solve those problems instead of making it the leader's job to try
> to grab more workers as we go along.  For one thing, the background
> process idea can attempt to achieve fairness.  Suppose there are two
> processes that didn't get all of their workers; one got 3 of 4, the
> other 1 of 4.  When a worker becomes available, we'd presumably like
> to give it to the process that got 1 of 4, rather than having the
> leaders race to see who grabs the new worker first.  Similarly if
> there are four workers available and two queries that each got 1 of 5
> workers they wanted, we'd like to split the workers two and two,
> rather than having one leader grab all four of them.  Or at least, I
> think that's what we want.


A separate background process could indeed produce a fair distribution
of workers to the parallel queries. In this case too, the backend should
advertise only when the allotted workers are not enough; there may be a
case where the planned number of workers is 5, but because of another
part of the query the main backend is fed tuples by just 2 workers, so
there is no need to provide extra workers.
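
To make the advertisement idea concrete, here is a purely hypothetical
sketch of what an entry in that shared-memory area could carry (all of
these names are invented; nothing like this exists in PostgreSQL yet):

#include <sys/types.h>

typedef struct ParallelHelpRequest
{
	pid_t		leader_pid;			/* leader that is short of workers */
	int			nworkers_planned;	/* workers the plan asked for */
	int			nworkers_launched;	/* workers actually obtained */
} ParallelHelpRequest;

/*
 * A balancing process could scan an array of these and hand a freed
 * worker to the entry with the largest deficit
 * (nworkers_planned - nworkers_launched).
 */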

The other aspect of the background process approach -- waiting for an
interval before reassigning more workers -- doesn't work for queries
that finish before the configured wait time. Maybe we can ignore those
scenarios?

Some smarter logic is needed to share the details required to start a
worker, since normally it is started by the main backend itself. But I
feel this approach is useful for the cases where the query doesn't get
any workers at all.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] NOT NULL constraints on range partition key columns

2017-05-16 Thread Amit Kapila
On Tue, May 16, 2017 at 3:26 PM, Amit Langote
 wrote:
> On 2017/05/16 4:29, Robert Haas wrote:
>> On Mon, May 15, 2017 at 9:12 AM, Amit Kapila  wrote:
>>> Can't we allow NULL to get inserted into the partition (leaf
>>> partition) if the user uses the partition name in Insert statement?
>>
>> That would be terrible behavior - the behavior of tuple routing should
>> match the enforced constraints.
>>

Right and that makes sense.

>>> For root partitions, I think for now giving an error is okay, but once
>>> we have default partitions (Rahila's patch), we can route NULLS to
>>> default partition.
>>
>> Yeah, that's exactly why I think we should make the change Amit is
>> proposing here.  If we don't, then we won't be able to accept NULL
>> values even after we have the default partitioning stuff.
>
> Attached is a patch for consideration.  There are 2 actually, but maybe
> they should be committed together if we decide do go with this.
>

After your changes in get_qual_for_range(), below comment in that
function needs an update and I feel it is better to update the
function header as well.

/*
* A range-partitioned table does not allow partition keys to be null. For
* simple columns, their NOT NULL constraint suffices for the enforcement
* of non-nullability.  But for the expression keys, which are still
* nullable, we must emit a IS NOT NULL expression.  Collect them in
* result first.
*/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-16 Thread Ashutosh Bapat
On Tue, May 16, 2017 at 5:35 PM, Ildus Kurbangaliev
 wrote:

>
> I agree. Maybe this issue should be added to Postgresql Open Items?
> I think there should be some complex solution that fixes not only
> triggers for foreign tables at table partitioning, but covers other
> possible not working cases.
>

I doubt if this is an open item, since DMLs on foreign tables are
supported since 9.3 and support to add foreign tables to inheritance
was added back in 9.5.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-16 Thread Ildus Kurbangaliev
On Tue, 16 May 2017 15:21:27 +0530
Ashutosh Bapat  wrote:

> On Mon, May 15, 2017 at 7:24 PM, Ildus Kurbangaliev
>  wrote:
> > On Mon, 15 May 2017 17:43:52 +0530
> > Ashutosh Bapat  wrote:
> >  
> >> On Mon, May 15, 2017 at 2:46 PM, Ildus Kurbangaliev
> >>  wrote:  
> >> > On Mon, 15 May 2017 10:34:58 +0530
> >> > Dilip Kumar  wrote:
> >> >  
> >> >> On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar
> >> >>  wrote:  
> >> >> > After your fix, now tupleid is invalid which is expected, but
> >> >> > seems like we need to do something more. As per the comments
> >> >> > seems like it is expected to get the oldtuple from planSlot.
> >> >> > But I don't see any code for handling that part.  
> >> >>
> >> >> Maybe we should do something like attached patch.
> >> >>  
> >> >
> >> > Hi,
> >> > planSlot contains already projected tuple, you can't use it as
> >> > oldtuple. I think problem is that `rewriteTargetListUD` called
> >> > only for parent relation, so there is no `wholerow` attribute for
> >> > foreign tables.  
> >>
> >> Yes. postgresAddForeignUpdateTargets() which is called by
> >> rewriteTargetListUD injects "ctid". "wholerow" is always there. Not
> >> for postgres_fdw but for other wrappers it might be a bad news.
> >> ctid, whole row obtained from the remote postgres server will fit
> >> the tuple descriptor of parent, but for other FDWs the column
> >> injected by rewriteTargetListUD() may make the child tuple look
> >> different from that of the parent, so we may not pass that column
> >> down to the child. 
> >
> > I'm trying to say that when we have a regular table as parent, and
> > foreign table as child, in rewriteTargetListUD `wholerow` won't be
> > added, because rewriteTargetListUD will be called only for parent
> > relation. You can see that by running the script i provided in the
> > first message of this thread.  
> 
> 
> You are right.
> 
> explain verbose update parent set val = random();
>   QUERY PLAN
> ---
>  Update on public.parent  (cost=0.00..258.63 rows=5535 width=10)
>Update on public.parent
>Update on public.child1
>Foreign Update on public.ftable
>  Remote SQL: UPDATE public.child1 SET val = $2 WHERE ctid = $1
>->  Seq Scan on public.parent  (cost=0.00..4.83 rows=255
> width=10) Output: random(), parent.ctid
>->  Seq Scan on public.child1  (cost=0.00..48.25 rows=2550
> width=10) Output: random(), child1.ctid
>->  Foreign Scan on public.ftable  (cost=100.00..205.55 rows=2730
> width=10) Output: random(), ftable.ctid
>  Remote SQL: SELECT ctid FROM public.child1 FOR UPDATE
> (12 rows)
> 
> explain verbose update ftable set val = random();
>   QUERY PLAN
> ---
>  Update on public.ftable  (cost=100.00..159.42 rows=1412 width=38)
>Remote SQL: UPDATE public.child1 SET val = $2 WHERE ctid = $1
>->  Foreign Scan on public.ftable  (cost=100.00..159.42 rows=1412
> width=38) Output: random(), ctid, ftable.*
>  Remote SQL: SELECT val, ctid FROM public.child1 FOR UPDATE
> (5 rows)
> 
> Anyway, the problem I am stating, i.e. we have a bigger problem to fix
> when there are FDWs other than postgres_fdw involved seems to be still
> valid.

I agree. Maybe this issue should be added to Postgresql Open Items? 
I think there should be some complex solution that fixes not only
triggers for foreign tables at table partitioning, but covers other
possible not working cases.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bumping HASH_VERSION to 3

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 7:31 AM, Amit Kapila  wrote:
>> +snprintf(output_path, sizeof(output_path), "reindex_hash.sql");
>>
>> This looks suspiciously pointless.  The contents of output_path will
>> always be precisely "reindex_hash.sql"; you could just char
>> *output_path = "reindex_hash.sql" and get the same effect.
>
> Sure, but the same code pattern is used in all other similar functions
> in version.c, refer new_9_0_populate_pg_largeobject_metadata.  Both
> the ways will serve the purpose, do you think it makes sense to write
> this differently?

Yes.  It's silly to copy a constant string into a new buffer just for
the heck of it.  Perhaps the old instances should also be cleaned up
at some point, but even if we don't bother, copying absolutely
pointless coding into new places isn't getting us anywhere.

> Hmm, "./" is required for non-windows, but as mentioned above, this is
> not required for our case.

OK.

> I will send an updated patch once we agree on above points.

Sounds good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bumping HASH_VERSION to 3

2017-05-16 Thread Amit Kapila
On Mon, May 15, 2017 at 8:57 PM, Robert Haas  wrote:
> On Mon, May 15, 2017 at 12:08 AM, Noah Misch  wrote:
>> The above-described topic is currently a PostgreSQL 10 open item.  Robert,
>> since you committed the patch believed to have created it, you own this open
>> item.  If some other commit is more relevant or if this does not belong as a
>> v10 open item, please let us know.  Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within three calendar days of
>> this message.  Include a date for your subsequent status update.  Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping v10.  Consequently, I will appreciate your 
>> efforts
>> toward speedy resolution.  Thanks.
>
> I don't believe this patch can be committed to beta1 which wraps in
> just a few hours; it seems to need a bit of work.  I'll update again
> by Friday based on how review unfolds this week.  I anticipate
> committing something on Wednesday or Thursday unless Bruce picks this
> one up.
>
> +/* find hash and gin indexes */
> +res = executeQueryOrDie(conn,
> +"SELECT n.nspname, c.relname "
> +"FROM   pg_catalog.pg_class c, "
> +"   pg_catalog.pg_index i, "
> +"   pg_catalog.pg_am a, "
> +"   pg_catalog.pg_namespace n "
> +"WHERE  i.indexrelid = c.oid AND "
> +"   c.relam = a.oid AND "
> +"   c.relnamespace = n.oid AND "
> +"   a.amname = 'hash'"
>
> Comment doesn't match code.
>
> +if (!check_mode && db_used)
> +/* mark hash indexes as invalid */
> +PQclear(executeQueryOrDie(conn,
>
> Given that we have a comment here, I'd put curly braces around this block.
>

Okay, will change.

> +snprintf(output_path, sizeof(output_path), "reindex_hash.sql");
>
> This looks suspiciously pointless.  The contents of output_path will
> always be precisely "reindex_hash.sql"; you could just char
> *output_path = "reindex_hash.sql" and get the same effect.
>

Sure, but the same code pattern is used in all other similar functions
in version.c, refer new_9_0_populate_pg_largeobject_metadata.  Both
the ways will serve the purpose, do you think it makes sense to write
this differently?

>  Compare
> this to the logic in create_script_for_cluster_analyze, which prepends
> SCRIPT_PREFIX.

That is required for .sh or .bat files, not for .sql files.  I think we
need to compare it with logic in
new_9_0_populate_pg_largeobject_metadata.

>  (I am not sure why it's necessary to prepend "./" on
> Windows, but if it's needed in that case, maybe it's needed in this
> case, too.)
>

Hmm, "./" is required for non-windows, but as mentioned above, this is
not required for our case.

> +start_postmaster(&new_cluster, true);
> +old_9_6_invalidate_hash_indexes(&old_cluster, false);
> +stop_postmaster(false);
>
> Won't this cause the hash indexes to be invalided in the old cluster
> rather than the new one?
>

Oops, a copy-paste mistake.  It passed in my testing because I had not
used different options (like port number) for the old and new servers.

> This might need a visit from pgindent in one or two places, too.
>

I ran pgindent before sending the previous version, but I will
verify it again.

I will send an updated patch once we agree on above points.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] synchronous_commit option is not visible after pressing TAB

2017-05-16 Thread tushar

Hi,

While creating subscription - if we press TAB button to see the 
available parameters , synchronous_commit parameter is not visible.


postgres=# CREATE SUBSCRIPTION sub123 CONNECTION 'dbname=postgres 
port=5000' PUBLICATION pub WITH (

CONNECT  COPY_DATACREATE_SLOT  ENABLED  SLOT_NAME

synchronous_commit option is not visible
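
For context, these completions come from a fixed keyword list in
src/bin/psql/tab-complete.c, so the fix is to extend (and alphabetize)
that list, roughly along these lines (illustrative, not the committed
hunk):

/* in the rule completing CREATE SUBSCRIPTION ... WITH ( */
COMPLETE_WITH_LIST6("connect", "copy_data", "create_slot",
					"enabled", "slot_name", "synchronous_commit");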

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-16 Thread Thomas Munro
On Tue, May 16, 2017 at 6:29 PM, Vaishnavi Prabakaran
 wrote:
> On Tue, May 16, 2017 at 8:40 AM, Thomas Munro
>  wrote:
>>
>>
>> If you hit ^d while COPY FROM STDIN is reading, then subsequent COPY
>> FROM STDIN commands return immediately.
>
>
> Hi, I could not reproduce this issue. Even after Ctrl+d, subsequent COPY
> FROM commands read the input properly. Is there any specific step you
> followed, or can you share a sample testcase?

Hmm.  Doesn't happen on GNU/Linux, does happen on macOS and FreeBSD.  Example:

postgres=# create table t(a int);
CREATE TABLE
postgres=# copy t(a) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 42
>> [...press ^d here ...]COPY 1
postgres=# copy t(a) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> COPY 0
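
Presumably stdio's sticky end-of-file indicator is involved: once ^d
sets it, it stays set until clearerr(), and whether a later read even
attempts the underlying read(2) with the flag still set is
implementation-dependent -- which would explain the platform
difference.  A minimal standalone illustration of that C-library
behavior (an assumption about the cause, not a psql patch):

#include <stdio.h>

int
main(void)
{
	int			c;

	while ((c = getchar()) != EOF)
		;						/* ^d sets stdin's EOF indicator */

	clearerr(stdin);			/* reset the sticky EOF indicator */

	c = getchar();				/* further input can now be read */
	return (c == EOF) ? 1 : 0;
}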

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 1:17 PM, Ashutosh Bapat
 wrote:
> Hi,
> Here's a patch with some cosmetic fixes to 0002, to be applied on top of 0002.
>

Thank you, included in v6 patch.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat
 wrote:
 [...]
>>>
>>> +if (key->strategy == PARTITION_STRATEGY_HASH)
>>> +{
>>> +ndatums = nparts;
>>> +hbounds = (PartitionHashBound **) palloc(nparts *
>>> +
>>> sizeof(PartitionHashBound *));
>>> +i = 0;
>>> +foreach (cell, boundspecs)
>>> +{
>>> +PartitionBoundSpec *spec = lfirst(cell);
>>> +
>>> [ clipped ]
>>> +hbounds[i]->index = i;
>>> +i++;
>>> +}
>>> For list and range partitioned table we order the bounds so that two
>>> partitioned tables have them in the same order irrespective of order in 
>>> which
>>> they are specified by the user and hence stored in the catalogs. The 
>>> partitions
>>> then get indexes according to the order in which their bounds appear in ordered
>>> arrays of bounds. Thus any two partitioned tables with same partition
>>> specification always have same PartitionBoundInfoData. This helps in
>>> partition-wise join to match partition bounds of two given tables.  Above 
>>> code
>>> assigns the indexes to the partitions as they appear in the catalogs. This
>>> means that two partitioned tables with same partition specification but
>>> different order for partition bound specification will have different
>>> PartitionBoundInfoData representation.
>>>
>>> If we do that, probably partition_bounds_equal() would reduce to just 
>>> matching
>>> indexes and the last element of datums array i.e. the greatest modulus 
>>> datum.
>>> If ordered datums array of two partitioned table do not match exactly, the
>>> mismatch can be because missing datums or different datums. If it's a 
>>> missing
>>> datum it will change the greatest modulus or have corresponding entry in
>>> indexes array as -1. If the entry differs it will cause mismatching indexes 
>>> in
>>> the index arrays.
>>>
>> Make sense, will fix this.
>
> I don't see this being addressed in the patches attached in the reply to 
> Dilip.
>

Fixed in the attached version.

>>
>>>
>>> +if (key->partattrs[i] != 0)
>>> +{
>>> +keyCol = (Node *) makeVar(1,
>>> +  key->partattrs[i],
>>> +  key->parttypid[i],
>>> +  key->parttypmod[i],
>>> +  key->parttypcoll[i],
>>> +  0);
>>> +
>>> +/* Form hash_fn(value) expression */
>>> +keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
>>> +
>>> get_fn_expr_rettype(&key->partsupfunc[i]),
>>> +list_make1(keyCol),
>>> +InvalidOid,
>>> +InvalidOid,
>>> +COERCE_EXPLICIT_CALL);
>>> +}
>>> +else
>>> +{
>>> +keyCol = (Node *) copyObject(lfirst(partexprs_item));
>>> +partexprs_item = lnext(partexprs_item);
>>> +}
>>> I think we should add FuncExpr for column Vars as well as expressions.
>>>
>> Okay, will fix this.
>
> Here, please add a check similar to get_quals_for_range()
> 1840 if (partexprs_item == NULL)
> 1841 elog(ERROR, "wrong number of partition key expressions");
>
>

Fixed in the attached version.

>>
>>> I think we need more comments for compute_hash_value(), mix_hash_value() and
>>> satisfies_hash_partition() as to what each of them accepts and what it
>>> computes.
>>>
>>> +/* key's hash values start from third argument of function. */
>>> +if (!PG_ARGISNULL(i + 2))
>>> +{
>>> +values[i] = PG_GETARG_DATUM(i + 2);
>>> +isnull[i] = false;
>>> +}
>>> +else
>>> +isnull[i] = true;
>>> You could write this as
>>> isnull[i] = PG_ARGISNULL(i + 2);
>>> if (!isnull[i])
>>> values[i] = PG_GETARG_DATUM(i + 2);
>>>
>> Okay.
>
> If we have used this technique somewhere else in PG code, please
> mention that function/place.
> /*
>  * Rotate hash left 1 bit before mixing in the next column.  This
>  * prevents equal values in different keys from cancelling each other.
>  */
>

Fixed in the attached version.
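
For reference, the rotate-and-mix step that comment describes reads
roughly like this (a sketch modeled on similar per-column hash mixing
elsewhere in the executor, not the patch itself):

#include <stdint.h>

/*
 * Rotate the running hash left 1 bit, then XOR in the next column's
 * hash, so equal values in different key columns don't cancel out.
 */
static uint32_t
mix_column_hash(uint32_t hash, uint32_t column_hash)
{
	hash = (hash << 1) | (hash >> 31);	/* rotate left by 1 */
	return hash ^ column_hash;
}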

>
>>
>>> +foreach (lc, $5)
>>> +{
>>> +DefElem*opt = (DefElem *) lfirst(lc);
>>> A search on WITH in gram.y shows that we do not handle WITH options in 
>>> gram.y.
>>> Usually they are handled at the transformation stage. Why is this an 
>>> exception?
>>> If you do that, we can have all the error handling in
>>> transformPartitionBound().
>>>
>> If so, ForValues needs to return a list for hash and a PartitionBoundSpec
>> for the other two; wouldn't that break code consistency? And such
>> validation is not new in gram.y; see xmltable_column_el.
>
> Thanks f

Re: [HACKERS] NOT NULL constraints on range partition key columns

2017-05-16 Thread Amit Langote
On 2017/05/16 4:29, Robert Haas wrote:
> On Mon, May 15, 2017 at 9:12 AM, Amit Kapila  wrote:
>> Can't we allow NULL to get inserted into the partition (leaf
>> partition) if the user uses the partition name in Insert statement?
> 
> That would be terrible behavior - the behavior of tuple routing should
> match the enforced constraints.
>
>> For root partitions, I think for now giving an error is okay, but once
>> we have default partitions (Rahila's patch), we can route NULLS to
>> default partition.
> 
> Yeah, that's exactly why I think we should make the change Amit is
> proposing here.  If we don't, then we won't be able to accept NULL
> values even after we have the default partitioning stuff.

Attached is a patch for consideration.  There are 2 actually, but maybe
they should be committed together if we decide do go with this.

Thanks,
Amit
>From 001db3ae9258f90964b73d2ef06af435be0a680d Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 16 May 2017 18:32:31 +0900
Subject: [PATCH 1/2] No explicit NOT NULL constraints on range partition keys

Instead generate them implicitly in get_qual_for_range().
---
 doc/src/sgml/ref/create_table.sgml |  5 
 src/backend/catalog/partition.c| 32 +++-
 src/backend/commands/tablecmds.c   | 31 +--
 src/test/regress/expected/create_table.out | 40 +++---
 src/test/regress/expected/insert.out   |  2 +-
 src/test/regress/sql/create_table.sql  |  5 
 6 files changed, 36 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 484f81898b..0478e40447 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -454,11 +454,6 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
   these constraints on individual partitions.
  
 
- 
-  When using range partitioning, a NOT NULL constraint
-  is added to each non-expression column in the partition key.
- 
-
 

 
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 832049c1ee..dee904d99d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1529,23 +1529,31 @@ get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
 	partexprs_item = list_head(key->partexprs);
 	for (i = 0; i < key->partnatts; i++)
 	{
-		if (key->partattrs[i] == 0)
-		{
-			Expr	   *keyCol;
+		Expr	   *keyCol;
 
+		if (key->partattrs[i] != 0)
+		{
+			keyCol = (Expr *) makeVar(1,
+	  key->partattrs[i],
+	  key->parttypid[i],
+	  key->parttypmod[i],
+	  key->parttypcoll[i],
+	  0);
+		}
+		else
+		{
 			if (partexprs_item == NULL)
 elog(ERROR, "wrong number of partition key expressions");
-			keyCol = lfirst(partexprs_item);
+			keyCol = copyObject(lfirst(partexprs_item));
 			partexprs_item = lnext(partexprs_item);
-			Assert(!IsA(keyCol, Var));
-
-			nulltest = makeNode(NullTest);
-			nulltest->arg = keyCol;
-			nulltest->nulltesttype = IS_NOT_NULL;
-			nulltest->argisrow = false;
-			nulltest->location = -1;
-			result = lappend(result, nulltest);
 		}
+
+		nulltest = makeNode(NullTest);
+		nulltest->arg = keyCol;
+		nulltest->nulltesttype = IS_NOT_NULL;
+		nulltest->argisrow = false;
+		nulltest->location = -1;
+		result = lappend(result, nulltest);
 	}
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e259378051..e4d2bfb6b0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -805,13 +805,11 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (stmt->partspec)
 	{
 		char		strategy;
-		int			partnatts,
-	i;
+		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
 		Oid			partcollation[PARTITION_MAX_KEYS];
 		List	   *partexprs = NIL;
-		List	   *cmds = NIL;
 
 		/*
 		 * We need to transform the raw parsetrees corresponding to partition
@@ -828,33 +826,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		partnatts = list_length(stmt->partspec->partParams);
 		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
 		  partopclass, partcollation);
-
-		/* Force key columns to be NOT NULL when using range partitioning */
-		if (strategy == PARTITION_STRATEGY_RANGE)
-		{
-			for (i = 0; i < partnatts; i++)
-			{
-AttrNumber	partattno = partattrs[i];
-Form_pg_attribute attform = descriptor->attrs[partattno - 1];
-
-if (partattno != 0 && !attform->attnotnull)
-{
-	/* Add a subcommand to make this one NOT NULL */
-	AlterTableCmd *cmd = makeNode(AlterTableCmd);
-
-	cmd->subtype = AT_SetNotNull;
-	cmd->name = pstrdup(NameStr(attform->attname));
-	cmds = lappend(cmds, cmd);
-}
-			}
-
-			/*
-			 * Although, there cannot be any partitions yet, we still need to
-			 * pass true for

Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-16 Thread Ashutosh Bapat
On Mon, May 15, 2017 at 7:24 PM, Ildus Kurbangaliev
 wrote:
> On Mon, 15 May 2017 17:43:52 +0530
> Ashutosh Bapat  wrote:
>
>> On Mon, May 15, 2017 at 2:46 PM, Ildus Kurbangaliev
>>  wrote:
>> > On Mon, 15 May 2017 10:34:58 +0530
>> > Dilip Kumar  wrote:
>> >
>> >> On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar
>> >>  wrote:
>> >> > After your fix, now tupleid is invalid which is expected, but
>> >> > seems like we need to do something more. As per the comments
>> >> > seems like it is expected to get the oldtuple from planSlot.
>> >> > But I don't see any code for handling that part.
>> >>
>> >> Maybe we should do something like attached patch.
>> >>
>> >
>> > Hi,
>> > planSlot contains already projected tuple, you can't use it as
>> > oldtuple. I think problem is that `rewriteTargetListUD` called only
>> > for parent relation, so there is no `wholerow` attribute for
>> > foreign tables.
>>
>> Yes. postgresAddForeignUpdateTargets() which is called by
>> rewriteTargetListUD injects "ctid". "wholerow" is always there. Not
>> for postgres_fdw but for other wrappers it might be a bad news. ctid,
>> whole row obtained from the remote postgres server will fit the tuple
>> descriptor of parent, but for other FDWs the column injected by
>> rewriteTargetListUD() may make the child tuple look different from
>> that of the parent, so we may not pass that column down to the child.
>>
>
> I'm trying to say that when we have a regular table as parent, and
> foreign table as child, in rewriteTargetListUD `wholerow` won't be
> added, because rewriteTargetListUD will be called only for parent
> relation. You can see that by running the script i provided in the first
> message of this thread.


You are right.

explain verbose update parent set val = random();
  QUERY PLAN
---
 Update on public.parent  (cost=0.00..258.63 rows=5535 width=10)
   Update on public.parent
   Update on public.child1
   Foreign Update on public.ftable
 Remote SQL: UPDATE public.child1 SET val = $2 WHERE ctid = $1
   ->  Seq Scan on public.parent  (cost=0.00..4.83 rows=255 width=10)
 Output: random(), parent.ctid
   ->  Seq Scan on public.child1  (cost=0.00..48.25 rows=2550 width=10)
 Output: random(), child1.ctid
   ->  Foreign Scan on public.ftable  (cost=100.00..205.55 rows=2730 width=10)
 Output: random(), ftable.ctid
 Remote SQL: SELECT ctid FROM public.child1 FOR UPDATE
(12 rows)

explain verbose update ftable set val = random();
  QUERY PLAN
---
 Update on public.ftable  (cost=100.00..159.42 rows=1412 width=38)
   Remote SQL: UPDATE public.child1 SET val = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ftable  (cost=100.00..159.42 rows=1412 width=38)
 Output: random(), ctid, ftable.*
 Remote SQL: SELECT val, ctid FROM public.child1 FOR UPDATE
(5 rows)

Anyway, the problem I am stating, i.e. we have a bigger problem to fix
when there are FDWs other than postgres_fdw involved seems to be still
valid.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-16 Thread Kuntal Ghosh
Added to the open items list.

On Tue, May 16, 2017 at 6:35 AM, Masahiko Sawada  wrote:
> On Mon, May 15, 2017 at 8:09 PM, Masahiko Sawada  
> wrote:
>> On Mon, May 15, 2017 at 8:06 PM, Kuntal Ghosh
>>  wrote:
>>> On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada  
>>> wrote:
 On Mon, May 15, 2017 at 6:41 PM, tushar  
 wrote:
> Hi,
>
> Server Crashes if we try to provide slot_name='none' at the time of 
> creating
> subscription -
>
> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
> user=centos password=f' publication abc with (slot_name='none');
> NOTICE:  synchronized table states
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>

 Thank you for reporting.
 I think create_slot and enabled should be set to false forcibly when
 slot_name = 'none'. Attached patch fixes it, more test and regression
 test case are needed though.

>>> I think the patch doesn't solve the problem completely. For example,
>>> postgres=# create subscription sub3 connection 'dbname=postgres
>>> port=5432 user=edb password=edb' publication abc with
>>> (slot_name='none',create_slot=true);
>>> ERROR:  slot_name = NONE and create slot are mutually exclusive options
>>> postgres=# create subscription sub3 connection 'dbname=postgres
>>> port=5432 user=edb password=edb' publication abc with
>>> (create_slot=true,slot_name='none');
>>> ERROR:  could not connect to the publisher: could not connect to
>>> server: Connection refused
>>> Is the server running locally and accepting
>>> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
>>>
>>> Changing the order of subscription parameter changes the output. I
>>> think the modifications should be done at the end of the function.
>>> Attached a patch for the same.
>>>
>>
>> Yes, your patch fixes it correctly.
>>
>
> I've updated Kuntal's patch, added regression test for option
> combination and updated documentation.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Increasing parallel workers at runtime

2017-05-16 Thread Ashutosh Bapat
On Mon, May 15, 2017 at 9:23 PM, Robert Haas  wrote:
> On Mon, May 15, 2017 at 10:06 AM, Haribabu Kommi
>  wrote:
>> This still needs some adjustments to fix for the cases where
>> the main backend also does the scan instead of waiting for
>> the workers to finish the job. As increasing the workers logic
>> shouldn't add an overhead in this case.
>
> I think it would be pretty crazy to try relaunching workers after
> every tuple, as this patch does.  The overhead of that will be very
> high for queries where the number of tuples passing through the Gather
> is large, whereas when the number of tuples passing through Gather is
> small, or where tuples are sent all at once at the end of procesisng,
> it will not actually be very effective at getting hold of more
> workers.

+1

> A different idea is to have an area in shared memory where
> queries can advertise that they didn't get all of the workers they
> wanted, plus a background process that periodically tries to launch
> workers to help those queries as parallel workers become available.
> It can recheck for available workers after some interval, say 10s.
> There are some problems there -- the process won't have bgw_notify_pid
> pointing at the parallel leader -- but I think it might be best to try
> to solve those problems instead of making it the leader's job to try
> to grab more workers as we go along.  For one thing, the background
> process idea can attempt to achieve fairness.  Suppose there are two
> processes that didn't get all of their workers; one got 3 of 4, the
> other 1 of 4.  When a worker becomes available, we'd presumably like
> to give it to the process that got 1 of 4, rather than having the
> leaders race to see who grabs the new worker first.  Similarly if
> there are four workers available and two queries that each got 1 of 5
> workers they wanted, we'd like to split the workers two and two,
> rather than having one leader grab all four of them.  Or at least, I
> think that's what we want.

+1 for a separate process distributing workers. But I am not sure
whether we want to spend a full background process on this. The user is
expected to configure enough parallel workers so that every parallel
query gets enough of them under normal circumstances, so that process
may not find anybody to dispatch idle workers to. But then the question
is which process should do that job -- the postmaster, since it's the
one which respawns workers when they die? But the postmaster itself is
too busy to also execute the balancing algorithm. So maybe a new
background worker is indeed needed.

Also, looking at the patch, it doesn't look like it takes enough care
to build the execution state of a new worker so that it can participate
in a running query. I may be wrong, but aren't the execution state
initialization routines written with the assumption that all the
workers start simultaneously?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-05-16 Thread Masahiko Sawada
On Sat, May 13, 2017 at 8:19 PM, Amit Kapila  wrote:
> On Thu, May 11, 2017 at 6:09 AM, Masahiko Sawada  
> wrote:
>> This work would be helpful not only for existing workloads but also for
>> future work like some parallel utility commands, which are discussed
>> on other threads[1]. At least for parallel vacuum, this feature helps
>> to solve an issue that the implementation of parallel vacuum has.
>>
>> I ran pgbench for 10 min three times(scale factor is 5000), here is a
>> performance measurement result.
>>
>> clients   TPS(HEAD)   TPS(Patched)
>> 4   2092.612   2031.277
>> 8   3153.732   3046.789
>> 16 4562.072   4625.419
>> 32 6439.391   6479.526
>> 64 7767.364   7779.636
>> 100   7917.173   7906.567
>>
>> * 16 core Xeon E5620 2.4GHz
>> * 32 GB RAM
>> * ioDrive
>>
>> In current implementation, it seems there is no performance degradation so 
>> far.
>>
>
> I think it is good to check pgbench, but we should do tests of the
> bulk load as this lock is stressed during such a workload.  Some of
> the tests we have done when we have improved the performance of bulk
> load can be found in an e-mail [1].
>

Thank you for sharing.

I've measured using two test scripts attached on that thread. Here is result.

* Copy test script
Client  HEAD   Patched
4  452.60 455.53
8  561.74 561.09
16592.50 592.21
32602.53 599.53
64605.01 606.42

* Insert test script
Client  HEAD   Patched
4  159.04 158.44
8  169.41 169.69
16177.11 178.14
32182.14 181.99
64182.11 182.73

It seems there is no performance degradation so far.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified

2017-05-16 Thread Tsunakawa, Takayuki
Hello Robert, Tom, Michael,

Thanks a lot for checking my patch.  Sorry, let me check Michael's review 
comments and reply tomorrow.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
Hi,
Here's a patch with some cosmetic fixes to 0002, to be applied on top of 0002.

On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat
 wrote:
> On Sun, May 14, 2017 at 12:30 PM, amul sul  wrote:
>> On Fri, May 12, 2017 at 10:39 PM, Ashutosh Bapat
>>  wrote:
>>> On Fri, May 12, 2017 at 6:08 PM, amul sul  wrote:
>>>> Hi,
>>>>
>>>> Please find the following updated patches attached:
>>>>
>>>> 0001-Cleanup.patch : Does some cleanup and code refactoring required
>>>> for the hash partition patch. Otherwise, there will be an unnecessary
>>>> diff in the 0002 patch.
>>>
>>> Thanks for splitting the patch.
>>>
>>> +if (isnull[0])
>>> +cur_index = partdesc->boundinfo->null_index;
>>> This code assumes that null_index will be set to -1 when has_null is
>>> false. Per RelationBuildPartitionDesc() this is true. But probably we
>>> should write this code as
>>> if (isnull[0])
>>> {
>>> if (partdesc->boundinfo->has_null)
>>> cur_index = partdesc->boundinfo->null_index;
>>> }
>>> That way we are certain that when has_null is false, cur_index = -1,
>>> similar to the original code.
>>>
>> Okay will add this.
>
> Thanks.
>
>> I still don't understand the point of having the has_null
>> variable; if no null-accepting partition exists then null_index is
>> always set to -1 in RelationBuildPartitionDesc.  Anyway, let's not
>> change the original code.
>
> I agree; has_null might well have been folded into null_index == -1.
> But that's not a problem this patch needs to solve.
>
> 0001 looks good to me now.
>
>
>>
>> [...]
>>>
>>> +if (key->strategy == PARTITION_STRATEGY_HASH)
>>> +{
>>> +ndatums = nparts;
>>> +hbounds = (PartitionHashBound **) palloc(nparts *
>>> +
>>> sizeof(PartitionHashBound *));
>>> +i = 0;
>>> +foreach (cell, boundspecs)
>>> +{
>>> +PartitionBoundSpec *spec = lfirst(cell);
>>> +
>>> [ clipped ]
>>> +hbounds[i]->index = i;
>>> +i++;
>>> +}
>>> For list- and range-partitioned tables we order the bounds so that two
>>> partitioned tables have them in the same order irrespective of the
>>> order in which they are specified by the user and hence stored in the
>>> catalogs. The partitions then get indexes according to the order in
>>> which their bounds appear in the ordered arrays of bounds. Thus any two
>>> partitioned tables with the same partition specification always have
>>> the same PartitionBoundInfoData. This helps partition-wise join match
>>> the partition bounds of two given tables.  The above code assigns
>>> indexes to the partitions as they appear in the catalogs. This means
>>> that two partitioned tables with the same partition specification but a
>>> different order of partition bound specifications will have different
>>> PartitionBoundInfoData representations.
>>>
>>> If we do that, partition_bounds_equal() would probably reduce to just
>>> matching the indexes and the last element of the datums array, i.e. the
>>> greatest modulus datum. If the ordered datums arrays of two partitioned
>>> tables do not match exactly, the mismatch is due to either missing or
>>> differing datums. A missing datum will either change the greatest
>>> modulus or leave the corresponding entry in the indexes array as -1; a
>>> differing datum will show up as mismatching entries in the indexes
>>> arrays.
>>>
>> Make sense, will fix this.
>
> I don't see this being addressed in the patches attached in the reply
> to Dilip.
>
>>
>>>
>>> +if (key->partattrs[i] != 0)
>>> +{
>>> +keyCol = (Node *) makeVar(1,
>>> +  key->partattrs[i],
>>> +  key->parttypid[i],
>>> +  key->parttypmod[i],
>>> +  key->parttypcoll[i],
>>> +  0);
>>> +
>>> +/* Form hash_fn(value) expression */
>>> +keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
>>> +
>>> get_fn_expr_rettype(&key->partsupfunc[i]),
>>> +list_make1(keyCol),
>>> +InvalidOid,
>>> +InvalidOid,
>>> +COERCE_EXPLICIT_CALL);
>>> +}
>>> +else
>>> +{
>>> +keyCol = (Node *) copyObject(lfirst(partexprs_item));
>>> +partexprs_item = lnext(partexprs_item);
>>> +}
>>> I think we should add the FuncExpr wrapper for column Vars as well as
>>> for expressions.
>>>
>> Okay, will fix this.
>
> Here, please add a check similar to get_quals_for_range()
> 1840 if (partexprs_item == NULL)
> 1841 elog(ERROR, "wrong number of partition key expressions");
>
>
>>
>>> I think we need more comments for compute_hash_value(), mix_hash_value() and
>> satisfies_hash_partition() as to what each of them accepts and what it
>> computes.
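
For what it's worth, the shape those three presumably share is: hash each
non-NULL key column with its hash support function, fold the per-column
hashes into one value, and test value % modulus == remainder. A
standalone C sketch, not the patch's actual code (the mixing constant is
the usual Boost-style hash_combine):

#include <stdbool.h>
#include <stdint.h>

/* Boost-style 64-bit combiner; illustrative only. */
static inline uint64_t
mix_hash_value(uint64_t acc, uint64_t h)
{
    acc ^= h + UINT64_C(0x49a0f4dd15e5a8e3) + (acc << 54) + (acc >> 7);
    return acc;
}

/*
 * Fold per-column hashes into one row hash, skipping NULL columns;
 * column_hash[i] stands in for the result of column i's hash support
 * function.
 */
static uint64_t
compute_hash_value(int nkeys, const uint64_t *column_hash,
                   const bool *isnull)
{
    uint64_t    rowhash = 0;

    for (int i = 0; i < nkeys; i++)
    {
        if (!isnull[i])
            rowhash = mix_hash_value(rowhash, column_hash[i]);
    }
    return rowhash;
}

/* A row belongs to the partition whose (modulus, remainder) matches. */
static bool
satisfies_hash_partition(uint64_t rowhash, int modulus, int remainder)
{
    return (int) (rowhash % (uint64_t) modulus) == remainder;
}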

Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
On Sun, May 14, 2017 at 12:30 PM, amul sul  wrote:
> On Fri, May 12, 2017 at 10:39 PM, Ashutosh Bapat
>  wrote:
>> On Fri, May 12, 2017 at 6:08 PM, amul sul  wrote:
>>> Hi,
>>>
>>> Please find the following updated patches attached:
>>>
>>> 0001-Cleanup.patch : Does some cleanup and code refactoring required
>>> for the hash partition patch. Otherwise, there will be an unnecessary
>>> diff in the 0002 patch.
>>
>> Thanks for splitting the patch.
>>
>> +if (isnull[0])
>> +cur_index = partdesc->boundinfo->null_index;
>> This code assumes that null_index will be set to -1 when has_null is
>> false. Per RelationBuildPartitionDesc() this is true. But probably we
>> should write this code as
>> if (isnull[0])
>> {
>> if (partdesc->boundinfo->has_null)
>> cur_index = partdesc->boundinfo->null_index;
>> }
>> That way we are certain that when has_null is false, cur_index = -1,
>> similar to the original code.
>>
> Okay will add this.

Thanks.

> I still don't understand the point of having the has_null
> variable; if no null-accepting partition exists then null_index is
> always set to -1 in RelationBuildPartitionDesc.  Anyway, let's not
> change the original code.

I agree; has_null might well have been folded into null_index == -1.
But that's not a problem this patch needs to solve.

0001 looks good to me now.


>
> [...]
>>
>> +if (key->strategy == PARTITION_STRATEGY_HASH)
>> +{
>> +ndatums = nparts;
>> +hbounds = (PartitionHashBound **) palloc(nparts *
>> +
>> sizeof(PartitionHashBound *));
>> +i = 0;
>> +foreach (cell, boundspecs)
>> +{
>> +PartitionBoundSpec *spec = lfirst(cell);
>> +
>> [ clipped ]
>> +hbounds[i]->index = i;
>> +i++;
>> +}
>> For list- and range-partitioned tables we order the bounds so that two
>> partitioned tables have them in the same order irrespective of the
>> order in which they are specified by the user and hence stored in the
>> catalogs. The partitions then get indexes according to the order in
>> which their bounds appear in the ordered arrays of bounds. Thus any two
>> partitioned tables with the same partition specification always have
>> the same PartitionBoundInfoData. This helps partition-wise join match
>> the partition bounds of two given tables.  The above code assigns
>> indexes to the partitions as they appear in the catalogs. This means
>> that two partitioned tables with the same partition specification but a
>> different order of partition bound specifications will have different
>> PartitionBoundInfoData representations.
>>
>> If we do that, partition_bounds_equal() would probably reduce to just
>> matching the indexes and the last element of the datums array, i.e. the
>> greatest modulus datum. If the ordered datums arrays of two partitioned
>> tables do not match exactly, the mismatch is due to either missing or
>> differing datums. A missing datum will either change the greatest
>> modulus or leave the corresponding entry in the indexes array as -1; a
>> differing datum will show up as mismatching entries in the indexes
>> arrays.
>>
> Make sense, will fix this.

I don't see this being addressed in the patches attached in the reply to Dilip.
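
To make the canonical-ordering point concrete: sorting the bounds by
(modulus, remainder) before assigning indexes makes the representation
independent of catalog order. A comparator sketch in C, assuming the
PartitionHashBound fields implied by the quoted code (it is not the
patch's code):

/* Illustrative qsort() comparator over PartitionHashBound pointers. */
static int
hash_bound_cmp(const void *a, const void *b)
{
    const PartitionHashBound *ba = *(PartitionHashBound *const *) a;
    const PartitionHashBound *bb = *(PartitionHashBound *const *) b;

    if (ba->modulus != bb->modulus)
        return (ba->modulus < bb->modulus) ? -1 : 1;
    if (ba->remainder != bb->remainder)
        return (ba->remainder < bb->remainder) ? -1 : 1;
    return 0;
}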

>
>>
>> +if (key->partattrs[i] != 0)
>> +{
>> +keyCol = (Node *) makeVar(1,
>> +  key->partattrs[i],
>> +  key->parttypid[i],
>> +  key->parttypmod[i],
>> +  key->parttypcoll[i],
>> +  0);
>> +
>> +/* Form hash_fn(value) expression */
>> +keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
>> +
>> get_fn_expr_rettype(&key->partsupfunc[i]),
>> +list_make1(keyCol),
>> +InvalidOid,
>> +InvalidOid,
>> +COERCE_EXPLICIT_CALL);
>> +}
>> +else
>> +{
>> +keyCol = (Node *) copyObject(lfirst(partexprs_item));
>> +partexprs_item = lnext(partexprs_item);
>> +}
>> I think we should add the FuncExpr wrapper for column Vars as well as
>> for expressions.
>>
> Okay, will fix this.

Here, please add a check similar to get_quals_for_range()
1840 if (partexprs_item == NULL)
1841 elog(ERROR, "wrong number of partition key expressions");
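
Putting the two comments together, the restructured block might look like
this (a sketch over the quoted code, with the guard folded in):

/* Build the bare key expression first ... */
if (key->partattrs[i] != 0)
    keyCol = (Node *) makeVar(1,
                              key->partattrs[i],
                              key->parttypid[i],
                              key->parttypmod[i],
                              key->parttypcoll[i],
                              0);
else
{
    if (partexprs_item == NULL)
        elog(ERROR, "wrong number of partition key expressions");
    keyCol = (Node *) copyObject(lfirst(partexprs_item));
    partexprs_item = lnext(partexprs_item);
}

/* ... then wrap it in hash_fn(value) in either case. */
keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
                               get_fn_expr_rettype(&key->partsupfunc[i]),
                               list_make1(keyCol),
                               InvalidOid,
                               InvalidOid,
                               COERCE_EXPLICIT_CALL);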


>
>> I think we need more comments for compute_hash_value(), mix_hash_value() and
>> satisfies_hash_partition() as to what each of them accepts and what it
>> computes.
>>
>> +/* key's hash values start from third argument of function. */
>> +if (!PG_ARGISNULL(i + 2))
>> +{
>> +values[i] = PG_GETARG_DATUM(i + 2);
>> +isnull[i] = false;
>> +}
>> +else
>> +
