Re: General purpose hashing func in pgbench

2018-01-16 Thread Fabien COELHO


Hello Ildar,

Here is a new version of the patch. I've split it into two parts. The 
first one is almost the same as v4 from [1] with some refactoring. The 
second part introduces the random_seed variable as you proposed.


Patch 1 applies, but compilation fails: there are two "hash_seed" variables 
declared in "pgbench.c".


Patch 2 applies cleanly on top of the previous patch and compiles, because 
the variable is removed...


If an ":hash_seed" pgbench variable is used, ISTM that there is no need 
for a global variable at all, so the two patches are going back and forth, 
which is unhelpful. ISTM better to provide just one combined patch for the 
feature.


If the hash_seed variable really needs to be kept, it should be an "int64" 
variable, like other pgbench values. Calling random() just once usually 
initializes only about 31 bits, so random should be called 2 or maybe 3 times? 
Or maybe use the internal getrand, which has 48 pseudorandom bits?
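
Something along these lines would do, as a rough sketch (the function name is 
just illustrative, not what the patch uses):

    /* rough sketch: derive a wide seed from two ~31-bit random() results */
    static int64
    get_default_hash_seed(void)
    {
        int64   seed;

        seed = (int64) random();
        seed = (seed << 31) | (int64) random();    /* ~62 pseudorandom bits */
        return seed;
    }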


For me "random seed" is what is passed to srandom, so the variable should 
rather be named hash_seed because there could also be a random seed 
(actually, there is in another parallel patch:-). Moreover, this seed may 
or may not be random if set, so calling it "random_seed" is not desirable.


The "len < 1 || len > 2" condition is checked twice, once in the "switch" and 
once in an "if" just after the "switch". Once is enough.


I didn't do the executor simplification thing yet because I'm a little 
concerned about inventive users who may want to change the random_seed 
variable at runtime (which is possible since pgbench doesn't have read-only 
variables, aka constants, AFAIK).


If the user chooses to override hash_seed in their script, it is their 
decision; the documentation only has to be clear about :hash_seed being 
the default seed. I see no clear reason to work around this possibility by 
evaluating the seed at parse time, especially as the variable may not have 
its final value yet depending on the option order. I'd suggest just using 
make_variable("hash_seed") for the default second argument and simplifying 
the executor.


The seed variable is not tested explicitly in the script; you could add
a "hash(5432) == hash(5432, :hash_seed)" check, for instance.

It would be nice if a native English speaker could proofread the 
documentation text. I'd suggest: "*an* optional seed parameter". "In case 
*the* seed...". ":hash_seed". "shared for" -> "shared 
by". "following listing" -> "following pgbench script". "few accounts 
generates" -> "few accounts generate".


For the documentation example, I'd use larger values for the random & modulo, 
e.g. 1 and 100. The drawback is that zipfian does a costly 
computation of the generalized harmonic number when the parameter is lower 
than 1.0. For cities, the parameter found by Zipf was 1.07 (says 
Wikipedia). Maybe use this historical value. Or maybe use an exponential 
distribution in the example.


--
Fabien.



Test-cases for exclusion constraints is missing in alter_table.sql file

2018-01-16 Thread Ashutosh Sharma
Hi All,

While working on exclusion constraints for one of our internal
projects, I noticed that there are no test-cases for exclusion
constraints in the alter_table.sql file, whereas for other constraints I
could see lots of test-cases there. There are hardly 1-2
test-cases for exclusion constraints using the ALTER TABLE command (1 in
constraints.source and 1 for partitioned tables in alter_table.sql).
Shouldn't we consider adding a few test-cases for exclusion constraints
to alter_table.sql, considering that we have lots of test-cases
for other constraints in this file? Thoughts?

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: pgsql: Centralize json and jsonb handling of datetime types

2018-01-16 Thread Erik Rijkers

On 2018-01-17 01:29, Andrew Dunstan wrote:

Centralize json and jsonb handling of datetime types

[...]

https://git.postgresql.org/pg/commitdiff/cc4feded0a31d2b732d4ea68613115cb720e624e

Modified Files
--
src/backend/utils/adt/date.c  |   6 +--
src/backend/utils/adt/json.c  | 122 --
src/backend/utils/adt/jsonb.c |  70
src/include/utils/date.h      |   4 +-
src/include/utils/jsonapi.h   |   2 +
5 files changed, 109 insertions(+), 95 deletions(-)


A compile with the latest gcc 7.2.0 shows these warnings (I suppose these come 
from this commit):


Compiling core:

In file included from gram.y:63:0:
../../../src/include/utils/date.h:76:41: warning: ‘struct pg_tm’ 
declared inside parameter list will not be visible outside of this 
definition or declaration

 extern int time2tm(TimeADT time, struct pg_tm *tm, fsec_t *fsec);
 ^
../../../src/include/utils/date.h:77:46: warning: ‘struct pg_tm’ 
declared inside parameter list will not be visible outside of this 
definition or declaration
 extern int timetz2tm(TimeTzADT *time, struct pg_tm *tm, fsec_t *fsec, 
int *tzp);

  ^
In file included from formatting.c:92:0:
../../../../src/include/utils/date.h:76:41: warning: ‘struct pg_tm’ 
declared inside parameter list will not be visible outside of this 
definition or declaration

 extern int time2tm(TimeADT time, struct pg_tm *tm, fsec_t *fsec);
 ^
../../../../src/include/utils/date.h:77:46: warning: ‘struct pg_tm’ 
declared inside parameter list will not be visible outside of this 
definition or declaration
 extern int timetz2tm(TimeTzADT *time, struct pg_tm *tm, fsec_t *fsec, 
int *tzp);



... while contrib adds:

In file included from btree_gin.c:12:0:
../../src/include/utils/date.h:76:41: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration

 extern int time2tm(TimeADT time, struct pg_tm *tm, fsec_t *fsec);
 ^
../../src/include/utils/date.h:77:46: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration
 extern int timetz2tm(TimeTzADT *time, struct pg_tm *tm, fsec_t *fsec, 
int *tzp);

  ^
In file included from btree_utils_num.c:9:0:
../../src/include/utils/date.h:76:41: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration

 extern int time2tm(TimeADT time, struct pg_tm *tm, fsec_t *fsec);
 ^
../../src/include/utils/date.h:77:46: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration
 extern int timetz2tm(TimeTzADT *time, struct pg_tm *tm, fsec_t *fsec, 
int *tzp);

  ^
In file included from btree_time.c:9:0:
../../src/include/utils/date.h:76:41: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration

 extern int time2tm(TimeADT time, struct pg_tm *tm, fsec_t *fsec);
 ^
../../src/include/utils/date.h:77:46: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration
 extern int timetz2tm(TimeTzADT *time, struct pg_tm *tm, fsec_t *fsec, 
int *tzp);

  ^
In file included from btree_date.c:9:0:
../../src/include/utils/date.h:76:41: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration

 extern int time2tm(TimeADT time, struct pg_tm *tm, fsec_t *fsec);
 ^
../../src/include/utils/date.h:77:46: warning: ‘struct pg_tm’ declared 
inside parameter list will not be visible outside of this definition or 
declaration
 extern int timetz2tm(TimeTzADT *time, struct pg_tm *tm, fsec_t *fsec, 
int *tzp);



No errors, and 'make check' and 'make check-world' are both OK.
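
For what it's worth, this kind of warning is usually silenced by making
struct pg_tm known before the prototypes, e.g. (just a sketch, not necessarily
the right fix here):

    /* in src/include/utils/date.h, ahead of the prototypes */
    struct pg_tm;               /* forward declaration, keeps gcc quiet */

    extern int time2tm(TimeADT time, struct pg_tm *tm, fsec_t *fsec);
    extern int timetz2tm(TimeTzADT *time, struct pg_tm *tm, fsec_t *fsec, int *tzp);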


thanks,

Erik Rijkers



Re: TOAST table created for partitioned tables

2018-01-16 Thread Michael Paquier
On Tue, Jan 16, 2018 at 11:38:58PM -0500, Tom Lane wrote:
> Yeah, pg_upgrade already has to cope with cases where the newer version
> thinks a table needs a toast table when the older version didn't, or
> vice versa.  This looks like it ought to fall into that category.
> Not that testing it wouldn't be a good idea.

As far as I can see this statement is true. If you create a parent
partitioned table in a v10 cluster and then upgrade to HEAD with this
patch applied, you'll notice that the relation still has its
TOAST table, while newly-created partitioned parents have
nothing. (Just tested; I didn't review the patch in detail.)
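
A quick way to eyeball that (sketch):

    -- partitioned tables carried over from v10 keep their TOAST table,
    -- while ones created after the patch should show reltoastrelid = 0
    SELECT relname, reltoastrelid
    FROM pg_class
    WHERE relkind = 'p';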
--
Michael


signature.asc
Description: PGP signature


Re: TOAST table created for partitioned tables

2018-01-16 Thread Tom Lane
Amit Langote  writes:
> On Wed, Jan 17, 2018 at 5:32 AM, Robert Haas  wrote:
>> Aargh.  Will apply this patch break pg_upgrade from v10?

> AFAICS, it doesn't.  Partitioned tables that used to have a TOAST
> table in a v10 cluster will continue to have it after upgrading,
> whereas any partitioned tables created with the patch applied won't have a
> TOAST table.

Yeah, pg_upgrade already has to cope with cases where the newer version
thinks a table needs a toast table when the older version didn't, or
vice versa.  This looks like it ought to fall into that category.
Not that testing it wouldn't be a good idea.

regards, tom lane



Re: [HACKERS] replace GrantObjectType with ObjectType

2018-01-16 Thread Michael Paquier
On Tue, Jan 16, 2018 at 04:18:29PM -0500, Peter Eisentraut wrote:
> So I was looking into how we can do it without OBJECT_RELATION.  For the
> first patch, that was obviously easy, because that's what my initial
> proposal did.  We just treat OBJECT_TABLE within the context of
> GRANT/REVOKE as "might also be another kind of relation".
> 
> The second class replaced ACL_KIND_CLASS with OBJECT_RELATION in the
> context of aclcheck_error(), so that was harder to unwind.  But I wrote
> some additional code that resolves the actual relkind and gives a more
> precise error message, e.g., about "view" or "index" instead of just
> "relation".  So overall this is a net win, I think.
> 
> Attached is an updated patch set.  It's a bit messy because I first want
> to get confirmation on the approach we are choosing, and then I can
> smash them together in the right way.  0001 and 0002 are the original
> patches, and then 0003, 0004, 0005 undo the OBJECT_RELATION addition and
> replace them with new facilities.

Looking at 0003, which is a nice addition:

+ObjectType
+relkind_get_objtype(char relkind)
+   /* other relkinds are not supported here because they don't map to 
OBJECT_* values */
+   default:
+   elog(ERROR, "unexpected relkind: %d", relkind);
+   return 0;
So this concerns RELKIND_TOASTVALUE and RELKIND_COMPOSITE_TYPE. How about
at least a comment at the top of relkind_get_objtype saying so?
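
For context, I assume the mapping is roughly along these lines (sketch only; 
the actual patch may differ):

    ObjectType
    relkind_get_objtype(char relkind)
    {
        switch (relkind)
        {
            case RELKIND_RELATION:
            case RELKIND_PARTITIONED_TABLE:
                return OBJECT_TABLE;
            case RELKIND_INDEX:
                return OBJECT_INDEX;
            case RELKIND_SEQUENCE:
                return OBJECT_SEQUENCE;
            case RELKIND_VIEW:
                return OBJECT_VIEW;
            case RELKIND_MATVIEW:
                return OBJECT_MATVIEW;
            case RELKIND_FOREIGN_TABLE:
                return OBJECT_FOREIGN_TABLE;
            /* other relkinds don't map to OBJECT_* values */
            default:
                elog(ERROR, "unexpected relkind: %d", relkind);
                return 0;       /* placate compiler */
        }
    }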

if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
-   aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_RELATION,
+   aclcheck_error(ACLCHECK_NOT_OWNER,
+                  relkind_get_objtype(get_rel_relkind(RelationGetRelid(rel))),
It seems to me that you could just use rel->rd_rel->relkind instead of
get_rel_relkind(RelationGetRelid(rel)).
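That is, something like this (sketch; the trailing object-name argument is
assumed here):

    /* the relcache entry already carries the relkind, no extra lookup needed */
    if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
        aclcheck_error(ACLCHECK_NOT_OWNER,
                       relkind_get_objtype(rel->rd_rel->relkind),
                       RelationGetRelationName(rel));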

0004 alone fails to compile as OBJECT_RELATION is still listed in
objectaddress.c. This is corrected in 0005.

+   if (prop->objtype == OBJECT_TABLE)
+   /*
+    * If the property data says it's a table, dig a little deeper to get
+    * the real relation kind, so that callers can produce more precise
+    * error messages.
+    */
+   return relkind_get_objtype(get_rel_relkind(object_id));
I guess that this is the price to pay as OBJECT_RELATION gets
removed, but it seems to me that we want to keep the OBJECT_RELATION
layer and look in depth at the relkind if it is found... By the way, I
personally favor the use of brackets around a single-line statement in an
if() when there is a comment block within it.

+-- Clean up in case a prior regression run failed
+SET client_min_messages TO 'warning';
+DROP ROLE IF EXISTS regress_alter_user1;
+RESET client_min_messages;
+
+CREATE USER regress_alter_user1;
Er, if you need that, it seems to me that this regression test design is
wrong. I would have thought that roles should be contained within a
single file, and the role is indeed dropped at the end of alter_table.sql by
your patch. So perhaps something crashed during your own tests and
you added this DROP ROLE to ease things?

As a whole, I like this patch series, except that OBJECT_RELATION should
be kept for get_object_type(), as this shaves a couple of cycles when an
object is marked as OBJECT_TABLE and its real relkind indeed maps to
OBJECT_TABLE.
--
Michael


signature.asc
Description: PGP signature


Re: TOAST table created for partitioned tables

2018-01-16 Thread Amit Langote
On Wed, Jan 17, 2018 at 5:32 AM, Robert Haas  wrote:
> On Tue, Jan 16, 2018 at 5:13 AM, Amit Langote
>  wrote:
>> I used to think that $subject didn't happen, but it actually does and ends
>> up consuming a fixed 8192 bytes on the disk.
>>
>> create table p (a int[]) partition by list (a);
>> CREATE TABLE
>>
>> select pg_table_size('p');
>>  pg_table_size
>> ---------------
>>           8192
>> (1 row)
>>
>> select  pg_relation_size(c1.oid) as p_size,
>>         pg_relation_size(c1.reltoastrelid) as p_toast_heap_size,
>>         pg_relation_size(c2.oid) as p_toast_index_size
>> from    pg_class c1, pg_class c2, pg_index i
>> where   c1.relname = 'p' and
>>         c1.reltoastrelid = i.indrelid and
>>         c2.oid = i.indexrelid;
>>  p_size | p_toast_heap_size | p_toast_index_size
>> --------+-------------------+--------------------
>>       0 |                 0 |               8192
>> (1 row)
>
> Aargh.  Will apply this patch break pg_upgrade from v10?

AFAICS, it doesn't.  Partitioned tables that used to have a TOAST
table in a v10 cluster will continue to have it after upgrading,
whereas any partitioned tables created with the patch applied won't have
a TOAST table.

Thanks,
Amit



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-16 Thread David Rowley
On 16 January 2018 at 21:08, Amit Langote  wrote:
> Attached v20.  Thanks again.

Thanks for working on v20. I've had a look over part of it and I've
written down the following:

1. The following comment is not correct

/*
* Equality look up key.  Values in the following array appear in no
* particular order (unlike minkeys and maxkeys below which must appear in
* the same order as the partition key columns).

These must be in partition key order, just like the others.

This part is not true either:

* the same order as the partition key columns).  n_eqkeys must be equal to
* the number of partition keys to be valid (except in the case of hash
* partitioning where that's not required).  When set, minkeys and maxkeys
* are ignored.

range2 is pruned just fine from the following:

create table rangep (a int, b int) partition by range (a,b);
create table rangep1 partition of rangep for values from (1,10) to (1,20);
create table rangep2 partition of rangep for values from (2,10) to (2,20);

explain select * from rangep where a = 1;
                          QUERY PLAN
--------------------------------------------------------------
 Append  (cost=0.00..38.25 rows=11 width=8)
   ->  Seq Scan on rangep1  (cost=0.00..38.25 rows=11 width=8)
         Filter: (a = 1)
(3 rows)

2. You've added a list_copy() to get_partitions_from_clauses so as not
to modify the input list, but this function calls
get_partitions_from_clauses_recurse, which calls
classify_partition_bounding_keys(), which modifies that list. Would it
not just be better to make a list copy in
get_partitions_from_clauses() without any conditions?

If we get new users of that function, e.g Run-time pruning, then they
might be surprised to see new items magically added to their input
list without mention of that behaviour in the function comment.

3. The following case causes an Assert failure:

drop table listp;
CREATE TABLE listp (a int, b int) partition by list (a);
create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);

prepare q1 (int) as select * from listp where a = 1 and a in($1,10);

explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1); -- <--- Assert failure!

In match_clauses_to_partkey you always add the ScalarArrayOpExpr to
the result regardless of whether it is a complete set of Consts; however, the
code in classify_partition_bounding_keys() that deals with
ScalarArrayOpExpr can't handle non-Consts:

/* OK to add to the result. */
result = lappend(result, clause);
if (IsA(eval_const_expressions(root, rightop), Const))
*contains_const = true;
else
*contains_const = false;

*contains_const is reset to true again by the a = 1 qual, so
get_partitions_from_clauses() gets called from
get_append_rel_partitions. Later, in classify_partition_bounding_keys(),
when processing the ScalarArrayOpExpr, the following code assumes the
array exprs are all Consts:

foreach(lc1, elem_exprs)
{
Const  *rightop = castNode(Const, lfirst(lc1));


Setting *contains_const = false; in match_clauses_to_partkey() is not
correct either. If I understand the intent here correctly, you want
this to be set to true if the clause list contains quals with any
consts that are useful for partition pruning during planning. If
that's the case then you should set it to true if you find a suitable
clause, otherwise leave it set to false as you set it to at the start
of the function. What you have now will have varying results based on
the order of the clauses in the list, which is certainly not correct.

4. The following code can be rearranged to not pull_varnos if there's
no commutator op.

constexpr = leftop;
constrelids = pull_varnos((Node *) leftop);
expr_op = get_commutator(expr_op);

/*
* If no commutator exists, cannot flip the qual's args,
* so give up.
*/
if (!OidIsValid(expr_op))
continue;
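
In other words, something like this (sketch):

    /* check for a commutator first, so pull_varnos() runs only when needed */
    expr_op = get_commutator(expr_op);

    /*
     * If no commutator exists, cannot flip the qual's args,
     * so give up.
     */
    if (!OidIsValid(expr_op))
        continue;

    constexpr = leftop;
    constrelids = pull_varnos((Node *) leftop);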

5. Header comment for match_clauses_to_partkey() says only clauses in
the pattern of "partkey op const" and "const op partkey" are handled.
NULL tests are also mentioned but nothing is mentioned about
ScalarArrayOpExpr. It might be better to be less verbose about what
the function handles, but if you're listing what is handled then you
should not make false claims.

 * For an individual clause to match with a partition key column, the clause
 * must be an operator clause of the form (partkey op const) or (const op
 * partkey); the latter only if a suitable commutator exists.  Furthermore,

6. Which brings me to: why do we need match_clauses_to_partkey at all?
classify_partition_bounding_keys seems to do all the work
match_clauses_to_partkey does, plus more. Item #3 above is caused by
an inconsistency between these functions. What benefit does
match_clauses_to_partkey give? I might understand if you were creating
a list of clauses matching each partition key, but you're just dumping
everything in one big list which 

Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-16 Thread David Rowley
On 16 January 2018 at 21:08, Amit Langote  wrote:
> On 2018/01/12 12:30, David Rowley wrote:
>> 8. The code in get_partitions_from_ne_clauses() does perform quite a
>> few nested loops. I think a more simple way to would be to track the
>> offsets you've seen in a Bitmapset. This would save you having to
>> check for duplicates, as an offset can only contain a single datum.
>> You'd just need to build a couple of arrays after that, one to sum up
>> the offsets found per partition, and one for the total datums allowed
>> in the partition. If the numbers match then you can remove the
>> partition.
>>
>> I've written this and attached it to this email. It saves about 50
>> lines of code and should perform much better for complex cases, for
>> example, a large NOT IN list. This also implements #6.
>
> I liked your patch, so incorporated it, except, I feel slightly
> uncomfortable about the new name that you chose for the function because
> it sounds a bit generic.

You're right. I only renamed it because I inverted the meaning of the
function in the patch. It no longer did
"get_partitions_from_ne_clauses", it did the opposite and give the
partitions which can't match. Please feel free to think of a new
better name. Is "get_partitions_excluded_by_ne_clauses" too long?

>> I think you quite often use "the same" to mean "it". Can you change that?
>
> I guess that's just one of my many odd habits when writing English, all of
> which I'm trying to get rid of, but apparently with limited success.  Must
> try harder. :)

Oops, on re-reading that it sounded as though I was asking you to
change some habit, but I just meant the comments. I understand there
will be places that use English where that's normal. It's just that I don't
recall seeing that in PostgreSQL code before. American English is
pretty much the standard for the project, despite that not always
being strictly applied (e.g. we have a command called ANALYSE which is
an alias for ANALYZE). I always try to do my best to spell words in
American English (which is not where I'm from), which for me stretches
about as far as putting 'z' in the place of some of my 's'es.

> I rewrote the above comment block as:
>
>  * Try to compare the constant arguments of 'leftarg' and 'rightarg', in that
>  * order, using the operator of 'op' and set *result to the result of this
>  * comparison.
>
> Is that any better?

Sounds good.

>
>> 15. "the latter" is normally used when you're referring to the last
>> thing in a list which was just mentioned. In this case, leftarg_const
>> and rightarg_const is the list, so "the latter" should mean
>> rightarg_const, but I think you mean to compare them using the
>> operator.
>>
>> * If the leftarg_const and rightarg_const are both of the type expected
>> * by op's operator, then compare them using the latter.
>
> Rewrote it as:
>
>  * We can compare leftarg_const and rightarg_const using op's operator
>  * only if both are of the type expected by it.

I'd probably write "expected type." instead of "type expected by it."

>> 17. The following example will cause get_partitions_for_keys_hash to 
>> misbehave:
>>
>> create table hashp (a int, b int) partition by hash (a, b);
>> create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
>> create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
>> create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
>> create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
>> explain select * from hashp where a = 1 and a is null;
>>
>> The following code assumes that you'll never get a NULL test for a key
>> that has an equality test, and ends up trying to prune partitions
>> thinking we got compatible clauses for both partition keys.
>
> Yeah, I think this example helps spot a problem.  I thought we'd never get
> to get_partitions_for_keys_hash() for the above query, because we
> should've been able to prove much earlier that the particular clause
> combination should be always false (a cannot be both non-null 1 and null).
>  Now, because the planner itself doesn't substitute a constant-false for
> that, I taught classify_partition_bounding_keys() to do so.  It would now
> return constfalse=true if it turns out that clauses in the input list lead
> to contradictory nullness condition for a given partition column.
>
>>   memset(keyisnull, false, sizeof(keyisnull));
>>   for (i = 0; i < partkey->partnatts; i++)
>>   {
>> if (bms_is_member(i, keys->keyisnull))
>> {
>>   keys->n_eqkeys++;
>>   keyisnull[i] = true;
>> }
>>   }
>>
>>   /*
>>* Can only do pruning if we know all the keys and they're all equality
>>* keys including the nulls that we just counted above.
>>*/
>>   if (keys->n_eqkeys == partkey->partnatts)
>>
>> The above code will need to be made smarter. It'll likely crash if you
>> change "b" to a pass-by-ref type.
>

Re: [HACKERS] Replication status in logical replication

2018-01-16 Thread Masahiko Sawada
On Wed, Jan 17, 2018 at 2:16 AM, Simon Riggs  wrote:
> On 16 January 2018 at 06:21, Michael Paquier  
> wrote:
>> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
>>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  wrote:
 On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
 We're not talking about standbys, so the message is incorrect.
>>>
>>> Ah, I understood. How about "\"%s\"  has now caught up with upstream
>>> server"?
>>
>> +1.
>
> upstream is what I would have suggested, so +1 here also.
>
> Will commit.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: let's not complain about harmless patch-apply failures

2018-01-16 Thread Peter Geoghegan
On Tue, Jan 16, 2018 at 3:54 PM, Tom Lane  wrote:
> FWIW, I think that that represents bad practice in those changes,
> precisely because of the hazard it poses for uncommitted patches.
> If you're changing a function signature, it's usually not that hard
> to make sure that un-updated code will produce a failure or warning,
> and you should generally do so IMO.

I strongly agree. That's an example of the programmer exploiting
mechanical detection of conflicts deliberately, which is great. All of
these things are tools, and like all tools they are generally not
helpful unless used thoughtfully.

-- 
Peter Geoghegan



Re: let's not complain about harmless patch-apply failures

2018-01-16 Thread Tom Lane
Peter Geoghegan  writes:
> The parallel CREATE INDEX patch is something that I've worked on
> (fairly inconsistently) for 2 years now. I remember two occasions in
> which somebody else changed a function signature for functions that my
> code called, and without that causing even a compiler warning after
> rebasing on top of these changes (e.g., changing an int argument to a
> bool argument). On both occasions, this led to a real bug in a version
> of the patch that was posted to the list.

FWIW, I think that that represents bad practice in those changes,
precisely because of the hazard it poses for uncommitted patches.
If you're changing a function signature, it's usually not that hard
to make sure that un-updated code will produce a failure or warning,
and you should generally do so IMO.

regards, tom lane



Re: let's not complain about harmless patch-apply failures

2018-01-16 Thread Peter Geoghegan
On Tue, Jan 16, 2018 at 8:56 AM, Robert Haas  wrote:
> I've seen that before as well.
>
> I have also noticed people complaining about patches that apply "with
> offsets", which also seems like needless nitpicking.  If the offsets
> are large and the patch has been sitting around for a long time,
> there's a small chance it could be applying to the wrong place, but
> that is extremely rare.  Most patches have small offsets, just a few
> lines, and there is no problem.

+1

The parallel CREATE INDEX patch is something that I've worked on
(fairly inconsistently) for 2 years now. I remember two occasions in
which somebody else changed a function signature for functions that my
code called, and without that causing even a compiler warning after
rebasing on top of these changes (e.g., changing an int argument to a
bool argument). On both occasions, this led to a real bug in a version
of the patch that was posted to the list.

Mechanical detection of problems is great, but there is no substitute
for vigilance. I think that people that complain about stuff like
patches applying with offsets have a false sense of security about
detecting problems mechanically. Rebasing a patch without conflicts
(including seeing a warning about offsets) does not mean that your
patch didn't become broken in some subtle, harmful way. Mechanical
detection is only useful to the extent that it guides and augments
human oversight.

-- 
Peter Geoghegan



Re: [HACKERS] taking stdbool.h into use

2018-01-16 Thread Peter Eisentraut
On 1/11/18 17:01, Peter Eisentraut wrote:
> Looking around where else they are used, the uses in numeric.c sure seem
> like noops:
> 
> #if SIZEOF_DATUM == 8
> #define NumericAbbrevGetDatum(X) ((Datum) SET_8_BYTES(X))
> #define DatumGetNumericAbbrev(X) ((int64) GET_8_BYTES(X))
> #define NUMERIC_ABBREV_NAN  NumericAbbrevGetDatum(PG_INT64_MIN)
> #else
> #define NumericAbbrevGetDatum(X) ((Datum) SET_4_BYTES(X))
> #define DatumGetNumericAbbrev(X) ((int32) GET_4_BYTES(X))
> #define NUMERIC_ABBREV_NAN  NumericAbbrevGetDatum(PG_INT32_MIN)
> #endif
> 
> We can just replace these by straight casts, too.

I have committed a change for this.  I'll work through the other details
later.
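
Concretely, the straight-cast form looks about like this (from memory; the 
committed version may differ in details):

    #if SIZEOF_DATUM == 8
    #define NumericAbbrevGetDatum(X) ((Datum) (X))
    #define DatumGetNumericAbbrev(X) ((int64) (X))
    #define NUMERIC_ABBREV_NAN       NumericAbbrevGetDatum(PG_INT64_MIN)
    #else
    #define NumericAbbrevGetDatum(X) ((Datum) (X))
    #define DatumGetNumericAbbrev(X) ((int32) (X))
    #define NUMERIC_ABBREV_NAN       NumericAbbrevGetDatum(PG_INT32_MIN)
    #endif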

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2018-01-16 Thread David Rowley
On 17 January 2018 at 03:58, Alvaro Herrera  wrote:
>> 9. Error details claim that p2_a_idx is not a partition of p.
>> Shouldn't it say table "p2" is not a partition of "p"?
>
> You missed the "on" in the DETAIL:
>   DETAIL:  Index "p2_a_idx" is not on a partition of table "p".
> You could argue that this is obscurely worded, but if you look at the
> command:
>ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx;
> nowhere is table p2 mentioned, so I'm not sure it's a great idea to
> mention the table in the error message.

I think I did miss the "on".

>> 10. You've added code to get_relation_info() to handle partitioned
>> indexes, but that code is skipped [...]
>> The new code should either be removed, or you should load the index
>> list for a partitioned table.
>
> That's a leftover from previous versions too.  YAGNI principle says I
> should remove it rather than activate it, I think, since the optimizer
> is not going to use the data for anything.

Agreed.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: WIP Patch: Precalculate stable functions, infrastructure v1

2018-01-16 Thread Tom Lane
Aleksander Alekseev  writes:
> I can confirm this code works. However, since this is quite a large patch, I 
> believe we'd better have a second reviewer or a very attentive committer. 
> The new status of this patch is: Ready for Committer

This is indeed quite a large patch, but it seems to me it could become
smaller.  After a bit of review:

1. I do not like what you've done with ParamListInfo.  The changes around
that are invasive, accounting for a noticeable part of the patch bulk,
and I don't think they're well designed.  Having to cast back and forth
between ParamListInfo and ParamListInfoCommon and so on is ugly and prone
to cause (or hide) errors.  And I don't really understand why
ParamListInfoPrecalculationData exists at all.  Couldn't you have gotten
the same result with far fewer notational changes by defining another
PARAM_FLAG bit in the existing pflags field?  (Or alternatively, maybe
the real need here is for another ParamKind value for Param nodes?)

I also dislike this approach because it effectively throws away the
support for "virtual" param arrays that I added in commit 6719b238e:
ParamListInfoPrecalculationData has no support for dynamically determined
parameter properties, which is surely something that somebody will need.
(It's just luck that the patch doesn't break plpgsql today.)  I realize
that that's a recent commit and the code I'm complaining about predates
it, but we need to adjust this so that it fits in with the new approach.
See comment block at lines 25ff in params.h.

2. I don't follow the need for the also-rather-invasive changes to domain
constraint data structures.  I do see that the patch attempts to make
CoerceToDomain nodes cacheable, which is flat wrong and has to be ripped
out.  You *cannot* assume that the planner has access to the same domain
constraints that will apply at runtime.

I've occasionally thought that we should hook domain constraint changes
into the plan invalidation mechanism, which would make it possible for
the planner to assume that the constraints seen at planning time will
apply at execution.  Whereupon we could have the planner insert domain
constraint expressions into the plan rather than leaving those to be
collected at query startup by execExpr.c, and then do things like
constant-folding and cacheing CoerceToDomain nodes.  But that would be
a rather large and very domain-specific change, and so it would be fit
material for a different patch IMO.  I recommend that for now you just
treat CoerceToDomain as an uncacheable expression type and rip all the
domain-related changes out of this patch.

3. I think you should also try hard to get rid of the need for
PlannedStmt.hasCachedExpr.  AFAICS there's only one place that is
using that flag, which is exec_simple_check_plan, and I have to
think there are better ways we could deal with that.  In particular,
I don't understand why you haven't simply set up plpgsql parameter
references to be noncacheable.  Or maybe what we'd better do is
disable CacheExpr insertions into potentially-simple plans in the
first place.  As you have it here, it's possible for recompilation
of an expression to result in a change in whether it should be deemed
simple or not, which will break things (cf commit 00418c612).

4. I don't like the way that you've inserted
"replace_qual_cached_expressions" and
"replace_pathtarget_cached_expressions" calls into seemingly random places
in the planner.  Why isn't that being done uniformly during expression
preprocessing?  There's no apparent structure to where you've put these
calls, and so they seem really vulnerable to errors of omission.  Also,
if this were done in expression preprocessing, there'd be a chance of
combining it with some existing pass over expression trees instead of
having to do a separate (and expensive) expression tree mutation.
I can't help suspecting that eval_const_expressions could take this on
as an additional responsibility with a lot less than a thousand new lines
of code.

5. BTW, cost_eval_cacheable_expr seems like useless restructuring as well.
Why aren't you just recursively applying the regular costing function?


If you did all of the above it would result in a pretty significant
reduction of the number of places touched by the patch, which would make
it easier to see what's going on.  Then we could start to discuss, for
instance, what does the "isConstParam" flag actually *mean* and why
is it different from PARAM_FLAG_CONST?  And what in the world is
CheckBoundParams about?  The internal documentation in this patch
isn't quite nonexistent, but it's well short of being in a
committable state IMO.

regards, tom lane



Re: TOAST table created for partitioned tables

2018-01-16 Thread Robert Haas
On Tue, Jan 16, 2018 at 5:13 AM, Amit Langote
 wrote:
> I used to think that $subject didn't happen, but it actually does and ends
> up consuming a fixed 8192 bytes on the disk.
>
> create table p (a int[]) partition by list (a);
> CREATE TABLE
>
> select pg_table_size('p');
>  pg_table_size
> ---------------
>           8192
> (1 row)
>
> select  pg_relation_size(c1.oid) as p_size,
>         pg_relation_size(c1.reltoastrelid) as p_toast_heap_size,
>         pg_relation_size(c2.oid) as p_toast_index_size
> from    pg_class c1, pg_class c2, pg_index i
> where   c1.relname = 'p' and
>         c1.reltoastrelid = i.indrelid and
>         c2.oid = i.indexrelid;
>  p_size | p_toast_heap_size | p_toast_index_size
> --------+-------------------+--------------------
>       0 |                 0 |               8192
> (1 row)

Aargh.  Will apply this patch break pg_upgrade from v10?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Transaction control in procedures

2018-01-16 Thread Andrew Dunstan


On 01/16/2018 10:16 AM, Peter Eisentraut wrote:
> On 1/15/18 12:57, Andrew Dunstan wrote:
>> This confused me slightly:
>>
>> +    Transactions cannot be ended inside loops through query results
>> or inside
>> +    blocks with exception handlers.
>>
>> I suggest: "A transaction cannot be ended inside a loop over query
>> results, nor inside a block with exception handlers."
> fixed
>
>> The patch has bitrotted slightly in src/backend/commands/portalcmds.c
> merged
>
>> The plperl expected file needs updating. Also, why does spi_commit() in
>> a loop result in an error message but not spi_rollback()?
> This is all changed now after the patch for portal pinning in PL/Perl
> and PL/Python has been committed.  The attached patch behaves better.
>


Looks good. Marking ready for committer.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: let's not complain about harmless patch-apply failures

2018-01-16 Thread Alvaro Herrera
David Fetter wrote:

> I'm sure I'm not alone in finding it helpful when patch sets come with
> a single-sentence summary of the patch set and a commit message for
> each individual patch.
> 
> Is git format-patch really too heavy a lift to ask of people?

I think it's okay as general guideline, but not as a hard requirement.
Just like we advise people to trim quoted text when they reply to
mailing list postings, but we don't boot those that fail to.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-16 Thread Robert Haas
On Tue, Jan 16, 2018 at 3:56 AM, Jeevan Chalke
 wrote:
> I will make your suggested changes, that is, merge create_sort_agg_path() and
> create_hash_agg_path(). I will name that function
> create_sort_and_hash_agg_paths().

I suggest add_paths_to_grouping_rel() and
add_partial_paths_to_grouping_rel(), similar to what commit
c44c47a773bd9073012935a29b0264d95920412c did with
add_paths_to_append_rel().

> Oops. My mistake. Missed. We should loop over the input rel's pathlist.
>
> Yep. With above change, the logic is very similar except
> (1) isPartialAgg/can_sort case creates the partial paths and
> (2) finalization step is not needed at this stage.

I'm not sure what you mean by #1.

> I think it can be done by passing a flag to create_sort_agg_path() (or the new
> combo function) and making appropriate adjustments. Do you think the addition of
> this new flag should go in the refactoring patch or the main PWA patch?
> I think the refactoring patch.

I think the refactoring patch should move the existing code into a new
function without any changes, and then the main patch should add an
additional argument to that function that allows for either behavior.

By the way, I'm also a bit concerned about this:

+       /*
+        * For full aggregation, we are done with the partial paths.  Just
+        * clear it out so that we don't try to create a parallel plan over
+        * it.
+        */
+       grouped_rel->partial_pathlist = NIL;

I think that's being done for the same reason as mentioned at the
bottom of the current code for create_grouping_paths().  They are only
partially aggregated and wouldn't produce correct final results if
some other planning step -- create_ordered_paths, or the code that
sets up final_rel -- used them as if they had been fully aggregated.
I'm worried that there might be an analogous danger for partition-wise
aggregation -- that is, that the paths being inserted into the partial
pathlists of the aggregate child rels might get reused by some later
planning step which doesn't realize that the output they produce
doesn't quite match up with the rel to which they are attached.  You
may have already taken care of that problem somehow, but we should
make sure that it's fully correct and clearly commented.  I don't
immediately see why the isPartialAgg case should be any different from
the !isPartialAgg case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: let's not complain about harmless patch-apply failures

2018-01-16 Thread David Fetter
On Tue, Jan 16, 2018 at 11:56:55AM -0500, Robert Haas wrote:
> On Tue, Jan 16, 2018 at 4:04 AM, Kyotaro HORIGUCHI
>  wrote:
> > At Mon, 15 Jan 2018 21:45:34 -0500, Tom Lane  wrote in 
> > <26718.1516070...@sss.pgh.pa.us>
> >> Robert Haas  writes:
> >> > Since the "Stripping trailing CRs from patch" message is
> >> > totally harmless, I'm not sure why you should need to devote
> >> > any effort to avoiding it.  Anyone who gets it should just
> >> > ignore it.
> >
> > I know that and totally agree to Robert but still I wonder why
> > (and am annoyed by) I sometimes receive such complain or even an
> > accusation that I sent an out-of-the-convention patch and I was
> > afraid that it is not actually common.
> 
> I've seen that before as well.
> 
> I have also noticed people complaining about patches that apply
> "with offsets", which also seems like needless nitpicking.  If the
> offsets are large and the patch has been sitting around for a long
> time, there's a small chance it could be applying to the wrong
> place, but that is extremely rare.  Most patches have small offsets,
> just a few lines, and there is no problem.  Complaining about the
> offsets, on the other hand, is unhelpful: it not only forces the
> patch author to update the patch for no good reason, but it clutters
> the mailing list with useless traffic that everyone else has to
> ignore.
> 
> I think we should have a firm policy that if patch -p1 can apply
> your patch, your patch is sufficiently well-formatted.  If someone
> wants the result as a context diff, a unified diff, with one kind of
> line endings vs. another, or whatever, they can apply the patch
> locally and use whatever tools they like to get a diff in the format
> they prefer.
> 
> When posting large patch stacks, 'git format-patch' is nice because
> it lets you give a sequence number and a commit message to each
> patch in a sensible way.  I recommend it, but I don't think we
> should insist on it.

I'm sure I'm not alone in finding it helpful when patch sets come with
a single-sentence summary of the patch set and a commit message for
each individual patch.

Is git format-patch really too heavy a lift to ask of people?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: proposal: alternative psql commands quit and exit

2018-01-16 Thread Alvaro Herrera
Tom Lane wrote:

> Also, I remain of the opinion that we needn't necessarily teach them the
> minimum-keystrokes solution; it's better to teach something that will work
> every time.  Maybe ^C is close enough on that score, but I'm not sure.

IMO in the spirit of keeping things simple, it's enough to tell people
to ^C their way out (and why); anything more complicated than that, such
as closing any open quotes, is not catering for the newbie community
that this feature is supposed to help.  As Robert said, if you're trying
to exit the terminal, you don't care too much about any string literals
you have open.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: proposal: alternative psql commands quit and exit

2018-01-16 Thread Robert Haas
On Tue, Jan 16, 2018 at 6:44 AM, Geoff Winkless  wrote:
> A quick PoC.
>
> I should say upfront that I've no axe to grind over this, if no-one
> likes the idea I don't mind: I'm not sure I like it myself (it's quite
> an aggressive stance to take, I think) - I just wanted to see if it
> would work, and provide it as a working possibility.
>
> Basically, I intercept every keypress before we send it to readline.
> If it's 4 (Ctrl-D) we replace the buffer with \q which then quits,
> even if there's something in the buffer. So simply sending "Ctrl-D to
> quit" would always be true. Everything else just gets passed through
> to readline directly.

This is an impressive attempt to make it possible to write a short and
universally correct help text, but (1) I doubt that we want to
override the user's terminal settings and (2) it won't work on
non-readline builds.
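
For the record, my understanding is that the PoC hooks the keystroke path
roughly like this (hypothetical sketch using readline's rl_getc_function; the
actual patch may differ in names and details):

    /* funnel every keypress through our own getc hook (readline.h assumed) */
    static int
    psql_rl_getc(FILE *stream)
    {
        int     c = rl_getc(stream);

        if (c == 4)                     /* Ctrl-D, regardless of buffer contents */
        {
            rl_replace_line("\\q", 0);  /* overwrite whatever was typed so far */
            rl_redisplay();
            rl_done = 1;                /* make readline() return immediately */
            c = '\n';
        }
        return c;
    }

    /* somewhere during startup: */
    rl_getc_function = psql_rl_getc;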

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2018-01-16 Thread Robert Haas
On Mon, Jan 15, 2018 at 11:55 AM, Tom Lane  wrote:
> Right, but if we're willing to look at the parse state, we don't need
> to cram all possible knowledge into one 80-character string.  I'm
> thinking about just responding to the current situation, say
>
> You have begun a quoted string.  To end it, type '
>
> (or ", or $foo$ as appropriate).  I'm inclined to violate our usual
> message style guidelines here by not putting any quotes or punctuation
> around the ending quote ... more likely to confuse than help.
>
> Or, if you're not in a string but there's text in the buffer,
>
> You have entered an incomplete SQL command.
> Type ; to send it or \r to discard it.
>
> Or, if there's no text in the buffer, we print the existing help
> message for "help", or just exit for quit/exit.

This gets a little complex; you need to know not only the parser state but
also, in the case of dollar-quoting, what appeared between the two
dollar signs when the dollar quotes were opened.  Plus, it's quite
possible that there are multiple levels of quoting open; I'm not sure
I want to print a message like:

You have begun a quoted string.  To end it, type $$too$$'$$complicated$$

The other problem with this sort of thing is that it gives the user no
hint as to why the message got generated.  The messages in the patch I
sent before were:

Use \? for help or press control-C to clear the input buffer.
Use \q to quit or press control-C to clear the input buffer.

If the user does a cut-and-paste of a bunch of text and gets one of
those messages, it doesn't explicitly spell out what triggered the
message, but it hints at it.  With your proposed message, someone
might reply "I know I have begun a quoted string.  Why are you telling
me about it?".

I also think that pressing control-C to clear the input buffer, rather
than trying to close out the quotes or the SQL command, is likely to
be what most people want to do in this situation.  Generally, by the
time you resort to typing "quit" or "help" into a program and hoping
something good happens, it's a good bet that whatever came before that
point wasn't anything great.  But I don't know what to do about the
revelation that my trusty standby control-C isn't universally the
right thing either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication status in logical replication

2018-01-16 Thread Simon Riggs
On 16 January 2018 at 06:21, Michael Paquier  wrote:
> On Tue, Jan 16, 2018 at 10:40:43AM +0900, Masahiko Sawada wrote:
>> On Sun, Jan 14, 2018 at 12:43 AM, Simon Riggs  wrote:
>>> On 9 January 2018 at 04:36, Masahiko Sawada  wrote:
>>> We're not talking about standbys, so the message is incorrect.
>>
>> Ah, I understood. How about "\"%s\"  has now caught up with upstream
>> server"?
>
> +1.

upstream is what I would have suggested, so +1 here also.

Will commit.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: let's not complain about harmless patch-apply failures

2018-01-16 Thread Simon Riggs
On 16 January 2018 at 16:56, Robert Haas  wrote:
> On Tue, Jan 16, 2018 at 4:04 AM, Kyotaro HORIGUCHI
>  wrote:
>> At Mon, 15 Jan 2018 21:45:34 -0500, Tom Lane  wrote in 
>> <26718.1516070...@sss.pgh.pa.us>
>>> Robert Haas  writes:
>>> > Since the "Stripping trailing CRs from patch" message is totally
>>> > harmless, I'm not sure why you should need to devote any effort to
>>> > avoiding it.  Anyone who gets it should just ignore it.
>>
>> I know that and totally agree to Robert but still I wonder why
>> (and am annoyed by) I sometimes receive such complain or even an
>> accusation that I sent an out-of-the-convention patch and I was
>> afraid that it is not actually common.
>
> I've seen that before as well.
>
> I have also noticed people complaining

People complain... asking them not to is unlikely to get anywhere.

We must encourage people to speak up if they see an improvement or a
lack of quality. I have benefited from such comments and they are not
often intended negatively.

Every complaint is not a hard blocker and complainers can also be
wrong, so we just need perspective.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



let's not complain about harmless patch-apply failures

2018-01-16 Thread Robert Haas
On Tue, Jan 16, 2018 at 4:04 AM, Kyotaro HORIGUCHI
 wrote:
> At Mon, 15 Jan 2018 21:45:34 -0500, Tom Lane  wrote in 
> <26718.1516070...@sss.pgh.pa.us>
>> Robert Haas  writes:
>> > Since the "Stripping trailing CRs from patch" message is totally
>> > harmless, I'm not sure why you should need to devote any effort to
>> > avoiding it.  Anyone who gets it should just ignore it.
>
> I know that and totally agree to Robert but still I wonder why
> (and am annoyed by) I sometimes receive such complain or even an
> accusation that I sent an out-of-the-convention patch and I was
> afraid that it is not actually common.

I've seen that before as well.

I have also noticed people complaining about patches that apply "with
offsets", which also seems like needless nitpicking.  If the offsets
are large and the patch has been sitting around for a long time,
there's a small chance it could be applying to the wrong place, but
that is extremely rare.  Most patches have small offsets, just a few
lines, and there is no problem.  Complaining about the offsets, on the
other hand, is unhelpful: it not only forces the patch author to
update the patch for no good reason, but it clutters the mailing list
with useless traffic that everyone else has to ignore.

I think we should have a firm policy that if patch -p1 can apply your
patch, your patch is sufficiently well-formatted.  If someone wants
the result as a context diff, a unified diff, with one kind of line
endings vs. another, or whatever, they can apply the patch locally and
use whatever tools they like to get a diff in the format they prefer.

When posting large patch stacks, 'git format-patch' is nice because it
lets you give a sequence number and a commit message to each patch in
a sensible way.  I recommend it, but I don't think we should insist on
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread Andres Freund
On 2018-01-16 16:12:11 +0900, Michael Paquier wrote:
> On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> > On Wed, Feb 01, 2017 at 02:39:25PM +0200, Heikki Linnakangas wrote:
> >> @@ -73,11 +73,19 @@ pg_atomic_compare_exchange_u32_impl(volatile 
> >> pg_atomic_uint32 *ptr,
> >>  static inline uint32
> >>  pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
> >>  {
> >> +  uint32  ret;
> >> +
> >>/*
> >> -   * __fetch_and_add() emits a leading "sync" and trailing "isync", 
> >> thereby
> >> -   * providing sequential consistency.  This is undocumented.
> >> +   * Use __sync() before and __isync() after, like in compare-exchange
> >> +   * above.
> >> */
> >> -  return __fetch_and_add((volatile int *)>value, add_);
> >> +  __sync();
> >> +
> >> +  ret = __fetch_and_add((volatile int *)>value, add_);
> >> +
> >> +  __isync();
> >> +
> >> +  return ret;
> >>  }
> > 
> > Since this emits double syncs with older xlc, I recommend instead replacing
> > the whole thing with inline asm.  As I opined in the last message of the
> > thread you linked above, the intrinsics provide little value as abstractions
> > if one checks the generated code to deduce how to use them.  Now that the
> > generated code is xlc-version-dependent, the port is better off with
> > compiler-independent asm like we have for ppc in s_lock.h.
> 
> Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
> past versions? I think that it would make the code more understandable
> than just listing directly the instructions.

Given the quality of the intrinsics on AIX, see past commits and the
comment in the code quoted above, I think we're much better off doing
this via inline asm.
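
For illustration only, a compiler-independent variant in the style of the ppc
code in s_lock.h might look roughly like the following lwarx/stwcx. loop (a
sketch; operand constraints and barrier placement would need careful review):

    static inline uint32
    pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
    {
        uint32      old;
        uint32      tmp;

        __asm__ __volatile__(
            "   sync                \n"     /* full barrier before the RMW */
            "1: lwarx   %0,0,%3     \n"     /* load old value, set reservation */
            "   add     %1,%0,%4    \n"     /* compute old + add_ */
            "   stwcx.  %1,0,%3     \n"     /* store iff reservation still held */
            "   bne-    1b          \n"     /* retry if the reservation was lost */
            "   isync               \n"     /* acquire barrier after success */
            : "=&r"(old), "=&r"(tmp), "+m"(ptr->value)
            : "r"(&ptr->value), "r"(add_)
            : "memory", "cc");

        return old;
    }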

Greetings,

Andres Freund



RE:[HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread REIX, Tony
Thanks Noah !

Hummm. You have a big machine, more powerful than mine. However, it seems that 
you do not see the random failure that I see.


Cordialement,

Tony Reix

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net


De : Noah Misch [n...@leadboat.com]
Envoyé : mardi 16 janvier 2018 17:19
À : REIX, Tony
Cc : Michael Paquier; Heikki Linnakangas; Konstantin Knizhnik; PostgreSQL 
Hackers; Bernd Helmle; OLIVA, PASCAL; EMPEREUR-MOT, SYLVIE
Objet : Re: [HACKERS] Deadlock in XLogInsert at AIX

On Tue, Jan 16, 2018 at 01:50:29PM +, REIX, Tony wrote:
> And, on BuildFarm, I do not see any details about the logical/physical 
> configuration of the AIX VMs, like hornet.
> Being able to run real concurrent parallel stress programs, which requires a 
> multi-physical-CPU VM, would help.

It has 48 virtual CPUs.  Here's the prtconf output:

System Model: IBM,8231-E2C
Machine Serial Number: 104C0CT
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat
Number Of Processors: 12
Processor Clock Speed: 3720 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 1 10-4C0CT
Memory Size: 127488 MB
Good Memory Size: 127488 MB
Platform Firmware level: AL740_100
Firmware Version: IBM,AL740_100
Console Login: enable
Auto Restart: true
Full Core: false
NX Crypto Acceleration: Not Capable

Network Information
Host Name: power-aix
IP Address: 140.211.15.154
Sub Netmask: 255.255.255.0
Gateway: 140.211.15.1
Name Server: 140.211.166.130
Domain Name: osuosl.org

Paging Space Information
Total Paging Space: 12288MB
Percent Used: 1%

Volume Groups Information
==
Active VGs
==
homevg:
PV_NAME   PV STATE  TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk1    active    558         183         00..00..00..71..112
hdisk2    active    558         0           00..00..00..00..00
hdisk3    active    558         0           00..00..00..00..00
hdisk4    active    558         0           00..00..00..00..00
==

rootvg:
PV_NAME   PV STATE  TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0    active    558         510         111..93..83..111..112
hdisk5    active    558         514         111..97..83..111..112
==

INSTALLED RESOURCE LIST

The following resources are installed on the machine.
+/- = Added or deleted from Resource List.
*   = Diagnostic support not available.

  Model Architecture: chrp
  Model Implementation: Multiple Processor, PCI bus

+ sys0 System Object
+ sysplanar0   System Planar
* vio0 Virtual I/O Bus
* vsa2 U78AB.001.WZSHZY0-P1-T2 LPAR Virtual Serial Adapter
* vty2 U78AB.001.WZSHZY0-P1-T2-L0  Asynchronous Terminal
* vsa1 U78AB.001.WZSHZY0-P1-T1 LPAR Virtual Serial Adapter
* vty1 U78AB.001.WZSHZY0-P1-T1-L0  Asynchronous Terminal
* vsa0 U8231.E2C.104C0CT-V1-C0 LPAR Virtual Serial Adapter
* vty0 U8231.E2C.104C0CT-V1-C0-L0  Asynchronous Terminal
* pci8 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci7 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci6 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci5 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci4 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci10U78AB.001.WZSHZY0-P1-C2 PCI Bus
+ cor0 U78AB.001.WZSHZY0-P1-C2-T1  GXT145 Graphics Adapter
* pci3 U78AB.001.WZSHZY0-P1PCI Express Bus
+ ent0 U78AB.001.WZSHZY0-P1-C7-T1  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
+ ent1 U78AB.001.WZSHZY0-P1-C7-T2  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
+ ent2 U78AB.001.WZSHZY0-P1-C7-T3  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
+ ent3 U78AB.001.WZSHZY0-P1-C7-T4  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
* pci2 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci1 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci9 U78AB.001.WZSHZY0-P1PCI Bus
+ usbhc0   U78AB.001.WZSHZY0-P1USB Host Controller (33103500)
+ usbhc1   U78AB.001.WZSHZY0-P1USB Host Controller (33103500)
+ usbhc2   U78AB.001.WZSHZY0-P1USB Enhanced Host 

Re: New gist vacuum.

2018-01-16 Thread Andrey Borodin
Hi!
> On 15 Jan 2018, at 23:42, Alexander Korotkov wrote:
> 
> I'm very glad this patch isn't forgotten.  I've assigned myself to review this patch.
Cool, thanks!
> My first note on this patchset is the following.  These patches touch sensitive 
> aspects of GiST and are related to complex concurrency issues.
Indeed! That is why I've divided the patches: the first one implements a very simple 
scan algorithm with concurrency and recovery, and the second replaces the simple 
algorithm with two more tricky algorithms.
> Thus, I'm sure both of them deserve a high-level description in 
> src/backend/access/gist/README.  Given this description, it would be much 
> easier to review the patches.
Yes, that's definitely true. Please find README patch attached.


Best regards, Andrey Borodin.


0003-Update-README-with-info-on-new-GiST-VACUUM.patch
Description: Binary data



Re: PATCH: Exclude unlogged tables from base backups

2018-01-16 Thread Adam Brightwell
All,

I have reviewed and tested these patches.

The patches applied cleanly in order against master at (90947674fc).

I ran the provided regression tests and a 'check-world'.  All tests succeeded.

Marking ready for committer.

-Adam



Re: [HACKERS] Transaction control in procedures

2018-01-16 Thread Peter Eisentraut
On 1/15/18 12:57, Andrew Dunstan wrote:
> This confused me slightly:
> 
> +    Transactions cannot be ended inside loops through query results
> or inside
> +    blocks with exception handlers.
> 
> I suggest: "A transaction cannot be ended inside a loop over query
> results, nor inside a block with exception handlers."

fixed

> The patch has bitrotted slightly in src/backend/commands/portalcmds.c

merged

> The plperl expected file needs updating. Also, why does spi_commit() in
> a loop result in an error message but not spi_rollback()?

This is all changed now after the patch for portal pinning in PL/Perl
and PL/Python has been committed.  The attached patch behaves better.
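To illustrate what this enables at the user level, here is a minimal sketch of a
procedure using the new PL/pgSQL COMMIT and ROLLBACK commands (the scratch table
test1 is assumed; this is an illustration, not an excerpt from the patch):

CREATE TABLE test1 (a int);

CREATE PROCEDURE transaction_test1()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 0..9 LOOP
        INSERT INTO test1 (a) VALUES (i);
        IF i % 2 = 0 THEN
            COMMIT;        -- ends the current transaction and starts a new one
        ELSE
            ROLLBACK;      -- discards the insert of the odd value
        END IF;
    END LOOP;
END;
$$;

CALL transaction_test1();  -- afterwards test1 contains only the even values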

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 84f35266cc63de1d433d8a5e3549a0e28cb4c1fc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 16 Jan 2018 10:12:44 -0500
Subject: [PATCH v7] Transaction control in PL procedures

In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language.  Add similar underlying functions to SPI.  Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call.  Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.

- SPI

Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags.  The only option flag right now is
SPI_OPT_NONATOMIC.  A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed.  This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called.  A nonatomic SPI connection uses different
memory management.  A normal SPI connection allocates its memory in
TopTransactionContext.  For nonatomic connections we use PortalContext
instead.  As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.

SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.

- portalmem.c

Some adjustments were made in the code that cleans up portals at
transaction abort.  The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.

In AtAbort_Portals(), remove the code that marks an active portal as
failed.  As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort.  And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception.  So the code in AtAbort_Portals() is never used
anyway.

In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much.  This mirrors similar code in
PreCommit_Portals().

- PL/Perl

Gets new functions spi_commit() and spi_rollback()

- PL/pgSQL

Gets new commands COMMIT and ROLLBACK.

Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.

- PL/Python

Gets new functions plpy.commit and plpy.rollback.

- PL/Tcl

Gets new commands commit and rollback.
---
 doc/src/sgml/plperl.sgml   |  49 +
 doc/src/sgml/plpgsql.sgml  |  91 
 doc/src/sgml/plpython.sgml |  39 
 doc/src/sgml/pltcl.sgml|  41 
 doc/src/sgml/ref/call.sgml |   7 +
 doc/src/sgml/ref/create_procedure.sgml |   7 +
 doc/src/sgml/ref/do.sgml   |   7 +
 doc/src/sgml/spi.sgml  | 177 +++
 src/backend/commands/functioncmds.c|  45 +++-
 src/backend/executor/spi.c |  99 -
 src/backend/tcop/utility.c |   6 +-
 src/backend/utils/mmgr/portalmem.c |  49 +++--
 src/include/commands/defrem.h  |   4 +-
 src/include/executor/spi.h |   7 +
 src/include/executor/spi_priv.h|   4 +
 src/include/nodes/nodes.h  |   3 +-
 src/include/nodes/parsenodes.h |   7 +
 src/include/utils/portal.h |   1 +
 src/pl/plperl/GNUmakefile  |   2 

[HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-16 Thread Yoshimi Ichiyanagi
Hi.

These patches enable the use of the Persistent Memory Development Kit (PMDK)[1]
for reading/writing WAL logs on persistent memory (PMEM).
PMEM is next-generation storage and it has a number of nice features:
it is fast, byte-addressable and non-volatile.

Using pgbench, a general PostgreSQL benchmark, the postgres server
with the patches applied is about 5% faster than the original server.
And using my insert benchmark, it is up to 90% faster than the original one.
I will describe these details later.


This e-mail describes the following:
A) About PMDK
B) About the patches
C) The way of running benchmarks using the patches, and the results


A) About PMDK
PMDK provides functions that allow an application to access PMEM directly
as memory, without going through the kernel, for high-speed access to PMEM.
The following APIs are available through PMDK.
A-1. APIs to open a file on PMEM, to create a file on PMEM,
 and to map a file on PMEM to virtual addresses
A-2. APIs to read/write data from and to a file on PMEM


A-1. APIs to open a file on PMEM, to create a file on PMEM,
 and to map a file on PMEM to virtual addresses

PMDK provides these APIs using the DAX filesystem (DAX FS)[2] feature.

DAX FS is a PMEM-aware file system which allows direct access
to PMEM without using the kernel page caches. A file in DAX FS can
be mapped to memory using standard calls like mmap() on Linux.
Furthermore, by mapping the file on PMEM to virtual addresses (and
after any initial minor page faults that may be required to create
the mappings in the MMU), applications can access PMEM
using CPU load/store instructions instead of read/write system calls.


A-2. APIs to read/write data from and to a file on PMEM

PMDK provides memcpy()-like APIs that copy data to PMEM
using single-instruction, multiple-data (SIMD) instructions[3] and
NT store instructions[4]. These instructions improve the performance
of copying data to PMEM. As a result, using these APIs is faster than
using read/write system calls.


[1] http://pmem.io/pmdk/
[2] 
https://www.usenix.org/system/files/login/articles/login_summer17_07_rudoff.pdf
[3] SIMD: a SIMD instruction operates on all loaded data in a single
operation. If the SIMD unit loads eight values into registers at once,
the store operation to PMEM happens for all eight values
at the same time.
[4] NT store instructions: NT store instructions bypass the CPU cache,
so using these instructions does not require a flush.
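As a rough illustration of how these APIs fit together (a sketch only, not code
taken from the patches; it uses the public libpmem functions pmem_map_file,
pmem_memcpy_nodrain, pmem_drain, pmem_msync and pmem_unmap):

/* Sketch: map a file on a DAX filesystem and persist one write with libpmem.
 * Illustration only, not code from the patches. */
#include <libpmem.h>
#include <string.h>

int
write_record(const char *path, const char *buf, size_t len)
{
    size_t  mapped_len;
    int     is_pmem;
    void   *dest;

    /* Create/open and map the file; on a DAX FS this is a plain mmap(). */
    dest = pmem_map_file(path, 16 * 1024 * 1024, PMEM_FILE_CREATE,
                         0600, &mapped_len, &is_pmem);
    if (dest == NULL)
        return -1;

    if (is_pmem)
    {
        /* SIMD/NT-store based copy; the flush is deferred ... */
        pmem_memcpy_nodrain(dest, buf, len);
        pmem_drain();           /* ... and completed here */
    }
    else
    {
        /* Fall back to an ordinary copy plus msync() on non-PMEM storage. */
        memcpy(dest, buf, len);
        pmem_msync(dest, len);
    }

    pmem_unmap(dest, mapped_len);
    return 0;
}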


B) About the patches
Changes by the patches:
0001-Add-configure-option-for-PMDK.patch:
- Added "--with-libpmem" configure option to execute I/O with PMDK library

0002-Read-write-WAL-files-using-PMDK.patch:
- Added PMDK implementation for WAL I/O operations
- Added "pmem-drain" to the wal_sync_method parameter list
  to write logs synchronously on PMEM

0003-Walreceiver-WAL-IO-using-PMDK.patch:
- Added PMDK implementation for Walreceiver of secondary server processes



C) The way of running benchmarks using the patches, and the results
It's the following:

Experimental setup
Server: HP ProLiant DL360 Gen9
CPU:Xeon E5-2667 v4 (3.20GHz); 2 processors(without HT)
DRAM:   DDR4-2400; 32 GiB/processor
(8GiB/socket x 4 sockets/processor) x 2 processors
NVDIMM: DDR4-2133; 32 GiB/processor
(8GiB/socket x 4 sockets/processor) x 2 processors
HDD:Seagate Constellation2 2.5inch SATA 3.0. 6Gb/s 1TB 7200rpm x 1
OS: Ubuntu 16.04, linux-4.12
DAX FS: ext4
NVML:   master@Aug 30, 2017
PostgreSQL: master
Note: I bound the postgres processes to one NUMA node, 
  and the benchmarks to the other NUMA node.


C-1. Configuring PMEM for using as a block device
# ndctl list
# ndctl create-namespace -f -e namespace0.0 --mode=memory -M dev

C-2. Creating a file system on PMEM, and mounting it with DAX
# mkfs.ext4 /dev/pmem0
# mount -t ext4 -o dax /dev/pmem0 /mnt/pmem0

C-3. Setting PMEM_IS_PMEM_FORCE to indicate that the WAL files are stored
 on PMEM
Note: If this environment variable is not set, the postgres processes
  will not start.
# export PMEM_IS_PMEM_FORCE=1

C-4. Installing PostgreSQL
Note: There are 3 important things in installing PostgreSQL.
a. Executing "./configure --with-libpmem" to link libpmem
b. Setting WAL directory on PMEM
c. Modifying wal_sync_method parameter in postgresql.conf from fdatasync
   to pmem_drain

# cd /path/to/[PG_source dir]
# ./configure --with-libpmem
# make && make install
# initdb /path/to/PG_DATA -X /mnt/pmem0/path/to/[PG_WAL dir]
# sed -e 's/#wal_sync_method = fsync/wal_sync_method = pmem_drain/' \
    /path/to/PG_DATA/postgresql.conf > /path/to/PG_DATA/postgresql.conf.tmp
# mv /path/to/PG_DATA/postgresql.conf.tmp /path/to/PG_DATA/postgresql.conf
# pg_ctl start -D /path/to/PG_DATA
# createdb [DB_NAME]

C-5. Running the 2 benchmarks(1. pgbench, 2. my insert benchmark)
C-5-1. pgbench
# numactl -N 1 pgbench -c 32 -j 8 -T 120 -M prepared [DB_NAME]

The averages of running pgbench three 

jdbc targetServerType=slave with readonly connection

2018-01-16 Thread Vladimír Houba ml .
Hello,

the slave replication server does not seem to be recognized correctly when
using a read-only jdbc connection and targetServerType=preferSlave.

Sample conn str
jdbc:postgresql://master,slave/up?user=***=***=require=***=preferSlave=true=true

From the docs I understand that since writes are not allowed in read-only
transactions, the jdbc driver thinks it is connected to a slave. I think
the master/slave check should be corrected, or made configurable.

Docs:
The master/slave distinction is currently done by observing if the server
allows writes
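
For illustration, that observation presumably boils down to a probe along these
lines (the exact query is in the pgjdbc sources; this is only to make the
failure mode concrete):

-- presumed host-classification probe
SHOW transaction_read_only;
--   off -> host treated as a master (writes allowed)
--   on  -> host treated as a slave, which is also what a primary reports
--         once the session is read-only (e.g. readOnly=true or
--         SET default_transaction_read_only = on)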

Thank you
Vladimir


Re: General purpose hashing func in pgbench

2018-01-16 Thread Ildar Musin
Hi Fabien,


13/01/2018 11:16, Fabien COELHO wrote:
>
> Hello Ildar,
>
>>> so that different instances of hash function within one script would
>>> have different seeds. Yes, that is a good idea, I can do that.
>>>
>> Added this feature in attached patch. But on a second thought this could
>> be something that user won't expect. For example, they may want to run
>> pgbench with two scripts:
>> - the first one updates row by key that is a hashed random_zipfian
>> value;
>> - the second one reads row by key generated the same way
>> (that is actually what YCSB workloads A and B do)
>>
>> It feels natural to write something like this:
>> \set rnd random_zipfian(0, 100, 0.99)
>> \set key abs(hash(:rnd)) % 1000
>> in both scripts and expect that they both would have the same
>> distribution. But they wouldn't. We could of course describe this
>> implicit behaviour in documentation, but ISTM that shared seed would be
>> more clear.
>
> I think that it depends on the use case, that both can be useful, so
> there should be a way to do either.
>
> With "always different" default seed, distinct distributions are achieved
> with:
>
>    -- DIFF distinct seeds inside and between runs
>    \set i1 abs(hash(:r1)) % 1000
>    \set j1 abs(hash(:r2)) % 1000
>
> and the same distribution can be done with an explicit seed:
>
>    -- DIFF same seed inside and between runs
>    \set i1 abs(hash(:r1, 5432)) % 1000
>    \set j1 abs(hash(:r2, 5432)) % 1000
>
> The drawback is that the same seed is used between runs in this case,
> which is not desirable. This could be circumvented by adding the
> random seed as an automatic variable and using it, eg:
>
>    -- DIFF same seed inside run, distinct between runs
>    \set i1 abs(hash(:r1, :random_seed + 5432)) % 1000
>    \set j1 abs(hash(:r2, :random_seed + 2345)) % 1000
>
>
> Now with a shared hash_seed the same distribution is by default:
>
>    -- SHARED same underlying hash_seed inside run, distinct between runs
>    \set i1 abs(hash(:r1)) % 1000
>    \set j1 abs(hash(:r2)) % 1000
>
> However some trick is needed now to get distinct seeds. With
>
>    -- SHARED distinct seed inside run, but same between runs
>    \set i1 abs(hash(:r1, 5432)) % 1000
>    \set j1 abs(hash(:r2, 2345)) % 1000
>
> We are back to the same issue as the previous case because then the
> distribution is the same from one run to the next, which is not
> desirable. I found this workaround trick:
>
>    -- SHARED distinct seeds inside and between runs
>    \set i1 abs(hash(:r1, hash(5432))) % 1000
>    \set j1 abs(hash(:r2, hash(2345))) % 1000
>
> Or with a new :hash_seed or :random_seed automatic variable, we could
> also have:
>
>    -- SHARED distinct seeds inside and between runs
>    \set i1 abs(hash(:r1, :hash_seed + 5432)) % 1000
>    \set j1 abs(hash(:r2, :hash_seed + 2345)) % 1000
>
> It provides controllable distinct seeds between runs but equal one
> between if desired, by reusing the same value to be hashed as a seed.
>
> I also agree with your argument that the user may reasonably expect
> that hash(5432) == hash(5432) inside and between scripts, at least on
> the same run, so would be surprised that it is not the case.
>
> So I've changed my mind, I'm sorry for making you going back and forth
> on the subject. I'm now okay with one shared 64 bit hash seed, with a
> clear documentation about the fact, and an outline of the trick to
> achieve distinct distributions inside a run if desired and why it
> would be desirable to avoid correlations. Also, I think that providing
> the seed as automatic variable (:hash_seed or :hseed or whatever)
> would make some sense as well. Maybe this could be used as a way to
> fix the seed explicitly, eg:
>
>    pgbench -D hash_seed=1234 ...
>
> Would use this value instead of the random generated one. Also, with
> that the default inserted second argument could be simply
> ":hash_seed", which would simplify the executor which would not have
> to do check for an optional second argument.
>
Here is a new version of the patch. I've split it into two parts. The
first one is almost the same as v4 from [1] with some refactoring. The
second part introduces the random_seed variable as you proposed. I didn't do
the executor simplification thing yet because I'm a little concerned
about inventive users, who may want to change the random_seed variable at
runtime (which is possible since pgbench doesn't have read-only
variables, aka constants, AFAIK).

[1]
https://www.postgresql.org/message-id/43a8fbbb-32fa-6478-30a9-f64041adf019%40postgrespro.ru

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..c575f19 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1246,6 +1246,27 @@ pgbench  options 
 d
5
   
   
+   hash(a [, 
seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+ 

Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread Andrew Dunstan


On 01/16/2018 08:50 AM, REIX, Tony wrote:
> Hi Michael
>
> You said:
>
>> Setting up a buildfarm member with the combination of compiler and
>> environment where you are seeing the failures would be the best answer
>> in my opinion:
>> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
>>
>> This does not require special knowledge of PostgreSQL internals, and the
>> in-core testing framework has improved the last couple of years to allow
>> for more advanced tests. I do use it as well for some tests on my own
>> modules (company stuff). The buildfarm code has also followed the pace,
>> which really helps a lot, thanks to Andrew Dunstan.
>>
>> Developers and committers are more pro-active if they can see automated
>> tests failing in the central community place. And buildfarm animals
>> usually don't stay red for more than a couple of days.
> Hum... I quickly read this HowTo and I did not find any explanation about the
> "protocol"
> used for exchanging data between my VM and the PostgreSQL BuildFarm.
> My machine is behind firewalls and has restricted access to the outside.
> Either I'll see when that does not work... or I can get some information
> about which port
> (or anything else) I have to ask to be opened, if needed.
> Anyway, I'll read it in depth now and I'll try to implement it.
>
>
>


Communication is only done via outbound port 443 (https). There are no
passwords required and no inbound connections, ever. Uploads are signed
using a shared secret. Communication can be via a proxy. If you need
the client to use a proxy with git that's a bit more complex, but possible.

Ping me if you need help setting this up.

cheers

andrew

-- 

Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE:[HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread REIX, Tony
Hi Michael

You said:

> Setting up a buildfarm member with the combination of compiler and
> environment where you are seeing the failures would be the best answer
> in my opinion:
> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
> 
> This does not require special knowledge of PostgreSQL internals, and the
> in-core testing framework has improved the last couple of years to allow
> for more advanced tests. I do use it as well for some tests on my own
> modules (company stuff). The buildfarm code has also followed the pace,
> which really helps a lot, thanks to Andrew Dunstan.
> 
> Developers and committers are more pro-active if they can see automated
> tests failing in the central community place. And buildfarm animals
> usually don't stay red for more than a couple of days.

Hum... I quickly read this HowTo and I did not find any explanation about the
"protocol"
used for exchanging data between my VM and the PostgreSQL BuildFarm.
My machine is behind firewalls and has restricted access to the outside.
Either I'll see when that does not work... or I can get some information about 
which port
(or anything else) I have to ask to be opened, if needed.
Anyway, I'll read it in depth now and I'll try to implement it.


About the random error we see: I guess that I may see it, even though the PostgreSQL
BuildFarm AIX VMs do not,
because I'm using a not-too-small VM with a variable number of Physical Processing Units
(CPUs) since my VM is uncapped
(I may use all physical CPUs if available): up to 4 physical processors and up
to 8 virtual processors.
And, on the BuildFarm, I do not see any details about the logical/physical
configuration of the AIX VMs, like hornet.
Being able to run real concurrent parallel stress programs, which requires a
multi-physical-CPU VM, would help.

Regards,

Tony



Re: ALTER TABLE ADD COLUMN fast default

2018-01-16 Thread Andrew Dunstan


On 01/16/2018 12:13 AM, David Rowley wrote:
> On 2 January 2018 at 05:01, Andrew Dunstan
>  wrote:
>> New version of the patch that fills in the remaining piece in
>> equalTupleDescs.
> This no longer applies to current master. Can you send an updated patch?
>


Yeah, got caught by the bki/pg_attribute changes on Friday. Here's an
updated version. Thanks for looking.
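
To restate what the feature buys in user-visible terms, here is a tiny
illustration (table and column names are made up, not taken from the patch or
its tests):

-- With the patch, adding a column with a non-volatile DEFAULT is a
-- metadata-only change: existing rows are not rewritten, and readers get
-- the stored "missing value" (attmissingval) on the fly.
CREATE TABLE t (id int);
INSERT INTO t SELECT generate_series(1, 1000000);

ALTER TABLE t ADD COLUMN flag int DEFAULT 42;              -- no table rewrite

-- A volatile default still requires rewriting the table, as before:
ALTER TABLE t ADD COLUMN marker float8 DEFAULT random();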

cheers

andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3f02202..d5dc14a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1150,6 +1150,18 @@
  
 
  
+  atthasmissing
+  bool
+  
+  
+This column has a value which is used where the column is entirely
+missing from the row, as happens when a column is added after the
+row is created. The actual value used is stored in the
+attmissingval  column.
+  
+ 
+
+ 
   attidentity
   char
   
@@ -1229,6 +1241,18 @@
   
  
 
+ 
+  attmissingval
+  bytea
+  
+  
+This column has a binary representation of the value used when the column
+is entirely missing from the row, as happens when the column is added after
+the row is created. The value is only used when
+atthasmissing is true.
+  
+ 
+
 

   
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 7bcf242..780bead 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1115,26 +1115,28 @@ ALTER TABLE [ IF EXISTS ] name

 

-When a column is added with ADD COLUMN, all existing
-rows in the table are initialized with the column's default value
-(NULL if no DEFAULT clause is specified).
-If there is no DEFAULT clause, this is merely a metadata
-change and does not require any immediate update of the table's data;
-the added NULL values are supplied on readout, instead.
+ When a column is added with ADD COLUMN and a
+ non-volatile DEFAULT is specified, the default
+ is evaluated at the time of the statement and the result stored in the
+ table's metadata. That value will be used for the column for
+ all existing rows. If no DEFAULT is specified
+ NULL is used. In neither case is a rewrite of the table required.

 

-Adding a column with a DEFAULT clause or changing the type of
-an existing column will require the entire table and its indexes to be
-rewritten.  As an exception when changing the type of an existing column,
-if the USING clause does not change the column
-contents and the old type is either binary coercible to the new type or
-an unconstrained domain over the new type, a table rewrite is not needed;
-but any indexes on the affected columns must still be rebuilt.  Adding or
-removing a system oid column also requires rewriting the entire
-table.  Table and/or index rebuilds may take a significant amount of time
-for a large table; and will temporarily require as much as double the disk
-space.
+ Adding a column with a volatile DEFAULT or
+ changing the type of
+ an existing column will require the entire table and its indexes to be
+ rewritten.  As an exception when changing the type of an existing column,
+ if the USING clause does not change the column
+ contents and the old type is either binary coercible to the new type or
+ an unconstrained domain over the new type, a table rewrite is not needed;
+ but any indexes on the affected columns must still be rebuilt.  Adding or
+ removing a system oid column also requires rewriting
+ the entire table.
+ Table and/or index rebuilds may take a significant amount of time
+ for a large table; and will temporarily require as much as double the disk
+ space.

 

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 0a13251..5ea3f1c 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -76,6 +76,116 @@
  * 
  */
 
+/*
+ * Return the missing value of an attribute, or NULL if
+ * there is no missing value.
+ */
+static Datum
+getmissingattr(TupleDesc tupleDesc,
+			   int attnum, bool *isnull)
+{
+	int			missingnum;
+	Form_pg_attribute att;
+	AttrMissing *attrmiss;
+
+	Assert(attnum <= tupleDesc->natts);
+	Assert(attnum > 0);
+
+	att = TupleDescAttr(tupleDesc, attnum - 1);
+
+	if (att->atthasmissing)
+	{
+		Assert(tupleDesc->constr);
+		Assert(tupleDesc->constr->num_missing > 0);
+
+		attrmiss = tupleDesc->constr->missing;
+
+		/*
+		 * Look through the tupledesc's attribute missing values
+		 * for the one that corresponds to this attribute.

Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread Michael Paquier
On Tue, Jan 16, 2018 at 08:25:51AM +, REIX, Tony wrote:
> My team and my company (ATOS/Bull) are involved in improving the
> quality of PostgreSQL on AIX.

Cool to hear that!

> We have AIX 6.1, 7.1, and 7.2 Power8 systems, with several
> logical/physical processors. And I plan to have a more powerful (more
> processors) machine for running PostgreSQL stress tests. 
> A DB-expert colleague has started to write some new not-too-complex
> stress tests that we'd like to submit to PostgreSQL project later.
> For now, using latest versions of XLC 12 (12.1.0.19) and 13 (13.1.3.4
> with a patch), we have only (on AIX 6.1 and 7.2) one remaining random
> failure (dealing with src/bin/pgbench/t/001_pgbench.pl test), for
> PostgreSQL 9.6.6 and 10.1 . And, on AIX 7.1, we have one more
> remaining failure that may be due to some other dependent
> software. Investigating. 
> XLC 13.1.3.4 shows an issue with -O2 and I have a work-around that
> fixes it in ./src/backend/parser/gram.c . We have opened a PMR
> (defect) against XLC. Note that our tests are now executed without the 
> PG_FORCE_DISABLE_INLINE "inline" trick in src/include/port/aix.h that
> suppresses the inlining of routines on AIX. I think that older
> versions of XLC have shown issues that have now disappeared (or, at
> least, many of them). 
> I've been able to compare PostgreSQL compiled with XLC vs GCC 7.1 and,
> using times outputs provided by PostgreSQL tests, XLC seems to provide
> at least 8% more speed. We also plan to run professional performance
> tests in order to compare PostgreSQL 10.1 on AIX vs Linux/Power. I saw
> some 2017 performance slides, made with older versions of PostgreSQL
> and XLC, that show bad PostgreSQL performance on AIX vs Linux/Power,
> and I cannot believe it. We plan to investigate this.

That's interesting investigation. The community is always interested in
such stories. You could have material for a conference talk.

> Though I have very very little skills about PostgreSQL (I'm porting
> too now GCC Go on AIX), we can help, at least by
> compiling/testing/investigating/stressing in a different AIX
> environment than the AIX ones (32/64bit, XLC/GCC) you have in your
> BuildFarm. 

Setting up a buildfarm member with the combination of compiler and
environment where you are seeing the failures would be the best answer
in my opinion:
https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto

This does not require special knowledge of PostgreSQL internals, and the
in-core testing framework has improved the last couple of years to allow
for more advanced tests. I do use it as well for some tests on my own
modules (company stuff). The buildfarm code has also followed the pace,
which really helps a lot, thanks to Andrew Dunstan.

Developers and committers are more pro-active if they can see automated
tests failing in the central community place. And buildfarm animals
usually don't stay red for more than a couple of days.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] Improve geometric types

2018-01-16 Thread Kyotaro HORIGUCHI
Hello,

I'm still wandering along the way and confused. Sorry in advance
for inconsistent comments X-(

At Sun, 14 Jan 2018 13:20:57 +0100, Emre Hasegeli  wrote in 

Re: proposal: alternative psql commands quit and exit

2018-01-16 Thread Geoff Winkless
On 15 January 2018 at 17:53, Tom Lane  wrote:
> Geoff Winkless  writes:
>> At this point it depends quite how far down the rabbit-hole you want
>> to go to stop people googling "how do I exit psql", I suppose :p
>
> Well, I concur with Robert's comment upthread that we don't want to
> print any advice that's possibly wrong.  So I'd rather provide hints
> that are sure to work, rather than hints that might represent the
> way to get out with fewest keystrokes, but could go wrong in unusual
> situations.

A quick PoC.

I should say upfront that I've no axe to grind over this, if no-one
likes the idea I don't mind: I'm not sure I like it myself (it's quite
an aggressive stance to take, I think) - I just wanted to see if it
would work, and provide it as a working possibility.

Basically, I intercept every keypress before we send it to readline.
If it's 4 (Ctrl-D) we replace the buffer with \q which then quits,
even if there's something in the buffer. So simply sending "Ctrl-D to
quit" would always be true. Everything else just gets passed through
to readline directly.
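
For readers without the attachment, here is a sketch of how such interception
can be done via readline's rl_getc_function hook (illustrative only, not the
attached patch; the function name intercepting_getc is made up):

/* Illustrative sketch, not the attached patch: route every keypress through
 * a custom rl_getc_function and turn Ctrl-D (ASCII 4) into "\q". */
#include <stdio.h>
#include <readline/readline.h>

static int
intercepting_getc(FILE *stream)
{
    int         c = rl_getc(stream);    /* default keypress reader */

    if (c == 4)                         /* Ctrl-D, whatever the buffer holds */
    {
        rl_replace_line("\\q", 0);      /* overwrite the current line buffer */
        return '\n';                    /* accept the line -> psql sees \q */
    }
    return c;                           /* anything else goes to readline */
}

/* during input setup:  rl_getc_function = intercepting_getc;  */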

Would need additional work to make the same thing work when readline
is disabled. I decided it wouldn't be worth the effort if the basic
idea of always intercepting Ctrl-D doesn't gain traction.

It's patched against 9.5.1, purely because that's the version I
already had source built for on my devbox. Again, if the idea is one
that people like I'm happy to rebase it against head.

I've also no idea whether Ctrl-D would be 4 in other locales.

I also don't know how much of the boilerplate code from the readline
docs is required, eg the sigwinch stuff.

Geoff


input.c.patch
Description: Binary data


Re: [HACKERS] UPDATE of partition key

2018-01-16 Thread Amit Khandekar
On 16 January 2018 at 09:17, David Rowley  wrote:
> On 16 January 2018 at 01:09, Robert Haas  wrote:
>> On Sun, Jan 14, 2018 at 6:57 AM, Amit Khandekar  
>> wrote:
>>> Even where partitions are present, in the usual case where there are
>>> Instead of a bool array, we can even make it a Bitmapset. But I think
>>> access would become slower as compared to array, particularly because
>>> it is going to be a heavily used function.
>>
>> It probably makes little difference -- the Bitmapset will be more
>> compact (which saves time) but involve function calls (which cost
>> time).
>
> I'm not arguing in either direction, but you'd also want to factor in
> how Bitmapsets only allocate words for the maximum stored member,
> which might mean multiple realloc() calls resulting in palloc/memcpy
> calls. The array would just be allocated in a single chunk, although
> it would be more memory and would require a memset too, however,
> that's likely much cheaper than the palloc() anyway.

Right. I agree. And there is also a function call just to know whether it is
required or not. Overall, especially because the data structure will be
used heavily whenever it is set up, I think it's better to make it an array.
In the latest patch, I have retained it as an array.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: Setting BLCKSZ 4kB

2018-01-16 Thread Giuseppe Broccolo
Hi Sanyam,

Interesting topic!

2018-01-16 7:50 GMT+01:00 sanyam jain :

> Hi,
>
> I am trying to solve WAL flooding due to FPWs.
>
>
> What are the cons of setting BLCKSZ as 4kB?
>
>
> When I saw the results published on
> http://blog.coelho.net/database/2014/08/17/postgresql-page-size-for-SSD-2.html
>
> 4kB page is giving better performance in comparison to 8kB except when
> tested with 15kB row size.
>
>
> Does turning off FPWs will be safe if BLCKSZ is set to 4kB given page size
> of file system is 4kB?
>

There is this interesting article of Tomas Vondra:

https://blog.2ndquadrant.com/on-the-impact-of-full-page-writes/

that explains some consequences of turning off full_page_writes. If I
understood correctly, turning off full_page_writes with BLCKSZ set to 4kB
can significantly reduce the amount of WAL produced, but you cannot be sure
that you are completely safe just because a PostgreSQL page fits entirely
in a 4kB file system page, though modern file systems are less vulnerable to
partial writes.
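
As a reference point, the page size itself is a compile-time choice; a minimal
sketch of the knobs being discussed here, with placeholder paths (whether to
actually disable full_page_writes remains your own risk assessment):

# build PostgreSQL with 4kB pages (value is in kB; the default is 8)
./configure --with-blocksize=4
make && make install

# postgresql.conf, only if you conclude the 4kB-page argument holds:
# full_page_writes = off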

In the article, Tomas focuses on the fact that most full page writes
happen right after a checkpoint: proper checkpoint tuning can help
reduce the amount of writes to storage while safely keeping
full_page_writes enabled.

Giuseppe.


TOAST table created for partitioned tables

2018-01-16 Thread Amit Langote
Hi.

I used to think that $subject didn't happen, but it actually does and ends
up consuming a fixed 8192 bytes on the disk.

create table p (a int[]) partition by list (a);
CREATE TABLE

select pg_table_size('p');
 pg_table_size
---------------
          8192
(1 row)

select  pg_relation_size(c1.oid) as p_size,
        pg_relation_size(c1.reltoastrelid) as p_toast_heap_size,
        pg_relation_size(c2.oid) as p_toast_index_size
from    pg_class c1, pg_class c2, pg_index i
where   c1.relname = 'p' and
        c1.reltoastrelid = i.indrelid and
        c2.oid = i.indexrelid;
 p_size | p_toast_heap_size | p_toast_index_size
--------+-------------------+--------------------
      0 |                 0 |               8192
(1 row)

I think we should prevent this, a fix for which is implemented by the
attached patch.

Thanks,
Amit
From b84ee9f5aed9519d5976dc1a6c8f501d0122a686 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 16 Jan 2018 19:08:11 +0900
Subject: [PATCH v1] Do not create TOAST table for partitioned tables

---
 src/backend/catalog/toasting.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 0b4b5631a1..5e84f28201 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -393,6 +393,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid 
toastIndexOid,
  * (1) there are any toastable attributes, and (2) the maximum length
  * of a tuple could exceed TOAST_TUPLE_THRESHOLD.  (We don't want to
  * create a toast table for something like "f1 varchar(20)".)
+ * No need to create a TOAST table for partitioned tables.
  */
 static bool
 needs_toast_table(Relation rel)
@@ -404,6 +405,9 @@ needs_toast_table(Relation rel)
int32   tuple_length;
int i;
 
+   if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+   return false;
+
tupdesc = rel->rd_att;
 
for (i = 0; i < tupdesc->natts; i++)
-- 
2.11.0



Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2018-01-16 Thread Yuto Hayamizu
On Sun, Jan 7, 2018 at 8:25 AM, Stephen Frost  wrote:
>
> Greetings,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
> > On Thu, Nov 9, 2017 at 12:33 PM, Ashutosh Bapat
> >  wrote:
> > > Looking at order_qual_clauses(), we can say that a set of quals q1
> > >  qn are ordered the same irrespective of the set of clauses they
> > > are subset of. E.g. if {q1 .. qn} is subset of Q (ordered as Qo) and
> > > also Q' (ordered as Q'o) the order in which they appear in Qo and Q'o
> > > is same. So, even if different paths segregate the clauses in
> > > different set, within each set the order is same. FWIW, we can order
> > > all clauses in largest set once and use that order every time. Albeit
> > > we will have to remember the order somewhere OR make the separator
> > > routine retain the order in the larger set, which I guess is true
> > > about all separator functions.
> >
> > I am not sure what to think about this patch, so moved to next CF. The
> > patch still applies. Hayamizu-san, it would be nice as well if you
> > could review other's patches. One patch reviewed for one patch
> > submitted, with equal difficulty. You should also get a community
> > account so as it is possible to add your name as an author of this
> > patch.
>
> While this patch does still apply, it doesn't pass the 'make check'
> regression tests and there appears to be concern raised up-thread about
> the performance impact.
>
> Yuto Hayamizu, it looks like some good questions were raised and which
> need to be addressed, in addition to making sure that the patch at least
> passes 'make check', so I'm leaving this as waiting-for-author.  If you
> believe the patch is no longer viable, we can change it to 'returned
> with feedback', otherwise, an updated patch and some responses to the
> questions raised would be great as we're already a week into this
> commitfest and this thread has been dormant since near the start of the
> last CF.
>
> Thanks!
>
> Stephen


Thank you for your kind guidance, and sorry for the late reply.

The attached patch fixes the regression tests, and it now passes 'make check'.
Since this patch changes cost estimation in set_baserel_size_estimates,
the chosen query plans changed for some tests, so I've updated their
expected EXPLAIN results.

I'm going to answer raised questions by replying to each message soon.

regards,

Yuto Hayamizu


Mitigate-filter-cost-overestimation-v2.patch
Description: Binary data


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-16 Thread Prabhat Sahu
Hi all,

I have been continuing to test the parallel create index patch. So far
I haven't come across any issues or regressions with the patches.
Here are a few performance numbers for the latest round of testing, which
was performed on top of the 6th Jan patch submitted by Peter.

Testing was done on an OpenStack instance with:

CPU: 8
RAM: 16GB
HD: 640 GB

postgres=# select pg_size_pretty(pg_total_relation_size('lineitem'));
 pg_size_pretty
----------------
 93 GB
(1 row)

-- Test 1.
max_parallel_workers_maintenance = 2
max_parallel_workers = 16
max_parallel_workers_per_gather = 8
maintenance_work_mem = 1GB
max_wal_size = 4GB

-- Test 2.
max_parallel_workers_maintenance = 4
max_parallel_workers = 16
max_parallel_workers_per_gather = 8
maintenance_work_mem = 2GB
max_wal_size = 4GB

-- Test 3.
max_parallel_workers_maintenance = 8
max_parallel_workers = 16
max_parallel_workers_per_gather = 8
maintenance_work_mem = 4GB
max_wal_size = 4GB

NOTE: All the time taken entries are the median of 3 consecutive runs for
the same B-tree index creation query.

Time taken for Parallel Index creation (Test 1/2/3 were run with
max_parallel_workers_maintenance = 2/4/8 respectively; "change" is the
improvement with the patch):

Index on "bigint" column:
CREATE INDEX li_ordkey_idx1 ON lineitem(l_orderkey);
  Test 1: without patch 1062446.462 ms (17:42.446), with patch 1024972.273 ms (17:04.972), change 3.52 %
  Test 2: without patch 1053468.945 ms (17:33.469), with patch 896375.543 ms (14:56.376), change 17.75 %
  Test 3: without patch 1082920.703 ms (18:02.921), with patch 932550.058 ms (15:32.550), change 13.88 %

Index on "integer" column:
CREATE INDEX li_lineno_idx2 ON lineitem(l_linenumber);
  Test 1: without patch 1538285.499 ms (25:38.285), with patch 1201008.423 ms (20:01.008), change 21.92 %
  Test 2: without patch 1529837.023 ms (25:29.837), with patch 1014188.489 ms (16:54.188), change 33.70 %
  Test 3: without patch 1642160.947 ms (27:22.161), with patch 978518.253 ms (16:18.518), change 40.41 %

Index on "numeric" column:
CREATE INDEX li_qty_idx3 ON lineitem(l_quantity);
  Test 1: without patch 3968102.568 ms (01:06:08.103), with patch 2359304.405 ms (39:19.304), change 40.54 %
  Test 2: without patch 4129510.930 ms (01:08:49.511), with patch 1680201.644 ms (28:00.202), change 59.31 %
  Test 3: without patch 4348248.210 ms (01:12:28.248), with patch 1490461.879 ms (24:50.462), change 65.72 %

Index on "character" column:
CREATE INDEX li_lnst_idx4 ON lineitem(l_linestatus);
  Test 1: without patch 1510273.931 ms (25:10.274), with patch 1240265.301 ms (20:40.265), change 17.87 %
  Test 2: without patch 1516842.985 ms (25:16.843), with patch 995730.092 ms (16:35.730), change 34.35 %
  Test 3: without patch 1580789.375 ms (26:20.789), with patch 984975.746 ms (16:24.976), change 37.69 %

Index on "date" column:
CREATE INDEX li_shipdt_idx5 ON lineitem(l_shipdate);
  Test 1: without patch 1483603.274 ms (24:43.603), with patch 1189704.930 ms (19:49.705), change 19.80 %
  Test 2: without patch 1498348.925 ms (24:58.349), with patch 1040421.626 ms (17:20.422), change 30.56 %
  Test 3: without patch 1653651.499 ms (27:33.651), with patch 1016305.794 ms (16:56.306), change 38.54 %

Index on "character varying" column:
CREATE INDEX li_comment_idx6 ON lineitem(l_comment);
  Test 1: without patch 6945953.838 ms (01:55:45.954), with patch 4329696.334 ms (01:12:09.696), change 37.66 %
  Test 2: without patch 6818556.437 ms (01:53:38.556), with patch 2834034.054 ms (47:14.034), change 58.43 %
  Test 3: without patch 6942285.711 ms (01:55:42.286), with patch 2648430.902 ms (44:08.431), change 61.85 %

Composite index on "numeric", "character" columns:
CREATE INDEX li_qtylnst_idx34 ON lineitem (l_quantity, l_linestatus);
  Test 1: without patch 4961563.400 ms (01:22:41.563), with patch 2959722.178 ms (49:19.722), change 40.34 %
  Test 2: without patch 5242809.501 ms (01:27:22.810), with patch 2077463.136 ms (34:37.463), change 60.37 %
  Test 3: without patch 5576765.727 ms (01:32:56.766), with patch 1755829.420 ms (29:15.829), change 68.51 %

Composite index on "date", "character varying" columns:
CREATE INDEX li_shipdtcomment_idx56 ON lineitem (l_shipdate, l_comment);
  Test 1: without patch 4693318.077 ms (01:18:13.318), with patch 3181494.454 ms (53:01.494), change 32.21 %
  Test 2: without patch 4627624.682 ms (01:17:07.625), with patch 2613289.211 ms (43:33.289), change 43.52 %
  Test 3: without patch 4719242.965 ms (01:18:39.243), with patch 2685516.832 ms (44:45.517), change 43.09 %


Thanks & Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984

www.enterprisedb.com


On Tue, Jan 16, 2018 at 6:24 AM, Peter Geoghegan  wrote:

> On Fri, Jan 12, 2018 at 10:28 AM, Robert Haas 
> wrote:
> > More comments:
>
> Attached patch has all open issues worked through, including those
> that I respond to or comment on below, as well as the other feedback
> from your previous e-mails. Note also that I fixed the issue that Amit
> raised, as well as the GetOldestXmin()-argument bug that I noticed in
> passing when responding to Amit. I also worked on the attribution in
> the commit message.
>
> Before getting to my responses to your most recent round of feedback,
> I want to first talk about some refactoring that I decided to do. As
> you can see from the master branch, tuplesort_performsort() isn't
> necessarily reached for spool2, even when we start out with a spool2
> (that is, for many unique index builds, spool2 never even does a
> tuplesort_performsort()). We may instead decide to shut down spool2
> when it has no (dead) tuples. I made this work just as well for the
> parallel case in this latest revision. I had to teach tuplesort.c to
> accept an early tuplesort_end() for LEADER() -- it had to be prepared
> to release still-waiting 

Re: [HACKERS] Restricting maximum keep segments by repslots

2018-01-16 Thread Kyotaro HORIGUCHI
I'm digressing...

At Mon, 15 Jan 2018 21:45:34 -0500, Tom Lane  wrote in 
<26718.1516070...@sss.pgh.pa.us>
> Robert Haas  writes:
> > Since the "Stripping trailing CRs from patch" message is totally
> > harmless, I'm not sure why you should need to devote any effort to
> > avoiding it.  Anyone who gets it should just ignore it.

I know that and totally agree with Robert, but I still wonder why
I sometimes receive such complaints (and am annoyed by them), or even an
accusation that I sent an out-of-convention patch, and I was
afraid that my setting is not actually common.

For this reason I roughly counted the Content-Type/Content-Transfer-Encoding
(CT/CTE) combinations that people here use for patches in my mailbox and got
the following numbers. (Counted on attachments named "*.patch/diff".)

Rank : Freq : CT/CTE
1:  3308: application/octet-stream:base64
2:  1642: text/x-patch;charset=us-ascii:base64
3:  1286: text/x-diff;charset=us-ascii:7bit
*   4:   997: text/x-patch;charset=us-ascii:7bit
5:   497: text/x-diff;charset=us-ascii:base64
6:   406: text/x-diff:quoted-printable
7:   403: text/plain;charset=us-ascii:7bit
8:   389: text/x-diff:base64
9:   321: application/x-gzip:base64
   10:   281: text/plain;charset=us-ascii:base64

Total: attachments=11461 / mails=158121

The most common setting is application/octet-stream:base64 but
text/x-patch;charset=us-ascii:7bit is also one of ... the majority?

I'm convinced that my original setting is not so problematic so I
reverted it.

> Not sure, but that might be another situation in which "patch"
> works and "git apply" doesn't.  (Feeling too lazy to test it...)

I was also afraid of that, as I wrote upthread, but that too seems to be a
needless fear.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-16 Thread Jeevan Chalke
On Tue, Jan 16, 2018 at 3:41 AM, Robert Haas  wrote:

> On Thu, Jan 11, 2018 at 6:00 AM, Jeevan Chalke
>  wrote:
> >  Attached new set of patches adding this. Only patch 0007 (main patch)
> and
> > 0008 (testcase patch) has changed.
> >
> > Please have a look and let me know if I missed any.
>
> I spent a little time studying 0001 and 0002 today, as well as their
> relation with 0007.  I find the way that the refactoring has been done
> slightly odd.  With 0001 and 0002 applied, we end up with three
> functions for creating aggregate paths: create_partial_agg_path, which
> handles the partial-path case for both sort and hash;
> create_sort_agg_path, which handles the sort case for non-partial
> paths only; and create_hash_agg_path, which handles the hash case for
> non-partial paths only.  This leads to the following code in 0007:
>
> +   /* Full grouping paths */
> +
> +   if (try_parallel_aggregation)
> +   {
> +   Assert(extra->agg_partial_costs &&
> extra->agg_final_costs);
> +   create_partial_agg_path(root, input_rel,
> grouped_rel, target,
> +
>  partial_target, extra->agg_partial_costs,
> +
>  extra->agg_final_costs, gd, can_sort,
> +
>  can_hash, (List *) extra->havingQual);
> +   }
> +
> +   if (can_sort)
> +   create_sort_agg_path(root, input_rel,
> grouped_rel, target,
> +
> partial_target, agg_costs,
> +
> extra->agg_final_costs, gd, can_hash,
> +
> dNumGroups, (List *) extra->havingQual);
> +
> +   if (can_hash)
> +   create_hash_agg_path(root, input_rel,
> grouped_rel, target,
> +
> partial_target, agg_costs,
> +
> extra->agg_final_costs, gd, dNumGroups,
> +(List
> *) extra->havingQual);
>
> That looks strange -- you would expect to see either "sort" and "hash"
> cases here, or maybe "partial" and "non-partial", or maybe all four
> combinations, but seeing three things here looks surprising.  I think
> the solution is just to create a single function that does both the
> work of create_sort_agg_path and the work of create_hash_agg_path
> instead of having two separate functions.
>

In the existing code (in create_grouping_paths()), I see the following pattern:
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)

And thus, I created three functions to match the existing pattern.

I will make your suggested change, that is, merge create_sort_agg_path() and
create_hash_agg_path(). I will name that function
create_sort_and_hash_agg_paths().


>
> A related thing that is also surprising is that 0007 manages to reuse
> create_partial_agg_path for both the isPartialAgg and non-isPartialAgg
> cases -- in fact, the calls appear to be identical, and could be
> hoisted out of the "if" statement


Yes. We can do that as well and I think it is better too.
I was just trying to preserve the existing pattern. So for PWA I chose:
if partialAgg
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)
else fullAgg
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)

But since the if (try_parallel_aggregation) case is exactly the same, I will
pull it out of the if..else.



> -- but create_sort_agg_path and
> create_hash_agg_path do not get reused.  I think you should see
> whether you can define the new combo function that can be used for
> both cases.  The logic looks very similar, and I'm wondering why it
> isn't more similar than it is; for instance, create_sort_agg_path
> loops over the input rel's pathlist, but the code for
> isPartialAgg/can_sort seems to consider only the cheapest path.  If
> this is correct, it needs a comment explaining it, but I don't see why
> it should be correct.
>

Oops. My mistake. Missed. We should loop over the input rel's pathlist.

Yep. With the above change, the logic is very similar except that
(1) the isPartialAgg/can_sort case creates the partial paths and
(2) the finalization step is not needed at this stage.

I think it can be done by passing a flag to create_sort_agg_path() (or the new
combo function) and making appropriate adjustments. Do you think the addition of
this new flag should go in the refactoring patch or the main PWA patch?
I think the refactoring patch.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


RE:[HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread REIX, Tony
Hi Michael,

My team and my company (ATOS/Bull) are involved in improving the quality of 
PostgreSQL on AIX.

We have AIX 6.1, 7.1, and 7.2 Power8 systems, with several logical/physical 
processors.
And I plan to have a more powerful (more processors) machine for running 
PostgreSQL stress tests.
A DB-expert colleague has started to write some new not-too-complex stress 
tests that we'd like to submit to PostgreSQL project later.
For now, using latest versions of XLC 12 (12.1.0.19) and 13 (13.1.3.4 with a 
patch), we have only (on AIX 6.1 and 7.2) one remaining random failure (dealing 
with src/bin/pgbench/t/001_pgbench.pl test), for PostgreSQL 9.6.6 and 10.1 . 
And, on AIX 7.1, we have one more remaining failure that may be due to some 
other dependent software. Investigating.
XLC 13.1.3.4 shows an issue with -O2 and I have a work-around that fixes it in 
./src/backend/parser/gram.c . We have opened a PMR (defect) against XLC.
Note that our tests are now executed without the PG_FORCE_DISABLE_INLINE 
"inline" trick in src/include/port/aix.h that suppresses the inlining of 
routines on AIX. I think that older versions of XLC have shown issues that have 
now disappeared (or, at least, many of them).
I've been able to compare PostgreSQL compiled with XLC vs GCC 7.1 and, using 
times outputs provided by PostgreSQL tests, XLC seems to provide at least 8% 
more speed. We also plan to run professional performance tests in order to 
compare PostgreSQL 10.1 on AIX vs Linux/Power. I saw some 2017 performance 
slides, made with older versions of PostgreSQL and XLC, that show bad 
PostgreSQL performance on AIX vs Linux/Power, and I cannot believe it. We plan 
to investigate this.

Though I have very very little skills about PostgreSQL (I'm porting too now GCC 
Go on AIX), we can help, at least by compiling/testing/investigating/stressing 
in a different AIX environment than the AIX ones (32/64bit, XLC/GCC) you have 
in your BuildFarm.
Let me know how we can help.

Regards,

Regards,

Tony Reix

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net


From: Michael Paquier [michael.paqu...@gmail.com]
Sent: Tuesday, 16 January 2018 08:12
To: Noah Misch
Cc: Heikki Linnakangas; Konstantin Knizhnik; PostgreSQL Hackers; Bernd Helmle
Subject: Re: [HACKERS] Deadlock in XLogInsert at AIX

On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> On Wed, Feb 01, 2017 at 02:39:25PM +0200, Heikki Linnakangas wrote:
>> @@ -73,11 +73,19 @@ pg_atomic_compare_exchange_u32_impl(volatile 
>> pg_atomic_uint32 *ptr,
>>  static inline uint32
>>  pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
>>  {
>> +uint32  ret;
>> +
>>  /*
>> - * __fetch_and_add() emits a leading "sync" and trailing "isync", 
>> thereby
>> - * providing sequential consistency.  This is undocumented.
>> + * Use __sync() before and __isync() after, like in compare-exchange
>> + * above.
>>   */
>> -return __fetch_and_add((volatile int *) &ptr->value, add_);
>> +__sync();
>> +
>> +ret = __fetch_and_add((volatile int *) &ptr->value, add_);
>> +
>> +__isync();
>> +
>> +return ret;
>>  }
>
> Since this emits double syncs with older xlc, I recommend instead replacing
> the whole thing with inline asm.  As I opined in the last message of the
> thread you linked above, the intrinsics provide little value as abstractions
> if one checks the generated code to deduce how to use them.  Now that the
> generated code is xlc-version-dependent, the port is better off with
> compiler-independent asm like we have for ppc in s_lock.h.

Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
past versions? I think that it would make the code more understandable
than directly listing the instructions.
bug reports from Tony Reix who has been working on AIX with XLC 13.1 and
that this thread got lost in the wild, I have added an entry in the next
CF:
https://commitfest.postgresql.org/17/1484/

As Heikki is not around these days, Noah, could you provide a new
version of the patch? This bug has been around for some time now, it
would be nice to move on.. I think I could have written patches myself,
but I don't have an AIX machine at hand. Of course not with XLC 13.1.
--
Michael