Re: NaNs in numeric_power (was Re: Postgres 11 release notes)

2018-05-16 Thread Dean Rasheed
On 15 May 2018 at 22:55, Tom Lane  wrote:
> David Rowley  writes:
>> On 16 May 2018 at 02:01, Tom Lane  wrote:
>>> I'm not particularly fussed about getting credit for that.  However,
>>> looking again at how that patch series turned out --- ie, that
>>> we ensured POSIX behavior for NaNs only in HEAD --- I wonder
>>> whether we shouldn't do what was mentioned in the commit log for
>>> 6bdf1303, and teach numeric_pow() about these same special cases.
>>> It seems like it would be more consistent to change both functions
>>> for v11, rather than letting that other shoe drop in some future
>>> major release.
>
>> I'm inclined to agree. It's hard to imagine that having these two functions
>> behave differently with regard to NaN input would be useful to anyone.
>
> Here's a proposed patch for that.
>

In the case 1 ^ NaN = 1, what should the result scale be?

For other inputs, the result scale is at least as large as the scale
of the first input, so I would suggest that the same ought to be the
case here.

Otherwise, this looks fine, and I agree that it makes things neater /
more consistent.

Regards,
Dean



Re: [HACKERS] Planning counters in pg_stat_statements

2018-05-16 Thread Dmitry Ivanov

I am not actively working on this now, but I'll come back to it for
PG12 if you or Lukas don't beat me to it, and I'll help/test/review if
you do.  It seems there is plenty of demand for the feature and I'll
be very happy to see it.


Good to know, thanks!

I'd argue that it might be better to add a new argument to pg_plan_query()
and pg_plan_queries(), and a new field to QueryDesc, i.e.:

PlannedStmt *
pg_plan_query(Query *querytree,
  int cursorOptions,
  ParamListInfo boundParams,
  double *planningTime)

List *
pg_plan_queries(List *querytrees,
int cursorOptions,
ParamListInfo boundParams,
double *planningTime) /* total time as in BuildCachedPlan() */

The measured time can later be passed to QueryDesc via PortalDefineQuery().
Of course, this requires more changes, but the result might be worth it.
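
To illustrate, a minimal caller-side sketch under the proposed signature (the
planningTime argument is part of this proposal, not an existing API; the
instr_time macros are the usual ones from portability/instr_time.h):

PlannedStmt *
pg_plan_query(Query *querytree, int cursorOptions,
              ParamListInfo boundParams, double *planningTime)
{
    PlannedStmt *plan;
    instr_time  start;
    instr_time  duration;

    INSTR_TIME_SET_CURRENT(start);

    plan = planner(querytree, cursorOptions, boundParams);

    if (planningTime != NULL)
    {
        INSTR_TIME_SET_CURRENT(duration);
        INSTR_TIME_SUBTRACT(duration, start);
        /* accumulate, so pg_plan_queries() can sum over its whole list */
        *planningTime += INSTR_TIME_GET_MILLISEC(duration);
    }

    return plan;
}

pg_plan_queries() would simply pass the same pointer down for each statement,
producing a total, as in BuildCachedPlan().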


What do you think?


So who does the actual timing work in this model?  Does the core code
do the timing, but it'd be disabled by default because NULL is usually
passed in, and you need to register an extension that provides a place
to stick the result in order to turn on the time-measuring code?  If
you mean that it's still the extension that does the timing, it seems
strange to have struct members and parameter arguments for something
specific that the core code doesn't understand.


Right, I guess my proposal was too brief. Yes, my idea is that the core code
should do the timing, which might be disabled by passing NULL to those
functions. However, given that there are lots of people out there who use
pg_stat_statements on a regular basis, it might be reasonable to just
turn it on unconditionally (my assumptions might be wrong, of course).
Alternatively, we could introduce a new GUC variable to disable this feature.


The extensions would no longer have to do the accounting.


As a more general version of that, I wondered about having some kind
of associative data structure (hash table, assoc list, etc) that would
somehow travel everywhere with the PlannedStmt, but not inside it,
just for the use of extensions.  Then planning time could be stored in
there by a planner hook, and then fished out by any other hook that
knows the same key (not sure how you manage that key space but I'm
sure you could come up with something).  That could have uses for
other extensions too, and could also be used for the "query ID", which
is, after all, the model that the abandoned planning_time member was
following.  Just a thought.
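
To make that concrete, a throwaway sketch of such a side-channel -- nothing
below exists in PostgreSQL today, the type and function names are invented
for illustration:

#include "postgres.h"
#include "nodes/pg_list.h"

/* one extension-owned entry, keyed by an agreed-upon string */
typedef struct PlanSideDatum
{
    const char *key;            /* e.g. "pg_stat_statements.planning_time" */
    void       *value;          /* extension-owned payload */
} PlanSideDatum;

/* store a value under a key; returns the (possibly new) list */
static List *
plan_side_put(List *side, const char *key, void *value)
{
    PlanSideDatum *d = palloc(sizeof(PlanSideDatum));

    d->key = key;
    d->value = value;
    return lappend(side, d);
}

/* fetch a value by key; NULL if no hook stored anything under it */
static void *
plan_side_get(List *side, const char *key)
{
    ListCell   *lc;

    foreach(lc, side)
    {
        PlanSideDatum *d = (PlanSideDatum *) lfirst(lc);

        if (strcmp(d->key, key) == 0)
            return d->value;
    }
    return NULL;
}

A planner hook would plan_side_put() the planning time, a later hook that
knows the same key would plan_side_get() it, and the key space could be
managed by convention (extension-qualified names, say).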


Perhaps we could change planner() so that it would return a pointer to some
struct holding the PlannedStmt and a List of some nodes or structs for accounting.


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2018-05-16 Thread Mark Rofail
I was wondering if anyone knows the proper way to write a benchmarking test
for the @>> operator. I used the below script in my previous attempt
https://gist.github.com/markrofail/174ed370a2f2ac24800fde2fc27e2d38
The script does the following steps:

1. Generate Table A with 5 rows
2. Generate Table B with n rows, such that:
every row of Table B is an array of IDs referencing rows in Table A.

The tests we ran previously had Table B up to 1 million rows and the
results can be seen here :
https://www.postgresql.org/message-id/CAJvoCusMuLnYZUbwTBKt%2Bp6bB9GwiTqF95OsQFHXixJj3LkxVQ%40mail.gmail.com

How would we change this so it would be more exhaustive and accurate?

Regards,
Mark Rofail


Re: Postgres 11 release notes

2018-05-16 Thread Etsuro Fujita

(2018/05/16 4:45), Bruce Momjian wrote:

On Tue, May 15, 2018 at 02:26:33PM +0900, Etsuro Fujita wrote:

(2018/05/12 0:08), Bruce Momjian wrote:

I have committed the first draft of the Postgres 11 release notes.  I
will add more markup soon.  You can view the most current version here:

http://momjian.us/pgsql_docs/release-11.html

I expect a torrent of feedback.  ;-)

On a personal note, I want to apologize for not adequately championing
two features that I think have strategic significance but didn't make it
into Postgres 11:  parallel FDW access and improved multi-variate
statistics.  I hope to do a better job during Postgres 12.


Thanks for compiling this, Bruce!

I found a copy-pasto in this:

 Below you will find a detailed account of the changes between
 PostgreSQL  10 and the previous major
 release.

I think the PG version should be 11, not 10.  Patch attached.


Ah, seems I missed that one, thanks.


Thank you.

Best regards,
Etsuro Fujita



Re: postgres_fdw: Oddity in pushing down inherited UPDATE/DELETE joins to remote servers

2018-05-16 Thread Etsuro Fujita
(2018/05/14 9:45), Amit Langote wrote:
> On 2018/05/11 21:48, Etsuro Fujita wrote:
>> (2018/05/11 16:19), Amit Langote wrote:
>>> On 2018/05/11 16:12, Amit Langote wrote:
 Just to clarify, does this problem only arise because there is a pushed
 down join involving the child?  That is, does the problem only occur as of
 the following commit:

 commit 1bc0100d270e5bcc980a0629b8726a32a497e788
 Author: Robert Haas
 Date:   Wed Feb 7 15:34:30 2018 -0500

   postgres_fdw: Push down UPDATE/DELETE joins to remote servers.

 In other words, do we need to back-patch this up to 9.5 which added
 foreign table inheritance?
>>>
>>> Oops, it should have been clear by the subject line that the problem
>>> didn't exist before that commit.  Sorry.
>>
>> No.  In theory, I think we could consider this as an older bug added in
>> 9.5, because in case of inherited UPDATE/DELETE, the PlannerInfo passed
>> to PlanForeignModify doesn't match the one the FDW saw at Path creation
>> time, as you mentioned in a previous email, while in case of
>> non-inherited UPDATE/DELETE, the PlannerInfo passed to that function
>> matches the one the FDW saw at that time.  I think that's my fault :(.
> 
> Ah, I see.  Thanks for clarifying.
> 
>> But considering there seems to be no field reports on that, I don't
>> think we need back-patching up to 9.5.
> 
> Yeah, that might be fine, although it perhaps wouldn't hurt to have the
> code match in all branches.

I don't object to back-patching.  Should I remove this from the open
items list for PG11?

Best regards,
Etsuro Fujita



Re: postgres_fdw: Oddity in pushing down inherited UPDATE/DELETE joins to remote servers

2018-05-16 Thread Amit Langote
Fujita-san,

On 2018/05/16 18:35, Etsuro Fujita wrote:
> (2018/05/14 9:45), Amit Langote wrote:
>> On 2018/05/11 21:48, Etsuro Fujita wrote:
>>> (2018/05/11 16:19), Amit Langote wrote:
 On 2018/05/11 16:12, Amit Langote wrote:
> Just to clarify, does this problem only arise because there is a pushed
> down join involving the child?  That is, does the problem only occur as of
> the following commit:
>
> commit 1bc0100d270e5bcc980a0629b8726a32a497e788
> Author: Robert Haas
> Date:   Wed Feb 7 15:34:30 2018 -0500
>
>   postgres_fdw: Push down UPDATE/DELETE joins to remote servers.
>
> In other words, do we need to back-patch this up to 9.5 which added
> foreign table inheritance?

 Oops, it should have been clear by the subject line that the problem
 didn't exist before that commit.  Sorry.
>>>
>>> No.  In theory, I think we could consider this as an older bug added in
>>> 9.5, because in case of inherited UPDATE/DELETE, the PlannerInfo passed
>>> to PlanForeignModify doesn't match the one the FDW saw at Path creation
>>> time, as you mentioned in a previous email, while in case of
>>> non-inherited UPDATE/DELETE, the PlannerInfo passed to that function
>>> matches the one the FDW saw at that time.  I think that's my fault :(.
>>
>> Ah, I see.  Thanks for clarifying.
>>
>>> But considering there seems to be no field reports on that, I don't
>>> think we need back-patching up to 9.5.
>>
>> Yeah, that might be fine, although it perhaps wouldn't hurt to have the
>> code match in all branches.
> 
> I don't object to back-patching.  Should I remove this from the open
> items list for PG11?

Perhaps keep it on the page, but in the Older Bugs section?

Thanks,
Amit




Re: Postgres 11 release notes

2018-05-16 Thread Heikki Linnakangas

On 16/05/18 07:22, Michael Paquier wrote:

On Mon, May 14, 2018 at 08:45:44PM -0400, Bruce Momjian wrote:

What TLS does is to mix the offered ciphers into the negotiation hash so
a man-in-the-middle can't pretend it doesn't support something.  Could
we do something like that here?


I have to admit that I don't quite follow here; the shape of channel
binding data is decided by RFC 5929, so we need to stick with it.


I have to question the value of man-in-the-middle protection that is so
easily bypassed.


Well, the backend does its job, and answers based on what the client
wants to do.  But what you are questioning here is the handling of
authentication downgrade attempts from a server by libpq, which is a
different problem, larger than just channel binding as it relates as
well to MD5/SCRAM interactions.  For example, it is perfectly possible
to implement downgrade protections for any drivers which speak the
protocol, like JDBC, even with a v11 backend.


I have to agree with Bruce, that it's pretty useless to implement 
channel binding, if there is no way to require it in libpq. IMHO that 
must be fixed.


It's true that even if libpq doesn't implement it, other drivers like 
JDBC could. Good for them, but that still sucks for libpq.


- Heikki



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-16 Thread Etsuro Fujita

(2018/05/14 15:34), Ashutosh Bapat wrote:

On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapat
  wrote:

On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita
  wrote:
Here's the query.

explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b
where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2;


Thanks!


Yet another case where pull_var_clause produces an error:

postgres=# create table list_pt (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table list_ptp1 partition of list_pt for values in (1);
CREATE TABLE
postgres=# alter table list_ptp1 add constraint list_ptp1_check check
(list_ptp1::list_pt::text != NULL);
ERROR:  ConvertRowtypeExpr found where not expected

The patch kept the flags passed to pull_var_clause in
src/backend/catalog/heap.c as-is, but I think we need to modify that
accordingly.  Probably, the flags passed to pull_var_clause in CreateTrigger
as well.


With all the support that we have added in partitioning for v11, I
think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places,
which were left unchanged in the earlier versions of patches, which
were written when all that support wasn't in v11.


Actually, we'd get the same error without using anything in 
partitioning.  Here is an example:


postgres=# create table parent (a int, b text);
CREATE TABLE
postgres=# create table child (a int, b text);
CREATE TABLE
postgres=# alter table child inherit parent ;
ALTER TABLE
postgres=# alter table child add constraint child_check check 
(child::parent::text != NULL);

ERROR:  ConvertRowtypeExpr found where not expected


Having said that, I started to think this approach would make the code much
more complicated.  Another idea I came up with to avoid that would be to use
pull_var_clause as-is and then rewrite wholerows of partitions back to
ConvertRowtypeExprs, translating those wholerows to wholerows of their
parent tables' rowtypes if necessary.  That would be annoying and a little
bit inefficient, but the places where we would need to rewrite are limited:
prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC.


Another reason why we can't use your approach is that we cannot decide
whether ConvertRowtypeExpr is needed by just looking at the Var node.
You might say that we add a ConvertRowtypeExpr if the Var::varno
references a child relation. But that's not true. For example, a whole-row
reference embedded in a ConvertRowtypeExpr added by query
adjustments in inheritance_planner() is not a child's whole-row
reference in the adjusted query.


Sorry, I don't understand this fully.


So, it's not clear to me when we add
a ConvertRowtypeExpr and when we don't.


I think it'd be OK to rewrite so at least in prepare_sort_from_pathkeys 
and build_tlist_to_deparse.



I am not sure if those are the only
two places which require a fix. Right now it looks like those are the
places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true
in the future, esp. given the amount of work we expect to happen in
the partitioning area. When that happens, fixing all those places in
that way wouldn't be good.


I have to admit that the approach I proposed is a band-aid fix.


So we could
really minimize the code change and avoid the additional overhead caused by
the is_converted_whole_row_reference test added to pull_var_clause.  I think
the latter would be rather important, because PWJ is disabled by default.
What do you think about that approach?


Partition-wise join and partition-wise aggregation are disabled by
default temporarily. We will change it in near future once the memory
consumption issue has been tackled. The features are useful and users
would want those to be enabled by default like other query
optimizations. So, we can't take a decision based on that.


Yeah, I really agree on that point!  However, considering that 
pull_var_clause is used in many places where partitioning is *not* 
involved, I still think it's better to avoid spending extra cycles 
needed only for partitioning in that function.



Here's set of updated patches.


Thanks for updating the patch!

Here are some other minor comments on patch 
0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:


+   /* Construct the conversion map. */
+   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);

I think it'd be better to pass -1, not 0, as the typmod argument for 
that function because that would be consistent with other places where 
we use that function with named rowtypes (eg, ruleutils.c).


-is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
+is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
+   int *relno, int *colno)

-1 for that change, because 1) we use "var" to imply many things, as in
e.g. pull_var_clause, and 2) I think keeping the name as-is would make
back-patching easier.


Other than that the patch set looks good to me.

Best regards,
Etsuro Fujita



Re: postgres_fdw: Oddity in pushing down inherited UPDATE/DELETE joins to remote servers

2018-05-16 Thread Etsuro Fujita
(2018/05/16 18:40), Amit Langote wrote:
> On 2018/05/16 18:35, Etsuro Fujita wrote:
>> I don't object to back-patching.  Should I remove this from the open
>> items list for PG11?
> 
> Perhaps, keep on the page but in Older Bugs section?

OK, I'll move it to Older Bugs on that page and add it to the next commitfest.

Best regards,
Etsuro Fujita



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-16 Thread Alexander Korotkov
On Wed, May 16, 2018 at 1:41 AM, Tom Lane  wrote:

> Nikita Glukhov  writes:
> > Experimenting with the new pluggable storage API, I found that
> amcanbackward
> > flag is not checked in build_index_paths() before
> > build_index_pathkeys(... BackwardScanDirection) call when we are building
> > paths for ORDER BY.  And this flag is even not copied into IndexOptInfo
> struct.
> > Obviously, this can lead to misuse of Backward Index [Only] Scan plans.
>
> > Attached patch with the corresponding fix.
>
> I think this patch is based on a misunderstanding of what amcanbackward
> means.  We *require* indexes that claim to support amcanorder to support
> scanning in either direction.  What amcanbackward is about is whether
> the index can support reversing direction mid-scan, as would be required
> to support FETCH FORWARD followed by FETCH BACKWARD in a cursor.  That's
> actually independent of whether the index can implement a defined
> ordering; see for example the hash AM, which sets amcanbackward but not
> amcanorder.
>
> This is documented, though perhaps not sufficiently clearly, in
> indexam.sgml:
>
> The amgettuple function has a direction argument, which can be either
> ForwardScanDirection (the normal case) or BackwardScanDirection. If
> the first call after amrescan specifies BackwardScanDirection, then
> the set of matching index entries is to be scanned back-to-front
> rather than in the normal front-to-back direction, so amgettuple must
> return the last matching tuple in the index, rather than the first one
> as it normally would. (This will only occur for access methods that
> set amcanorder to true.) After the first call, amgettuple must be
> prepared to advance the scan in either direction from the most
> recently returned entry. (But if amcanbackward is false, all
> subsequent calls will have the same direction as the first one.)
>

Thank you for clarifying this point.  We've missed that.

Perhaps there is a case for adding an additional flag to allow specifying
> "I can't support ORDER BY DESC", but I'm not in a big hurry to do so.
> I think there would be more changes than this needed to handle such a
> restriction, anyway.
>

OK, got it.  We'll probably propose a patch implementing that to the
next commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Optimze usage of immutable functions as relation

2018-05-16 Thread Aleksandr Parfenov
Hello,

I reworked the patch to make it more stable in different cases. I decided to
use simplify_function instead of eval_const_expressions to prevent
inlining of the function. The only possible outputs of
simplify_function are a Const node and NULL.

Also, I block pre-evaluation of functions with types other than
TYPTYPE_BASE, because there is no special logic for compound (and other)
values yet.

There is still a problem with a memory leak in the case of simplified
arguments. The only way I see is to create a temporary memory context,
but that costs some performance. Maybe we could store simplified arguments
in the pointed-to function itself for later use. But eval_const_expressions
and friends don't change the contents of the nodes inside the tree; they
generate new nodes and return them as a result.
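
For what it's worth, the temporary-context idea would look roughly like this
(a sketch only, reusing the simplify_function call from the attached patch;
args must not be touched after the context is deleted):

{
    MemoryContext tmpcxt;
    MemoryContext oldcxt;
    Expr       *simple;

    tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
                                   "simplify_function temp context",
                                   ALLOCSET_DEFAULT_SIZES);
    oldcxt = MemoryContextSwitchTo(tmpcxt);

    simple = simplify_function(expr->funcid,
                               expr->funcresulttype,
                               exprTypmod((Node *) expr),
                               expr->funccollid,
                               expr->inputcollid,
                               &args,
                               expr->funcvariadic,
                               true,
                               false,
                               context);

    /* copy the surviving node into the caller's context, drop the rest */
    MemoryContextSwitchTo(oldcxt);
    if (simple != NULL)
        simple = copyObject(simple);
    MemoryContextDelete(tmpcxt);
}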

The last point to mention is the fixed plan for the query in the initial
letter of the thread. As I mentioned before, new versions of the patch
replace the Var not with a function call, but with the function's execution
result. After the patch, the following plan is used instead of a Nested
Loop with a Sequence Scan:

explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from messages, 
to_tsquery('english', 'tuple&header&overhead') q where body_tsvector @@ q limit 
10;
 QUERY PLAN

 Limit  (cost=224.16..266.11 rows=3 width=36)
   ->  Nested Loop  (cost=224.16..266.11 rows=3 width=36)
 ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=0)
 ->  Bitmap Heap Scan on messages  (cost=224.16..266.04 rows=3 
width=275)
   Recheck Cond: (body_tsvector @@ '''tupl'' & ''header'' & 
''overhead'''::tsquery)
   ->  Bitmap Index Scan on message_body_idx  (cost=0.00..224.16 
rows=3 width=0)
 Index Cond: (body_tsvector @@ '''tupl'' & ''header'' & 
''overhead'''::tsquery)

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 505ae0af85..2c9983004a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3655,6 +3655,59 @@ eval_const_expressions_mutator(Node *node,
 	  context);
 			}
 			break;
+		case T_Var:
+			if (context->root && context->root->parse->rtable)
+			{
+Var		   *var;
+Query	   *query;
+RangeTblEntry *pointedNode;
+
+var = (Var *)node;
+query = context->root->parse;
+
+if (var->varlevelsup != 0 || var->varattno != 1)
+	break;
+
+pointedNode = list_nth(query->rtable, var->varno - 1);
+Assert(IsA(pointedNode, RangeTblEntry));
+
+if (pointedNode->rtekind == RTE_FUNCTION && list_length(pointedNode->functions) == 1)
+{
+	Form_pg_type type_form;
+	Node	   *result;
+	RangeTblFunction *tblFunction = linitial_node(RangeTblFunction, pointedNode->functions);
+	FuncExpr   *expr = (FuncExpr *) tblFunction->funcexpr;
+	List	   *args = expr->args;
+	HeapTuple type_tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(expr->funcresulttype));
+
+	if (!HeapTupleIsValid(type_tuple))
+		elog(ERROR, "cache lookup failed for type %u", expr->funcresulttype);
+
+	type_form = (Form_pg_type) GETSTRUCT(type_tuple);
+
+	if (type_form->typtype != TYPTYPE_BASE)
+	{
+		ReleaseSysCache(type_tuple);
+		break;
+	}
+
+	result = simplify_function(expr->funcid,
+			   expr->funcresulttype,
+			   exprTypmod(expr),
+			   expr->funccollid,
+			   expr->inputcollid,
+			   &args,
+			   expr->funcvariadic,
+			   true,
+			   false,
+			   context);
+
+	ReleaseSysCache(type_tuple);
+
+	if (result) /* successfully simplified it */
+		return (Node *) result;
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cbc882d47b..eb92f556d8 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3021,23 +3021,23 @@ select * from
   int4(sin(1)) q1,
   int4(sin(0)) q2
 where q1 = thousand or q2 = thousand;
-   QUERY PLAN   
-
- Hash Join
-   Hash Cond: (tenk1.twothousand = int4_tbl.f1)
+   QUERY PLAN
+-
+ Nested Loop
->  Nested Loop
- ->  Nested Loop
-   ->  Function Scan on q1
-   ->  Function Scan on q2
- ->  Bitmap Heap Scan on tenk1
-   Recheck Cond: ((q1.q1 = thousand) OR (q2.q2 = thousand))
-   ->  BitmapOr
- ->  Bitmap Index Scan on tenk1_thou

Re: Postgres 11 release notes

2018-05-16 Thread Heikki Linnakangas

* Allow bytes to be specified for server variable sizes (Beena Emerson)

The specification is "B".


I had to dig up the commit in the git history to figure out what this is.
I'd suggest rewording this:


* Allow server options related to memory and file sizes to be specified
as a number of bytes.


The new unit is "B". This is in addition to "kB", "MB", "GB" and "TB", 
which were accepted previously.


- Heikki



Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-16 Thread Arthur Zakirov
Hello,

On Tue, May 15, 2018 at 05:02:43PM -0400, Robert Haas wrote:
> On Tue, Mar 27, 2018 at 8:19 AM, Arthur Zakirov
>  wrote:
> > Yes, there is dsm_pin_mapping() for this. But it is necessary to keep a
> > segment even if there are no attached processes. From 0003:
> >
> > +   /* Remain attached until end of postmaster */
> > +   dsm_pin_segment(seg);
> > +   /* Remain attached until end of session */
> > +   dsm_pin_mapping(seg);
> 
> I don't quite understand the problem you're trying to solve here, but:
> 
> 1. Unless dsm_pin_segment() is called, a DSM segment will
> automatically be removed when there are no remaining mappings.
> 
> 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped
> when the currently-in-scope resource owner is cleaned up, like at the
> end of the query.  If it is called, then the mapping will stick around
> until the backend exits.

I tried to solve the case where a DSM segment remains mapped even after a
dictionary was dropped. It may happen in the following situation:

Backend 1:

=# select ts_lexize('english_shared', 'test');
-- The dictionary is loaded into DSM, the segment and the mapping is
pinned
...
-- Call ts_lexize() from backend 2 below
=# drop text search dictionary english_shared;
-- The segment and the mapping is unpinned, see ts_dict_shmem_release()

Backend 2:

=# select ts_lexize('english_shared', 'test');
-- The dictionary got from DSM, the mapping is pinned
...
-- The dictionary was dropped by backend 1, but the mapping still is
pinned

As you can see, the DSM segment is still pinned by backend 2. Later I fixed
it by checking whether we need to unpin segments. In the current version of
the patch do_ts_dict_shmem_release() is called in
lookup_ts_dictionary_cache(). It unpins segments if the text search cache
was invalidated. It unpins all segments, but I think that is ok since
text search changes should be infrequent.

> If you pin the mapping or the segment and later no longer want it
> pinned, there are dsm_unpin_mapping() and dsm_unpin_segment()
> functions available, too.  So it seems like what you might want to do
> is pin the segment when it's created, and then unpin it if it's
> stale/obsolete.  The latter won't remove it immediately, but will once
> all the mappings are gone.

Yes, dsm_unpin_mapping() and dsm_unpin_segment() will be called when the
dictionary is dropped or altered in the current version of the patch. I
described the approach above.


In sum, I think the problem is mostly solved. Backend 2 unpins the
segment in the next ts_lexize() call. But if backend 2 doesn't call
ts_lexize() (or another TS function) anymore, the segment will remain mapped.
That is the only problem I see for now.

I hope the description is clear. I attached the rebased patch. 

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index 6f5b635413..09297e384c 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char 
*flag)
Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + 
strlen(word) + 1);
strcpy(Conf->Spell[Conf->nspell]->word, word);
Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-   ? cpstrdup(Conf, flag) : VoidString;
+   ? tmpstrdup(flag) : VoidString;
Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, 
CompoundAffixFlag *entry,
entry->flag.i = i;
}
else
-   entry->flag.s = cpstrdup(Conf, s);
+   entry->flag.s = tmpstrdup(s);
 
entry->flagMode = Conf->flagMode;
entry->value = val;
@@ -1541,6 +1543,9 @@ nextline:
return;
 
 isnewformat:
+   pfree(recoded);
+   pfree(pstr);
+
if (oldformat)
ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 56ede37089..8dd4959028 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-   List   *dictoptions = (List *) PG_GETARG_POINTER(0);
+   DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
DictInt*d;
ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
d->maxlen = 6;
d->rejectlong = false;
 
-   foreach(l, dictoptions)
+   foreach(l, init_data->dict_options)
{
DefElem*defel = (DefElem *) lfirst(l);
 
d

Re: Postgres 11 release notes

2018-05-16 Thread Michael Paquier
On Wed, May 16, 2018 at 01:09:07PM +0300, Heikki Linnakangas wrote:
> I have to agree with Bruce, that it's pretty useless to implement channel
> binding, if there is no way to require it in libpq. IMHO that must be
> fixed.

Wouldn't we want to also do something for the case where a client is
willing to use SCRAM but the server forces MD5 instead?  In that
case, one possibility is a connection parameter like the following,
named say authmethod:
- An empty value is equivalent to the current behavior, and is the
default.
- 'scram' means that the client is willing to use SCRAM, which would cause a
failure if the server attempts to enforce MD5.
- 'scram-plus' means that the client enforces SCRAM and channel binding.

Or we could just have a channel_binding_mode, which has a "require"
value like sslmode, and a "prefer" mode, which is the default and the
current behavior...  Still, what to do with MD5 requests in this case?
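
For what it's worth, from the application side it could then look something
like this (a sketch; "authmethod" and "scram-plus" are only the names
proposed above, not existing libpq connection options):

#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

int
main(void)
{
    /* refuse to connect unless the server does SCRAM with channel binding */
    PGconn     *conn = PQconnectdb("host=db.example.com dbname=postgres "
                                   "user=app authmethod=scram-plus");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    PQfinish(conn);
    return EXIT_SUCCESS;
}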
--
Michael




Re: Global snapshots

2018-05-16 Thread Stas Kelvich


> On 15 May 2018, at 15:53, Robert Haas  wrote:
> 
> Actually, I think if we're going to pursue that approach, we ought to
> back off a bit from thinking about global snapshots and think about
> what kind of general mechanism we want.  For example, maybe you can
> imagine it like a message bus, where there are a bunch of named
> channels on which the server publishes messages and you can listen to
> the ones you care about.  There could, for example, be a channel that
> publishes the new system-wide globalxmin every time it changes, and
> another channel that publishes the wait graph every time the deadlock
> detector runs, and so on.  In fact, perhaps we should consider
> implementing it using the existing LISTEN/NOTIFY framework: have a
> bunch of channels that are predefined by PostgreSQL itself, and set
> things up so that the server automatically begins publishing to those
> channels as soon as anybody starts listening to them.  I have to
> imagine that if we had a good mechanism for this, we'd get all sorts
> of proposals for things to publish.  As long as they don't impose
> overhead when nobody's listening, we should be able to be fairly
> accommodating of such requests.
> 
> Or maybe that model is too limiting, either because we don't want to
> broadcast to everyone but rather send specific messages to specific
> connections, or else because we need a request-and-response mechanism
> rather than what is in some sense a one-way communication channel.
> Regardless, we should start by coming up with the right model for the
> protocol first, bearing in mind how it's going to be used and other
> things for which somebody might want to use it (deadlock detection,
> failover, leader election), and then implement whatever we need for
> global snapshots on top of it.  I don't think that writing the code
> here is going to be hugely difficult, but coming up with a good design
> is going to require some thought and discussion.

Well, it would be cool to have some general mechanism to unreliably send
messages between postgres instances. I was thinking about the same thing,
mostly in the context of our multimaster, where we have an arbiter bgworker
which collects 2PC responses and heartbeats from other nodes on a different
TCP port. It used to have some logic inside but evolved into just sending
messages from a shared-memory out-queue and waking backends upon message arrival.
But the necessity to manage a second port is painful and error-prone, at least
from a configuration point of view. So it would be nice to have a more general
mechanism to exchange messages via the postgres port. Ideally with an interface
like in shm_mq: send some messages in one queue, subscribe to responses
in a different one. Among the other things that were mentioned (xmin, deadlock,
elections/heartbeats) I'm especially interested in some multiplexing for
postgres_fdw, to save on context switches of individual backends while
sending statements.

Talking about the model, I think it would be cool to have some primitives like
the ones provided by ZeroMQ (message push/subscribe/pop) and then implement
some more complex ones, like scatter/gather, on top of them.

However, that's probably a topic for a very important, but different, thread.
For the needs of global snapshots something less ambitious will be suitable.

> And, for that matter, I think the same thing is true for global
> snapshots.  The coding is a lot harder for that than it is for some
> new subprotocol, I'd imagine, but it's still easier than coming up
> with a good design.

Sure. This whole global snapshot thing went through several internal redesigns
before becoming satisfactory from our standpoint. However, nothing is keeping
us from further iterations. In this regard, it would be interesting to also
hear comments from the Postgres-XL team -- from my experience with the XL code,
these patches in core could help XL drop a lot of visibility-related ifdefs and
seriously offload the GTM. But maybe I'm missing something.

> I guess it seems to me that you
> have some further research to do along the lines you've described:
> 
> 1. Can we hold back xmin only when necessary and to the extent
> necessary instead of all the time?
> 2. Can we use something like an STO analog, maybe as an optional
> feature, rather than actually holding back xmin?

Yes, to both questions. I'll implement that and share results.

> And I'd add:
> 
> 3. Is there another approach altogether that doesn't rely on holding
> back xmin at all?

And for that question I believe the answer is no. If we want to keep
MVCC-like behaviour where read transactions aren't randomly aborted, we
will need to keep old versions, regardless of whether it is a local or global
transaction. And to keep old versions we need to hold back xmin to defuse HOT,
microvacuum, macrovacuum, visibility maps, etc. At some point we can switch
to STO-like behaviour, but that should probably be used as protection from
unusually long transactions rather than as the standard behavior.

> For example, if you constructed the happens-

Memory unit GUC range checks

2018-05-16 Thread Heikki Linnakangas

Hi,

I played around with the GUC memory units, specifically to test the new 
GUC_UNIT_BYTES flag (commit 6e7baa32):


$ postmaster -c track_activity_query_size=1024kB
FATAL:  1048576 is outside the valid range for parameter 
"track_activity_query_size" (100 .. 102400)


$ postmaster -c track_activity_query_size=1024MB
FATAL:  1073741824 is outside the valid range for parameter 
"track_activity_query_size" (100 .. 102400)


$ postmaster -c track_activity_query_size=1024GB
FATAL:  invalid value for parameter "track_activity_query_size": "1024GB"
HINT:  Value exceeds integer range.

$ postmaster -c track_activity_query_size=1024TB
FATAL:  invalid value for parameter "track_activity_query_size": "1024TB"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

The first two look OK, but the last two cases seem a bit weird. With 
1024 GB, it would be nice to print the valid range, like in the first 
two cases.


The HINT in the last message seems wrong: the hint claims that "TB" is 
accepted, yet "1024 TB" was not accepted. And shouldn't the hint also 
mention "B", since we accept that now?



Testing a setting with GUC_UNIT_KB:

$ postmaster -c work_mem=102400B
FATAL:  invalid value for parameter "work_mem": "10B"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

This time the hint is accurate, but why is "B" not accepted here? Seems 
inconsistent.


- Heikki



Re: Memory unit GUC range checks

2018-05-16 Thread Heikki Linnakangas

On 16/05/18 15:19, Heikki Linnakangas wrote:

$ postmaster -c track_activity_query_size=1024TB
FATAL:  invalid value for parameter "track_activity_query_size": "1024TB"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

...

The HINT in the last message seems wrong: the hint claims that "TB" is
accepted, yet "1024 TB" was not accepted. And shouldn't the hint also
mention "B", since we accept that now?


Testing a setting with GUC_UNIT_KB:

$ postmaster -c work_mem=102400B
FATAL:  invalid value for parameter "work_mem": "10B"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

This time the hint is accurate, but why is "B" not accepted here? Seems
inconsistent.


Here's a pretty straightforward fix for these two issues. Any objections 
or better ideas?


- Heikki
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7cd2d2d80e..93402030f7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -705,7 +705,7 @@ typedef struct
 	char		unit[MAX_UNIT_LEN + 1]; /* unit, as a string, like "kB" or
 		 * "min" */
 	int			base_unit;		/* GUC_UNIT_XXX */
-	int			multiplier;		/* If positive, multiply the value with this
+	int64		multiplier;		/* If positive, multiply the value with this
  * for unit -> base_unit conversion.  If
  * negative, divide (with the absolute value) */
 } unit_conversion;
@@ -718,10 +718,16 @@ typedef struct
 #error XLOG_BLCKSZ must be between 1KB and 1MB
 #endif
 
-static const char *memory_units_hint = gettext_noop("Valid units for this parameter are \"kB\", \"MB\", \"GB\", and \"TB\".");
+static const char *memory_units_hint = gettext_noop("Valid units for this parameter are \"B\", \"kB\", \"MB\", \"GB\", and \"TB\".");
 
 static const unit_conversion memory_unit_conversion_table[] =
 {
+	/*
+	 * TB -> bytes conversion always overflows 32-bit integer, so this
+	 * always produces an error.  Include it for completeness, and so that
+	 * you get an "out of range" error, rather than "invalid unit".
+	 */
+	{"TB", GUC_UNIT_BYTE, INT64CONST(1024) * 1024 * 1024 * 1024},
 	{"GB", GUC_UNIT_BYTE, 1024 * 1024 * 1024},
 	{"MB", GUC_UNIT_BYTE, 1024 * 1024},
 	{"kB", GUC_UNIT_BYTE, 1024},
@@ -731,21 +737,25 @@ static const unit_conversion memory_unit_conversion_table[] =
 	{"GB", GUC_UNIT_KB, 1024 * 1024},
 	{"MB", GUC_UNIT_KB, 1024},
 	{"kB", GUC_UNIT_KB, 1},
+	{"B", GUC_UNIT_KB, -1024},
 
 	{"TB", GUC_UNIT_MB, 1024 * 1024},
 	{"GB", GUC_UNIT_MB, 1024},
 	{"MB", GUC_UNIT_MB, 1},
 	{"kB", GUC_UNIT_MB, -1024},
+	{"B", GUC_UNIT_MB, -(1024 * 1024)},
 
 	{"TB", GUC_UNIT_BLOCKS, (1024 * 1024 * 1024) / (BLCKSZ / 1024)},
 	{"GB", GUC_UNIT_BLOCKS, (1024 * 1024) / (BLCKSZ / 1024)},
 	{"MB", GUC_UNIT_BLOCKS, 1024 / (BLCKSZ / 1024)},
 	{"kB", GUC_UNIT_BLOCKS, -(BLCKSZ / 1024)},
+	{"B", GUC_UNIT_BLOCKS, -(BLCKSZ / (1024 * 1024))},
 
 	{"TB", GUC_UNIT_XBLOCKS, (1024 * 1024 * 1024) / (XLOG_BLCKSZ / 1024)},
 	{"GB", GUC_UNIT_XBLOCKS, (1024 * 1024) / (XLOG_BLCKSZ / 1024)},
 	{"MB", GUC_UNIT_XBLOCKS, 1024 / (XLOG_BLCKSZ / 1024)},
 	{"kB", GUC_UNIT_XBLOCKS, -(XLOG_BLCKSZ / 1024)},
+	{"B", GUC_UNIT_XBLOCKS, -(XLOG_BLCKSZ / (1024 * 1024))},
 
 	{""}		/* end of table marker */
 };


Re: Flexible permissions for REFRESH MATERIALIZED VIEW

2018-05-16 Thread Robert Haas
On Tue, May 15, 2018 at 6:07 PM, Tom Lane  wrote:
> That seems like an awful lot of work to handle what's still going to be
> a pretty small set of permissions.  Every permission we add is going to
> have to be enforced in the C code, and it'll break applications to some
> extent to treat the situation differently from before, so I don't see
> that we're going to add them willy-nilly.
>
> (For what it's worth, my first instinct would be to lump all three of
> these proposals under a single grantable permission MAINTAIN, or some
> such name.  I don't think it's worth subdividing more finely.)

That's certainly a fair opinion, but I'm just not sure whether users
are going to agree.

>> To handle the on-disk issue, I think we could introduce a new varlena
>> type that's like aclitem but permits extra variable-length data at the
>> end.  It would be a different data type but pretty easy to convert
>> back and forth.  Still probably a lot of work to make it happen,
>> though, unfortunately.
>
> I think an appropriate amount of effort would be to widen AclMode to 64
> bits, which wasn't practical back in the day but now we could do easily
> (I think).  That would move us from having four spare permission bits
> to having 20.  I don't think it'd cause an on-disk compatibility break
> because type aclitem is generally only stored in the catalogs anyway.
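
For reference, a sketch of what that widening amounts to -- not a patch, just
the shape of the change; today AclMode is a uint32 split 16/16 between
ordinary rights and their grant options, and ACL_MAINTAIN is only the
hypothetical permission floated upthread:

typedef uint64 AclMode;         /* was: typedef uint32 AclMode */

/* a hypothetical 13th right; the existing twelve keep their bits */
#define ACL_MAINTAIN    (((AclMode) 1) << 12)

/* grant-option bits move from the upper 16 to the upper 32 bits */
#define ACLITEM_GET_PRIVS(item)    ((item).ai_privs & 0xFFFFFFFF)
#define ACLITEM_GET_GOPTIONS(item) (((item).ai_privs >> 32) & 0xFFFFFFFF)

With twelve rights currently defined, that indeed leaves the 20 spare
permission bits mentioned above.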

How much are we worried about users (or extensions) who have used it
in user tables?  We could introduce aclitem2 and keep the old one
around, I guess.

If we're going to go to the trouble of making an incompatible change
to aclitem, it seems like we should go all the way and make it into
some kind of varlena type.  Realistically, if we add another 32 bits
to it, you're going to let 3 or 4 new permissions through -- max --
and then go back to worrying about how many bits we have left and how
quickly we're eating them up.  I guess if somebody else is doing the
work I'll be content to let them do it how they want to do it, but if
I were doing the work, I would look for a bigger hammer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-16 Thread Robert Haas
On Thu, May 10, 2018 at 10:22 PM, David Rowley
 wrote:
> Here's a recap of the current way to determine where the pruning occurred:
>
> Phase 1: Plan time pruning:
>
> EXPLAIN/EXPLAIN ANALYZE shows Append/MergeAppend/ModifyTable shows
> fewer subnodes than there are partitions.
> Both EXPLAIN and EXPLAIN ANALYZE output gives no indication of any pruning.
>
> Phase 2: Executor init pruning:
>
> EXPLAIN and EXPLAIN ANALYZE shows Append with fewer subnodes than
> there are partitions + "Subplans Removed: " appears to indicate the
> number of subnodes removed by this phase.
>
> MergeAppend and ModifyTable are unsupported in PG11.
>
> Phase 3: Executor run pruning:
>
> EXPLAIN/EXPLAIN ANALYZE shows all nodes that survived phase 1+2.
>
> EXPLAIN ANALYZE shows that if a given node was never executed then the
> runtime times appear as "(never executed)". If the Append was executed
> and a subnode the Append was "(never executed)" then it was pruned by
> this phase.

Hmm, that's actually not as bad as I thought.  Thanks for the
explanation.  I think if I were going to try to improve things, I'd
try to annotate the Append node with the name of the partitioned table
that it's using for pruning in case #2 and case #3, and maybe
something to indicate which type of pruning is in use.  That would
make it really clear whether pruning is enabled or not.  The methods
you mention above sort of require reading the tea leaves -- and it
might not always be very easy to distinguish between cases where
pruning is possible but nothing got pruned (imagine an inequality
qual) and where it's not even possible in the first place.

e.g.

Append
  Execution-Time Pruning: order_lines (at executor startup)
  -> Index Scan ...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-16 Thread Robert Haas
On Wed, May 16, 2018 at 7:36 AM, Arthur Zakirov
 wrote:
>> I don't quite understand the problem you're trying to solve here, but:
>>
>> 1. Unless dsm_pin_segment() is called, a DSM segment will
>> automatically be removed when there are no remaining mappings.
>>
>> 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped
>> when the currently-in-scope resource owner is cleaned up, like at the
>> end of the query.  If it is called, then the mapping will stick around
>> until the backend exits.
>
> I tried to solve the case when DSM segment remains mapped even a
> dictionary was dropped. It may happen in the following situation:
>
> Backend 1:
>
> =# select ts_lexize('english_shared', 'test');
> -- The dictionary is loaded into DSM, the segment and the mapping is
> pinned
> ...
> -- Call ts_lexize() from backend 2 below
> =# drop text search dictionary english_shared;
> -- The segment and the mapping is unpinned, see ts_dict_shmem_release()
>
> Backend 2:
>
> =# select ts_lexize('english_shared', 'test');
> -- The dictionary got from DSM, the mapping is pinned
> ...
> -- The dictionary was dropped by backend 1, but the mapping still is
> pinned

Yeah, there's really nothing we can do about that (except switch from
processes to threads).  There's no way for one process to force
another process to unmap something.  As you've observed, you can get
it to be dropped eventually, but not immediately.

> In sum, I think the problem is mostly solved. Backend 2 unpins the
> segment in next ts_lexize() call. But if backend 2 doesn't call
> ts_lexize() (or other TS function) anymore the segment will remain mapped.
> It is the only problem I see for now.

Maybe you could use CacheRegisterSyscacheCallback to get a callback
when the backend notices that a DROP has occurred.
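
Something along these lines, perhaps -- CacheRegisterSyscacheCallback() and
the TSDICTOID cache are existing infrastructure, while ts_dict_unpin_obsolete()
is just a stand-in name for whatever unpin logic the patch already has:

#include "postgres.h"
#include "utils/inval.h"
#include "utils/syscache.h"

extern void ts_dict_unpin_obsolete(void);   /* hypothetical unpin routine */

static void
ts_dict_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
{
    /* a pg_ts_dict row was changed or dropped: release stale DSM mappings */
    ts_dict_unpin_obsolete();
}

static void
ts_dict_register_inval_callback(void)
{
    /* register once per backend, e.g. on first use of a shared dictionary */
    CacheRegisterSyscacheCallback(TSDICTOID, ts_dict_inval_callback, (Datum) 0);
}

That way the unpinning happens as soon as the backend processes the
invalidation, instead of waiting for the next ts_lexize() call.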

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Memory unit GUC range checks

2018-05-16 Thread Alexander Korotkov
Hi!

On Wed, May 16, 2018 at 3:19 PM, Heikki Linnakangas  wrote:

> I played around with the GUC memory units, specifically to test the new
> GUC_UNIT_BYTES flag (commit 6e7baa32):
>
> $ postmaster -c track_activity_query_size=1024kB
> FATAL:  1048576 is outside the valid range for parameter
> "track_activity_query_size" (100 .. 102400)
>
> $ postmaster -c track_activity_query_size=1024MB
> FATAL:  1073741824 is outside the valid range for parameter
> "track_activity_query_size" (100 .. 102400)
>
> $ postmaster -c track_activity_query_size=1024GB
> FATAL:  invalid value for parameter "track_activity_query_size": "1024GB"
> HINT:  Value exceeds integer range.
>
> $ postmaster -c track_activity_query_size=1024TB
> FATAL:  invalid value for parameter "track_activity_query_size": "1024TB"
> HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
>
> The first two look OK, but the last two cases seem a bit weird. With 1024
> GB, it would be nice to print the valid range, like in the first two cases.
>

+1,
I also think it would be nice to show units in the valid range.  I imagine
that if I saw such a message for the first time, I would try to
reverse-engineer the units from the input value's representation in the error
message.  Displaying units in the valid range would be clarifying for me.

The HINT in the last message seems wrong: the hint claims that "TB" is
> accepted, yet "1024 TB" was not accepted.


Yes, this seems like a bug.  I think it should be fixed.


> And shouldn't the hint also mention "B", since we accept that now?
>

Yes, I think "B" should be mentioned in hint.

Testing a setting with GUC_UNIT_KB:
>
> $ postmaster -c work_mem=102400B
> FATAL:  invalid value for parameter "work_mem": "10B"
> HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
>
> This time the hint is accurate, but why is "B" not accepted here? Seems
> inconsistent.


+1

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Memory unit GUC range checks

2018-05-16 Thread Alexander Korotkov
On Wed, May 16, 2018 at 3:49 PM, Heikki Linnakangas  wrote:

> On 16/05/18 15:19, Heikki Linnakangas wrote:
>
>> $ postmaster -c track_activity_query_size=1024TB
>> FATAL:  invalid value for parameter "track_activity_query_size": "1024TB"
>> HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
>>
>> ...
>>
>> The HINT in the last message seems wrong: the hint claims that "TB" is
>> accepted, yet "1024 TB" was not accepted. And shouldn't the hint also
>> mention "B", since we accept that now?
>>
>>
>> Testing a setting with GUC_UNIT_KB:
>>
>> $ postmaster -c work_mem=102400B
>> FATAL:  invalid value for parameter "work_mem": "10B"
>> HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
>>
>> This time the hint is accurate, but why is "B" not accepted here? Seems
>> inconsistent.
>>
>
> Here's a pretty straightforward fix for these two issues. Any objections
> or better ideas?


This patch looks good to me.
But I would also like to see units in the valid-range message (as I wrote in
my previous email).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: NaNs in numeric_power (was Re: Postgres 11 release notes)

2018-05-16 Thread Tom Lane
Dean Rasheed  writes:
> On 15 May 2018 at 22:55, Tom Lane  wrote:
>> Here's a proposed patch for that.

> In the case 1 ^ NaN = 1, what should the result scale be?

The result is exact, so I don't see a reason to be worried about its
scale.  Messing with the scale would also require expending even
more code on what is, in the end, a corner case that no users have
expressed concern about.

regards, tom lane



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-16 Thread Ashutosh Bapat
On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
 wrote:
>
 So we could
 really minimize the code change and avoid the additional overhead caused
 by
 the is_converted_whole_row_reference test added to pull_var_clause.  I
 think
 the latter would be rather important, because PWJ is disabled by
 default.
 What do you think about that approach?
>>>
>>>
>>> Partition-wise join and partition-wise aggregation are disabled by
>>> default temporarily. We will change it in near future once the memory
>>> consumption issue has been tackled. The features are useful and users
>>> would want those to be enabled by default like other query
>>> optimizations. So, we can't take a decision based on that.
>
>
> Yeah, I really agree on that point!  However, considering that
> pull_var_clause is used in many places where partitioning is *not* involved,
> I still think it's better to avoid spending extra cycles needed only for
> partitioning in that function.

Right. The changes done in inheritance_planner() imply that we can
encounter a ConvertRowtypeExpr even in the expressions for a relation
which is not a child relation in the translated query. It was a child
in the original query, but after translation it becomes a parent and
has a ConvertRowtypeExpr somewhere in there. So there is no way to know
whether a given expression will have a ConvertRowtypeExpr in it. That's
my understanding. But if we can devise such a test, I am happy to
apply that test before or within the call to pull_var_clause().

We don't need that expensive test if the user has passed
PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
the shape of the expression tree. It would cause more asymmetry in
pull_var_clause(), but we can live with it or change the order of the
tests for all three options. I will defer that to a committer.

> +   /* Construct the conversion map. */
> +   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);
>
> I think it'd be better to pass -1, not 0, as the typmod argument for that
> function because that would be consistent with other places where we use
> that function with named rowtypes (eg, ruleutils.c).

Done.

>
> -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
> +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
> +   int *relno, int *colno)
>
> -1 for that change; because 1) we use "var" for implying many things as in
> eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the
> name as-is.

I think pull_var_clause is the only place where we do that, and I don't
like that very much. I also don't like the is_subquery_var() name since
it's too specific. We might want that kind of functionality to ship
generic expressions, and not just Vars, in the future. Usually we would be
searching for columns in the subquery targetlist, so the name change
looks good to me. IIRC, the function was added only one release back, so
the back-patching pain, if any, won't be much. But I don't think we should
carry a misnomer for many releases to come. Let's defer this to a
committer.

Here's set of updated patches.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


expr_ref_error_pwj_v8.tar.gz
Description: GNU Zip compressed data


Re: Considering signal handling in plpython again

2018-05-16 Thread Hubert Zhang
There are two remaining problems.

1. Do we need to consider when to delete the extension hook, or is it not
necessary?
As the destroy function _PG_fini doesn't work, I cannot find a good
place to reset the hook gracefully.
I tested the drop language plpythonu statement, which does not remove
the python shared library in the current session,
so it seems to be safe to leave cancel_handler_hook unreset.
How about other extensions, for example plr? Does the "drop extension"
statement also leave the loaded shared library in the process?

-- Another idea is to register the hook at the beginning of
plpython_call_handler and unregister the hook at the end of
plpython_call_handler.
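
That second idea might look roughly like this -- a sketch only;
cancel_handler_hook and cancel_handler_hook_type are the hook being proposed
in this thread, and PLy_python_interrupt / plpython_call_handler_internal are
made-up names for the Python-side pieces:

Datum
plpython_call_handler(PG_FUNCTION_ARGS)
{
    cancel_handler_hook_type prev_hook = cancel_handler_hook;
    Datum       result;

    cancel_handler_hook = PLy_python_interrupt; /* sets Python's interrupt flag */

    PG_TRY();
    {
        result = plpython_call_handler_internal(fcinfo);
    }
    PG_CATCH();
    {
        cancel_handler_hook = prev_hook;    /* restore on error, too */
        PG_RE_THROW();
    }
    PG_END_TRY();

    cancel_handler_hook = prev_hook;
    return result;
}

That keeps the hook installed only while plpython code can actually be
running, so nothing needs to happen at DROP LANGUAGE time.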

2. Do we need to use an explicit hook list (List *cancel_hook_list) instead of
an implicit cancel_hook (which relies on the extension to chain the cancel_hook
inside its own code,
 e.g. prev_hook = cancel_hook; cancel_hook=my_hook;   void
my_hook(){mywork(); (*prev_hook)();} )?
I didn't find any explicit hook list in the PG code base; is that a good
practice?

-- Hubert


On Mon, May 14, 2018 at 6:40 PM, Heikki Linnakangas  wrote:

> On 14/05/18 10:56, Hubert Zhang wrote:
>
>> For nested SPI case, one option is to turn off the bool variable when
>> entering the SPI function(PLy_spi_prepare, PLy_spi_execute, PLy_cursor
>> etc.)
>> and turn on the bool variable again when exiting the SPI function.
>>
>
> Yeah, that seems reasonable.
>
> - Heikki
>



-- 
Thanks

Hubert Zhang


Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-16 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, May 16, 2018 at 1:41 AM, Tom Lane  wrote:
>> Perhaps there is a case for adding an additional flag to allow specifying
>> "I can't support ORDER BY DESC", but I'm not in a big hurry to do so.
>> I think there would be more changes than this needed to handle such a
>> restriction, anyway.

> OK, got it.  We'll probably propose a patch implementing that to the
> next commitfest.

If you do, it wouldn't be a bad idea to try to clarify the existing
code and docs around this point.  I'm thinking that amcanbackward is
misleadingly named; maybe we should rename it as part of the change?

regards, tom lane



Re: index scan over composite type

2018-05-16 Thread Teodor Sigaev

Thank you. It seems to work; at least I can't find a counter-example.

Tom Lane wrote:

Teodor Sigaev  writes:

I don't understand why postgres prefers to sort the table instead of using
an index-only scan when the query is a simple inner join on a composite type.
A query with an equality clause against a constant works fine with an index
scan, but the join does not. Could somebody point out why? Thank you.


Hmm ... the reason why not seems to be that canonicalize_ec_expression()
improperly adds a RelabelType node, causing the composite-type Vars to not
be recognized as matching the eclass they should match.  The attached
patch fixes it and doesn't seem to break anything in the regression tests.

This raises the question of why we don't treat type RECORD more like a
true polymorphic type, but that's a can of worms I don't particularly want
to open right now.  For the moment, this is the only IsPolymorphicType
call in the planner AFAICS, so there's some reason to hope that we don't
have more bugs of the same ilk.

regards, tom lane



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: Postgres 11 release notes

2018-05-16 Thread Alvaro Herrera
On 2018-May-15, Bruce Momjian wrote:

> On Wed, May 16, 2018 at 09:01:35AM +0900, Tatsuo Ishii wrote:
> > There's a small typo.
> > 
> > > Add support for with huge(large) pages on Windows (Takayuki Tsunakawa, 
> > > Thomas Munro) 
> > 
> > I think a space between "huge" and "(large)" is needed.
> 
> Done, URL updated.

I'm not sure why we say "huge (large) pages".  The natural term for
Windows is "large-pages",
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs.85).aspx
so I think we should use that terminology.  Maybe something like

Add support for large pages on Windows (Takayuki 
Tsunakawa, Thomas Munro)
This is controlled by the huge_pages configuration parameter.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: index scan over composite type

2018-05-16 Thread Tom Lane
Teodor Sigaev  writes:
> Thank you. Seems, it works, at least I can't find a counter-example for that.

Will push, thanks for reviewing.

regards, tom lane



Re: parallel foreign scan

2018-05-16 Thread Manuel Kniep

> I think you are trying collecting data from multple kafka
> server. This means each server has a dedicate foreign table on a
> dedicate foreign server. Parallel execution doesn't fit in that
> case since it works on single base relation (or a
> table). Parallel append/merge append look a bit different but
> actually is the same in the sense that one base relation is
> scanned on multiple workers. Even if you are trying to fetch from
> one kafka stream on multiple workers, I think the fdw driver
> doesn't support parallel scanning anyway.

Well, my idea was to scan multiple partitions from a single kafka
server / topic in parallel.
I will look into your suggestion to set up partial paths.

regards

Manuel



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread Robert Haas
On Mon, May 14, 2018 at 12:57 AM, David Rowley
 wrote:
> I noticed that a comment in get_partition_dispatch_recurse claims that:
>
> "it contains the
> * leaf partition's position in the global list *leaf_part_oids minus 1"
>
> The "minus 1" part is incorrect. It simply just stores the 0-based
> index of the item in the list. I was going to fix it by removing just
> that part, but instead, I ended up rewriting the whole thing.

I think that's clearer.  Committed with a few tweaks that are
hopefully improvements.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: log_min_messages shows debug instead of debug2

2018-05-16 Thread Robert Haas
On Tue, May 15, 2018 at 4:46 AM, Ideriha, Takeshi
 wrote:
> I noticed that if log_min_messages is set to ‘debug2’, it shows ‘debug’
> instead of debug2.
>
> Patch attached.

Seems worth changing to me.  Anyone else think differently?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Should total_pages be calculated after partition pruning and constraint exclusion?

2018-05-16 Thread Tom Lane
David Rowley  writes:
> On 16 May 2018 at 11:04, Tom Lane  wrote:
>> No, it should go under "planner improvement".  If this were a bug fix,
>> it'd be a candidate for back-patch, which IMO it's not --- if only
>> because of the risk of plan destabilization.

> I'm not going to make a fuss over it, but I guess we must differ in
> opinion as I would count tracking relation sizes of relations we're
> actually not going to scan as a bug.

Dunno, the way in which total_pages is used is so squishy/heuristic
that it's really hard to say that changing the way we calculate it
is necessarily going to lead to better plans.  I agree that this is
*probably* an improvement, but I don't think it's absolutely clear.

If there were some way to paint this as connected to other stuff
already done for v11, I'd be happier about pushing it in now --- but
it doesn't seem to be.

regards, tom lane



Re: libpq compression

2018-05-16 Thread Grigory Smolkin

Hello!
I have noticed that psql --help lack -Z|--compression option.
Also it would be nice to have option like --compression-level in psql 
and pgbench.



On 03/30/2018 03:53 PM, Konstantin Knizhnik wrote:

Hi hackers,

One of our customers managed to improve speed about 10 times by
using SSL compression for a system where the client and servers are
located in different geographical regions
and query results are very large because of JSON columns. They
actually do not need encryption, just compression.
I expect that this is not the only case where compression of the libpq
protocol can be useful. Please note that Postgres replication also
uses the libpq protocol.


Taking into account that a vulnerability was found in SSL compression, and
so SSL compression is considered deprecated and insecure
(http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html),
it would be nice to have some alternative mechanism for reducing libpq
traffic.


I have implemented a prototype of it (patch attached).
To use zstd compression, Postgres should be configured with
--with-zstd. Otherwise compression will use zlib, unless it is disabled
by the --without-zlib option.
I have added a compression=on/off parameter to the connection string and a
-Z option to the psql and pgbench utilities.

Below are some results:

Compression ratio (raw->compressed):


                    libz (level=1)        libzstd (level=1)
pgbench -i -s 10    16997209->2536330     16997209->268077
pgbench -t 10 -S    6289036->1523862      6288933->1777400
                    6600338<-900293       6600338<-1000318


There is no mistyping: libzstd compresses COPY data about 10 times
better than libz, with a wonderful compression ratio of 63.


The influence on execution time is minimal (I tested a local
configuration where client and server are on the same host):



                    no compression    libz (level=1)    libzstd (level=1)
pgbench -i -s 10    1.552             1.572             1.611
pgbench -t 10 -S    4.482             4.926             4.877

--
Konstantin Knizhnik
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Allow COPY's 'text' format to output a header

2018-05-16 Thread Robert Haas
On Tue, May 15, 2018 at 12:06 PM, Tom Lane  wrote:
>> One can imagine extensions of the idea: for example, the header could
>> actually be used to identify the columns, so the column order in the file
>> doesn't matter. There could also be an "AS" syntax to allow the target
>> field names to be different from the field names in the header. I have
>> occasionally found myself wanting to ignore certain columns of the file.
>> But these are all significantly more complicated than just looking at the
>> header and requiring it to match the target field names.
>
> Yeah, and every bit of flexibility you add raises the chance of an
> undetected error.  COPY isn't intended as a general ETL facility,
> so I'd mostly be -1 on adding such things.  But I can see the value
> of confirming that you're copying the right file, and a header match
> check would go a long way towards doing that.

True.

FWIW, I'm +1 on this idea.  I think a header line is a pretty common
need, and if you're exporting a large amount of data, it could be
pretty annoying to have to first run COPY, and then do

(echo blah,blah1,blah2; cat copyoutput.txt)>whatireallywant.txt

There's a lot of value in being able to export from program A
*exactly* what program B wants to import, rather than something that
is close but has to be massaged.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Oddity in COPY FROM handling of check constraints on partition tables

2018-05-16 Thread Robert Haas
On Tue, Mar 27, 2018 at 9:00 AM, Etsuro Fujita
 wrote:
> Attached is a patch for fixing this issue.

This no longer applies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: log_min_messages shows debug instead of debug2

2018-05-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 15, 2018 at 4:46 AM, Ideriha, Takeshi
>  wrote:
>> I noticed that if log_min_messages is set to ‘debug2’, it shows ‘debug’
>> instead of debug2.

> Seems worth changing to me.  Anyone else think differently?

I think the current behavior was intentional at some point, probably
with the idea that if you put in "debug" it should come out as "debug".
This patch just moves the discrepancy.  Still, it's more precise to
print debug2, so I'm okay with changing.

regards, tom lane



Re: Considering signal handling in plpython again

2018-05-16 Thread Heikki Linnakangas


On 16 May 2018 16:50:24 EEST, Hubert Zhang  wrote:
>There are remaining two problems
>
>1. Do we need to consider when to delete the extension hook or it's not
>necessary?

No, PostgreSQL never unloads shared libraries, so that can be ignored.

We used to, and that's what _PG_fini() was for. But we stopped doing that, 
because it was unsafe. IIRC, hook functions like this were exactly the reason 
it was unsafe.

>2. Do we need to use explicit hook list(List *cancel_hook_list) instead
>of
>implicit cancel_hook(which relies on the extension to link the
>cancel_hook
>inside their code
> e.g. prev_hook = cancel_hook; cancel_hook=my_hook;   void
>my_hook(){mywork(); (*prev_hook)();} )?
>   I didn't find any explicit hook list in PG code base, is that a good
>practice

I didn't understand what you meant by a hook list. But I believe the way I 
had the hook in the patch was fine.

- Heikki



Removing unneeded self joins

2018-05-16 Thread Alexander Kuzmenkov

Hi hackers,

There is a join optimization we don't do -- removing inner join of a 
table with itself on a unique column. Such joins are generated by 
various ORMs, so from time to time our customers ask us to look into 
this. Most recently, it was discussed on the list in relation to an 
article comparing the optimizations that some DBMS make [1].


I started to explore what can be done about this. Attached is a proof of 
concept patch. It works for some simple cases:


create table tt(a int primary key, b text);
explain select p.* from tt p join (select * from tt where b ~~ 'a%') q 
on p.a = q.a;


  QUERY PLAN
──
 Seq Scan on tt p  (cost=0.00..25.88 rows=6 width=36)
   Filter: (b ~~ 'a%'::text)

It also works for semi-joins like `explain select p.* from tt p where 
exists (select * from tt where b ~~ 'a%' and a = p.a);`. This requires a 
preparatory step of reducing unique semi joins to inner joins, and we 
already do this (reduce_unique_semijoin).



What this patch tries to do is to remove these inner joins when a single 
join is being planned (populate_joinrel_with_paths). The main entry 
point is reduce_self_unique_join. First, it proves that both input 
relations are uniquely constrained by the same index given the 
particular join clauses. We already have a way to find such indexes 
(relation_has_unique_index_for), so I was able to reuse this. What I'm 
not sure about is how to properly remove the join after that. For now, I 
just pretend that the join relation being built is the outer baserel, 
add to it the restrictions from the inner relation, and then plan it as 
usual. Maybe there is a less hacky way to do it? I've seen elsewhere a 
suggestion to use an AppendPath for a similar purpose, but here we can't 
just use the outer relation we've already planned because the 
restriction list is different.


I'd be glad to hear your thoughts on this.

[1] 
https://www.postgresql.org/message-id/flat/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw%40mail.gmail.com#camjna7cc4x9yr-vajs-jsycajhrdvjqnn7m2slh1wlh-_z2...@mail.gmail.com


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 477b9f7..334793c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -76,13 +76,9 @@ static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			 Index rti, RangeTblEntry *rte);
 static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  Index rti, RangeTblEntry *rte);
-static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
-   RangeTblEntry *rte);
 static void create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel);
 static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
 		  RangeTblEntry *rte);
-static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
-	   RangeTblEntry *rte);
 static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 RangeTblEntry *rte);
 static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -366,7 +362,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 else
 {
 	/* Plain relation */
-	set_plain_rel_size(root, rel, rte);
+	set_plain_rel_size(root, rel);
 }
 break;
 			case RTE_SUBQUERY:
@@ -449,7 +445,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 else
 {
 	/* Plain relation */
-	set_plain_rel_pathlist(root, rel, rte);
+	set_plain_rel_pathlist(root, rel);
 }
 break;
 			case RTE_SUBQUERY:
@@ -516,8 +512,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  * set_plain_rel_size
  *	  Set size estimates for a plain relation (no subquery, no inheritance)
  */
-static void
-set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+void
+set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel)
 {
 	/*
 	 * Test any partial indexes of rel for applicability.  We must do this
@@ -691,8 +687,8 @@ set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
  * set_plain_rel_pathlist
  *	  Build access paths for a plain relation (no subquery, no inheritance)
  */
-static void
-set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+void
+set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
 {
 	Relids		required_outer;
 
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..d79b3c7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2961,7 +2961,7 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
- * relation_has_unique_index_for
+ * relation_get_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, beca

Re: libpq compression

2018-05-16 Thread Konstantin Knizhnik



On 16.05.2018 18:09, Grigory Smolkin wrote:


Hello!
I have noticed that psql --help lack -Z|--compression option.
Also it would be nice to have option like --compression-level in psql 
and pgbench.



Thank you for this notice.
Updated and rebased patch is attached.
Concerning the specification of compression level: I have run many 
experiments with different data sets and both zlib and zstd, and in both 
cases using a compression level higher than the default doesn't noticeably 
increase the compression ratio, but quite significantly reduces 
speed. Moreover, for "pgbench -i" zstd provides a better compression ratio 
(63 times!) with compression level 1 than with the largest recommended 
compression level, 22! This is why I decided not to allow the user to choose 
the compression level.
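
(Should we later want to expose a level knob, it is just the last argument of
zstd's one-shot API; the snippet below is only an illustration and not part
of the patch.)

#include <zstd.h>

/* Compress src into dst at the given zstd level; returns 0 on error. */
static size_t
compress_buffer(const void *src, size_t srclen,
                void *dst, size_t dstcap, int level)
{
    size_t n = ZSTD_compress(dst, dstcap, src, srclen, level);

    return ZSTD_isError(n) ? 0 : n;
}

/* dstcap should be at least ZSTD_compressBound(srclen). */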
diff --git a/configure b/configure
index 0aafd9c..fc5685c 100755
--- a/configure
+++ b/configure
@@ -699,6 +699,7 @@ ELF_SYS
 EGREP
 GREP
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 with_libxml
@@ -863,6 +864,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 enable_float4_byval
@@ -8017,6 +8019,86 @@ fi
 
 
 #
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
+
+#
 # Elf
 #
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 95d090e..b59629c 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -196,6 +196,7 @@ with_llvm	= @with_llvm@
 with_system_tzdata = @with_system_tzdata@
 with_uuid	= @with_uuid@
 with_zlib	= @with_zlib@
+with_zstd   = @with_zstd@
 enable_rpath	= @enable_rpath@
 enable_nls	= @enable_nls@
 enable_debug	= @enable_debug@
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 25af514..a8e461a 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -51,6 +51,14 @@ ifeq ($(with_systemd),yes)
 LIBS += -lsystemd
 endif
 
+ifeq ($(with_zstd),yes)
+LIBS += -lzstd
+endif
+
+ifeq ($(with_zlib),yes)
+LIBS += -lz
+endif
+
 ##
 
 all: submake-libpgport submake-catalog-headers submake-utils-headers postgres $(POSTGRES_IMP)
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index a4f6d4d..3a6f54b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -95,6 +95,7 @@
 #include "storage/ipc.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
+#include  "common/zpq_stream.h"
 
 /*
  * Cope with the various platform-specific ways to spell TCP keepalive socket
@@ -143,6 +144,9 @@ static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
 static int	PqRecvLength;		/* End of data available in PqRecvBuffer */
 
+static ZpqStream* PqStream;
+
+
 /*
  * Message status
  */
@@ -185,6 +189,31 @@ PQcommMethods *PqCommMethods = &PqCommSocketMethods;
 
 WaitEventSet *FeBeWaitSet;
 
+/* 
+ *		pq_configure - configure connection using port settings
+ *
+ * Right now only compression is toggled in the configure.
+ * Function returns 0 in case of success, non-zero in case of error
+ * 
+ */
+int
+pq_configure(Port* port)
+{
+	if (port->use_compression)
+	{
+		char compression = 'z'; /* Request compression message */
+		int rc;
+		/* Switch on compression at client side */
+		socket_set_nonblocking(false);

Re: Removing unneeded self joins

2018-05-16 Thread Tom Lane
Alexander Kuzmenkov  writes:
> There is a join optimization we don't do -- removing inner join of a 
> table with itself on a unique column. Such joins are generated by 
> various ORMs, so from time to time our customers ask us to look into 
> this. Most recently, it was discussed on the list in relation to an 
> article comparing the optimizations that some DBMS make [1].

This is the sort of thing that I always wonder why the customers don't
ask the ORM to stop generating such damfool queries.  It's *expensive*
for us to clean up after their stupidity; almost certainly, it would
take far fewer cycles, net, for them to be a bit smarter in the first
place.

regards, tom lane



Re: postgres_fdw: Oddity in pushing down inherited UPDATE/DELETE joins to remote servers

2018-05-16 Thread Robert Haas
On Fri, May 11, 2018 at 8:46 AM, Etsuro Fujita
 wrote:
> I added an assertion test to make_modifytable to match that in
> create_modifytable_path.  Attached is an updated version of the patch.

Committed.  Was it just good luck that this ever worked at all?  I mean:

-if (rti < root->simple_rel_array_size &&
-root->simple_rel_array[rti] != NULL)
+if (rti < subroot->simple_rel_array_size &&
+subroot->simple_rel_array[rti] != NULL)
 {
-RelOptInfo *resultRel = root->simple_rel_array[rti];
+RelOptInfo *resultRel = subroot->simple_rel_array[rti];

 fdwroutine = resultRel->fdwroutine;
 }
 else
 {
-RangeTblEntry *rte = planner_rt_fetch(rti, root);
+RangeTblEntry *rte = planner_rt_fetch(rti, subroot);

...an RTI is only meaningful relative to a particular PlannerInfo's
range table.  It can't be right to just take an RTI for one
PlannerInfo and look it up in some other PlannerInfo's range table.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Postgres 11 release notes

2018-05-16 Thread Magnus Hagander
On Wed, May 16, 2018 at 4:08 PM, Alvaro Herrera 
wrote:

> On 2018-May-15, Bruce Momjian wrote:
>
> > On Wed, May 16, 2018 at 09:01:35AM +0900, Tatsuo Ishii wrote:
> > > There's a small typo.
> > >
> > > > Add support for with huge(large) pages on Windows (Takayuki
> Tsunakawa, Thomas Munro)
> > >
> > > I think a space between "huge" and "(large)" is needed.
> >
> > Done, URL updated.
>
> I'm not sure why we say "huge (large) pages".  The natural term for
> Windows is "large-pages",
> https://msdn.microsoft.com/en-us/library/windows/desktop/
> aa366720(v=vs.85).aspx
> so I think we should use that terminology.  Maybe something like
>
> Add support for large pages on Windows (Takayuki
> Tsunakawa, Thomas Munro)
> This is controlled by the huge_pages configuration parameter.
>
>
I assume the point here is that the term huge pages is what's used in
PostgreSQL. For example, they are still configured using the huge_pages
GUC.



-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Postgres 11 release notes

2018-05-16 Thread Alvaro Herrera
On 2018-May-16, Magnus Hagander wrote:

> On Wed, May 16, 2018 at 4:08 PM, Alvaro Herrera 
> wrote:

> > I'm not sure why we say "huge (large) pages".  The natural term for
> > Windows is "large-pages",
> > https://msdn.microsoft.com/en-us/library/windows/desktop/
> > aa366720(v=vs.85).aspx
> > so I think we should use that terminology.  Maybe something like
> >
> > Add support for large pages on Windows (Takayuki
> > Tsunakawa, Thomas Munro)
> > This is controlled by the huge_pages configuration parameter.
>
> I assume the point here is that the term huge pages is what's used
> in PostgreSQL. For example, they are still configured using the
> huge_pages GUC.

That's exactly what my proposed wording says :-)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread Robert Haas
On Wed, May 16, 2018 at 12:08 PM, Tom Lane  wrote:
> Alexander Kuzmenkov  writes:
>> There is a join optimization we don't do -- removing inner join of a
>> table with itself on a unique column. Such joins are generated by
>> various ORMs, so from time to time our customers ask us to look into
>> this. Most recently, it was discussed on the list in relation to an
>> article comparing the optimizations that some DBMS make [1].
>
> This is the sort of thing that I always wonder why the customers don't
> ask the ORM to stop generating such damfool queries.  It's *expensive*
> for us to clean up after their stupidity; almost certainly, it would
> take far fewer cycles, net, for them to be a bit smarter in the first
> place.

The trouble, of course, is that the customer didn't write the ORM,
likely has no idea how it works, and doesn't want to run a modified
version of it even if they do.  If the queries run faster on other
systems than they do on PostgreSQL, we get dinged -- not unjustly.

Also, I'm not sure that I believe that it's always easy to avoid
generating such queries.  I mean, this case is trivial so it's easy to
say, well, just rewrite the query.  But suppose that I have a fact
table over which I've created two views, each of which performs
various joins between the fact table and various lookup tables.  My
queries are such that I normally need the joins in just one of these
two views and not the other to fetch the information I care about.
But every once in a while I need to run a report that involves pulling
every column possible.  The obvious solution is to join the views on
the underlying table's primary key, but then you get this problem.  Of
course there's a workaround: define a third view that does both sets
of joins-to-lookup-tables.  But that starts to feel like you're
handholding the database; surely it's the database's job to optimize
queries, not the user's.

It's been about 10 years since I worked as a web developer, but I do
remember hitting this kind of problem from time to time and I'd really
like to see us do something about it.  I wish we could optimize away
inner joins, too, for similar reasons.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Postgres 11 release notes

2018-05-16 Thread Magnus Hagander
On Wed, May 16, 2018 at 6:24 PM, Alvaro Herrera 
wrote:

> On 2018-May-16, Magnus Hagander wrote:
>
> > On Wed, May 16, 2018 at 4:08 PM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > I'm not sure why we say "huge (large) pages".  The natural term for
> > > Windows is "large-pages",
> > > https://msdn.microsoft.com/en-us/library/windows/desktop/
> > > aa366720(v=vs.85).aspx
> > > so I think we should use that terminology.  Maybe something like
> > >
> > > Add support for large pages on Windows (Takayuki
> > > Tsunakawa, Thomas Munro)
> > > This is controlled by the huge_pages configuration
> parameter.
> >
> > I assume the point here is that the term huge pages is what's used
> > in PostgreSQL. For example, they are still configured using the
> > huge_pages GUC.
>
> That's exactly what my proposed wording says :-)
>
>
Uh. Nevermind. Nothing to see here. For some reason I only saw the first
paragraph.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: log_min_messages shows debug instead of debug2

2018-05-16 Thread Robert Haas
On Wed, May 16, 2018 at 11:28 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, May 15, 2018 at 4:46 AM, Ideriha, Takeshi
>>  wrote:
>>> I noticed that if log_min_messages is set to ‘debug2’, it shows ‘debug’
>>> instead of debug2.
>
>> Seems worth changing to me.  Anyone else think differently?
>
> I think the current behavior was intentional at some point, probably
> with the idea that if you put in "debug" it should come out as "debug".

Hmm, that's an angle I hadn't considered.

> This patch just moves the discrepancy.  Still, it's more precise to
> print debug2, so I'm okay with changing.

OK.  Let's see if anyone else has an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Memory unit GUC range checks

2018-05-16 Thread Heikki Linnakangas

On 16/05/18 16:35, Alexander Korotkov wrote:

I also think it would be nice to show units in the valid range.  I imagine
that if I saw such a message for the first time, I would try to
reverse-engineer the units from the input value representation in the error
message.  Displaying units in the valid range would be clarifying for me.


Totally agreed. Users shouldn't need to know whether the base unit for a 
setting is bytes or kilobytes or megabytes. But that sounds like a 
feature rather than a bug, so, not now.


- Heikki



Re: An out-of-date comment in nodeIndexonlyscan.c

2018-05-16 Thread Heikki Linnakangas

On 14/05/18 02:15, Thomas Munro wrote:

Hello,

Since commit cdf91edb (2012), nodeIndexonlyscan.c says:

	/*
	 * Predicate locks for index-only scans must be acquired at the page
	 * level when the heap is not accessed, since tuple-level predicate
	 * locks need the tuple's xmin value.  If we had to visit the tuple
	 * anyway, then we already have the tuple-level lock and can skip the
	 * page lock.
	 */
	if (tuple == NULL)
		PredicateLockPage(scandesc->heapRelation,
						  ItemPointerGetBlockNumber(tid),
						  estate->es_snapshot);

The first sentence of that comment is no longer true as of commit
c01262a8 (2013).  As for whether it's necessary to predicate-lock the
whole heap page (rather than the heap tuple) anyway because of HOT
update chains, I don't know, so I'm not sure what wording to propose
instead.


Hmm. If there are any updated tuples, HOT or not, the visibility map bit 
will not be set, and we won't reach this code. So I think we could 
acquire the tuple lock here.


- Heikki



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-05-16 Thread Robert Haas
On Mon, Apr 16, 2018 at 7:35 AM, Ashutosh Bapat
 wrote:
> It does fix the problem. But the patch as is interferes with the way
> we handle tableoid currently. That can be seen from the regression
> diffs that the patch causes.  Right now, every tableoid reference gets
> converted into the tableoid of the foreign table (and not the tableoid
> of the table on the foreign server). Somehow we need to differentiate between the
> tableoid injected for DML and tableoid references added by the user in
> the original query and then use tableoid on the foreign server for the
> first and local foreign table's oid for the second. Right now, I don't
> see a simple way to do that.

I think that the place to start would be to change this code to use
something other than TableOidAttributeNumber:

+   var = makeVar(parsetree->resultRelation,
+ TableOidAttributeNumber,
+ OIDOID,
+ -1,
+ InvalidOid,
+ 0);

Note that rewriteTargetListUD, which calls AddForeignUpdateTargets,
also contingently adds a "wholerow" attribute which ExecModifyTable()
is able to fish out later.  It seems like it should be possible to add
a "remotetableoid" column that works similarly, although I'm not
exactly sure what would be involved.
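
Just to illustrate the shape of it (a sketch only, not working code -- in
particular the attribute number to emit instead of TableOidAttributeNumber is
exactly the open question above), an FDW's AddForeignUpdateTargets callback
could append such a resjunk column like this and later fetch it by name, the
way "wholerow" and "ctid" are fetched:

static void
myAddForeignUpdateTargets(Query *parsetree,
                          RangeTblEntry *target_rte,
                          Relation target_relation)
{
    Var        *var;
    TargetEntry *tle;

    /* Placeholder attno; the real fix needs something other than this. */
    var = makeVar(parsetree->resultRelation,
                  TableOidAttributeNumber,
                  OIDOID,
                  -1,
                  InvalidOid,
                  0);

    /* Append it as a resjunk column named "remotetableoid". */
    tle = makeTargetEntry((Expr *) var,
                          list_length(parsetree->targetList) + 1,
                          pstrdup("remotetableoid"),
                          true);
    parsetree->targetList = lappend(parsetree->targetList, tle);
}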

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread David Rowley
On 17 May 2018 at 02:51, Robert Haas  wrote:
> On Mon, May 14, 2018 at 12:57 AM, David Rowley
>  wrote:
>> The "minus 1" part is incorrect. It simply just stores the 0-based
>> index of the item in the list. I was going to fix it by removing just
>> that part, but instead, I ended up rewriting the whole thing.
>
> I think that's clearer.  Committed with a few tweaks that are
> hopefully improvements.

Thanks for committing. Although, I disagree with your tweak:

+* 1-based index into the *pds list.

I think that's making the same mistake as the last comment did. You
think it's 1-based because the index is being set with list_length
rather than list_length - 1, but it can do that simply because the
item has not been added to the list yet.

Nothing converts this index back to 0-based;

RelationGetPartitionDispatchInfo builds the array from the list with:

i = 0;
foreach(lc, pdlist)
{
pd[i++] = lfirst(lc);
}

ExecFindPartition uses the pd array with:

parent = pd[-parent->indexes[cur_index]];

So if it was 1-based then we'd be off by one here.

Maybe we can clear up that confusion with

+ /*
+  * No need to subtract 1 to get the 0-based index as the item for this
+  * partitioned table has not been added to the list yet.
+  */
pd->indexes[i] = -list_length(*pds);

and just switch 1-based to 0-based in the new comment.
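
To make the convention concrete, a worked example (not code from the tree):

/*
 * Suppose *pds currently holds only the root's PartitionDispatch, so
 * list_length(*pds) == 1.  For a partitioned child encountered now:
 *
 *     pd->indexes[i] = -list_length(*pds);    -- stores -1
 *     (the recursion then appends the child, which lands in array slot 1)
 *
 * ExecFindPartition later undoes only the negation, with no extra offset:
 *
 *     parent = pd[-parent->indexes[cur_index]];   -- pd[1], i.e. 0-based
 */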

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread Andres Freund
Hi,

On 2018-05-16 12:26:48 -0400, Robert Haas wrote:
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries.

Yea. There's obviously plenty cases where ORMs just want to make the
database hurt. But especially when building a join between a number of
tables based on various fields, it's not going to be easy for the ORM to
figure out which ones can be safely omitted. It'd need similar
optimization as we'd have to do, without having the infrastructure core
PG has.  And then there's, as you say, views etc...

Greetings,

Andres Freund



Re: Odd procedure resolution

2018-05-16 Thread Robert Haas
On Fri, Mar 23, 2018 at 10:42 AM, Ashutosh Bapat
 wrote:
> On Fri, Mar 23, 2018 at 7:53 PM, Tom Lane  wrote:
>> Ashutosh Bapat  writes:
>>> Incidently the fix looks quite simple. See patch attached.
>>
>> ISTM this patch effectively proposes to make procedures have their own
>> namespace yet still live in pg_proc.  That is the worst of all possible
>> worlds IMO.  Somewhere early in this patch series, I complained that
>> procedures should be in a different namespace and therefore not be kept
>> in pg_proc but in some new catalog.  That argument was rejected on the
>> grounds that SQL requires them to be in the same namespace, which I
>> wasn't particularly sold on, but that's where we are.  If they are in
>> the same namespace, though, we have to live with the consequences of
>> that, including ambiguity.  Otherwise there will soon be questions
>> like "well, why can't I create both function foo(int) and procedure
>> foo(int), seeing that there's no question which of them a particular
>> statement intends to call?".
>
> That question did cross my mind and I think that's a valid question.

I agree, but I'm not sure it settles the issue.  If you hand somebody
a plate and a slice of pizza and say "eat this", you expect them to
understand that they should try to eat the pizza, not the plate.  You
expect this because virtually all human beings over the age of two
understand that pizza is eatable and plates are not.  It is similarly
reasonable to expect that when the database is asked to call one of
two things, one of which can be called and the other of which cannot,
it might decide to try calling the one that can be called rather than
the one that can't.  I think we need procedures and functions to live
in the same namespace because otherwise DROP ROUTINE foo(int) could
find two things equally worthy of being dropped, and we don't want
that to happen (leaving aside the question of whether DROP ROUTINE is
a good idea in the first place).  That does not mean -- to me anyway
-- that we've got to make CALL latch onto a function when a procedure
is available.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread Robert Haas
On Wed, May 16, 2018 at 2:28 PM, David Rowley
 wrote:
> Thanks for committing. Although, I disagree with your tweak:
>
> +* 1-based index into the *pds list.
>
> I think that's making the same mistake as the last comment did. You
> think it's 1-based because the index is being set with list_length
> rather than list_length - 1, but it can do that simply because the
> item has not been added to the list yet.

Uh, maybe I've got that wrong.  We can say 0-based instead if that's
right.  I just didn't want to say that in one case it was 0-based and
in the other case make no mention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Odd procedure resolution

2018-05-16 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 23, 2018 at 10:42 AM, Ashutosh Bapat
>  wrote:
>> On Fri, Mar 23, 2018 at 7:53 PM, Tom Lane  wrote:
>>> ISTM this patch effectively proposes to make procedures have their own
>>> namespace yet still live in pg_proc.  That is the worst of all possible
>>> worlds IMO.

> ... That does not mean -- to me anyway
> -- that we've got to make CALL latch onto a function when a procedure
> is available.

My opinion remains unchanged.  If you're unhappy about the system
confusing procedure foo(int) and function foo(real), maybe the answer
is to not overload the name "foo" with such enthusiasm.  But putting
kluges into (some of) the lookup rules is just going to lead to its
own problems and surprising results.

In particular, I dislike the idea that this patch would make routine
names appear unique in some cases when they do not in others.
Overloading is complicated/confusing enough without that.

BTW, we seem to have some confusion here already:

regression=# create function foo(int) returns int as 'select $1' language sql;
CREATE FUNCTION
regression=# create procedure foo(text) as 'select $1' language sql;
CREATE PROCEDURE
regression=# drop function foo;
ERROR:  function name "foo" is not unique
HINT:  Specify the argument list to select the function unambiguously.
regression=# drop routine foo;
ERROR:  function name "foo" is not unique
HINT:  Specify the argument list to select the function unambiguously.
regression=# drop procedure foo;
ERROR:  could not find a procedure named "foo"

The first two errors are what I'd expect, but why is the third
different?

regards, tom lane



Re: Odd procedure resolution

2018-05-16 Thread Robert Haas
On Wed, May 16, 2018 at 3:29 PM, Tom Lane  wrote:
> My opinion remains unchanged.  If you're unhappy about the system
> confusing procedure foo(int) and function foo(real), maybe the answer
> is to not overload the name "foo" with such enthusiasm.  But putting
> kluges into (some of) the lookup rules is just going to lead to its
> own problems and surprising results.
>
> In particular, I dislike the idea that this patch would make routine
> names appear unique in some cases when they do not in others.
> Overloading is complicated/confusing enough without that.

I am not endorsing the patch and haven't looked at it, but I don't buy
the idea that having CALL prefer procedures and SELECT prefer functions would
confuse people more than what we've got already.  As we have it,
creating an uncallable object can make CALL fail, which is certainly a
POLA violation.  You might be able to convince me that it's better
than the alternatives, but it can't possibly be *good*.

> BTW, we seem to have some confusion here already:
>
> regression=# create function foo(int) returns int as 'select $1' language sql;
> CREATE FUNCTION
> regression=# create procedure foo(text) as 'select $1' language sql;
> CREATE PROCEDURE
> regression=# drop function foo;
> ERROR:  function name "foo" is not unique
> HINT:  Specify the argument list to select the function unambiguously.
> regression=# drop routine foo;
> ERROR:  function name "foo" is not unique
> HINT:  Specify the argument list to select the function unambiguously.
> regression=# drop procedure foo;
> ERROR:  could not find a procedure named "foo"
>
> The first two errors are what I'd expect, but why is the third
> different?

Good question.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Memory unit GUC range checks

2018-05-16 Thread Andres Freund
Hi,

On 2018-05-16 15:49:29 +0300, Heikki Linnakangas wrote:
> Here's a pretty straightforward fix for these two issues. Any objections or
> better ideas?

Generally ok, two minor points:

> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index 7cd2d2d80e..93402030f7 100644
>   {"TB", GUC_UNIT_BLOCKS, (1024 * 1024 * 1024) / (BLCKSZ / 1024)},
>   {"GB", GUC_UNIT_BLOCKS, (1024 * 1024) / (BLCKSZ / 1024)},
>   {"MB", GUC_UNIT_BLOCKS, 1024 / (BLCKSZ / 1024)},
>   {"kB", GUC_UNIT_BLOCKS, -(BLCKSZ / 1024)},
> + {"B", GUC_UNIT_BLOCKS, -(BLCKSZ / (1024 * 1024))},

Isn't this 0 in the common case of 8k pages?


>   {"TB", GUC_UNIT_XBLOCKS, (1024 * 1024 * 1024) / (XLOG_BLCKSZ / 1024)},
>   {"GB", GUC_UNIT_XBLOCKS, (1024 * 1024) / (XLOG_BLCKSZ / 1024)},
>   {"MB", GUC_UNIT_XBLOCKS, 1024 / (XLOG_BLCKSZ / 1024)},
>   {"kB", GUC_UNIT_XBLOCKS, -(XLOG_BLCKSZ / 1024)},
> + {"B", GUC_UNIT_XBLOCKS, -(XLOG_BLCKSZ / (1024 * 1024))},

Same?

Greetings,

Andres Freund



Re: Removing unneeded self joins

2018-05-16 Thread Simon Riggs
On 16 May 2018 at 11:26, Robert Haas  wrote:
> On Wed, May 16, 2018 at 12:08 PM, Tom Lane  wrote:
>> Alexander Kuzmenkov  writes:
>>> There is a join optimization we don't do -- removing inner join of a
>>> table with itself on a unique column. Such joins are generated by
>>> various ORMs, so from time to time our customers ask us to look into
>>> this. Most recently, it was discussed on the list in relation to an
>>> article comparing the optimizations that some DBMS make [1].
>>
>> This is the sort of thing that I always wonder why the customers don't
>> ask the ORM to stop generating such damfool queries.  It's *expensive*
>> for us to clean up after their stupidity; almost certainly, it would
>> take far fewer cycles, net, for them to be a bit smarter in the first
>> place.
>
> The trouble, of course, is that the customer didn't write the ORM,
> likely has no idea how it works, and doesn't want to run a modified
> version of it even if they do.  If the queries run faster on other
> systems than they do on PostgreSQL, we get dinged -- not unjustly.
>
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries.  I mean, this case is trivial so it's easy to
> say, well, just rewrite the query.  But suppose that I have a fact
> table over which I've created two views, each of which performs
> various joins between the fact table and various lookup tables.  My
> queries are such that I normally need the joins in just one of these
> two views and not the other to fetch the information I care about.
> But every once in a while I need to run a report that involves pulling
> every column possible.  The obvious solution is to join the views on
> the underlying table's primary key, but then you get this problem.  Of
> course there's a workaround: define a third view that does both sets
> of joins-to-lookup-tables.  But that starts to feel like you're
> handholding the database; surely it's the database's job to optimize
> queries, not the user's.
>
> It's been about 10 years since I worked as a web developer, but I do
> remember hitting this kind of problem from time to time and I'd really
> like to see us do something about it.  I wish we could optimize away
> inner joins, too, for similar reasons.

I agree with everything you say.

What I would add is that I've seen cases where the extra joins do NOT
hurt performance, so the extra CPU used to remove the join hurts more
than the benefit of removing it. Yes, we tried it.

More advanced optimizations should only be applied when we've assessed
that the likely run time is high enough to make it worth investing in
further optimization.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread Tom Lane
Simon Riggs  writes:
> What I would add is that I've seen cases where the extra joins do NOT
> hurt performance, so the extra CPU used to remove the join hurts more
> than the benefit of removing it. Yes, we tried it.

Interesting.  The concern I had was more about the cost imposed on every
query to detect self-joins and try to prove them useless, even in queries
where no benefit ensues.  It's possible that we can get that down to the
point where it's negligible; but this says that even the successful-proof
case has to be very cheap.

regards, tom lane



Re: Removing unneeded self joins

2018-05-16 Thread Jonathan S. Katz

> On May 16, 2018, at 1:58 PM, Andres Freund  wrote:
> 
> Hi,
> 
> On 2018-05-16 12:26:48 -0400, Robert Haas wrote:
>> Also, I'm not sure that I believe that it's always easy to avoid
>> generating such queries.
> 
> Yea. There's obviously plenty cases where ORMs just want to make the
> database hurt. But especially when building a join between a number of
> tables based on various fields, it's not going to be easy for the ORM to
> figure out which ones can be safely omitted. It'd need similar
> optimization as we'd have to do, without having the infrastructure core
> PG has.  And then there's, as you say, views etc…

Are there specific examples of the ORM code that generated
such SQL? I’m more curious to see what people are writing that
generates such code. As mentioned earlier, we could always report such
examples back to the specific ORM maintainer(s) and see if they could
tweak things.

Jonathan


Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-16 Thread Arthur Zakirov
On Wed, May 16, 2018 at 09:33:46AM -0400, Robert Haas wrote:
> > In sum, I think the problem is mostly solved. Backend 2 unpins the
> > segment in the next ts_lexize() call. But if backend 2 doesn't call
> > ts_lexize() (or another TS function) anymore, the segment will remain mapped.
> > That is the only problem I see for now.
> 
> Maybe you could use CacheRegisterSyscacheCallback to get a callback
> when the backend notices that a DROP has occurred.

Yes, that was the first approach. DSM segments were unpinned in
InvalidateTSCacheCallBack(), which is registered using
CacheRegisterSyscacheCallback().

I don't have deep knowledge of the guts of invalidation callbacks. It seems
that there is a problem with it. Tom pointed out above:

> > I'm not sure that I understood the second case correctly. Can cache
> > invalidation help in this case? I don't have confident knowledge of cache
> > invalidation. It seems to me that InvalidateTSCacheCallBack() should
> > release the segment after commit.
>
> "Release after commit" sounds like a pretty dangerous design to me,
> because a release necessarily implies some kernel calls, which could
> fail. We can't afford to inject steps that might fail into post-commit
> cleanup (because it's too late to recover by failing the transaction).
> It'd be better to do cleanup while searching for a dictionary to use.

But it is possible that I misunderstood his note.
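
For reference, the registration in that first approach looked roughly like
this (a sketch, not the actual patch):

static void
InvalidateTSCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
{
    /*
     * This is where the DSM segment was unpinned.  The difficulty discussed
     * above is when exactly this fires and whether a failure here can be
     * tolerated.
     */
}

/* at dictionary-cache initialization time */
CacheRegisterSyscacheCallback(TSDICTOID,
                              InvalidateTSCacheCallBack,
                              (Datum) 0);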

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: Removing unneeded self joins

2018-05-16 Thread Simon Riggs
On 16 May 2018 at 15:10, Tom Lane  wrote:
> Simon Riggs  writes:
>> What I would add is that I've seen cases where the extra joins do NOT
>> hurt performance, so the extra CPU used to remove the join hurts more
>> than the benefit of removing it. Yes, we tried it.
>
> Interesting.  The concern I had was more about the cost imposed on every
> query to detect self-joins and try to prove them useless, even in queries
> where no benefit ensues.  It's possible that we can get that down to the
> point where it's negligible; but this says that even the successful-proof
> case has to be very cheap.

What I was advocating was an approach that varies according to the
query cost, so we don't waste time trying to tune the heck out of OLTP
queries, but for larger queries we might take a more considered
approach.

For advanced optimizations that are costly to check for, skip the
check if we are already below a cost threshold. The threshold would be
a heuristic that varies according to the cost of the check.

I realise that in this case we wouldn't know the full query cost until
we've done join planning, so we would need some lower bound estimate
to check whether it's worth trying to remove joins.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread David Rowley
On 17 May 2018 at 03:43, Alexander Kuzmenkov  wrote:
> I'd be glad to hear your thoughts on this.

(I only glanced at the patch)

I've thought and discussed this before on this list. I think the
arguments for and against it were much the same as you've received
already. If you trawl through the archives you'll see that my argument in
favour matches Robert's quite closely regarding the nested views. I
personally experienced this issue in my previous job, although it was
not with PostgreSQL.

I think it's worth doing this providing that we can fast-path out
quickly enough in cases where we can't possibly remove anything.
Likely the success of this patch depends on how quick that fast-path
is.

From my experience on join removals, I imagine all this can be done
just after the left join removal code has completed. I see your patch
does it much later, which I don't think is particularly great since
Paths have already been generated by that time. I think it makes sense
to do this as early as possible to save wasting planning work for
relations that will be removed.

I think all this can be done just after left joins are removed by
remove_useless_joins. You may like to move the code that exists in
that function today into a new static function named
remove_useless_left_joins, and put this new code in a new static
function named remove_useless_self_joins (see the sketch after this
list for the duplicate-detection part):

1. Allocate an array root->simple_rel_array_size in size. Populate it
with a struct which is defined as struct { Index relid; Oid oid; }
2. Populate that array by looping over the simple_rel_array. Ignore
anything that's not a baserel with relkind = 'r'
3. qsort the array on Oid.
4. Make a pass over the array (up to its size - 1) looking for
elements where the current oid is the same as the next. Build a List
of RelIds containing all relids of Oids which are duplicated.
5. If there are no pairs, abort.
6. Process each combination of pairs found in each Relids in the list
made in step 1. Probably start at the lowest relid.
7. For each pair:
  a. If there's a join condition, ensure all join OpExprs are equality
exprs with a mergejoinable opno (copy what left join removal check
with the opno used). Ensure Vars used in the OpExpr have the same
attrno on each side.
  b. For bonus points don't reject non-Vars used in the join
condition, but ensure they're equal and there are no non-immutable
functions inside.
  c. Ensure relation_has_unique_index_for returns true for the Vars
(and Exprs if doing b) used in the join condition.
  d. Choose the relation with the highest relid and rewrite the parse
changing the varno of all Vars to use the one of the relation with the
lowest relid.
  e. list_concat baserestrictinfos from the removed relation onto the other relation.
  f. Check for Vars in the join condition that can contain NULLs and
lappend IS NOT NULLs into the baserestrictinfo. (Removing the join
could have removed NULL filtering)
  g. Mark highest relid relation as DEAD. (See what the left join
removal code does (you may need to do some extra work around
equivalence classes))
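
A rough sketch of steps 1-5 (illustration only, simplified to return a single
Relids set rather than a list of them; written against the existing planner
structures, not the eventual patch):

typedef struct SelfJoinCandidate
{
    Index       relid;          /* index into simple_rel_array */
    Oid         reloid;         /* pg_class OID of the underlying table */
} SelfJoinCandidate;

static int
self_join_candidate_cmp(const void *a, const void *b)
{
    Oid         oa = ((const SelfJoinCandidate *) a)->reloid;
    Oid         ob = ((const SelfJoinCandidate *) b)->reloid;

    return (oa > ob) - (oa < ob);
}

static Relids
find_self_join_candidates(PlannerInfo *root)
{
    SelfJoinCandidate *cands;
    int         ncands = 0;
    Relids      result = NULL;
    int         i;

    cands = palloc(sizeof(SelfJoinCandidate) * root->simple_rel_array_size);

    /* steps 1-2: collect plain-table baserels */
    for (i = 1; i < root->simple_rel_array_size; i++)
    {
        RelOptInfo *rel = root->simple_rel_array[i];
        RangeTblEntry *rte = root->simple_rte_array[i];

        if (rel == NULL || rel->reloptkind != RELOPT_BASEREL)
            continue;
        if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION)
            continue;

        cands[ncands].relid = i;
        cands[ncands].reloid = rte->relid;
        ncands++;
    }

    /* step 3: sort by OID so duplicates become adjacent */
    qsort(cands, ncands, sizeof(SelfJoinCandidate), self_join_candidate_cmp);

    /* step 4: any adjacent equal OIDs are potential self-join pairs */
    for (i = 0; i < ncands - 1; i++)
    {
        if (cands[i].reloid == cands[i + 1].reloid)
        {
            result = bms_add_member(result, cands[i].relid);
            result = bms_add_member(result, cands[i + 1].relid);
        }
    }

    /* step 5: NULL means no pairs, so the caller can bail out cheaply */
    return result;
}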


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread David Rowley
On 17 May 2018 at 08:44, Simon Riggs  wrote:
> What I was advocating was an approach that varies according to the
> query cost, so we don't waste time trying to tune the heck out of OLTP
> queries, but for larger queries we might take a more considered
> approach.

That's tricky. If we do this, it should be done before Path
generation, so not much is known about the costs in those case.

Perhaps something can be done by looking at the number of relpages,
but I've no idea what that would be. Perhaps we need to see how costly
this operation is first before we try to think of ways to only apply
it conditionally?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread Alexander Kuzmenkov

David,

Many thanks for the detailed explanation. I'll try to code it up and 
measure how much overhead it introduces.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Removing unneeded self joins

2018-05-16 Thread Thomas Munro
On Thu, May 17, 2018 at 3:43 AM, Alexander Kuzmenkov
 wrote:
> There is a join optimization we don't do -- removing inner join of a table
> with itself on a unique column. Such joins are generated by various ORMs, so
> from time to time our customers ask us to look into this. Most recently, it
> was discussed on the list in relation to an article comparing the
> optimizations that some DBMS make [1].
>
> ...
>
> I'd be glad to hear your thoughts on this.

+1

Some thoughts:

There might be some interesting corner cases involving self-joins in
UPDATE/DELETE statements, and also FOR UPDATE etc.  Those can result
in some surprising results in a self-join (one side is subject to EPQ
and the other isn't) which I think might be changed by your patch
(though I didn't try it or read the patch very closely).

IIUC in DB2 (the clear winner at join elimination in the article you
mentioned), you get these sorts of things by default (optimisation
level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
3 as many articles recommend for OLTP work.  I think it's interesting
that they provide that knob rather than something automatic, and
interesting that there is one linear knob to classify your workload
rather than N knobs for N optimisations.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: NaNs in numeric_power (was Re: Postgres 11 release notes)

2018-05-16 Thread David Rowley
On 16 May 2018 at 09:55, Tom Lane  wrote:
> David Rowley  writes:
>> On 16 May 2018 at 02:01, Tom Lane  wrote:
>>> I'm not particularly fussed about getting credit for that.  However,
>>> looking again at how that patch series turned out --- ie, that
>>> we ensured POSIX behavior for NaNs only in HEAD --- I wonder
>>> whether we shouldn't do what was mentioned in the commit log for
>>> 6bdf1303, and teach numeric_pow() about these same special cases.
>>> It seems like it would be more consistent to change both functions
>>> for v11, rather than letting that other shoe drop in some future
>>> major release.
>
>> I'm inclined to agree. It's hard to imagine these two functions
>> behaving differently in regards to NaN input is useful to anyone.
>
> Here's a proposed patch for that.

Looks good to me.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Memory unit GUC range checks

2018-05-16 Thread Alexander Korotkov
On Wed, May 16, 2018 at 10:41 PM, Andres Freund  wrote:

> On 2018-05-16 15:49:29 +0300, Heikki Linnakangas wrote:
> > Here's a pretty straightforward fix for these two issues. Any objections
> or
> > better ideas?
>
> Generally ok, two minor points:
>
> > diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> > index 7cd2d2d80e..93402030f7 100644
> >   {"TB", GUC_UNIT_BLOCKS, (1024 * 1024 * 1024) / (BLCKSZ / 1024)},
> >   {"GB", GUC_UNIT_BLOCKS, (1024 * 1024) / (BLCKSZ / 1024)},
> >   {"MB", GUC_UNIT_BLOCKS, 1024 / (BLCKSZ / 1024)},
> >   {"kB", GUC_UNIT_BLOCKS, -(BLCKSZ / 1024)},
> > + {"B", GUC_UNIT_BLOCKS, -(BLCKSZ / (1024 * 1024))},
>
> Isn't this 0 in the common case of 8k pages?
>
>
> >   {"TB", GUC_UNIT_XBLOCKS, (1024 * 1024 * 1024) / (XLOG_BLCKSZ /
> 1024)},
> >   {"GB", GUC_UNIT_XBLOCKS, (1024 * 1024) / (XLOG_BLCKSZ / 1024)},
> >   {"MB", GUC_UNIT_XBLOCKS, 1024 / (XLOG_BLCKSZ / 1024)},
> >   {"kB", GUC_UNIT_XBLOCKS, -(XLOG_BLCKSZ / 1024)},
> > + {"B", GUC_UNIT_XBLOCKS, -(XLOG_BLCKSZ / (1024 * 1024))},
>
> Same?
>

As I understand it, in these cases the multiplier should be just -BLCKSZ and
-XLOG_BLCKSZ, respectively.
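
To spell out the arithmetic (just an illustration; this relies on the existing
guc.c convention that a negative multiplier means "divide the value by its
absolute value"):

/*
 * With the common 8 kB block size:
 *
 *     -(BLCKSZ / (1024 * 1024))  ==  -(8192 / 1048576)  ==  0   (integer division)
 *     -BLCKSZ                    ==  -8192                      (bytes per block)
 *
 * so the "B" entry for GUC_UNIT_BLOCKS needs -BLCKSZ, and the one for
 * GUC_UNIT_XBLOCKS needs -XLOG_BLCKSZ.
 */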

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Removing unneeded self joins

2018-05-16 Thread Andres Freund
Hi,

On 2018-05-17 08:48:58 +1200, David Rowley wrote:
> On 17 May 2018 at 08:44, Simon Riggs  wrote:
> > What I was advocating was an approach that varies according to the
> > query cost, so we don't waste time trying to tune the heck out of OLTP
> > queries, but for larger queries we might take a more considered
> > approach.
> 
> That's tricky. If we do this, it should be done before Path
> generation, so not much is known about the costs in those case.
> 
> Perhaps something can be done by looking at the number of relpages,
> but I've no idea what that would be. Perhaps we need to see how costly
> this operation is first before we try to think of ways to only apply
> it conditionally?

I'm also not buying that this isn't a benefit in OLTP in general. Sure,
for a single query RTT costs are going to dominate, but if you use
prepared statements the costs are going to pay off over multiple
executions.  Even just avoiding initializing unnecessary executor nodes
shows up in profiles.

Greetings,

Andres Freund



Re: Removing unneeded self joins

2018-05-16 Thread Tom Lane
David Rowley  writes:
> On 17 May 2018 at 08:44, Simon Riggs  wrote:
>> What I was advocating was an approach that varies according to the
>> query cost, so we don't waste time trying to tune the heck out of OLTP
>> queries, but for larger queries we might take a more considered
>> approach.

> That's tricky. If we do this, it should be done before Path
> generation, so not much is known about the costs in those case.

Yeah.  It'd have to be a very heuristic thing that doesn't account
for much beyond the number of relations in the query, and maybe their
sizes --- although I don't think we even know the latter at the
point where join removal would be desirable.  (And note that one of
the desirable benefits of join removal is not having to find out the
sizes of removed rels ... so just swapping that around doesn't appeal.)

regards, tom lane



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-16 Thread Alexander Korotkov
On Wed, May 16, 2018 at 4:54 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > On Wed, May 16, 2018 at 1:41 AM, Tom Lane  wrote:
> >> Perhaps there is a case for adding an additional flag to allow
> specifying
> >> "I can't support ORDER BY DESC", but I'm not in a big hurry to do so.
> >> I think there would be more changes than this needed to handle such a
> >> restriction, anyway.
>
> > OK, got it.  We'll probably propose a patch implementing that to the
> > next commitfest.
>
> If you do, it wouldn't be a bad idea to try to clarify the existing
> code and docs around this point.  I'm thinking that amcanbackward is
> misleadingly named; maybe we should rename it as part of the change?
>

I was thinking about naming the new property amcanorderbydesc,
which is kind of non-overlapping with amcanbackward.  For sure,
amcanbackward could be renamed, but I don't have an idea for a
better name right now.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Removing unneeded self joins

2018-05-16 Thread David Rowley
On 17 May 2018 at 10:13, Tom Lane  wrote:
> Yeah.  It'd have to be a very heuristic thing that doesn't account
> for much beyond the number of relations in the query, and maybe their
> sizes --- although I don't think we even know the latter at the
> point where join removal would be desirable.  (And note that one of
> the desirable benefits of join removal is not having to find out the
> sizes of removed rels ... so just swapping that around doesn't appeal.)

There's probably some argument for delaying obtaining the relation
size until after join removal and probably partition pruning too, but
it's currently done well before that in build_simple_rel, where the
RelOptInfo is built.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: NaNs in numeric_power (was Re: Postgres 11 release notes)

2018-05-16 Thread Dean Rasheed
On 16 May 2018 at 14:44, Tom Lane  wrote:
> Dean Rasheed  writes:
>> In the case 1 ^ NaN = 1, what should the result scale be?
>
> The result is exact, so I don't see a reason to be worried about its
> scale.  Messing with the scale would also require expending even
> more code on what is, in the end, a corner case that no users have
> expressed concern about.
>

OK, fine. Not really worth worrying about.

Regards,
Dean



Re: Removing unneeded self joins

2018-05-16 Thread Tom Lane
Thomas Munro  writes:
> IIUC in DB2 (the clear winner at join elimination in the article you
> mentioned), you get these sorts of things by default (optimisation
> level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
> 3 as many articles recommend for OLTP work.  I think it's interesting
> that they provide that knob rather than something automatic, and
> interesting that there is one linear knob to classify your workload
> rather than N knobs for N optimisations.

There's a lot to be said for that type of approach, as opposed to trying
to drive it off some necessarily-very-inexact preliminary estimate of
query cost.  For example, the mere fact that you're joining giant tables
doesn't in itself suggest that extra efforts in query optimization will be
repaid.  (If anything, it seems more likely that the user would've avoided
silliness like useless self-joins in such a case.)

A different line of thought is that, to me, the most intellectually
defensible rationale for efforts like const-simplification and join
removal is that opportunities for those things can arise after view
expansion, even in queries where the original query text didn't seem
to contain anything extraneous.  (Robert and Andres alluded to this
upthread, but not very clearly.)  So maybe we could track how much
the query got changed during rewriting, and use that to drive the
planner's decisions about how hard to work later on.  But I'm not
very sure that this'd be superior to having a user-visible knob.

regards, tom lane



Re: Removing unneeded self joins

2018-05-16 Thread Andres Freund
On 2018-05-16 18:37:11 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > IIUC in DB2 (the clear winner at join elimination in the article you
> > mentioned), you get these sorts of things by default (optimisation
> > level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
> > 3 as many articles recommend for OLTP work.  I think it's interesting
> > that they provide that knob rather than something automatic, and
> > interesting that there is one linear knob to classify your workload
> > rather than N knobs for N optimisations.
> 
> There's a lot to be said for that type of approach, as opposed to trying
> to drive it off some necessarily-very-inexact preliminary estimate of
> query cost.  For example, the mere fact that you're joining giant tables
> doesn't in itself suggest that extra efforts in query optimization will be
> repaid.  (If anything, it seems more likely that the user would've avoided
> silliness like useless self-joins in such a case.)

For prepared statements we could also start making more expensive
optimizations after the first execution, when we know how long the query
took / how expensive it was (also, if we had a plan cache...).

Greetings,

Andres Freund



Re: Removing unneeded self joins

2018-05-16 Thread David Rowley
On 17 May 2018 at 10:37, Tom Lane  wrote:
> Thomas Munro  writes:
>> IIUC in DB2 (the clear winner at join elimination in the article you
>> mentioned), you get these sorts of things by default (optimisation
>> level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
>> 3 as many articles recommend for OLTP work.  I think it's interesting
>> that they provide that knob rather than something automatic, and
>> interesting that there is one linear knob to classify your workload
>> rather than N knobs for N optimisations.
>
> There's a lot to be said for that type of approach, as opposed to trying
> to drive it off some necessarily-very-inexact preliminary estimate of
> query cost.  For example, the mere fact that you're joining giant tables
> doesn't in itself suggest that extra efforts in query optimization will be
> repaid.  (If anything, it seems more likely that the user would've avoided
> silliness like useless self-joins in such a case.)
>
> A different line of thought is that, to me, the most intellectually
> defensible rationale for efforts like const-simplification and join
> removal is that opportunities for those things can arise after view
> expansion, even in queries where the original query text didn't seem
> to contain anything extraneous.  (Robert and Andres alluded to this
> upthread, but not very clearly.)  So maybe we could track how much
> the query got changed during rewriting, and use that to drive the
> planner's decisions about how hard to work later on.  But I'm not
> very sure that this'd be superior to having a user-visible knob.

This seems like a good line of thought.  Perhaps a knob is a good
first step, then maybe having the ability to set that knob to
"automatic" is something to aspire for later.

I don't think Alexander should work on this as part of this patch
though. Perhaps we can re-evaluate when Alexander posts some planner
benchmarks from the patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread Tom Lane
David Rowley  writes:
> On 17 May 2018 at 10:13, Tom Lane  wrote:
>> Yeah.  It'd have to be a very heuristic thing that doesn't account
>> for much beyond the number of relations in the query, and maybe their
>> sizes --- although I don't think we even know the latter at the
>> point where join removal would be desirable.  (And note that one of
>> the desirable benefits of join removal is not having to find out the
>> sizes of removed rels ... so just swapping that around doesn't appeal.)

> There's probably some argument for delaying obtaining the relation
> size until after join removal and probably partition pruning too, but
> it's currently done well before that in build_simple_rel, where the
> RelOptInfo is built.

Yeah, but that's something we ought to fix someday; IMO it's an artifact
of having wedged in remove_useless_joins without doing the extensive
refactoring that'd be needed to do it at a more desirable time.  I don't
want to build user-visible behavior that's dependent on doing that wrong.

(But wait a second ... we could improve this without quite that much work:
instead of doing estimate_rel_size immediately during get_relation_info,
couldn't it be left until the set_base_rel_sizes pass?  Since
RelationGetNumberOfBlocks involves kernel calls, skipping it for removed
rels seems worth doing.)

regards, tom lane



Re: Removing unneeded self joins

2018-05-16 Thread Andres Freund
On 2018-05-16 18:55:41 -0400, Tom Lane wrote:
> David Rowley  writes:
> > On 17 May 2018 at 10:13, Tom Lane  wrote:
> >> Yeah.  It'd have to be a very heuristic thing that doesn't account
> >> for much beyond the number of relations in the query, and maybe their
> >> sizes --- although I don't think we even know the latter at the
> >> point where join removal would be desirable.  (And note that one of
> >> the desirable benefits of join removal is not having to find out the
> >> sizes of removed rels ... so just swapping that around doesn't appeal.)
> 
> > There's probably some argument for delaying obtaining the relation
> > size until after join removal and probably partition pruning too, but
> > it's currently done well before that in build_simple_rel, where the
> > RelOptInfo is built.
> 
> Yeah, but that's something we ought to fix someday; IMO it's an artifact
> of having wedged in remove_useless_joins without doing the extensive
> refactoring that'd be needed to do it at a more desirable time.  I don't
> want to build user-visible behavior that's dependent on doing that wrong.

My patch that introduced a radix tree buffer mapping also keeps an
accurate relation size in memory, making it far cheaper to use. While I
deprioritized the patchset for the moment (I'll post what I'm working on
first soon), that should address some of the cost till then.

Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.

Greetings,

Andres Freund



Re: Postgres 11 release notes

2018-05-16 Thread Bruce Momjian
On Wed, May 16, 2018 at 01:22:45PM +0900, Michael Paquier wrote:
> On Mon, May 14, 2018 at 08:45:44PM -0400, Bruce Momjian wrote:
> > What TLS does is to mix the offered ciphers into the negotiation hash so
> > a man-in-the-middle can't pretend it doesn't support something.  Could
> > we do something like that here?
> 
> I have to admit that I don't quite follow here, the shape of channel
> binding data is decided by RFC 5929, so we need to stick with it.

OK, let me explain.  First, there are two man-in-the-middle types of
attacks.  The first one allows the two legitimate ends of the TLS
connection to negotiate the shared secret between themselves, but
injects or changes some of the packets before the shared secret is
agreed upon, perhaps to downgrade the strength of the protocol options. 
TLS prevents this type of man-in-the-middle attack by hashing the shared
secret with a hash of all of the TLS negotiation messages previously
sent.  Each side who knows the shared secret can verify that no messages
were changed;  see:

  
https://security.stackexchange.com/questions/115284/how-can-an-attacker-downgrade-modify-the-cipher-suites-when-they-are-maced-fre

The second type of man-in-the-middle attack is where the
man-in-the-middle is negotiating the shared secret separately with the
two end points.  There is no way to detect this without having some
shared secret between the two valid end points.  One shared secret is
the password hashed with the shared secret, which is what tls-unique
uses.  The second uses the server certificate hashed with the shared
secret.  This is now documented in Postgres:

   In SCRAM with tls-unique
   channel binding, the shared secret negotiated during the SSL session
   is mixed into the user-supplied password hash.  The shared secret
   is partly chosen by the server, but not directly transmitted, making
   it impossible for a fake server to create an SSL connection with the
   client that has the same shared secret it has with the real server.
   SCRAM with tls-server-end-point
   mixes a hash of the server's certificate into the user-supplied password
   hash.  While a fake server can retransmit the real server's certificate,
   it doesn't have access to the private key matching that certificate, and
   therefore cannot prove it is the owner, causing SSL connection failure.

TLS prevents a man-in-the-middle from injecting invalid packets.
However, it does not prevent a man-in-the-middle attack with separate
shared secret negotiation unless certificates are used.

The current problem under discussion is the same as that of the
man-in-the-middle packet change/injection attack in that the
man-in-the-middle can change the options reported as supported by the
Postgres server, before the password is sent.  The most efficient fix
for this would be for all Postgres protocol messages up to the point
where the authentication was chosen to be hashed with the password and
sent to the server.  In that way, if a man-in-the-middle changed the
server-reported supported authentication, the server would identify this
and refuse the connection.
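
To sketch that idea concretely (hypothetical message contents, using OpenSSL's
SHA256 merely as a stand-in hash; this is only an illustration, not the
PostgreSQL protocol):

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>        /* assumes OpenSSL is available */

/* Mix the negotiation transcript and the password into a proof. */
static void
compute_proof(const char *transcript, const char *password,
              unsigned char proof[SHA256_DIGEST_LENGTH])
{
    char        buf[1024];
    int         len;

    len = snprintf(buf, sizeof(buf), "%s|%s", transcript, password);
    SHA256((const unsigned char *) buf, len, proof);
}

int
main(void)
{
    unsigned char client_proof[SHA256_DIGEST_LENGTH];
    unsigned char server_proof[SHA256_DIGEST_LENGTH];

    /* the client hashes the auth methods it was actually offered */
    compute_proof("AuthenticationSASL:SCRAM-SHA-256-PLUS",
                  "secret", client_proof);
    /* the server hashes the methods it really sent */
    compute_proof("AuthenticationSASL:SCRAM-SHA-256-PLUS,SCRAM-SHA-256",
                  "secret", server_proof);

    /*
     * A man-in-the-middle that stripped a method makes the proofs disagree,
     * and it cannot forge a matching proof without knowing the password.
     */
    puts(memcmp(client_proof, server_proof, SHA256_DIGEST_LENGTH) == 0
         ? "transcripts match" : "downgrade detected");
    return 0;
}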

The problem is that current and past-version PG authentication methods
don't have any protocol downgrade protection, and it is hard to add it
unless you just remove support for old protocols.  I think the only
reasonable solution is to allow the client and/or server to force
channel binding.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Postgres 11 release notes

2018-05-16 Thread Bruce Momjian
On Wed, May 16, 2018 at 08:59:23PM +0900, Michael Paquier wrote:
> On Wed, May 16, 2018 at 01:09:07PM +0300, Heikki Linnakangas wrote:
> > I have to agree with Bruce, that it's pretty useless to implement channel
> > binding, if there is no way to require it in libpq. IMHO that must be
> > fixed.
> 
> Wouldn't we want to also do something for the case where a client is
> willing to use SCRAM but that the server forces back MD5?  In which
> case, one possibility is a connection parameter like the following,
> named say authmethod:
> - An empty value is equivalent to the current behavior, and is the
> default.
> - 'scram' means that client is willing to use SCRAM, which would cause a
> failure if server attempts to enforce MD5.
> - 'scram-plus' means that client enforces SCRAM and channel binding.
> 
> Or we could just have a channel_binding_mode, which has a "require"
> value like sslmode, and "prefer" mode, which is the default and the
> current behavior...  Still what to do with MD5 requests in this case?

Just to reiterate, MD5 and SCRAM without channel binding are designed to
avoid packet _replay_.  They assume no man-in-the-middle has adjusted what
is supported by the two endpoints.

SCRAM-with-binding is the first password method that attempts to avoid
man-in-the-middle attacks, and therefore is much less likely to be able
to trust what the endpoints supports.  I think it is really the
channel_binding_mode that we want to control at the client.  The lesser
modes are much more reasonable to use an automatic best-supported
negotiation, which is what we do now.

FYI, I think the server could also require channel binding for SCRAM. We
already have scram-sha-256 in pg_hba.conf, and I think
scram-sha-256-plus would be reasonable.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Postgres 11 release notes

2018-05-16 Thread Bruce Momjian
On Wed, May 16, 2018 at 09:55:05AM +0530, Amit Kapila wrote:
> On Wed, May 16, 2018 at 12:17 AM, Bruce Momjian  wrote:
> > On Tue, May 15, 2018 at 08:45:07AM +0530, Amit Kapila wrote:
> >> No, it is not like that.  We divide the scan among workers and each
> >> worker should perform projection of the rows it scanned (after
> >> applying filter).  Now, if the expensive functions are part of target
> >> lists, then we can push the computation of expensive functions (as
> >> part of target list) in workers which will divide the work.
> >>
> >> >  Really?  Do
> >> > we run each column in its own worker or do we split the result set into
> >> > parts and run those in parallel?  How do we know, just the function call
> >> > costs?
> >> >
> >>
> >> The function's cost can be determined via pg_proc->procost.  For this
> >> particular case, you can refer the call graph -
> >> create_pathtarget->set_pathtarget_cost_width->cost_qual_eval_node->cost_qual_eval_walker->get_func_cost
> >>
> >> >  I can admit I never saw that coming.
> >> >
> >>
> >> I think the use case becomes interesting with parallel query because
> >> now you can divide such cost among workers.
> >>
> >> Feel free to ask more questions if above doesn't clarify the usage of
> >> these features.
> >
> > OK, I have added the following release note item for both of these:
> >
> > 2017-11-16 [e89a71fb4] Pass InitPlan values to workers via Gather (Merge).
> > 2018-03-29 [3f90ec859] Postpone generate_gather_paths for topmost scan/join 
> > rel
> > 2018-03-29 [11cf92f6e] Rewrite the code that applies scan/join targets to 
> > paths
> >
> > Allow single-evaluation queries, e.g. FROM
> > clause queries, and functions in the target list to be
> > parallelized (Amit Kapila, Robert Haas)
> >
> 
> Sorry, but it is not clear to me what you intend to say by "e.g.
> FROM clause queries"?   What I could imagine is
> something like "e.g. queries in WHERE clause that
> return aggregate value"

Oh, sorry, changed to:

Allow single-evaluation queries, e.g. WHERE
clause aggregate queries, and functions in the target list to be
parallelized (Amit Kapila, Robert Haas)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Postgres 11 release notes

2018-05-16 Thread Bruce Momjian
On Wed, May 16, 2018 at 01:50:04PM +0300, Heikki Linnakangas wrote:
> >* Allow bytes to be specified for server variable sizes (Beena Emerson)
> >
> >The specification is "B".
> 
> I had to dig the commit in the git history to figure out what this is. I'd
> suggest rewording this:
> 
> * Allow server options related to memory and file sizes, to be specified as
> number of bytes.
> 
> The new unit is "B". This is in addition to "kB", "MB", "GB" and "TB", which
> were accepted previously.

OK, good, done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Postgres 11 release notes

2018-05-16 Thread Bruce Momjian
On Wed, May 16, 2018 at 10:08:18AM -0400, Alvaro Herrera wrote:
> On 2018-May-15, Bruce Momjian wrote:
> 
> > On Wed, May 16, 2018 at 09:01:35AM +0900, Tatsuo Ishii wrote:
> > > There's a small typo.
> > > 
> > > > Add support for with huge(large) pages on Windows (Takayuki Tsunakawa, 
> > > > Thomas Munro) 
> > > 
> > > I think a space between "huge" and "(large)" is needed.
> > 
> > Done, URL updated.
> 
> I'm not sure why we say "huge (large) pages".  The natural term for
> Windows is "large-pages",
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs.85).aspx
> so I think we should use that terminology.  Maybe something like
> 
> Add support for large pages on Windows (Takayuki 
> Tsunakawa, Thomas Munro)
> This is controlled by the huge_pages configuration parameter.

OK, done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Odd procedure resolution

2018-05-16 Thread Michael Paquier
On Wed, May 16, 2018 at 03:37:18PM -0400, Robert Haas wrote:
> On Wed, May 16, 2018 at 3:29 PM, Tom Lane  wrote:
> > BTW, we seem to have some confusion here already:
> >
> > regression=# create function foo(int) returns int as 'select $1' language 
> > sql;
> > CREATE FUNCTION
> > regression=# create procedure foo(text) as 'select $1' language sql;
> > CREATE PROCEDURE
> > regression=# drop function foo;
> > ERROR:  function name "foo" is not unique
> > HINT:  Specify the argument list to select the function unambiguously.
> > regression=# drop routine foo;
> > ERROR:  function name "foo" is not unique
> > HINT:  Specify the argument list to select the function unambiguously.
> > regression=# drop procedure foo;
> > ERROR:  could not find a procedure named "foo"
> >
> > The first two errors are what I'd expect, but why is the third
> > different?
> 
> Good question.

I actually find the first error messages ambiguous as well, so that
looks like a bug to me when a lookup is done for those function names.
Shouldn't DROP FUNCTION work only on functions and DROP PROCEDURE only
on procedures?  It is documented that DROP ROUTINE can work on
aggregates, functions and procedures, but the docs tell a different
story about DROP PROCEDURE and DROP FUNCTION.
--
Michael


signature.asc
Description: PGP signature


Re: Postgres 11 release notes

2018-05-16 Thread Michael Paquier
On Wed, May 16, 2018 at 08:20:49PM -0400, Bruce Momjian wrote:
> SCRAM-with-binding is the first password method that attempts to avoid
> man-in-the-middle attacks, and therefore is much less likely to be able
> to trust what the endpoints supports.  I think it is really the
> channel_binding_mode that we want to control at the client.  The lesser
> modes are much more reasonable to use an automatic best-supported
> negotiation, which is what we do now.

Noted.  Which means that the parameter is ignored when using a non-SSL
connection, as well as when the server tries to enforce the use of
anything else than SCRAM.

> FYI, I think the server could also require channel binding for SCRAM. We
> already have scram-sha-256 in pg_hba.conf, and I think
> scram-sha-256-plus would be reasonable.

Noted as well.  There is of course the question of v10 libpq versions
which don't support channel binding, but if an admin is willing to set
up scram-sha-256-plus in pg_hba.conf then he can request his users to
update their drivers/libs as well.

What's the take of others?  Magnus, Stephen or Heikki perhaps (you've
been the most involved with SCRAM early talks)?
--
Michael


signature.asc
Description: PGP signature


Re: Postgres 11 release notes

2018-05-16 Thread Bruce Momjian
On Thu, May 17, 2018 at 09:56:49AM +0900, Michael Paquier wrote:
> On Wed, May 16, 2018 at 08:20:49PM -0400, Bruce Momjian wrote:
> > SCRAM-with-binding is the first password method that attempts to avoid
> > man-in-the-middle attacks, and therefore is much less likely to be able
> > to trust what the endpoints supports.  I think it is really the
> > channel_binding_mode that we want to control at the client.  The lesser
> > modes are much more reasonable to use an automatic best-supported
> > negotiation, which is what we do now.
> 
> Noted.  Which means that the parameter is ignored when using a non-SSL
> connection, as well as when the server tries to enforce the use of
> anything else than SCRAM.

Uh, a man-in-the-middle could prevent SSL or ask for a different
password authentication method and then channel binding would not be
used.  I think when you say you want channel binding, you have to fail
if you don't get it.

> > FYI, I think the server could also require channel binding for SCRAM. We
> > already have scram-sha-256 in pg_hba.conf, and I think
> > scram-sha-256-plus would be reasonable.
> 
> Noted as well.  There is of course the question of v10 libpq versions
> which don't support channel binding, but if an admin is willing to set
> up scram-sha-256-plus in pg_hba.conf then he can request his users to
> update his drivers/libs as well.

Yes, I don't see a way around it.  Once you accept that someone in the
middle can change what you request undetected, then you can't do 
fallback.  Imagine a man-in-the-middle with TLS where the
man-in-the-middle allows the two end-points to negotiate the shared
secret, but the man-in-the-middle forces a weak cipher.  This is what is
happening with Postgres when the man-in-the-middle forces a weaker
authentication method.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread Amit Langote
On 2018/05/17 3:28, David Rowley wrote:
> On 17 May 2018 at 02:51, Robert Haas  wrote:
>> I think that's clearer.  Committed with a few tweaks that are
>> hopefully improvements.
> 
> Thanks for committing. Although, I disagree with your tweak:
> 
> +* 1-based index into the *pds list.
> 
> I think that's making the same mistake as the last comment did. You
> think it's 1-based because the index is being set with list_length
> rather than list_length - 1, but it can do that simply because the
> item has not been added to the list yet.
> 
> Nothing converts this index back to 0-based;
>
> RelationGetPartitionDispatchInfo builds the array from the list with:
> 
> i = 0;
> foreach(lc, pdlist)
> {
> pd[i++] = lfirst(lc);
> }
> 
> ExecFindPartition uses the pd array with:
> 
> parent = pd[-parent->indexes[cur_index]];
> 
> So if it was 1-based then we'd be off by one here.

That's right.  Even those negative values in the pd->indexes are still
0-based, with the 0th entry being for the root table.

> Maybe we can clear up that confusion with
> 
> + /*
> +  * No need to subtract 1 to get the 0-based index as the item for this
> +  * partitioned table has not been added to the list yet.
> +  */
> pd->indexes[i] = -list_length(*pds);
> 
> and just switch 1-based to 0-based in the new comment.

Or maybe, change the comment to say that even the negative indexes are
0-based like you pointed out, *but* instead of updating the comment like
you suggest above, change the other index value assignment statement to
not subtract 1 from the list_length by switching order with the
accompanying lappend; like this:

 if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
 {
+pd->indexes[i] = list_length(*leaf_part_oids);
 *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
-pd->indexes[i] = list_length(*leaf_part_oids) - 1;
 }
 else
 {

Attached a patch.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 75329b3624..da962c4375 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -975,7 +975,7 @@ get_partition_dispatch_recurse(Relation rel, Relation 
parent,
 * array element belongs to a leaf partition or a subpartitioned table.
 * For leaf partitions we store the 0-based index into *leaf_part_oids,
 * and for sub-partitioned tables we store a negative version of the
-* 1-based index into the *pds list.  When searching, if we see a 
negative
+* 0-based index into the *pds list.  When searching, if we see a 
negative
 * value, the search must continue in the corresponding sub-partition;
 * otherwise, we've identified the correct partition.
 */
@@ -986,8 +986,8 @@ get_partition_dispatch_recurse(Relation rel, Relation 
parent,
 
if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
{
+   pd->indexes[i] = list_length(*leaf_part_oids);
*leaf_part_oids = lappend_oid(*leaf_part_oids, 
partrelid);
-   pd->indexes[i] = list_length(*leaf_part_oids) - 1;
}
else
{


Re: Removing unneeded self joins

2018-05-16 Thread David Rowley
On 17 May 2018 at 10:55, Tom Lane  wrote:
> David Rowley  writes:
>> There's probably some argument for delaying obtaining the relation
>> size until after join removal and probably partition pruning too, but
>> it's currently done well before that in build_simple_rel, where the
>> RelOptInfo is built.
>
> Yeah, but that's something we ought to fix someday; IMO it's an artifact
> of having wedged in remove_useless_joins without doing the extensive
> refactoring that'd be needed to do it at a more desirable time.  I don't
> want to build user-visible behavior that's dependent on doing that wrong.
>
> (But wait a second ... we could improve this without quite that much work:
> instead of doing estimate_rel_size immediately during get_relation_info,
> couldn't it be left until the set_base_rel_sizes pass?  Since
> RelationGetNumberOfBlocks involves kernel calls, skipping it for removed
> rels seems worth doing.)

I did mean just obtaining the sizes, not delaying building the
RelOptInfo. I see nothing that needs RelOptInfo->pages before
set_base_rel_size apart from the code which I mentioned about moving a
couple of days ago in [1].

[1] 
https://www.postgresql.org/message-id/CAKJS1f_eUz0_h5_vU1rqE7wuxMcoENcWK2FTODz0pOyxp3_Uig%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing unneeded self joins

2018-05-16 Thread David Rowley
On 17 May 2018 at 11:00, Andres Freund  wrote:
> Wonder if we shouldn't just cache an estimated relation size in the
> relcache entry till then. For planning purposes we don't need to be
> accurate, and usually activity that drastically expands relation size
> will trigger relcache activity before long. Currently there's plenty
> workloads where the lseeks(SEEK_END) show up pretty prominently.

While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value.  We could have used pg_class.relpages all
along, but it would cause the planner to not work so well in the face of
the relation changing size significantly between analyze runs.

FWIW the major case where that does show up is when generating a plan
for a partitioned table with many partitions then pruning all but a
few of them.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread David Rowley
On 17 May 2018 at 13:17, Amit Langote  wrote:
> Or maybe, change the comment to say that even the negative indexes are
> 0-based like you pointed out, *but* instead of updating the comment like
> you suggest above, change the other index value assignment statement to
> not subtract 1 from the list_length by switching order with the
> accompanying lappend; like this:
>
>  if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
>  {
> +pd->indexes[i] = list_length(*leaf_part_oids);
>  *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
> -pd->indexes[i] = list_length(*leaf_part_oids) - 1;
>  }
>  else
>  {

That makes sense.  It's probably less confusing that way.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



partition -> partitioned

2018-05-16 Thread Amit Langote
Hi.

Commit 499be013de6 used 'partition' where it really meant 'partitioned' in
a few places including in a variable name.  For example, what almost all
places call 'partitioned_rels', make_partition_pruneinfo called
'partition_rels'.

Attached a patch to make that uniform to avoid confusion.

Thanks,
Amit
diff --git a/src/backend/partitioning/partprune.c 
b/src/backend/partitioning/partprune.c
index 58ec2a684d..c39b618cee 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -188,7 +188,7 @@ static bool partkey_datum_from_expr(PartitionPruneContext 
*context,
  * pruning would be useless.
  */
 List *
-make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
+make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 List *subpaths, List 
*prunequal)
 {
RelOptInfo *targetpart = NULL;
@@ -218,9 +218,9 @@ make_partition_pruneinfo(PlannerInfo *root, List 
*partition_rels,
relid_subnode_map[pathrel->relid] = i++;
}
 
-   /* Likewise for the partition_rels */
+   /* Likewise for the partitioned_rels */
i = 1;
-   foreach(lc, partition_rels)
+   foreach(lc, partitioned_rels)
{
Index   rti = lfirst_int(lc);
 
@@ -229,8 +229,8 @@ make_partition_pruneinfo(PlannerInfo *root, List 
*partition_rels,
relid_subpart_map[rti] = i++;
}
 
-   /* We now build a PartitionPruneInfo for each partition_rels */
-   foreach(lc, partition_rels)
+   /* We now build a PartitionPruneInfo for each partitioned_rels */
+   foreach(lc, partitioned_rels)
{
Index   rti = lfirst_int(lc);
RelOptInfo *subpart = find_base_rel(root, rti);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f90aa7b2a1..6bec45a2a3 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1593,7 +1593,7 @@ typedef struct PartitionPruneStepCombine
 typedef struct PartitionPruneInfo
 {
NodeTag type;
-   Oid reloid; /* Oid of partition rel 
*/
+   Oid reloid; /* Oid of the 
partitioned table */
List   *pruning_steps;  /* List of PartitionPruneStep */
Bitmapset  *present_parts;  /* Indexes of all partitions which 
subnodes
 * are present 
for. */
diff --git a/src/include/partitioning/partprune.h 
b/src/include/partitioning/partprune.h
index 3d114b4c71..ff034206f7 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -62,7 +62,8 @@ typedef struct PartitionPruneContext
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
 
-extern List *make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
+extern List *make_partition_pruneinfo(PlannerInfo *root,
+List *partitioned_rels,
 List *subpaths, List 
*prunequal);
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,


Re: Removing unneeded self joins

2018-05-16 Thread Tom Lane
David Rowley  writes:
> On 17 May 2018 at 11:00, Andres Freund  wrote:
>> Wonder if we shouldn't just cache an estimated relation size in the
>> relcache entry till then. For planning purposes we don't need to be
>> accurate, and usually activity that drastically expands relation size
>> will trigger relcache activity before long. Currently there's plenty
>> workloads where the lseeks(SEEK_END) show up pretty prominently.

> While I'm in favour of speeding that up, I think we'd get complaints
> if we used a stale value.

Yeah, that scares me too.  We'd then be in a situation where (arguably)
any relation extension should force a relcache inval.  Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.

I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too.  AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept.  Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.

regards, tom lane



Re: Removing unneeded self joins

2018-05-16 Thread Andres Freund
On 2018-05-16 22:11:22 -0400, Tom Lane wrote:
> David Rowley  writes:
> > On 17 May 2018 at 11:00, Andres Freund  wrote:
> >> Wonder if we shouldn't just cache an estimated relation size in the
> >> relcache entry till then. For planning purposes we don't need to be
> >> accurate, and usually activity that drastically expands relation size
> >> will trigger relcache activity before long. Currently there's plenty
> >> workloads where the lseeks(SEEK_END) show up pretty prominently.
> 
> > While I'm in favour of speeding that up, I think we'd get complaints
> > if we used a stale value.
> 
> Yeah, that scares me too.  We'd then be in a situation where (arguably)
> any relation extension should force a relcache inval.  Not good.
> I do not buy Andres' argument that the value is noncritical, either ---
> particularly during initial population of a table, where the size could
> go from zero to something-significant before autoanalyze gets around
> to noticing.

I don't think every extension needs to force a relcache inval. It'd
instead be perfectly reasonable to define a rule that an inval is
triggered whenever crossing a 10% relation size boundary. Which'll lead
to invalidations for the first few pages, but much less frequently
later.


> I'm a bit skeptical of the idea of maintaining an accurate relation
> size in shared memory, too.  AIUI, a lot of the problem we see with
> lseek(SEEK_END) has to do with contention inside the kernel for access
> to the single-point-of-truth where the file's size is kept.  Keeping
> our own copy would eliminate kernel-call overhead, which can't hurt,
> but it won't improve the contention angle.

A syscall is several hundred instructions. An unlocked read - which'll
be sufficient in many cases, given that the value can quickly be out
of date anyway - is a few cycles. Even with a barrier you're talking a
few dozen cycles.  So I can't see how it'd not improve the contention.
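
A minimal sketch of what such an unlocked read could look like (the struct
and function names are invented for illustration; this is not PostgreSQL
code):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
    _Atomic uint64_t nblocks;   /* approximate relation size in blocks */
} SharedRelSize;

static uint64_t
read_relation_size(SharedRelSize *rel)
{
    /*
     * A relaxed load is enough: the value may be slightly stale, which is
     * acceptable for planner estimates, and it costs a few cycles rather
     * than a syscall per lookup.
     */
    return atomic_load_explicit(&rel->nblocks, memory_order_relaxed);
}

int
main(void)
{
    SharedRelSize rel;

    atomic_init(&rel.nblocks, 12345);
    printf("%llu blocks\n", (unsigned long long) read_relation_size(&rel));
    return 0;
}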

But the main reason for keeping it in shmem is less the lseek avoidance
- although that's nice, context switches aren't great - but to make
relation extension need far less locking.

Greetings,

Andres Freund



Re: Postgres 11 release notes

2018-05-16 Thread Amit Kapila
On Thu, May 17, 2018 at 5:54 AM, Bruce Momjian  wrote:
>
> Oh, sorry, changed to:
>
> Allow single-evaluation queries, e.g. WHERE
> clause aggregate queries, and functions in the target list to be
> parallelized (Amit Kapila, Robert Haas)
>

LGTM.  Thanks.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: partition -> partitioned

2018-05-16 Thread David Rowley
On 17 May 2018 at 13:52, Amit Langote  wrote:
> Commit 499be013de6 used 'partition' where it really meant 'partitioned' in
> a few places including in a variable name.  For example, what almost all
> places call 'partitioned_rels', make_partition_pruneinfo called
> 'partition_rels'.
>
> Attached a patch to make that uniform to avoid confusion.

Looks good to me.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: partition -> partitioned

2018-05-16 Thread Amit Langote
On 2018/05/17 11:40, David Rowley wrote:
> On 17 May 2018 at 13:52, Amit Langote  wrote:
>> Commit 499be013de6 used 'partition' where it really meant 'partitioned' in
>> a few places including in a variable name.  For example, what almost all
>> places call 'partitioned_rels', make_partition_pruneinfo called
>> 'partition_rels'.
>>
>> Attached a patch to make that uniform to avoid confusion.
> 
> Looks good to me.

Thanks for taking a look at it.

Regards,
Amit




Re: Oddity in COPY FROM handling of check constraints on partition tables

2018-05-16 Thread Etsuro Fujita

(2018/05/17 0:27), Robert Haas wrote:

On Tue, Mar 27, 2018 at 9:00 AM, Etsuro Fujita
  wrote:

Attached is a patch for fixing this issue.


This no longer applies.


The patch has already been committed by you [1].  Thanks for committing!

Best regards,
Etsuro Fujita

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=cfbecf8100ecb83c07c2017f843b0642580416bf;hp=870d89608e5f891275d0b752560c827dbce3d7b4




Re: postgres_fdw: Oddity in pushing down inherited UPDATE/DELETE joins to remote servers

2018-05-16 Thread Etsuro Fujita

(2018/05/17 1:16), Robert Haas wrote:

On Fri, May 11, 2018 at 8:46 AM, Etsuro Fujita
  wrote:

I added an assertion test to make_modifytable to match that in
create_modifytable_path.  Attached is an updated version of the patch.


Committed.


Thank you!


Was it just good luck that this ever worked at all?  I mean:

-        if (rti < root->simple_rel_array_size &&
-            root->simple_rel_array[rti] != NULL)
+        if (rti < subroot->simple_rel_array_size &&
+            subroot->simple_rel_array[rti] != NULL)
         {
-            RelOptInfo *resultRel = root->simple_rel_array[rti];
+            RelOptInfo *resultRel = subroot->simple_rel_array[rti];

             fdwroutine = resultRel->fdwroutine;
         }
         else
         {
-            RangeTblEntry *rte = planner_rt_fetch(rti, root);
+            RangeTblEntry *rte = planner_rt_fetch(rti, subroot);

...an RTI is only meaningful relative to a particular PlannerInfo's
range table.  It can't be right to just take an RTI for one
PlannerInfo and look it up in some other PlannerInfo's range table.


I think that can be right, because inheritance_planner generates the
simple_rel_array and simple_rte_array for the parent PlannerInfo in a way
that allows us to do that.  This is the logic by which that function creates
the simple_rel_array for the parent PlannerInfo:


        /*
         * We need to collect all the RelOptInfos from all child plans into
         * the main PlannerInfo, since setrefs.c will need them.  We use the
         * last child's simple_rel_array (previous ones are too short), so we
         * have to propagate forward the RelOptInfos that were already built
         * in previous children.
         */
        Assert(subroot->simple_rel_array_size >= save_rel_array_size);
        for (rti = 1; rti < save_rel_array_size; rti++)
        {
            RelOptInfo *brel = save_rel_array[rti];

            if (brel)
                subroot->simple_rel_array[rti] = brel;
        }

Best regards,
Etsuro Fujita



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-16 Thread David Rowley
On 17 May 2018 at 01:19, Robert Haas  wrote:
> Hmm, that's actually not as bad as I thought.  Thanks for the
> explanation.  I think if I were going to try to improve things, I'd
> try to annotate the Append node with the name of the partitioned table
> that it's using for pruning in case #2 and case #3, and maybe
> something to indicate which type of pruning is in use.  That would
> make it really clear whether pruning is enabled or not.  The methods
> you mention above sort of require reading the tea leaves -- and it
> might not always be very easy to distinguish between cases where
> pruning is possible but nothing got pruned (imagine an inequality
> qual) and where it's not even possible in the first place.
>
> e.g.
>
> Append
>   Execution-Time Pruning: order_lines (at executor startup)
>   -> Index Scan ...

Perhaps Append should be shown as "Unordered Partitioned Table Scan on
". That seems more aligned to how else we show which relation a
node belongs to. The partition being scanned is simple to obtain. It's
just the first item in the partitioned_rels List. (MergeAppend would
be an "Ordered Partitioned Table Scan")

I'm not really a fan of overloading properties with a bunch of text.
Multiple int or text properties would be easier to deal with,
especially so when you consider the other explain formats. Remember,
all 3 pruning methods could be used for a single Append node.

I guess doing work here would require additional code in the planner
to track how many relations were removed by both partition pruning and
constraint exclusion. Dunno if that would be tracked together or
separately. However, I'd prefer to have a clear idea of what exactly
the design should be before I go write some code that perhaps nobody
will like.

Unsure what you have in mind for the pruning done during actual
execution; just a yay or nay as to whether we're attempting it or not?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[Bug Fix]ECPG: cancellation of significant digits on ECPG

2018-05-16 Thread Higuchi, Daisuke
Hi, 
Currently our customer uses PostgreSQL 9.6 and hits an ECPG bug when using
the numeric data type with SQLDA.
I confirmed that this problem also occurs on master and on the latest 9.6 branch.

PROBLEM
---
When the integer part of a numeric value is "0", significant digits are lost.
For example, where I expect "0.12345" I get "0.12340", and where I expect
"0.01234" I get "0.01200".
I attached a sample application to reproduce this problem.

CAUSE
---
When copying the numeric data, the wrong size is used.
The problem is in "src/interfaces/ecpg/ecpglib/sqlda.c".

ecpg_set_native_sqlda(int lineno, struct sqlda_struct **_sqlda, const PGresult *res,
                      int row, enum COMPAT_MODE compat)
{
...
    if (num->ndigits)
    {
        ecpg_sqlda_align_add_size(next_offset, sizeof(int), num->ndigits + 1,
                                  &offset, &next_offset);
        memcpy((char *) sqlda + offset, num->buf, num->ndigits + 1);

        ((numeric *) sqlda->sqlvar[i].sqldata)->buf = (NumericDigit *) sqlda + offset;
        ((numeric *) sqlda->sqlvar[i].sqldata)->digits =
            (NumericDigit *) sqlda + offset + (num->digits - num->buf);
    }
...

When the numeric value is "0.12345", num->buf holds "0 0 1 2 3 4 5" and
num->digits points at "1 2 3 4 5".
num->ndigits holds the number of significant digits starting from the first
nonzero digit, here 5.

This code currently copies only "num->ndigits + 1" bytes of the numeric data.
As a result, (char *) sqlda + offset holds "0 0 1 2 3 4", not "0 0 1 2 3 4 5".
Instead, "num->digits - num->buf + num->ndigits" bytes should be copied.

FIX
---
The code above should be fixed, and other similar bugs should be fixed as well.
I have attached patches with the bug fix and a regression test for the master branch.
I hope this bug fix will be back-patched to the other versions.

Regards, 
Daisuke Higuchi



001_ecpg_numeric_bugfix_v1.patch
Description: 001_ecpg_numeric_bugfix_v1.patch


numeric_sample_test.pgc
Description: numeric_sample_test.pgc


002_ecpg_numeric_test_v1.patch
Description: 002_ecpg_numeric_test_v1.patch

