Re: make installcheck-world in a clean environment

2018-05-07 Thread Alexander Lakhin

07.05.2018 20:07, Tom Lane wrote:

Robert Haas  writes:

After thinking about this some more, I think the question here is
definitional.  A first attempt at defining 'make installcheck' is to
say that it runs the tests from the build tree against the running
server.  Certainly, we intend to use the SQL files that are present in
the build tree, not the ones that were present in the build tree where
the running server was built.  But what about client programs that we
use to connect to the server?  You're suggesting that we use the
pre-installed ones, but that is actually pretty problematic because
the ones we see as installed might correspond neither to the contents
of the build tree nor to the running server.
If the contents of the source tree don't correspond to the running
server, then I'm afraid we can't installcheck it at all.
I think it's assumed that for installcheck we use exactly the same
source files that were used to build the server.
Regarding client programs: if we don't use/check them while
installchecking, then by what means can they be tested once installed?
I think the pgsql-packagers would like to check whether the whole
installation of PostgreSQL is working, not only the server.

For me, a very realistic and most useful installcheck scenario is:
install postgresql-x.rpm
install postgresql-x.src.rpm
./configure --prefix=$target_path --enable-tap-tests, etc...
make installcheck(-world)

   Conceivably we could end
up having a mix of assets from three different places: (1) the running
server, (2) the build tree, (3) whatever is in our path at the moment.
That seems very confusing.  So now I think it's probably right to
define 'make installcheck' as using the assets from the build tree to
test the running server.  Under that definition, we're missing some
dependencies, but USE_INSTALLED_ASSETS isn't a thing we need.
I see another scenario, discussed a month ago - remote (or
server-only) installcheck.
( 
https://www.postgresql.org/message-id/flat/CAM0nTJ6iorX_tmg5MX0mQU3z3w%3D9wk%2BpGK8zrvn7DNWqnyv%2BsQ%40mail.gmail.com 
)
It can be useful too, and if it is to be supported, then
USE_INSTALLED_ASSETS usage should be extended to psql, etc.

Nah, I disagree with this.  To me, the purpose of "make installcheck"
is to verify the correctness of an installation, which I take to include
the client programs as well as the server.  I think that "make
installcheck" ought to use the installed version of any file that we
actually install, and go to the build tree only for things we don't
install (e.g. SQL test scripts).
Yes, that's what the proposed patches are intended for.  I didn't encounter any
problems with them during internal testing with Linux and mingw builds.

If the user has screwed up his PATH or other environmental aspects so that
what he's testing isn't a single installation, that's his error, not
something that "make installcheck" ought to work around.  Indeed, maybe
such aspects of his setup are intentional, and second-guessing them would
completely defeat his purpose.  In any case, if you want to test the
build-tree assets, that's what "make check" is for.
Even modified configs can lead to test failures (for example,
lc_messages can break server log checking), so installcheck should
be performed against a clean and deterministic installation anyway.
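For illustration, one way to pin that down for the log checks (just a sketch, assuming superuser access on the installation under test):

ALTER SYSTEM SET lc_messages = 'C';   -- non-English server messages break log-matching tests
SELECT pg_reload_conf();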


Best regards,
--
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Parallel Append implementation

2018-05-07 Thread Thomas Munro
On Tue, May 8, 2018 at 5:23 AM, Robert Haas  wrote:
> On Sat, Apr 7, 2018 at 10:21 AM, Adrien Nayrat
>  wrote:
>> I notice Parallel append is not listed on Parallel Plans documentation :
>> https://www.postgresql.org/docs/devel/static/parallel-plans.html
>
> I agree it might be nice to mention this somewhere on this page, but
> I'm not exactly sure where it would make logical sense to put it.

It's not a scan, it's not a join and it's not an aggregation, so I
think it needs to be in a new  at the same level as those
others.  It's a different kind of thing.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: parallel.sgml for Gather with InitPlans

2018-05-07 Thread Amit Kapila
On Mon, May 7, 2018 at 11:07 PM, Robert Haas  wrote:
> In the wake of commit e89a71fb449af2ef74f47be1175f99956cf21524,
> parallel.sgml is no longer correct about the effect of InitPlans:
>
>   
> The following operations are always parallel restricted.
>   
>
> ...
>
>   
> Access to an InitPlan or correlated
> SubPlan.
>   
>
> I thought about this a bit and came up with the attached patch.
>

-Access to an InitPlan or correlated
SubPlan.
+Plan nodes to which an InitPlan is attached.
+  
+

Is this correct?  See the example below:

Serial-Plan
-----------
postgres=# explain select * from t1 where t1.k=(select max(k) from t3);
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on t1  (cost=35.51..71.01 rows=10 width=12)
   Filter: (k = $0)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=35.50..35.51 rows=1 width=4)
           ->  Seq Scan on t3  (cost=0.00..30.40 rows=2040 width=4)
(5 rows)

Parallel-Plan
-------------
postgres=# explain select * from t1 where t1.k=(select max(k) from t3);
                                  QUERY PLAN
---------------------------------------------------------------------------
 Gather  (cost=9.71..19.38 rows=2 width=12)
   Workers Planned: 2
   Params Evaluated: $1
   InitPlan 1 (returns $1)
     ->  Finalize Aggregate  (cost=9.70..9.71 rows=1 width=4)
           ->  Gather  (cost=9.69..9.70 rows=2 width=4)
                 Workers Planned: 2
                 ->  Partial Aggregate  (cost=9.69..9.70 rows=1 width=4)
                       ->  Parallel Seq Scan on t3  (cost=0.00..8.75 rows=375 width=4)
   ->  Parallel Seq Scan on t1  (cost=0.00..9.67 rows=1 width=12)
         Filter: (k = $1)
(11 rows)

In the above example, the InitPlan is attached to a Plan node (Seq Scan
on t1) which is not parallel restricted.

>  Other ideas?
>

How about changing the statement like this:
-Access to an InitPlan or correlated
SubPlan.
+Access to a correlated SubPlan.
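
For illustration, the correlated case looks like this (a sketch reusing the tables from the example above; the column j is hypothetical):

-- uncorrelated scalar subquery: planned as an InitPlan, evaluated only once
explain select * from t1 where t1.k = (select max(k) from t3);
-- correlated scalar subquery: planned as a SubPlan, re-evaluated for each outer row
explain select * from t1 where t1.k = (select max(k) from t3 where t3.j = t1.j);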


I think we can cover InitPlans and SubPlans that can be parallelized in
a separate section "Parallel Subplans" or under some other heading.  I
think as of now we have enabled parallel subplans and initplans in
limited, but useful, cases (as per the TPC-H benchmark) and it might be
good to cover them in a separate section.  I can come up with an
initial patch (or I can review it if you write the patch) if you and/or
others think that makes sense.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: perlcritic and perltidy

2018-05-07 Thread Michael Paquier
On Sun, May 06, 2018 at 09:14:06PM -0400, Stephen Frost wrote:
> While I appreciate the support, I'm not sure that you're actually
> agreeing with me..  I was arguing that braces should be on their own
> line and therefore there would be a new line for the brace.
> Specifically, when moving lines between hashes, it's annoying to have to
> also worry about if the line being copied/moved has braces at the end or
> not- much easier if they don't and the braces are on their own line.

I should have read that twice.  Yes, we are not on the same line.  Even
if a brace is on a different line, per your argument it would still be
nicer to add a comma at the end of each last element of a hash or an
array, which is what you have done in the tests of pg_dump, but not
something that the proposed patch does consistently.  If the formatting
is automated, the way chosen does not matter much, but the extra last
comma should be consistently present as well?
--
Michael


signature.asc
Description: PGP signature


Re: Refreshing findoidjoins for v11

2018-05-07 Thread Michael Paquier
On Mon, May 07, 2018 at 02:32:48PM -0400, Tom Lane wrote:
> Pushed, thanks for doing the legwork.

Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: Having query cache in core

2018-05-07 Thread Pavel Stehule
2018-05-07 19:52 GMT+02:00 Robert Haas :

> On Sun, May 6, 2018 at 10:32 PM, Tatsuo Ishii  wrote:
> > Does anybody think having in-memory query result cache in core is a
> > good idea? From the experience of implementing the feature in
> > Pgpool-II, I would think this is not terribly hard job. But first of
> > all I'm wondering if there's a demand for the feature.
>
> Caching results doesn't seem very useful to me, but caching plans
> does.  I think the challenges are formidable, though.
>

My last customer has queries where the planning time is 10 sec, while,
unfortunately, the worst-case (badly optimized) execution time is 200 ms.

Can we introduce some planning-cost threshold so that only expensive queries
are cached, or some hints for query plan control?

For an interactive application, the plan cache is interesting only for one
subset of queries:

@1 There are long queries - the planning time is not significant (although it
can be high), and then a plan cache is not important.
@2 There are fast queries with fast planning time - usually we don't need a
plan cache there either.
@3 There are fast queries with long planning time - and then a plan cache is
very interesting - it can be difficult to separate this case from @1.
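
For completeness: within a single session, case @3 can already be mitigated with prepared statements (a sketch; the table name is hypothetical), which is roughly what a shared plan cache would generalize across sessions:

PREPARE get_order(int) AS
    SELECT * FROM orders WHERE order_id = $1;
EXECUTE get_order(42);   -- later executions can reuse the cached plan instead of replanning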

Regards

Pavel




> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>


Re: Refreshing findoidjoins for v11

2018-05-07 Thread Tom Lane
Michael Paquier  writes:
> Wouldn't it be time to update the list of catalog joins generated by
> findoidjoins?

Pushed, thanks for doing the legwork.

regards, tom lane



Re: Cast jsonb to numeric, int, float, bool

2018-05-07 Thread Robert Haas
On Fri, Mar 30, 2018 at 11:21 AM, Dagfinn Ilmari Mannsåker
 wrote:
> How about "cannot cast jsonb $json_type to $sql_type" where $json_type
> is the type inside the jsonb (e.g. string, number, boolean, array,
> object)?

Yes, that sounds pretty good.
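
For reference, the kind of cast and failure being discussed looks roughly like this (a sketch against the v11 development branch; the exact error wording is precisely what is being settled here):

SELECT ('{"a": 1}'::jsonb -> 'a')::int;    -- jsonb number -> int: works
SELECT ('{"a": "x"}'::jsonb -> 'a')::int;  -- jsonb string -> int: should fail with the proposed message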

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Having query cache in core

2018-05-07 Thread Andres Freund
On 2018-05-07 13:52:26 -0400, Robert Haas wrote:
> On Sun, May 6, 2018 at 10:32 PM, Tatsuo Ishii  wrote:
> > Does anybody think having in-memory query result cache in core is a
> > good idea? From the experience of implementing the feature in
> > Pgpool-II, I would think this is not terribly hard job. But first of
> > all I'm wondering if there's a demand for the feature.
> 
> Caching results doesn't seem very useful to me, but caching plans
> does.  I think the challenges are formidable, though.

+1

Greetings,

Andres Freund



Re: Built-in connection pooling

2018-05-07 Thread Robert Haas
On Fri, May 4, 2018 at 5:54 PM, Merlin Moncure  wrote:
>> I mean, if you have a large number of sessions open, it's going to
>> take more memory in any design.  If there are multiple sessions per
>> backend, there may be some possibility to save memory by allocating it
>> per-backend rather than per-session; it shouldn't be any worse than if
>> you didn't have pooling in the first place.
>
> It is absolutely worse, or at least can be.   plpgsql plan caches can
> be GUC dependent due to search_path; you might get a different plan
> depending on which tables resolve into the function.  You might
> rightfully regard this as an edge case but there are other 'leakages',
> for example, sessions with different planner settings obviously ought
> not to share backend plans.  Point being, there are many
> interdependent things in the session that will make it difficult to
> share some portions but not others.

I think you may be misunderstanding my remarks.  Suppose I've got 10
real connections multiplexed across 1000 sessions.  Barring a
crazy-stupid implementation, that should never use more memory than
1000 completely separate connections.  (How could it?)  It will of
course use a lot more memory than 10 real connections handling 10
sessions, but that's to be expected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Having query cache in core

2018-05-07 Thread Robert Haas
On Sun, May 6, 2018 at 10:32 PM, Tatsuo Ishii  wrote:
> Does anybody think having in-memory query result cache in core is a
> good idea? From the experience of implementing the feature in
> Pgpool-II, I would think this is not terribly hard job. But first of
> all I'm wondering if there's a demand for the feature.

Caching results doesn't seem very useful to me, but caching plans
does.  I think the challenges are formidable, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Explain buffers wrong counter with parallel plans

2018-05-07 Thread Robert Haas
On Sat, May 5, 2018 at 8:56 AM, Amit Kapila  wrote:
> The reason why I think the current behavior is okay because it is
> coincidental that they were displayed correctly.  We have not made any
> effort to percolate it to upper nodes.  For ex., before that commit
> also, it was not being displayed for Gather Merge or Gather with some
> kind of node like 'Limit' where we have to stop before reaching the
> end of the result.

It's not entirely coincidental.  I had the intention to try to ensure
that the workers would be shut down before the Gather or Gather Merge,
and I think that various things in the code testify to that intention.
It sounds like I missed some cases, but now we're missing even more
cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



parallel.sgml for Gather with InitPlans

2018-05-07 Thread Robert Haas
In the wake of commit e89a71fb449af2ef74f47be1175f99956cf21524,
parallel.sgml is no longer correct about the effect of InitPlans:

  
The following operations are always parallel restricted.
  

...

  
Access to an InitPlan or correlated
SubPlan.
  

I thought about this a bit and came up with the attached patch.  Other ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


parallel-initplan.patch
Description: Binary data


Re: [HACKERS] Parallel Append implementation

2018-05-07 Thread Robert Haas
On Sat, Apr 7, 2018 at 10:21 AM, Adrien Nayrat
 wrote:
> I notice Parallel append is not listed on Parallel Plans documentation :
> https://www.postgresql.org/docs/devel/static/parallel-plans.html

I agree it might be nice to mention this somewhere on this page, but
I'm not exactly sure where it would make logical sense to put it.



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Compiler warnings with --enable-dtrace

2018-05-07 Thread Tom Lane
I wrote:
> Thomas Munro  writes:
>> Maybe we should do what the Perl people do[2] and post-process the
>> generated header file to add const qualifiers?  Please see attached.

> +1 for the idea.  I notice that Perl's version of this is careful
> not to munge lines that already contain "const" ... do we need to
> worry about that?

Oh, I take that back --- on closer look, I see that you're getting
the same effect by checking for a preceding paren or comma.  That's
arguably better than their way because it works if there's a mix of
const and not-const parameters on one input line, though likely no
dtrace implementation actually emits such things.

regards, tom lane



Re: make installcheck-world in a clean environment

2018-05-07 Thread Tom Lane
Robert Haas  writes:
> On Fri, May 4, 2018 at 8:30 AM, Alexander Lakhin  wrote:
>> I'm not seeing a workaround to perform more complete installcheck without
>> modifying makefiles. So for me the question is whether the increasing the
>> testing surface justifies this use of time.

> After thinking about this some more, I think the question here is
> definitional.  A first attempt at defining 'make installcheck' is to
> say that it runs the tests from the build tree against the running
> server.  Certainly, we intend to use the SQL files that are present in
> the build tree, not the ones that were present in the build tree where
> the running server was built.  But what about client programs that we
> use to connect to the server?  You're suggesting that we use the
> pre-installed ones, but that is actually pretty problematic because
> the ones we see as installed might correspond neither to the contents
> of the build tree nor to the running server.  Conceivably we could end
> up having a mix of assets from three different places: (1) the running
> server, (2) the build tree, (3) whatever is in our path at the moment.
> That seems very confusing.  So now I think it's probably right to
> define 'make installcheck' as using the assets from the build tree to
> test the running server.  Under that definition, we're missing some
> dependencies, but USE_INSTALLED_ASSETS isn't a thing we need.

Nah, I disagree with this.  To me, the purpose of "make installcheck"
is to verify the correctness of an installation, which I take to include
the client programs as well as the server.  I think that "make
installcheck" ought to use the installed version of any file that we
actually install, and go to the build tree only for things we don't
install (e.g. SQL test scripts).

If the user has screwed up his PATH or other environmental aspects so that
what he's testing isn't a single installation, that's his error, not
something that "make installcheck" ought to work around.  Indeed, maybe
such aspects of his setup are intentional, and second-guessing them would
completely defeat his purpose.  In any case, if you want to test the
build-tree assets, that's what "make check" is for.

regards, tom lane



Re: Global snapshots

2018-05-07 Thread Robert Haas
On Sun, May 6, 2018 at 6:22 AM, Stas Kelvich  wrote:
> Also each second GetSnapshotData writes globalxmin (as it was before
> procArray->global_snapshot_xmin was taken into account) into a circle
> buffer with a size equal to global_snapshot_defer_time value. That more
> or less the same thing as with Snapshot Too Old feature, but with a
> bucket size of 1 second instead of 1 minute.
> procArray->global_snapshot_xmin is always set to oldest
> value in circle buffer.
>
> This way xmin calculation is always gives a value that were
> global_snapshot_xmin seconds ago and we have mapping from time (or
> GlobalCSN) to globalxmin for each second in this range. So when
> some backends imports global snapshot with some GlobalCSN, that
> GlobalCSN is mapped to a xmin and this xmin is set as a Proc->xmin.

But what happens if a transaction starts on node A at time T0 but
first touches node B at a much later time T1, such that T1 - T0 >
global_snapshot_defer_time?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: make installcheck-world in a clean environment

2018-05-07 Thread Robert Haas
On Fri, May 4, 2018 at 8:30 AM, Alexander Lakhin  wrote:
> I'm not seeing a workaround to perform more complete installcheck without
> modifying makefiles. So for me the question is whether the increasing the
> testing surface justifies this use of time.

After thinking about this some more, I think the question here is
definitional.  A first attempt at defining 'make installcheck' is to
say that it runs the tests from the build tree against the running
server.  Certainly, we intend to use the SQL files that are present in
the build tree, not the ones that were present in the build tree where
the running server was built.  But what about client programs that we
use to connect to the server?  You're suggesting that we use the
pre-installed ones, but that is actually pretty problematic because
the ones we see as installed might correspond neither to the contents
of the build tree nor to the running server.  Conceivably we could end
up having a mix of assets from three different places: (1) the running
server, (2) the build tree, (3) whatever is in our path at the moment.
That seems very confusing.  So now I think it's probably right to
define 'make installcheck' as using the assets from the build tree to
test the running server.  Under that definition, we're missing some
dependencies, but USE_INSTALLED_ASSETS isn't a thing we need.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: doc fixes: vacuum_cleanup_index_scale_factor

2018-05-07 Thread Justin Pryzby
On Mon, May 07, 2018 at 07:26:25PM +0300, Alexander Korotkov wrote:
> Hi!
> 
> I've revised docs and comments, and also made some fixes in the code.
> See the attached patchset.
> 
> * 0004-btree-cleanup-docs-comments-fixes.patch
> Documentation and comment improvements from Justin Pryzby
> revised by me.

2nd iteration:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index eabe2a9235..785ecf922a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1893,15 +1893,35 @@ include_dir 'conf.d'
   
   

-When no tuples were deleted from the heap, B-tree indexes might still
-be scanned during VACUUM cleanup stage by two
-reasons.  The first reason is that B-tree index contains deleted pages
-which can be recycled during cleanup.  The second reason is that B-tree
-index statistics is stalled.  The criterion of stalled index statistics
-is number of inserted tuples since previous statistics collection
-is greater than vacuum_cleanup_index_scale_factor
-fraction of total number of heap tuples.
+When no tuples were deleted from the heap, B-tree indexes are still
+scanned during VACUUM cleanup stage unless two
+conditions are met: the index contains no deleted pages which can be
+recycled during cleanup; and, the index statistics are not stale.
+In order to detect stale index statistics, number of total heap tuples
should say: "THE number"

+during previous statistics collection is memorized in the index
s/memorized/stored/

+meta-page.  Once number number of inserted tuples since previous
Should say "Once the number of inserted tuples..."

+statistics collection is more than
+vacuum_cleanup_index_scale_factor fraction of
+number of heap tuples memorized in the meta-page, index statistics is
s/memorized/stored/

+considered to be stalled.  Note, that number of heap tuples is written
"THE number"
s/stalled/stale/

+to the meta-page at the first time when no dead tuples are found
remove "at"

+during VACUUM cycle.  Thus, skip of B-tree index
I think should say: "Thus, skipping of the B-tree index scan"

+scan during cleanup stage is only possible in second and subsequent
s/in/when/

+   
+Zero value of vacuum_cleanup_index_scale_factor
I would say "A zero value of ..."

Thanks,
Justin



Re: [HACKERS] Clock with Adaptive Replacement

2018-05-07 Thread Юрий Соколов
2018-05-07 17:15 GMT+03:00 Robert Haas :
>
> On Sat, May 5, 2018 at 2:16 AM, Andrey Borodin  wrote:
> > But you loose difference between "touched once" and "actively used". Log
> > scale of usage solves this: usage count grows logarithmically, but drains
> > linearly.
>
> Even if we have that, or something with similar effects, it's still
> desirable to avoid bumping the usage count multiple times for accesses
> that happen close together in time.  I don't really agree with Yura
> Sokolov's proposal for how to prevent that problem, because during
> periods when no buffer eviction is happening it basically throws out
> all of the data: no items are replaced, and the usage count can't be
> bumped again until NBlocks/32 items are replaced.  That's bad.

That is not as bad as it seems at first glance: during a period when
no buffer eviction is happening, all information is almost irrelevant,
because it is naturally outdated. And due to the choice of "batch" size (25),
there is always a window between "NBlocks/32 items replaced" and
"this item is considered for replacement": even if all items in
25/32*NBlocks had a non-zero usage count, there are still at least
7/32*NBlocks to consider before this item could be replaced, during
which its usage count can be incremented.

Note: "1/32 replacements" really means "1/32 new buffers", so while
shared_buffers is not full yet (so no evictions have happened), all blocks
except the last 1/32 could have their usage count incremented.

>
> We want an algorithm that, if there's no buffer eviction for a while and
> then we start having to evict buffers, has good information on which
> things should be evicted and which should resist eviction.  But I
> think that he's right that preventing the usage count from rising
> artificially when there are correlated accesses is important.  If we
> have a page with usage count 0 which incurs 4 correlated accesses, as
> shown in Vladimir Sitnikov's results upthread, then reducing the usage
> count linearly requires 4 sweeps through the buffer ring to get the
> usage count back down to 0 (4->3->2->1->0); reducing it by dividing by
> two still requires 3 sweeps (4->2->1->0).

Yes, that is why log-scale doesn't help much.

> A good solution for that
> case should, like Yura Sokolov's proposed algorithm, avoid pushing the
> usage count higher than 1. So he has the right idea, I think, even if
> the method's not quite what we want.

Well, I proposed an increment of 8 because simulation (on traces I
found) shows it really improves the quality of the algorithm. It looks like,
if an item gets a hit after a not-so-short period, there is a great chance the
item is hot, so it should be "protected" from eviction.

I used traces from https://github.com/djblue/caching/tree/master/traces
(and the code in that repository to develop the algorithm).

Having real-world traces is really important, because it is
not so obvious how the algorithm should perform. Mental conclusions
can be wrong and may not match production load behavior.

For example, all good algorithms track recently evicted items, and it is
not so obvious that an already evicted item that gets a hit is a "hot" item
that should be placed in a separate "hot" queue.
(My algo doesn't track evicted items, so it may not be as good in terms
of hit rate.)

(Personally, I vote more for Clock-Pro than for my algo. But it has three
"clock hands", and if implemented as described, it still demands
doubly-linked lists. So it should be modified/adapted, imho.)

With regards,
Yura Sokolov.


Re: Compiler warnings with --enable-dtrace

2018-05-07 Thread Tom Lane
Thomas Munro  writes:
> --enable-dtrace produces compiler warnings about const correctness,
> except on macOS.  That's because Apple's dtrace produces function
> declarations in probes.h that take strings as const char * whereas
> AFAIK on all other operating systems they take char * (you can see
> that possibly recent difference in Apple's version of dt_header_decl()
> in dt_program.c).  People have complained before[1].

Yeah, it's a bit annoying, although AFAICT hardly anyone uses dtrace.

> Maybe we should do what the Perl people do[2] and post-process the
> generated header file to add const qualifiers?  Please see attached.

+1 for the idea.  I notice that Perl's version of this is careful
not to munge lines that already contain "const" ... do we need to
worry about that?

regards, tom lane



Re: doc fixes: vacuum_cleanup_index_scale_factor

2018-05-07 Thread Alexander Korotkov
Hi!

I've revised docs and comments, and also made some fixes in the code.
See the attached patchset.

* 0001-vacuum-cleanup-index-scale-factor-user-set.patch
Changes vacuum_cleanup_index_scale_factor GUC to PGC_USERSET,
because it might be useful to change in specific session.

* 0002-vacuum-cleanup-index-scale-factor-tab-complete.patch
Add missing psql tab-complete support for
vacuum_cleanup_index_scale_factor

* 0003-btree-cleanup-condition-fix.patch
Fix condition which triggers btree index cleanup scan to always fire
when vacuum_cleanup_index_scale_factor == 0.

* 0004-btree-cleanup-docs-comments-fixes.patch
Documentation and comment improvements from Justin Pryzby
revised by me.
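
For instance, with 0001 applied the threshold can then be tuned per session (a sketch; my_table is a hypothetical table name):

SET vacuum_cleanup_index_scale_factor = 0.2;
VACUUM my_table;   -- the cleanup-scan decision for its btree indexes should then use the session value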

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-vacuum-cleanup-index-scale-factor-user-set.patch
Description: Binary data


0002-vacuum-cleanup-index-scale-factor-tab-complete.patch
Description: Binary data


0003-btree-cleanup-condition-fix.patch
Description: Binary data


0004-btree-cleanup-docs-comments-fixes.patch
Description: Binary data


Re: MAP syntax for arrays

2018-05-07 Thread Tom Lane
Ashutosh Bapat  writes:
> Is there a way we can improve unnest() and array_agg() to match the
> performance you have specified by let's say optimizing the cases
> specially when those two are used together. Identifying that may be
> some work, but will not require introducing new syntax.

+1.  The first thing I thought on seeing this proposal was "I wonder
how long it will be before the SQL committee introduces some syntax
that uses the MAP keyword and breaks this".

ISTM the planner could be taught to notice the combination of unnest
and array_agg and produce a special output plan from that.

It is, however, fair to wonder whether this is worth our time to
optimize.  I've not noticed a lot of people complaining about
performance of this sort of thing, at least not since we fixed
array_agg to not be O(N^2).

regards, tom lane



Re: [HACKERS] Clock with Adaptive Replacement

2018-05-07 Thread Robert Haas
On Sat, May 5, 2018 at 2:16 AM, Andrey Borodin  wrote:
> But you loose difference between "touched once" and "actively used". Log
> scale of usage solves this: usage count grows logarithmically, but drains
> linearly.

Even if we have that, or something with similar effects, it's still
desirable to avoid bumping the usage count multiple times for accesses
that happen close together in time.  I don't really agree with Yura
Sokolov's proposal for how to prevent that problem, because during
periods when no buffer eviction is happening it basically throws out
all of the data: no items are replaced, and the usage count can't be
bumped again until NBlocks/32 items are replaced.  That's bad.  We
want an algorithm that, if there's no buffer eviction for a while and
then we start having to evict buffers, has good information on which
things should be evicted and which should resist eviction.  But I
think that he's right that preventing the usage count from rising
artificially when there are correlated accesses is important.  If we
have a page with usage count 0 which incurs 4 correlated accesses, as
shown in Vladimir Sitnikov's results upthread, then reducing the usage
count linearly requires 4 sweeps through the buffer ring to get the
usage count back down to 0 (4->3->2->1->0); reducing it by dividing by
two still requires 3 sweeps (4->2->1->0).  A good solution for that
case should, like Yura Sokolov's proposed algorithm, avoid pushing the
usage count higher than 1.  So he has the right idea, I think, even if
the method's not quite what we want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Interesting paper: Contention-Aware Lock Scheduling

2018-05-07 Thread Юрий Соколов
2018-05-04 23:45 GMT+03:00 AJG :
>
> Another interesting article from Jan 2018 (Tsinghua University and
> Microsoft Research)
>
> http://madsys.cs.tsinghua.edu.cn/publications/TS2018-liu.pdf
>
> DudeTx: Durable Transactions Made Decoupled

Quoting from the pdf:

> The key insight of our solution is decoupling a durable transaction into
> three fully asynchronous steps. (1) Perform: execute the transaction on a
> shadow memory, and produce a redo log for the transaction. (2) Persist:
> flush redo logs of transactions to persistent memory in an atomic manner.
> (3) Reproduce: modify original data in persistent memory according to the
> persisted redo logs. It is essential that we never directly write back
> dirty data from shadow memory to persistent memory – all the updates are
> realized via the logs.

It is exactly what Amazon did for its Aurora!
Using this decoupling, Aurora provides great replication and failover (by
use of replicated and versioned storage both for logs and for data pages).

regards,
Yura.


Re: [doc fix] Trivial fix for PG 11 partitioning

2018-05-07 Thread Robert Haas
On Fri, May 4, 2018 at 9:53 PM, Tsunakawa, Takayuki
 wrote:
> Attached is a trivial documentation fix for PG 11 partitioning, which 
> includes:
>
> * pg_partition fails to mention hash for the strategy.
> * Partitioning key column values can now be modified, which results in row 
> movement between partitions.

Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Porting PG Extension from UNIX to Windows

2018-05-07 Thread Craig Ringer
On 25 April 2018 at 16:45, insaf.k  wrote:
> Hello,
>
> I have developed a postgres extension in Linux. I want to compile it in MS
> Windows as well.
>
> The extension extensively make use of POSIX threads and mutexes.
>
> I've done some research regarding compiling in Windows. I am not sure in
> what way I should compile the extension. AFAIK, Visual Studio is not POSIX
> compliant and so I'll have to rewrite all those POSIX calls using Windows
> API. So it's better to compile the extension in Cygwin.
>
> I have some questions regarding compiling the extension in Cygwin. Do I have
> to build PG in Cygwin, if I want to compile the extension in Cygwin?

If you want to compile the extension in cygwin with the cygwin
compiler, to get support for native(ish) pthreads, etc, then yes you
must use a PostgreSQL that is also for cygwin.

You can compile with mingw within cygwin - after all, msys is
basically a cut down cygwin. That produces a native Windows
executable, though it might still need mingw's runtime library, I
don't remember. You might as well use msys in that case, though. I
don't know what the state of pthreads support in mingw is.

> Our deployment pattern is like this, build everything in one machine and
> copy the DLLs into the production machine. So, if I compile in Cygwin, will
> the generated DLLs work in other Windows machines?

Only if they have a compatible cygwin AND a cygwin postgres.

> Would the target machine
> be required to install any dependencies related to Cygwin in order to use
> the DLLs?

Yes.

> Also, one more question, not related to PG. Cygwin vs native Windows API
> calls, will there be much difference in the performance?(We are not using
> fork() calls, btw).

Yes, Cygwin was significantly slower last I checked.

You might be able to run Pg under Windows Subsystem for Linux, using a
Linux PostgreSQL and a pthreads build of your extension.

Personally, I think you'll have to face adapting your threading layer.
But another option might be to use mingw64's threading support; see
the "POSIX Threads for Windows" section of
https://en.wikipedia.org/wiki/POSIX_Threads

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Porting PG Extension from UNIX to Windows

2018-05-07 Thread Pavlo Golub
Greetings, insaf.k.

You wrote 25.04.2018, 11:45:

> Hello,



> I have developed a postgres extension in Linux. I want to compile it in MS 
> Windows as well.

You should try MSYS2. It's far better than VS and MSYS right now.

I may try to build your extension if you want.



> The extension extensively make use of POSIX threads and mutexes.



> I've done some research regarding compiling in Windows. I am not
> sure in what way I should compile the extension. AFAIK, Visual
> Studio is not POSIX compliant and so I'll have to rewrite all those
> POSIX calls using Windows API. So it's better to compile the extension in 
> Cygwin.



> I have some questions regarding compiling the extension in Cygwin.
> Do I have to build PG in Cygwin, if I want to compile the extension in Cygwin?



> Our deployment pattern is like this, build everything in one
> machine and copy the DLLs into the production machine. So, if I
> compile in Cygwin, will the generated DLLs work in other Windows
> machines? Would the target machine be required to install any
> dependencies related to Cygwin in order to use the DLLs?



> Also, one more question, not related to PG. Cygwin vs native
> Windows API calls, will there be much difference in the
> performance?(We are not using fork() calls, btw).





> Thanks,

> Insaf



-- 
Kind regards,
 Pavlo  mailto:pavlo.go...@cybertec.at




Re: Reopen logfile on SIGHUP

2018-05-07 Thread Alexander Kuzmenkov

Here is a documentation update from Liudmila Mantrova.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 4a68ec3..5bd7b45 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -932,8 +932,8 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
program if you have one that you are already using with other
server software. For example, the rotatelogs
tool included in the Apache distribution
-   can be used with PostgreSQL.  To do this,
-   just pipe the server's
+   can be used with PostgreSQL. One way to 
+   do this is to pipe the server's
stderr output to the desired program.
If you start the server with
pg_ctl, then stderr
@@ -946,6 +946,36 @@ pg_ctl start | rotatelogs /var/log/pgsql_log 86400
   
 
   
+   You can combine these approaches by setting up logrotate
+   to collect log files produced by PostgreSQL built-in logging
+   collector. In this case, the logging collector defines the names and
+   location of the log files, while logrotate
+   periodically archives these files. When initiating log rotation,
+   logrotate must ensure that the application
+   sends further output to the new file. This is commonly done with a
+   postrotate script that sends a SIGHUP signal to
+   the application, which then reopens the log file.
+   In PostgreSQL, you can run pg_ctl
+   with the logrotate option instead. When the server receives
+   this command, the server either switches to a new log file or reopens the
+   existing file, depending on the logging configuration
+   (see ).
+  
+
+  
+   
+When using static log file names, the server might fail to reopen the log
+file if the max open file limit is reached or a file table overflow occurs.
+In this case, log messages are sent to the old log file until a
+successful log rotation. If logrotate is
+configured to compress the log file and delete it, the server may lose
+the messages logged in this timeframe. To avoid this issue, you can
+configure the logging collector to dynamically assign log file names
+and use a prerotate script to ignore open log files.
+
+  
+
+  
Another production-grade approach to managing log output is to
send it to syslog and let
syslog deal with file rotation. To do this, set the
@@ -958,7 +988,6 @@ pg_ctl start | rotatelogs /var/log/pgsql_log 86400
configured to work with log files from
syslog.
   
-
   
On many systems, however, syslog is not very reliable,
particularly with large log messages; it might truncate or drop messages
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 0816bc1..235c8ba 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -102,6 +102,12 @@ PostgreSQL documentation
kill
signal_name
process_id
+
+
+  
+   pg_ctl
+   logrotate
+   -D datadir
   
 
   On Microsoft Windows, also:
@@ -234,6 +240,12 @@ PostgreSQL documentation
   
 
   
+   logrotate mode rotates the server log file.
+   For details on how to use this mode with external log rotation tools, see
+   .
+  
+
+  
register mode registers the PostgreSQL
server as a system service on Microsoft Windows.
The -S option allows selection of service start type,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b3..e7e779e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -385,6 +385,9 @@ static bool LoadedSSL = false;
 static DNSServiceRef bonjour_sdref = NULL;
 #endif
 
+/* Log rotation signal file path, relative to $PGDATA */
+#define LOGROTATE_SIGNAL_FILE	"logrotate"
+
 /*
  * postmaster.c - function prototypes
  */
@@ -398,6 +401,7 @@ static void reset_shared(int port);
 static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
+static bool checkLogrotateSignal(void);
 static void sigusr1_handler(SIGNAL_ARGS);
 static void startup_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
@@ -5003,6 +5007,24 @@ ExitPostmaster(int status)
 }
 
 /*
+ * Check to see if a log rotation request has arrived. Should be
+ * called by postmaster after receiving SIGUSR1.
+ */
+static bool
+checkLogrotateSignal(void)
+{
+	struct stat stat_buf;
+
+	if (stat(LOGROTATE_SIGNAL_FILE, &stat_buf) == 0)
+	{
+		unlink(LOGROTATE_SIGNAL_FILE);
+		return true;
+	}
+
+	return false;
+}
+
+/*
  * sigusr1_handler - handle signal conditions from child processes
  */
 static void
@@ -5100,7 +5122,8 @@ sigusr1_handler(SIGNAL_ARGS)
 		signal_child(PgArchPID, SIGUSR1);
 	}
 
-	if (CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE) &&
+	if ((checkLogrotateSignal() ||
+		 CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE)) &&
 		SysLoggerPID != 0)
 	{
 		/* Tell syslogger to rotate 

Re: MAP syntax for arrays

2018-05-07 Thread Ashutosh Bapat
On Fri, May 4, 2018 at 6:38 PM, Ildar Musin  wrote:
> Hello hackers,
>
> Recently I was working with sql arrays in postgres and it turned out
> that postgres doesn't have such very convinient functional constructions
> as map, reduce and filter. Currently to map function over array user has
> to make a subquery like:
>
> select u.* from
> my_table,
> lateral (
> select array_agg(lower(elem))
> from unnest(arr) as elem
> ) as u;
>
> Which is not only inconvenient but not very efficient as well (see
> 'Demo' section below).

Is there a way we can improve unnest() and array_agg() to match the
performance you have specified by, let's say, specially optimizing the
cases when those two are used together? Identifying that may be
some work, but it will not require introducing new syntax.

>
> When I dug into the code I found that postgres already has the needed
> infrastructure for implementing map for arrays; actually array coercing
> already works that way (it basically maps cast function).
>
> In the attached patch there is a simple map implementation which
> introduces new expression type and syntax:
>
> MAP( OVER )
>
> For example:
>
> SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
> ?column?
> -
>  {ONE,TWO,THREE}
> (1 row)
>
> This is probably not the most useful notation and it would be better to
> have syntax for mapping arbitrary expressions over array, not just
> function. I'm struggling to come up with a good idea of how it should
> look like. It could look something like following:
>
> MAP( FOR  IN )
>
> For instance:
>
> SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);
>
> Looking forward for community's suggestions!

What if the expression has more than one variable, each mapping to a
different array? What if the arrays have different lengths or, worse,
different dimensions? This looks like the way SRFs used to work.

Instead of introducing new syntax, is it possible to detect that the
argument to a function is an array whose element type matches the
function's argument type and apply MAP automatically? In your example,
upper(arr) would detect that the input is an array whose element type
matches the scalar argument type and do the equivalent of
array_agg(upper(elem)) over the elements arr[id1], arr[id2], ...
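
In other words (a sketch of the proposed implicit behavior, not existing syntax; my_table(arr text[]) is hypothetical), upper(arr) would be treated as something like:

SELECT (SELECT array_agg(upper(elem)) FROM unnest(arr) AS elem)
FROM my_table;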

>
>  elements per array |  map (tps) | unnest/aggregate (tps)
> --------------------+------------+------------------------
>                   5 | 139.105359 |              74.434010
>                  10 |  74.089743 |              43.622554
>                 100 |   7.693000 |               5.325805
>
> Apparently map is more efficient for small arrays. And as the size of
> array increases the difference decreases.

I am afraid that, the way the difference is diminishing with the increase in
the number of elements, the unnest/array_agg combination might win for
a large number of elements. Have you tried that?
If we try to improve the unnest/array_agg combination for small arrays, we
will get consistent performance without any additional syntax.
Although, I admit that a query involving unnest and array_agg is not
very readable.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: citext function overloads for text parameters

2018-05-07 Thread Sergey Mirvoda
Hello hackers,

We use this simple function to work around the citext = text behavior.

create extension citext;

CREATE FUNCTION citext_eq( citext, text )
RETURNS bool
AS 'citext'
LANGUAGE C IMMUTABLE STRICT;

CREATE OPERATOR = (
LEFTARG= CITEXT,
RIGHTARG   = TEXT,
COMMUTATOR = =,
NEGATOR= <>,
PROCEDURE  = citext_eq,
RESTRICT   = eqsel,
JOIN   = eqjoinsel,
HASHES,
MERGES
);

select 'xexe'::text = 'Xexe'::citext
select 'xexe' = 'Xexe'::citext
select 'xexe'::citext = 'Xexe'
select 'xexe'::citext = 'Xexe'::text
select 'xexe'::citext = 1234::text


CREATE or replace FUNCTION ttt(t text)
RETURNS bool
AS
$$select $1 = 'Ttt'::citext  $$
LANGUAGE sql;

select ttt('ttt')



But in general, it is wrong to compare values of different types.
We used this and other strange cases, like TO_CHAR for type text, for our
own BI system with user-defined calculations.



On Mon, May 7, 2018 at 12:09 PM, Shay Rojansky  wrote:

>
>>> Thanks for the input. It's worth noting that the equality operator
>>> currently works in the same way: citext = text comparison is (surprisingly
>>> for me) case-sensitive.
>>>
>>> My expectation was that since citext is supposed to be a
>>> case-insensitive *type*, all comparison operations involving it should be
>>> case-insensitive;
>>>
>>
>> Comparison requires both things to be the same type.  The rules for
>> implicitly converting one type to another prefer the core type text over
>> the extension type citext.
>>
>> IOW, there is no such operator =(citext,text) and thus "citext = text
>> comparison" is technically invalid.
>>
>> At this point we're sorta stuck with our choice, and while individual
>> databases can implement their own functions and operators there is value in
>> doing things the way the system provides to minimize future confusion and
>> bugs.
>>
>
> OK, thanks for everyone's input.
>
>


-- 
--Regards, Sergey Mirvoda


Re: Interesting paper: Contention-Aware Lock Scheduling

2018-05-07 Thread AJG
Another interesting article from Jan 2018 (Tsinghua University and Microsoft
Research)

http://madsys.cs.tsinghua.edu.cn/publications/TS2018-liu.pdf

DudeTx: Durable Transactions Made Decoupled

"This paper presents DudeTx, a crash-consistent durable transaction system
that avoids the drawbacks of
both undo and redo logging. DudeTx uses shadow DRAM to decouple the execution
of a durable transaction into three fully asynchronous steps. The advantage
is that only minimal fences and no memory read instrumentation are required.
This design enables an out-of-the-box concurrency control mechanism,
transactional memory or fine-grained locks, to be used as an independent
component. The evaluation results show that DudeTx adds durability to a
software transactional memory system with only 7.4% ∼ 24.6% throughput
degradation.
Compared to typical existing durable transaction systems, DudeTx provides
1.7× ∼ 4.4× higher throughput.
Moreover, DudeTx can be implemented with hardware transactional memory or
lock-based concurrency
control, leading to a further 1.7× and 3.3× speedup, respectively."



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Built-in connection pooling

2018-05-07 Thread Konstantin Knizhnik



On 04.05.2018 18:22, Merlin Moncure wrote:

On Thu, May 3, 2018 at 12:01 PM, Robert Haas  wrote:

On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure  wrote:

What _I_ (maybe not others) want is a
faster pgbouncer that is integrated into the database; IMO it does
everything exactly right.

I have to admit that I find that an amazing statement.  Not that
pgbouncer is bad technology, but saying that it does everything
exactly right seems like a vast overstatement.  That's like saying
that you don't want running water in your house, just a faster motor
for the bucket you use to draw water from the well.

Well you certainly have a point there; I do have a strong tendency for
overstatement :-).

Let's put it like this: being able to have connections funnel down to
a smaller number of sessions is nice feature.  Applications that are
large, complex, or super high volume have a tendency towards stateless
(with respect to the database session) architecture anyways so I tend
not to mind lack of session features when pooling (prepared statements
perhaps being the big outlier here).  It really opens up a lot of
scaling avenues.  So better a better phrased statement might be, "I
like the way pgbouncer works, in particular transaction mode pooling
from the perspective of the applications using it".  Current main pain
points are the previously mentioned administrative headaches and
better performance from a different architecture (pthreads vs libev)
would be nice.

I'm a little skeptical that we're on the right path if we are pushing
a lot of memory consumption into the session level where a session is
pinned all the way back to a client connection. plpgsql function plan
caches can be particularly hungry on memory and since sessions have
their own GUC ISTM each sessions has to have their own set of them
since plans depend on search path GUC which is session specific.
Previous discussions on managing cache memory consumption (I do dimly
recall you making a proposal on that very thing) centrally haven't
gone past panning stages AFAIK.

If we are breaking 1:1 backend:session relationship, what controls
would we have to manage resource consumption?


Most of the resource consumption is related to backends, not to sessions.
It is first of all the catalog and relation caches. If there are thousands
of tables in a database, then these caches (whose size is not limited
now) can grow up to several megabytes.
Taking into account that on modern SMP systems with hundreds of CPU cores
it may be reasonable to spawn hundreds of backends, the total memory
footprint of these caches can be very significant.
This is why I think that we should move towards shared caches... But
this trip is not expected to be so easy.


Right now a connection pooler allows handling many more user sessions
than there are active backends.

So it helps to partly solve this problem with resource consumption.
The session context itself is not expected to be very large: changed GUCs +
prepared statements.


I accept your argument about stateless application architecture.
Moreover, this is more or less the current state of things: most customers
have to use pgbouncer and so have to prohibit the use of all
session-specific stuff in their applications.
What are they losing in this case? Prepared statements? But there are
real alternative solutions: autoprepare, a shared plan cache,... which
allow using prepared statements without session context. Temporary
tables, advisory locks,... ?


Temporary tables are actually a very "ugly" thing, causing a lot of problems:
- cannot be created on a hot standby
- cause catalog bloating
- deallocation of a large number of temporary tables may acquire too many
locks.

...
Maybe they should somehow be redesigned? For example, have a shared
catalog entry for a temporary table, but backend-private content... Or make
it possible to change the lifetime of temporary tables from session to
transaction...
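
Note that the content lifetime can already be bound to a transaction today, although the catalog entry is still created and dropped every time (a sketch):

CREATE TEMP TABLE scratch(id int) ON COMMIT DROP;  -- gone at transaction end, but still churns the catalog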



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Having query cache in core

2018-05-07 Thread Konstantin Knizhnik



On 07.05.2018 11:58, Tatsuo Ishii wrote:

On 07.05.2018 05:32, Tatsuo Ishii wrote:

Does anybody think having in-memory query result cache in core is a
good idea? From the experience of implementing the feature in
Pgpool-II, I would think this is not terribly hard job. But first of
all I'm wondering if there's a demand for the feature.

Do you want to implement this cache locally in backend?

No, as a global cache, if I employ the design of Pgpool-II. Pgpool-II
offsers two types of query result cache: in the shared memory or using
external cache server (i.e. memcached).


Global result cache (like one used in mySQL) may be more efficient.
But I think it is better to start first with
1. Global prepared statements cache
2. Global catalog cache
3. Global relation cache

Are they totally different story from query result cache, no?
Well, catalog/relation/plan caches are really not related to the query
result cache.

But from my point of view the first three are much more important and critical,
simply because the catalog is used by all backends. It is possible to
discuss/investigate how frequently DBMS applications issue queries
returning the same result
(for example, search engines have the classical 90/10 relation: most
users run the same queries, so caching is vital for Google).
But I am not sure
that this is true for database applications... There are no doubts, at
least for OLAP, that applications tend to run queries with the same
query execution plan (but with different parameters).
So having a global prepared statement cache seems to be much more useful
than caching results.


Also, implementations of all shared caches have to solve similar
problems: access synchronization, invalidation,...
In this sense a query result cache implementation will be similar to
other shared cache implementations. If we have infrastructure for
building efficient shared caches (lockless algorithms, smart
invalidation, ...) then it will not be so difficult to implement a result
cache on top of it.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Having query cache in core

2018-05-07 Thread Tatsuo Ishii
> On 07.05.2018 05:32, Tatsuo Ishii wrote:
>> Does anybody think having in-memory query result cache in core is a
>> good idea? From the experience of implementing the feature in
>> Pgpool-II, I would think this is not terribly hard job. But first of
>> all I'm wondering if there's a demand for the feature.
> 
> Do you want to implement this cache locally in backend?

No, as a global cache, if I employ the design of Pgpool-II. Pgpool-II
offers two types of query result cache: in shared memory or using an
external cache server (i.e. memcached).

> Global result cache (like one used in mySQL) may be more efficient.
> But I think it is better to start first with
> 1. Global prepared statements cache
> 2. Global catalog cache
> 3. Global relation cache

They are a totally different story from the query result cache, no?

> Concerning result cache, I think it will be better to ask opinion of
> mysql users: how useful it is.

One RedShift user used to use the query result cache in Pgpool-II (I
think they moved to a more application-level cache). So I just wonder if
there is any demand for an in-core cache.

I think one merit of having it in core is that it's easy to get catalog
info: Pgpool-II works hard to figure out whether a query result is safe
to cache or not, e.g. whether the SELECT uses stable functions or
temp tables. An in-core cache eliminates that hard work (not to
mention cache invalidation). On the other hand, while a query cache
definitely avoids running a heavy SELECT twice, the user still needs to
connect to PostgreSQL if it's in core. Having the query cache in
middleware like Pgpool-II can avoid accessing PostgreSQL completely.
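
For example (a sketch; the table and column names are hypothetical), the distinction that has to be made is roughly:

SELECT name, price FROM products WHERE id = 1;          -- deterministic, safe to cache
SELECT name, price FROM products WHERE added < now();   -- uses a non-immutable function, unsafe to cache
SELECT * FROM my_temp_table;                            -- temp table, unsafe to cache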

Anyway, I already have a query cache in Pgpool-II. So unless users are
eager to have it in core, I will not be so interested in implementing
it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Having query cache in core

2018-05-07 Thread Konstantin Knizhnik



On 07.05.2018 11:24, Tsunakawa, Takayuki wrote:

From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]

But I think it is better to start first with
1. Global prepared statements cache
2. Global catalog cache
3. Global relation cache

May I ask why prepared statements need to precede catalog and relation caches? 
We're suffering from the bloat of catalog and relation caches, and are thinking 
of proposing placing those caches in shared memory.



Sorry, I didn't assume any particular order here. Yes, a shared catalog
cache and relation cache seem to be more important to implement first.





Switch to global caches seems to be a challenged task, requiring a lot
of changes in Postgres core.
But I think that sometime we will have to implement them in any case (as
it was done in most of other DBMSes).

Agreed.  I respect your attitude toward revolutionizing PostgreSQL.



Concerning result cache, I think it will be better to ask opinion of
mysql users: how useful it is.

And possibly Oracle Database users, as Oracle implemented it relatively 
recently, IIRC.

Regards
Takayuki Tsunakawa



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Having query cache in core

2018-05-07 Thread Hartmut Holzgraefe

On 07.05.2018 10:12, Konstantin Knizhnik wrote:
Concerning result cache, I think it will be better to ask opinion of 
mysql users: how useful it is.


It isn't useful. I haven't seen a customer case in years where the query 
cache would have done any good.


It has been off by default ever since MySQL 5.5 (which became "GA" in
2010), and has now finally been removed in MySQL 8.0. It is still there
in MariaDB, and as far as I can tell there are no plans to remove it
yet, at least not in the next two major releases, 10.3 and 10.4.

At the same time, we (wearing my MariaDB Corp. hat right now) have added
a "time-to-live" cache module to our MaxScale proxy solution, moving
caching out of the server for use cases where a time-to-live cache is
"good enough": immediate cache entry invalidation on changes to the
underlying data is not a requirement, and getting slightly outdated
results back is acceptable.



--
Hartmut Holzgraefe, Principal Support Engineer (EMEA)
MariaDB Corporation | http://www.mariadb.com/



Re: Having query cache in core

2018-05-07 Thread Hartmut Holzgraefe

On 07.05.2018 08:23, Laurenz Albe wrote:

Having been bitten by the feature on MySQL, I think it's not a good thing.

Essentially, it's a band-aid for badly written applications, but it will
only help in certain cases and hurts in many others.


The MySQL query cache helped quite a bit in the early web days, before
session handling became a thing (e.g. before PHP 4.0), before massive
caching on other layers, and when most machines still had only a single
CPU core.

With multiple cores, the cost of query cache maintenance, and especially
result purging, quickly outweighed the gain. I'd assume this would be
even more of a burden in a multi-process model than it already is in
MySQL's one-process-multiple-threads model.

And so for about a decade now there's a simple rule:

 Q: What is the optimal size for my query cache?
 A: zero

(There's even a tuning wizard web site for this, created by a former
 team member of mine:   )
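
For reference, a minimal sketch of how that advice is usually applied on
MySQL/MariaDB servers that still ship the query cache (variable names as
in MySQL 5.x / MariaDB; check the documentation for your version):

    -- Is the query cache compiled in and enabled?
    SHOW VARIABLES LIKE 'query_cache%';

    -- The "optimal size is zero" advice in practice:
    SET GLOBAL query_cache_type = OFF;
    SET GLOBAL query_cache_size = 0;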

All of the above is probably true for all query result caches that
try to invalidate cache entries as soon as the underlying tables
change.

Caches with simple "time-to-live" logic will not suffer from cache
maintenance contention, but at the same time they don't really need any
information that is available only on the server side. So having that
kind of cache in a separate proxy process, like pqc, or on the
MySQL/MariaDB side, e.g. in the cache modules for MaxScale and ProxySQL,
looks like the better approach to me.


PS:

I tried PQC for one of my PostGIS-based OpenStreetMap projects, where I
suffer from a "badly written application", or actually from too many
layers of abstraction. For rendering maps in different file formats
(SVG, PDF, PNG) I need to run the full Mapnik rendering queue multiple
times, even though the map bounding box, and thus the underlying queries
from the Mapnik style sheet, actually stay the same.

Even with that setup (a set of moderately complex queries on a rather
large database being run three times in a row), adding PQC didn't
provide much of an improvement, and in the end I didn't want to have to
take care of yet another service component in the setup.

--
Hartmut Holzgraefe, Principal Support Engineer (EMEA)
MariaDB Corporation | http://www.mariadb.com/



Re: [HACKERS] path toward faster partition pruning

2018-05-07 Thread Michael Paquier
On Mon, May 07, 2018 at 10:37:10AM +0900, Michael Paquier wrote:
> On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
> > I got a similar server crash as in [1] on the master branch since the commit
> > 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
> > the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is
> > an ArrayCoerceExpr (see [2]):
> 
> Indeed, I can see the crash.  I have been playing with this stuff and I
> am in the middle of writing the patch, but let's track this properly for
> now.

So the problem appears when an expression needs to use
COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
which requires an execution state to be able to build the list of
elements.  The clause matching happens at planning time, so while there
are surely cases where this could be improved depending on the
expression type, I propose to just discard all clauses which do not
match OpExpr and ArrayExpr for now, as per the attached.  It would
definitely be good practice to have a default code path returning
PARTCLAUSE_UNSUPPORTED where the element list cannot be built.

Thoughts?
--
Michael
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..2d2f88e880 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -1690,7 +1690,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 elem_exprs = lappend(elem_exprs, elem_expr);
 			}
 		}
-		else
+		else if (IsA(rightop, ArrayExpr))
 		{
 			ArrayExpr  *arrexpr = castNode(ArrayExpr, rightop);
 
@@ -1704,6 +1704,16 @@ match_clause_to_partition_key(RelOptInfo *rel,
 
 			elem_exprs = arrexpr->elements;
 		}
+		else
+		{
+			/*
+			 * Ignore all other clause types.  It could be possible here
+			 * to reach this code path with a type coercion from an
+			 * array type to another with ArrayCoerceExpr which depends on
+			 * per-element execution for the conversion.
+			 */
+			return PARTCLAUSE_UNSUPPORTED;
+		}
 
 		/*
 		 * Now generate a list of clauses, one for each array element, of the
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e0cc5f3393..86dcd62d55 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1073,6 +1073,48 @@ explain (costs off) select * from boolpart where a is not unknown;
  Filter: (a IS NOT UNKNOWN)
 (7 rows)
 
+-- type coercion from one array type to another, no pruning
+create table coercepart (a varchar) partition by list (a);
+create table coercepart_ab partition of coercepart for values in ('ab');
+create table coercepart_cd partition of coercepart for values in ('cd');
+create table coercepart_ef_gh partition of coercepart for values in ('ef', 'gh');
+explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999'));
+  QUERY PLAN  
+--
+ Append
+   ->  Seq Scan on coercepart_ab
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+   ->  Seq Scan on coercepart_cd
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+   ->  Seq Scan on coercepart_ef_gh
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a in ('ab', NULL, to_char(125, '999'));
+  QUERY PLAN   
+---
+ Append
+   ->  Seq Scan on coercepart_ab
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+   ->  Seq Scan on coercepart_cd
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+   ->  Seq Scan on coercepart_ef_gh
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a in ('ef', 'gh', to_char(125, '999'));
+  QUERY PLAN   

Re: [HACKERS] path toward faster partition pruning

2018-05-07 Thread Marina Polyakova

On 07-05-2018 4:37, Michael Paquier wrote:

On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
I got a similar server crash as in [1] on the master branch since the commit
9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr,
but is an ArrayCoerceExpr (see [2]):

Indeed, I can see the crash.  I have been playing with this stuff and I
am in the middle of writing the patch, but let's track this properly for
now.


Thank you very much!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Having query cache in core

2018-05-07 Thread Sergei Kornilov
> Having been bitten by the feature on MySQL, I think it's not a good thing.
Even in MySQL itself this feature has already been removed.



Re: citext function overloads for text parameters

2018-05-07 Thread Shay Rojansky
>
>
>> Thanks for the input. It's worth noting that the equality operator
>> currently works in the same way: citext = text comparison is (surprisingly
>> for me) case-sensitive.
>>
>> My expectation was that since citext is supposed to be a case-insensitive
>> *type*, all comparison operations involving it should be case-insensitive;
>>
>
> Comparison requires both things to be the same type.  The rules for
> implicitly converting one type to another prefer the core type text over
> the extension type citext.
>
> IOW, there is no such operator =(citext,text) and thus "citext = text
> comparison" is technically invalid.
>
> At this point we're sorta stuck with our choice, and while individual
> databases can implement their own functions and operators there is value in
> doing things the way the system provides to minimize future confusion and
> bugs.
>

OK, thanks for everyone's input.


citext function overloads for text parameters

2018-05-07 Thread David G. Johnston
On Sunday, May 6, 2018, Shay Rojansky  wrote:
>
>
> Thanks for the input. It's worth noting that the equality operator
> currently works in the same way: citext = text comparison is (surprisingly
> for me) case-sensitive.
>
> My expectation was that since citext is supposed to be a case-insensitive
> *type*, all comparison operations involving it should be case-insensitive;
>

Comparison requires both things to be the same type.  The rules for
implicitly converting one type to another prefer the core type text over
the extension type citext.

IOW, there is no such operator =(citext,text) and thus "citext = text
comparison" is technically invalid.

At this point we're sorta stuck with our choice, and while individual
databases can implement their own functions and operators there is value in
doing things the way the system provides to minimize future confusion and
bugs.
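
For what it's worth, an individual database that really wants the
case-insensitive behaviour can define it locally. A hedged sketch (the
operator name and support function here are invented for illustration,
not part of citext, and the confusion risk above still applies):

    -- Hypothetical local operator: case-insensitive citext vs. text.
    CREATE FUNCTION citext_eq_text_ci(citext, text) RETURNS boolean
        LANGUAGE sql IMMUTABLE STRICT
        AS 'SELECT $1 = $2::citext';  -- both sides citext => case-insensitive

    CREATE OPERATOR ==* (
        LEFTARG   = citext,
        RIGHTARG  = text,
        PROCEDURE = citext_eq_text_ci
    );

    -- SELECT 'Foo'::citext ==* 'foo';  -- expected: true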

David J.


Re: Having query cache in core

2018-05-07 Thread Heikki Linnakangas

On 07/05/18 05:47, Tom Lane wrote:

Tatsuo Ishii  writes:

Does anybody think having in-memory query result cache in core is a
good idea?


No.


Agreed.

You could probably write an extension for that, though. I think the 
planner hook and custom scans give you enough flexibility to do that 
without modifying the server code.


- Heikki



Re: Having query cache in core

2018-05-07 Thread Laurenz Albe
Tom Lane wrote:
> Tatsuo Ishii  writes:
> > Does anybody think having in-memory query result cache in core is a
> > good idea?
> 
> No.

I agree.

Having been bitten by the feature on MySQL, I think it's not a good thing.

Essentially, it's a band-aid for badly written applications, but it will
only help in certain cases and hurts in many others.

Yours,
Laurenz Albe



Re: citext function overloads for text parameters

2018-05-07 Thread Shay Rojansky
>
> > Do you think it would be appropriate to simply add an strpos(citext,
> text)
> > overload to the extension to make sure this behaves more as expected? If
> so
> > I can try to submit a patch at some point.
>
> To me, if there's both a citext and a text parameter, then it's simply
> unclear which behavior is wanted; I do not think there's a principled
> argument that it should be case-insensitive.  Thus, adding a function
> to reverse the current interpretation would amount to breaking some
> currently-working apps to favor others that currently don't work.
> Doesn't really sound like a net gain.
>

Thanks for the input. It's worth noting that the equality operator
currently works in the same way: citext = text comparison is
(surprisingly, to me) case-sensitive.

My expectation was that since citext is supposed to be a case-insensitive
*type*, all comparison operations involving it should be case-insensitive;
at least there seems to be more value in things being this way. But if that
doesn't sound like a convincing/principled argument this can be dropped
(and I do agree that the breakage may not be worth it).


Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-07 Thread David Rowley
Many thanks for reviewing this.

On 2 May 2018 at 20:07, Amit Langote  wrote:
> +   
> +Partition Pruning is also more powerful than constraint exclusion as
> +partition pruning is not something that is performed only during the
> +planning of a given query.
>
> Maybe, don't repeat "partition pruning" again in the same sentence.  How
> about:
>
> .. more powerful than constraint exclusion as *it* is not something..

changed.

> Or may suggest to rewrite it as:
>
> Partition pruning is also more powerful than constraint exclusion as it
> can be performed not only during the planning of a given query, but also
> during its execution.
>
> If you accept the above rewrite, the next sentences in the paragraph:
>
> +In certain cases, partition pruning may also
> +be performed during execution of the query as well.  This allows pruning
> +to be performed using values which are unknown during query planning, for
> +example, using parameters defined in a PREPARE
> +statement, using a value obtained from a subquery or using parameters
> from
> +a parameterized nested loop join.
>
> could be adjusted a bit to read as:
>
> For example, this allows pruning to be performed using values which are
> unknown during query planning but will be known during execution, such as
> using parameters defined in a PREPARE statement (if a
> generic plan is chosen), or using a value obtained from a subquery, or
> using values from an outer row of a parameterized nested loop join.

I've changed this a bit but I didn't mention generic plans. What you
say is true, but I didn't think we needed to be so specific.

> +   
> +The partition pruning which is performed during execution is done so at
> +either one or both of the following times:
>
> done so at -> done at

Changed

> +   If partition pruning can be
> +   performed here then there is the added benefit of not having to
> +   initialize partitions which are pruned.  Partitions which are pruned
> +   during this stage will not show up in the query's
> +   EXPLAIN or EXPLAIN ANALYZE.  It
> +   is possible to determine the number of partitions which were removed
> +   using this method by observing the Subplans Removed
> +   property in the EXPLAIN output.
>
> While it might be OK to keep the last two sentences, not sure about the
> 1st, which seems like it's spelling out an implementation detail -- that
> there is an initialization step for partitions.  It's a nice performance
> enhancement, sure, but might be irrelevant to the users reading this
> documentation.

I've reworded this. I think it's important to inform the reader that
this is performed during initialization of the plan as without that
they might ask why there are two phases of pruning and not just one.
Not having to initialize the subnode for pruned partitions is the sole
advantage of doing this pruning phase, so I would rather be specific
about when it occurs.
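
For readers following along, a minimal sketch of the plan-initialization
pruning being described here; the table and statement names are invented
for illustration, and the exact EXPLAIN output (including whether
"Subplans Removed" appears) depends on the server version and on a
generic plan being chosen:

    CREATE TABLE measurement (logdate date) PARTITION BY RANGE (logdate);
    CREATE TABLE measurement_y2017 PARTITION OF measurement
        FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
    CREATE TABLE measurement_y2018 PARTITION OF measurement
        FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

    PREPARE q (date) AS SELECT * FROM measurement WHERE logdate >= $1;

    -- With a generic plan, $1 is only known at executor startup, so the
    -- 2017 partition can be pruned during plan initialization and EXPLAIN
    -- is expected to report something like "Subplans Removed: 1".
    EXPLAIN (COSTS OFF) EXECUTE q ('2018-06-01');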

> +   nested loop joins.  Since the value of these parameters may change
> many
> +   times during the execution of the query, partition pruning is
> performed
> +   whenever one of the execution parameters which is being compared to a
> +   partition column or expression changes.
>
> How about writing the last part as: whenever one of the execution
> parameters relevant to pruning changes

I've reworded this.

> +   
> +
> + Currently, partition pruning of partitions during the planning of an
> + UPDATE or DELETE command are
> + internally implemented using the constraint exclusion method.  Only
> + SELECT uses the faster partition pruning method.
> Also
> + partition pruning performed during execution is only done so for the
> + Append node type.  Both of these limitations are likely to be removed
> + in a future release of PostgreSQL.
> +
> +   
>
> Do we need to write this given that we decided to decouple even the
> UPDATE/DELETE pruning from the constraint_exclusion configuration?

I think it's important to inform people of the limitations. I know
there are a lot of opinions floating around about the usability of
partitioning in PostgreSQL with a large number of partitions. I
included this here so interested parties know that their problems are
not all solved by partition pruning. Perhaps those people can watch
for the removal of this notice.

>  Also,
> noting that only Append nodes can use execution-time pruning seems
> unnecessary.  I don't see plan node names mentioned like this elsewhere in
> the documentation.  But more to the point, it seems like spilling out
> finer implementation details (and/or limitations thereof) in the
> user-facing documentation.

I thought about this while writing the patch, and it forced me to grep
for instances of "Append" in the docs.  There were some, so I didn't
think I was breaking any rules.  I also have no idea how else we might
explain that it works for Append