Re: AS OF queries

2018-01-25 Thread Konstantin Knizhnik



On 26.01.2018 03:55, Bruce Momjian wrote:

On Sat, Dec 23, 2017 at 11:53:19PM +0300, konstantin knizhnik wrote:

On Dec 23, 2017, at 2:08 AM, Greg Stark wrote:


On 20 December 2017 at 12:45, Konstantin Knizhnik
 wrote:


It seems to me that it will be not so difficult to implement them in
Postgres - we already have versions of tuples.
Looks like we only need to do three things:
1. Disable autovacuum (autovacuum = off)

"The Wheel of Time turns, and Ages come and pass, leaving memories
that become legend. Legend fades to myth, and even myth is long
forgotten when the Age that gave it birth comes again"

I think you'll find it a lot harder to get this to work than just
disabling autovacuum. Notably HOT updates can get cleaned up (and even
non-HOT updates can now leave tombstone dead line pointers iirc) even
if vacuum hasn't run.


Yeh, I suspected that just disabling autovacuum was not enough.
I heard (but do not know too much) about microvacuum and hot updates.
This is why I was a little bit surprised when my test didn't show loss of
updated versions.
May be it is because of vacuum_defer_cleanup_age.

Well vacuum and single-page pruning do 3 things:

1.  remove expired updated rows
2.  remove deleted rows
3.  remove rows from aborted transactions

While time travel doesn't want #1 and #2, it probably wants #3.


Rows of aborted transactions are in any case excluded by visibility checks.
Certainly skipping them costs some time, so a large percentage of aborted
transactions may affect query speed.
But query speed is reduced in any case if, in order to support time
travel, we prohibit or postpone vacuum.
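
For context, the naive setup discussed up-thread (which, as noted above, is
not sufficient by itself) amounts to something like this; the
vacuum_defer_cleanup_age value is purely illustrative:

ALTER SYSTEM SET autovacuum = off;
ALTER SYSTEM SET vacuum_defer_cleanup_age = 1000000;  -- illustrative value
SELECT pg_reload_conf();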


What is the expected ratio of committed to aborted transactions? I
expect it to be much bigger than one (especially if we take into account
only read-write transactions which have actually updated the database). In
this case the number of versions created by aborted transactions should be
much smaller than the number of versions created by updates/deletes of
successful transactions, so they should not have a significant impact on
performance.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] GnuTLS support

2018-01-25 Thread Daniel Gustafsson
> On 26 Jan 2018, at 02:10, Michael Paquier  wrote:
> On Fri, Jan 26, 2018 at 12:27:16AM +0100, Daniel Gustafsson wrote:

>> While only tangentially related to the issue this patch solves, converting
>> be_tls_get_peerdn_name() to return const char * seems reasonable too to keep
>> the API consistent.
> 
> Why? This is not used for error message generation yet. We could always
> change the API as needed later on.

Mainly to keep the be_tls_* interface consistent in how the routines return
data.  I don’t have any strong opinions though, it just seemed like a good time
to do it.

cheers ./daniel


Re: JIT compiling with LLVM v9.0

2018-01-25 Thread Konstantin Knizhnik

Hi,

I've spent the last weeks working on my LLVM compilation patchset. In
the course of that I *heavily* revised it. While still a good bit away
from committable, it's IMO definitely not a prototype anymore.




Below are results on my system for Q1 TPC-H scale 10 (~13Gb database)

Options                   Time
--------------------------------
Default                   20075
jit_expressions=on        16105
jit_tuple_deforming=on    14734
jit_perform_inlining=on   13441
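
Such a comparison can be reproduced roughly as follows (a sketch only; the
GUC names are those of the patchset under review, not of any released
version):

SET jit_expressions = on;
SET jit_tuple_deforming = on;
SET jit_perform_inlining = on;
\timing on
-- then run TPC-H Q1 against the scale-10 database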


Also I noticed that parallel execution disables JIT.
On my computer with 4 cores, the time of Q1 with parallel execution is 6549.
Are there any fundamental problems with combining JIT and parallel execution?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-01-25 Thread David Rowley
On 21 January 2018 at 19:21, David Rowley  wrote:
> On 20 January 2018 at 18:50, Tom Lane  wrote:
>> Stephen Froehlich  writes:
>>> Are custom statistics in PG10 retained in LIKE (INCLUDING ALL) or do I need 
>>> to recreate the statistics by hand each time I create a new table?
>>
>> LIKE doesn't know about those, no.  Perhaps it should.
>
> Agreed. ALL does not mean MOST.

(Moving to -hackers)

The attached implements this.

Looking at "LIKE ALL" in more detail in the docs it claims to be
equivalent to "INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS",
so it didn't seem right to magically have LIKE ALL include something
that there's no option to include individually, so I added INCLUDING
STATISTICS.
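
As a minimal illustration of the behaviour this adds (INCLUDING STATISTICS is
the new option introduced by the attached patch and does not exist in
released versions):

CREATE TABLE t1 (a int, b int);
CREATE STATISTICS t1_stat (dependencies) ON a, b FROM t1;
CREATE TABLE t2 (LIKE t1 INCLUDING STATISTICS);
-- t2 now has its own extended statistics object, copied from t1's
-- definition, with a name generated automatically.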

This also required writing code to allow statistics to be created without
a name. The code I added to choose the default name is in the form
___stat. I'm sure someone will want
something else, but that's just what I've personally standardised on.
I'd also be fine with "stx" or "sta". Making this work required a bit
of shuffling of error checking in CreateStatistics() so that we
generate a name before complaining about any duplicate name.

I'm unsure if this should be categorised as a bug fix or a new feature.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


create_table_like_statistics_v1.patch
Description: Binary data


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-25 Thread Amit Kapila
On Fri, Jan 26, 2018 at 12:00 PM, Peter Geoghegan  wrote:
> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila  wrote:
>>> At this point, my preferred solution is for someone to go implement
>>> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
>>> like the logical person for the job).
>>>
>>
>> I can implement it and share a prototype patch with you which you can
>> use to test parallel sort stuff.
>
> That would be great. Thank you.
>
>> I would like to highlight the
>> difference which you will see with WaitForParallelWorkersToAttach as
>> compare to WaitForParallelWorkersToFinish() is that the former will
>> give you how many of nworkers_launched workers are actually launched
>> whereas latter gives an error if any of the expected workers is not
>> launched.  I feel former is good and your proposed way of calling it
>> after the leader is done with its work has alleviated the minor
>> disadvantage of this API which is that we need for workers to startup.
>
> I'm not sure that it makes much difference, though, since in the end
> WaitForParallelWorkersToFinish() is called anyway, much like
> nodeGather.c. Have I missed something?
>

Nopes, you are right.  I had in my mind that if we have something like
what I am proposing, then we don't even need to detect failures in
WaitForParallelWorkersToFinish and we can finish the work without
failing.

> I had imagined that WaitForParallelWorkersToAttach() would give me an
> error in the style of WaitForParallelWorkersToFinish(), without
> actually waiting for the parallel workers to finish.
>

I think that is also doable.  I will give it a try and report back if
I see any problem with it.  However, it might take me some time as I
am busy with a few other things and I am planning to take two days off
for some personal reasons; OTOH, if it turns out to be simple (which
I expect it should be), then I will report back early.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Mon, 2018-01-08 at 11:18 -0500, Robert Haas wrote:
> I also don't agree with the idea that we should reject syntax that
> doesn't appear in the SQL standard.  We have a great deal of such
> syntax already, and we add more of it in every release, and a good
> deal of it is contributed by you and your colleagues.  I don't see
> why this patch should be held to a stricter standard than we do in
> general.  I agree that there is some possibility for pain if the SQL
> standards committee adopts syntax that is similar to whatever we pick
> but different in detail, but I don't think we should be too worried
> about that unless other database systems, such as Oracle, have syntax
> that is similar to what is proposed here but different in
> detail.  The
> SQL standards committee seems to like standardizing on whatever
> companies with a lot of money have already implemented; it's unlikely
> that they are going to adopt something totally different from any
> existing system but inconveniently similar to ours.

We agree with you.

Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Sun, 2018-01-07 at 09:58 +, Simon Riggs wrote:
> > The attached README explains the ALIGN operation step-by-step with
> > a TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> > input, show how we rewrite it during parser stage, and show how the
> > final execution generates result tuples.
> Sorry, this was too complex for me.
>
> Can we get a much simpler example please?

Please see the following simpler example:

DROP TABLE budg;
CREATE TABLE budg(name VARCHAR(5), amnt INTEGER, t DATERANGE);
INSERT INTO budg VALUES ('Joe', 5, '[2012/2/1,2012/9/1)');
INSERT INTO budg VALUES ('Ann', 7, '[2012/5/1,2012/9/1)');
INSERT INTO budg VALUES ('Per', 3, '[2012/4/1,2012/10/1)');
SELECT * FROM budg AS r;

SELECT *
FROM ( budg r ALIGN budg s ON s.amnt > r.amnt WITH (t,t)) r
WHERE NOT EXISTS (
  SELECT *
  FROM (budg s ALIGN budg r ON s.amnt > r.amnt WITH (t,t)) s
  WHERE s.amnt > r.amnt
  AND r.t = s.t  );

--  name | amnt |            t
-- ------+------+-------------------------
--  Joe  |    5 | [2012-02-01,2012-05-01)
--  Ann  |    7 | [2012-05-01,2012-09-01)
--  Per  |    3 | [2012-09-01,2012-10-01)



Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Sat, 2018-01-06 at 20:29 +, Simon Riggs wrote:
> * various PostJoinSetProjection() functions to be called as needed
> So the idea is we enable Postgres to allow major new functionality,
> as was done for PostGIS so successfully.

If we use a post-join approach, many intermediate result tuples that do
not contribute to the final result would be emitted, and thus
performance would suffer.

Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-25 Thread Peter Moser
On Thu, 2017-11-30 at 12:26 -0500, Robert Haas wrote:
> I wonder if we could instead think about R NORMALIZE S ON R.x = S.y
> WITH (R.time, S.time) as an ordinary join on R.x = S.y with some
> extra processing afterwards.  After finding all the join partners for
> a row in R, extract all the lower and upper bounds from the S.time
> fields of the join partners and use that to emit as many rows from R
> as may be necessary.

We now have a new approach to plan and execute NORMALIZE as a special
join node of type NORMALIZE, an append-plan on the inner join path,
and a merge-join executor. For the latter, we would need to
extend nodeMergejoin.c with a point-in-range-containment join.

That is, we create a new join path within sort_inner_and_outer
(joinpath.c): first, two projection nodes extract the start and end
timepoints of the range type used as the interval, and above them an
append-plan merges both subplans. In detail, outer_path is just a sort,
whereas inner_path is the append of the (B, ts) projection with the (B, te)
projection.
Here, B is the set of non-temporal attributes used in join equality
predicates, and [ts,te) forms the valid-time interval. Non-equality
predicates must be handled separately as a filter step.

Do you think, it is a good idea to extend the sort_inner_and_outer()
with a new branch, where jointype == NORMALIZE and add the projection
and append sub-paths there?

> The main problem that I see with that approach is that it
> seems desirable to separate the extra-processing-afterwards step into
> a separate executor node, and there's no way for a separate type of
> plan node to know when the join advances the outer side of the join.
> That's the purpose that row_id() is serving as you have it; we'd have
> to invent some other kind of signalling mechanism, which might not be
> entirely trivial.  :-(

One advantage of the new approach is that row_id() is no longer needed.
The executor knows that it is inside a new "group" after every NEXTOUTER;
the whole loop over the inner relation then has the same row_id. The whole
mechanism could be handled through the MergeJoinState struct.

> If we could do it, though, it might be quite a bit more efficient,
> because it would avoid scanning S twice and performing a UNION on the
> results of those scans.  Also, we wouldn't necessarily need to sort
> the whole set of rows from R, which I suspect is unavoidable in your
> implementation.  We'd just need to sort the individual groups of rows
> from S, and my guess is that in many practical cases those groups are
> fairly small.

We would need a special SEQSCAN node/executor that does the projection
and append steps in a single read. Do you have any suggestions how to
implement this?

> I wonder what identities hold for NORMALIZE.  It does not seem to be
> commutative, but it looks like it might be associative... i.e. (R
> NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
> NORMALIZE T), perhaps?

It is not commutative and also not associative.
Assume relations that contain one tuple each as follows:

R={[1, 100)}, S={[50, 100)}, and T={[10, 20)}

(R NORMALIZE S) NORMALIZE T gives {[1, 10), [10, 20), [20,50), [50, 100)}

while

R NORMALIZE (S NORMALIZE T) gives {[1, 50), [50, 100)}

Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-25 Thread Peter Geoghegan
On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila  wrote:
>> At this point, my preferred solution is for someone to go implement
>> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
>> like the logical person for the job).
>>
>
> I can implement it and share a prototype patch with you which you can
> use to test parallel sort stuff.

That would be great. Thank you.

> I would like to highlight the
> difference which you will see with WaitForParallelWorkersToAttach as
> compare to WaitForParallelWorkersToFinish() is that the former will
> give you how many of nworkers_launched workers are actually launched
> whereas latter gives an error if any of the expected workers is not
> launched.  I feel former is good and your proposed way of calling it
> after the leader is done with its work has alleviated the minor
> disadvantage of this API which is that we need for workers to startup.

I'm not sure that it makes much difference, though, since in the end
WaitForParallelWorkersToFinish() is called anyway, much like
nodeGather.c. Have I missed something?

I had imagined that WaitForParallelWorkersToAttach() would give me an
error in the style of WaitForParallelWorkersToFinish(), without
actually waiting for the parallel workers to finish.

> However, now I see that you and Thomas are trying to find a different
> way to overcome this problem differently, so not sure if I should go
> ahead or not.  I have seen that you told you wanted to look at
> Thomas's proposed stuff carefully tomorrow, so I will wait for you
> guys to decide which way is appropriate.

I suspect that the overhead of Thomas' experimental approach is going
to cause problems in certain cases. Cases that are hard to foresee.
That patch makes HandleParallelMessages() set ParallelMessagePending
artificially, pending confirmation of having launched all workers.

It was an interesting experiment, but I think that your
WaitForParallelWorkersToAttach() idea has a better chance of working
out.

>> Once that's committed, I can
>> post a new version of the patch that uses that new infrastructure --
>> I'll add a call to the new function, without changing anything else.
>> Failing that, we could actually just use
>> WaitForParallelWorkersToFinish(). I still don't want to use a barrier,
>> mostly because it complicates  parallel_leader_participation=off,
>> something that Amit is in agreement with [2][3].
>>
>
> I think if we want we can use barrier API's to solve this problem, but
> I kind of have a feeling that it doesn't seem to be the most
> appropriate API, especially because existing API like
> WaitForParallelWorkersToFinish() can serve the need in a similar way.

I can't see a way in which using a barrier can have less complexity. I
think it will have quite a bit more, and I suspect that you share this
feeling.

> Just to conclude, following are proposed ways to solve this problem:
>
> 1. Implement a new API WaitForParallelWorkersToAttach and use that to
> solve this problem.  Peter G. and Amit think this is a good way to
> solve this problem.
> 2. Use existing API WaitForParallelWorkersToFinish to solve this
> problem.  Peter G. feels that if API mentioned in #1 is not available,
> we can use this to solve the problem and I agree with that position.
> Thomas is not against it.
> 3. Use Thomas's new way to detect such failures.  It is not clear to
> me at this stage if any one of us have accepted it to be the way to
> proceed, but Thomas and Peter G. want to investigate it further.
> 4. Use of the Barrier API to solve this problem.  Robert appears to be
> strongly in favor of this approach.

That's a good summary.

The next revision of the patch will make the
leader-participates-as-worker spool/Tuplesortstate start and finish
sorting before the main leader spool/Tuplesortstate is even started.
I did this with the intention of making it very clear that my approach
does not assume a number of participants up-front -- that is actually
something we only need a final answer on at the point that the leader
merges, which is logically the last possible moment.

Hopefully this will reassure Robert. It is quite a small change, but
leads to a slightly cleaner organization within nbtsort.c, since
_bt_begin_parallel() is the only point that has to deal with leader
participation. Another minor advantage is that this makes the
trace_sort overheads/duration for each of the two tuplesorts within the
leader non-overlapping (when the leader participates as a worker).

-- 
Peter Geoghegan



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-01-25 Thread Amit Kapila
On Wed, Jan 24, 2018 at 12:44 PM, amul sul  wrote:
> On Tue, Jan 23, 2018 at 7:01 PM, Amit Kapila  wrote:
>> On Fri, Jan 12, 2018 at 11:43 AM, amul sul  wrote:
> []
>> I have asked to change the message "tuple to be updated .." after
>> heap_lock_tuple call not in nodeModifyTable.c, so please revert the
>> message in nodeModifyTable.c.
>>
>
> Understood, fixed in the attached version, sorry I'd missed it.
>
>> Have you verified the changes in execReplication.c in some way? It is
>> not clear to me how do you ensure to set the special value
>> (InvalidBlockNumber) in CTID for delete operation via logical
>> replication?  The logical replication worker uses function
>> ExecSimpleRelationDelete to perform Delete and there is no way it can
>> pass the correct value of row_moved in heap_delete.  Am I missing
>> something due to which we don't need to do this?
>>
>
> You are correct, from ExecSimpleRelationDelete, heap_delete will always
> receive row_moved = false and InvalidBlockNumber will never be set.
>

So, this means that in case of logical replication, it won't generate
the error this patch is trying to introduce.  I think if we want to
handle this we need some changes in WAL and logical decoding as well.

Robert, others, what do you think?  I am not very comfortable leaving
this unaddressed, if we don't want to do anything about it, at least
we should document it.

> I didn't find any test case to hit the changes in execReplication.c.  I am
> not sure what we are supposed to do here, or whether having just a LOG
> message is much worse.
>

I think you can manually (via debugger) hit this by using
PUBLICATION/SUBSCRIPTION syntax for logical replication.  I think what
you need to do is in node-1, create a partitioned table and subscribe
it on node-2.  Now, perform an Update on node-1, then stop the logical
replication worker before it calls heap_lock_tuple.  Now, in node-2,
update the same row such that it moves the row.  Now, continue the
logical replication worker.  I think it should hit your new code, if
not then we need to think of some other way.
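
For concreteness, a rough sketch of that setup (an outline only; details such
as publishing the leaf partitions individually may need adjusting):

-- node-1 (publisher)
CREATE TABLE t (a int, b text) PARTITION BY RANGE (a);
CREATE TABLE t_p1 PARTITION OF t FOR VALUES FROM (1) TO (100);
CREATE TABLE t_p2 PARTITION OF t FOR VALUES FROM (100) TO (200);
ALTER TABLE t_p1 ADD PRIMARY KEY (a);
ALTER TABLE t_p2 ADD PRIMARY KEY (a);
CREATE PUBLICATION pub FOR TABLE t_p1, t_p2;

-- node-2 (subscriber): same table definitions as above, then
CREATE SUBSCRIPTION sub CONNECTION 'host=node1 dbname=postgres' PUBLICATION pub;

-- 1. on node-1: UPDATE t SET b = 'x' WHERE a = 50;
-- 2. stop the apply worker on node-2 (via debugger) before heap_lock_tuple
-- 3. on node-2: UPDATE t SET a = 150 WHERE a = 50;  -- moves the row to t_p2
-- 4. continue the apply worker and check whether the new code is reached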

> What do you think, should we add an assert like EvalPlanQualFetch() here or
> the current LOG message is fine?
>

I think first let's try to hit this case.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: General purpose hashing func in pgbench

2018-01-25 Thread Fabien COELHO


Hello Ildar,

Applies, compiles, runs.

I still have a few very minor comments, sorry for this (hopefully) last
iteration on my part. I'm kind of iterative...


The XML documentation source should avoid putting a paragraph on one very
long line; it should rather be indented like other sections.


I'd propose simplifying the second part:

  Hash functions can be used, for example, to modify the distribution of
  random_zipfian or random_exponential
  functions in order to obtain scattered distribution.
  Thus the following pgbench script simulates possible real world workload
  typical for social media and blogging platforms where few accounts
  generate excessive load:

->

  Hash functions can be used to scatter the distribution of random
  functions such as random_zipfian or
  random_exponential.
  For instance, the following pgbench script simulates possible real
  world workload typical for social media and blogging platforms where
  few accounts generate excessive load:
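
(For what it's worth, the script that sentence refers to would be along these
lines -- a sketch only, with hash() being the function added by this patch;
the exact example in the docs may differ:

\set r random_zipfian(1, 100000 * :scale, 1.07)
\set k 1 + abs(hash(:r)) % (100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :k;
)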

Comment the Assert(0) as an internal error that cannot happen.

I'd suggest compacting the execution code by declaring the int64 variable and
coercing to int in one go, like the integer bitwise functions. I'm in favor
of keeping them in their own case and not reusing this one.


--
Fabien.



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-25 Thread Amit Kapila
On Fri, Jan 26, 2018 at 11:30 AM, Amit Kapila  wrote:
> On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan  wrote:
>> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila  wrote:
>>> Right, but what if the worker dies due to something proc_exit(1) or
>>> something like that before calling BarrierArriveAndWait.  I think this
>>> is part of the problem we have solved in
>>> WaitForParallelWorkersToFinish such that if the worker exits abruptly
>>> at any point due to some reason, the system should not hang.
>>
>> I have used Thomas' chaos-monkey-fork-process.patch to verify:
>>
>> 1. The problem of fork failure causing nbtsort.c to wait forever is a
>> real problem. Sure enough, the coding pattern within
>> _bt_leader_heapscan() can cause us to wait forever even with commit
>> 2badb5afb89cd569500ef7c3b23c7a9d11718f2f, more or less as a
>> consequence of the patch not using tuple queues (it uses the new
>> tuplesort sharing thing instead).
>>
>> 2. Simply adding a single call to WaitForParallelWorkersToFinish()
>> within _bt_leader_heapscan() before waiting on our condition variable
>> fixes the problem -- errors are reliably propagated, and we never end
>> up waiting forever.
>>
>> 3. This short term fix works just as well with
>> parallel_leader_participation=off.
>>
>> At this point, my preferred solution is for someone to go implement
>> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
>> like the logical person for the job).
>>
>
> I can implement it and share a prototype patch with you which you can
> use to test parallel sort stuff.  I would like to highlight the
> difference which you will see with WaitForParallelWorkersToAttach as
> compare to WaitForParallelWorkersToFinish() is that the former will
> give you how many of nworkers_launched workers are actually launched
> whereas latter gives an error if any of the expected workers is not
> launched.  I feel former is good and your proposed way of calling it
> after the leader is done with its work has alleviated the minor
> disadvantage of this API which is that we need for workers to startup.
>

/we need for workers to startup./we need to wait for workers to startup.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-25 Thread Amit Kapila
On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan  wrote:
> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila  wrote:
>> Right, but what if the worker dies due to something proc_exit(1) or
>> something like that before calling BarrierArriveAndWait.  I think this
>> is part of the problem we have solved in
>> WaitForParallelWorkersToFinish such that if the worker exits abruptly
>> at any point due to some reason, the system should not hang.
>
> I have used Thomas' chaos-monkey-fork-process.patch to verify:
>
> 1. The problem of fork failure causing nbtsort.c to wait forever is a
> real problem. Sure enough, the coding pattern within
> _bt_leader_heapscan() can cause us to wait forever even with commit
> 2badb5afb89cd569500ef7c3b23c7a9d11718f2f, more or less as a
> consequence of the patch not using tuple queues (it uses the new
> tuplesort sharing thing instead).
>
> 2. Simply adding a single call to WaitForParallelWorkersToFinish()
> within _bt_leader_heapscan() before waiting on our condition variable
> fixes the problem -- errors are reliably propagated, and we never end
> up waiting forever.
>
> 3. This short term fix works just as well with
> parallel_leader_participation=off.
>
> At this point, my preferred solution is for someone to go implement
> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
> like the logical person for the job).
>

I can implement it and share a prototype patch with you which you can
use to test parallel sort stuff.  I would like to highlight the
difference which you will see with WaitForParallelWorkersToAttach as
compare to WaitForParallelWorkersToFinish() is that the former will
give you how many of nworkers_launched workers are actually launched
whereas latter gives an error if any of the expected workers is not
launched.  I feel former is good and your proposed way of calling it
after the leader is done with its work has alleviated the minor
disadvantage of this API which is that we need for workers to startup.

However, now I see that you and Thomas are trying to find a different
way to overcome this problem differently, so not sure if I should go
ahead or not.  I have seen that you told you wanted to look at
Thomas's proposed stuff carefully tomorrow, so I will wait for you
guys to decide which way is appropriate.

> Once that's committed, I can
> post a new version of the patch that uses that new infrastructure --
> I'll add a call to the new function, without changing anything else.
> Failing that, we could actually just use
> WaitForParallelWorkersToFinish(). I still don't want to use a barrier,
> mostly because it complicates  parallel_leader_participation=off,
> something that Amit is in agreement with [2][3].
>

I think if we want we can use barrier API's to solve this problem, but
I kind of have a feeling that it doesn't seem to be the most
appropriate API, especially because existing API like
WaitForParallelWorkersToFinish() can serve the need in a similar way.

Just to conclude, following are proposed ways to solve this problem:

1. Implement a new API WaitForParallelWorkersToAttach and use that to
solve this problem.  Peter G. and Amit think this is a good way to
solve this problem.
2. Use existing API WaitForParallelWorkersToFinish to solve this
problem.  Peter G. feels that if API mentioned in #1 is not available,
we can use this to solve the problem and I agree with that position.
Thomas is not against it.
3. Use Thomas's new way to detect such failures.  It is not clear to
me at this stage if any one of us have accepted it to be the way to
proceed, but Thomas and Peter G. want to investigate it further.
4. Use of the Barrier API to solve this problem.  Robert appears to be
strongly in favor of this approach.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Redefining inet_net_ntop

2018-01-25 Thread Tom Lane
Craig Ringer  writes:
> Should we be using our own if the OS has it? I'm thinking of adding a test
> to configure and omitting our own version if configure finds it. Objections?

I can't imagine that there's any real upside here.  The amount of code
involved is barely over a kilobyte, and we'd be exposing ourselves to
indeterminate version discrepancies.

Having said that, we got that code from bind, and the release process docs
suggest that we should check for upstream changes every so often.  I don't
think we've done so in a long time :-(

regards, tom lane



RE: Temporary tables prevent autovacuum, leading to XID wraparound

2018-01-25 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> I think we should consider having backends try to remove their temporary
> schema on startup; then, if a temp table in a backend is old enough that
> it's due for vacuum for wraparound, have autovacuum kill the connection.
> The former is necessary to prevent sessions from being killed on account
> of temp tables they "inherited" from a backend that didn't exit cleanly.

That seems to be the only reasonable solution.  One might find it annoying to
emit WAL during connection establishment to delete the temp schema, but even
now client authentication can emit WAL for hot pruning while scanning
pg_database, pg_authid, etc.  Thanks.

> The in-place update idea won't work for a couple of reasons.  First, a
> command shouldn't see the results of modifications made earlier in the same
> command.  Second, using cursors, it's possible to have more than one
> distinct snapshot open against a temporary table at the same time.

You're right.  And if the transaction rolls back, it needs to see the old 
tuples, which requires undo log.

Regards
Takayuki Tsunakawa



RE: Temporary tables prevent autovacuum, leading to XID wraparound

2018-01-25 Thread Tsunakawa, Takayuki
From: Masahiko Sawada [mailto:sawada.m...@gmail.com]
> On Thu, Jan 25, 2018 at 3:14 PM, Tsunakawa, Takayuki
>  wrote:
> > * Why does autovacuum launcher always choose only one database when that
> database need vacuuming for XID wraparound?  Shouldn't it also choose other
> databases?
> 
> Yeah, I'd also like to fix this issue. This can be problem even in other
> case; there are two databases that require anti-wraparound vacuum, and one
> of them has a very large table that could take a long time to vacuum. In
> this case, if autovacuum chooses the database having big table first,
> another database would not be selected until an autovacuum worker completes
> vacuum on the large table. To deal with it, I think we can make autovacuum
> workers tell that it is no longer necessary to launch a new autovacuum worker
> on the database to autovacuum launcher before exit. For example, autovacuum
> worker reports both the total number of relations and the number of relations
> that require an anti-wraparound vacuum to the stats collector.

Thanks for commenting.  I believe you have deep knowledge and experience with 
vacuum because you did a great work for freeze map in 9.6, so I appreciate your 
help!

How would you use those two counts?

How about just modifying do_start_worker(), so that the launcher chooses a 
database in the following order?

1. wraparound-risky database not being vacuumed by any worker
2. non-wraparound-risky database  not being vacuumed by any worker
3. wraparound-risky database being vacuumed by any worker
4. non-wraparound-risky database being vacuumed by any worker

Regards
Takayuki Tsunakawa




Re: Redefining inet_net_ntop

2018-01-25 Thread Peter Eisentraut
On 1/25/18 22:24, Craig Ringer wrote:
> port.h declares inet_net_ntop and we always compile our own
> from port/inet_net_ntop.c .
> 
> But it's part of -lresolv on Linux, and more importantly, it's declared
> in .
> 
> Should we be using our own if the OS has it? I'm thinking of adding a
> test to configure and omitting our own version if configure finds it.

ISTR that ours is hacked to produce specific output.  Do the regression
tests still pass if you use the one from the OS?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Covering + unique indexes.

2018-01-25 Thread Thomas Munro
On Fri, Jan 26, 2018 at 3:01 AM, Anastasia Lubennikova
 wrote:
> Thanks for the reminder. Rebased patches are attached.

This is a really cool and also difficult feature.  Thanks for working
on it!  Here are a couple of quick comments on the documentation,
since I noticed it doesn't build:

SGML->XML change: (1) empty closing tags "" are no longer accepted,
(2)  now needs to be written  and (3) xref IDs
are now case-sensitive.

+  PRIMARY KEY ( column_name [, ... ] ) index_parameters INCLUDE
(column_name [,
...]) |

I hadn't seen that use of "" before.  Almost everywhere else
we use explicit [ and ] characters, but I see that there are other
examples, and it is rendered as [ and ] in the output.  OK, cool, but
I think there should be some extra whitespace so that it comes out as:

  [ INCLUDE ... ]

instead of:

  [INCLUDE ...]

to fit with the existing convention.

+... This also allows UNIQUE indexes to be defined on
+one set of columns, which can include another set of columns in the
+   INCLUDE clause, on which the uniqueness is not enforced.
+It's the same with other constraints (PRIMARY KEY and
EXCLUDE). This can
+also can be used for non-unique indexes as any columns which
are not required
+for the searching or ordering of records can be used in the
+INCLUDE clause, which can slightly reduce the
size of the index.

Can I suggest rewording these three sentences a bit?  Just an idea:

UNIQUE indexes, PRIMARY KEY
constraints and EXCLUDE constraints can be defined
with extra columns in an INCLUDE clause, in which
case uniqueness is not enforced for the extra columns.  Moving columns
that are not needed for searching, ordering or uniqueness into the
INCLUDE clause can sometimes reduce the size of the
index while retaining the possibility of using a faster index-only
scan.
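
For instance, with the patch applied one could write (a sketch; syntax as in
the patch under review):

CREATE TABLE orders (id int, customer int, total numeric);
CREATE UNIQUE INDEX orders_id_idx ON orders (id) INCLUDE (customer, total);
-- uniqueness is enforced on id only; customer and total are merely stored
-- in the index so that index-only scans can return them.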

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] generated columns

2018-01-25 Thread Peter Eisentraut
On 1/19/18 00:18, Michael Paquier wrote:
> Instead of leaving bits for a feature that may or may not be
> implemented, have you considered just blocking STORED at parsing level
> and remove those bits? There is little point in keeping the equivalent
> of dead code in the tree. So I would suggest a patch simplification:
> - Drop the VIRTUAL/STORED parsing from the grammar for now.
> - Define VIRTUAL as the default for the future.

fixed

> =# CREATE TABLE gen_1 (a int, b int GENERATED ALWAYS AS (a * 2)
> VIRTUAL);
> CREATE TABLE
> =# insert into gen_1 values (20);
> INSERT 0 1
> =# select * from gen_1 ;
> ERROR:  22003: integer out of range
> Overflow checks do not happen when inserting, which makes the following
> SELECT to fail. This could be confusing for the user because the row has
> been inserted. Perhaps some evaluation of virtual tuples at insert phase
> should happen. The existing behavior makes sense as well as virtual
> values are only evaluated when read, so a test would be at least
> welcome.

added test

> Does the SQL spec mention the matter? How do other systems
> handle such cases?

In Oracle you get the same overflow error.

> CHECK constraints can be combined, still..

I think you can compare this to a view.  A view can produce overflow
errors on read.  But a CHECK constraint is more like a CHECK option on a
view that checks as values are put into the view.

> The last patch crashes for typed tables:
> =# CREATE TYPE itest_type AS (f1 integer, f2 text, f3 bigint);
> CREATE TYPE
> =# CREATE TABLE itest12 OF itest_type (f1 WITH OPTIONS GENERATED ALWAYS
> AS (f2 *2));
> [... boom ...]
> And for partitions:
> =# CREATE TABLE itest_parent (f1 date NOT NULL, f2 text, f3 bigint)
>PARTITION BY RANGE (f1);
> CREATE TABLE
> =# CREATE TABLE itest_child PARTITION OF itest_parent (f3 WITH OPTIONS
>GENERATED ALWAYS AS (f3)) FOR VALUES FROM ('2016-07-01') TO
>('2016-08-01');
> [... boom ...]
> Like what we did in 005ac298, I would suggest to throw
> ERRCODE_FEATURE_NOT_SUPPORTED. Please also add some tests.

done

> +SELECT a, c FROM gtest12;  -- FIXME: should be allowed
> +ERROR:  permission denied for function gf1

This is quite hard to fix and I would like to leave this for a future
release.

> +ALTER TABLE gtest27 ALTER COLUMN b DROP DEFAULT;  -- FIXME
> +ERROR:  column "b" of relation "gtest27" is a generated column

That FIXME seems to have been a mistake.  I have removed it.

> + if (get_attgenerated(relationId, attno))
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +  errmsg("index creation on generated columns is not 
> supported")));
> Shouldn't such messages mention explicitely "virtually"-generated
> columns? For stored columns the support would not be complicated in this
> case.

done

> +-- domains
> +CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10);
> +CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED
> ALWAYS AS (a * 2));  -- prohibited
> +ERROR:  virtual generated column "b" cannot have a domain type
> CHECK constraints can be used, so I find this restriction confusing.

We currently don't have infrastructure to support this.  We run all
CHECK constraints whenever a row is changed, so that is easy.  But we
don't have facilities to recheck the domain constraint in column b when
column a is updated.  This could be done, but it's extra work.

> No test coverage for DELETE triggers?

done

> +UPDATE gtest26 SET a = a * -2;
> +INFO:  gtest1: old = (-2,)
> +INFO:  gtest1: new = (4,)
> +INFO:  gtest3: old = (-2,)
> +INFO:  gtest3: new = (4,)
> +INFO:  gtest4: old = (3,)
> +INFO:  gtest4: new = (-6,)
> OLD and NEW values for generated columns don't show up. Am I missing
> something or they should be available?

This was already discussed a few times in the thread.  I don't know what
a good solution is.

I have in this patch added facilities to handle this better in other PLs.
 So the virtual generated column doesn't show up there in the trigger
data.  This is possible because we copy the trigger data from the
internal structures into language-specific hashes/dictionaries/etc.

In PL/pgSQL, this is a bit more difficult, because we handle the table's
row type there.  We can't just "hide" the generated column when looking
at the row type.  Actually, we could do it quite easily, but that would
probably raise other weirdnesses.

This also raises a question how a row type with generated columns should
behave.  I think a generation expression is a property of a table, so it
does not apply in a row type.  (Just like a default is a property of a
table and does not apply in row types.)

> Please use brackers here if you use a comment in the if() block...
> [/nit_mode]

done

> COPY as you are proposing looks sensible to me. I am not sure about any
> options though as it is possible to enforce the selection of generated
> columns using COPY (SELECT).

removed that comment

> Per my tests, gene

Redefining inet_net_ntop

2018-01-25 Thread Craig Ringer
Hi folks

port.h declares inet_net_ntop and we always compile our own
from port/inet_net_ntop.c .

But it's part of -lresolv on Linux, and more importantly, it's declared in
.

Should we be using our own if the OS has it? I'm thinking of adding a test
to configure and omitting our own version if configure finds it. Objections?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] GnuTLS support

2018-01-25 Thread Peter Eisentraut
On 1/25/18 20:10, Michael Paquier wrote:
> Peter, could you change ssl_version() and ssl_cipher() in sslinfo at the
> same time please? I think that those should use the generic backend-side
> APIs as well. sslinfo depends heavily on OpenSSL, OK, but if possible
> getting this code more generic will help users of sslinfo to get
> something partially working with other SSL implementations natively.

sslinfo is currently entirely dependent on OpenSSL, so I don't think
it's useful to throw in one or two isolated API changes.

I'm thinking maybe we should get rid of sslinfo and fold everything into
pg_stat_ssl.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-25 Thread Peter Eisentraut
On 1/25/18 09:40, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 1/23/18 13:39, Robert Haas wrote:
>>> I don't understand.  AAUI, it is currently the case that if you set
>>> the options before the TOAST table exists, they are lost.
> 
>> Well, that's also weird.
> 
> Well, maybe the right answer is to address that.  It's clear to me
> why that would happen if we store these things as reloptions on the
> toast table, but can't they be stored on the parent table?

I don't think there is a case where this can currently be a problem with
the current options set, but here is what I was thinking about:  Imagine
there were a setting toast.fillfactor or something like that that
affects the storage layout of the TOAST table.  If you ran

ALTER TABLE mytbl ADD COLUMN foo text DEFAULT somelongvalue();

then you would conceivably want to set TOAST-related reloptions before
the TOAST table is created and have them apply as the TOAST table is
being first populated.
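
A sketch of the sequence I have in mind (toast.fillfactor is hypothetical, as
said; only the ordering matters here):

CREATE TABLE mytbl (id int);                    -- no TOAST table yet
ALTER TABLE mytbl SET (toast.fillfactor = 70);  -- hypothetical option
ALTER TABLE mytbl ADD COLUMN foo text DEFAULT somelongvalue();
-- the TOAST table is created and populated here and should honor the setting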

Again, currently not a problem, I think.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-25 Thread Peter Eisentraut
On 1/24/18 07:53, Petr Jelinek wrote:
> That depends on if we consider this to be part of sequence handling or
> truncate statement replication handling. It's true that if we had
> sequence replication, the restart would get propagated that way anyway.
> OTOH, if we'll want to add option in the future to propagate cascade (to
> any additional tables on downstream), it may need to reset sequences
> which are not even present upstream and as such would not be handled by
> sequence replication.

Logical replication, as it currently is designed, replicates discrete
actions, not commands.  A delete command that deletes five rows and
cascades to delete three more rows somewhere else ends up being
replicated as eight changes.  So I think a TRUNCATE command that
cascades to four tables and resets six sequences should end up being
replicated as ten changes.  (Since we currently don't handle sequences
at all, we'll omit the six sequence changes for now.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Subscription code improvements

2018-01-25 Thread Peter Eisentraut
On 1/24/18 02:33, Masahiko Sawada wrote:
> Thank you for notification. Since it seems to me that no one is
> interested in this patch, it would be better to close out this patch.
> This patch is a refactoring patch but subscription code seems to work
> fine so far. If a problem appears around subscriptions, I might
> propose it again.

I have looked at the patches again.  They seem generally reasonable, but
I don't see quite why we want or need them.  More details would help
review them.  Do they fix any bugs, or are they just rearranging code?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2018-01-25 Thread Michael Paquier
On Thu, Jan 25, 2018 at 07:38:37AM -0500, Stephen Frost wrote:
> Michael, all,
> 
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote:
>>> * chenhj (chjis...@163.com) wrote:
 At 2018-01-23 09:56:48, "Stephen Frost"  wrote:
> I've only read through the thread to try and understand what's going on
> and the first thing that comes to mind is that you're changing
> pg_rewind to not remove the WAL from before the divergence (split)
> point, but I'm not sure why.  As noted, that WAL isn't needed for
> anything (it's from before the split, after all), so why keep it?  Is
> there something in this optimization that depends on the old WAL being
> there and, if so, what and why?
 
 After run pg_rewind, the first startup of postgres will do crash recovery.
 And crash recovery will begin from the previous redo point preceding the 
 divergence.
 So, the WAL after the redo point and before the divergence is needed.
>>> 
>>> Right.
>> 
>> Most of the time, and particularly since v11 has removed the need to
>> retain more past segments than one completed checkpoint, those segments
>> have less chances to be on the source server, limiting more the impact
>> of the patch discussed on this thread.
> 
> Good point.

Actually, that is worse than that, because after doing a promotion one
has to issue a checkpoint on the promoted server so as to update its
control file (pg_rewind fetches the new TLI number from the on-disk 
file), so we can never ensure that needed WAL segments will be on the
primary. Hence the proposed patch loses most of its value, because there
will always be a WAL segment hole anyway.

It can also help in reducing the load in creating new segments if
min_wal_size is large enough on the rewound server but that is a narrow
benefit.

In short, it seems really to me that we should reject the approach as
proposed, and replace it with something that prevents the fetching of
any WAL segments from the source server. I think that we could consider
as well removing all WAL segments on the primary from the point WAL
forked, as those created between the last checkpoint before WAL forked
up to the point where WAL forked are useful anyway. But those are bonus
points to keep a minimalistic amount of space for the rewound node as
they finish by being recycled anyway. For those reasons I think that the
patch should be marked as returned with feedback.

>> Yes, superuser is necessary now, if we could get to a point where only a
>> replication permission is needed that would be nice. Now we could do
>> things differently. We could have a system role dedicated to pg_rewind
>> which works only on the functions from genfile.c that pg_rewind needs,
>> in order to leverage the need of a superuser.
> 
> Actually, the other work I'm doing nearby wrt removing the explicit
> superuser() checks in those functions would allow a non-superuser role
> to be created which could be used by pg_rewind.  We could even add an
> explicit 'pg_rewind' default role if we wanted to, but is that the route
> we want to go with this or should we be thinking about changing
> pg_rewind to use the replication protocol and a replication user
> instead..?  The only reason that it doesn't today is that there isn't an
> easy way for it to do so with the existing replication protocol, as I
> understand it.

Yes, actually with the piece of work we are doing, we don't need to work
more on the replication protocol, and pg_rewind does not require heavy
refactoring either, so from a maintenance point of view we would gain
much more in the long term by using a dedicated role. If your patch to
have the file-read catalog user gets in, we should add in the docs of
pg_rewind a small set of ACL commands to allow pg_rewind to work with a
non-superuser role, as long as it gains access to the minimal set of
functions necessary.
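
For illustration, such a documented setup could boil down to something like
the following (the exact function list is an assumption on my side, pending
that other patch):

CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;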

> Then again, there's this whole question of if we should even keep the
> replication protocol or if we should be getting rid of it in favor of
> have regular connections that can support replication.  There's been
> discussion of that recently, as I recall, though I can't remember
> where.

Hm. I don't recall this one..

> Having the rewound server decide by itself what it needs actually sounds
> like it would be better and would allow whatever the restore command is
> to fetch any missing WAL necessary (which it might be able to do more
> efficiently, from an archive server and possibly even in parallel as
> compared to pg_rewind doing it serially from the primary...).  We don't
> have any particularly easy way to prevent needed WAL from disappearing
> off of the primary anyway, so it seems like it would be necessary to
> have a working restore command for pg_rewind to be reliable.  While we
> could create a physical replication slot for pg_rewind when it starts,
> that may already be too late.

The only approach I can think of that

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-01-25 Thread Stephen Frost
Craig, Michael, all,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 21 December 2017 at 11:31, Michael Paquier 
> wrote:
> 
> > On Thu, Dec 21, 2017 at 11:46 AM, Alvaro Herrera
> >  wrote:
> > > Michael Paquier wrote:
> > >> Well, the idea is really to get rid of that as there are already
> > >> facilities of this kind for CREATE TABLE LIKE in the parser and ALTER
> > >> TABLE when rewriting a relation. It is not really attractive to have a
> > >> 3rd method in the backend code to do the same kind of things, for a
> > >> method that is even harder to maintain than the other two.
> > >
> > > I dislike the backend code that uses SPI and manufacturing node to
> > > re-creates indexes.  IMO we should get rid of it.  Let's not call it
> > > "facilities", but rather "grotty hacks".
> >
> > Aha. You are making my day here ;)
> >
> > > I think before suggesting to add even more code to perpetuate that idea,
> > > we should think about going in the other direction.  I have not tried to
> > > write the code, but it should be possible to have an intermediate
> > > function called by ProcessUtility* which transforms the IndexStmt into
> > > an internal representation, then calls DefineIndex.   This way, all this
> > > code that wants to create indexes for backend-internal reasons can
> > > create the internal representation directly then call DefineIndex,
> > > instead of the horrible hacks they use today creating parse nodes by
> > > hand.
> >
> > Yeah, that would be likely possible. I am not volunteering for that in
> > the short term though..
> 
> It sounds like that'd make some of ALTER TABLE a bit less ... upsetting ...
> too.

I'm a big fan of this patch but it doesn't appear to have made any
progress in quite a while.  Is there any chance we can get an updated
patch and perhaps get another review before the end of this CF...?

Refactoring this to have an internal representation between
ProcessUtility() and DefineIndex doesn't sound too terrible and if it
means the ability to reuse that, seems like it'd be awful nice to do
so..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: \describe*

2018-01-25 Thread Vik Fearing
On 01/26/2018 02:11 AM, Corey Huinker wrote:
> Some of the discussions about making psql more user friendly (more tab
> completions help, exit, etc) got me thinking about other ways that psql
> could be more friendly, and the one that comes to mind is our terse but
> cryptic \d* commands.
> 
> I think it would be helpful and instructive to have corresponding
> long-form describe commands. 
> 
> Take describing schemas. Is \dn intuitive? Not really. In hindsight, you
> may think "yeah, a schema is a namespace", but you never guessed 'n' on
> the first try, or the second.

At first blush, I support this idea.

> Looking over exec_command_d() a bit, I think it's a bit of a stretch to
> have each command handle a long form like this:
> 
> \describe table my_table
> or
> \describe table verbose my_table
> 
> because then each \d-variant has to account for objects named "table"
> and "verbose" and that's a path to unhappiness.

We're already being verbose so we can easily require

\describe table table

for the first case, and if you move "verbose" to before the object, then
we can have

\describe verbose table verbose

So basically, the grammar would be "\describe [verbose] [system] object
name" instead of "\dXS[+] name" where X is the object.

One thing not addressed here is a long version of \ditvS+.  Maybe
something like

\describe verbose system index, table, view 

> But if we dash-separated them, then all of the strcmps would be in the
> 'e' subsection, and each one would just have to know its long-to-short
> translation, and call exec_command_d with the corresponding short command
> 
> describe => d
> describe-verbose => d+
> describe-aggregates-verbose => da+
> describe-roles => du

-1

> We could even presume the verbose flag in all cases (after all, the user
> was being verbose...), which would also cut down on tab-completion
> results, and we could check for interactive mode and display a message like
> 
> \describe-schemas (short: \dn+)
> 
> so that the person has the opportunity to learn the corresponding short
> command.

-1 on this, too.

If we presume "verbose", we need to add a "terse".  If the user is
interested in the short forms, they can issue a \? like everybody else.

> In addition to aiding tab-completion discovery of the commands (i.e.
> typing "\desc" and then hitting tab), it would also make scripts a little
> more self-documenting.

I always use long versions of options when writing scripts specifically
because they are self-documenting (see 0be22457d7) so I certainly
support this argument.


Note: I am not volunteering to implement any of this, but I'll happily
review it.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-25 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> If I understand correctly, those results are all just pg_test_fsync results.
> That's not reflective of what will happen when the database is actually
> running.  When you use open_sync or open_datasync, you force WAL write and
> WAL flush to happen simultaneously, instead of letting the WAL flush be
> delayed.

Yes, that's pg_test_fsync output.  Isn't pg_test_fsync the tool to determine 
the value for wal_sync_method?  Is this manual misleading?

https://www.postgresql.org/docs/devel/static/pgtestfsync.html
--
pg_test_fsync - determine fastest wal_sync_method for PostgreSQL

pg_test_fsync is intended to give you a reasonable idea of what the fastest 
wal_sync_method is on your specific system, as well as supplying diagnostic 
information in the event of an identified I/O problem.
--


Anyway, I'll use pgbench, and submit a patch if open_datasync is better than 
fdatasync.  I guess the current tweak of making fdatasync the default is a 
holdover from the era before ext4 and XFS became prevalent.


> I don't have the results handy at the moment.  We found it to be faster
> on a database benchmark where the WAL was stored on an NVRAM device.

Oh, NVRAM.  Interesting.  Then I'll try an open_datasync/fdatasync comparison
on HDD and SSD/PCIe flash with pgbench.

Regards
Takayuki Tsunakawa




Re: Boolean partitions syntax

2018-01-25 Thread Amit Langote
Hi Stephen.

On 2018/01/26 10:16, Stephen Frost wrote:
> * Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> Still compiles and passes regression tests, which is good.

Thanks for looking at this.

>>> I extended your test a bit to check whether partitions over booleans are 
>>> useful.
>>> Note specifically the 'explain' output, which does not seem to restrict the 
>>> scan
>>> to just the relevant partitions.  You could easily argue that this is 
>>> beyond the scope
>>> of your patch (and therefore not your problem), but I doubt it makes much 
>>> sense
>>> to have boolean partitions without planner support for skipping partitions 
>>> like is
>>> done for tables partitioned over other data types.
>>
>> Yeah.  Actually, I'm aware that the planner doesn't work this way.  While
>> constraint exclusion (planner's current method of skipping partitions)
>> does not work with IS TRUE/FALSE/UNKNOWN clauses, the new partition
>> pruning patch [1] addresses that.  In fact, I started this thread prompted
>> by some discussion about Boolean partitions on that thread [2].
>>
>> That said, someone might argue that we should also fix constraint
>> exclusion (the current method of partition pruning) so that partition
>> skipping works correctly for Boolean partitions.
> 
> For my 2c, at least, I don't think we need to fix constraint exclusion
> to work for this case and hopefully we'll get the partition pruning
> patch in but I'm not sure that we really need to wait for that either.
> Worst case, we can simply document that the planner won't actually
> exclude boolean-based partitions in this release and then fix it in the
> future.

Yeah, I meant this just as a tiny syntax extension patch.
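
To be concrete, the only thing the patch is meant to allow is writing the bound
as a Boolean literal (table names below are just for illustration):

    CREATE TABLE parted (b bool) PARTITION BY LIST (b);
    CREATE TABLE parted_true  PARTITION OF parted FOR VALUES IN (TRUE);
    CREATE TABLE parted_false PARTITION OF parted FOR VALUES IN (FALSE);

Whether the planner can then prune to a single partition is the separate issue
discussed above.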

> Looking over this patch, it seems to be in pretty good shape to me
> except that I'm not sure why you went with the approach of naming the
> function 'NoCast'.  There's a number of other functions just above
> makeBoolAConst() that don't include a TypeCast and it seems like having
> makeBoolConst() and makeBoolConstCast() would be more in-line with the
> existing code (see makeStringConst() and makeStringConstCast() for
> example, but also makeIntConst(), makeFloatConst(), et al).  That would
> require updating the existing callers that really want a TypeCast result
> even though they're calling makeBoolAConst(), but that seems like a good
> improvement to be making.

Agreed, done.

> I could see an argument that we should have two patches (one to rename
> the existing function, another to add support for boolean) but that's
> really up to whatever committer picks this up.  For my 2c, I don't think
> it's really necessary to split it into two patches.

OK, I kept the function name change part with the main patch.

Attached updated patch.

Thanks,
Amit
>From 3ebd9387c4515cc5272e12e0138cd8035fedaaea Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 12 Dec 2017 10:33:11 +0900
Subject: [PATCH v3] Allow Boolean values in partition FOR VALUES clause

---
 doc/src/sgml/ref/create_table.sgml |  6 +++---
 src/backend/parser/gram.y  | 24 +++-
 src/test/regress/expected/create_table.out | 14 ++
 src/test/regress/sql/create_table.sql  |  7 +++
 4 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index a0c9a6d257..eaa79ae333 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -86,9 +86,9 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI

 and partition_bound_spec is:

-IN ( { numeric_literal | string_literal | NULL } [, ...] ) |
-FROM ( { numeric_literal | string_literal | MINVALUE | MAXVALUE } [, ...] )
-  TO ( { numeric_literal | string_literal | MINVALUE | MAXVALUE } [, ...] ) |
+IN ( { numeric_literal | string_literal | TRUE | FALSE | NULL } [, ...] ) |
+FROM ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] )
+  TO ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, REMAINDER numeric_literal )

 index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 459a227e57..15b5c85a6e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -152,6 +152,7 @@ static Node *makeFloatConst(char *str, int location);
 static Node *makeBitStringConst(char *str, int location);
 static Node *makeNullAConst(int location);
 static Node *makeAConst(Value *v, int location);
+static Node *makeBoolAConstCast(bool state, int location);
 static Node *makeBoolAConst(bool state, int location);
 static RoleSpec *makeRoleSpec(RoleSpecType type, int location);
 static void check_qualified_name(List *names, core_yyscan_t yyscanner);
@@ -2793,6 +2794,8 @@ partbound_datum:
Sconst  { $$ = makeStri

Re: [PATCH] Logical decoding of TRUNCATE

2018-01-25 Thread Robert Haas
On Thu, Jan 25, 2018 at 8:36 PM, Petr Jelinek
 wrote:
> On 26/01/18 02:34, Robert Haas wrote:
>> On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek
>>  wrote:
>>> The patch will cascade truncation on downstream if cascade was specified
>>> on the upstream, that can potentially be dangerous and we either should
>>> not do it and only truncate the tables which were truncated upstream
>>> (but without restricting because of FKs), leaving the data inconsistent
>>> on downstream (like we do already with DELETE or UPDATE). Or maybe make
>>> it into either subscription or publication option so that user can choose
>>> the behaviour here as I am sure some people will want it to cascade (but
>>> the default should still IMHO be to not cascade as that's safer).
>>
>> Maybe I'm not understanding what is being proposed here, but it sounds
>> like you're saying that if somebody removes a bunch of data on the
>> logical master, replication will remove only some of it from the
>> servers to which the change is replicated.  That seems crazy.  Then
>> replication can't be counted on to produce a replica.
>>
> No, I was talking about extra tables that might be present on downstream
> which weren't truncated on upstream.

Oh.  That's different.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fix a typo in autoprewarm.c

2018-01-25 Thread Masahiko Sawada
On Fri, Jan 26, 2018 at 10:13 AM, Bruce Momjian  wrote:
> On Fri, Dec 22, 2017 at 02:30:56PM +0900, Masahiko Sawada wrote:
>> Hi,
>>
>> Attached a patch for fixing a typo in autoprewarm.c. I'm sorry if I'm 
>> mistaken.
>>
>> s/withs/with/
>
> Patch applied to head, thanks.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-25 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
> Or to put it short, the lack of granular syncs in ext3 kills performance
> for some workloads. Tomas Vondra's presentation on such matters is a really
> cool read by the way:
> https://www.slideshare.net/fuzzycz/postgresql-on-ext4-xfs-btrfs-and-zfs

Yeah, I saw this recently, too.  That was cool.

Regards
Takayuki Tsunakawa






Re: [PATCH] Logical decoding of TRUNCATE

2018-01-25 Thread Petr Jelinek
On 26/01/18 02:34, Robert Haas wrote:
> On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek
>  wrote:
>> The patch will cascade truncation on downstream if cascade was specified
>> on the upstream, that can potentially be dangerous and we either should
>> not do it and only truncate the tables which were truncated upstream
>> (but without restricting because of FKs), leaving the data inconsistent
>> on downstream (like we do already with DELETE or UPDATE). Or maybe make
>> it into either subscription or publication option so that user can choose
>> the behaviour here as I am sure some people will want it to cascade (but
>> the default should still IMHO be to not cascade as that's safer).
> 
> Maybe I'm not understanding what is being proposed here, but it sounds
> like you're saying that if somebody removes a bunch of data on the
> logical master, replication will remove only some of it from the
> servers to which the change is replicated.  That seems crazy.  Then
> replication can't be counted on to produce a replica.
> 

No, I was talking about extra tables that might be present on downstream
which weren't truncated on upstream.
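
To illustrate the concern with a hypothetical subscriber-only table: plain SQL
TRUNCATE ... CASCADE also recurses to tables that reference the truncated table
via foreign keys, so replaying the cascade flag as-is could wipe tables that only
exist downstream:

    CREATE TABLE replicated (id int PRIMARY KEY);
    CREATE TABLE local_only (rid int REFERENCES replicated (id));  -- exists only on the subscriber
    TRUNCATE replicated;            -- fails: local_only references replicated
    TRUNCATE replicated CASCADE;    -- truncates replicated AND local_only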

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-25 Thread Robert Haas
On Thu, Jan 25, 2018 at 8:32 PM, Tsunakawa, Takayuki
 wrote:
> As I showed previously, regular file writes on PCIe flash, *not writes using 
> PMDK on persistent memory*, were 20% faster with open_datasync than with 
> fdatasync.

If I understand correctly, those results are all just pg_test_fsync
results.  That's not reflective of what will happen when the database
is actually running.  When you use open_sync or open_datasync, you
force WAL write and WAL flush to happen simultaneously, instead of
letting the WAL flush be delayed.

> And you said open_datasync was significantly faster than fdatasync.  Could 
> you show your results?  What device and filesystem did you use?

I don't have the results handy at the moment.  We found it to be
faster on a database benchmark where the WAL was stored on an NVRAM
device.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-25 Thread Robert Haas
On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek
 wrote:
> The patch will cascade truncation on downstream if cascade was specified
> on the upstream, that can potentially be dangerous and we either should
> not do it and only truncate the tables which were truncated upstream
> (but without restricting because of FKs), leaving the data inconsistent
> on downstream (like we do already with DELETE or UPDATE). Or maybe make
> it into either subscription or publication option so that user can choose
> the behaviour here as I am sure some people will want it to cascade (but
> the default should still IMHO be to not cascade as that's safer).

Maybe I'm not understanding what is being proposed here, but it sounds
like you're saying that if somebody removes a bunch of data on the
logical master, replication will remove only some of it from the
servers to which the change is replicated.  That seems crazy.  Then
replication can't be counted on to produce a replica.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-25 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]> On Thu, Jan 25, 2018 at 7:08 
PM, Tsunakawa, Takayuki
>  wrote:
> > No, I'm not saying we should make the persistent memory mode the default.
> I'm simply asking whether it's time to make open_datasync the default
> setting.  We can write a notice in the release note for users who still
> use ext3 etc. on old systems.  If there's no objection, I'll submit a patch
> for the next CF.
> 
> Well, like I said, I think that will degrade performance for users of SSDs
> or spinning disks.


As I showed previously, regular file writes on PCIe flash, *not writes using 
PMDK on persistent memory*, were 20% faster with open_datasync than with 
fdatasync.

In addition, regular file writes on HDD with ext4 were also 10% faster:

--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  3408.905 ops/sec 293 usecs/op
fdatasync  3111.621 ops/sec 321 usecs/op
fsync  3609.940 ops/sec 277 usecs/op
fsync_writethrough  n/a
open_sync  3356.362 ops/sec 298 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  1892.157 ops/sec 528 usecs/op
fdatasync  3284.278 ops/sec 304 usecs/op
fsync  3066.655 ops/sec 326 usecs/op
fsync_writethrough  n/a
open_sync  1853.415 ops/sec 540 usecs/op
--


And you said open_datasync was significantly faster than fdatasync.  Could you 
show your results?  What device and filesystem did you use?

Regards
Takayuki Tsunakawa




Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-25 Thread Michael Paquier
On Thu, Jan 25, 2018 at 09:30:45AM -0500, Robert Haas wrote:
> On Wed, Jan 24, 2018 at 10:31 PM, Tsunakawa, Takayuki
>  wrote:
>>> This is just a guess, of course.  You didn't mention what the underlying
>>> storage for your test was?
>>
>> Uh, your guess was correct.  My file system was ext3, where fsync() writes 
>> all dirty buffers in page cache.
> 
> Oh, ext3 is terrible.  I don't think you can do any meaningful
> benchmark results on ext3.  Use ext4 or, if you prefer, xfs.

Or to put it short, the lack of granular syncs in ext3 kills
performance for some workloads. Tomas Vondra's presentation on such
matters is a really cool read by the way:
https://www.slideshare.net/fuzzycz/postgresql-on-ext4-xfs-btrfs-and-zfs
(I would have loved to see this presentation live).
--
Michael




Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-01-25 Thread Robert Haas
On Thu, Jan 25, 2018 at 1:14 AM, Tsunakawa, Takayuki
 wrote:
> * I think temporary tables should not require vacuuming for XID wraparound.  
> Furthermore, should updates/deletes to temporary tables be in-place instead 
> of creating garbage, so that any form of vacuum is unnecessary?  Other 
> sessions do not need to read temporary tables.

Temporary tables contain XIDs, so they need to be vacuumed for XID
wraparound.  Otherwise, queries against those tables by the session
that created them could yield wrong answers.  However, autovacuum
can't perform that vacuuming; it would have to be done by the session.
I think we should consider having backends try to remove their
temporary schema on startup; then, if a temp table in a backend is old
enough that it's due for vacuum for wraparound, have autovacuum kill
the connection.  The former is necessary to prevent sessions from
being killed on account of temp tables they "inherited" from a backend
that didn't exit cleanly.

The in-place update idea won't work for a couple of reasons.  First, a
command shouldn't see the results of modifications made earlier in the
same command.  Second, using cursors, it's possible to have more than
one distinct snapshot open against a temporary table at the same time.
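
A rough sketch of the cursor case (the table name is made up):

    CREATE TEMP TABLE tt (id int);
    INSERT INTO tt SELECT generate_series(1, 100);
    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM tt;   -- snapshot taken here
    UPDATE tt SET id = id + 1;               -- creates new row versions
    FETCH 10 FROM c;                         -- must still see the pre-UPDATE versions
    COMMIT;

With true in-place updates, the old versions the cursor needs would be gone.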

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Documentation of pgcrypto AES key sizes

2018-01-25 Thread Michael Paquier
On Fri, Jan 26, 2018 at 12:33:41PM +1300, Thomas Munro wrote:
> I noticed that the documentation for encrypt()/decrypt() says "aes —
> AES (Rijndael-128)", but in fact 192 and 256 bit keys are also
> supported, whether you build --with-openssl or --without-openssl.
> Should that say "AES (Rijndael-128, -192 or -256)" instead?

Indeed. Instead of using the keysize as a prefix, I would personally
find it less confusing if written as "AES (Rijndael with key sizes of 128,
192 or 256 bits)" instead of the phrasing you are proposing. Well, it
is true that "Rijndael-128" and friends are wordings that can be found
here and there...
--
Michael




Re: Boolean partitions syntax

2018-01-25 Thread Stephen Frost
Greetings Amit,

* Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> On 2017/12/20 6:46, Mark Dilger wrote:
> >> On Dec 12, 2017, at 10:32 PM, Amit Langote  
> >> wrote:
> >> Added to CF: https://commitfest.postgresql.org/16/1410/
> > 
> > This compiles and passes the regression tests for me.
> 
> Thanks for the review.

Still compiles and passes regression tests, which is good.

> > I extended your test a bit to check whether partitions over booleans are 
> > useful.
> > Note specifically the 'explain' output, which does not seem to restrict the 
> > scan
> > to just the relevant partitions.  You could easily argue that this is 
> > beyond the scope
> > of your patch (and therefore not your problem), but I doubt it makes much 
> > sense
> > to have boolean partitions without planner support for skipping partitions 
> > like is
> > done for tables partitioned over other data types.
> 
> Yeah.  Actually, I'm aware that the planner doesn't work this way.  While
> constraint exclusion (planner's current method of skipping partitions)
> does not work with IS TRUE/FALSE/UNKNOWN clauses, the new partition
> pruning patch [1] addresses that.  In fact, I started this thread prompted
> by some discussion about Boolean partitions on that thread [2].
> 
> That said, someone might argue that we should also fix constraint
> exclusion (the current method of partition pruning) so that partition
> skipping works correctly for Boolean partitions.

For my 2c, at least, I don't think we need to fix constraint exclusion
to work for this case and hopefully we'll get the partition pruning
patch in but I'm not sure that we really need to wait for that either.
Worst case, we can simply document that the planner won't actually
exclude boolean-based partitions in this release and then fix it in the
future.

Looking over this patch, it seems to be in pretty good shape to me
except that I'm not sure why you went with the approach of naming the
function 'NoCast'.  There's a number of other functions just above
makeBoolAConst() that don't include a TypeCast and it seems like having
makeBoolConst() and makeBoolConstCast() would be more in-line with the
existing code (see makeStringConst() and makeStringConstCast() for
example, but also makeIntConst(), makeFloatConst(), et al).  That would
require updating the existing callers that really want a TypeCast result
even though they're calling makeBoolAConst(), but that seems like a good
improvement to be making.

I could see an argument that we should have two patches (one to rename
the existing function, another to add support for boolean) but that's
really up to whatever committer picks this up.  For my 2c, I don't think
it's really necessary to split it into two patches.

Thanks!

Stephen




Re: list partition constraint shape

2018-01-25 Thread Amit Langote
Fujita-san,

Thanks for the review.

On 2018/01/25 21:17, Etsuro Fujita wrote:
> Thanks for the updated patch!  Some minor comments:
> 
> +   /*
> +    * Construct an ArrayExpr for the non-null partition
> +    * values
> +    */
> +   arrexpr = makeNode(ArrayExpr);
> +   arrexpr->array_typeid =
> +   !type_is_array(key->parttypid[0])
> +   ? get_array_type(key->parttypid[0])
> +   : key->parttypid[0];
> 
> We test the type_is_array() above in this bit, so I don't think we need to
> test that again here.

Ah, you're right.  Fixed.

> +   arrexpr->array_collid = key->parttypcoll[0];
> +   arrexpr->element_typeid = key->parttypid[0];
> 
> We can assume that keynum=0 here, so it would be okay to specify zero as
> the offset.  But ISTM that that makes code a bit less readable, so I'd
> vote for using keynum as the offset and maybe adding an assertion that
> keynum should be zero, somewhere in the PARTITION_STRATEGY_LIST block.

Agreed, done.

> 
> * Both comments in the following in get_qual_for_list needs to be updated,
> because the expression made there isn't necessarily = ANY anymore.
> 
>     if (elems)
>     {
>     /* Generate the main expression, i.e., keyCol = ANY (arr) */
>     opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
>     keyCol, (Expr *) elems);
>     }
>     else
>     {
>     /* If there are no partition values, we don't need an = ANY expr */
>     opexpr = NULL;
>     }

Fixed those.

Attached updated patch.  Thanks again.
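
For anyone wanting to see the difference, the case that needs the OR-of-equalities
form is an array-typed list partition key; an illustrative example (names made up):

    CREATE TABLE arrlp (a int[]) PARTITION BY LIST (a);
    CREATE TABLE arrlp12 PARTITION OF arrlp FOR VALUES IN ('{1}', '{2}');
    -- \d+ arrlp12 now shows the partition constraint built from per-value
    -- equality comparisons, roughly (a = '{1}') OR (a = '{2}'), instead of
    -- a = ANY (ARRAY[...]), which the parser would not always accept back.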

Regards,
Amit
From 358b7c0fbbc646939b43b94405aeca96e56ccb9b Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 8 Dec 2017 19:00:46 +0900
Subject: [PATCH v3] Change how list partition constraint is emitted

In its current form (key = ANY ()), a list partition
constraint expression is not always accepted by the backend's parser.
---
 src/backend/catalog/partition.c   | 102 ++
 src/test/regress/expected/create_table.out|  18 -
 src/test/regress/expected/foreign_data.out|   6 +-
 src/test/regress/expected/partition_prune.out |   8 +-
 src/test/regress/sql/create_table.sql |   6 ++
 5 files changed, 97 insertions(+), 43 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee977..c82a52fc1b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1625,18 +1625,67 @@ make_partition_op_expr(PartitionKey key, int keynum,
{
case PARTITION_STRATEGY_LIST:
{
-   ScalarArrayOpExpr *saopexpr;
-
-   /* Build leftop = ANY (rightop) */
-   saopexpr = makeNode(ScalarArrayOpExpr);
-   saopexpr->opno = operoid;
-   saopexpr->opfuncid = get_opcode(operoid);
-   saopexpr->useOr = true;
-   saopexpr->inputcollid = key->partcollation[keynum];
-   saopexpr->args = list_make2(arg1, arg2);
-   saopexpr->location = -1;
-
-   result = (Expr *) saopexpr;
+   ListCell   *lc;
+   List   *elems = (List *) arg2,
+  *elemops = NIL;
+
+   Assert(keynum == 0);
+   if (type_is_array(key->parttypid[keynum]))
+   {
+   foreach (lc, elems)
+   {
+   Expr   *elem = lfirst(lc),
+  *elemop;
+
+   elemop = make_opclause(operoid,
+  BOOLOID,
+  false,
+  arg1, elem,
+  InvalidOid,
+  key->partcollation[keynum]);
+   elemops = lappend(elemops, elemop);
+   }
+
+   result = list_length(elemops) > 1
+   ? makeBoolExpr(OR_EXPR, elemops, -1)
+  

Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-25 Thread Robert Haas
On Thu, Jan 25, 2018 at 7:08 PM, Tsunakawa, Takayuki
 wrote:
> No, I'm not saying we should make the persistent memory mode the default.  
> I'm simply asking whether it's time to make open_datasync the default 
> setting.  We can write a notice in the release note for users who still use 
> ext3 etc. on old systems.  If there's no objection, I'll submit a patch for 
> the next CF.

Well, like I said, I think that will degrade performance for users of
SSDs or spinning disks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing WITH clause support in CREATE FUNCTION, for isCachable and isStrict

2018-01-25 Thread Michael Paquier
On Thu, Jan 25, 2018 at 06:30:04PM -0500, Tom Lane wrote:
> I poked into the git log and confirmed Michael's statement that
> CREATE FUNCTION ... WITH has been documented as deprecated since
> 7.3 (commit 94bdc4855 to be exact).

Thanks for the confirmation.

> I think the original intention was that we'd keep it for PG-specific
> function attributes (ie those not found in the SQL spec), so as to
> avoid creating keywords for every attribute we thought of.  That
> intention has been roundly ignored since then, though, since COST
> and ROWS and LEAKPROOF and PARALLEL (UN)SAFE have all been implemented
> as separate keywords not WITH options.  I'm dubious that that was a good
> plan, but it's probably too late to change direction now.

:(

> In short, I'm on board with removing the WITH clause.  I've not
> reviewed the patch in detail, but will do so and push it if there's
> not objections pretty soon.

Glad to hear that, thanks!
--
Michael




Re: Fix a typo in autoprewarm.c

2018-01-25 Thread Bruce Momjian
On Fri, Dec 22, 2017 at 02:30:56PM +0900, Masahiko Sawada wrote:
> Hi,
> 
> Attached a patch for fixing a typo in autoprewarm.c. I'm sorry if I'm 
> mistaken.
> 
> s/withs/with/

Patch applied to head, thanks.

---


> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

> diff --git a/contrib/pg_prewarm/autoprewarm.c 
> b/contrib/pg_prewarm/autoprewarm.c
> index 006c315..abf8f03 100644
> --- a/contrib/pg_prewarm/autoprewarm.c
> +++ b/contrib/pg_prewarm/autoprewarm.c
> @@ -368,7 +368,7 @@ apw_load_buffers(void)
>   if (current_db != blkinfo[i].database)
>   {
>   /*
> -  * Combine BlockRecordInfos for global objects 
> withs those of
> +  * Combine BlockRecordInfos for global objects 
> with those of
>* the database.
>*/
>   if (current_db != InvalidOid)


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] GnuTLS support

2018-01-25 Thread Michael Paquier
On Fri, Jan 26, 2018 at 12:27:16AM +0100, Daniel Gustafsson wrote:
>> On 25 Jan 2018, at 15:07, Peter Eisentraut 
>>  wrote:
>> 
>> On 1/19/18 13:43, Peter Eisentraut wrote:
>>> Comparing the existing {be,fe}-secure-openssl.c with the proposed
>>> {be,fe}-secure-gnutls.c, and with half an eye on the previously proposed
>>> Apple Secure Transport implementation, I have identified a few more
>>> areas of refactoring that should be done in order to avoid excessive
>>> copy-and-pasting in the new implementations:
>> 
>> And here is another place that needs cleaning up, where the OpenSSL API
>> was used directly.
> 
> +1 on these cleanups.

Peter, could you change ssl_version() and ssl_cipher() in sslinfo at the
same time please? I think that those should use the generic backend-side
APIs as well. sslinfo depends heavily on OpenSSL, OK, but if possible
getting this code more generic will help users of sslinfo to get
something partially working with other SSL implementations natively.

> Regarding this hunk:
> 
>  extern int   be_tls_get_cipher_bits(Port *port);
>  extern bool be_tls_get_compression(Port *port);
> -extern void be_tls_get_version(Port *port, char *ptr, size_t len);
> -extern void be_tls_get_cipher(Port *port, char *ptr, size_t len);
> +extern const char *be_tls_get_version(Port *port);
> +extern const char *be_tls_get_cipher(Port *port);
>  extern void be_tls_get_peerdn_name(Port *port, char *ptr, size_t len);
> 
> While only tangentially related to the issue this patch solves, converting
> be_tls_get_peerdn_name() to return const char * seems reasonable too to keep
> the API consistent.

Why? This is not used for error message generation yet. We could always
change the API as needed later on.
--
Michael




\describe*

2018-01-25 Thread Corey Huinker
Some of the discussions about making psql more user friendly (more tab
completions, help, exit, etc.) got me thinking about other ways that psql
could be more friendly, and the one that comes to mind is our terse but
cryptic \d* commands.

I think it would be helpful and instructive to have corresponding long-form
describe commands.

Take describing schemas. Is \dn intuitive? Not really. In hindsight, you
may think "yeah, a schema is a namespace", but you never guessed 'n' on the
first try, or the second.

Looking over exec_command_d() a bit, I think it's a bit of a stretch to
have each command handle a long form like this:

\describe table my_table
or
\describe table verbose my_table

because then each \d-variant has to account for objects named "table" and
"verbose" and that's a path to unhappiness.

But if we dash-separated them, then all of the strcmps would be in the 'e'
subsection, and each one would just have to know its long-to-short
translation, and call exec_command_d with the corresponding short command

describe => d
describe-verbose => d+
describe-aggregates-verbose => da+
describe-roles => du

We could even presume the verbose flag in all cases (after all, the user
was being verbose...), which would also cut down on tab-completion results,
and we could check for interactive mode and display a message like

\describe-schemas (short: \dn+)

so that the person has the opportunity to learn the corresponding short
command.

In addition to aiding tab completion discovery of the commands (i.e.
typing "\desc" and then hitting tab), it would also make scripts a little
more self-documenting.

Thoughts?


Re: PATCH: psql tab completion for SELECT

2018-01-25 Thread Edmund Horner
On 26 January 2018 at 13:44, Vik Fearing  wrote:
> On 01/26/2018 01:28 AM, Edmund Horner wrote:
>> The patch mentioned attempts to put savepoints around the tab
>> completion query where appropriate.
>
> I am -1 on this idea.

May I ask why?  It doesn't stop psql working against older versions,
as it checks that the server supports savepoints.



Re: [HACKERS] GnuTLS support

2018-01-25 Thread Michael Paquier
On Fri, Jan 19, 2018 at 01:55:30PM -0500, Robert Haas wrote:
> On Wed, Jan 17, 2018 at 10:02 PM, Tom Lane  wrote:
>> Also, this isn't really a good argument against using uniform names
>> for parameters that every implementation is certain to have, like
>> ssl_key_file.
> 
> Even then, it's not that hard to imagine minor variations between what
> different implementations will accept.  The most obvious difference is
> probably that they might expect different file formats, but it's also
> possible that a Windows-specific implementation might allow omitting
> the file extension while some other implementation does not, for
> example.  I agree that it would probably be fairly low-risk to use one
> parameter for the key file for every implementation, but I suggest
> that it would be cleaner and less prone to confusion if we enforce a
> full separation of parameters.  That also spares us having to make a
> judgement call about which parameters have semantics close enough that
> we need not separate them.

Having to debate the most correct default values for one common
parameter across N SSL implementations will likely prove in the
end that the most correct answer is just to split all parameters from
the start. Some implementations are platform-specific, and the file format
is one thing that matters. There are also other things like ssl_ciphers
which could depend on what kind of cipher types an SSL implementation is
able to support... Another thing is that we cannot be 100% sure that one
parameter can be set to, say, GUC_SIGHUP for one SSL implementation while
needing other modes with other implementations. Of course this
can be solved with some ifdefs, but splitting brings clarity.

At the end, I agree with the position of Robert to split everything and
rename the existing ssl_* parameters to openssl_* if v11 gains a new SSL
implementation.
--
Michael




Re: AS OF queries

2018-01-25 Thread Bruce Momjian
On Sat, Dec 23, 2017 at 11:53:19PM +0300, konstantin knizhnik wrote:
> 
> On Dec 23, 2017, at 2:08 AM, Greg Stark wrote:
> 
> > On 20 December 2017 at 12:45, Konstantin Knizhnik
> >  wrote:
> > 
> >> It seems to me that it will be not so difficult to implement them in
> >> Postgres - we already have versions of tuples.
> >> Looks like we only need to do three things:
> >> 1. Disable autovacuum (autovacuum = off)
> > 
> > "The Wheel of Time turns, and Ages come and pass, leaving memories
> > that become legend. Legend fades to myth, and even myth is long
> > forgotten when the Age that gave it birth comes again"
> > 
> > I think you'll find it a lot harder to get this to work than just
> > disabling autovacuum. Notably HOT updates can get cleaned up (and even
> > non-HOT updates can now leave tombstone dead line pointers iirc) even
> > if vacuum hasn't run.
> > 
> 
> Yeah, I suspected that just disabling autovacuum was not enough.
> I heard (but do not know too much) about microvacuum and hot updates.
> This is why I was a little bit surprised when my test didn't show loss of
> updated versions.
> Maybe it is because of vacuum_defer_cleanup_age.

Well vacuum and single-page pruning do 3 things:

1.  remove expired updated rows
2.  remove deleted rows
3.  remove rows from aborted transactions

While time travel doesn't want #1 and #2, it probably wants #3.
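
For what it's worth, the effect of #3 is easy to see (illustrative table name):

    CREATE TABLE t (id int);
    BEGIN;
    INSERT INTO t SELECT generate_series(1, 1000);
    ROLLBACK;
    -- The rows are invisible to every snapshot, but the dead versions still take
    -- up space until vacuum/pruning removes them (statistics can lag a little):
    SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 't';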

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: PATCH: psql tab completion for SELECT

2018-01-25 Thread Vik Fearing
On 01/26/2018 01:28 AM, Edmund Horner wrote:
> On 19 January 2018 at 05:37, Vik Fearing  wrote:
>> On 01/18/2018 01:07 AM, Tom Lane wrote:
>>> Edmund Horner  writes:
 On 15 January 2018 at 15:45, Andres Freund  wrote:
> All worries like this are supposed to check the server version.
>>>
 In psql there are around 200 such tab completion queries, none of
 which checks the server version.  Many would cause the user's
 transaction to abort if invoked on an older server.  Identifying the
 appropriate server versions for each one would be quite a bit of work.
>>>
 Is there a better way to make this more robust?
>>>
>>> Maybe it'd be worth the effort to wrap tab completion queries in
>>> SAVEPOINT/RELEASE SAVEPOINT if we're inside a user transaction
>>> (which we could detect from libpq's state, I believe).
>>>
>>> That seems like an independent patch, but it'd be a prerequisite
>>> if you want to issue tab completion queries with version dependencies.
>>>
>>> A bigger point here is: do you really want SELECT tab completion
>>> to work only against the latest and greatest server version?
>>> That would become an argument against committing the feature at all;
>>> maybe not enough to tip the scales against it, but still a demerit.
>>
>> I don't really want such a patch.  I use new psql on old servers all the
>> time.
> 
> I'm not sure where we got with this.  I really should have put this
> patch in a separate thread, since it's an independent feature.

Yes, it should have been a separate thread.

> The patch mentioned attempts to put savepoints around the tab
> completion query where appropriate.

I am -1 on this idea.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: PATCH: psql tab completion for SELECT

2018-01-25 Thread Edmund Horner
On 19 January 2018 at 05:37, Vik Fearing  wrote:
> On 01/18/2018 01:07 AM, Tom Lane wrote:
>> Edmund Horner  writes:
>>> On 15 January 2018 at 15:45, Andres Freund  wrote:
 All worries like this are supposed to check the server version.
>>
>>> In psql there are around 200 such tab completion queries, none of
>>> which checks the server version.  Many would cause the user's
>>> transaction to abort if invoked on an older server.  Identifying the
>>> appropriate server versions for each one would be quite a bit of work.
>>
>>> Is there a better way to make this more robust?
>>
>> Maybe it'd be worth the effort to wrap tab completion queries in
>> SAVEPOINT/RELEASE SAVEPOINT if we're inside a user transaction
>> (which we could detect from libpq's state, I believe).
>>
>> That seems like an independent patch, but it'd be a prerequisite
>> if you want to issue tab completion queries with version dependencies.
>>
>> A bigger point here is: do you really want SELECT tab completion
>> to work only against the latest and greatest server version?
>> That would become an argument against committing the feature at all;
>> maybe not enough to tip the scales against it, but still a demerit.
>
> I don't really want such a patch.  I use new psql on old servers all the
> time.

I'm not sure where we got with this.  I really should have put this
patch in a separate thread, since it's an independent feature.

The patch mentioned attempts to put savepoints around the tab
completion query where appropriate.
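
For clarity, the idea is essentially the following (object names are invented;
the completion query is just an example of one that errors on older servers):

    BEGIN;
    INSERT INTO my_table VALUES (1);               -- user's own work, must not be lost
    SAVEPOINT _psql_tabcomplete;
    SELECT relispartition FROM pg_class LIMIT 1;   -- completion query; fails before 10
    ROLLBACK TO SAVEPOINT _psql_tabcomplete;       -- on error, user's transaction stays usable
    -- RELEASE SAVEPOINT _psql_tabcomplete;        -- on success
    COMMIT;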



Re: ALTER TABLE ADD COLUMN fast default

2018-01-25 Thread Andrew Dunstan
On Thu, Jan 25, 2018 at 6:10 PM, Thomas Munro
 wrote:
> On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan
>  wrote:
>> Yeah, got caught by the bki/pg_attribute changes on Friday. here's an
>> updated version. Thanks for looking.
>
> A boring semi-automated update:  this no longer compiles, because
> 8561e4840c8 added a new call to heap_attisnull().  Pretty sure it just
> needs NULL in the third argument.
>

Yeah, thanks. Revised patch attached.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fast_default-v7.patch
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-01-25 Thread Edmund Horner
On 23 January 2018 at 21:47, Vik Fearing  wrote:
> Don't forget to include the list :-)
> I'm quoting the entirety of the message---which I would normally trim---for
> the archives.

Thanks for spotting that.  Sorry list!

> If this were my patch, I'd have one query for 8.0, and then queries for all
> currently supported versions if needed.  If 9.2 only gets 8.0-level
> features, that's fine by me.  That limits us to a maximum of six queries.
> Just because we keep support for dead versions doesn't mean we have to
> retroactively develop for them.  Au contraire.
>> I'll make another patch version, with these few differences.

I managed to do testing against servers on all the versions >= 8.0 and
I discovered that prior to version 8.1 they don't support ALL/ANY on
oidvectors; so I added another query form.

But I don't like having so many different queries, so I am in favour
of trimming it down.  Can I leave that up to the committer to remove
some cases or do we need another patch?

> Great!

Here's another.


psql-select-tab-completion-v3.patch
Description: Binary data


RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-25 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> On Wed, Jan 24, 2018 at 10:31 PM, Tsunakawa, Takayuki
>  wrote:
> > As you said, open_datasync was 20% faster than fdatasync on RHEL7.2, on
> a LVM volume with ext4 (mounted with options noatime, nobarrier) on a PCIe
> flash memory.
> 
> So does that mean it was faster than your PMDK implementation?

The PMDK patch is not mine, but is from people in NTT Lab.  I'm very curious 
about the comparison of open_datasync and PMDK, too.


> > What do you think about changing the default value of wal_sync_method
> on Linux in PG 11?  I can understand the concern that users might hit
> performance degredation if they are using PostgreSQL on older systems.  But
> it's also a waste (mottainai) that many users don't notice the benefits of
> wal_sync_method = open_datasync on new systems.
> 
> Well, some day persistent memory may be a common enough storage technology
> that such a change makes sense, but these days most people have either SSD
> or spinning disks, where the change would probably be a net negative.  It
> seems more like something we might think about changing in PG 20 or PG 30.

No, I'm not saying we should make the persistent memory mode the default.  I'm 
simply asking whether it's time to make open_datasync the default setting.  We 
can write a notice in the release note for users who still use ext3 etc. on old 
systems.  If there's no objection, I'll submit a patch for the next CF.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-25 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > Alvaro, Tom,
> > 
> > * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> 
> > > Crazy idea: maybe a large fraction of that test could be replaced with
> > > comparisons of the "pg_restore -l" output file rather than pg_dump's
> > > text output (i.e. the TOC entry for each object, rather than the
> > > object's textual representation.)  Sounds easier in code than current
> > > implementation.  Separately, verify that textual representation for each
> > > TOC entry type is what we expect.
> > 
> > I'm not sure how that's different..?  We do check that the textual
> > representation is what we expect, both directly from pg_dump and from
> > pg_restore output, and using the exact same code to check both (which I
> > generally think is a good thing since we want the results from both to
> > more-or-less match up).  What you're proposing here sounds like we're
> > throwing that away in favor of keeping all the same code to test the
> > textual representation but then only checking for listed contents from
> > pg_restore instead of checking that the textual representation is
> > correct.  That doesn't actually reduce the amount of code though..
> 
> Well, the current implementation compares a dozen pg_dump output text
> files, three hundred lines apiece, against a thousand regexes (give
> or take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match " (or sometimes "matched when it
> should have not"), so looking for the mismatch is quite annoying.

Yeah, the big output isn't fun, I agree with that.

> My proposal is that instead of looking at three hundred lines, you'd
> look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not.  Quite a bit simpler for the guy adding a new test.  This tests
> combinations of pg_dump switches: are we dumping the right set of
> objects.

That might be possible to do, though I'm not sure that it'd really be
only 50 lines.

> *Separately* test that each individual TOC entry type ("table data",
> "index", "tablespace") is dumped as this or that SQL command, where you
> create a separate dump file for each object type.  So you match a single
> TOC entry to a dozen lines of SQL, half of them comments -- pretty easy
> to see where a mismatch is.

Yikes.  If you're thinking that we would call pg_restore independently
for each object, I'd be worried about the runtime increasing quite a
bit.  There's also a few cases where we check dependencies between
objects that we'd need to be mindful of, though those we might be able
to take care of, if we can keep the runtime down.  Certainly, part of
the way the existing code works was specifically to try and minimize the
runtime.  Perhaps the approach taken of minimizing the invocations and
building a bunch of regexps to run against the resulting output is more
expensive than running pg_restore a bunch of times, but I'd want to see
that to be sure as some really simple timings locally on a 6.5k -Fc dump
file seems to take 30-40ms just to produce the TOC.

This also doesn't really help with the big perl hash, but these two
ideas don't seem to conflict, so perhaps we could possibly still tackle
the big perl hash with the approach I described up-thread and also find
a way to implement this, if it doesn't cause the regression tests to be
too much slower.

Thanks!

Stephen




Should we introduce a "really strict" volatility about return values?

2018-01-25 Thread Andres Freund
Hi,

Currently a good bit of the time evaluating more complex expressions is
spent checking whether a strict [transition] function should be called
or not. There's a good number of cases where we could optimize that away
based on the underlying data - if e.g. a slot column refers to a NOT
NULL column (cf [1]), we do not need to do the NOT NULL checks.

A significant limit of that is that for even slightly more complex
expressions, say a + b + c, that doesn't work for all the operator
invocations, because even though functions like int8pl are strict, we do
*not* guarantee that their return value is never NULL.

Thus I wonder if we shouldn't introduce a 'really strict' volatility
level that signals, in addition to the normal strict guarantees, that the
function never returns NULL otherwise. In a lot of cases that should allow us to
completely eliminate strictness checks from execution.
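
To make the distinction concrete (the function below is just an illustration):
strictness only promises NULL-in => NULL-out, it says nothing about whether a
non-NULL input can produce a NULL output.

    -- int8pl is strict: a NULL input short-circuits to a NULL result
    SELECT proisstrict FROM pg_proc WHERE proname = 'int8pl';   -- t

    -- but a strict function may still return NULL for non-NULL input:
    CREATE FUNCTION pos_or_null(i int) RETURNS int
        STRICT LANGUAGE sql
        AS 'SELECT CASE WHEN i > 0 THEN i ELSE NULL END';
    SELECT pos_or_null(-1);   -- NULL

A "really strict" function would additionally promise that the second case
cannot happen.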

I don't plan to work on this anytime soon, but I thought it's interesting
enough to put out there and see what others think.

Greetings,

Andres Freund

[1] 
http://archives.postgresql.org/message-id/20171206093717.vqdxe5icqttpxs3p%40alap3.anarazel.de



Documentation of pgcrypto AES key sizes

2018-01-25 Thread Thomas Munro
Hi,

I noticed that the documentation for encrypt()/decrypt() says "aes —
AES (Rijndael-128)", but in fact 192 and 256 bit keys are also
supported, whether you build --with-openssl or --without-openssl.
Should that say "AES (Rijndael-128, -192 or -256)" instead?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Removing WITH clause support in CREATE FUNCTION, for isCachable and isStrict

2018-01-25 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 15 Jan 2018, at 03:27, Michael Paquier  wrote:
>> As noticed by Daniel here:
>> https://www.postgresql.org/message-id/d5f34c9d-3ab7-4419-af2e-12f67581d...@yesql.se

> In that thread I proposed a patch to fix this, but I certainly don’t object to
> just removing it to make both syntax and code clearer. +1

I poked into the git log and confirmed Michael's statement that
CREATE FUNCTION ... WITH has been documented as deprecated since
7.3 (commit 94bdc4855 to be exact).

I think the original intention was that we'd keep it for PG-specific
function attributes (ie those not found in the SQL spec), so as to
avoid creating keywords for every attribute we thought of.  That
intention has been roundly ignored since then, though, since COST
and ROWS and LEAKPROOF and PARALLEL (UN)SAFE have all been implemented
as separate keywords not WITH options.  I'm dubious that that was a good
plan, but it's probably too late to change direction now.

The only other argument I can see for keeping it is that pre-7.3
dump files could contain CREATE FUNCTION commands with the old
spelling.  But, even assuming somebody is still running <= 7.2
and wants to migrate to >= 11 in one jump, they could use some
intermediate version of pg_dump to produce a dump with the modern
spelling, or just whip out their trusty text editor.  Maintaining
backwards compatibility is good, but 15 years seems like enough.

In short, I'm on board with removing the WITH clause.  I've not
reviewed the patch in detail, but will do so and push it if there's
not objections pretty soon.

regards, tom lane



Re: [HACKERS] GnuTLS support

2018-01-25 Thread Daniel Gustafsson
> On 25 Jan 2018, at 15:07, Peter Eisentraut  
> wrote:
> 
> On 1/19/18 13:43, Peter Eisentraut wrote:
>> Comparing the existing {be,fe}-secure-openssl.c with the proposed
>> {be,fe}-secure-gnutls.c, and with half an eye on the previously proposed
>> Apple Secure Transport implementation, I have identified a few more
>> areas of refactoring that should be done in order to avoid excessive
>> copy-and-pasting in the new implementations:
> 
> And here is another place that needs cleaning up, where the OpenSSL API
> was used directly.

+1 on these cleanups.

Regarding this hunk:

 extern int be_tls_get_cipher_bits(Port *port);
 extern bool be_tls_get_compression(Port *port);
-extern void be_tls_get_version(Port *port, char *ptr, size_t len);
-extern void be_tls_get_cipher(Port *port, char *ptr, size_t len);
+extern const char *be_tls_get_version(Port *port);
+extern const char *be_tls_get_cipher(Port *port);
 extern void be_tls_get_peerdn_name(Port *port, char *ptr, size_t len);

While only tangentially related to the issue this patch solves, converting
be_tls_get_peerdn_name() to return const char * seems reasonable too to keep
the API consistent.

cheers ./daniel


Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-01-25 Thread Daniel Gustafsson
> On 24 Jan 2018, at 16:45, Alvaro Herrera  wrote:
> 
> A quick suggestion from a passer-by -- would you put the new code in
> src/backend/storage/ipc/backend_signal.c instead?  Sounds like a better
> place than utils/misc/; and "signal" is more general in nature than just
> "cancel".  A bunch of stuff from utils/adt/misc.c (???) can migrate to
> the new file -- everything from line 201 to 362 at least, that is:
> 
> pg_signal_backend
> pg_cancel_backend
> pg_terminate_backend
> pg_reload_conf
> pg_rotate_logfile

+1, this makes a lot of sense.  When writing this I didn’t find a good fit
anywhere so I modelled it after utils/misc/backend_random.c, not because it was
a good fit but it was the least bad one I could come up with.  This seems a lot
cleaner.

> Maybe have two patches, 0001 creates the files moving the contents over,
> then 0002 adds your new stuff on top.

The two attached patches implements this.

> /me wonders if there's anything that would suggest to make this
> extensive to processes other than backends ...

Perhaps, do you have an API in mind?

cheers ./daniel



0001-Refactor-backend-signalling-code-v6.patch
Description: Binary data


0002-Support-optional-message-in-backend-cancel-terminate-v6.patch
Description: Binary data


Re: PATCH: pgbench - option to build using ppoll() for larger connection counts

2018-01-25 Thread Fabien COELHO


Hello Doug,


This time with the revised patch file:  pgbench11-ppoll-v8.patch


Patch applies cleanly. Compiles cleanly and runs fine in both ppoll & 
select cases.


I'm okay with having a preferred ppoll implementation because of its improved
capability.

A few minor additional comments/suggestions:

Cpp has an #elif that could be used to manage the ppoll/select alternative.
It is already used elsewhere in the file. Or not.

I must admit that I'm not fond of the alloc_socket_set trick with MAXCLIENTS,
especially without any comment. I'd suggest just having two distinct functions
in their corresponding sections.

I would add a comment that free_socket_set code is common to both 
versions, and maybe consider moving it afterwards. Or maybe just duplicate 
it in each section for homogeneity.


It looks like error_on_socket and ignore_socket should return a boolean instead
of an int. Also, maybe simplify the implementation of the latter by avoiding
the ?: expression.

ISTM that the error_on_socket function in the ppoll case deserves some 
comments, especially on the condition.




[...] Replaced USE_PPOLL with HAVE_PPOLL as having both seems redundant.


I'm okay with that. I'm wondering whether there should be a way to force 
using one or the other when both are available. Not sure.


--
Fabien.



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-25 Thread Tom Lane
Alvaro Herrera  writes:
> Well, the current implementation compares a dozen pg_dump output text
> files, three hundred lines apiece, against a thousand regexes (give
> or take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match " (or sometimes "matched when it
> should have not"), so looking for the mismatch is quite annoying.

Yeah, I've been getting my nose rubbed in that over the past couple
days of pg_dump hacking: when you get a failure, the output is really
unfriendly, even by the seriously low standards of the rest of our
TAP tests.

I'm not sure if Alvaro's idea is the best way to improve that, but
it's something to think about.

The thing that was annoying me though is that it seems like every
test case interacts with every other test case, or at least way more
of them than one could wish, due to the construction of that big
hash constant.  That's what I find unmaintainable.

regards, tom lane



Re: MCV lists for highly skewed distributions

2018-01-25 Thread Tom Lane
Dean Rasheed  writes:
> It occurs to me that maybe a better test to exclude a value from the
> MCV list would be to demand that its relative standard error not be
> too high. Such a test, in addition to the existing tests, might be
> sufficient to solve the opposite problem of too many values in the MCV
> list, because the real problem there is including a value after having
> seen relatively few occurrences of it in the sample, and thus having a
> wildly inaccurate estimate for it. Setting a bound on the relative
> standard error would mean that we could have a reasonable degree of
> confidence in estimates produced from the sample.

This patch is marked Ready for Committer, but that seems wildly optimistic
based on the state of the discussion.  It doesn't look to me like we
even have consensus on an algorithm, let alone code for it.  Certainly,
whatever we do needs to address the too-many-MCVs issue from [1] as
well as Jeff's too-few-MCVs case.

Even if we had a plausible patch, I'm not sure how we get to the
point of having enough consensus to commit.  In the previous thread,
it seemed that some people would object to any change whatsoever :-(

In any case, since it looks like the next step is for someone to come
up with a new proposal, I'm going to set this to Waiting on Author.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/32261.1496611829%40sss.pgh.pa.us



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-25 Thread Alvaro Herrera
Stephen Frost wrote:
> Alvaro, Tom,
> 
> * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:

> > Crazy idea: maybe a large fraction of that test could be replaced with
> > comparisons of the "pg_restore -l" output file rather than pg_dump's
> > text output (i.e. the TOC entry for each object, rather than the
> > object's textual representation.)  Sounds easier in code than current
> > implementation.  Separately, verify that textual representation for each
> > TOC entry type is what we expect.
> 
> I'm not sure how that's different..?  We do check that the textual
> representation is what we expect, both directly from pg_dump and from
> pg_restore output, and using the exact same code to check both (which I
> generally think is a good thing since we want the results from both to
> more-or-less match up).  What you're proposing here sounds like we're
> throwing that away in favor of keeping all the same code to test the
> textual representation but then only checking for listed contents from
> pg_restore instead of checking that the textual representation is
> correct.  That doesn't actually reduce the amount of code though..

Well, the current implementation compares a dozen pg_dump output text
files, three hundred lines apiece, against a thousand regexes (give
or take).  Whenever there is a mismatch, what you get is "this regexp
failed to match " (or sometimes "matched when it
should have not"), so looking for the mismatch is quite annoying.

My proposal is that instead of looking at three hundred lines, you'd
look for 50 lines of `pg_restore -l` output -- is element XYZ in there
or not.  Quite a bit simpler for the guy adding a new test.  This tests
combinations of pg_dump switches: are we dumping the right set of
objects.

*Separately* test that each individual TOC entry type ("table data",
"index", "tablespace") is dumped as this or that SQL command, where you
create a separate dump file for each object type.  So you match a single
TOC entry to a dozen lines of SQL, half of them comments -- pretty easy
to see where a mismatch is.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Invalid result from hash_page_items function

2018-01-25 Thread Masahiko Sawada
Hi,

While researching hash index, I found that hash_page_items could
return an invalid result as follows.

postgres(1:1056)=# create table hash_test (c int);
CREATE TABLE
postgres(1:1056)=# insert into hash_test select generate_series(1,50) % 5;
INSERT 0 50
postgres(1:1056)=# create index hash_idx on hash_test using hash (c);
CREATE INDEX
postgres(1:1056)=# create extension pageinspect ;
CREATE EXTENSION
postgres(1:1056)=# select * from hash_page_items(get_raw_page('hash_idx', 4));
 itemoffset |        ctid        |    data
------------+--------------------+------------
          1 | (0,49)             | 3283889963
          2 | (2139062143,32639) | 2139062143
          3 | (2139062143,32639) | 2139062143
          4 | (2139062143,32639) | 2139062143
          5 | (2139062143,32639) | 2139062143
          6 | (2139062143,32639) | 2139062143
          7 | (2139062143,32639) | 2139062143
          8 | (2139062143,32639) | 2139062143
          9 | (2139062143,32639) | 2139062143
         10 | (2139062143,32639) | 2139062143
(10 rows)

This appears in PostgreSQL 10 and current HEAD. The cause seems to be
that hash_page_items allocates the memory space for the page before
switching memory contexts. AFAICS there is no similar problem elsewhere
in the pageinspect contrib module. The attached patch fixes it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_hash_page_items.patch
Description: Binary data


Re: After dropping the rule - Not able to insert / server crash (one time ONLY)

2018-01-25 Thread Bruce Momjian

Can someone review this thread and determine if this patch is needed? 
Thanks.

---

On Fri, Dec 22, 2017 at 04:58:47PM +0530, Dilip Kumar wrote:
> On Mon, Dec 11, 2017 at 4:33 PM, Dilip Kumar  wrote:
> 
> On Mon, Dec 11, 2017 at 3:54 PM, tushar 
> wrote:
> 
> Hi,
> 
> While testing something, I found that even after the rule has been dropped I am not
> able to insert data, and in another scenario there is a crash.
> 
> Please refer to these scenarios -
> 
> 1) Rows not inserted after dropping the RULE
> 
> postgres=# create table e(n int);
> CREATE TABLE
> postgres=# create rule e1 as on insert to e do instead nothing;
> CREATE RULE
> postgres=# create function e11(n int) returns int as $$ begin insert
> into e values(10); return 1; end; $$ language 'plpgsql';
> CREATE FUNCTION
> postgres=# select e11(1);
>  e11
> -
>    1
> (1 row)
> 
> postgres=# select e11(1);
>  e11
> -
>    1
> (1 row)
> 
> postgres=# select * from e;  => Expected , due to  the RULE enforced
>  n
> ---
> (0 rows)
> 
> 
> postgres=# drop rule e1 on e;  ==>Drop the rule
> DROP RULE
> 
> postgres=# select e11(1);  =>again try to insert into table
>  e11
> -
>    1
> (1 row)
> 
> postgres=# select * from e;  =>This time , value should be inserted 
> but
> still showing 0 row.
>  n
> ---
> (0 rows)
> 
> 
> I think this is because we are not invalidating the plan cache if the rule is
> dropped.  You can reproduce the same with prepared statements.
>  
> 
> 2)Server crash (one time)
> 
> postgres=# create table en(n int);
> CREATE TABLE
> postgres=# create function en1(n int) returns int as $$ begin insert
> into en values(10); return 1; end; $$ language 'plpgsql';
> CREATE FUNCTION
> postgres=#
> postgres=# select en1(1);
>  en1
> -
>    1
> (1 row)
> 
> postgres=# select * from en;
>  n
> 
>  10
> (1 row)
> 
> postgres=# create rule en11 as on insert to en do instead nothing;
> CREATE RULE
> postgres=# select en1(1);
> postgres=# select en1(1);
> TRAP: FailedAssertion("!(!stmt->mod_stmt)", File: "pl_exec.c", Line:
> 3721)
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> 
> 
> I have looked into this crash; it seems like the assert below in
> exec_stmt_execsql is not correct
> 
> case SPI_OK_REWRITTEN:
> 
> Assert(!stmt->mod_stmt);
>
> In this issue, during the first execution of the select en1(1); statement,
> stmt->mod_stmt is set to true for the insert query inside the function. Then,
> before the next execution, we created the rule "create rule en11 as on
> insert to en do instead nothing;".  Because of this, nothing will be
> executed and the output will be SPI_OK_REWRITTEN.  But we are asserting that
> stmt->mod_stmt should be false (which was set to true in the first
> execution).
> 
> 
> IMHO if the query is rewritten, then this assert is not valid. I have attached
> a patch which removes this assert.
> 
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com

> diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
> index dd575e7..9b85df8 100644
> --- a/src/pl/plpgsql/src/pl_exec.c
> +++ b/src/pl/plpgsql/src/pl_exec.c
> @@ -3727,8 +3727,6 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
>   break;
>  
>   case SPI_OK_REWRITTEN:
> - Assert(!stmt->mod_stmt);
> -
>   /*
>* The command was rewritten into another kind of 
> command. It's
>* not clear what FOUND would mean in that case (and 
> SPI doesn't


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: reducing isolation tests runtime

2018-01-25 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
> > can be further reduced, but not by more than a second or two unless you
> > get in the business of modifying other tests.  (I only modified
> > deadlock-soft-2 because it saves 5 seconds).
> 
> Looks reasonable to me, but do we want to set any particular convention
> about the max number of tests to run in parallel?  If so there should
> be at least a comment saying what.

Hmm, I ran this in a limited number of connections and found that it
fails with less than 27; and there's no MAX_CONNECTIONS like there is
for pg_regress.  So I'll put this back on the drawing board until I'm
back from vacations ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-01-25 Thread Thomas Munro
On Fri, Jan 26, 2018 at 9:38 AM, Claudio Freire  wrote:
> I had the tests running in a loop all day long, and I cannot reproduce
> that variance.
>
> Can you share your steps to reproduce it, including configure flags?

Here are two build logs where it failed:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/332968819
https://travis-ci.org/postgresql-cfbot/postgresql/builds/332592511

Here's one where it succeeded:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/333139855

The full build script used is:

./configure --enable-debug --enable-cassert --enable-coverage
--enable-tap-tests --with-tcl --with-python --with-perl --with-ldap
--with-icu && make -j4 all contrib docs && make -Otarget -j3
check-world

This is a virtualised 4 core system.  I wonder if "make -Otarget -j3
check-world" creates enough load on it to produce some weird timing
effect that you don't see on your development system.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: reducing isolation tests runtime

2018-01-25 Thread Tom Lane
Alvaro Herrera  writes:
> Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
> can be further reduced, but not by more than a second or two unless you
> get in the business of modifying other tests.  (I only modified
> deadlock-soft-2 because it saves 5 seconds).

Looks reasonable to me, but do we want to set any particular convention
about the max number of tests to run in parallel?  If so there should
be at least a comment saying what.

> Admittedly the new isolation_schedule file is a bit ugly.

Meh, seems fine.

We won't know if this really works till it hits the buildfarm,
I suppose.

regards, tom lane



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-25 Thread Stephen Frost
Alvaro, Tom,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Tom Lane wrote:
> 
> > The changes in t/002_pg_dump.pl largely failed to apply, which is
> > partially due to the age of the patch but IMO speaks more to the
> > unmaintainability of that TAP test.  It still didn't run after I'd
> > manually fixed the merge failures, so I gave up in disgust and
> > did not push any of the test changes.  If someone is excited enough
> > about testing this, they can submit a followon patch for that,
> > but I judge it not really worth either the human effort or the
> > future testing cycles.
> > 
> > (Am I right in thinking that 002_pg_dump.pl is, by design, roughly
> > O(N^2) in the number of features tested?  Ick.)

It certainly tries to cover a lot of combinations, but I did try to
write it so that it doesn't completely cross everything against everything, and
instead tries to cover the different code paths.

> Yeah, that 002 test is pretty nasty stuff.  I think we only put up with
> it because it's the only idea we've come up with; maybe there are better
> ideas.

I've discussed the 'better idea' for it previously, which is to
essentially break out the various parts of the big perl hash into
multiple independent files that would be a lot easier to work with and
manage, and perhaps make them not be perl hashes but something else.  In
other words, make the core perl code similar to pg_regress and the
input/output files likewise similar to how pg_regress works.  Maybe we
should rework the whole thing to work with pg_regress directly but we'd
need to teach it about regexp's, I'd think..

> Crazy idea: maybe a large fraction of that test could be replaced with
> comparisons of the "pg_restore -l" output file rather than pg_dump's
> text output (i.e. the TOC entry for each object, rather than the
> object's textual representation.)  Sounds easier in code than current
> implementation.  Separately, verify that textual representation for each
> TOC entry type is what we expect.

I'm not sure how that's different..?  We do check that the textual
representation is what we expect, both directly from pg_dump and from
pg_restore output, and using the exact same code to check both (which I
generally think is a good thing since we want the results from both to
more-or-less match up).  What you're proposing here sounds like we're
throwing that away in favor of keeping all the same code to test the
textual representation but then only checking for listed contents from
pg_restore instead of checking that the textual representation is
correct.  That doesn't actually reduce the amount of code though..

Thanks!

Stephen




Re: proposal: alternative psql commands quit and exit

2018-01-25 Thread Bruce Momjian
On Mon, Jan 15, 2018 at 11:10:44AM -0500, Robert Haas wrote:
> On Mon, Jan 15, 2018 at 10:57 AM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> I've discovered one thing about this design that is not so good, which
> >> is that if you open a single, double, or dollar quote, then the
> >> instructions that are provided under that design do not work:
> >
> > I was kind of imagining that we could make the hint text vary depending
> > on the parsing state.  Maybe that's too hard to get to --- but if the
> > prompt-creating code knows what it is, perhaps this can too.
> 
> prompt_status does seem to be available in psql's MainLoop(), so I
> think that could be done, but the problem is that I don't know exactly
> what message would be useful to print when we're in a "in quotes"
> state.  If I had a good message text for that state, I might just
> choose to use it always rather than multiplying the number of
> messages.
> 
> More broadly, I think what is needed here is less C-fu than
> English-fu.  If we come up with something good, we can make it print
> that thing.

I just read this thread and have some ideas.  First, the reason 'help'
was added so easily is because it pointed users at getting more
information, and it was required to be the first command in the query
buffer.

I realize we are now considering allowing 'help', 'quit', and 'exit' to
appear alone on a line with whitespace before it, and remove the
requirement that it be the first thing in the query buffer.

I think there are a few things to consider.

First, allowing whitespace before the keyword --- do we really
think that a line containing only whitespace and exit is more likely to have
been typed by a user trying to exit than to be part of an SQL query?  I think
requiring no space around the keyword would reduce the number of false hints.

Second, I am thinking we can check prompt_status and report "Use \q" if
we are not in a quoted string, and suggest "Use Control-D then \q" (or
"Use Control-C then \q" on Windows) and be done with it.  By printing
the Control-D only when we are in a quoted string, we minimize the
number of times that we are wrong due to stty changes.
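
To make that concrete, a rough sketch of what such a check might look like in
psql's MainLoop() -- purely illustrative: line_is_only() is a made-up helper
that tests for a line holding nothing but the keyword, and the message texts
are placeholders; only the promptStatus_t values are the existing ones:

	if (line_is_only(line, "quit") || line_is_only(line, "exit"))
	{
		if (prompt_status == PROMPT_SINGLEQUOTE ||
			prompt_status == PROMPT_DOUBLEQUOTE ||
			prompt_status == PROMPT_DOLLARQUOTE)
		{
			/* inside a quoted string, so \q alone would not work */
#ifndef WIN32
			puts(_("Use control-D, then \\q, to quit."));
#else
			puts(_("Use control-C, then \\q, to quit."));
#endif
		}
		else
			puts(_("Use \\q to quit."));
	}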

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-25 Thread Alvaro Herrera
Tom Lane wrote:

> The changes in t/002_pg_dump.pl largely failed to apply, which is
> partially due to the age of the patch but IMO speaks more to the
> unmaintainability of that TAP test.  It still didn't run after I'd
> manually fixed the merge failures, so I gave up in disgust and
> did not push any of the test changes.  If someone is excited enough
> about testing this, they can submit a followon patch for that,
> but I judge it not really worth either the human effort or the
> future testing cycles.
> 
> (Am I right in thinking that 002_pg_dump.pl is, by design, roughly
> O(N^2) in the number of features tested?  Ick.)

Yeah, that 002 test is pretty nasty stuff.  I think we only put up with
it because it's the only idea we've come up with; maybe there are better
ideas.

Crazy idea: maybe a large fraction of that test could be replaced with
comparisons of the "pg_restore -l" output file rather than pg_dump's
text output (i.e. the TOC entry for each object, rather than the
object's textual representation.)  Sounds easier in code than current
implementation.  Separately, verify that textual representation for each
TOC entry type is what we expect.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-01-25 Thread Claudio Freire
On Thu, Jan 25, 2018 at 10:56 AM, Claudio Freire  wrote:
> On Thu, Jan 25, 2018 at 4:11 AM, Thomas Munro
>  wrote:
>> On Thu, Jan 18, 2018 at 9:17 AM, Claudio Freire  
>> wrote:
>>> Huh. That was simpler than I thought.
>>>
>>> Attached rebased versions.
>>
>> Hi Claudio,
>>
>> FYI the regression test seems to have some run-to-run variation.
>> Though it usually succeeds, recently I have seen a couple of failures
>> like this:
>>
>> = Contents of ./src/test/regress/regression.diffs
>> *** 
>> /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/expected/vacuum.out
>> 2018-01-24 01:41:28.200454371 +
>> --- 
>> /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/vacuum.out
>> 2018-01-24 01:51:07.970049937 +
>> ***
>> *** 128,134 
>>   SELECT pg_relation_size('vactst', 'main');
>>pg_relation_size
>>   --
>> ! 0
>>   (1 row)
>>
>>   SELECT count(*) FROM vactst;
>> --- 128,134 
>>   SELECT pg_relation_size('vactst', 'main');
>>pg_relation_size
>>   --
>> !  8192
>>   (1 row)
>>
>>   SELECT count(*) FROM vactst;
>> ==
>>
>> --
>> Thomas Munro
>> http://www.enterprisedb.com
>
> I'll look into it

I had the tests running in a loop all day long, and I cannot reproduce
that variance.

Can you share your steps to reproduce it, including configure flags?



Re: reducing isolation tests runtime

2018-01-25 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:

> > I think we could solve this by putting in the same parallel group only
> > slow tests that mostly sleep, i.e. nothing that would monopolize CPU for
> > long enough to cause a problem.  Concretely:
> > test: timeouts tuplelock-update deadlock-hard deadlock-soft-2
> 
> OK, but there'd better be a comment there explaining the concern
> very precisely, or somebody will break it.

Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
can be further reduced, but not by more than a second or two unless you
get in the business of modifying other tests.  (I only modified
deadlock-soft-2 because it saves 5 seconds).

Admittedly the new isolation_schedule file is a bit ugly.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From f80a2fd9e3320d033017e17a648ac1ca13396ef0 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 25 Jan 2018 17:11:03 -0300
Subject: [PATCH] Parallelize isolation tests a bit

It is possible to run isolation tests in parallel, which saves quite a
bit of runtime (from 76 to 45 seconds in my machine).

Test 'timeout' has strict timing requirements, which may fail in animals
that are slow (either because of running under valgrind, doing cache
clobbering or expensive memory checks, or hardware is just slow), so
only put it in the same group with other tests that mostly sleep, to
avoid disruption.

Discussion: https://postgr.es/m/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql
---
 src/test/isolation/expected/deadlock-soft-2.out | 12 ++--
 src/test/isolation/isolation_schedule   | 84 +
 src/test/isolation/specs/deadlock-soft-2.spec   | 22 +++
 3 files changed, 33 insertions(+), 85 deletions(-)

diff --git a/src/test/isolation/expected/deadlock-soft-2.out 
b/src/test/isolation/expected/deadlock-soft-2.out
index 14b0343ba4..5c33a5cdaa 100644
--- a/src/test/isolation/expected/deadlock-soft-2.out
+++ b/src/test/isolation/expected/deadlock-soft-2.out
@@ -1,12 +1,12 @@
 Parsed test spec with 4 sessions
 
 starting permutation: s1a s2a s2b s3a s4a s1b s1c s2c s3c s4c
-step s1a: LOCK TABLE a1 IN SHARE UPDATE EXCLUSIVE MODE;
-step s2a: LOCK TABLE a2 IN ACCESS SHARE MODE;
-step s2b: LOCK TABLE a1 IN SHARE UPDATE EXCLUSIVE MODE; 
-step s3a: LOCK TABLE a2 IN ACCESS EXCLUSIVE MODE; 
-step s4a: LOCK TABLE a2 IN ACCESS EXCLUSIVE MODE; 
-step s1b: LOCK TABLE a2 IN SHARE UPDATE EXCLUSIVE MODE; 
+step s1a: LOCK TABLE aa1 IN SHARE UPDATE EXCLUSIVE MODE;
+step s2a: LOCK TABLE aa2 IN ACCESS SHARE MODE;
+step s2b: LOCK TABLE aa1 IN SHARE UPDATE EXCLUSIVE MODE; 
+step s3a: LOCK TABLE aa2 IN ACCESS EXCLUSIVE MODE; 
+step s4a: LOCK TABLE aa2 IN ACCESS EXCLUSIVE MODE; 
+step s1b: LOCK TABLE aa2 IN SHARE UPDATE EXCLUSIVE MODE; 
 step s1b: <... completed>
 step s1c: COMMIT;
 step s2b: <... completed>
diff --git a/src/test/isolation/isolation_schedule 
b/src/test/isolation/isolation_schedule
index 74d7d59546..0df00c6a53 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -1,68 +1,16 @@
-test: read-only-anomaly
-test: read-only-anomaly-2
-test: read-only-anomaly-3
-test: read-write-unique
-test: read-write-unique-2
-test: read-write-unique-3
-test: read-write-unique-4
-test: simple-write-skew
-test: receipt-report
-test: temporal-range-integrity
-test: project-manager
-test: classroom-scheduling
-test: total-cash
-test: referential-integrity
-test: ri-trigger
-test: partial-index
-test: two-ids
-test: multiple-row-versions
-test: index-only-scan
-test: deadlock-simple
-test: deadlock-hard
-test: deadlock-soft
-test: deadlock-soft-2
-test: fk-contention
-test: fk-deadlock
-test: fk-deadlock2
-test: eval-plan-qual
-test: lock-update-delete
-test: lock-update-traversal
-test: insert-conflict-do-nothing
-test: insert-conflict-do-nothing-2
-test: insert-conflict-do-update
-test: insert-conflict-do-update-2
-test: insert-conflict-do-update-3
-test: insert-conflict-toast
-test: delete-abort-savept
-test: delete-abort-savept-2
-test: aborted-keyrevoke
-test: multixact-no-deadlock
-test: multixact-no-forget
-test: lock-committed-update
-test: lock-committed-keyupdate
-test: update-locked-tuple
-test: propagate-lock-delete
-test: tuplelock-conflict
-test: tuplelock-update
-test: freeze-the-dead
-test: nowait
-test: nowait-2
-test: nowait-3
-test: nowait-4
-test: nowait-5
-test: skip-locked
-test: skip-locked-2
-test: skip-locked-3
-test: skip-locked-4
-test: drop-index-concurrently-1
-test: multiple-cic
-test: alter-table-1
-test: alter-table-2
-test: alter-table-3
-test: alter-table-4
-test: create-trigger
-test: sequence-ddl
-test: async-notify
-test: vacuum-reltuples
-test: timeouts
-test: vacuum-concurrent-drop
+# In the first group of parallel tests, test "timeout" might be susceptible to
+# additional load placed on the machine, so avoid anything in this group that
+# does much more other than sleeping.

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-25 Thread Tom Lane
Robins Tharakan  writes:
> Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall.

I've reviewed and pushed this, after making some changes so that the
switch works in pg_restore as well, and minor cosmetic adjustments.

The changes in t/002_pg_dump.pl largely failed to apply, which is
partially due to the age of the patch but IMO speaks more to the
unmaintainability of that TAP test.  It still didn't run after I'd
manually fixed the merge failures, so I gave up in disgust and
did not push any of the test changes.  If someone is excited enough
about testing this, they can submit a followon patch for that,
but I judge it not really worth either the human effort or the
future testing cycles.

(Am I right in thinking that 002_pg_dump.pl is, by design, roughly
O(N^2) in the number of features tested?  Ick.)

regards, tom lane



Re: PATCH: Exclude unlogged tables from base backups

2018-01-25 Thread David Steele
On 1/25/18 12:31 AM, Masahiko Sawada wrote:
> On Thu, Jan 25, 2018 at 3:25 AM, David Steele  wrote:
>>>
>>> Here is the first review comments.
>>>
>>> +   unloggedDelim = strrchr(path, '/');
>>>
>>> I think it doesn't work fine on windows. How about using
>>> last_dir_separator() instead?
>>
>> I think this function is OK on Windows -- we use it quite a bit.
>> However, last_dir_separator() is clearer so I have changed it.
> 
> Thank you for updating this. I was concerned about a separator
> character '/' might not work fine on windows.

Ah yes, I see what you mean now.

> On Thu, Jan 25, 2018 at 6:23 AM, David Steele  wrote:
>> On 1/24/18 4:02 PM, Adam Brightwell wrote:
>> Actually, I was talking to Stephen about this and it seems like #3 would be
>> more practical if we just stat'd the init fork for each relation file
>> found.  I doubt the stat would add a lot of overhead and we can track
>> each unlogged relation in a hash table to reduce overhead even more.
>>
> 
> Can readdir handle files that are added during the loop? I think
> that we still cannot exclude a new unlogged relation if the relation
> is added after we execute readdir the first time. To completely eliminate
> that we would need a sort of lock that prevents backends from creating new
> unlogged relations. Or we would need to do the readdir loop multiple
> times to see whether new relations were added while sending the files.

As far as I know, readdir() is platform-dependent in terms of how it
scans the dir and whether files created after the opendir() will appear.

It shouldn't matter, though, since WAL replay will recreate those files.

> If you're updating the patch to implement #3, this patch should be
> marked as "Waiting on Author". After updated I'll review it again.
Attached is a new patch that uses stat() to determine if the init fork
for a relation file exists.  I decided not to build a hash table as it
could use considerable memory and I didn't think it would be much faster
than a simple stat() call.
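
For reference, the per-file check boils down to something like this sketch
(illustrative only -- the real patch first parses the file name to strip fork
and segment suffixes before building the init-fork path):

	#include <sys/stat.h>

	static bool
	has_init_fork(const char *dbpath, const char *relfilenode)
	{
		char		initpath[MAXPGPATH];
		struct stat statbuf;

		snprintf(initpath, sizeof(initpath), "%s/%s_init", dbpath, relfilenode);

		/* an existing init fork means the relation is unlogged: skip its data */
		return stat(initpath, &statbuf) == 0;
	}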

The reinit.c refactor has been removed since it was no longer needed.
I'll submit the tests I wrote for reinit.c as a separate patch for the
next CF.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4c5ed1e6d6..5854ec1533 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2552,6 +2552,12 @@ The commands accepted in walsender mode are:
  with pgsql_tmp.
 

+   
+
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+
+   

 
  pg_wal, including subdirectories. If the backup 
is run
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index dd7ad64862..688790ad0d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
 #include "nodes/pg_list.h"
 #include "pgtar.h"
 #include "pgstat.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
@@ -33,6 +34,7 @@
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/reinit.h"
 #include "utils/builtins.h"
 #include "utils/elog.h"
 #include "utils/ps_status.h"
@@ -959,12 +961,44 @@ sendDir(const char *path, int basepathlen, bool sizeonly, 
List *tablespaces,
charpathbuf[MAXPGPATH * 2];
struct stat statbuf;
int64   size = 0;
+   const char  *lastDir;   /* Split last dir from 
parent path. */
+   boolisDbDir = false;/* Does this directory contain 
relations? */
+
+   /*
+* Determine if the current path is a database directory that can
+* contain relations.
+*
+* Start by finding the location of the delimiter between the parent
+* path and the current path.
+*/
+   lastDir = last_dir_separator(path);
+
+   /* Does this path look like a database path (i.e. all digits)? */
+   if (lastDir != NULL &&
+   strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1))
+   {
+   /* Part of path that contains the parent directory. */
+   int parentPathLen = lastDir - path;
+
+   /*
+* Mark path as a database directory if the parent path is 
either
+* $PGDATA/base or a tablespace version path.
+*/
+   if (strncmp(path, "./base", parentPathLen) == 0 ||
+   (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) 
- 1) &&
+strncmp(lastDir - 
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+TABLESPACE_VERSION_DIRECTORY,
+sizeof(TABLESPACE_VERSION_DIRECTORY) - 
1) == 0))
+  

Re: [HACKERS] Proposal: Local indexes for partitioned table

2018-01-25 Thread Alvaro Herrera
Alvaro Herrera wrote:
> I think this is the right fix for this problem.  I wonder about
> exploring other callers of RelationGetIndexList to see who else could be
> confused ...

CLUSTER was also affected (and ALTER TABLE .. CLUSTER ON).  Patched both
and added simple tests.  Couldn't detect any other immediate problems.
Maybe UNIQUE indexes will have a similar hazard, with replica identity.

Case closed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: JIT compiling with LLVM v9.0

2018-01-25 Thread Andres Freund
Hi,

On 2018-01-25 18:40:53 +0300, Konstantin Knizhnik wrote:
> As far as I understand generation of native code is now always done for all
> supported expressions and individually by each backend.

Mostly, yes. It's not "always" done, because there are cost-based checks on
whether to do so or not.


> I wonder whether it would be useful to put more effort into understanding when
> compilation to native code should be done and when interpretation is better.
> For example, many JIT-able languages like Lua use traces, i.e. the query is
> first interpreted and a trace is generated. If the same trace is followed
> more than N times, then native code is generated for it.

Right. That's where I actually had started out, but my experimentation
showed that that's not that interesting a path to pursue. Emitting code
in much smaller increments (as you'd do for individual expressions)
has considerable overhead. We also have a planner that allows us to make
reasonable guesses about when to JIT and when not - something not available in
many other languages.

That said, nothing in the infrastructure would prevent you from pursuing
that, it'd just be a wrapper function for the generated exprs that
tracks invocations.
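
As a rough illustration of that wrapper idea (a sketch only: ExprState has no
exec_count or interp_func fields, and jit_invocation_threshold and
llvm_compile_expr() are names invented for this example):

	static Datum
	ExecCountingExpr(ExprState *state, ExprContext *econtext, bool *isnull)
	{
		/* keep interpreting until the expression has proven to be hot */
		if (++state->exec_count < jit_invocation_threshold)
			return state->interp_func(state, econtext, isnull);

		/* hot enough: emit native code and use it from now on */
		state->evalfunc = llvm_compile_expr(state);
		return state->evalfunc(state, econtext, isnull);
	}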


> Another question is whether it is sensible to redundantly do expensive work
> (llvm compilation) in all backends.

Right now we kinda have to, but I really want to get rid of
that. There are some pointers included as constants in the generated
code. I plan to work on getting rid of that requirement, but after
getting the basics in (i.e. realistically not this release).  Even after
that I'm personally much more interested in caching the generated code
inside a backend, rather than across backends.  Function addresses et
al. being different between backends would add some complications that can be
overcome, but I'm doubtful it's immediately worth it.


> So before starting code generation, ExecReadyCompiledExpr can first
> build signature and check if correspondent library is already present.
> Also it will be easier to control space used by compiled libraries in
> this

Right, I definitely think we want to do that at some point not too far
away in the future. That makes the applicability of JITing much broader.

More advanced forms of this are that you JIT in the background for
frequently executed code (so as not to incur the latency the first time
somebody executes it). And/or you emit unoptimized code the first
time through, which is quite quick, and run the optimizer after the
query has been executed a number of times.

Greetings,

Andres Freund



Re: JIT compiling with LLVM v9.0

2018-01-25 Thread Andres Freund
Hi,

On 2018-01-25 10:00:14 +0100, Pierre Ducroquet wrote:
> I don't know when this would be released,

August-October range.


> but the minimal supported LLVM 
> version will have a strong influence on the availability of that feature. If 
> today this JIT compiling was released with only LLVM 5/6 support, it would be 
> unusable for most Debian users (llvm-5 is only available in sid). Even llvm 4 
> is not available in latest stable.
> I'm already trying to build with llvm-4 and I'm going to try further with 
> llvm 
> 3.9 (Debian Stretch doesn't have a more recent than this one, and I won't 
> have 
> something better to play with my data), I'll keep you informed. For sport, I 
> may also try llvm 3.5 (for Debian Jessie).

I don't think it's unreasonable to not support super old llvm
versions. This is a complex feature, and will take some time to
mature. Supporting too many LLVM versions at the outset will have some
cost.  Versions before 3.8 would require supporting mcjit rather than
orc, and I don't think that'd be worth doing.  I think 3.9 might be a
reasonable baseline...

Greetings,

Andres Freund



Re: Possible performance regression with pg_dump of a large number of relations

2018-01-25 Thread Stephen Frost
Luke,

* Luke Cowell (lcow...@gmail.com) wrote:
> > On Jan 24, 2018, at 2:56 PM, Stephen Frost  wrote:
> >>> ERROR:  relation "pg_init_privs" does not exist
> >>> LINE 139: LEFT JOIN pg_init_privs pip
> > 
> > I certainly hope that works on 9.6, since that's when pg_init_privs was
> > added..
> 
> My mistake. That error is from my 9.5 server. It does error on 9.6, but I get 
> the following error:

No worries!

> $ psql --version
> psql (PostgreSQL) 9.6.6
> $ psql postgres < query.sql
> ERROR:  function pg_get_partkeydef(oid) does not exist
> LINE 126: pg_get_partkeydef(c.oid) AS partkeydef,
>   ^
> HINT:  No function matches the given name and argument types. You might need 
> to add explicit type casts.

Ah, yes, that makes sense, that function wasn't included in 9.6.  That's
not really a problem though, pg_dump knows how to talk to different
versions of PG, it just happens that the query I sent was the one for
HEAD and not for older versions.

> > Presuming I can make it work, the idea would be to back-port it to 9.6
> > and 10, since pg_init_privs and this code was added in 9.6.
> 
> A 9.6 backport would be excellent.

Yup, I'll make sure that the query that's run for 9.6 server systems
works on, well, 9.6 systems. ;)

Thanks!

Stephen




Re: plpgsql function startup-time improvements

2018-01-25 Thread Pavel Stehule
Hi

2018-01-25 0:16 GMT+01:00 Tom Lane :

> Pavel Stehule  writes:
> > please, can you rebase all three patches necessary for patching?
>
> Done.  These now need to be applied over
> https://www.postgresql.org/message-id/833.1516834...@sss.pgh.pa.us


Thank you

I checked it

1. there are no problems with patching
2. all tests passed
3. I can confirm that the best case speedup is about 15-20%. The worst case is
hard to test - but these changes should not be slower than current master.
4. faster-plpgsql-datum-setup-2.patch is a simple patch with a few lines of new
code
5. plpgsql-promises-2.patch is a little bit more complex, but still simple
6. the code is well formatted and well commented
7. there are no new tests, which is not a problem - these patches
don't add any new features

I'll mark these patches as ready for committer

Regards

Pavel



>
>
> regards, tom lane
>
>


Re: CONSTANT/NOT NULL/initializer properties for plpgsql record variables

2018-01-25 Thread Daniel Gustafsson
> On 25 Jan 2018, at 16:34, Tom Lane  wrote:
> Daniel Gustafsson  writes:

>> One tiny thing: while not introduced in this patch, I wonder if it would be
>> worth adding an errhint in the following hunk when applied to arrays, to
>> clarify what CONSTANT in an array declaration mean.  I have seen confusion
>> around what y means in ‘x int[y]’, and I can see ‘x CONSTANT int[y]’ being
>> misinterpreted as “length fixed to y”.
> 
> Hmm.  What would you imagine the hint saying?  It's certainly true that
> a lot of people don't understand that a declared array length doesn't
> mean anything in Postgres, but that's not specific to this context.

I was thinking about something along the lines of:

  errhint("In array declarations, CONSTANT refers to elements and not length."),

.. when datatype->typisarray is true, but again it might just be waffling that
doesn’t help anyone.

cheers ./daniel


Re: Possible performance regression with pg_dump of a large number of relations

2018-01-25 Thread Luke Cowell


> On Jan 24, 2018, at 2:56 PM, Stephen Frost  wrote:
> 
> Hi there!
> 
> 
>>> ERROR:  relation "pg_init_privs" does not exist
>>> LINE 139: LEFT JOIN pg_init_privs pip
> 
> I certainly hope that works on 9.6, since that's when pg_init_privs was
> added..

My mistake. That error is from my 9.5 server. It does error on 9.6, but I get 
the following error:

$ psql --version
psql (PostgreSQL) 9.6.6
$ psql postgres < query.sql
ERROR:  function pg_get_partkeydef(oid) does not exist
LINE 126: pg_get_partkeydef(c.oid) AS partkeydef,
  ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.

> Presuming I can make it work, the idea would be to back-port it to 9.6
> and 10, since pg_init_privs and this code was added in 9.6.

A 9.6 backport would be excellent.

Thanks again!

Luke


Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-25 Thread Nikolay Shaplov
In a message of 25 January 2018 11:29:34, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Well, maybe the right answer is to address that.  It's clear to me
> >> why that would happen if we store these things as reloptions on the
> >> toast table, but can't they be stored on the parent table?
> > 
> > Actually, Nikolay provided a possible solution: if you execute ALTER
> > TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create
> > one at that point.
> 
> That adds a lot of overhead if you never actually need the toast table.
I think this overhead is not that terrible if it is properly warned about ;-)

> Still, maybe it's an appropriate amount of effort compared to the size
> of the use-case for this.
If you come to some final conclusion, please close the commitfest item with
a "Returned with feedback" resolution, and I will write another patch...

-- 
Do code for fun.



Re: [HACKERS] [POC] Faster processing at Gather node

2018-01-25 Thread Robert Haas
On Tue, Jan 9, 2018 at 7:09 PM, Andres Freund  wrote:
>> + * mq_sender and mq_bytes_written can only be changed by the sender.
>> + * mq_receiver and mq_sender are protected by mq_mutex, although, 
>> importantly,
>> + * they cannot change once set, and thus may be read without a lock once 
>> this
>> + * is known to be the case.
>
> I don't recall our conversation around this anymore, and haven't read
> down far enough to see the relevant code. Lest I forget: Such constructs
> often need careful use of barriers.

I think the only thing the code assumes here is that if we previously
read the value with the spinlock and didn't get NULL, we can later
read the value without the spinlock and count on seeing the same value
we saw previously.  I think that's safe enough.

> s/should/is/ or similar?

I prefer it the way that I have it.

> Perhaps a short benchmark for 32bit systems using shm_mq wouldn't hurt?
> I suspect there won't be much of a performance impact, but it's probably
> worth checking.

I don't think I understand your concern here.  If this is used on a
system where we're emulating atomics and barriers in painful ways, it
might hurt performance, but I think we have a policy of not caring.

Also, I don't know where I'd find a 32-bit system to test.

>> - * Importantly, mq_ring can be safely read and written without a lock.  Were
>> - * this not the case, we'd have to hold the spinlock for much longer
>> - * intervals, and performance might suffer.  Fortunately, that's not
>> - * necessary.  At any given time, the difference between mq_bytes_read and
>> + * At any given time, the difference between mq_bytes_read and
>
> Hm, why did you remove the first part about mq_ring itself?

Bad editing.  Restored.

>> @@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, 
>> const void *data,
>>
>>   while (sent < nbytes)
>>   {
>> - booldetached;
>>   uint64  rb;
>> + uint64  wb;
>>
>>   /* Compute number of ring buffer bytes used and available. */
>> - rb = shm_mq_get_bytes_read(mq, &detached);
>> - Assert(mq->mq_bytes_written >= rb);
>> - used = mq->mq_bytes_written - rb;
>> + rb = pg_atomic_read_u64(&mq->mq_bytes_read);
>> + wb = pg_atomic_read_u64(&mq->mq_bytes_written);
>> + Assert(wb >= rb);
>> + used = wb - rb;
>
> Just to make sure my understanding is correct: No barriers needed here
> because "bytes_written" is only written by the sending backend, and
> "bytes_read" cannot lap it. Correct?

We can't possibly read a stale value of mq_bytes_written because we
are the only process that can write it.  It's possible that the
receiver has just increased mq_bytes_read and that the change isn't
visible to us yet, but if so, the sender's also going to set our
latch, or has done so already.  So the worst thing that happens is
that we decide to sleep because it looks like no space is available
and almost immediately get woken up because there really is space.

>>   Assert(used <= ringsize);
>>   available = Min(ringsize - used, nbytes - sent);
>>
>>   /* Bail out if the queue has been detached. */
>> - if (detached)
>> + if (mq->mq_detached)
>
> Hm, do all paths here guarantee that mq->mq_detached won't be stored on
> the stack / register in the first iteration, and then not reread any
> further? I think it's fine because every branch of the if below ends up
> in a syscall / barrier, but it might be worth noting on that here.

Aargh.  I hate compilers.  I added a comment.  Should I think about
changing mq_detached to pg_atomic_uint32 instead?

> Perhaps mention that this could lead to spuriously signalling the wrong
> backend in case of detach, but that that is fine?

I think that's a general risk of latches that doesn't need to be
specifically recapitulated here.

> I know the scheme isn't new, but I do find it not immediately obvious
> that 'wb' is short for 'bytes_written'.

Sorry.

>> - /* Write as much data as we can via a single memcpy(). 
>> */
>> + /*
>> +  * Write as much data as we can via a single memcpy(). 
>> Make sure
>> +  * these writes happen after the read of 
>> mq_bytes_read, above.
>> +  * This barrier pairs with the one in 
>> shm_mq_inc_bytes_read.
>> +  */
>
> s/above/above. Otherwise a newer mq_bytes_read could become visible
> before the corresponding reads have fully finished./?

I don't find that very clear.  A newer mq_bytes_read could become
visible whenever, and a barrier doesn't prevent that from happening.
What it does is ensure (together with the one in
shm_mq_inc_bytes_read) that we don't try to read bytes that aren't
fully *written* yet.
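
For anyone following along, the generic single-producer/single-consumer pairing
looks roughly like this (a sketch, not the actual shm_mq code; the variable
names are illustrative):

	/* sender: copy the payload first, then publish the new write position */
	memcpy(&mq->mq_ring[offset], data, nbytes);
	pg_write_barrier();
	pg_atomic_write_u64(&mq->mq_bytes_written, wb + nbytes);

	/* receiver: read the write position first, then read the payload */
	wb = pg_atomic_read_u64(&mq->mq_bytes_written);
	pg_read_barrier();
	memcpy(dest, &mq->mq_ring[offset], wb - rb);

The write barrier keeps the data bytes from being reordered after the counter
update, and the read barrier keeps the data reads from being reordered before
the counter read, so the receiver never copies bytes that aren't fully written.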

Generally, my mental model is that barriers make things happen in
program order 

Re: Further cleanup of pg_dump/pg_restore item selection code

2018-01-25 Thread Tom Lane
"David G. Johnston"  writes:
> Should pg_restore fail if asked to --create without a database entry in the
> TOC?

Yeah, I wondered about that too.  This patch makes it a non-issue for
archives created with v11 or later pg_dump, but there's still a hazard
if you're restoring from an archive made by an older pg_dump.  Is it
worth putting in logic to catch that?

Given the lack of field complaints, I felt fixing it going forward is
sufficient, but there's room to argue differently.

regards, tom lane



Re: [PROPOSAL] Shared Ispell dictionaries

2018-01-25 Thread Arthur Zakirov
Hello,

Thank you for your review! Good catches.

On Thu, Jan 25, 2018 at 03:26:46PM +0300, Ildus Kurbangaliev wrote:
> In 0001 there are few lines where is only indentation has changed.

Fixed.

> 0002:
> - TsearchShmemSize - calculating size using hash_estimate_size seems
> redundant since you use DSA hash now.

Fixed. True, there is no need in hash_estimate_size anymore.

> - ts_dict_shmem_release - LWLockAcquire in the beginning makes no
>   sense, since dict_table couldn't change anyway.

Fixed. In an earlier version tsearch_ctl was used here, but I forgot to remove
the LWLockAcquire.

> 0003:
> - ts_dict_shmem_location could return IspellDictData, it makes more
>   sense.

I assume that ts_dict_shmem_location can be used by various types of
dictionaries, not only by Ispell. So void * is more suitable here.

> 0006:
> It's very subjective, but I think it would be nicer to call the option
> Shared (as a property of the dictionary) or UseSharedMemory; the boolean
> option called SharedMemory sounds weird.

Agreed. In our offline conversation we settled on Shareable, that is, a dictionary
can be shared. It may be more appropriate because setting Shareable=true
doesn't guarantee that a dictionary will be allocated in shared memory, due to
the max_shared_dictionaries_size GUC.

Attached new version of the patch.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char 
*flag)
Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + 
strlen(word) + 1);
strcpy(Conf->Spell[Conf->nspell]->word, word);
Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-   ? cpstrdup(Conf, flag) : VoidString;
+   ? tmpstrdup(flag) : VoidString;
Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, 
CompoundAffixFlag *entry,
entry->flag.i = i;
}
else
-   entry->flag.s = cpstrdup(Conf, s);
+   entry->flag.s = tmpstrdup(s);
 
entry->flagMode = Conf->flagMode;
entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
return;
 
 isnewformat:
+   pfree(recoded);
+   pfree(pstr);
+
if (oldformat)
ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45b2af14eb..46617df852 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1364,6 +1364,35 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_shared_dictionaries_size 
(integer)
+  
+   max_shared_dictionaries_size configuration 
parameter
+  
+  
+  
+   
+Sets the maximum size of all text search dictionaries loaded into 
shared
+memory.  The default is 100 megabytes (100MB).  This
+parameter can only be set at server start.
+   
+
+   
+Currently controls only loading of Ispell
+dictionaries (see ).
+After compiling the dictionary it will be copied into shared memory.
+Another backends on first use of the dictionary will use it from shared
+memory, so it doesn't need to compile the dictionary second time.
+   
+
+   
+If total size of simultaneously loaded dictionaries reaches the maximum
+allowed size then a new dictionary will be loaded into local memory of
+a backend.
+   
+  
+ 
+
  
   huge_pages (enum)
   
diff --git a/src/backend/commands/tsearchcmds.c 
b/src/backend/commands/tsearchcmds.c
index bdf3857ce4..42be77d045 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -39,6 +39,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "tsearch/ts_cache.h"
+#include "tsearch/ts_shared.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions)
 * Call the init method and see if it complains.  We don't 
worry about
 * it leaking memory, since our command will soon be over 
anyway.
 */
-   (void) OidFunctionCall1(initmethod, 
PointerGetDatum(dictoptions));
+   (void) OidFunctionCall2(initmethod, 
PointerGetDatum(dictoptions),
+   
ObjectIdGetDatum(InvalidOid));
}
 
Releas

Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-25 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Well, maybe the right answer is to address that.  It's clear to me
>> why that would happen if we store these things as reloptions on the
>> toast table, but can't they be stored on the parent table?

> Actually, Nikolay provided a possible solution: if you execute ALTER
> TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create
> one at that point.

That adds a lot of overhead if you never actually need the toast table.

Still, maybe it's an appropriate amount of effort compared to the size
of the use-case for this.

regards, tom lane



Re: Doc tweak for huge_pages?

2018-01-25 Thread Peter Eisentraut
On 1/22/18 01:10, Thomas Munro wrote:
> Sorry, right, that was 100% wrong.  It would probably be correct to
> remove the "not", but let's just remove that bit.  New version
> attached.

Committed that.

I reordered some of the existing material because it seemed to have
gotten a bit out of order with repeated patching.

I also softened the advice against THP just a bit, since that is
apparently still changing all the time.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: unique indexes on partitioned tables

2018-01-25 Thread Jesper Pedersen

Hi Alvaro,

On 01/22/2018 05:55 PM, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Version 4 of this patch, rebased on today's master.





Passes make check-world.

Maybe add a test case to indexing.sql that highlights that hash indexes
don't support UNIQUE, although that is not unique to partitioned indexes.


Thanks for working on this !

Best regards,
 Jesper



CONSTANT/NOT NULL/initializer properties for plpgsql record variables

2018-01-25 Thread David G. Johnston
On Thursday, January 25, 2018, Tom Lane  wrote:
>
> The documentation currently says
>
> The CONSTANT option prevents the variable from being assigned to
> after initialization, so that its value will remain constant for
> the duration of the block.
>

While we don't really have the concept of mutable types that other languages
do, maybe just saying "from being assigned or mutated after
initialization" would suffice.

I don't see a desirable way to reinforce that the y in arr[y] is unenforced
documentation - in the code or docs - beyond what is already done.  That
distinction doesn't usually come up and for those where it does the docs
will probably be an after-the-fact confirmation of observed behavior.

David J.


Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-25 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 1/23/18 13:39, Robert Haas wrote:
> >> I don't understand.  AAUI, it is currently the case that if you set
> >> the options before the TOAST table exists, they are lost.
> 
> > Well, that's also weird.
> 
> Well, maybe the right answer is to address that.  It's clear to me
> why that would happen if we store these things as reloptions on the
> toast table, but can't they be stored on the parent table?

Actually, Nikolay provided a possible solution: if you execute ALTER
TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create
one at that point.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: JIT compiling with LLVM v9.0

2018-01-25 Thread Konstantin Knizhnik



On 24.01.2018 10:20, Andres Freund wrote:

Hi,

I've spent the last weeks working on my LLVM compilation patchset. In
the course of that I *heavily* revised it. While still a good bit away
from committable, it's IMO definitely not a prototype anymore.

There's too many small changes, so I'm only going to list the major
things. A good bit of that is new. The actual LLVM IR emissions itself
hasn't changed that drastically.  Since I've not described them in
detail before I'll describe from scratch in a few cases, even if things
haven't fully changed.


== JIT Interface ==

To avoid emitting code in very small increments (increases mmap/mremap
rw vs exec remapping, compile/optimization time), code generation
doesn't happen for every single expression individually, but in batches.

The basic object to emit code via is a jit context created with:
   extern LLVMJitContext *llvm_create_context(bool optimize);
which in case of expression is stored on-demand in the EState. For other
usecases that might not be the right location.

To emit LLVM IR (ie. the portabe code that LLVM then optimizes and
generates native code for), one gets a module from that with:
   extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);

to which "arbitrary" numbers of functions can be added. In case of
expression evaluation, we get the module once for every expression, and
emit one function for the expression itself, and one for every
applicable/referenced deform function.

As explained above, we do not want to emit code immediately from within
ExecInitExpr()/ExecReadyExpr(). To facilitate that readying a JITed
expression sets the function to callback, which gets the actual native
function on the first actual call.  That allows to batch together the
generation of all native functions that are defined before the first
expression is evaluated - in a lot of queries that'll be all.

Said callback then calls
   extern void *llvm_get_function(LLVMJitContext *context, const char 
*funcname);
which'll emit code for the "in progress" mutable module if necessary,
and then searches all generated functions for the name. The names are
created via
   extern void *llvm_get_function(LLVMJitContext *context, const char 
*funcname);
currently "evalexpr" and deform" with a generation and counter suffix.

Currently expression which do not have access to an EState, basically
all "parent" less expressions, aren't JIT compiled. That could be
changed, but I so far do not see a huge need.


Hi,

As far as I understand, generation of native code is now always done for
all supported expressions and individually by each backend.
I wonder whether it would be useful to put more effort into understanding when
compilation to native code should be done and when interpretation is better.
For example, many JIT-able languages like Lua use traces, i.e. the
query is first interpreted and a trace is generated. If the same trace is
followed more than N times, then native code is generated for it.


In the context of a DBMS executor it is obvious that only frequently executed
or expensive queries have to be compiled.
So we can use the estimated plan cost and the number of query executions as
simple criteria for JIT-ing a query.
Maybe compilation of simple queries (with small cost) should be done
only for prepared statements...
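
A trivial sketch of such a criterion (only to illustrate the idea; the
jit_cost_threshold and jit_min_executions knobs are assumptions made for this
example, not anything in the patch):

	static bool
	query_worth_jitting(PlannedStmt *stmt, long num_executions)
	{
		/* expensive according to the planner? */
		if (stmt->planTree->total_cost >= jit_cost_threshold)
			return true;

		/* cheap, but executed often enough (e.g. a prepared statement)? */
		if (num_executions >= jit_min_executions)
			return true;

		return false;
	}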


Another question is whether it is sensible to redundantly do the expensive
work (llvm compilation) in all backends.
This question relates to a shared prepared statement cache. But even
without such a cache, it seems possible to use some signature of the
compiled expression as the library name and allow
these libraries to be shared between backends. So before starting code
generation, ExecReadyCompiledExpr could first build the signature and check
whether the corresponding library is already present.
It would also be easier to control the space used by compiled libraries in
this case.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: CONSTANT/NOT NULL/initializer properties for plpgsql record variables

2018-01-25 Thread Tom Lane
Daniel Gustafsson  writes:
> I’ve reviewed this patch (disclaimer: I did not review the patches listed 
> above
> which it is based on) and the functionality introduced.  The code is straight-
> forward, there are ample tests and I can’t make it break however many weird
> combinations thrown at it.  Regarding the functionality it’s clearly a +1 on
> getting this in.

Thanks for reviewing!

> One tiny thing: while not introduced in this patch, I wonder if it would be
> worth adding an errhint in the following hunk when applied to arrays, to
> clarify what CONSTANT in an array declaration mean.  I have seen confusion
> around what y means in ‘x int[y]’, and I can see ‘x CONSTANT int[y]’ being
> misinterpreted as “length fixed to y”.

Hmm.  What would you imagine the hint saying?  It's certainly true that
a lot of people don't understand that a declared array length doesn't
mean anything in Postgres, but that's not specific to this context.

> That might not be a common enough misunderstanding to warrant it, but it was
> the only thing that stood out to me (and perhaps it would be better in the 
> docs
> if done at all).

The documentation currently says

The CONSTANT option prevents the variable from being assigned to
after initialization, so that its value will remain constant for
the duration of the block.

I don't mind changing that if we can think of clearer wording, but to
me it seems pretty clear already.

regards, tom lane



Re: [HACKERS] SERIALIZABLE with parallel query

2018-01-25 Thread Robert Haas
On Wed, Jan 24, 2018 at 7:39 PM, Thomas Munro
 wrote:
> This started crashing some time yesterday with an assertion failure in
> the isolation tests after commit 2badb5af landed.  Reordering of code
> in parallel.c confused patch's fuzz heuristics leading
> SetSerializableXact() to be called too soon.  Here's a fix for that.

I took a look at this today and thought it might be OK to commit,
modulo a few minor issues: (1) you didn't document the new tranche and
(2) I prefer to avoid if (blah) { Assert(thing) } in favor of
Assert(!blah || thing).

But then I got a little bit concerned about ReleasePredicateLocks().
Suppose that in the middle of a read-only transaction,
SXACT_FLAG_RO_SAFE becomes true. The next call to
SerializationNeededForRead in each process will call
ReleasePredicateLocks.  In the workers, this won't do anything, so
we'll just keep coming back there.  But in the leader, we'll go ahead and
do all that stuff, including MySerializableXact =
InvalidSerializableXact.  But in the workers, it's still set.  Maybe
that's OK, but I'm not sure that it's OK...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PATCH: pgbench - option to build using ppoll() for larger connection counts

2018-01-25 Thread Fabien COELHO


ISTM that the v7 patch version you sent is identical to v6.

--
Fabien.



Re: reducing isolation tests runtime

2018-01-25 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> On the subject of test total time, we could paralelize isolation tests.

>> BTW, one small issue there is that the reason the timeouts test is so
>> slow is that we have to use multi-second timeouts to be sure slower
>> buildfarm critters (eg valgrind animals) will get the expected results.
>> So I'm worried that if the machine isn't otherwise idle, we will get
>> random failures.

> I think we could solve this by putting in the same parallel group only
> slow tests that mostly sleep, i.e. nothing that would monopolize CPU for
> long enough to cause a problem.  Concretely:
> test: timeouts tuplelock-update deadlock-hard deadlock-soft-2

OK, but there'd better be a comment there explaining the concern
very precisely, or somebody will break it.

regards, tom lane



Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-01-25 Thread Alvaro Herrera
Claudio Freire wrote:
> On Thu, Jan 25, 2018 at 4:11 AM, Thomas Munro
>  wrote:

> > *** 128,134 
> >   SELECT pg_relation_size('vactst', 'main');
> >pg_relation_size
> >   --
> > ! 0
> >   (1 row)
> >
> >   SELECT count(*) FROM vactst;
> > --- 128,134 
> >   SELECT pg_relation_size('vactst', 'main');
> >pg_relation_size
> >   --
> > !  8192
> >   (1 row)

> However, shouldn't an empty relation have an initial page anyway? In
> that case shouldn't the correct value be 8192?

No, it's legal for an empty table to have size 0 on disk.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-25 Thread Tom Lane
Peter Eisentraut  writes:
> On 1/23/18 13:39, Robert Haas wrote:
>> I don't understand.  AAUI, it is currently the case that if you set
>> the options before the TOAST table exists, they are lost.

> Well, that's also weird.

Well, maybe the right answer is to address that.  It's clear to me
why that would happen if we store these things as reloptions on the
toast table, but can't they be stored on the parent table?

Backward compatibility might be an issue, but I doubt that there
are enough people using these options that we couldn't get away
with breaking things, if we're unable to find a decent automatic
conversion method.

regards, tom lane


