Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-09-04 Thread Alex Ignatov


-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
Sent: Monday, September 4, 2017 3:32 PM
To: i.kartys...@postgrespro.ru
Cc: pgsql-hackers 
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / 
proof of concept

On Mon, Sep 4, 2017 at 4:34 PM,   wrote:
> Our clients complain about this issue and therefore I want to raise 
> the discussion and suggest several solutions to this problem:
>
> I. Why does PG use Fatal when Error is enough to release the lock that 
> raises the lock conflict?
> "If (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on hot standby exactly at 
> the same time when truncate on master occurs?
>
> In my case conflict happens when the autovacuum truncates table tbl1 
> on master while backend on replica is performing a long transaction 
> involving the same table tbl1. This happens because truncate takes an 
> AccessExclusiveLock. To tackle this issue we have several options:
>
> 1. We can postpone the truncate on the master until all the replicas 
> have finished their transactions (in this case, feedback requests to 
> the master should be sent frequently) Patch 1 
> vacuum_lazy_truncate.patch
>
> 2. Maybe there is an option somehow not to send AccessExclusiveLock 
> and not to truncate table on the replica right away. We could try to 
> wait a little and truncate tbl1 on replica again.
>

Can max_standby_streaming_delay help in this situation (point number - 2)?


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Hello!
In this situation this parameter (max_standby_streaming_delay) won't help, 
because if you have subsequent statements on the standby (the following is from the 
documentation and from our experience): "Thus, if one query has resulted in 
significant delay, subsequent conflicting queries will have much less grace 
time until the standby server has caught up again." And you never know how to set 
this parameter except to -1, which means the standby may be delayed indefinitely. 

In our experience it was only autovacuum on the master that took the AccessExclusiveLock which raises 
this Fatal message on the standby. Once that AccessExclusiveLock has reached the standby and 
max_standby_streaming_delay > -1, you will definitely, sooner or later, get this Fatal 
during recovery. 
With this patch we try to get rid of the AccessExclusiveLock applied on the standby 
while we have an active statement on it.
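
For reference, a rough sketch of the GUCs involved (illustrative values only, not part of the patch): hot_standby_feedback makes the standby report its oldest snapshot back to the master, and max_standby_streaming_delay limits how long the standby waits before cancelling conflicting queries.

-- Illustrative standby settings (example values, not recommendations):
ALTER SYSTEM SET hot_standby_feedback = on;           -- report the standby's xmin to the master
ALTER SYSTEM SET max_standby_streaming_delay = '30s'; -- -1 means wait forever (standby lag unbounded)
SELECT pg_reload_conf();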



--
Alex Ignatov 
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company



Re: [HACKERS] Parallel COPY FROM execution

2017-08-11 Thread Alex K
Greetings pgsql-hackers,

I have completed the general infrastructure for parallel COPY FROM execution,
consisting of a main (master) process and multiple BGWorkers, each connected to the master
through its own message queue (shm_mq).

The master process does the following:
- Allocates dynamic shared memory with the parallel state shared across
the BGWorkers and the master
- Attaches every worker to its own message queue (shm_mq)
- Waits for worker initialization using a Latch
- Reads raw text lines using CopyReadLine and puts them into the shm_mq's
  in round-robin fashion to balance the queues' load
- When EOF is reached, sends a zero-length message; workers shut down safely
  when they receive it
- Waits for the workers to complete their jobs using a ConditionVariable

Each BGWorker does the following:
- Signals the master on initialization via a Latch
- Receives raw text lines over its personal shm_mq and puts them into
the log (for now)
- Reinitializes the db connection using the same db_id and user_id as the main process
- Signals the master via a ConditionVariable when its job is done

All parallel state modifications are done under LWLocks.

You can find the actual code here: https://github.com/ololobus/postgres/pull/2/files
(it is still in progress, so it has a lot of duplications and comments
which are to be deleted).

To go forward I have to overcome some obstacles:

- Currently all copy.c methods are designed to work with one giant
structure, CopyState. It includes buffers, many initial parameters which stay
unchanged, and a few variables which vary during COPY FROM execution. Since I
need all these parameters, I have to obtain them somehow inside each BGWorker
process. I see two possible solutions here:

  1) Perform the BeginCopyFrom initialization inside the master and put the
  required parameters into shared memory. However, many of them are arrays of
  variable size (e.g. partition_tupconv_maps, force_notnull_flags), so I cannot
  put them into shmem inside one single struct. The best idea I have is to put
  each parameter under its own shmem TOC key, which seems to be quite ugly.

  2) Perform the BeginCopyFrom initialization inside each BGWorker. However, it
  also opens the input file/pipe for reading, which is not necessary for the
  workers and may cause some interference with the master, but I can modify
  BeginCopyFrom.

- I have used both Latch and ConditionVariable for the same purpose (waiting
  until some signal occurs), and to me as an end user they behave quite
  similarly. I looked into the condition_variable.c code and it uses a Latch
  and a SpinLock under the hood. So what are the differences, advantages and
  disadvantages between Latch and ConditionVariable?

I would be glad if someone could help me find an answer to my question; any
comments and remarks on the overall COPY FROM processing architecture are also
very welcome.


Alexey




Re: [HACKERS] Parallel COPY FROM execution

2017-06-30 Thread Alex K
On Fri, Jun 30, 2017 at 3:35 PM, Pavel Stehule  wrote:
>
>
> 2017-06-30 14:23 GMT+02:00 Alex K :
>>
>> Thus, it results in a ~60% performance boost per each x2 multiplication of
>> parallel processes, which is consistent with the initial estimation.
>>
>
> the important use case is big table with lot of indexes. Did you test
> similar case?

Not yet; I will try it, thank you for the suggestion. But approximately how big
is a 'big table' and how many indexes is a 'lot of indexes', in numbers?

Also, the performance of index updates and constraint checks is something I cannot
control during COPY execution, so probably I should not worry too much
about that. But of course it is interesting how COPY performs in
that case.


Alexey




[HACKERS] Parallel COPY FROM execution

2017-06-30 Thread Alex K
Greetings pgsql-hackers,

I am a GSOC student this year, my initial proposal has been discussed
in the following thread
https://www.postgresql.org/message-id/flat/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA%40gmail.com

Patch with COPY FROM errors handling seems to be quite finished, so
I have started thinking about parallelism in COPY FROM, which is the next
point in my proposal.

In order to understand whether there are any expensive calls in COPY which
can be executed in parallel, I did some small research. First, please find
attached a flame graph of the most expensive copy.c calls during 'COPY FROM file'
(copy_from.svg). It reveals that the inevitably serial operations like
CopyReadLine (<15%) and heap_multi_insert (~15%) take less than 50% of the
time in total, while the remaining operations like heap_form_tuple and the
multiple checks inside NextCopyFrom can probably be executed well in parallel.

Second, I have compared the execution time of 'COPY FROM a single large
file (~300 MB, 5000 lines)' vs. 'COPY FROM four equal parts of the
original file executed in four parallel processes'. Though it is a
very rough test, it helps to obtain an overall estimation:

Serial:
real 0m56.571s
user 0m0.005s
sys 0m0.006s

Parallel (x4):
real 0m22.542s
user 0m0.015s
sys 0m0.018s

Thus, it results in a ~60% performance boost for each doubling of the number of
parallel processes, which is consistent with the initial estimation.
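
For illustration, the rough comparison above can be reproduced with something like the following (hypothetical table and file names; each COPY has to run in its own session to actually execute in parallel):

-- Hypothetical reproduction of the rough benchmark above.
-- Serial load (one session):
COPY test_table FROM '/tmp/data.csv' (FORMAT csv);

-- "Parallel" load: pre-split the file into four parts beforehand,
-- then run one COPY per part, each in a separate session at the same time:
--   session 1: COPY test_table FROM '/tmp/data_part1.csv' (FORMAT csv);
--   session 2: COPY test_table FROM '/tmp/data_part2.csv' (FORMAT csv);
--   session 3: COPY test_table FROM '/tmp/data_part3.csv' (FORMAT csv);
--   session 4: COPY test_table FROM '/tmp/data_part4.csv' (FORMAT csv);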

After several discussions I have two possible solutions on my mind:

1) Simple solution

Let us focus only on 'COPY FROM file'; then it is relatively easy to
implement, just give each worker the same file and its own offset.

++ Simple; more reliable solution; probably it will give us the largest possible
 performance boost

- - Limited number of use cases. Though 'COPY FROM file' is a frequent case,
even when one uses it via psql \copy, a client-side file read and stdin
streaming to the backend are what actually happens

2) True parallelism

Implement a pool of bg_workers and a simple shared buffer/queue. While the main
COPY process reads the input data and puts raw lines into the queue, the parallel
bg_workers take lines from there and process them.

++ More general solution; supports the various COPY FROM use cases

- - Much more sophisticated solution; probably less performance boost
compared to 1)

I will be glad to receive any comments and criticism.


Alexey



Re: [HACKERS] Why restore_command is called for existing files in pg_xlog?

2017-06-12 Thread Alex Kliukin
On Fri, Jun 2, 2017, at 11:51 AM, Alexander Kukushkin wrote:
> Hello hackers,
> There is one strange and awful thing I don't understand about
> restore_command: it is always being called for every single WAL
> segment postgres wants to apply (even if such segment already exists
> in pg_xlog) until the replica starts streaming from the master.
The real problem this question relates to is being unable to bring a
former master, demoted after a crash, back online, since the WAL segments
required to get it to a consistent state were not archived while it
was still a master, and local segments in pg_xlog are ignored when a
restore_command is defined. The other replicas wouldn't be good
candidates for promotion either, as they were way behind the master
(because the last N WAL segments were not archived and streaming
replication had a few seconds of delay).
Is this the correct list for such questions, or would it be more
appropriate to ask elsewhere (e.g. pgsql-bugs)?
> 
> If there is no restore_command in the recovery.conf it works perfectly,
> i.e. postgres replays the existing wal segments and at some point
> connects to the master and starts streaming from it.
> When recovery.conf is there, starting a replica could become a real
> problem, especially if restore_command is slow.
> Is it possible to change this behavior somehow? First look into
> pg_xlog and only if the file is missing or "corrupted" call
> restore_command.
> 
> Regards,
> ---
> Alexander Kukushkin

Sincerely,
Alex



Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-06-07 Thread Alex K
Hi pgsql-hackers,

Thank you again for all these replies. I have started working on this
project and learnt a lot of new stuff over the last month, so here are some new
thoughts about ERRORS handling in COPY. I decided to stick to the same thread, since it
has a neutral subject.

(1) One of my mentors, Alvaro Herrera, suggested that I have a look at
UPSERT. It may be a good idea to achieve the same functionality
as ON CONFLICT DO NOTHING when COPY actually inserts tuples
and error handling is turned on. It could additionally reduce the number of
failed subtransactions and reduce XID consumption, while still ignoring some
common errors like unique index violations.

Adding full support of ON CONFLICT DO NOTHING/UPDATE to COPY seems
to be a large separate task and is out of the current project scope, but
maybe there is a relatively simple way to perform the tuple inserts internally with
ON CONFLICT DO NOTHING? I have added Peter Geoghegan to cc, as
I understand he is the major contributor to UPSERT in PostgreSQL. It would
be great if he could answer this question.
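
For clarity, the behavior being aimed at is the one INSERT already offers; a plain SQL illustration (hypothetical table, not the internal COPY change under discussion):

-- What "COPY ... ignoring unique violations" would behave like, expressed
-- with the existing INSERT syntax:
CREATE TABLE target (id int PRIMARY KEY, payload text);
INSERT INTO target VALUES (1, 'a');
INSERT INTO target VALUES (1, 'duplicate'), (2, 'b')
ON CONFLICT (id) DO NOTHING;   -- the duplicate row is silently skipped
-- without the ON CONFLICT clause the whole statement would fail with a unique violation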

(2) Otherwise, I am still going to use subtransactions via
BeginInternalSubTransaction and PG_TRY / PG_CATCH with
ReleaseCurrentSubTransaction / RollbackAndReleaseCurrentSubTransaction.
To minimize XID consumption I will try to insert tuples in batches and
pre-validate them as much as possible (as was suggested earlier in the thread).
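
At the SQL level the batching idea corresponds roughly to wrapping each batch in a savepoint and retrying on error; a sketch of the semantics only (hypothetical table), while the real implementation uses internal subtransactions in C:

-- SQL-level analogue of per-batch subtransactions:
BEGIN;
SAVEPOINT batch_1;
INSERT INTO target VALUES (1, 'a'), (2, 'b'), (3, 'c');   -- try a whole batch at once
-- on error: ROLLBACK TO SAVEPOINT batch_1, then re-insert the batch row by row,
-- each row under its own savepoint, skipping the rows that fail
RELEASE SAVEPOINT batch_1;
COMMIT;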



Alexey


Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-10 Thread Alex K
Hi Alexander!

I missed your reply, since the proposal submission deadline passed last
Monday and I wasn't checking the hackers mailing list very frequently.

(1) It seems that starting a new subtransaction at step 4 is not necessary.
We can just gather all error lines in one pass and, at the end of the input,
start only one additional subtransaction with all the safe lines at once:
[1, ..., k1 - 1, k1 + 1, ..., k2 - 1, k2 + 1, ...], where ki is an error
line number.

But I assume the only viable use case is when the number of errors is
relatively small compared to the total number of rows, because if the input is
in a totally inconsistent format, then it seems useless to import it into the
db at all. Thus, it is not 100% clear to me whether there would be any real
difference in performance if one starts a new subtransaction at step 4 or not.

(2) Hmm, good question. As far as I know it is impossible to get the stdin
input size, thus it is impossible to distribute stdin directly to the
parallel workers. The first approach that comes to mind is to store the
stdin input in some kind of buffer/queue and then have the workers read it in
parallel. The question is how this will perform in the case of a large file; I
guess poorly, at least from the memory consumption perspective. Whether
parallel execution would still be faster is the next question.


Alexey



On Thu, Apr 6, 2017 at 4:47 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi, Alexey!
>
> On Tue, Mar 28, 2017 at 1:54 AM, Alexey Kondratov <
> kondratov.alek...@gmail.com> wrote:
>
>> Thank you for your responses and valuable comments!
>>
>> I have written a draft proposal:
>> https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit
>>
>> It seems that COPY currently is able to return first error line and error
>> type (extra or missing columns, type parse error, etc).
>> Thus, the approach similar to the Stas wrote should work and, being
>> optimised for a small number of error rows, should not
>> affect COPY performance in such case.
>>
>> I will be glad to receive any critical remarks and suggestions.
>>
>
> I've following questions about your proposal.
>
> 1. Suppose we have to insert N records
>> 2. We create subtransaction with these N records
>> 3. Error is raised on k-th line
>> 4. Then, we can safely insert all lines from 1st and till (k - 1)
>>
> 5. Report, save to errors table or silently drop k-th line
>> 6. Next, try to insert lines from (k + 1) till N with another
>> subtransaction
>> 7. Repeat until the end of file
>
>
> Do you assume that we start new subtransaction in 4 since subtransaction
> we started in 2 is rolled back?
>
> I am planning to use background worker processes for parallel COPY
>> execution. Each process will receive an equal piece of the input file. Since
>> the file is split by size, not by lines, each worker will start the import from
>> the first new line so as not to hit a broken line.
>
>
> I think the situation when the backend is directly reading a file during COPY
> is not typical.  The more typical case is the psql \copy command.  In that case
> "COPY ... FROM stdin;" is actually executed while psql is streaming the
> data.
> How can we apply parallel COPY in this case?
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

2016-12-07 Thread Alex Hunsaker
On Wed, Dec 7, 2016 at 3:42 PM, Tom Lane  wrote:

>
>
> Hmm ... after further experimentation, I still can't get this version of
> systemd (231) to do anything evil.  It turns out that Fedora ships it with
> KillUserProcesses turned off by default, and maybe having that on is a
> prerequisite for the other behavior?  But that doesn't make a lot of sense
> because we'd never be seeing the reports of databases moaning about lost
> semaphores if the processes got killed first.  Anyway, I see nothing bad
> happening if KillUserProcesses is off, while if it's on then the database
> gets shut down reasonably politely via SIGTERM.
>
> Color me confused ... maybe systemd's behavior has changed?
>

Hrm, the following incantation seems to break for me on a fresh Fedora 25
system:
1) As root su to $USER and start postgres.
2) ssh in as $USER and then logout
3) # psql localhost

FATAL: semctl(4980742, 3, SETVAL, 0) failed: Invalid argument
LOG: server process (PID 14569) exited with exit code 1
...


Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

2016-12-07 Thread Alex Hunsaker
On Wed, Dec 7, 2016 at 1:12 PM, Tom Lane  wrote:


> But this is all kind of moot if Peter is right that systemd will zap
> POSIX shmem along with SysV semaphores.  I've been trying to reproduce
> the issue on a Fedora 25 installation, and so far I can't get it to
> zap anything, so I'm a bit at a loss how to prove things one way or
> the other.
>


Don't know precisely about Fedora 25, but I've had success in the past with:
ssh in as the user
start postgres under tmux/screen
logout
do another ssh login/logout cycle

After logon, you should see "/usr/lib/systemd/systemd --user" running for that
user. After logout, said proc should exit. If either of those is not true,
either systemd is not set up to track sessions (probably via pam) or it thinks
you still have an active logon. Another way to check whether systemd thinks the
user is logged in is whether /run/user/$USER exists.


Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-20 Thread Alex Ignatov


On 18.09.2016 06:54, Amit Kapila wrote:

On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov  wrote:


On 16.09.2016 16:50, Amit Kapila wrote:



Can you try by setting force_parallel_mode = off;?  I think it is
sending the whole function execution to worker due to
force_parallel_mode.




No changes:



Okay, it just skipped from my mind that we don't support parallel
queries for SQL statement execution (or statements executed via
exec_stmt_execsql) from plpgsql.  For detailed explanation of why that
is not feasible you can refer one of my earlier e-mails [1] on similar
topic.  I think if we can somehow get the results via Perform
statement, then it could be possible to use parallelism via plpgsql.

However, you can use it via SQL functions, an example is below:

set min_parallel_relation_size =0;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;

Load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_nested_statements = true;

create table test_plpgsql(c1 int, c2 char(1000));
insert into test_plpgsql values(generate_series(1,10),'aaa');

create or replace function parallel_test_set_sql() returns
setof bigint as $$
select count(*) from test_plpgsql;
$$language sql PARALLEL SAFE STRICT STABLE;

Then execute function as: select * from parallel_test_set_sql();  You
can see below plan if auto_explain module is loaded.

Finalize Aggregate  (cost=14806.85..14806.86 rows=1 width=8) (actual time=1094.966..1094.967 rows=1 loops=1)
  ->  Gather  (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.216..1094.943 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Partial Aggregate  (cost=14806.83..14806.84 rows=1 width=8) (actual time=177.867..177.868 rows=1 loops=3)
  ->  Parallel Seq Scan on test_plpgsql  (cost=0.00..14702.67 rows=41667 width=0) (actual time=0.384..142.565 rows=3 loops=3)
CONTEXT:  SQL function "parallel_test_set_sql" statement 1
LOG:  duration: 2965.040 ms  plan:
Query Text: select * from parallel_test_set_sql();
Function Scan on parallel_test_set_sql  (cost=0.25..10.25 rows=1000 width=8) (actual time=2538.620..2776.955 rows=1 loops=1)


In general, I think we should support the cases as required (or
written) by you from plpgsql or sql functions.  We need more work to
support such cases. There are probably two ways of supporting such
cases, we can build some intelligence in plpgsql execution such that
it can recognise such queries and allow to use parallelism or we need
to think of enabling parallelism for cases where we don't run the plan
to completion.  Most of the use cases from plpgsql or sql function
fall into later category as they don't generally run the plan to
completion.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com



Thank you for your suggestion! That works.

But what can we do with this function:
create or replace function parallel_test_sql(t int) returns setof bigint as
$$
   select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 
group by a,b,c,d,e)t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select  * from  parallel_test_sql(2);

"Function Scan on parallel_test_sql  (cost=0.25..10.25 rows=1000 width=8) (actual 
time=2410.789..2410.790 rows=1 loops=1)"
"  Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - 
Query Tool,client=127.0.0.1 LOG:  duration: 2410.135 ms  plan:
Query Text:
   select count(*) from (select a,b,c,d,e,sum(bytes) from test where 
a>= $1 group by a,b,c,d,e)t;

Aggregate  (cost=230701.42..230701.43 rows=1 width=8)
  ->  HashAggregate  (cost=230363.59..230513.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..188696.44 rows=372 
width=20)
      Filter: (a >= $1)


No parallelism again. It looks like the Filter: (a >= $1) is what breaks parallelism.
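
One way to check this (my guess, assuming the same test table and GUC settings as above): run the identical query with a literal constant instead of the parameter and see whether a parallel plan appears.

-- Hypothetical check: the same query with a literal instead of $1.
-- If this one gets a Gather / Parallel Seq Scan plan while the SQL-function
-- version does not, the parameter reference is what disables parallelism.
explain (analyze, buffers)
select count(*)
from (select a, b, c, d, e, sum(bytes) from test where a >= 2
      group by a, b, c, d, e) t;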



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-16 Thread Alex Ignatov


On 16.09.2016 16:50, Amit Kapila wrote:

On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov  wrote:

No it doesn't.
Paralleling neither sql function nor plpgsql:
Here is example :

ipdr=> show max_worker_processes ;
 max_worker_processes
--
 128
(1 row)
ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set force_parallel_mode=on;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET



Can you try by setting force_parallel_mode = off;?  I think it is
sending the whole function execution to worker due to
force_parallel_mode.




No changes:

ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET
ipdr=> set force_parallel_mode = off;
SET
ipdr=> select name,setting from pg_settings where name in('max_parallel_workers_per_gather',
ipdr(> 'min_parallel_relation_size',
ipdr(> 'parallel_tuple_cost',
ipdr(> 'force_parallel_mode');
              name               | setting
---------------------------------+---------
 force_parallel_mode             | off
 max_parallel_workers_per_gather | 128
 min_parallel_relation_size      | 0
 parallel_tuple_cost             | 0
(4 rows)

ipdr=> explain (analyze,buffers) select count(*) from (select 
a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
   QUERY PLAN
-
 Aggregate  (cost=87702.33..87702.34 rows=1 width=8) (actual 
time=709.643..709.643 rows=1 loops=1)
   Buffers: shared hit=65015
   ->  Finalize HashAggregate  (cost=87364.49..87514.64 rows=15015 width=28) 
(actual time=706.382..708.456 rows=15015 loops=1)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=65015
 ->  Gather  (cost=85149.78..85299.93 rows=165165 width=20) (actual 
time=478.626..645.209 rows=180180 loops=1)
   Workers Planned: 11
   Workers Launched: 11
   Buffers: shared hit=65015
   ->  Partial HashAggregate  (cost=84149.78..84299.93 rows=15015 
width=20) (actual time=473.890..478.309 rows=15015 loops=12)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=63695
 ->  Parallel Seq Scan on test  (cost=0.00..72786.01 
rows=909101 width=20) (actual time=0.021..163.120 rows=83 loops=12)
   Buffers: shared hit=63695
 Planning time: 0.318 ms
 Execution time: 710.600 ms
(16 rows)


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4003.719..4003.720 
rows=1 loops=1)
   Buffers: shared hit=63869
 Planning time: 0.021 ms
 Execution time: 4003.769 ms
(4 rows)

auto_explain:
2016-09-16 18:02:29 MSK [29353]: [53-1] 
user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4001.275 ms  plan:
Query Text: select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 
width=20)
2016-09-16 18:02:29 MSK [29353]: [54-1] user=ipdr,db=ipdr,app=psql,client=[local] 
CONTEXT:  SQL statement "select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t"
PL/pgSQL function parallel_test_plpgsql() line 5 at SQL statement


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4497.820..4497.822 
rows=1 loops=1)
   Buffers: shared hit=63695
 Planning time: 0.023 ms
 Execution time: 4497.872 ms
(4 rows)

auto_explain:
2016-09-16 18:03:23 MSK [29353]: [57-1] 
user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4497.050 ms  plan:
Query Text: select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e

Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-16 Thread Alex Ignatov
 Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 
width=20)



So as we can see, parallel seq scan doesn't work in plpgsql and sql functions.
Can somebody explain to me where I am wrong?


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On 16.09.2016 07:27, Ashutosh Bapat wrote:

On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov  wrote:

Hello!
Does parallel seq scan work in plpgsql?



Parallel seq scan is a query optimization that will work independent
of the source of the query - i.e whether it comes directly from a
client or a procedural language like plpgsql. So, I guess, answer to
your question is yes. If you are expecting something else, more
context will help.






[HACKERS] Parallel sec scan in plpgsql

2016-09-15 Thread Alex Ignatov

Hello!
Does parallel seq scan work in plpgsql?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Alex Ignatov


On 23.08.2016 15:41, Michael Paquier wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada  wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound good performance-wise. I recall Robert's and
Amit's emails on the matter for sequential scan saying that this would suck
performance out particularly for rotating disks.


Rotating disks are not a problem - you can always put them in RAID, etc. An 8k 
allocation per relation once per half an hour, that is the problem. A seq 
scan done this way = a random scan...



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Why we lost Uber as a user

2016-07-28 Thread Alex Ignatov


On 28.07.2016 17:53, Vladimir Sitnikov wrote:



>> That's a recipe for runaway table bloat; VACUUM can't do much because
>> there's always some minutes-old transaction hanging around (and SNAPSHOT
>> TOO OLD doesn't really help, we're talking about minutes here), and
>> because of all of the indexes HOT isn't effective.


Just curious: what if PostgreSQL supported an index that stores the "primary 
key" (or unique key) instead of tids?
Am I right that such an index would not suffer from that bloat? I'm 
assuming the primary key is not updated, thus secondary indices built 
that way should be much less prone to bloat when updates land on 
other columns (even if the tid moves, the PK does not change, thus the 
secondary index row could be reused).


If that works, it could reduce index bloat and reduce the amount of WAL 
(fewer indices would need to be updated). Of course it would make index scans 
a bit slower, however it looks like at least Uber is fine with that 
extra cost of an index scan.


Does it make sense to implement that kind of index as an access method?

Vladimir


You mean an IOT (index-organized table), like Oracle has?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Strange behavior of some volatile function like random(), nextval()

2016-06-29 Thread Alex Ignatov


On 29.06.2016 15:30, David G. Johnston wrote:

More specifically...
On Wed, Jun 29, 2016 at 7:34 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:


> On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov <a.igna...@postgrespro.ru> wrote:
> Hello!
>
> Got some strange behavior of random() function:
>
> postgres=# select (select random() ) from generate_series(1,10)
as i;
>   random
> ---
>  0.831577288918197
> [...]
> (10 rows)

I recall that this is treated as an implicit LATERAL, meaning that
random() is calculated only once.


A non-correlated (i.e., does not refer to outer variables) subquery 
placed into the target-list need only have its value computed once - 
so that is what happens.  The fact that a volatile function can return 
different values given the same arguments doesn't mean much when the 
function is only ever called a single time.



> postgres=# select (select random()+i*0 ) from
generate_series(1,10) as i;
>   ?column?
> 
>0.97471913928166
> [...]
> (10 rows)

But not that. So those results do not surprise me.


A correlated subquery, on the other hand, has to be called once for 
every row and is evaluated within the context supplied by said row.  
Each time random is called it returns a new value.


Section 4.2.11 (9.6 docs)
https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES

Maybe this could be worded better but the first part talks about a 
single execution while "any one execution" is mentioned in reference 
to "the surrounding query".


I do think that defining "correlated" and "non-correlated" subqueries 
within this section would be worthwhile.


David J.


In this subquery (below) we have a reference to outer variables, but it is 
not working as it should (or I don't understand something):


postgres=# select id, ( select string_agg('a','') from 
generate_series(1,trunc(10*random()+1)::int) where id=id) from 
generate_series(1,10) as id;

 id | string_agg
+
  1 | aaa
  2 | aaa
...
but this query (with a reference to an outer var) works perfectly:
postgres=# select id,(select random() where id=id) from 
generate_series(1,10) as id;

 id |   random
+
  1 |  0.974509597290307
  2 |  0.219822214450687
...

Also this query works well ((id-id) does the job):
postgres=# select id, ( select string_agg('a','') from 
generate_series(1,trunc(10*random()+1)::int+(id-id)) ) from 
generate_series(1,10) as id;

 id | string_agg
+
  1 | aaa
  2 | a
...

It means that even a reference to outer variables doesn't mean that the 
executor executes the volatile function from the subquery every time. Or is there 
something else I should know?
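
For comparison, a call that is not wrapped in a scalar subquery is re-evaluated for every row, as the documentation describes (a simple illustration, not from the original test):

-- Volatile function called directly in the select list: evaluated once per row,
-- so every row gets a different value.
select i, random() from generate_series(1, 10) as i;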



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




[HACKERS] Strange behavior of some volatile function like random(), nextval()

2016-06-29 Thread Alex Ignatov

Hello!

Got some strange behavior of random() function:

postgres=# select (select random() ) from generate_series(1,10) as i;
  random
---
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
(10 rows)

postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
  ?column?

   0.97471913928166
 0.0532126761972904
  0.331358563620597
 0.0573496259748936
  0.321165383327752
   0.48836630070582
  0.444201893173158
 0.0729857799597085
  0.661443184129894
  0.706566562876105
(10 rows)

postgres=# explain select (select random() ) from generate_series(1,10) 
as i;

QUERY PLAN
--
 Function Scan on generate_series i  (cost=0.02..10.01 rows=1000 width=0)
   InitPlan 1 (returns $0)
 ->  Result  (cost=0.00..0.01 rows=1 width=0)
(3 rows)

postgres=# explain select (select random()+i*0 ) from 
generate_series(1,10) as i;

QUERY PLAN
--
 Function Scan on generate_series i  (cost=0.00..30.00 rows=1000 width=4)
   SubPlan 1
 ->  Result  (cost=0.00..0.02 rows=1 width=0)
(3 rows)

postgres=# \df+ random();
List of functions
   Schema   |  Name  | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language | Source code | Description
------------+--------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+--------------
 pg_catalog | random | double precision |                     | normal | invoker  | volatile   | postgres | internal | drandom     | random value
(1 row)


Also:

postgres=# create sequence test;
CREATE SEQUENCE
postgres=# SELECT (SELECT nextval('test')) FROM generate_series(1,10) as i;
 nextval
-
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
(10 rows)

postgres=# SELECT (SELECT nextval('test')+i*0) FROM 
generate_series(1,10) as i;

 ?column?
--
2
3
4
5
6
7
8
9
   10
   11
(10 rows)


postgres=# \df+ nextval() ;
List of functions
   Schema   |  Name   | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language | Source code |     Description
------------+---------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+---------------------
 pg_catalog | nextval | bigint           | regclass            | normal | invoker  | volatile   | postgres | internal | nextval_oid | sequence next value
(1 row)


Both functions are volatile, so from the docs:

"A VOLATILE function can do anything, including modifying the database. 
It can return different results on successive calls with the same 
arguments. The optimizer makes no assumptions about the behavior of such 
functions. A query using a volatile function will re-evaluate the 
function at every row where its value is needed."


Is something wrong with the executor? Is it a bug, or an executor feature related 
to subqueries?


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2016-06-24 Thread Alex Ignatov


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 20.06.2016 17:09, Albe Laurenz wrote:

Tom Lane wrote:

I don't necessarily have an opinion yet.  I would like to see more than
just an unsupported assertion about what Oracle's behavior is.  Also,
how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts 
FROM dual;

TS

2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS') AS ts 
FROM dual;

TS

2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-1315:43:36', 'YYYY/MM/DD  HH24:MI:SS') AS 
ts FROM dual;

TS

2016-06-13 15:43:36.0 AD

(to_timestamp_tz behaves the same way.)

So Oracle seems to make no difference between one or more spaces.

Yours,
Laurenz Albe

Guys, do we need to change this behavior, or maybe you can tell me that this 
is normal, because we have this:


postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD 
HH24:MI:SS');

  to_timestamp

 2016-03-01 15:43:36+03
(1 row)

but on the other side we have :

postgres=# select '2016-02-30 15:43:36'::timestamp;
ERROR:  date/time field value out of range: "2016-02-30 15:43:36"
LINE 1: select '2016-02-30 15:43:36'::timestamp;

Another bug in to_timestamp/date()?




Re: [HACKERS] Bug in to_timestamp().

2016-06-24 Thread Alex Ignatov


On 23.06.2016 20:40, Tom Lane wrote:

Robert Haas  writes:

On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston
 wrote:

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

I would definitely agree with David on that point.  Making to_timestamp
noticeably better on this score seems like a nontrivial project, and
post-beta is not the time for that sort of thing, even if we had full
consensus on what to do.  I'd suggest somebody work on a patch and put
it up for review in the next cycle.

Now, if you were to narrowly define the problem as "whether to skip
non-spaces for a space in the format", maybe that could be fixed
post-beta, but I think that's a wrongheaded approach.  to_timestamp's
issues with input that doesn't match the format are far wider than that.
IMO we should try to resolve the whole problem with one coherent change,
not make incremental incompatible changes at the margins.

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

regards, tom lane


Totally agree that we need more discussion about error handling in this 
function!


Also, this behavior is observed in the to_date() and to_number() functions:

postgres=# SELECT 
TO_DATE('2!0!1!6!0!6-/-/-/-/-/-/-1!/-/-/-/-/-/-/-3!', 'YYYY-MM-DD');

  to_date

 0002-01-01
(1 row)

postgres=# select to_number('1$#@!!,2,%,%4,5,@%5@4..8-', 
'999G999D9S');

 to_number
---
12
(1 row)
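
For comparison (my own illustration, not from the thread), the cast-based path rejects such input outright:

-- The equivalent cast raises an error instead of silently returning 12:
postgres=# select '1$#@!!,2,%,%4,5,@%5@4..8-'::numeric;
ERROR:  invalid input syntax for type numeric: "1$#@!!,2,%,%4,5,@%5@4..8-"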

On our side we have had some discussions about writing a patch that 
will change this incorrect behavior. So stay tuned.


--

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Bug in to_timestamp().

2016-06-23 Thread Alex Ignatov


On 23.06.2016 19:37, David G. Johnston wrote:
On Thu, Jun 23, 2016 at 12:16 PM, Alex Ignatov <a.igna...@postgrespro.ru> wrote:



On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov <a.igna...@postgrespro.ru> wrote:


On 13.06.2016 18:52, amul sul wrote:

And it wont stop on some simple whitespace. By using
to_timestamp you
can get any output results by providing illegal input
parameters values:
postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
   to_timestamp

  2016-01-06 14:40:39+03

(1 row)

We do consume extra space from input string, but not if it
is in format string, see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36',
'YYYY/MM/DD HH24:MI:SS');
to_timestamp

2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input
string comes
from the data, so I don't see having them behave the same as
necessary.


To be honest, they don't just behave differently: to_timestamp simply
handles input data incorrectly, nothing else. There is no
excuse for such behavior:

postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36',
'YYYY/MM/DD HH24:MI:SS');
 to_timestamp
--
 0018-08-05 13:15:43+02:30:17
(1 row)


To be honest I don't see how this is relevant to the quoted content.  And 
you've already made this point quite clearly - repeating it isn't 
constructive.  This behavior has existed for a long time and I don't 
see that changing it is a worthwhile endeavor.  I believe a new 
function is required that has saner behavior. Otherwise, given good 
input and a well-formed parse string, the function does exactly what it 
is designed to do.  Avoid giving it garbage and you will be fine. 
Maybe wrap the call in a function that also checks for the 
expected layout and RAISE EXCEPTION if it doesn't match.


David J.
Arguing like that, one could say that we don't even need an exception 
like "division by zero" - just use well-formed numbers in the denominator...
Input data can sometimes be generated automagically. Without an exception 
being thrown, debugging a stored function containing to_timestamp can be painful.
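
For illustration, a minimal sketch of the kind of wrapper being suggested (hypothetical function, not part of any patch; a naive round-trip check like this would also reject some legitimate inputs whose re-formatted form differs, e.g. extra spaces):

-- Hypothetical strict wrapper: parse with to_timestamp, format the result back
-- with to_char, and raise an exception when the round trip does not reproduce
-- the original input string.
CREATE OR REPLACE FUNCTION to_timestamp_strict(input text, fmt text)
RETURNS timestamptz AS $$
DECLARE
    result timestamptz;
BEGIN
    result := to_timestamp(input, fmt);
    IF to_char(result, fmt) <> input THEN
        RAISE EXCEPTION 'input "%" does not match format "%"', input, fmt;
    END IF;
    RETURN result;
END;
$$ LANGUAGE plpgsql STRICT;

-- SELECT to_timestamp_strict('2016-06-13 15:43:36', 'YYYY-MM-DD HH24:MI:SS');  -- ok
-- SELECT to_timestamp_strict('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');  -- raises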




Re: [HACKERS] Bug in to_timestamp().

2016-06-23 Thread Alex Ignatov


On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov  wrote:



On 13.06.2016 18:52, amul sul wrote:

And it wont stop on some simple whitespace. By using to_timestamp you
can get any output results by providing illegal input parameters values:
postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD
HH24:MI:SS');
   to_timestamp

  2016-01-06 14:40:39+03

(1 row)

We do consume extra space from input string, but not if it is in format string, 
see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13  15:43:36', 'YYYY/MM/DD 
HH24:MI:SS');
to_timestamp

2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input string comes
from the data, so I don't see having them behave the same as necessary.



To be honest, they don't just behave differently: to_timestamp simply 
handles input data incorrectly, nothing else. There is no excuse for 
such behavior:


postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD 
HH24:MI:SS');

 to_timestamp
------
 0018-08-05 13:15:43+02:30:17
(1 row)



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Bug in to_timestamp().

2016-06-20 Thread Alex Ignatov


On 13.06.2016 18:52, amul sul wrote:

Hi,

It looks like a bug in the to_timestamp() function when the format string has more 
whitespace compared to the input string, see below:

Ex.1: Two white spaces before HH24 whereas one before input time string

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS');
to_timestamp

2016-06-13 05:43:36-07   <— incorrect time
(1 row)



Ex.2: One whitespace before YYYY in the format string

postgres=# SELECT TO_TIMESTAMP('2016/06/13 15:43:36', ' YYYY/MM/DD HH24:MI:SS');
to_timestamp
--
0016-06-13 15:43:36-07:52:58  <— incorrect year
(1 row)



If there are one or more consecutive whitespace characters in the format, we should skip 
them as long as we can still get an actual field.
Thoughts?
Thanks & Regards,
Amul Sul




From docs about to_timestamp() ( 
https://www.postgresql.org/docs/9.5/static/functions-formatting.html)
"These functions interpret input liberally, with minimal error checking. 
While they produce valid output, the conversion can yield unexpected 
results. For example, input to these functions is not restricted by 
normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17 
rather than causing an error. Casting does not have this behavior."


And it won't stop at some simple whitespace. By using to_timestamp you 
can get any output result by providing illegal input parameter values:


postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD 
HH24:MI:SS');

  to_timestamp

 2016-01-06 14:40:39+03
(1 row)


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Bug in to_timestamp().

2016-06-20 Thread Alex Ignatov


On 20.06.2016 16:36, Tom Lane wrote:

Robert Haas  writes:

On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas  wrote:

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.
It's my understanding that these functions exist in no small part for
compatibility with Oracle, and Oracle declines to skip the digit '1'
on the basis of an extra space in the format string, which IMHO is the
behavior any reasonable user would expect.

So Amul and I are of one opinion and Tom is of another.  Anyone else
have an opinion?

I don't necessarily have an opinion yet.  I would like to see more than
just an unsupported assertion about what Oracle's behavior is.  Also,
how should FM mode affect this?

regards, tom lane




Oracle:
SQL> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') 
from dual;

SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual
*
ERROR at line 1:
ORA-01843: not a valid month

PG:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD 
HH24:MI:SS');

  to_timestamp

 2016-01-06 14:40:39+03
(1 row)


I know about:
"These functions interpret input liberally, with minimal error checking. 
While they produce valid output, the conversion can yield unexpected 
results" from the docs, but when providing illegal input parameters we get 
no exceptions or errors about it.
I think that to_timestamp() needs to have more format checking than it has 
now.


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company







[HACKERS] Why we don't have checksums on clog files

2016-06-06 Thread Alex Ignatov

Hello!

Why don't we have checksums on clog files?

We have a checksum on pg_control, optional checksumming on data files, and 
some form of checksumming on WAL. But why don't we have any 
checksumming on clog? Corruption in clog leads to transaction 
visibility problems and database consistency violations.


Can anybody explain this situation with clog?


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

2016-05-13 Thread Alex Ignatov


On 13.05.2016 9:39, Michael Paquier wrote:

Hi all,

Beginning a new thread because the ext4 issues are closed, and because
pg_basebackup data durability merits a new thread. And in short
about the problem: pg_basebackup makes no effort to be sure that
the data it backs up is on disk, which is bad... One possible
recommendation is to use initdb -S after running pg_basebackup, but
making sure that data is on disk should be done before pg_basebackup
ends.

On Thu, May 12, 2016 at 8:09 PM, I wrote:

And actually this won't fly high if there is no equivalent of
walkdir() or if the fsync()'s are not applied recursively. On master
at least the refactoring had better be done cleanly first... For the
back branches, we could just have some recursive call like
fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
you think that this should be part of fe_utils or src/common/? I'd
tend to think the latter is more adapted as there is an equivalent in
the backend. On back-branches, we could just have something like
fsync_recursively that walks though the paths. An even more simple
approach would be to fsync() individually things that have been
written, but that would suck in performance.


So, attached are two patches that apply on HEAD to address the problem
of pg_basebackup that does not sync the data it writes. As
pg_basebackup cannot use directly initdb -S because, as a client-side
utility, it may be installed while initdb is not (see Fedora and
RHEL), I have refactored the code so as the routines in initdb.c doing
the fsync of PGDATA and other fsync stuff are in src/fe_utils/, and
this is 0001.

Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so as all the tablespaces are
fsync'd at once. This takes care as well of the case where pg_xlog is
a symlink.
- In tar mode (no stdout), each tar file is synced individually, and
the base directory is synced once at the end.
In both cases, failures are not considered fatal.

With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
files is made durable by using fsync and durable_rename where needed
(credits to Andres mainly for this part).

This set of patches is aimed only at HEAD. Back-patchable versions of
this patch would need to copy fsync_pgdata and friends into
streamutil.c for example.

I am adding that to the next CF for review as a bug fix.
Regards,





Hi!
Do we have any confidence that a data file is not corrupted, i.e. does not 
contain some corrupted page? Can pg_basebackup check page checksums (db 
initialized with initdb -k) while backing up files?


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Is pg_control file crashsafe?

2016-05-06 Thread Alex Ignatov


On 05.05.2016 7:16, Amit Kapila wrote:

On Wed, May 4, 2016 at 8:03 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
 >
 > Amit Kapila <amit.kapil...@gmail.com> writes:
 > > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov <a.igna...@postgrespro.ru>
 > > wrote:
 > >> On 03.05.2016 2:17, Tom Lane wrote:
 > >>> Writing a single sector ought to be atomic too.
 >
 > >> pg_control is 8k long(i think it is legth of one page in default PG
 > >> compile settings).
 >
 > > The actual data written is always sizeof(ControlFileData) which
should be
 > > less than one sector.
 >
 > Yes.  We don't care what happens to the rest of the file as long as the
 > first sector's worth is updated atomically.  See the comments for
 > PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
 >
 > We could change to a different PG_CONTROL_SIZE pretty easily, and there's
 > certainly room to argue that reducing it to 512 or 1024 would be more
 > efficient.  I think the motivation for setting it at 8K was basically
 > "we're already assuming that 8K writes are efficient, so let's assume
 > it here too".  But since the file is only written once per checkpoint,
 > efficiency is not really a key selling point anyway.  If you could make
 > an argument that some other size would reduce the risk of failures,
 > it would be interesting --- but I suspect any such argument would be
 > very dependent on the quirks of a specific file system.
 >

How about using 512 bytes as a write size and perform direct writes
rather than going via OS buffer cache for control file?   Alex, is the
issue reproducible (to ensure that if we try to solve it in some way, do
we have way to test it as well)?

 >
 > One point worth considering is that on most file systems, rewriting
 > a fraction of a page is *less* efficient than rewriting a full page,
 > because the kernel first has to read in the old contents to fill
 > the disk buffer it's going to partially overwrite with new data.
 > This motivates against trying to reduce the write size too much.
 >

Yes, you are very much right and I have observed that recently during my
work on WAL Re-Writes [1].  However, I think that won't be the issue if
we use direct writes for control file.


[1] -
http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jbtjxbfyd67r5dlcqfyomj4f-qmfxbp1...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Hi!
No, the issue happened only once. Also, any attempts to reproduce it have not been 
successful yet.





Re: [HACKERS] Is pg_control file crashsafe?

2016-05-06 Thread Alex Ignatov


On 06.05.2016 0:42, Greg Stark wrote:

On 5 May 2016 12:32 am, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
 >
 > To repeat, I'm pretty hesitant to change this logic.  While this is not
 > the first report we've ever heard of loss of pg_control, I believe I
could
 > count those reports without running out of fingers on one hand --- and
 > that's counting since the last century. It will take quite a lot of
 > evidence to convince me that some other implementation will be more
 > reliable.  If you just come and present a patch to use direct write, or
 > rename, or anything else for that matter, I'm going to reject it out of
 > hand unless you provide very strong evidence that it's going to be more
 > reliable than the current code across all the systems we support.

One thing we could do without much worry of being less reliable would be
to keep two copies of pg_control. Write one, fsync, then write to the
other and fsync that one.

Oracle keeps a copy of the old control file so that you can always go
back to an older version if a hardware or software bug corrupts it. But
they keep a lot more data in their control file and they can be quite large.

Oracle can create more than one copy of the control file. They are identical 
copies, not an old copy and a current one. And their advice is just to store these copies 
on separate storage to be more fault tolerant.


PS: By the way, in my initial post about "is pg_control safe" I wrote in point 
3 some thoughts about multiple copies of the pg_control file. Glad to see 
that we share the same view on this issue.





Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Alex Ignatov



On 03.05.2016 2:17, Tom Lane wrote:

Alex Ignatov  writes:

I think that rename can help a little bit. At least on some FS it is
atomic operation.


Writing a single sector ought to be atomic too.  I'm very skeptical that
it'll be an improvement to just move the risk from one filesystem
operation to another; especially not to one where there's not even a
terribly portable way to request fsync.

regards, tom lane


pg_control is 8k long (I think that is the length of one page with default PG 
compile settings).
I also doubt that an 8k write can be atomic. Even if writing one 
sector is atomic, nobody can say which sector of the 8k pg_control 
record will be written first. It could be the last sector, or say sector 
number 10 of 16. That is why I mentioned renaming a tmp file to 
pg_control. Renaming in a FS is usually an atomic operation. After a power 
loss we then have either the old version of pg_control or the new one, but 
not a torn pg_control file.



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Alex Ignatov



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On 03.05.2016 2:21, Andres Freund wrote:

Hi,

On 2016-04-28 21:58:00 +, Alex Ignatov wrote:

We have an issue with a truncated pg_control file on Windows after a
power failure. My questions are: 1) Is pg_control protected from, say, a
power crash or partial write?


It should be. I think to make progress on this thread we're going to
need a bit more detail about the exact corruption. Did the length of
the file change? Did the checksum fail? Did you just observe too-old
contents?

Greetings,

Andres Freund




The length was 0 bytes after the crash. It was Windows with NTFS + SSD in
RAID 1. The file was zeroed after the power loss.


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is pg_control file crashsafe?

2016-05-02 Thread Alex Ignatov



On 01.05.2016 0:55, Bruce Momjian wrote:

On Thu, Apr 28, 2016 at 09:58:00PM +, Alex Ignatov wrote:

Hello everyone!
We have an issue with a truncated pg_control file on Windows after a power
failure.
My questions are:
1) Is pg_control protected from, say, a power crash or partial write?
2) How does PG update pg_control? By writing into it directly, or by writing
into some temp file and then renaming it to pg_control to be atomic?

We write pg_controldata in one write() OS call:

 if (write(fd, buffer, PG_CONTROL_SIZE) != PG_CONTROL_SIZE)


3) Can PG keep multiple copies of pg_control to be more fault tolerant?

PS: During some experiments we found that at present there is no method to do
crash recovery with a "restored" version of pg_control (based on some
manipulations with pg_resetxlog).
  Only by using pg_resetxlog and setting its parameters to values taken from
the WAL file (via pg_xlogdump) can we at least start PG and see that PG's
state is at the moment of the last checkpoint. But we have no real confidence
that PG is in a consistent state (the docs on pg_resetxlog warn about this too)

We have talked about improving the reliability of pg_control, but
failures are so rare we have never done anything to improve it.  I know
Tatsuo has talked about making pg_control more reliable, so I am CC'ing
him.


Oh! Good. Thank you!
It is rare, but as we have now seen, it is our reality too. One of our
customers had this issue just last week =)


I think that rename can help a little bit. At least on some filesystems it
is an atomic operation.


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Is pg_control file crashsafe?

2016-04-28 Thread Alex Ignatov
Hello everyone!
We have an issue with a truncated pg_control file on Windows after a power
failure. My questions are:
1) Is pg_control protected from, say, a power crash or partial write?
2) How does PG update pg_control? By writing into it directly, or by writing
into some temp file and then renaming it to pg_control to be atomic?
3) Can PG keep multiple copies of pg_control to be more fault tolerant?

PS: During some experiments we found that at present there is no method to do
crash recovery with a "restored" version of pg_control (based on some
manipulations with pg_resetxlog). Only by using pg_resetxlog and setting its
parameters to values taken from the WAL file (via pg_xlogdump) can we at least
start PG and see that PG's state is at the moment of the last checkpoint. But
we have no real confidence that PG is in a consistent state (the docs on
pg_resetxlog warn about this too).

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Mon, Apr 4, 2016 at 1:06 AM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > On Sun, Apr 3, 2016 at 10:53 PM, Tom Lane  wrote:
> >> The reason for checking toowide_cnt is that if it's greater than zero,
> >> then in fact the track list does NOT include all values seen, and it's
> >> flat-out wrong to claim that it is an exhaustive set of values.
>
> > But do we really state that with the short path?
>
> Well, yes: the point of the short path is that we're hypothesizing that
> the track list contains all values in the table, and that they should all
> be considered MCVs.  Another way to think about it is that if we didn't
> have the width-limit implementation restriction, those values would appear
> in the track list, almost certainly with count 1, and so we would not
> have taken the short path anyway.
>
> Now you can argue that the long path would have accepted all the real
> track-list entries as MCVs, and have rejected all these hypothetical
> count-1 entries for too-wide values, and so the end result would be the
> same.  But that gets back to the fact that that's not necessarily how
> the long path behaves, either today or with the proposed patch.
>

 Agreed.

The design intention was that the short path would handle columns
> with a finite, small set of values (think booleans or enums) where the
> ideal thing is that the table population is completely represented by
> the MCV list.  As soon as we have good reason to think that the MCV
> list won't represent the table contents fully, we should switch over
> to a different approach where we're trying to identify which sample
> values are common enough to justify putting in the MCV list.


This is a precious detail that I unfortunately couldn't find in any of the
sources of information available to me online. :-)

I don't have a habit of hanging out on IRC channels, but now I wonder how
likely it is that I could learn this by just asking around on #postgresql
(or mailing you directly as the committer of this early implementation--is
that OK at all?)

Again, having this type of design decisions documented in the code might
save some time and confusion for the sociopath^W introvert-type of folks
like myself. ;-)

In that
> situation there are good reasons to not blindly fill the MCV list all
> the way to the stats-target limit, but to try to cut it off at the
> point of diminishing returns, so that the planner isn't saddled with
> a huge MCV list that doesn't really contain a lot of useful information.
>

This came to be my understanding also at some point.

So that's the logic behind there being two code paths with discontinuous
> behavior.  I'm not sure whether we need to try to narrow the discontinuity
> or whether it's OK to act that way and we just need to refine the decision
> rule about which path to take.  But anyway, comparisons of frequencies
> of candidate MCVs seem to me to make sense in a large-ndistinct scenario
> (where we have to be selective) but not a small-ndistinct scenario
> (where we should just take 'em all).
>

Yeah, this seems to be an open question.  And a totally new one to me in
the light of recent revelations.

>> The point of the original logic was to try to decide whether the
> >> values in the sample are significantly more common than typical values
> >> in the whole table population.  I think we may have broken that with
> >> 3d3bf62f3: you can't make any such determination if you consider only
> >> what's in the sample without trying to estimate what is not in the
> >> sample.
>
> > Speaking of rabbit holes...
> > I'm out of ideas, unfortunately.  We badly need more eyes/brainpower on
> > this, which is why I have submitted a talk proposal on this topic to
> > PGDay.ru this summer in St. Petersburg, fearing that it might be too late
> > to commit a satisfactory version during the current dev cycle for 9.6,
> and
> > in hope to draw at least some attention to it.
>
> If you're thinking it's too late to get more done for 9.6,


Not necessarily, but given the time constraints and some personal issues
that just keep popping up, I'm not as optimistic as I was just 24 hours ago.


> I'm inclined to
> revert the aspect of 3d3bf62f3 that made us work from "d" (the observed
> number of distinct values in the sample) rather than stadistinct (the
> extrapolated estimate for the table).  On reflection I think that that's
> inconsistent with the theory behind the old MCV-cutoff rule.  It wouldn't
> matter if we were going to replace the cutoff rule with something else,
> but it's beginning to sound like that won't happen for 9.6.
>

Please feel free to do what you think is in the best interest of the people
preparing 9.6 for the freeze.  I'm not all that familiar with the process,
but I guess reverting this early might save some head-scratching if any
interesting interactions between this change and others show up.

Cheers!
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 10:53 PM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > This recalled observation can now also explain to me why in the
> regression
> > you've seen, the short path was not followed: my bet is that stadistinct
> > appeared negative.
>
> Yes, I think that's right.  The table under consideration had just a few
> live rows (I think 3), so that even though there was only one value in
> the sample, the "if (stats->stadistinct > 0.1 * totalrows)" condition
> succeeded.
>

Yeah, this part of the logic can be really surprising at times.

> Given that we change the logic in the complex path substantially, the
> > assumptions that lead to the "Take all MCVs" condition above might no
> > longer hold, and I see it as a pretty compelling argument to remove the
> > extra checks, thus keeping the only one: track_cnt == ndistinct.  This
> > should also bring the patch's effect closer to the thread's topic,
> > which is "More stable query plans".
>
> The reason for checking toowide_cnt is that if it's greater than zero,
> then in fact the track list does NOT include all values seen, and it's
> flat-out wrong to claim that it is an exhaustive set of values.
>

But do we really state that with the short path?

If there were only one too-wide value, it might be the only thing left
for the histogram in the end and would be discarded anyway, so from the
end-result perspective there is no difference.

If there are multiple too-wide values, they will effectively be discarded
by the histogram calculation part as well, so again no difference from the
perspective of the end result.

The reason for the track_cnt <= num_mcv condition is that if that's not
> true, the track list has to be trimmed to meet the statistics target.
> Again, that's not optional.
>

Yes, but we only need this check in compute_distinct_stats(), and we are
talking about compute_scalar_stats() now, where track_cnt is always less
than or equal to num_mcv (again, please see the bottom of the
thread-starting email) -- or is my analysis broken on this part?

I think the reasoning for having the stats->stadistinct > 0 test in there
> was that if we'd set it negative, then we think that the set of distinct
> values will grow --- which again implies that the set of values actually
> seen should not be considered exhaustive.


This is actually very neat.  So the idea here, as I get it, is that if we
have enough distinct values to suspect that more unique ones will be added
later as the table grows (which is a natural tendency with most tables
anyway), then by the time the statistics we produce are actually used by
the planner, it is likely that the MCV list no longer covers all the
distinct values, right?

I would *love* to see that documented in code comments at the least.

Of course, with a table as
> small as that regression-test example, we have little evidence to support
> either that conclusion or its opposite.
>

I think it might be possible to record historical ndistinct values between
the ANALYZE runs and use that as better evidence that the number of
distincts is actually growing rather than basing that decision on that
hard-coded 10% limit rule.  What do you think?

We do not support migration of the pg_statistic system table during major
version upgrades (yet), so if we somehow achieve what I've just described,
it might not be a compatibility-breaking change anyway.

It's possible that what we should do to eliminate the sudden change
> of behaviors is to drop the "track list includes all values seen, and all
> will fit" code path entirely, and always go through the track list
> one-at-a-time.
>

That could also be an option, one that I had considered initially.  Now that
I have read your explanation of each check, I'm not so sure anymore.

If we do, though, the currently-proposed filter rules aren't going to
> be too satisfactory: if we have a relatively small group of roughly
> equally common MCVs, this logic would reject all of them, which is
> surely not what we want.
>

Indeed. :-(


> The point of the original logic was to try to decide whether the
> values in the sample are significantly more common than typical values
> in the whole table population.  I think we may have broken that with
> 3d3bf62f3: you can't make any such determination if you consider only
> what's in the sample without trying to estimate what is not in the
> sample.
>

Speaking of rabbit holes...

I'm out of ideas, unfortunately.  We badly need more eyes/brainpower on
this, which is why I have submitted a talk proposal on this topic to
PGDay.ru this summer in St. Petersburg, fearing that it might be too late
to commit a satisfactory version during the current dev cycle for 9.6, and
in hope to draw at least some attention to it.

Regards,
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 8:24 AM, Alex Shulgin  wrote:
>
> On Sun, Apr 3, 2016 at 7:49 AM, Tom Lane  wrote:
>>
>> Alex Shulgin  writes:
>> > On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane  wrote:
>> >> Well, we have to do *something* with the last (possibly only) value.
>> >> Neither "include always" nor "omit always" seem sane to me.  What
other
>> >> decision rule do you want there?
>>
>> > Well, what implies that the last value is somehow special?  I would
think
>> > we should just do with it whatever we do with the rest of the candidate
>> > MCVs.
>>
>> Sure, but both of the proposed decision rules break down when there are
no
>> values after the one under consideration.  We need to do something sane
>> there.
>
>
> Hm... There is indeed a case where it would be beneficial to have at
least 2 values in the histogram (to have at least the low/high bounds for
inequality comparison selectivity) instead of taking both to the MCV list
or taking one to the MCVs and having to discard the other.

I was thinking about this in the background...

Popularity of the last sample value (when it is not the only one) can be:

a) As high as 50%, in case we have an even division between the only two
values in the sample.  Quite obviously, we should take this one into the
MCV list (well, unless the user has specified statistics_target of 1 for
some bizarre reason, but that should not be our problem).

b) As low as 2/(statistics_target*300), which, with the target set to the
maximum allowed value of 10,000, amounts to 2/(10,000*300) = 1 in
1,500,000.  This seems like a really tiny number, but if your table has
some tens of billions of rows, for example, seeing such a value at least
twice means that it might correspond to some thousands of rows in the
table, whereas seeing a value only once might mean just that: it's a unique
value.

In this case, putting such a duplicate value in the MCV list will allow a
much better selectivity estimate for equality comparison, as I've mentioned
earlier.  It also allows for better estimate with inequality comparison,
since MCVs are also consulted in this case.  I see no good reason to
discard such a value.

c) Or anything in between the above figures.

In my opinion that amounts to "include always" being the sane option.  Do
you see anything else as a problem here?

> Obviously, we need a fresh idea on how to handle this.

On reflection, the case where we have a duplicate value in the track list
which is not followed by any other sample should be covered by the short
path where we put all the tracked values in the MCV list, so there should
be no need to even consider all of the above!

But the exact short path condition is formulated like this:

if (track_cnt == ndistinct && toowide_cnt == 0 &&
stats->stadistinct > 0 &&
track_cnt <= num_mcv)
{
/* Track list includes all values seen, and all will fit */

So the execution path here additionally depends on two factors: whether
we've seen at least one too-wide sample, and whether the distinct estimation
produced a number higher than 10% of the estimated total table size (which
is yet another arbitrary limit, but that's not in the scope of this patch).

I've been puzzled by these conditions a lot, as I have mentioned in the
last section of this thread's starting email, and I could not find anything
that would hint at why they exist, in the documentation, code comments or
emails on hackers leading to the introduction of analyze.c in the form
we know it today.  Probably we will never know, unless Tom still has some
notes on this topic from 15 years ago. ;-)

This recalled observation can now also explain why, in the regression
you've seen, the short path was not followed: my bet is that stadistinct
came out negative.

Given that we change the logic in the complex path substantially, the
assumptions that lead to the "Take all MCVs" condition above might no
longer hold, and I see it as a pretty compelling argument to remove the
extra checks, thus keeping the only one: track_cnt == ndistinct.  This
should also bring the patch's effect closer to the thread's topic,
which is "More stable query plans".

Regards,
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016, 18:40 Tom Lane  wrote:

> Alex Shulgin  writes:
>
> > Well, if it's the only value it will be accepted simply because we are
> > checking that special case already and don't even bother to loop through
> > the track list.
>
> That was demonstrably not the case in the failing regression test.
> I forget what aspect of the test case allowed it to get past the short
> circuit, but it definitely got into the scan-the-track-list code.
>

Hm, I'll have to see that for myself, probably there was something more to
it.

--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-03 Thread Alex Shulgin
On Sun, Apr 3, 2016, 18:32 Tom Lane  wrote:

> Peter Geoghegan  writes:
> > On Thu, Feb 11, 2016 at 9:50 AM, Robert Haas 
> wrote:
> >> Wow, that's a fabulous idea.  I see Oleksandr has tried to implement
> >> it, although I haven't looked at the patch.  But I think this would be
> >> REALLY helpful.
>
> > +1
>
> Pushed.
>

Yay!


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 7:49 AM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane  wrote:
> >> Well, we have to do *something* with the last (possibly only) value.
> >> Neither "include always" nor "omit always" seem sane to me.  What other
> >> decision rule do you want there?
>
> > Well, what implies that the last value is somehow special?  I would think
> > we should just do with it whatever we do with the rest of the candidate
> > MCVs.
>
> Sure, but both of the proposed decision rules break down when there are no
> values after the one under consideration.  We need to do something sane
> there.
>

Hm... There is indeed a case where it would be beneficial to have at least
2 values in the histogram (to have at least the low/high bounds for
inequality comparison selectivity) instead of taking both to the MCV list
or taking one to the MCVs and having to discard the other.

Obviously, we need a fresh idea on how to handle this.

> For "the only value" case: we cannot build a histogram out of a single
> > value, so omitting it from MCVs is not a good strategy, ISTM.
> > From my point of view that amounts to "include always".
>
> If there is only one value, it will have 100% of the samples, so it would
> get included under just about any decision rule (other than "more than
> 100% of this value plus following values").  I don't think making sure
> this case works is sufficient to get us to a reasonable rule --- it's
> a necessary case, but not a sufficient case.
>

Well, if it's the only value it will be accepted simply because we are
checking that special case already and don't even bother to loop through
the track list.

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane  wrote:

> Alex Shulgin  writes:
> > On Sun, Apr 3, 2016 at 3:43 AM, Alex Shulgin 
> wrote:
> >> I'm not sure yet about the 1% rule for the last value, but would also
> love
> >> to see if we can avoid the arbitrary limit here.  What happens with a
> last
> >> value which is less than 1% popular in the current code anyway?
>
> > Now that I think about it, I don't really believe this arbitrary
> heuristic
> > is any good either, sorry.
>
> Yeah, it was just a placeholder to produce a working patch.
>
> Maybe we could base this cutoff on the stats target for the column?
> That is, "1%" would be the right number if stats target is 100,
> otherwise scale appropriately.
>
> > What was your motivation to introduce some limit at the bottom anyway?
>
> Well, we have to do *something* with the last (possibly only) value.
> Neither "include always" nor "omit always" seem sane to me.  What other
> decision rule do you want there?
>

Well, what implies that the last value is somehow special?  I would think
we should just do with it whatever we do with the rest of the candidate
MCVs.

For "the only value" case: we cannot build a histogram out of a single
value, so omitting it from MCVs is not a good strategy, ISTM.

From my point of view that amounts to "include always".  What problems do
you see with this approach exactly?

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Alex Shulgin
On Sun, Apr 3, 2016 at 3:43 AM, Alex Shulgin  wrote:

>
> I'm not sure yet about the 1% rule for the last value, but would also love
> to see if we can avoid the arbitrary limit here.  What happens with a last
> value which is less than 1% popular in the current code anyway?
>

Tom,

Now that I think about it, I don't really believe this arbitrary heuristic
is any good either, sorry.  What if you have a value that is just a bit
under 1% popular, but is being used in 50% of your queries in the WHERE
clause with an equality comparison?  Without this value in the MCV list the planner
will likely use SeqScan instead of an IndexScan that might be more
appropriate here.  I think we are much better off if we don't touch this
aspect of the current code.

What was your motivation to introduce some limit at the bottom anyway?  If
that was to prevent accidental division by zero, then an explicit check on
the denominator not being 0 seems to me like a better safeguard than this.

Regards.
--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-04-02 Thread Alex Shulgin
On Sat, Apr 2, 2016 at 11:41 PM, Tom Lane  wrote:

> "Shulgin, Oleksandr"  writes:
> > On Mon, Mar 14, 2016 at 7:55 PM, Tom Lane  wrote:
> >> Yeah, I don't much like that either.  But I don't think we can avoid
> >> some refactoring there; as designed, conversion of an error message into
> >> user-visible form is too tightly tied to receipt of the message.
>
> > True.  Attached is a v2 which addresses all of the points raised earlier
> I
> > believe.
>
> I took a closer look at what's going on here and realized that actually
> it's not that hard to decouple the message-building routine from the
> PGconn state, because mostly it works with fields it's extracting out
> of the PGresult anyway.  The only piece of information that's lacking
> is conn->last_query.  I propose therefore that instead of doing it like
> this, we copy last_query into error PGresults.  This is strictly less
> added storage requirement than storing the whole verbose message would be,
> and it saves time compared to the v2 patch in the typical case where
> the application never does ask for an alternately-formatted error message.
> Plus we can actually support requests for any variant format, not only
> VERBOSE.
>
> Attached is a libpq-portion-only version of a patch doing it this way.
> I've not yet looked at the psql part of your patch.
>
> Comments?
>

Ah, neat, that's even better. :-)

What about regression tests?  My assumption was that we won't be able to
add them with the usual expected-file approach, but also that we don't need
them that badly.  Is everyone in favor?

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Alex Shulgin
On Sat, Apr 2, 2016 at 8:57 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:
> On Apr 2, 2016 18:38, "Tom Lane"  wrote:
>
>> I did not like the fact that the compute_scalar_stats logic
>> would allow absolutely anything into the MCV list once num_hist falls
>> below 2. I think it's important that we continue to reject values that
>> are only seen once in the sample, because there's no very good reason to
>> think that they are MCVs and not just infrequent values that by luck
>> appeared in the sample.
>
> In my understanding we only put a value in the track list if we've seen it
> at least twice, no?

This is actually the case for compute_scalar_stats, but not for
compute_distinct_stats.  In the latter case we can still have
track[i].count == 1, but we can also break out of the loop if we see the
first tracked item like that.

>> Before I noticed the regression failure, I'd been thinking that maybe
it'd
>> be better if the decision rule were not "at least 100+x% of the average
>> frequency of this value and later ones", but "at least 100+x% of the
>> average frequency of values after this one".
>
> Hm, sounds pretty similar to what I wanted to achieve, but better
> formalized.
>
>> With that formulation, we're
>> not constrained as to the range of x.  Now, if there are *no* values
after
>> this one, then this way needs an explicit special case in order not to
>> compute 0/0; but the preceding para shows that we need a special case for
>> the last value anyway.
>>
>> So, attached is a patch rewritten along these lines.  I used 50% rather
>> than 25% as the new cutoff percentage --- obviously it should be higher
>> in this formulation than before, but I have no idea if that particular
>> number is good or we should use something else.  Also, the rule for the
>> last value is "at least 1% of the non-null samples".  That's a pure guess
>> as well.
>>
>> I do not have any good corpuses of data to try this on.  Can folks who
>> have been following this thread try it on their data and see how it
>> does?  Also please try some other multipliers besides 1.5, so we can
>> get a feeling for where that cutoff should be placed.
>
> Expect me to run it on my pet db early next week. :-)

I was trying to come up with some examples where 50% could be a good or a
bad choice, and then I noticed that we might be able to turn it the other
way round: instead of inventing an arbitrary limit on the MCV frequency we
could use the histogram as the criterion for a candidate MCV to be
considered "common enough".  If we can prove that the value would produce
duplicates in the histogram, we should rather put it in the MCV list
(unless it's already fully occupied; then we can't do anything).

A value is guaranteed to produce a duplicate if it has appeared at least
2*hfreq+1 times in the sample (hfreq from your version of the patch, which
is recalculated on every loop iteration).  I could produce an updated patch
on Monday, or anyone else following this discussion should be able to do
that.
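
In code form, my reading of that criterion is simply the following (a sketch
assuming analyze.c's usual headers, not code from any existing patch; the
function name is made up):

static bool
value_would_duplicate_in_histogram(int value_count, int hfreq)
{
    /*
     * With hfreq sample rows per histogram bin, a value occurring at least
     * 2*hfreq+1 times must straddle two bin boundaries, i.e. it would show
     * up as a duplicate histogram bound if it were not taken as an MCV.
     */
    return value_count >= 2 * hfreq + 1;
}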

This approach would be a huge win in my opinion, because this way we can
avoid all the arbitrariness of that .25 / .50 multiplier.  Otherwise there
might be (valid) complaints that for some data .40 (or .60) is a better
fit, but we would have already hard-coded something and there would be no
easy way to improve the situation for some users while avoiding breaking it
for the rest (unless we introduce a per-attribute configurable parameter
like statistics_target for this multiplier, which I'd like to avoid even
thinking about ;-)

While we don't (well, can't) build a histogram in the
compute_distinct_stats variant, we could also apply the above mind trick
there for the same reason and to make the output of both functions more
consistent (and to have less maintenance burden between the variants).  And
anyway it would be rather surprising to see that depending on the presence
of an order operator for the type, the resulting MCV lists after the
ANALYZE would be different (I mean not only due to the random nature of the
sample).

I'm not sure yet about the 1% rule for the last value, but would also love
to see if we can avoid the arbitrary limit here.  What happens with a last
value which is less than 1% popular in the current code anyway?

Cheers!
--
Alex


Re: [HACKERS] Sanity checking for ./configure options?

2016-03-14 Thread Alex Shulgin
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Looks good to me.  It only allows valid numbers between 1 and 65535, and
disallows a leading zero, an empty string, or non-digit chars.  The error
messages look good.

Marking this Ready for Committer.

--
Alex


The new status of this patch is: Ready for Committer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] empty array case in plperl_ref_from_pg_array not handled correctly

2016-03-08 Thread Alex Hunsaker
On Mon, Mar 7, 2016 at 11:32 PM, Andres Freund  wrote:

> Hi,
>
> Per the new valgrind animal we get:
>
>
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-03-08%2004%3A22%3A00
> 2016-03-08 05:56:05.566 UTC [56de6971.723:5] LOG:  statement: select
> plperl_sum_array('{}');
> ==1827== Invalid write of size 4
> ==1827==at 0x14E35DD1: plperl_ref_from_pg_array (plperl.c:1459)
>
>
[ I think you may have meant to CC me not Alexey K. I'm probably the person
responsible :D. ]

Indeed, I think the simplest fix is to make plperl_ref_from_pg_array()
return an "empty" array in that case. The attached fixes the valgrind
warning for me. (cassert enabled build on master).
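
To illustrate what "return an empty array" could look like (a sketch of the
idea only, not necessarily what the attached patch does): when ARR_NDIM() of
the incoming array is 0, plperl_ref_from_pg_array() could short-circuit and
hand back something like this instead of descending into split_array():

static SV *
empty_plperl_array(void)
{
    /* a mortal reference to a brand-new, zero-element Perl array */
    return sv_2mortal(newRV_noinc((SV *) newAV()));
}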


>
> ISTM the assumption that an array always has a dimension is a bit more
> widespread... E.g. split_array() looks like it'd not work nicely with a
> zero dimensional array...
>

Hrm, plperl_ref_from_pg_array() is the only caller of split_array(), so I
added an Assert() to make its contract more clear.

Recursive calls to split_array() should be fine, as nested 0-dim arrays get
collapsed down to a single 0-dim array (I hand-tested it nonetheless):
 # select ARRAY[ARRAY[ARRAY[]]]::int[];
 array
───
 {}
(1 row)

Looking around, everything that makes an array seems to pass through
plperl_ref_from_pg_array(), so I think once that is fixed all is good. I
also looked for dimensions and ARR_NDIM() just to make sure (didn't find
anything fishy).

Thanks,


plperl_array_valgrind.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove Windows crash dump support?

2015-12-22 Thread Alex Ignatov

On 22.12.2015 18:28, Magnus Hagander wrote:



On Tue, Dec 22, 2015 at 3:53 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:


On 22 December 2015 at 22:50, Craig Ringer <cr...@2ndquadrant.com> wrote:

Hi all

Back in 2010 I submitted a small feature to allow the creation
of minidumps when backends crashed; see
commit dcb09b595f88a3bca6097a6acc17bf2ec935d55f .

At the time Windows lacked useful support for postmortem
debugging and crash-dump management in the operating system
its self, especially for applications running as services.
That has since improved considerably.

The feature was also included in 9.4


Ahem. 9.1. This is what I get for multi-tasking between writing
this and packaging an extension for 9.4.


In which version(s) of Windows was this improvement added? I think 
that's really the part that matters here, not necessarily which 
version of PostgreSQL.



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Hi all!
I think you have been able to debug crash dumps for as long as WinDbg has
existed.
Also, I think the number of Postgres installations on Windows is so tiny
because people even today think it is not as solid as the Unix version --
that's why you think that nobody uses your code ;).


Today, if my memory serves me right, this code cannot deal with a buffer
overflow. Am I right?
Maybe we need to add this functionality instead of dropping support for it
entirely?


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] error message diff with Perl 5.22.0

2015-07-01 Thread Alex Hunsaker
On Sat, Jun 6, 2015 at 7:03 PM, Peter Eisentraut  wrote:

> With the recently released Perl 5.22.0, the tests fail thus:
>
> -ERROR:  Global symbol "$global" requires explicit package name at line 3.
> -Global symbol "$other_global" requires explicit package name at line 4.
> +ERROR:  Global symbol "$global" requires explicit package name (did you
> forget to declare "my $global"?) at line 3.
> +Global symbol "$other_global" requires explicit package name (did you
> forget to declare "my $other_global"?) at line 4.
>  CONTEXT:  compilation of PL/Perl function "uses_global"
>
>
FWIW the committed expected file seems fine to me. There is no perl
option to toggle this behavior (and even if there were, I think the
resulting changes to pl/perl functions would be quite ugly).

(What about the back branches? :D)


Re: [HACKERS] Precedence of standard comparison operators

2015-03-10 Thread Alex Hunsaker
On Tue, Mar 10, 2015 at 10:11 AM, Tom Lane  wrote:

> I wrote:
>
> This thread seems to have died off without any clear resolution.  I'd
> hoped somebody would try the patch on some nontrivial application to
> see if it broke anything or caused any warnings, but it doesn't seem
> like that is happening.
>
>
I tried it on a fairly typical web application. Around 5000 or so distinct
statements according to pg_stat_statements. With
operator_precedence_warning = on, no warnings yet.


[HACKERS] POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2014-12-25 Thread Alex Shulgin
Trent Shipley  writes:

> On Friday 2007-12-14 16:22, Tom Lane wrote:
>> Neil Conway  writes:
>> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
>> > to drop (and log) rows that contain malformed data. That is, rows with
>> > too many or too few columns, rows that result in constraint violations,
>> > and rows containing columns where the data type's input function raises
>> > an error. The last case is the only thing that would be a bit tricky to
>> > implement, I think: you could use PG_TRY() around the InputFunctionCall,
>> > but I guess you'd need a subtransaction to ensure that you reset your
>> > state correctly after catching an error.
>>
>> Yeah.  It's the subtransaction per row that's daunting --- not only the
>> cycles spent for that, but the ensuing limitation to 4G rows imported
>> per COPY.
>
> You could extend the COPY FROM syntax with a COMMIT EVERY n clause.  This 
> would help with the 4G subtransaction limit.  The cost to the ETL process is 
> that a simple rollback would not be guaranteed to send the process back to its
> initial state.  There are easy ways to deal with the rollback issue though.  
>
> A {NO} RETRY {USING algorithm} clause might be useful.   If the NO RETRY 
> option is selected then the COPY FROM can run without subtransactions and in 
> excess of the 4G per transaction limit.  NO RETRY should be the default since 
> it preserves the legacy behavior of COPY FROM.
>
> You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give 
> the 
> option of sending exceptions to a table since they are presumably malformed, 
> otherwise they would not be exceptions.  (Users should re-process exception 
> files if they want an if good then table a else exception to table b ...)
>
> EXCEPTIONS TO and NO RETRY would be mutually exclusive.
>
>
>> If we could somehow only do a subtransaction per failure, things would
>> be much better, but I don't see how.

Hello,

Attached is a proof-of-concept patch for this TODO item.  There are no
docs yet; I just wanted to know if the approach is sane.

The added syntax is like the following:

  COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]

The way it's done is by abusing Copy Both mode, and from my limited
testing, that seems to just work.  The error trapping itself is done
using PG_TRY/PG_CATCH and can only catch formatting or before-insert
trigger errors; no attempt is made to recover from a failed unique
constraint, etc.
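
For readers following along, this is roughly the shape of that per-row error
trapping (a simplified sketch, not the patch itself; process_one_row() and
report_exception() are made-up names standing in for the real per-row input
conversion and for the Copy Both message that carries the offending line
back):

static bool
copy_one_row_trapped(CopyState cstate, uint64 *nexceptions)
{
    MemoryContext oldcontext = CurrentMemoryContext;
    bool        ok = true;

    PG_TRY();
    {
        process_one_row(cstate);    /* may ereport(ERROR) on malformed input */
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        /* leave ErrorContext before copying the error, then clear it */
        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();
        FlushErrorState();

        report_exception(cstate, edata);    /* e.g. send the raw line back */
        FreeErrorData(edata);
        (*nexceptions)++;
        ok = false;
    }
    PG_END_TRY();

    return ok;
}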

Example in action:

postgres=# \d test_copy2
  Table "public.test_copy2"
 Column |  Type   | Modifiers 
+-+---
 id | integer | 
 val| integer | 

postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
1
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 1: "1"
2
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 2: "2"
3
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 3: "3"
NOTICE:  total exceptions ignored: 3

postgres=# \d test_copy1
  Table "public.test_copy1"
 Column |  Type   | Modifiers 
+-+---
 id | integer | not null

postgres=# set client_min_messages to warning;
SET
postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
...
vmstat
zoneinfo
postgres=# 

Limited performance testing shows no significant difference between the
error-catching and the plain code path.  For example, timing

  copy test_copy1 from program 'seq 100' [exceptions to stdout]

shows similar numbers with or without the added "exceptions to" clause.

Now that I'm sending this I wonder if the original comment about the
need for a subtransaction around every loaded line still holds.  Any
example of what would not be properly rolled back by just PG_TRY?

Happy hacking!
--
Alex

>From 50f7ab0a503a0d61776add8a138abf2622fc6c35 Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Fri, 19 Dec 2014 18:21:31 +0300
Subject: [PATCH] POC: COPY FROM ... EXCEPTIONS TO

---
 contrib/file_fdw/file_fdw.c |   4 +-
 src/backend/commands/copy.c | 251 +---
 src/backend/parser/gram.y   |  26 +++-
 src/bin/psql/common.c   |  14 +-
 src/bin/psql/copy.c | 119 ++-
 src/bin/psql/settings.h |   1 +
 src/bin/psql/startup.c  |   1 +
 src/bin/psql/tab-complete.c |  12 +-
 src/include/commands/copy.h |   3 +-
 src/include/nodes/parsenodes.h  |   1 +
 src/include/parser/kwlist.h |   1 +
 src/interfaces/ecpg/preproc/ecpg.addons |   2 +-
 12 files changed, 

Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-24 Thread Alex Shulgin
Alex Shulgin  writes:

> Petr Jelinek  writes:
>>
>> I had a quick look, the patch does not apply cleanly anymore but it's
>> just release notes so nothing too bad.
>
> Yes, there were some ongoing changes that touched some parts of this and
> I must have missed the latest one (or maybe I was waiting for it to
> settle down).

The rebased version is attached.

>> - the StandbyModeRequested and StandbyMode should be lowercased like
>> the rest of the GUCs
>
> Yes, except that standby_mode is linked to StandbyModeRequested, not the
> other one.  I can try to see if there's a sane way out of this.

As I see it, renaming these will only add noise to this patch, and there
is also ArchiveRecoveryRequested / InArchiveRecovery.  This is going to
be tricky and I'm not sure it's really worth the effort.

>> - The whole CopyErrorData and memory context logic is not needed in
>> check_recovery_target_time() as the FlushErrorState() is not called
>> there
>
> You must be right.  I recall having some trouble with strings being
> free'd prematurely, but if it's ERROR, then there should be no need for
> that.  I'll check again.

Hrm, after going through this again I'm pretty sure that was correct:
the only way to obtain the current error message is to use
CopyErrorData(), but that requires you to switch back to a non-error
memory context first (CopyErrorData() Asserts that you are not in the
error context).

The FlushErrorState() call is not there because it's handled by the hook
caller (or it can exit via ereport() with elevel==ERROR).
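
For reference, this is roughly the shape of the pattern I mean (an
illustrative sketch, not the patch code; the hook name and the parsing step
are placeholders):

static bool
check_recovery_target_time_sketch(char **newval, void **extra, GucSource source)
{
    MemoryContext oldcontext = CurrentMemoryContext;
    bool        ok = true;

    PG_TRY();
    {
        /* parse *newval as a timestamp here; this may ereport(ERROR) */
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        /* CopyErrorData() Asserts we are not in ErrorContext */
        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();
        FlushErrorState();      /* in the patch this is left to the hook caller */

        GUC_check_errdetail("%s", edata->message);
        FreeErrorData(edata);
        ok = false;
    }
    PG_END_TRY();

    return ok;
}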

>> - The new code in StartupXLOG() like
>> +if (recovery_target_xid_string != NULL)
>> +InitRecoveryTarget(RECOVERY_TARGET_XID);
>> +
>> +if (recovery_target_time_string != NULL)
>> +InitRecoveryTarget(RECOVERY_TARGET_TIME);
>> +
>> +if (recovery_target_name != NULL)
>> +InitRecoveryTarget(RECOVERY_TARGET_NAME);
>> +
>> +if (recovery_target_string != NULL)
>> +InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE);
>>
>> would probably be better in separate function that you then call
>> StartupXLOG() as StartupXLOG() is crazy long already so I think it's
>> preferable to not make it worse.
>
> We can move it to the top of CheckStartingAsStandby(), obviously.

Moved.

--
Alex



recovery_guc_v5.6.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-23 Thread Alex Shulgin

Petr Jelinek  writes:
>
> I had a quick look, the patch does not apply cleanly anymore but it's
> just release notes so nothing too bad.

Yes, there were some ongoing changes that touched some parts of this and
I must have missed the latest one (or maybe I was waiting for it to
settle down).

> I did not do any testing yet, but I have few comments about the code:
>
> - the patch mixes functionality change with the lowercasing of the
> config variables, I wonder if those could be separated into 2 separate
> diffs - it would make review somewhat easier, but I can live with it
> as it is if it's too much work to separate into 2 patches

If we can get the lowercasing committed soon, I'd be in favor of that,
otherwise it doesn't sound as pleasing, and there's some renaming to be
made too (that standby.enabled, trigger_file, etc.)

> - the StandbyModeRequested and StandbyMode should be lowercased like
> the rest of the GUCs

Yes, except that standby_mode is linked to StandbyModeRequested, not the
other one.  I can try to see if there's a sane way out of this.

> - I am wondering if StandbyMode and hot_standby should be merged into
> one GUC if we are breaking backwards compatibility anyway

I also have the feeling that there's room for merging some knobs
together.

> - Could you explain the reasoning behind:
> + if ((*newval)[0] == 0)
> + xid = InvalidTransactionId;
> + else
>
> in check_recovery_target_xid() (and same check in
> check_recovery_target_timeline()), isn't the strtoul which is called
> later enough?

Well, I think that makes it a bit more apparent to the reader, and we're
not abusing the fact that InvalidTransactionId happens to equal zero.
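
In other words, the check hook ends up looking roughly like this (a sketch of
the intent only, not the patch code; the function name is illustrative): the
empty string explicitly means "no recovery target xid" rather than relying on
strtoul() happening to return 0.

static bool
check_recovery_target_xid_sketch(char **newval, void **extra, GucSource source)
{
    TransactionId xid;

    if ((*newval)[0] == '\0')
        xid = InvalidTransactionId;     /* explicitly: no target xid set */
    else
    {
        errno = 0;
        xid = (TransactionId) strtoul(*newval, NULL, 0);
        if (errno == EINVAL || errno == ERANGE)
        {
            GUC_check_errdetail("\"%s\" is not a valid transaction id", *newval);
            return false;
        }
    }

    /* real code would stash xid in *extra for the assign hook */
    (void) xid;
    return true;
}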

> - The whole CopyErrorData and memory context logic is not needed in
> check_recovery_target_time() as the FlushErrorState() is not called
> there

You must be right.  I recall having some trouble with strings being
free'd prematurely, but if it's ERROR, then there should be no need for
that.  I'll check again.

> - The new code in StartupXLOG() like
> + if (recovery_target_xid_string != NULL)
> + InitRecoveryTarget(RECOVERY_TARGET_XID);
> +
> + if (recovery_target_time_string != NULL)
> + InitRecoveryTarget(RECOVERY_TARGET_TIME);
> +
> + if (recovery_target_name != NULL)
> + InitRecoveryTarget(RECOVERY_TARGET_NAME);
> +
> + if (recovery_target_string != NULL)
> + InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE);
>
> would probably be better in separate function that you then call
> StartupXLOG() as StartupXLOG() is crazy long already so I think it's
> preferable to not make it worse.

We can move it to the top of CheckStartingAsStandby(), obviously.

> - I wonder if we should rename trigger_file to standby_tigger_file

I was also suggesting that, just not sure if mixing it into the same
patch is any good.

Thank you for the review!
--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-19 Thread Alex Shulgin

Craig Ringer  writes:

> On 12/19/2014 11:41 PM, Alex Shulgin wrote:
>> I don't think so.  The scenario this patch relies on assumes that the
>> DBA will remember to look in the log if something goes wrong
>
> Well, actually, the whole point was that the user who's connecting
> (likely also the "DBA") will see a HINT telling them that there's more
> in the logs.
>
> But that got removed.

While it sounds useful at first glance, I believe Peter's arguments
upthread provide enough justification to avoid sending the hint to the
client.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-19 Thread Alex Shulgin

Steve Singer  writes:

> On 12/15/2014 11:38 AM, Alex Shulgin wrote:
>
>> These are all valid concerns IMHO. Attached is the modified version
>> of the original patch by Craig, addressing the handling of the new
>> hint_log error data field and removing the client-side HINT. I'm
>> also moving this to the current CF. -- Alex
>>
>>
>
>
> The updated patch removes the client message. I feel this addresses
> Peter's concern.  The updated patch also addresses the missing free I
> mentioned in my original review.
>
> The patch applies cleanly to head,
>
> One thing I noticed while testing is that if you do the following
>
> 1. Edit pg_hba.conf to allow a connection from somewhere
> 2. Attempt to connect; you get an error + the hint in the server log
> 3. Make another change to pg_hba.conf and introduce an error in the file
> 4. Attempt to reload the config files; the reload of pg_hba.conf fails
> because of the above error
> 5. Attempt to connect; you still can't connect since we have the
> original pg_hba.conf loaded
>
> You don't get the warning in step 5 since we update PgReloadTime even
> if the reload didn't work.
>
> Is this enough of a concern to justify the extra complexity of
> tracking the reload times of pg_hba and pg_ident directly?

I don't think so.  The scenario this patch relies on assumes that the
DBA will remember to look in the log if something goes wrong, and in
your case there would be a message like the following:

WARNING:  pg_hba.conf not reloaded

So an extra hint about file timestamp is unneeded.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat

2014-12-17 Thread Alex Shulgin

Alvaro Herrera  writes:

> Alex Shulgin wrote:
>
>> OK, I think I have now all bases covered, though the updated patch is
>> not that "pretty".
>> 
>> The problem is that we don't know in advance whether the (sub)transaction is
>> going to succeed or abort, and in the case of an aborted truncate we need to
>> use the stats gathered prior to the truncate.  Thus the need to track
>> the inserts/updates/deletes that happened before the first truncate separately.
>
> Ugh, this is messy indeed.  I grant that TRUNCATE is a tricky case to
> handle correctly, so some complexity is expected.  Can you please
> explain in detail how this works?

The main idea is that an aborted transaction can leave dead tuples behind
(that is, every insert and update), but when TRUNCATE is issued we need
to reset the insert/update/delete counters to 0: otherwise we won't get
accurate live and dead counts at commit time.

If the transaction that issued the TRUNCATE is instead aborted, the
insert/update counters that we were incrementing *after* the truncate are
not relevant to an accurate calculation of dead tuples in the original
relfilenode we are now back to due to the abort.  We need the insert/update
counts that happened *before* the first TRUNCATE, hence the need for
separate counters.

>> To the point of making a dedicated pgstat testing tool: let's have
>> another TODO item?
>
> Sure.

Added one.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat

2014-12-17 Thread Alex Shulgin
Alvaro Herrera  writes:
>
>> Even with the current approach of checking the stats after every
>> isolated case it's sometimes takes quite a little more than a glance
>> to verify correctness due to side-effects of rollback (ins/upd/del
>> counters are still updated), and the way stats are affecting the dead
>> tuples counter.
>
> Honestly I think pg_regress is not the right tool to test stat counter
> updates.  It kind-of works today, but only because we don't stress it
> too much.  If you want to create a new test framework for pgstats, and
> include some tests for truncate, be my guest.

OK, I think I now have all bases covered, though the updated patch is
not that "pretty".

The problem is that we don't know in advance whether the (sub)transaction
is going to succeed or abort, and in the case of an aborted truncate we need
to use the stats gathered prior to the truncate.  Thus the need to track
the inserts/updates/deletes that happened before the first truncate
separately.

To the point of making a dedicated pgstat testing tool: let's have
another TODO item?

--
Alex

>From 0b3161191a3ddb999cd9d0da08e1b6088ce07a84 Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Tue, 9 Dec 2014 16:35:14 +0300
Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats.

The n_live_tup and n_dead_tup counters need to be set to 0 after a
TRUNCATE on the relation.  We can't issue a special message to the stats
collector because the xact might be later aborted, so we track the fact
that the relation was truncated during the xact (and reset this xact's
insert/update/delete counters).  When xact is committed, we use the
`truncated' flag to reset the n_live_tup and n_dead_tup counters.
---
 src/backend/commands/tablecmds.c |  3 +
 src/backend/postmaster/pgstat.c  | 98 ++--
 src/include/pgstat.h | 14 ++--
 src/test/regress/expected/prepared_xacts.out | 50 ++
 src/test/regress/expected/stats.out  | 63 ++
 src/test/regress/sql/prepared_xacts.sql  | 27 
 src/test/regress/sql/stats.sql   | 68 +++
 7 files changed, 315 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index 1e737a0..4f0e3d8
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 71,76 
--- 71,77 
  #include "parser/parse_type.h"
  #include "parser/parse_utilcmd.h"
  #include "parser/parser.h"
+ #include "pgstat.h"
  #include "rewrite/rewriteDefine.h"
  #include "rewrite/rewriteHandler.h"
  #include "rewrite/rewriteManip.h"
*** ExecuteTruncate(TruncateStmt *stmt)
*** 1224,1229 
--- 1225,1232 
  			 */
  			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
  		}
+ 
+ 		pgstat_count_heap_truncate(rel);
  	}
  
  	/*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index f71fdeb..9d19cf9
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** typedef struct TwoPhasePgStatRecord
*** 199,206 
--- 199,210 
  	PgStat_Counter tuples_inserted;		/* tuples inserted in xact */
  	PgStat_Counter tuples_updated;		/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
+ 	PgStat_Counter inserted_pre_trunc;	/* tuples inserted prior to truncate */
+ 	PgStat_Counter updated_pre_trunc;	/* tuples updated prior to truncate */
+ 	PgStat_Counter deleted_pre_trunc;	/* tuples deleted prior to truncate */
  	Oid			t_id;			/* table's OID */
  	bool		t_shared;		/* is it a shared catalog? */
+ 	bool		t_truncated;	/* was the relation truncated? */
  } TwoPhasePgStatRecord;
  
  /*
*** pgstat_count_heap_delete(Relation rel)
*** 1863,1868 
--- 1867,1919 
  	}
  }
  
+ static void
+ record_xact_truncate_stats(PgStat_TableXactStatus *trans)
+ {
+ 	if (!trans->truncated)
+ 	{
+ 		trans->inserted_pre_trunc = trans->tuples_inserted;
+ 		trans->updated_pre_trunc = trans->tuples_updated;
+ 		trans->deleted_pre_trunc = trans->tuples_deleted;
+ 	}
+ }
+ 
+ static void
+ restore_xact_truncate_stats(PgStat_TableXactStatus *trans)
+ {
+ 	if (trans->truncated)
+ 	{
+ 		trans->tuples_inserted = trans->inserted_pre_trunc;
+ 		trans->tuples_updated = trans->updated_pre_trunc;
+ 		trans->tuples_deleted = trans->deleted_pre_trunc;
+ 	}
+ }
+ 
+ /*
+  * pgstat_count_heap_truncate - update tuple counters due to truncate
+  */
+ void
+ pgstat_count_heap_truncate(Relation rel)
+ {
+ 	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ 
+ 	if (pgstat_info != NULL)
+ 	{
+ 		/* We have to log the effect at the proper transactional level */
+ 		int			nest_level = GetCurren

Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat

2014-12-16 Thread Alex Shulgin

Alvaro Herrera  writes:

> Alex Shulgin wrote:
>
>> Alvaro Herrera  writes:
>> >> 
>> >> Another idea would be exposing pgstat_report_stat(true) at SQL level.
>> >> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll
>> >> still need the wait_for_... call to make sure the collector has picked
>> >> it up.
>> >
>> > We already have a stats test that sleeps.  Why not add this stuff there,
>> > to avoid making another test slow?
>> 
>> Well, if we could come up with a set of statements to test that would
>> produce the end result unambiguously, so that we can be certain the stats
>> we check at the end cannot be the result of a neat interaction of buggy
>> behaviors...
>
> It is always possible that things work just right because two bugs
> cancel each other.
>
>> I'm not sure this is at all possible, but I know for sure it will make
>> debugging the possible fails harder.
>
> Surely if some other patch introduces a failure, nobody will try to
> debug it by looking only at the failures of this test.
>
>> Even with the current approach of checking the stats after every
>> isolated case, it sometimes takes quite a bit more than a glance
>> to verify correctness, due to the side-effects of rollback (ins/upd/del
>> counters are still updated) and the way the stats affect the dead
>> tuples counter.
>
> Honestly I think pg_regress is not the right tool to test stat counter
> updates.  It kind-of works today, but only because we don't stress it
> too much.  If you want to create a new test framework for pgstats, and
> include some tests for truncate, be my guest.

Yes, these are all good points.  Actually, moving the test to stats.sql
helped me realize the current patch behavior is not strictly correct:
there's a corner case when you insert/update before a truncate in a
transaction which is later aborted.  I need to take a closer look.

--
Alex
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat

2014-12-16 Thread Alex Shulgin

Alvaro Herrera  writes:
>> 
>> Another idea would be exposing pgstat_report_stat(true) at the SQL level.
>> That would eliminate the need for an explicit pg_sleep(>=0.5), but we'll
>> still need the wait_for_... call to make sure the collector has picked
>> it up.
>
> We already have a stats test that sleeps.  Why not add this stuff there,
> to avoid making another test slow?

Well, if we could come up with a set of statements to test that would
produce the end result unambiguously, so that we can be certain the stats
we check at the end cannot be the result of a neat interaction of buggy
behaviors...

I'm not sure this is at all possible, but I know for sure it will make
debugging the possible failures harder.  Even with the current approach
of checking the stats after every isolated case, it sometimes takes
quite a bit more than a glance to verify correctness, due to the side
effects of rollback (ins/upd/del counters are still updated) and the way
the stats affect the dead tuples counter.

I'll try to see if the checks can be converged though.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat

2014-12-16 Thread Alex Shulgin
Jim Nasby  writes:

> https://commitfest.postgresql.org/action/patch_view?id=1661 (apologies for 
> not replying to the thread; I can't find it in my inbox)
>
> Patch applies and passes make check. Code formatting looks good.

Jim,

> The regression test partially tests this. It does not cover 2PC, nor
> does it test rolling back a subtransaction that contains a
> truncate. The latter actually doesn't work correctly.

Thanks for pointing out the missing 2PC test, I've added one.

The test you've added for rolling back a subxact actually works
correctly, if you consider the fact that aborted (sub)xacts still
account for insert/update/delete in pgstat.  I've added this test with
the corrected expected results.

> The test also adds 2.5 seconds of forced pg_sleep. I think that's both
> bad and unnecessary. When I removed the sleeps I still saw times of
> less than 0.1 seconds.

Well, I never liked that part, but the stats don't get updated if we
don't put the session to sleep for at least PGSTAT_STAT_INTERVAL (which
is 500ms).

Removing these extra sleep calls would theoretically not make a
difference, as wait_for_trunc_test_stats() seems to have enough sleep
calls itself; but since pgstat_report_stat() is only called from the
backend's main loop, there is no way around placing the explicit
pg_sleep at top level if we want to be able to check the effects
reproducibly.

Another idea would be exposing pgstat_report_stat(true) at SQL level.
That would eliminate the need for explicit pg_sleep(>=0.5), but we'll
still need the wait_for_... call to make sure the collector has picked
it up.
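
For reference, this is roughly the kind of polling function I mean (a
sketch modelled on the existing polling function in stats.sql; the table
name and the condition being polled here are only placeholders):

  CREATE FUNCTION wait_for_trunc_test_stats() RETURNS void AS $$
  DECLARE
    start_time timestamptz := clock_timestamp();
    updated bool;
  BEGIN
    -- wait at most ~30 seconds for the collector to report our table
    FOR i IN 1 .. 300 LOOP
      SELECT (n_tup_ins > 0) INTO updated
        FROM pg_stat_user_tables
       WHERE relname = 'trunc_stats_test';

      EXIT WHEN updated;

      PERFORM pg_sleep(0.1);  -- not reported yet, wait a little longer
    END LOOP;

    -- log how long we ended up waiting (or that we timed out)
    RAISE LOG 'wait_for_trunc_test_stats delayed % seconds',
      extract(epoch FROM clock_timestamp() - start_time);
  END
  $$ LANGUAGE plpgsql;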

> Also, wait_for_trunc_test_stats() should display something if it times
> out; otherwise you'll have a test failure and won't have any
> indication why.

Oh, good catch.  Since I've copied this function from stats.sql, we
might want to update that one too in a separate commit.

> I've attached a patch that adds logging on timeout and contains a test
> case that demonstrates the rollback to savepoint bug.

I'm attaching the updated patch version.

Thank you for the review!
--
Alex

PS: re: your CF comment: I'm producing the patches using

  git format-patch --ext-diff

where diff.external is set to '/bin/bash src/tools/git-external-diff'.

Now that I try to apply it using git, looks like git doesn't like the
copied context diff very much...

>From cc51823a01a194ef6fcd90bc763fa26498837322 Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Tue, 9 Dec 2014 16:35:14 +0300
Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats.

The n_live_tup and n_dead_tup counters need to be set to 0 after a
TRUNCATE on the relation.  We can't issue a special message to the stats
collector because the xact might be later aborted, so we track the fact
that the relation was truncated during the xact (and reset this xact's
insert/update/delete counters).  When xact is committed, we use the
`truncated' flag to reset the n_live_tup and n_dead_tup counters.
---
 src/backend/commands/tablecmds.c |   3 +
 src/backend/postmaster/pgstat.c  |  52 -
 src/include/pgstat.h |   3 +
 src/test/regress/expected/prepared_xacts.out |  50 
 src/test/regress/expected/truncate.out   | 164 +++
 src/test/regress/sql/prepared_xacts.sql  |  27 +
 src/test/regress/sql/truncate.sql| 118 +++
 7 files changed, 414 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index 1e737a0..4f0e3d8
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 71,76 
--- 71,77 
  #include "parser/parse_type.h"
  #include "parser/parse_utilcmd.h"
  #include "parser/parser.h"
+ #include "pgstat.h"
  #include "rewrite/rewriteDefine.h"
  #include "rewrite/rewriteHandler.h"
  #include "rewrite/rewriteManip.h"
*** ExecuteTruncate(TruncateStmt *stmt)
*** 1224,1229 
--- 1225,1232 
  			 */
  			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
  		}
+ 
+ 		pgstat_count_heap_truncate(rel);
  	}
  
  	/*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index f71fdeb..b02e4a1
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** typedef struct TwoPhasePgStatRecord
*** 201,206 
--- 201,207 
  	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
  	bool		t_shared;		/* is it a shared catalog? */
+ 	bool		t_truncated;	/* was the relation truncated? */
  } TwoPhasePgStatRecord;
  
  /*
*** pgstat_count_heap_delete(Relation rel)
*** 1864,

Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-15 Thread Alex Shulgin
Peter Eisentraut  writes:

> On 10/16/14 11:34 PM, Craig Ringer wrote:
>> psql: FATAL:  Peer authentication failed for user "fred"
>> HINT:  See the server error log for additional information.
>
> I think this is wrong for many reasons.
>
> I have never seen an authentication system that responds with, hey, what
> you just did didn't get you in, but the administrators are currently in
> the process of making a configuration change, so why don't you check
> that out.
>
> We don't know whether the user has access to the server log.  They
> probably don't.  Also, it is vastly more likely that the user really
> doesn't have access in the way they chose, so throwing in irrelevant
> hints will be distracting.
>
> Moreover, it will be confusing to regular users if this message
> sometimes shows up and sometimes doesn't, independent of their own state
> and actions.
>
> Finally, the fact that a configuration change is in progress is
> privileged information.  Unprivileged users can deduce from the presence
> of this message that administrators are doing something, and possibly
> that they have done something wrong.
>
> I think it's fine to log a message in the server log if the pg_hba.conf
> file needs reloading.  But the client shouldn't know about this at all.

These are all valid concerns IMHO.

Attached is the modified version of the original patch by Craig,
addressing the handling of the new hint_log error data field and
removing the client-side HINT.

I'm also moving this to the current CF.

--
Alex

>From 702e0ac31f3d8023ad8c064d90bdf5a8441fddea Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 17 Oct 2014 11:18:18 +0800
Subject: [PATCH 1/2] Add an errhint_log, akin to errdetail_log

This allows a different HINT to be sent to the server error log
and to the client, which will be useful where there's security
sensitive information that's more appropriate for a HINT than
a DETAIL message.
---
 src/backend/utils/error/elog.c | 59 --
 src/include/utils/elog.h   |  7 +
 2 files changed, 53 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
new file mode 100644
index 2316464..da55207
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*** errfinish(int dummy,...)
*** 503,508 
--- 503,510 
  		pfree(edata->detail_log);
  	if (edata->hint)
  		pfree(edata->hint);
+ 	if (edata->hint_log)
+ 		pfree(edata->hint_log);
  	if (edata->context)
  		pfree(edata->context);
  	if (edata->schema_name)
*** errhint(const char *fmt,...)
*** 1015,1020 
--- 1017,1042 
  	return 0;	/* return value does not matter */
  }
  
+ /*
+  * errhint_log --- add a hint_log error message text to the current error
+  */
+ int
+ errhint_log(const char *fmt,...)
+ {
+ 	ErrorData  *edata = &errordata[errordata_stack_depth];
+ 	MemoryContext oldcontext;
+ 
+ 	recursion_depth++;
+ 	CHECK_STACK_DEPTH();
+ 	oldcontext = MemoryContextSwitchTo(edata->assoc_context);
+ 
+ 	EVALUATE_MESSAGE(edata->domain, hint_log, false, true);
+ 
+ 	MemoryContextSwitchTo(oldcontext);
+ 	recursion_depth--;
+ 	return 0;	/* return value does not matter */
+ }
+ 
  
  /*
   * errcontext_msg --- add a context error message text to the current error
*** CopyErrorData(void)
*** 1498,1503 
--- 1520,1527 
  		newedata->detail_log = pstrdup(newedata->detail_log);
  	if (newedata->hint)
  		newedata->hint = pstrdup(newedata->hint);
+ 	if (newedata->hint_log)
+ 		newedata->hint_log = pstrdup(newedata->hint_log);
  	if (newedata->context)
  		newedata->context = pstrdup(newedata->context);
  	if (newedata->schema_name)
*** FreeErrorData(ErrorData *edata)
*** 1536,1541 
--- 1560,1567 
  		pfree(edata->detail_log);
  	if (edata->hint)
  		pfree(edata->hint);
+ 	if (edata->hint_log)
+ 		pfree(edata->hint_log);
  	if (edata->context)
  		pfree(edata->context);
  	if (edata->schema_name)
*** ThrowErrorData(ErrorData *edata)
*** 1607,1612 
--- 1633,1640 
  		newedata->detail_log = pstrdup(edata->detail_log);
  	if (edata->hint)
  		newedata->hint = pstrdup(edata->hint);
+ 	if (edata->hint_log)
+ 		newedata->hint_log = pstrdup(edata->hint_log);
  	if (edata->context)
  		newedata->context = pstrdup(edata->context);
  	if (edata->schema_name)
*** ReThrowError(ErrorData *edata)
*** 1669,1674 
--- 1697,1704 
  		newedata->detail_log = pstrdup(newedata->detail_log);
  	if (newedata->hint)
  		newedata->hint = pstrdup(newedata->hint);
+ 	if (newedata->hint_log)
+ 		newedata->hint_log = pstrdup(newedata->hint_log);
 

Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-12-15 Thread Alex Shulgin
Michael Paquier  writes:
>>>
>>> Perhaps ssloptions.[ch], unless you plan to add non-option-related code
>>> there later?
>>
>> I don't think anything other than common options-related code fits in
>> there, so ssloptions.c makes sense to me.
>>
>>> BTW, there is no Regent code in your openssl.c, so the copyright
>>> statement is incorrect.
>>
>> Good catch, I just blindly copied that from some other file.
> There have been arguments in favor and against this patch... But
> marking it as returned with feedback because of a lack of activity in
> the last couple of weeks. Feel free to update if you think that's not
> correct, and please move this patch to commit fest 2014-12.

Attached is a new version that addresses the earlier feedback: renamed
the added *.[ch] files and removed incorrect copyright line.

I'm moving this to the current CF.

--
Alex

>From 18388300f9ed34fa47c66b8a2da098aeb19a7f71 Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Wed, 19 Nov 2014 19:49:29 +0300
Subject: [PATCH] DRAFT: ssl_protocols GUC

---
 doc/src/sgml/config.sgml  |  29 ++
 doc/src/sgml/libpq.sgml   |  25 ++
 src/backend/libpq/Makefile|   2 +-
 src/backend/libpq/be-secure-openssl.c |  29 --
 src/backend/libpq/be-secure.c |   3 +-
 src/backend/libpq/ssloptions.c| 123 ++
 src/backend/utils/misc/guc.c  |  15 
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/libpq/libpq.h |   1 +
 src/include/libpq/ssloptions.h|  48 ++
 src/interfaces/libpq/Makefile |   8 +-
 src/interfaces/libpq/fe-connect.c |   4 +
 src/interfaces/libpq/fe-secure-openssl.c  |  18 +++-
 src/interfaces/libpq/libpq-int.h  |   1 +
 14 files changed, 297 insertions(+), 10 deletions(-)
 create mode 100644 src/backend/libpq/ssloptions.c
 create mode 100644 src/include/libpq/ssloptions.h

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 48ae3e4..699f0f9
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include_dir 'conf.d'
*** 1055,1060 
--- 1055,1089 

   
  
+  
+   ssl_protocols (string)
+   
+ssl_protocols configuration parameter
+   
+   
+   
+
+ Specifies a colon-separated list of SSL protocols that are
+ allowed to be used on secure connections. Each entry in the list can
+ be prefixed by a + (add to the current list)
+ or - (remove from the current list). If no prefix is used,
+ the list is replaced with the specified protocol.
+
+
+ The full list of supported protocols can be found in the
+ openssl manual page.  In addition to the names of
+ individual protocols, the following keywords can be
+ used: ALL (all protocols supported by the underlying
+ crypto library), SSL (all supported versions
+ of SSL) and TLS (all supported versions
+ of TLS).
+
+
+ The default is ALL:-SSL.
+
+   
+  
+ 
   
ssl_ciphers (string)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
new file mode 100644
index d829a4b..62ee0b4
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** postgresql://%2Fvar%2Flib%2Fpostgresql/d
*** 1230,1235 
--- 1230,1250 

   
  
+  
+   sslprotocols
+   
+
+ Specifies a colon-separated list of SSL protocols that are
+ allowed to be used on secure connections.
+ See  server configuration parameter
+ for format.
+
+
+ Defaults to ALL:-SSL.
+
+   
+  
+ 
   
sslcompression

*** myEventProc(PGEventId evtId, void *evtIn
*** 6693,6698 
--- 6708,6723 
   
  
  
+ 
+  
+   
+PGSSLPROTOCOLS
+   
+   PGSSLPROTOCOLS behaves the same as the  connection parameter.
+  
+ 
+ 
  
   

diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
new file mode 100644
index 09410c4..62abd46
*** a/src/backend/libpq/Makefile
--- b/src/backend/libpq/Makefile
*** OBJS = be-fsstubs.o be-secure.o auth.o c
*** 18,24 
 pqformat.o pqmq.o pqsignal.o
  
  ifeq ($(with_openssl),yes)
! OBJS += be-secure-openssl.o
  endif
  
  include $(top_srcdir)/src/backend/common.mk
--- 18,24 
 pqformat.o pqmq.o pqsignal.o
  
  ifeq ($(with_openssl),yes)
! OBJS += be-secure-openssl.o ssloptions.o
  endif
  
  include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-s

Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-12 Thread Alex Shulgin
Alex Shulgin  writes:

> Alex Shulgin  writes:
>>
>> Here's an attempt to revive this patch.
>
> Here's the patch rebased against current HEAD, that is, including the
> recently committed action_at_recovery_target option.
>
> The default for the new GUC is 'pause', as in HEAD, and
> pause_at_recovery_target is removed completely in favor of it.
>
> I've also taken the liberty to remove that part that errors out when
> finding $PGDATA/recovery.conf.  Now get your rotten tomatoes ready. ;-)

This was rather short-sighted, so I've restored that part.

Also, rebased on current HEAD, following the rename of
action_at_recovery_target to recovery_target_action.

--
Alex



recovery_guc_v5.5.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL information view

2014-12-11 Thread Alex Shulgin

Magnus Hagander  writes:
>>
>> You should add it to the next CF for proper tracking, there are already many
>> patches in the queue waiting for reviews :)
>
> Absolutely - I just wanted those that were already involved in the
> thread to get a chance to look at it early :) didn't want to submit it
> until it was complete.
>
> Which it is now - attached is a new version that includes docs.

Here's my review of the patch:

* Applies to the current HEAD, no failed hunks.
* Compiles and works as advertised.
* Docs included.


The following catches my eye:

psql (9.5devel)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 
256, compression: off)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
 pid  | ssl | bits | compression | version |           cipher            | clientdn
------+-----+------+-------------+---------+-----------------------------+----------
 1343 | t   |  256 | f           | TLSv1.2 | ECDHE-RSA-AES256-GCM-SHA384 | 
(1 row)

I think the order of details in the psql prompt makes more sense,
because it puts more important details first.

I suggest we reorder the view columns, while also renaming 'version' to
'protocol' (especially since we have another patch in the works that
adds a GUC named 'ssl_protocols'):

  pid, ssl, protocol, cipher, bits, compression, clientdn.
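
To illustrate, the same information with the columns aliased and
reordered as proposed (just a sketch against the view as it stands in
the patch):

  SELECT pid, ssl, version AS protocol, cipher, bits, compression, clientdn
    FROM pg_stat_ssl;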


Next, this one:

+ be_tls_get_cipher(Port *port, char *ptr, size_t len)
+ {
+   if (port->ssl)
+   strlcpy(ptr, SSL_get_cipher(port->ssl), NAMEDATALEN);

should be this:

+   strlcpy(ptr, SSL_get_cipher(port->ssl), len);

The same goes for this one:

+ be_tls_get_peerdn_name(Port *port, char *ptr, size_t len)
+ {
+   if (port->peer)
+   strlcpy(ptr, 
X509_NAME_to_cstring(X509_get_subject_name(port->peer)), NAMEDATALEN);

The NAMEDATALEN constant is passed in the calling code in
pgstat_bestart().


Other than that, looks good to me.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small TRUNCATE glitch

2014-12-10 Thread Alex Shulgin
Bruce Momjian  writes:

> On Wed, Dec 10, 2014 at 10:32:42AM +0200, Heikki Linnakangas wrote:
>> >I don't think we need to have 2PC files survive a pg_upgrade.  It seems
>> >perfectly okay to remove them from the new cluster at some appropriate
>> >time, *if* they are copied from the old cluster at all (I don't think
>> >they should be.)
>> 
>> I think pg_upgrade should check if there are any prepared
>> transactions pending, and refuse to upgrade if there are. It could
>> be made to work, but it's really not worth the trouble. If there are
>> any pending prepared transactions in the system when you run
>> pg_upgrade, it's more likely to be a mistake or oversight in the
>> first place, than on purpose.
>
> pg_upgrade already checks for prepared transactions and errors out if
> they exist;  see check_for_prepared_transactions().

Alright, that's good to know.  So I'm just adding a new bool field to
the 2PC pgstat record instead of the bit magic.

Attached is v0.2, now with a regression test included.

--
Alex

>From 4c8fae27ecd9c94e7c3da0997f03099045a152d9 Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Tue, 9 Dec 2014 16:35:14 +0300
Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats.

The n_live_tup and n_dead_tup counters need to be set to 0 after a
TRUNCATE on the relation.  We can't issue a special message to the stats
collector because the xact might be later aborted, so we track the fact
that the relation was truncated during the xact (and reset this xact's
insert/update/delete counters).  When xact is committed, we use the
`truncated' flag to reset the n_live_tup and n_dead_tup counters.
---
 src/backend/commands/tablecmds.c   |   2 +
 src/backend/postmaster/pgstat.c|  52 -
 src/include/pgstat.h   |   3 +
 src/test/regress/expected/truncate.out | 136 +
 src/test/regress/sql/truncate.sql  |  98 
 5 files changed, 288 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index 1e737a0..192d033
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ExecuteTruncate(TruncateStmt *stmt)
*** 1224,1229 
--- 1224,1231 
  			 */
  			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
  		}
+ 
+ 		pgstat_count_heap_truncate(rel);
  	}
  
  	/*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index c7f41a5..88c83d2
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** typedef struct TwoPhasePgStatRecord
*** 201,206 
--- 201,207 
  	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
  	bool		t_shared;		/* is it a shared catalog? */
+ 	bool		t_truncated;	/* was the relation truncated? */
  } TwoPhasePgStatRecord;
  
  /*
*** pgstat_count_heap_delete(Relation rel)
*** 1864,1869 
--- 1865,1894 
  }
  
  /*
+  * pgstat_count_heap_truncate - update tuple counters due to truncate
+  */
+ void
+ pgstat_count_heap_truncate(Relation rel)
+ {
+ 	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ 
+ 	if (pgstat_info != NULL)
+ 	{
+ 		/* We have to log the effect at the proper transactional level */
+ 		int			nest_level = GetCurrentTransactionNestLevel();
+ 
+ 		if (pgstat_info->trans == NULL ||
+ 			pgstat_info->trans->nest_level != nest_level)
+ 			add_tabstat_xact_level(pgstat_info, nest_level);
+ 
+ 		pgstat_info->trans->tuples_inserted = 0;
+ 		pgstat_info->trans->tuples_updated = 0;
+ 		pgstat_info->trans->tuples_deleted = 0;
+ 		pgstat_info->trans->truncated = true;
+ 	}
+ }
+ 
+ /*
   * pgstat_update_heap_dead_tuples - update dead-tuples count
   *
   * The semantics of this are that we are reporting the nontransactional
*** AtEOXact_PgStat(bool isCommit)
*** 1927,1932 
--- 1952,1959 
  			tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
  			if (isCommit)
  			{
+ tabstat->t_counts.t_truncated = trans->truncated;
+ 
  /* insert adds a live tuple, delete removes one */
  tabstat->t_counts.t_delta_live_tuples +=
  	trans->tuples_inserted - trans->tuples_deleted;
*** AtEOSubXact_PgStat(bool isCommit, int ne
*** 1991,1999 
  			{
  if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  {
! 	trans->upper->tuples_inserted += trans->tuples_inserted;
! 	trans->upper->tuples_updated += trans->tuples_updated;
! 	trans->upper->tuples_deleted += trans->tuples_deleted;
  	tabstat->trans = trans->upper;
  	pfree(trans);
  }
--- 2018,2037 
  			{
  if (trans-

Re: [HACKERS] Small TRUNCATE glitch

2014-12-09 Thread Alex Shulgin

Alex Shulgin  writes:
>
> Bruce Momjian  writes:
>>
>> Added to TODO:
>>
>> o Clear table counters on TRUNCATE
>>
>>   http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php
>
> Hello,
>
> Attached is a WIP patch for this TODO.

This part went as an attachment, which wasn't my intent:


It does the trick by tracking whether a TRUNCATE command was issued in a
(sub)transaction and using this knowledge to reset the live/dead tuple
counters later if the transaction was committed.  Testing in simple
cases shows that this clears the counters correctly, including use of
savepoints.
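
For example, a rollback to a savepoint is expected to discard the
tracked truncate (a minimal sketch; the table name is made up, and the
counter shown is only meaningful once the stats have been reported and
collected):

  CREATE TABLE trunc_savepoint_test (id int);
  INSERT INTO trunc_savepoint_test SELECT generate_series(1, 10);
  BEGIN;
  SAVEPOINT before_trunc;
  TRUNCATE trunc_savepoint_test;
  ROLLBACK TO SAVEPOINT before_trunc;  -- truncate's effect must be discarded
  COMMIT;
  SELECT n_live_tup FROM pg_stat_user_tables
   WHERE relname = 'trunc_savepoint_test';  -- should still reflect the 10 rows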

The 2PC part requires extending the bool flag to fit the trunc flag; is
this approach sane?  Given that a 2PC transaction should survive a server
restart, it's reasonable to expect it to also survive an upgrade, so I
see no clean way of adding another bool field to the
TwoPhasePgStatRecord struct (unless someone would like to add checks on
record length, etc.).

I'm going to add some regression tests, but I'm not sure what the best
location for them would be.  truncate.sql seems like the natural choice,
but the stats are not updated in real time, so I'd need to borrow some
tricks from stats.sql; or would it be better to put the new tests in
stats.sql itself?

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small TRUNCATE glitch

2014-12-09 Thread Alex Shulgin
Bruce Momjian  writes:
>
> Added to TODO:
>
> o Clear table counters on TRUNCATE
>
>   http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php

Hello,

Attached is a WIP patch for this TODO.

>From 97665ef1ca7d1847e90d4dfab38562135f01fb2b Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Tue, 9 Dec 2014 16:35:14 +0300
Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats.

The n_live_tup and n_dead_tup counters need to be set to 0 after a
TRUNCATE on the relation.  We can't issue a special message to the stats
collector because the xact might be later aborted, so we track the fact
that the relation was truncated during the xact (and reset this xact's
insert/update/delete counters).  When xact is committed, we use the
`truncated' flag to reset the n_live_tup and n_dead_tup counters.
---
 src/backend/commands/tablecmds.c |  2 ++
 src/backend/postmaster/pgstat.c  | 70 
 src/include/pgstat.h |  3 ++
 3 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index 1e737a0..192d033
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ExecuteTruncate(TruncateStmt *stmt)
*** 1224,1229 
--- 1224,1231 
  			 */
  			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
  		}
+ 
+ 		pgstat_count_heap_truncate(rel);
  	}
  
  	/*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index c7f41a5..7ff66b5
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** typedef struct TwoPhasePgStatRecord
*** 200,208 
  	PgStat_Counter tuples_updated;		/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	bool		t_shared;		/* is it a shared catalog? */
  } TwoPhasePgStatRecord;
  
  /*
   * Info about current "snapshot" of stats file
   */
--- 200,211 
  	PgStat_Counter tuples_updated;		/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	char		t_flags;		/* see TWOPHASE_PGSTAT_RECORD_*_FLAGs */
  } TwoPhasePgStatRecord;
  
+ #define TWOPHASE_PGSTAT_RECORD_SHARED_FLAG	0x01	/* is it a shared catalog? */
+ #define TWOPHASE_PGSTAT_RECORD_TRUNC_FLAG	0x02	/* was the relation truncated? */
+ 
  /*
   * Info about current "snapshot" of stats file
   */
*** pgstat_count_heap_delete(Relation rel)
*** 1864,1869 
--- 1867,1896 
  }
  
  /*
+  * pgstat_count_heap_truncate - update tuple counters due to truncate
+  */
+ void
+ pgstat_count_heap_truncate(Relation rel)
+ {
+ 	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ 
+ 	if (pgstat_info != NULL)
+ 	{
+ 		/* We have to log the effect at the proper transactional level */
+ 		int			nest_level = GetCurrentTransactionNestLevel();
+ 
+ 		if (pgstat_info->trans == NULL ||
+ 			pgstat_info->trans->nest_level != nest_level)
+ 			add_tabstat_xact_level(pgstat_info, nest_level);
+ 
+ 		pgstat_info->trans->tuples_inserted = 0;
+ 		pgstat_info->trans->tuples_updated = 0;
+ 		pgstat_info->trans->tuples_deleted = 0;
+ 		pgstat_info->trans->truncated = true;
+ 	}
+ }
+ 
+ /*
   * pgstat_update_heap_dead_tuples - update dead-tuples count
   *
   * The semantics of this are that we are reporting the nontransactional
*** AtEOXact_PgStat(bool isCommit)
*** 1927,1932 
--- 1954,1961 
  			tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
  			if (isCommit)
  			{
+ tabstat->t_counts.t_truncated = trans->truncated;
+ 
  /* insert adds a live tuple, delete removes one */
  tabstat->t_counts.t_delta_live_tuples +=
  	trans->tuples_inserted - trans->tuples_deleted;
*** AtEOSubXact_PgStat(bool isCommit, int ne
*** 1991,1999 
  			{
  if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  {
! 	trans->upper->tuples_inserted += trans->tuples_inserted;
! 	trans->upper->tuples_updated += trans->tuples_updated;
! 	trans->upper->tuples_deleted += trans->tuples_deleted;
  	tabstat->trans = trans->upper;
  	pfree(trans);
  }
--- 2020,2039 
  			{
  if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  {
! 	if (trans->truncated)
! 	{
! 		trans->upper->truncated = true;
! 		/* replace upper xact stats with ours */
! 		trans->upper->tuples_inserted = trans->tuples_inserted;
! 		trans->upper->tuples_updated = trans->tuples_updated;
! 		trans->upper->tuples_deleted = trans->tuples_deleted;
! 	}
! 	else
! 	{
! 		trans->upper->tu

Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-04 Thread Alex Shulgin

Alvaro Herrera  writes:

> Alex Shulgin wrote:
>
>> DEBUG:  inserting column 7 value "varchar_transform"
>> 
>> Breakpoint 1, GetSnapshotData (snapshot=0xdb2d60 )
>> at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1413
>> 1413 xmax = ShmemVariableCache->latestCompletedXid;
>> 
>> (gdb) p ShmemVariableCache->latestCompletedXid 
>> $1 = 4294967295
>
> I think you might need to "make distclean", then recompile.  If you're
> not passing --enable-depend to configure, I suggest you do so; that
> greatly reduces such problems.

Yeah, that did it.  Sorry for the noise, I must have messed it up with
the recompilations (note to self to always use --enable-depend).

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-04 Thread Alex Shulgin

Alvaro Herrera  writes:

> Alex Shulgin wrote:
>
>> DEBUG:  inserting column 7 value "varchar_transform"
>> 
>> Breakpoint 1, GetSnapshotData (snapshot=0xdb2d60 )
>> at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1413
>> 1413 xmax = ShmemVariableCache->latestCompletedXid;
>> 
>> (gdb) p ShmemVariableCache->latestCompletedXid 
>> $1 = 4294967295
>
> I think you might need to "make distclean", then recompile.  If you're
> not passing --enable-depend to configure, I suggest you do so; that
> greatly reduces such problems.

Well, I tried that before posting the complaint; let me try again, though.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-04 Thread Alex Shulgin

Alex Shulgin  writes:
>
> Figured it out with a pg_usleep in bootstrap.c anyway.  Does this look sane?
>
>
> DEBUG:  inserting column 6 value "0"
> DEBUG:  inserted -> 0
> DEBUG:  inserting column 7 value "varchar_transform"
> TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))", File: 
> "/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c", Line: 1414)

I've tried to debug this and I feel really dumbfound...


DEBUG:  inserting column 7 value "varchar_transform"

Breakpoint 1, GetSnapshotData (snapshot=0xdb2d60 )
at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1413
1413xmax = ShmemVariableCache->latestCompletedXid;

(gdb) p ShmemVariableCache->latestCompletedXid 
$1 = 4294967295

(gdb) p *ShmemVariableCache
$2 = {nextOid = 1, oidCount = 0, nextXid = 3, oldestXid = 3, 
  xidVacLimit = 20003, xidWarnLimit = 2136483650, 
  xidStopLimit = 2146483650, xidWrapLimit = 2147483650, oldestXidDB = 1, 
  oldestCommitTs = 1, newestCommitTs = 0, latestCompletedXid = 4294967295}

(gdb) p xmax
$3 = 0

(gdb) n
1414Assert(TransactionIdIsNormal(xmax));

(gdb) p xmax
$4 = 1

(gdb) p *ShmemVariableCache
$5 = {nextOid = 1, oidCount = 0, nextXid = 3, oldestXid = 3, 
  xidVacLimit = 20003, xidWarnLimit = 2136483650, 
  xidStopLimit = 2146483650, xidWrapLimit = 2147483650, oldestXidDB = 1, 
  oldestCommitTs = 1, newestCommitTs = 0, latestCompletedXid = 4294967295}

(gdb) p ShmemVariableCache->latestCompletedXid 
$6 = 4294967295

(gdb) 


How?  Is there another concurrent process with an old view of the
VariableCacheData struct, where latestCompletedXid still points to
oldestCommitTs?

This only happens with the CommitTs commit in effect.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-04 Thread Alex Shulgin

Craig Ringer  writes:

> On 12/04/2014 10:50 PM, Alex Shulgin wrote:
>> Is there a way to pause the bootstrap process so I can attach gdb to it?
>
> With a newer gdb, you can instead tell gdb to follow all forks. I wrote
> some notes on it recently.
>
> http://blog.2ndquadrant.com/processes-breakpoints-watchpoints-postgresql/
>
> I've found it really handy when debugging crashes in regression tests.
>
> However, it's often simpler to just:
>
> ulimit -c unlimited
> make check
>
> (or whatever your crashing test is) then look at the core files that are
> produced.

Good one; it didn't occur to me that an assertion failure produces core files.

Figured it out with a pg_usleep in bootstrap.c anyway.  Does this look sane?


DEBUG:  inserting column 6 value "0"
DEBUG:  inserted -> 0
DEBUG:  inserting column 7 value "varchar_transform"
TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))", File: 
"/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c", Line: 1414)

Program received signal SIGABRT, Aborted.
0x7f2757128d27 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f2757128d27 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f275712a418 in __GI_abort () at abort.c:89
#2  0x009088b2 in ExceptionalCondition (
conditionName=0xac0710 "!(((xmax) >= ((TransactionId) 3)))", 
errorType=0xac01d8 "FailedAssertion", 
fileName=0xac0178 
"/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c", lineNumber=1414)
at /home/ash/src/postgresql/src/backend/utils/error/assert.c:54
#3  0x0079e125 in GetSnapshotData (
snapshot=0xdb2d60 )
at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1414
#4  0x0094e02d in GetNonHistoricCatalogSnapshot (relid=1255)
at /home/ash/src/postgresql/src/backend/utils/time/snapmgr.c:298
#5  0x0094dfdd in GetCatalogSnapshot (relid=1255)
at /home/ash/src/postgresql/src/backend/utils/time/snapmgr.c:272
#6  0x004c8f0d in systable_beginscan (heapRelation=0x1d0e5c0, 
indexId=2691, indexOK=1 '\001', snapshot=0x0, nkeys=1, key=0x7fff201bbc40)
at /home/ash/src/postgresql/src/backend/access/index/genam.c:275
#7  0x00885070 in regprocin (fcinfo=0x7fff201bbce0)
at /home/ash/src/postgresql/src/backend/utils/adt/regproc.c:106
#8  0x00914fe7 in InputFunctionCall (flinfo=0x7fff201bc0c0, 
str=0x1d349b8 "varchar_transform", typioparam=24, typmod=-1)
---Type  to continue, or q  to quit---
at /home/ash/src/postgresql/src/backend/utils/fmgr/fmgr.c:1914
#9  0x0091533e in OidInputFunctionCall (functionId=44, 
str=0x1d349b8 "varchar_transform", typioparam=24, typmod=-1)
at /home/ash/src/postgresql/src/backend/utils/fmgr/fmgr.c:2045
#10 0x0052af91 in InsertOneValue (value=0x1d349b8 "varchar_transform", 
i=7) at /home/ash/src/postgresql/src/backend/bootstrap/bootstrap.c:840
#11 0x00527409 in boot_yyparse ()
at /home/ash/src/postgresql/src/backend/bootstrap/bootparse.y:455
#12 0x0052a26b in BootstrapModeMain ()
at /home/ash/src/postgresql/src/backend/bootstrap/bootstrap.c:494
#13 0x0052a177 in AuxiliaryProcessMain (argc=6, argv=0x1cc8378)
at /home/ash/src/postgresql/src/backend/bootstrap/bootstrap.c:414
#14 0x006a327c in main (argc=7, argv=0x1cc8370)
at /home/ash/src/postgresql/src/backend/main/main.c:212
(gdb)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-04 Thread Alex Shulgin

Alvaro Herrera  writes:
>
> Uh, that's odd.  Can you please get a stack trace?  Do you have unusual
> settings or a patched build?

Is there a way to pause the bootstrap process so I can attach gdb to it?

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-04 Thread Alex Shulgin

Alvaro Herrera  writes:

> Alex Shulgin wrote:
>
>> Also this commit breaks initdb of `make check' for me:
>> 
>> creating template1 database in
>> /home/ash/build/postgresql/HEAD/src/test/regress/./tmp_check/data/base/1
>> ... TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))",
>> File:
>> "/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c",
>> Line: 1414)
>> Aborted (core dumped)
>> child process exited with exit code 134
>
> Uh, that's odd.  Can you please get a stack trace?  Do you have unusual
> settings or a patched build?

Not really, and I would have mentioned that.  Will get a trace.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-04 Thread Alex Shulgin

Heikki Linnakangas  writes:

> On 12/03/2014 04:54 PM, Alvaro Herrera wrote:
>> ir commit timestamp directly as they commit,
>> or an external transaction c
>
> Sorry, I'm late to the party, but here's some random comments on this
> after a quick review:

Also this commit breaks initdb of `make check' for me:

creating template1 database in 
/home/ash/build/postgresql/HEAD/src/test/regress/./tmp_check/data/base/1 ... 
TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))", File: 
"/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c", Line: 1414)
Aborted (core dumped)
child process exited with exit code 134

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Alex Shulgin

Andres Freund  writes:

> On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote:
>> I'd be in favor of a solution that works the same way as before the
>> patch, without the need for extra trigger files, etc., but that doesn't
>> seem to be even remotely possible.  Whatever tricks we might employ will
>> likely be defeated by the fact that an old-school user will fail to
>> *include* recovery.conf in the main conf file.
>
> I think removing the ability to define a trigger file is pretty much
> unacceptable. It's not *too* bad to rewrite recovery.conf's contents
> into actual valid postgresql.conf format, but it's harder to change an
> existing complex failover setup that relies on the existence of such a
> trigger.  I think aiming for removal of that is a sure way to prevent
> the patch from getting in.

To make it clear, I was talking not about trigger_file, but about the
standby.enabled file which, in the current form of the patch, triggers
the recovery/PITR/standby scenario and stands as a replacement for the
original check for the presence of recovery.conf.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-02 Thread Alex Shulgin

Michael Paquier  writes:

> On Tue, Dec 2, 2014 at 1:58 AM, Alex Shulgin  wrote:
>> Here's the patch rebased against current HEAD, that is, including the
>> recently committed action_at_recovery_target option.
> If this patch gets in, it gives a good argument to jump to 10.0 IMO.
> That's not a bad thing, only the cost of making recovery params as
> GUCs which is still a feature wanted.
>
>> The default for the new GUC is 'pause', as in HEAD, and
>> pause_at_recovery_target is removed completely in favor of it.
> Makes sense. Another idea that popped out was to rename this parameter
> as recovery_target_action as well, but that's not really something
> this patch should care about.

Indeed, but changing the name after the fact is straightforward.

>> I've also taken the liberty to remove that part that errors out when
>> finding $PGDATA/recovery.conf.
> I am not in favor of this part. It may be better to let the users know
> that their old configuration is not valid anymore with an error. This
> patch cuts in the flesh with a huge axe, let's be sure that users do
> not ignore the side pain effects, or recovery.conf would be simply
> ignored and users would not be aware of that.

Yeah, that is good point.

I'd be in favor of a solution that works the same way as before the
patch, without the need for extra trigger files, etc., but that doesn't
seem to be even remotely possible.  Whatever tricks we might employ will
likely be defeated by the fact that an old-school user will fail to
*include* recovery.conf in the main conf file.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-12-01 Thread Alex Shulgin
Alex Shulgin  writes:
>
> Here's an attempt to revive this patch.

Here's the patch rebased against current HEAD, that is, including the
recently committed action_at_recovery_target option.

The default for the new GUC is 'pause', as in HEAD, and
pause_at_recovery_target is removed completely in favor of it.

I've also taken the liberty to remove that part that errors out when
finding $PGDATA/recovery.conf.  Now get your rotten tomatoes ready. ;-)

--
Alex



recovery_guc_v5.4.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-11-28 Thread Alex Shulgin

Christoph Berg  writes:

> Re: Petr Jelinek 2014-11-25 <5474efea.2040...@2ndquadrant.com>
>> >Patch committed.

Before I go and rebase that recovery.conf -> GUC patch on top of
this...  is it final?

>> 
>> Thanks!
>
> I'm a bit late to the party, but wouldn't
>
> recovery_target_action = ...
>
> have been a better name for this? It'd be in line with the other
> recovery_target_* parameters, and also a bit shorter than the imho
> somewhat ugly "action_at_recovery_target".

FWIW, I too think that "recovery_target_action" is a better name.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2014-11-28 Thread Alex Shulgin
Julien Tachoires  writes:
>
> 2011/12/15 Alvaro Herrera :
>>
>> Uhm, surely you could compare the original toast tablespace to the heap
>> tablespace, and if they differ, handle appropriately when creating the
>> new toast table?  Just pass down the toast tablespace into
>> AlterTableCreateToastTable, instead of having it assume that
>> rel->rd_rel->relnamespace is sufficient.  This should be done in all
>> cases where a toast tablespace is created, which shouldn't be more than
>> a handful of them.
>
> Thank you, that way seems right.
> Now, I distinguish before each creation of a TOAST table with
> AlterTableCreateToastTable() : if it will create a new one or recreate
> an existing one.
> Thus, in create_toast_table() when toastTableSpace is equal to
> InvalidOid, we are able :
> - to fall back to the main table tablespace in case of new TOAST table creation
> - to keep its previous tablespace in case of recreation.
>
> Here's a new version rebased against HEAD.

Almost 3 years later, here's a version rebased against current HEAD. :-)

It compiles and even passes the included regression test.

There were doubts originally about whether this is actually a useful
feature, but there is at least one plausible use case (main table on
SSD, TOAST on HDD):

  http://www.postgresql.org/message-id/4f3267ee.80...@deltasoft.no

I didn't add anything on top of the latest version, but I notice we've
added mat. views since then, so it looks like we now also need this:

  ALTER MATERIALIZED VIEW SET [VIEW | TOAST] TABLESPACE
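
For the SSD/HDD use case mentioned above, usage would presumably look
something like this (a sketch only: the tablespace names and paths are
made up, and the exact ALTER syntax is whatever the patch finally
settles on):

  CREATE TABLESPACE ssd_space LOCATION '/mnt/ssd/pgdata';
  CREATE TABLESPACE hdd_space LOCATION '/mnt/hdd/pgdata';

  CREATE TABLE documents (id int, body text) TABLESPACE ssd_space;

  -- keep the heap on the fast disk, push only the TOAST data to the slow one
  ALTER TABLE documents SET TOAST TABLESPACE hdd_space;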


Should I add this patch to the next CommitFest?

--
Alex



set_toast_tablespace_v0.11.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-11-27 Thread Alex Shulgin

Dag-Erling Smørgrav  writes:

> Alex Shulgin  writes:
>> OK, looks like I've come up with something workable: I've added
>> sslprotocol connection string keyword similar to pre-existing
>> sslcompression, etc.  Please see attached v2 of the original patch.
>> I'm having doubts about the name of openssl.h header though,
>> libpq-openssl.h?
>
> Perhaps ssloptions.[ch], unless you plan to add non-option-related code
> there later?

I don't think anything other than common options-related code fits in
there, so ssloptions.c makes sense to me.

> BTW, there is no Regent code in your openssl.c, so the copyright
> statement is incorrect.

Good catch, I just blindly copied that from some other file.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-11-26 Thread Alex Shulgin
Alex Shulgin  writes:
>>>
>>> I can do that too, just need a hint where to look at in libpq/psql to
>>> add the option.
>>
>> The place to *enforce* the option is src/interfaces/libpq/fe-secure.c
>> (look for SSLv23_method() and SSL_CTX_set_options()).  I haven't looked
>> into how to set it.
>
> Yes, I've figured it out.  Guess we'd better share the ssl_protocol
> value parser code between libpq and the backend.  Any precedent?

OK, it looks like I've come up with something workable: I've added an
sslprotocol connection string keyword, similar to the pre-existing
sslcompression, etc.

Please see attached v2 of the original patch.  I'm having doubts about
the name of the openssl.h header though: libpq-openssl.h?

--
Alex

>From 7cd60e962ce5e7fc10dc52ed9c92b0b2b5c4c7f1 Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Wed, 19 Nov 2014 19:49:29 +0300
Subject: [PATCH] DRAFT: ssl_protocols GUC

---
 doc/src/sgml/config.sgml  |  29 ++
 doc/src/sgml/libpq.sgml   |  25 ++
 src/backend/libpq/Makefile|   2 +-
 src/backend/libpq/be-secure-openssl.c |  29 --
 src/backend/libpq/be-secure.c |   3 +-
 src/backend/libpq/openssl.c   | 124 ++
 src/backend/utils/misc/guc.c  |  15 
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/libpq/libpq.h |   1 +
 src/include/libpq/openssl.h   |  50 +++
 src/interfaces/libpq/Makefile |   8 +-
 src/interfaces/libpq/fe-connect.c |   4 +
 src/interfaces/libpq/fe-secure-openssl.c  |  18 +++-
 src/interfaces/libpq/libpq-int.h  |   1 +
 14 files changed, 300 insertions(+), 10 deletions(-)
 create mode 100644 src/backend/libpq/openssl.c
 create mode 100644 src/include/libpq/openssl.h

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index ab8c263..8b701df
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include_dir 'conf.d'
*** 1027,1032 
--- 1027,1061 

   
  
+  
+   ssl_protocols (string)
+   
+ssl_protocols configuration parameter
+   
+   
+   
+
+ Specifies a colon-separated list of SSL protocols that are
+ allowed to be used on secure connections. Each entry in the list can
+ be prefixed by a + (add to the current list)
+ or - (remove from the current list). If no prefix is used,
+ the list is replaced with the specified protocol.
+
+
+ The full list of supported protocols can be found in the
+ openssl manual page.  In addition to the names of
+ individual protocols, the following keywords can be
+ used: ALL (all protocols supported by the underlying
+ crypto library), SSL (all supported versions
+ of SSL) and TLS (all supported versions
+ of TLS).
+
+
+ The default is ALL:-SSL.
+
+   
+  
+ 
   
ssl_ciphers (string)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
new file mode 100644
index e23e91d..9ea71a4
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** postgresql://%2Fvar%2Flib%2Fpostgresql/d
*** 1230,1235 
--- 1230,1250 

   
  
+  
+   sslprotocols
+   
+
+ Specifies a colon-separated list of SSL protocols that are
+ allowed to be used on secure connections.
+ See  server configuration parameter
+ for format.
+
+
+ Defaults to ALL:-SSL.
+
+   
+  
+ 
   
sslcompression

*** myEventProc(PGEventId evtId, void *evtIn
*** 6711,6716 
--- 6726,6741 
   
  
  
+ 
+  
+   
+PGSSLPROTOCOLS
+   
+   PGSSLPROTOCOLS behaves the same as the  connection parameter.
+  
+ 
+ 
  
   

diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
new file mode 100644
index 09410c4..1c058ec
*** a/src/backend/libpq/Makefile
--- b/src/backend/libpq/Makefile
*** OBJS = be-fsstubs.o be-secure.o auth.o c
*** 18,24 
 pqformat.o pqmq.o pqsignal.o
  
  ifeq ($(with_openssl),yes)
! OBJS += be-secure-openssl.o
  endif
  
  include $(top_srcdir)/src/backend/common.mk
--- 18,24 
 pqformat.o pqmq.o pqsignal.o
  
  ifeq ($(with_openssl),yes)
! OBJS += be-secure-openssl.o openssl.o
  endif
  
  include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
new file mode 100644
index b05364c..3801455
*** a/src/backend/libpq/be-secure-openssl.c
--- b/src/backend/libpq/be-secure-openssl.c
***
*** 7

Re: [HACKERS] Follow up to irc on CREATE INDEX vs. maintenance_work_mem on 9.3

2014-11-26 Thread Alex Shulgin

Andrew Gierth  writes:

>>>>>> "Alex" == Alex Shulgin  writes:
>
>  > Tom Lane  writes:
>  >> Must've been my evil twin.
>
>  Alex> Sorry, I must have been under the false impression that
>  Alex> RhodiumToad is *your* nick on #postgresql at freenode.  I don't
>  Alex> recall who told me that, but I was pretty sure it was you. :-p
>
> ... what
>
> People do occasionally make jokes on IRC about me being Tom's clone; I
> know they mean it in a positive way but I still find it *extremely*
> annoying, so I do try and discourage it. (If they're making those same
> jokes elsewhere, I haven't been aware of it, but please consider this
> a polite public request to stop.)
>
> My first name is easily visible in the irc gecos field:
>
> *** RhodiumToad is ~andrew@[my hostname] (Andrew)
>
> and there is also the IRC users list on the wiki:
> http://wiki.postgresql.org/wiki/IRC2RWNames

Andrew, Tom,

Sorry for the confusion.  And, Andrew, thanks again for the help! :-)

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Follow up to irc on CREATE INDEX vs. maintenance_work_mem on 9.3

2014-11-26 Thread Alex Shulgin

Tom Lane  writes:

> Stephen Frost  writes:
>> * Alex Shulgin (a...@commandprompt.com) wrote:
>>> Tom,
>>> 
>>> First of all, thanks for your help on IRC last time with that CREATE
>>> INDEX memory consumption problem.
>
>> Doubt it was Tom, but if it was, wanna share what channel on IRC it was?
>> :D
>
> Must've been my evil twin.

Sorry, I must have been under the false impression that RhodiumToad is
*your* nick on #postgresql at freenode.  I don't recall who told me
that, but I was pretty sure it was you. :-p

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Follow up to irc on CREATE INDEX vs. maintenance_work_mem on 9.3

2014-11-26 Thread Alex Shulgin

Tom,

First of all, thanks for your help on IRC last time with that CREATE
INDEX memory consumption problem.

As has been pointed out in a Stack Exchange answer to my question[1], it
is indeed a limitation of pre-9.4 versions, but the limit is imposed on
the memtuples array, rather than on the total memory the sort in CREATE
INDEX may allocate.  The memtuples array won't grow beyond MaxAllocSize,
and I've got 24 x 50x10^6 bytes = 1200MB, which just doesn't fit.

We've got a customer who is testing a migration to PostgreSQL 9.3 (from
$some_other_db), so they load the tables first (some of their tables
have 10-100 million rows) and then create the indexes, and they
constantly see a disk sort being used despite lots of available RAM and
maintenance_work_mem set to increasingly higher values.
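
For reference, a rough way to reproduce the symptom (the table size and
settings are made up to roughly match the case above; trace_sort, where
available, makes the choice of sort method visible in the server log):

  SET maintenance_work_mem = '4GB';

  CREATE TABLE sort_limit_test (id bigint);
  INSERT INTO sort_limit_test SELECT generate_series(1, 50000000);

  SET trace_sort = on;   -- log which sort method gets used
  CREATE INDEX ON sort_limit_test (id);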

Now my question: is it feasible to back-patch this to 9.3?  Or should we
tell the customer to wait until 9.4 is released?

Thanks.
--
Alex

[1] 
http://dba.stackexchange.com/questions/83600/postgresql-create-index-memory-requirement


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication connection URI?

2014-11-25 Thread Alex Shulgin

Heikki Linnakangas  writes:
>>>
>>> The second one is a self-contained fix, but the third one which is the
>>> actual patch depends on the second one, because it specifies the dbname
>>> keyword two times: first to parse the conninfo/URI, then to override any
>>> dbname provided by the user with "replication" pseudo-database name.
>>
>> Hmm. Should we backpatch the second patch? It sure seems like an
>> oversight rather than deliberate that you can't override dbname from the
>> connection string with a later dbname keyword. I'd say "yes".
>>
>> How about the third patch? Probably not; it was an oversight with the
>> connection URI patch that it could not be used in primary_conninfo, but
>> it's not a big deal in practice as you can always use a non-URI
>> connection string instead.
>
> Ok, committed the second patch to all stable branches, and the third
> patch to master.
>
> In the second patch, I added a sentence to the docs to mention that
> only the first "dbname" parameter is expanded. It's debatable if
> that's what we actually want. It would be handy to be able to merge
> multiple connection strings, by specifying multiple dbname
> parameters. But now the docs at least match the code, changing the
> behavior would be a bigger change.
>
> From the third patch, I removed the docs changes. It's not necessary to
> say "connection string or URI" everywhere; the URI format is just one
> kind of connection string. I also edited the code that builds the
> keyword/value array, to make it a bit more readable.

Yay, many thanks! :-)

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Missing OOM checks in libpq (was Re: [HACKERS] Replication connection URI?)

2014-11-25 Thread Alex Shulgin

Heikki Linnakangas  writes:
>>>
>>> I think we need to fix all of those, and backpatch. Per attached.
>>
>> Yikes!  Looks sane to me.
>
> Ok thanks, committed. It didn't apply cleanly to 9.0, 9.1 and 9.2, so
> the patch for those branches looks a bit different.

Great.  Are you looking at the actual replication URI patch?  Or should
I add it to the commitfest so we don't lose track of it?

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Missing OOM checks in libpq (was Re: [HACKERS] Replication connection URI?)

2014-11-25 Thread Alex Shulgin

Heikki Linnakangas  writes:

> On 11/24/2014 06:05 PM, Alex Shulgin wrote:
>> The first patch is not on topic, I just spotted this missing check.
>
>> *** a/src/interfaces/libpq/fe-connect.c
>> --- b/src/interfaces/libpq/fe-connect.c
>> *** conninfo_array_parse(const char *const *
>> *** 4402,4407 
>> --- 4402,4415 
>>  if 
>> (options[k].val)
>>  
>> free(options[k].val);
>>  options[k].val 
>> = strdup(str_option->val);
>> +if 
>> (!options[k].val)
>> +{
>> +
>> printfPQExpBuffer(errorMessage,
>> +
>>   libpq_gettext("out of memory\n"));
>> +
>> PQconninfoFree(options);
>> +
>> PQconninfoFree(dbname_options);
>> +return 
>> NULL;
>> +}
>>  break;
>>  }
>>  }
>
> Oh. There are actually many more places in connection option parsing
> that don't check the return value of strdup(). The one in fillPGConn
> even has an XXX comment saying it probably should check it. You can
> get quite strange behavior if one of them fails. If for example the
> strdup() on dbname fails, you might end up connecting to different
> database than intended. And if the "conn->sslmode =
> strdup(DefaultSSLMode);" call in connectOptions2 fails, you'll get a
> segfault later because at least connectDBstart assumes that sslmode is
> not NULL.
>
> I think we need to fix all of those, and backpatch. Per attached.

Yikes!  Looks sane to me.

I've checked that patches #2 and #3 can be applied on top of this, w/o
rebasing.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication connection URI?

2014-11-24 Thread Alex Shulgin

Alex Shulgin  writes:

> Heikki Linnakangas  writes:
>>>
>>> It appears that replication connection doesn't support URI but only the
>>> traditional conninfo string.
>>>
>>> src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:99: in 
>>> libpqrcv_connect():
>>>
>>>  snprintf(conninfo_repl, sizeof(conninfo_repl),
>>>   "%s dbname=replication replication=true 
>>> fallback_application_name=walreceiver",
>>>   conninfo);
>>>
>>> A patch to fix this welcome?
>>
>> Yeah, seems like an oversight. Hopefully you can fix that without
>> teaching libpqwalreceiver what connection URIs look like..
>
> Please see attached.  We're lucky that PQconnectdbParams has an option
> to parse and expand the first dbname parameter if it looks like a
> connection string (or a URI).
>
> The first patch is not on topic, I just spotted this missing check.
>
> The second one is a self-contained fix, but the third one, which is the
> actual patch, depends on the second one because it specifies the dbname
> keyword twice: first to parse the conninfo/URI, then to override any
> dbname provided by the user with the "replication" pseudo-database name.

These patches are really simple; I hope a committer will pick them up.
Or should I add them to the commitfest?

Also, I'd rather get this committed first, then rebase that
recovery.conf->GUC patch onto it and submit an updated version.

Thanks.
--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-11-24 Thread Alex Shulgin

Josh Berkus  writes:
>>>
>>> Before I go into my ideas, though, what does the current patch do
>>> regarding non-replication PITR?
>> 
>> It removes that $PGDATA/standby.enable trigger file it relies on to
>> start the PITR in the first place.
>
> OK, and that's required for replication too?  I'm OK with that if it
> gets the patch in.

In the current form of the patch, yes.  Though I don't think I like it.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-11-24 Thread Alex Shulgin

Josh Berkus  writes:
>> 
>> only that you need to start in recovery mode to start replication
>
> Right, but my point is that having a trigger file *is not necessary for
> replication, only for PITR* -- and maybe not necessary even for PITR.
> That is, in a streaming replication configuration, having a
> "standby_mode = on|off" parameter is 100% sufficient to control
> replication with the small detail that "pg_ctl promote" needs to set it
> in pg.auto.conf or conf.d.
>
> And, now, having given it some thought, I'm going to argue that it's not
> required for PITR either, provided that we can use the auto.conf method.
>
> Before I go into my ideas, though, what does the current patch do
> regarding non-replication PITR?

It removes that $PGDATA/standby.enable trigger file it relies on to
start the PITR in the first place.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-11-24 Thread Alex Shulgin

Jaime Casanova  writes:
>>>
>>> Either way, from the code it is clear that we only stay in recovery if
>>> standby_mode is directly turned on.  This makes the whole check for a
>>> specially named file unnecessary, IMO: we should just check the value of
>>> standby_mode (which is off by default).
>
> No. Currently we enter recovery mode when postgres sees a
> recovery.conf and stay in recovery mode when standby_mode is on or an
> appropriate restore_command is provided.
>
> Which means recovery.conf has two uses:
> 1) start in recovery mode (not continuous)
> 2) provide parameters for recovery mode and for streaming
>
> We still need a "recovery trigger" file that forces postgres to start
> in recovery mode and act according to the recovery GUCs.

Yes, these were my latest findings also, but if, instead of removing the
trigger file upon successful recovery, the server set standby_mode=off,
that would also work.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication connection URI?

2014-11-24 Thread Alex Shulgin
Heikki Linnakangas  writes:
>>
>> It appears that replication connection doesn't support URI but only the
>> traditional conninfo string.
>>
>> src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:99: in 
>> libpqrcv_connect():
>>
>>  snprintf(conninfo_repl, sizeof(conninfo_repl),
>>   "%s dbname=replication replication=true 
>> fallback_application_name=walreceiver",
>>   conninfo);
>>
>> A patch to fix this welcome?
>
> Yeah, seems like an oversight. Hopefully you can fix that without
> teaching libpqwalreceiver what connection URIs look like..

Please see attached.  We're lucky that PQconnectdbParams has an option
to parse and expand the first dbname parameter if it looks like a
connection string (or a URI).

The first patch is not on topic, I just spotted this missing check.

The second one is a self-contained fix, but the third one, which is the
actual patch, depends on the second one because it specifies the dbname
keyword twice: first to parse the conninfo/URI, then to override any
dbname provided by the user with the "replication" pseudo-database name.
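
To make the mechanism concrete, the idea boils down to something like this
on the walreceiver side (a sketch of the approach only, not the actual
third patch; variable names are illustrative):

    /* conninfo comes from primary_conninfo and may be a URI */
    const char *const keys[] = {"dbname", "replication", "dbname",
                                "fallback_application_name", NULL};
    const char *const vals[] = {conninfo,       /* expanded as conninfo/URI */
                                "true",
                                "replication",  /* overrides any user dbname */
                                "walreceiver",
                                NULL};

    PGconn *conn = PQconnectdbParams(keys, vals, 1 /* expand_dbname */);

The first "dbname" entry is expanded by PQconnectdbParams(), and the later
"dbname" entry then forces the connection to the replication
pseudo-database, which is exactly why patch 3/3 needs patch 2/3.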

Have a nice day!
--
Alex

>From 156e6faa96ad6a2ce58055ad72883ed78c576e5b Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Mon, 24 Nov 2014 16:55:50 +0300
Subject: [PATCH 1/3] Add missing check on OOM in expand_dbname path of
 conninfo_array_parse().

---
 src/interfaces/libpq/fe-connect.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
new file mode 100644
index 3fe8c21..d7f2ec2
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*** conninfo_array_parse(const char *const *
*** 4402,4407 
--- 4402,4415 
  if (options[k].val)
  	free(options[k].val);
  options[k].val = strdup(str_option->val);
+ if (!options[k].val)
+ {
+ 	printfPQExpBuffer(errorMessage,
+ 	  libpq_gettext("out of memory\n"));
+ 	PQconninfoFree(options);
+ 	PQconninfoFree(dbname_options);
+ 	return NULL;
+ }
  break;
  			}
  		}
-- 
2.1.0

>From 44d9d6a2c9cf5af83988f9d3b6eeb39c36104ef9 Mon Sep 17 00:00:00 2001
From: Alex Shulgin 
Date: Mon, 24 Nov 2014 18:12:51 +0300
Subject: [PATCH 2/3] Allow further dbname=value to override conninfo parsed
 from an expanded dbname in conninfo_array_parse().

---
 src/interfaces/libpq/fe-connect.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
new file mode 100644
index d7f2ec2..5b45128
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*** conninfo_parse(const char *conninfo, PQE
*** 4302,4311 
   * Defaults are supplied (from a service file, environment variables, etc)
   * for unspecified options, but only if use_defaults is TRUE.
   *
!  * If expand_dbname is non-zero, and the value passed for keyword "dbname" is a
!  * connection string (as indicated by recognized_connection_string) then parse
!  * and process it, overriding any previously processed conflicting
!  * keywords. Subsequent keywords will take precedence, however.
   */
  static PQconninfoOption *
  conninfo_array_parse(const char *const * keywords, const char *const * values,
--- 4302,4313 
   * Defaults are supplied (from a service file, environment variables, etc)
   * for unspecified options, but only if use_defaults is TRUE.
   *
!  * If expand_dbname is non-zero, and the value passed for the first occurrence
!  * of "dbname" keyword is a connection string (as indicated by
!  * recognized_connection_string) then parse and process it, overriding any
!  * previously processed conflicting keywords. Subsequent keywords will take
!  * precedence, however. In particular, a further occurrence of "dbname" may
!  * override the dbname provided in the connection string.
   */
  static PQconninfoOption *
  conninfo_array_parse(const char *const * keywords, const char *const * values,
*** conninfo_array_parse(const char *const *
*** 4381,4387 
  			}
  
  			/*
! 			 * If we are on the dbname parameter, and we have a parsed
  			 * connection string, copy those parameters across, overriding any
  			 * existing previous settings.
  			 */
--- 4383,4389 
  			}
  
  			/*
! 			 * If we are on the *first* dbname parameter, and we have a parsed
  			 * connection string, copy those parameters across, overriding any
  			 * existing previous settings.
  			 */
*** conninfo_array_parse(const char *const *
*** 4415,4420 
--- 4417,4428 
  		}
  	}
  }
+ /*
+  * Clear out the parsed dbname, so that a possible further
+  * occu

[HACKERS] Replication connection URI?

2014-11-24 Thread Alex Shulgin

Hackers,

It appears that replication connection doesn't support URI but only the
traditional conninfo string.

src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:99: in 
libpqrcv_connect():

snprintf(conninfo_repl, sizeof(conninfo_repl),
 "%s dbname=replication replication=true 
fallback_application_name=walreceiver",
 conninfo);

A patch to fix this welcome?

--
Alex

PS: I wrote the original URI parser used in libpq.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-11-24 Thread Alex Shulgin

Josh Berkus  writes:
>> 
>> Well, the latest version of this patch fails to start when it sees
>> 'recovery.conf' in PGDATA:
>> 
>>   FATAL:  "recovery.conf" is not supported anymore as a recovery method
>>   DETAIL:  Refer to appropriate documentation about migration methods
>> 
>> I've missed all the discussion behind this decision and after reading
>> the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like
>> someone more knowledgeable to speak up on the status of this.
>
> The argument was that people wanted to be able to have an empty
> recovery.conf file as a "we're in standby" trigger so that they could
> preserve backwards compatibility with external tools.  I don't agree
> with this argument, but several people championed it.

It doesn't look like we can provide 100% backwards compatibility
with existing tools; here's why:

a) The only sensible way to use recovery.conf as GUC source is via
   include/include_dir from postgresql.conf.

b) The server used to rename $PGDATA/recovery.conf to
   $PGDATA/recovery.done so when it is re-started it doesn't
   accidentally start in recovery mode.

We can't keep doing (b) because that will make postgres fail to start on
a missing include.  Well, it won't error out if we place the file under
include_dir, as only files with the ".conf" suffix are included, but
then again the server wouldn't know which file to rename unless we tell
it or hard-code the filename.  Either way this is not 100% backwards
compatible.
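
Just to make (a) concrete, it would amount to something along these lines
(the file and parameter layout here is purely illustrative):

    # postgresql.conf
    include_dir = 'conf.d'

    # conf.d/recovery.conf  (any name ending in .conf gets picked up)
    standby_mode = on
    primary_conninfo = 'host=master port=5432 user=replicator'
    restore_command = 'cp /mnt/server/archive/%f "%p"'

With that, the recovery settings are just another configuration source,
which is precisely why the rename-to-recovery.done trick from (b) stops
fitting in.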

Now the question is: if we are going to break compatibility anyway,
should we try to minimize the difference, or would that disserve the user
by providing a false sense of compatibility?

For one, if we're breaking things, the trigger_file GUC should be
renamed to end_recovery_trigger_file or something like that, IMO.
That's if we decide to keep it at all.

>> The docs use the term "continuous recovery".
>> 
>> Either way, from the code it is clear that we only stay in recovery if
>> standby_mode is directly turned on.  This makes the whole check for a
>> specially named file unnecessary, IMO: we should just check the value of
>> standby_mode (which is off by default).
>
> So, what happens when someone does "pg_ctl promote"?  Somehow
> standby_mode needs to get set to "off".  Maybe we write "standby_mode =
> off" to postgresql.auto.conf?

Well, standby_mode is only consulted at startup after crash recovery.
But suddenly, a tool that *writes* recovery.conf needs to also
consult/edit postgresql.auto.conf...  Maybe that's inevitable, but still
a point to consider.

>> By the way, is there any use in setting standby_mode=on and any of the
>> recovery_target* GUCs at the same time?
>
> See my thread on this topic from last week.  Short answer: No.
>
>> I think it can only play together if you set the target farther than the
>> latest point you've got in the archive locally.  So that's sort of
>> "Point-in-Future-Recovery".  Does that make any sense at all?
>
> Again, see the thread.  This doesn't work in a useful way, so there's no
> reason to write code to enable it.

Makes sense.  Should we incorporate the actual tech and doc fix in this
patch?

>>>>  /* File path names (all relative to $PGDATA) */
>>>> -#define RECOVERY_COMMAND_FILE "recovery.conf"
>>>> -#define RECOVERY_COMMAND_DONE "recovery.done"
>>>> +#define RECOVERY_ENABLE_FILE  "standby.enabled"
>>>
>>> Imo the file and variable names should stay coherent.
>> 
>> Yes, once we settle on the name (and if we really need that extra
>> trigger file.)
>
> Personally, if we have three methods of promotion:
>
> 1) pg_ctl promote
> 2) edit postgresql.conf and reload
> 3) ALTER SYSTEM SET and reload
>
> ... I don't honestly think we need a 4th method for promotion.

Me neither.  It is tempting to make everything spin around GUCs without
the need for any external trigger files.

> The main reason to want a "we're in recovery file" is for PITR rather
> than for replication, where it has a number of advantages as a method,
> the main one being that recovery.conf is unlikely to be overwritten by
> the contents of the backup.
>
> HOWEVER, there's a clear out for this with conf.d.  If we enable conf.d
> by default, then we can simply put recovery settings in conf.d, and
> since that specific file (which can have whatever name the user chooses)
> will not be part of backups, it would have the same advantage as
> recovery.conf, without the drawbacks.
>
> Discuss?

Well, I don't readily see how conf.d is special with regard to base
backups; wouldn't you still need to exclude it explicitly?

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-11-24 Thread Alex Shulgin

Alex Shulgin  writes:
>
>>> > * Why do you include xlog_internal.h in guc.c and not xlog.h?
>>
>>> we actually need both but including xlog_internal.h also includes xlog.h
>>> I added xlog.h; if someone thinks it is enough to only include
>>> xlog_internal.h, let me know.
>>
>> What's required from xlog_internal.h?
>
> Looks like this was addressed before me.

Actually, it's there to bring in MAXFNAMELEN which is used in
check_recovery_target_name.  Not sure how it even compiled without that,
but after some branch switching it doesn't anymore. :-p

Maybe we should move these check/assign hooks to xlog.c, but that's
likely going to create a header file dependency problem due to the use of
GucSource in the hook prototype...

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2014-11-21 Thread Alex Shulgin
mptz_in() and even
> DateTimeParseError(), replacing the ereport()s by the guc error
> reporting.

Indeed, the use of PG_TRY/CATCH does protect from FATALs during SIGHUP.
Using CopyErrorData() we can also fetch the actual error message from
timestamptz_in, though I wonder whether we really have to make a full copy.
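
For the archives, the shape of the check hook being discussed is roughly
this (a sketch under the assumptions above, not the actual patch; it
relies on the usual guc.h/timestamp.h backend includes):

    static bool
    check_recovery_target_time(char **newval, void **extra, GucSource source)
    {
        if (*newval != NULL && **newval != '\0')
        {
            MemoryContext oldcontext = CurrentMemoryContext;
            TimestampTz tz;

            PG_TRY();
            {
                tz = DatumGetTimestampTz(
                    DirectFunctionCall3(timestamptz_in,
                                        CStringGetDatum(*newval),
                                        ObjectIdGetDatum(InvalidOid),
                                        Int32GetDatum(-1)));
            }
            PG_CATCH();
            {
                ErrorData  *edata;

                /* surface timestamptz_in's message via the GUC machinery */
                MemoryContextSwitchTo(oldcontext);
                edata = CopyErrorData();
                FlushErrorState();
                GUC_check_errdetail("%s", edata->message);
                FreeErrorData(edata);
                return false;
            }
            PG_END_TRY();

            /* stash the parsed value for the assign hook; guc.c free()s it */
            *extra = malloc(sizeof(TimestampTz));
            if (*extra == NULL)
                return false;
            *((TimestampTz *) *extra) = tz;
        }
        return true;
    }

Whether the full CopyErrorData() copy is really needed just to get at
edata->message is the open question mentioned above.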

>> > * Why do you include xlog_internal.h in guc.c and not xlog.h?
>
>> we actually need both but including xlog_internal.h also includes xlog.h
>> I added xlog.h; if someone thinks it is enough to only include
>> xlog_internal.h, let me know.
>
> What's required from xlog_internal.h?

Looks like this was addressed before me.

>> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
>> index b53ae87..54f6a0d 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -64,11 +64,12 @@
>>  extern uint32 bootstrap_data_checksum_version;
>>  
>>  /* File path names (all relative to $PGDATA) */
>> -#define RECOVERY_COMMAND_FILE   "recovery.conf"
>> -#define RECOVERY_COMMAND_DONE   "recovery.done"
>> +#define RECOVERY_ENABLE_FILE    "standby.enabled"
>
> Imo the file and variable names should stay coherent.

Yes, once we settle on the name (and if we really need that extra
trigger file.)

>> +/* recovery.conf is not supported anymore */
>> +#define RECOVERY_COMMAND_FILE   "recovery.conf"
>
>> +boolStandbyModeRequested = false;
>> +static TimestampTz recoveryDelayUntilTime;
>
> This imo should be lowercase now, the majority of GUC variables are.

This is not a GUC variable, though it's calculated based on a GUC
recovery_min_apply_delay.

>> +/* are we currently in standby mode? */
>> +bool StandbyMode = false;
>
> Why did you move this?

It was easy to move it back though.

>> -if (rtliGiven)
>> +if (strcmp(recovery_target_timeline_string, "") != 0)
>>  {
>
> Why not have the convention that NULL indicates an unset target_timeline
> like you use for other GUCs? Mixing things like this is confusing.
>
> Why is recovery_target_timeline stored as a string? Because it's an
> unsigned int? If so, you should have an assign hook setting up a)
> rtliGiven, b) properly typed variable.

Yes, I believe setting these to NULL by default makes a lot more sense.
Then one can check if there was a non-default setting by checking the
*_string variable, which is not possible with int or TimestampTz.

>> -if (rtli)
>> +if (recovery_target_timeline)
>>  {
>>          /* Timeline 1 does not have a history file, all else should */
>> -        if (rtli != 1 && !existsTimeLineHistory(rtli))
>> +        if (recovery_target_timeline != 1 &&
>> +            !existsTimeLineHistory(recovery_target_timeline))
>>              ereport(FATAL,
>>                      (errmsg("recovery target timeline %u does not exist",
>> -                            rtli)));
>> -        recoveryTargetTLI = rtli;
>> +                            recovery_target_timeline)));
>> +        recoveryTargetTLI = recovery_target_timeline;
>>  recoveryTargetIsLatest = false;
>
> So, now we have a recoveryTargetTLI and recovery_target_timeline
> variable? Really? Why do we need recoveryTargetTLI at all now?

Looks like we still do need both of them.  The initial value of
recoveryTargetTLI is derived from pg_control, but later it can be
overridden by this setting.

However, if recovery_target_timeline was set to "latest", we need to find
the latest TLI, based on the initial value from pg_control.  But we can't
set the final recoveryTargetTLI value in the GUC assign hook, because we
might need to fetch some history files first.

>> +static void
>> +assign_recovery_target_time(const char *newval, void *extra)
>> +{
>> +recovery_target_time = *((TimestampTz *) extra);
>> +
>> +if (recovery_target_xid != InvalidTransactionId)
>> +recovery_target = RECOVERY_TARGET_XID;
>> +else if (recovery_target_name[0])
>> +recovery_target = RECOVERY_TARGET_NAME;
>> +else if (recovery_target_time != 0)
>> +recovery_target = RECOVERY_TARGET_TIME;
>> +else
>> +recovery_target = RECOVERY_TARGET_UNSET;
>> +}
>> +
>
> I don't think it's correct to do such han

Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-11-21 Thread Alex Shulgin

Dag-Erling Smørgrav  writes:

> Alex Shulgin  writes:
>> I can do that too, just need a hint where to look in libpq/psql to
>> add the option.
>
> The place to *enforce* the option is src/interfaces/libpq/fe-secure.c
> (look for SSLv23_method() and SSL_CTX_set_options()).  I haven't looked
> into how to set it.

Yes, I've figured it out.  Guess we'd better share the ssl_protocol
value parser code between libpq and the backend.  Any precedent?
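
For illustration, the enforcement end of it is small once the GUC value is
parsed into a set of allowed protocols; something like the following
sketch, where the PROTO_* bits are invented stand-ins for whatever the
shared parser would produce (the TLSv1.1/1.2 option flags need
OpenSSL >= 1.0.1):

    #include <openssl/ssl.h>

    #define PROTO_TLSv1     0x01
    #define PROTO_TLSv1_1   0x02
    #define PROTO_TLSv1_2   0x04

    static void
    apply_ssl_protocols(SSL_CTX *ctx, int allowed)
    {
        /* SSLv2 and SSLv3 are never negotiated, matching the GUC behaviour */
        long        opts = SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3;

        if (!(allowed & PROTO_TLSv1))
            opts |= SSL_OP_NO_TLSv1;
        if (!(allowed & PROTO_TLSv1_1))
            opts |= SSL_OP_NO_TLSv1_1;
        if (!(allowed & PROTO_TLSv1_2))
            opts |= SSL_OP_NO_TLSv1_2;

        SSL_CTX_set_options(ctx, opts);
    }

The parser that produces the "allowed" bitmask is the part worth sharing;
the OpenSSL calls end up the same on both the backend and libpq sides.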

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merging recovery.conf into PostgreSQL.conf -- where did this go?

2014-11-20 Thread Alex Shulgin

Andres Freund  writes:
>
> On 2014-11-19 15:09:10 -0800, Josh Berkus wrote:
>> One patch that got deferred from 9.4 was the merger of recovery.conf and
>> postgresql.conf, due to conflicts with ALTER SYSTEM SET.  However, this
>> is still a critical TODO, because recovery.conf is still an ongoing
>> major management problem for PostgreSQL DBAs.
>> 
>> So, what happened to this?  I kinda expected it to get committed right
>> after we opened 9.5.
>
> Nobody did the remaining work.

I'd like to work on this.  AFAICT, this CommitFest entry is the latest
on the subject, right?

  https://commitfest.postgresql.org/action/patch_view?id=1293

Should/may I move it to the next open CommitFest?

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-11-20 Thread Alex Shulgin

Dag-Erling Smørgrav  writes:

> Alex Shulgin  writes:
>> * The patch works as advertised, though the only way to verify that
>>   connections made with the protocol disabled by the GUC are indeed
>>   rejected is to edit fe-secure-openssl.c to only allow specific TLS
>>   versions.  Adding configuration on the libpq side as suggested in the
>>   original discussion might help here.
>
> I can easily do that, but I won't have time until next week or so.

I can do that too, just need a hint where to look in libpq/psql to
add the option.

For SSL we have sslmode and sslcompression, etc. in conninfo, so adding
sslprotocols seems to be an option.  As a side note: should we also
expose a parameter to choose SSL ciphers (would be a separate patch)?
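
For illustration, on the client side that could end up looking like this
(sslprotocols is just the name floated here, and the colon-separated value
mirrors the server GUC; none of this is committed syntax):

    host=db.example.com dbname=postgres sslmode=require sslprotocols=TLSv1.1:TLSv1.2
    postgresql://db.example.com/postgres?sslmode=require&sslprotocols=TLSv1.1:TLSv1.2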

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-11-19 Thread Alex Shulgin

Dag-Erling Smørgrav  writes:
>
> The attached patches add an ssl_protocols configuration option which
> control which versions of SSL or TLS the server will use.  The syntax is
> similar to Apache's SSLProtocols directive, except that the list is
> colon-separated instead of whitespace-separated, although that is easy
> to change if it proves unpopular.

Hello,

Here is my review of the patch against master branch:

* The patch applies and compiles cleanly, except for sgml docs:

postgres.xml:66141: element varlistentry: validity error : Element
varlistentry content does not follow the DTD, expecting (term+ ,
listitem), got (term indexterm listitem)

under the  tag in the material
  added to config.sgml by the patch.

* The patch works as advertised, though the only way to verify that
  connections made with the protocol disabled by the GUC are indeed
  rejected is to edit fe-secure-openssl.c to only allow specific TLS
  versions.  Adding configuration on the libpq side as suggested in the
  original discussion might help here.

* The code allows specifying SSLv2 and SSLv3 in the GUC, but removes
  them forcibly after parsing the complete string (a warning is issued).
  Should we also add a note about this to the documentation?

* In case the list of allowed protocols ends up empty, a FATAL error
  message is logged: "could not set the protocol list (no valid
  protocols available)".  I think it's worth changing that to "could not
  set SSL protocol list..."  All other related error/warning logs
  include "SSL".

* The added code follows conventions and looks good to me.  I didn't
  spot any problems in the parser.  I've tried a good deal of different
  values and all seem to be parsed correctly.

I would try to apply patches for older branches if there is consensus
that we really need this patch and we want to back-patch it.

Thanks.
--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing unreferenced files

2014-11-18 Thread Alex Shulgin

Bruce Momjian  writes:
>
> Here is a cleaned-up version of the unreference file patch that was
> discussed extensively in May of 2005.  I want to get it into the
> archives in case someone else want to work on it.
>
> Here is a reference to the work still needed on the patch:
>
>   http://archives.postgresql.org/pgsql-patches/2005-05/msg00024.php

Hello,

This is a TODO item, but I'd like to know if there's general interest in
this feature at the moment.

I think it can be useful in a situation where the DBA knows that the
database was shut down or crashed during an extensive data load and has a
way to proceed without the need to start over with a clean database.
In such a case, having a tool to report or remove any stale files makes
perfect sense.

My other concern is that if at some point there is a bug in DROP
TABLE/INDEX that leaves behind the files it's supposed to remove, we would
likely want to know about it.

Regardless of the outcome of this discussion, it is interesting to know
whether such functionality can only be implemented in the backend/startup
code, or whether making it a separate tool is feasible (the tool would
only run against a stopped server).  There was once a tool named
pgfsck[1], which was never updated after 8.2, so the name is already
taken...

Yet another point is that one might be interested not only in reporting
stale files, but in any *missing* ones too.  Currently, you would only
know that a relation data file is missing when actually trying to read
from it; no attempt is made to verify this on startup.  This might not be
a very useful check, since if the file is not missing but corrupted, the
check doesn't buy you much.  (I am likely kicking a dead horse here.)

Thank you.
--
Alex

[1] http://svana.org/kleptog/pgsql/pgfsck.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-11-17 Thread Alex Shulgin

Andres Freund  writes:
>
> I've invested some more time in this:
> 0002 now makes sense on its own and doesn't change anything around the
>  interrupt handling. Oh, and it compiles without 0003.

In this patch, the endif appears to be misplaced in PostgresMain:

+   if (MyProcPort != NULL)
+   {
+#ifdef WIN32
+   pgwin32_noblock = true;
+#else
+   if (!pg_set_noblock(MyProcPort->sock))
+   ereport(COMMERROR,
+   (errmsg("could not set socket to 
nonblocking mode: %m")));
+   }
+#endif
+
pqinitmask();
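
Presumably the intended structure is the following, with the #endif pulled
inside the closing brace (my reading of it, not code taken from the patch):

    if (MyProcPort != NULL)
    {
    #ifdef WIN32
        pgwin32_noblock = true;
    #else
        if (!pg_set_noblock(MyProcPort->sock))
            ereport(COMMERROR,
                    (errmsg("could not set socket to nonblocking mode: %m")));
    #endif
    }

    pqinitmask();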

> 0003 Sinval/notify processing got simplified further. There really isn't
>  any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>  anymore. Also begin_client_read/client_read_ended don't make much
>  sense anymore. Instead introduce ProcessClientReadInterrupt (which
>  wants a better name).
> There's also a very WIP
> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>  set. All of that happens via the latch mechanism, nothing happens
>  inside signal handlers. So I do think it's quite an improvement
>  over what's been discussed in this thread.
>  But it (and the other approaches) do noticeably increase the
>  likelihood of clients not getting the error message if the client
>  isn't actually dead. The likelihood of write() being blocked
>  *temporarily* due to normal bandwidth constraints is quite high
>  when you consider COPY FROM and similar. Right now we'll wait till
>  we can put the error message into the socket afaics.
>
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

After re-reading these I don't see the rest of the items I wanted to
inquire about anymore, so it just makes more sense now.

One thing I did try is sending a NOTICE to the client when in
ProcessInterrupts() and DoingCommandRead is true.  I think[1] it was
expected to be delivered instantly, but actually the client (psql) only
displays it after sending the next statement.

While I'm reading up on the FE/BE protocol, someone might want to share
their wisdom on this subject.  My guess: psql blocks on the
readline/libedit call and can't effectively poll the server socket until
the user has entered a complete statement.
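
For reference, the experiment amounted to roughly this inside
ProcessInterrupts() (a sketch; the message text is arbitrary):

    if (DoingCommandRead)
    {
        ereport(NOTICE,
                (errmsg("test notice sent while the session is idle")));
        pq_flush();     /* try to push it onto the socket immediately */
    }

If the flush really does get the message out of the server right away, the
delay observed above would be purely on the client side, which matches the
readline/libedit guess.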

--
Alex

[1] http://www.postgresql.org/message-id/1262173040.19367.5015.camel@ebony

  ``AFAIK, NOTICE was suggested because it can be sent at any time,
whereas ERRORs are only associated with statements.''


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-17 Thread Alex Shulgin

Andres Freund  writes:
> On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: 
>> After reading up through archives on the two $subj related TODO items
>> I'm under the impression that the patches[1,2] didn't make it mainly because
>> of the risk of breaking SSL internals if we try to longjump out of the
>> signal handler in the middle of a blocking SSL read and/or if we try to
>> report that final "transaction/connection aborted" notice to the client.
>
> I've written a patch that allows that - check
> http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de
>
> I feel pretty confident that that's the way to go. I just need some time
> to polish it.
>
>> What if instead of trying to escape from the signal handler we would set
>> a flag and use it to simulate EOF after the recv() is restarted due to
>> EINTR?  From the backend perspective it should look like the client has
>> just hung up.
>
> That's pretty much what the above does. Except that it actually deals
> with blocking syscalls by not using them and relying on latches/select
> instead.

Yay, that's nice, because my EOF approach is sort of fishy.

I've checked the patches: they apply and compile on top of current HEAD
(one failed hunk in pqcomm.c), so I can help with testing if needed.

There is a (short) list of items that caught my attention.  I will post
in reply to your patches thread.
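
For the record, the simulate-EOF idea quoted above would have looked
roughly like this (illustrative only, not what the latch-based patches do;
the flag name is invented):

    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    static volatile sig_atomic_t abort_pending = 0;    /* set by the handler */

    static ssize_t
    recv_or_fake_eof(int sock, void *buf, size_t len)
    {
        for (;;)
        {
            ssize_t n = recv(sock, buf, len, 0);

            if (n >= 0 || errno != EINTR)
                return n;       /* data, a real EOF, or a real error */
            if (abort_pending)
                return 0;       /* pretend the client hung up */
            /* plain interruption: retry the recv() */
        }
    }

which makes it look, from the backend's perspective, like the client just
hung up.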

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-14 Thread Alex Shulgin

Andres Freund  writes:
>
> On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: 
>> After reading up through archives on the two $subj related TODO items
>> I'm under the impression that the patches[1,2] didn't make it mainly because
>> of the risk of breaking SSL internals if we try to longjump out of the
>> signal handler in the middle of a blocking SSL read and/or if we try to
>> report that final "transaction/connection aborted" notice to the client.
>
> I've written a patch that allows that - check
> http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de
>
> I feel pretty confident that that's the way to go. I just need some time
> to polish it.
>
>> What if instead of trying to escape from the signal handler we would set
>> a flag and use it to simulate EOF after the recv() is restarted due to
>> EINTR?  From the backend perspective it should look like the client has
>> just hung up.
>
> That's pretty much what the above does. Except that it actually deals
> with blocking syscalls by not using them and relying on latches/select
> instead.

Neat, I must have missed it.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

