Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
Sent: Monday, September 4, 2017 3:32 PM
To: i.kartys...@postgrespro.ru
Cc: pgsql-hackers
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

On Mon, Sep 4, 2017 at 4:34 PM, wrote:
> Our clients complain about this issue, and therefore I want to raise
> the discussion and suggest several solutions to this problem:
>
> I. Why does PG use FATAL when ERROR is enough to release the lock that
> raises the lock conflict?
> "if (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on the hot standby at
> exactly the same time the truncate on the master occurs?
>
> In my case the conflict happens when autovacuum truncates table tbl1
> on the master while a backend on the replica is performing a long
> transaction involving the same table tbl1. This happens because
> truncate takes an AccessExclusiveLock. To tackle this issue we have
> several options:
>
> 1. We can postpone the truncate on the master until all the replicas
> have finished their transactions (in this case, feedback requests to
> the master should be sent frequently). Patch 1:
> vacuum_lazy_truncate.patch
>
> 2. Maybe there is an option not to send the AccessExclusiveLock and
> not to truncate the table on the replica right away. We could wait a
> little and try to truncate tbl1 on the replica again.

Can max_standby_streaming_delay help in this situation (point 2)?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Hello!

In this situation this parameter (max_standby_streaming_delay) won't help, because if you have subsequent conflicting statements on the standby (the following is from the documentation and from our experience): "Thus, if one query has resulted in significant delay, subsequent conflicting queries will have much less grace time until the standby server has caught up again."

And you never know how to set this parameter, except to -1, which means the standby can be delayed indefinitely. In our experience, it was only autovacuum on the master taking an AccessExclusiveLock that raised this FATAL message on the standby. Once the AccessExclusiveLock reaches the standby and max_standby_streaming_delay > -1, you will sooner or later get this FATAL during recovery. With this patch we try to get rid of the AccessExclusiveLock applied on the standby while there is an active statement on it.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
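For reference, the standby-side settings under discussion look roughly like this (a sketch only; the values shown are illustrative, not recommendations, and parameter availability depends on the server version):

```
# postgresql.conf on the standby
hot_standby = on
hot_standby_feedback = on          # ask the master to hold back vacuum cleanup
max_standby_streaming_delay = 30s  # grace period before cancelling conflicting
                                   # queries; -1 waits forever, i.e. the standby
                                   # may lag indefinitely behind the master
```

With -1 the FATAL/cancel never fires, at the cost of unbounded replication delay, which is the trade-off described above.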
Re: [HACKERS] Parallel COPY FROM execution
Greetings pgsql-hackers,

I have completed the general infrastructure for parallel COPY FROM execution, consisting of a main (master) process and multiple BGWorkers, each connected to the master via a personal message queue (shm_mq).

The master process:
- Allocates dynamic shared memory holding the parallel state shared across the BGWorkers and the master
- Attaches every worker to its personal message queue (shm_mq)
- Waits for worker initialization using a Latch
- Reads raw text lines using CopyReadLine and puts them into the shm_mq's round-robin, to balance the queues' load
- When EOF is reached, sends a zero-length message, on receipt of which the workers shut down safely
- Waits for the workers to complete their jobs using a ConditionVariable

Each BGWorker:
- Signals the master on initialization via a Latch
- Receives raw text lines over its personal shm_mq and puts them into the log (for now)
- Reinitializes its db connection using the same db_id and user_id as the main process
- Signals the master via the ConditionVariable when its job is done

All parallel-state modifications are done under LWLocks.

You can find the actual code here: https://github.com/ololobus/postgres/pull/2/files (it is still in progress, so it has a lot of duplication and to-be-deleted comments).

To go forward I have to overcome some obstacles:

- Currently, all copy.c methods are designed to work with one giant structure, CopyState. It includes buffers, many initial parameters which stay unchanged, and a few variables which vary during COPY FROM execution. Since I need all these parameters, I have to obtain them somehow inside each BGWorker process. I see two possible solutions here:

1) Perform the BeginCopyFrom initialization inside the master and put the required parameters into shared memory. However, many of them are arrays of variable size (e.g. partition_tupconv_maps, force_notnull_flags), so I cannot put them into shmem as one single struct. The best idea I have is to put each parameter under its own shmem TOC key, which seems quite ugly.

2) Perform the BeginCopyFrom initialization inside each BGWorker. However, it also opens the input file/pipe for reading, which is not necessary for the workers and may cause some interference with the master; but I can modify BeginCopyFrom.

- I have used both a Latch and a ConditionVariable for the same purpose (waiting until some signal occurs), and for me as an end user they behave quite similarly. I looked into the condition_variable.c code, and it uses a Latch and a SpinLock under the hood. So what are the differences and dis-/advantages between Latch and ConditionVariable?

I will be glad if someone can help me find an answer to this question; also, any comments and remarks on the overall COPY FROM processing architecture are very welcome.

Alexey
Re: [HACKERS] Parallel COPY FROM execution
On Fri, Jun 30, 2017 at 3:35 PM, Pavel Stehule wrote:
>
> 2017-06-30 14:23 GMT+02:00 Alex K:
>>
>> Thus, it results in a ~60% performance boost per each x2 multiplication of
>> parallel processes, which is consistent with the initial estimation.
>
> The important use case is a big table with a lot of indexes. Did you test
> a similar case?

Not yet; I will try it, thank you for the suggestion. But approximately how big, in numbers, are a 'big table' and a 'lot of indexes'? Also, index updates and constraint-check performance are things I cannot control during COPY execution, so I probably should not worry about them too much. But it is of course interesting how COPY performs in that case.

Alexey
[HACKERS] Parallel COPY FROM execution
Greetings pgsql-hackers,

I am a GSoC student this year; my initial proposal has been discussed in the following thread: https://www.postgresql.org/message-id/flat/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA%40gmail.com

The patch with COPY FROM error handling seems to be quite finished, so I have started thinking about parallelism in COPY FROM, which is the next point in my proposal.

In order to understand whether there are any expensive calls in COPY that could be executed in parallel, I did some research. First, please find attached a flame graph of the most expensive copy.c calls during 'COPY FROM file' (copy_from.svg). It reveals that the inevitably serial operations like CopyReadLine (<15%) and heap_multi_insert (~15%) take less than 50% of the time in total, while the remaining operations like heap_form_tuple and the multiple checks inside NextCopyFrom can probably be executed well in parallel.

Second, I have compared the execution time of 'COPY FROM a single large file (~300 MB, 5000 lines)' vs. 'COPY FROM four equal parts of the original file, executed in four parallel processes'. Though it is a very rough test, it helps to obtain an overall estimate:

Serial:
real 0m56.571s
user 0m0.005s
sys  0m0.006s

Parallel (x4):
real 0m22.542s
user 0m0.015s
sys  0m0.018s

Thus, it results in a ~60% performance boost for each x2 multiplication of parallel processes, which is consistent with the initial estimate.

After several discussions I have two possible solutions in mind:

1) Simple solution

Let us focus only on 'COPY FROM file'; then it is relatively easy to implement: just give the same file and an offset to each worker.
++ Simple; a more reliable solution; probably gives the largest possible performance boost
-- Limited number of use cases. Though 'COPY FROM file' is a frequent case, even when one uses it with psql \copy, a client-side file read and stdin streaming to the backend are actually performed

2) True parallelism

Implement a pool of bgworkers and a simple shared buffer/queue. While the main COPY process reads the input data and puts raw lines into the queue, the parallel bgworkers take lines from there and process them.
++ A more general solution; supports the various COPY FROM use cases
-- A much more sophisticated solution; probably less performance boost compared to 1)

I will be glad to receive any comments and criticism.

Alexey
Re: [HACKERS] Why restore_command is called for existing files in pg_xlog?
On Fri, Jun 2, 2017, at 11:51 AM, Alexander Kukushkin wrote:
> Hello hackers,
> There is one strange and awful thing I don't understand about
> restore_command: it is always called for every single WAL segment
> postgres wants to apply (even if the segment already exists in
> pg_xlog) until the replica starts streaming from the master.

The real problem behind this question is being unable to bring a former master, demoted after a crash, back online, since the WAL segments required to reach a consistent state were not archived while it was still a master, and local segments in pg_xlog are ignored when a restore_command is defined. The other replicas would not be good candidates for promotion either, as they were well behind the master (because the last N WAL segments were not archived and streaming replication had a few seconds of delay).

Is this the correct list for such questions, or would it be more appropriate to ask elsewhere (i.e. pgsql-bugs)?

> If there is no restore_command in recovery.conf, it works perfectly:
> postgres replays the existing WAL segments and at some point connects
> to the master and starts streaming from it.
>
> When recovery.conf is there, starting a replica can become a real
> problem, especially if the restore_command is slow.
>
> Is it possible to change this behavior somehow? First look into
> pg_xlog and only if the file is missing or "corrupted" call
> restore_command.
>
> Regards,
> ---
> Alexander Kukushkin

Sincerely,
Alex
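One possible workaround (my assumption, not a tested recipe: it relies on recovery falling back to the copy in pg_xlog when restore_command exits non-zero) is to wrap restore_command in a script that fails fast for segments already present locally, so the slow archive fetch is skipped. PGDATA and ARCHIVE locations here are illustrative:

```shell
# Sketch of a restore_command wrapper.  In recovery.conf one would
# point restore_command at a script containing this logic, invoked
# as: restore_command = '/path/to/restore_wal.sh %f %p'
PGDATA=${PGDATA:-/var/lib/postgresql/data}   # illustrative data directory
ARCHIVE=${ARCHIVE:-/mnt/wal_archive}         # illustrative archive location

restore_wal() {
    wal_name=$1    # %f: WAL file name requested by the server
    dest=$2        # %p: destination path to copy it to

    # Fail fast if the segment is already in pg_xlog: a non-zero
    # exit should make recovery fall back to the local copy instead
    # of waiting on the archive.
    if [ -f "$PGDATA/pg_xlog/$wal_name" ]; then
        return 1
    fi
    cp "$ARCHIVE/$wal_name" "$dest"
}
```

Whether this is safe when the local segment really is corrupted is exactly the open question raised in the quoted mail, so treat it as a sketch of the idea rather than a recommendation.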
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Hi pgsql-hackers,

Thank you again for all these replies. I have started working on this project and have learnt a lot of new stuff over the last month, so here are some new thoughts about error handling in COPY. I decided to stick to the same thread, since it has a neutral subject.

(1) One of my mentors, Alvaro Herrera, suggested that I have a look at UPSERT. It may be a good point to be able to achieve the same functionality as ON CONFLICT DO NOTHING when COPY actually inserts tuples and error handling is turned on. It could additionally reduce the number of failed subtransactions and reduce XID consumption, while still ignoring some common errors like unique index violations.

Adding full support for ON CONFLICT DO NOTHING/UPDATE to COPY seems to be a large separate task and is out of the current project's scope, but maybe there is a relatively simple way to perform the tuple inserts internally with ON CONFLICT DO NOTHING?

I have added Peter Geoghegan to cc, as I understand he is the major contributor to UPSERT in PostgreSQL. It would be great if he could answer this question.

(2) Otherwise, I am still going to use subtransactions via BeginInternalSubTransaction and PG_TRY / PG_CATCH with ReleaseCurrentSubTransaction / RollbackAndReleaseCurrentSubTransaction. To minimize XID consumption, I will try to insert tuples in batches and pre-validate them as much as possible (as was suggested earlier in the thread).

Alexey
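The batching idea in (2) can be simulated in plain C (this is not backend code; try_insert() and the xids_used counter are stand-ins of my own invention for the real per-row insert and subtransaction cost): try the whole batch under one subtransaction, and only when it fails fall back to row-by-row retries, so a small error rate costs few XIDs.

```c
#include <stdbool.h>
#include <assert.h>

static int xids_used = 0;   /* counts simulated subtransactions */

/* Stand-in for inserting one row; pretend row value 7 is broken. */
static bool
try_insert(int row)
{
    return row != 7;
}

/* Insert a batch, returning the number of rows that went in.
 * One subtransaction covers the whole batch; on failure the batch
 * is "rolled back" and redone row by row, skipping bad rows. */
static int
copy_batch(const int *rows, int n)
{
    int ok = 0;

    xids_used++;                     /* one subxact for the whole batch */
    for (int i = 0; i < n; i++)
    {
        if (!try_insert(rows[i]))
        {
            ok = 0;                  /* batch rolled back: start over */
            for (int j = 0; j < n; j++)
            {
                xids_used++;         /* one subxact per row on retry */
                if (try_insert(rows[j]))
                    ok++;
            }
            return ok;
        }
        ok++;
    }
    return ok;
}
```

An error-free batch consumes a single XID; a batch with one bad row consumes 1 + n, which is still far cheaper than per-row subtransactions when errors are rare.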
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Hi Alexander!

I missed your reply, since the proposal submission deadline passed last Monday and I didn't check the hackers mailing list too frequently.

(1) It seems that starting a new subtransaction at step 4 is not necessary. We can gather all error lines in one pass and, at the end of the input, start only one additional subtransaction with all the safe lines at once: [1, ..., k1 - 1, k1 + 1, ..., k2 - 1, k2 + 1, ...], where ki is an error line number. But I assume the only realistic use case is when the number of errors is relatively small compared to the total number of rows, because if the input is in a totally inconsistent format, it seems useless to import it into the db. Thus, it is not 100% clear to me whether there would be any real difference in performance between starting a new subtransaction at step 4 or not.

(2) Hmm, good question. As far as I know it is impossible to get the size of the stdin input, so it is impossible to distribute stdin directly to the parallel workers. The first approach that comes to mind is to store the stdin input in some kind of buffer/queue and then have the workers read it in parallel. The question is how that would perform with a large file; I guess poorly, at least from the memory-consumption perspective. Whether parallel execution would still be faster is the next question.

Alexey

On Thu, Apr 6, 2017 at 4:47 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> Hi, Alexey!
>
> On Tue, Mar 28, 2017 at 1:54 AM, Alexey Kondratov <kondratov.alek...@gmail.com> wrote:
>>
>> Thank you for your responses and valuable comments!
>>
>> I have written a draft proposal: https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit
>>
>> It seems that COPY currently is able to return the first error line and the error
>> type (extra or missing columns, type parse error, etc).
>> Thus, an approach similar to what Stas wrote should work and, being
>> optimised for a small number of error rows, should not
>> affect COPY performance in that case.
>>
>> I will be glad to receive any critical remarks and suggestions.
>
> I have the following questions about your proposal.
>
>> 1. Suppose we have to insert N records
>> 2. We create a subtransaction with these N records
>> 3. An error is raised on the k-th line
>> 4. Then, we can safely insert all lines from the 1st to the (k - 1)th
>> 5. Report, save to an errors table, or silently drop the k-th line
>> 6. Next, try to insert lines from (k + 1) to N in another subtransaction
>> 7. Repeat until the end of the file
>
> Do you assume that we start a new subtransaction in 4, since the subtransaction
> we started in 2 is rolled back?
>
>> I am planning to use background worker processes for parallel COPY
>> execution. Each process will receive an equal piece of the input file. Since
>> the file is split by size, not by lines, each worker will start the import from
>> the first new line, so as not to hit a broken line.
>
> I think the situation where the backend directly reads a file during COPY
> is not typical. The more typical case is the psql \copy command. In that case
> "COPY ... FROM stdin;" is actually executed while psql streams the data.
> How can we apply parallel COPY in this case?
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
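The "start from the first new line" rule in the quoted proposal can be sketched as a small C helper (my own illustration, not the patch's code; it works on an in-memory buffer for testability, while the real code would seek within the file):

```c
#include <stddef.h>
#include <assert.h>

/* A worker handed an arbitrary byte offset into the input skips
 * forward to the first position after the next newline, so it never
 * starts importing in the middle of a line.  Offset 0 is the file
 * start and needs no adjustment. */
static size_t
align_to_next_line(const char *buf, size_t len, size_t offset)
{
    if (offset == 0)
        return 0;
    while (offset < len && buf[offset - 1] != '\n')
        offset++;
    return offset;
}
```

A worker's chunk then runs from its aligned start offset up to the next worker's aligned start, so every line is processed by exactly one worker.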
Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?
On Wed, Dec 7, 2016 at 3:42 PM, Tom Lane wrote:
>
> Hmm ... after further experimentation, I still can't get this version of
> systemd (231) to do anything evil. It turns out that Fedora ships it with
> KillUserProcesses turned off by default, and maybe having that on is a
> prerequisite for the other behavior? But that doesn't make a lot of sense
> because we'd never be seeing the reports of databases moaning about lost
> semaphores if the processes got killed first. Anyway, I see nothing bad
> happening if KillUserProcesses is off, while if it's on then the database
> gets shut down reasonably politely via SIGTERM.
>
> Color me confused ... maybe systemd's behavior has changed?

Hrm, the following incantation seems to break things for me on a fresh Fedora 25 system:

1) As root, su to $USER and start postgres.
2) ssh in as $USER and then log out.
3) # psql localhost
FATAL: semctl(4980742, 3, SETVAL, 0) failed: Invalid argument
LOG: server process (PID 14569) exited with exit code 1
...
Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?
On Wed, Dec 7, 2016 at 1:12 PM, Tom Lane wrote:
> But this is all kind of moot if Peter is right that systemd will zap
> POSIX shmem along with SysV semaphores. I've been trying to reproduce
> the issue on a Fedora 25 installation, and so far I can't get it to
> zap anything, so I'm a bit at a loss how to prove things one way or
> the other.

I don't know precisely about Fedora 25, but I've had success in the past with:

1) ssh in as the user
2) start postgres under tmux/screen
3) log out
4) do another ssh login/logout cycle

After logon, you should see "/usr/lib/systemd/systemd --user" running for that user. After logging out, said process should exit. If either of those is not true, either systemd is not set up to track sessions (probably via pam) or it thinks you still have an active logon. Another way to check whether systemd thinks the user is logged in is whether /run/user/$UID exists.
Re: [HACKERS] Parallel sec scan in plpgsql
On 18.09.2016 06:54, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov wrote:
>> On 16.09.2016 16:50, Amit Kapila wrote:
>>> Can you try by setting force_parallel_mode = off;? I think it is
>>> sending the whole function execution to the worker due to
>>> force_parallel_mode.
>>
>> No changes:
>
> Okay, it just slipped my mind that we don't support parallel queries
> for SQL statements executed (via exec_stmt_execsql) from plpgsql. For a
> detailed explanation of why that is not feasible, you can refer to one
> of my earlier e-mails [1] on a similar topic. I think if we can somehow
> get the results via a PERFORM statement, then it could be possible to
> use parallelism via plpgsql. However, you can use it via SQL functions;
> an example is below:
>
> set min_parallel_relation_size = 0;
> set parallel_tuple_cost = 0;
> set parallel_setup_cost = 0;
> Load 'auto_explain';
> set auto_explain.log_min_duration = 0;
> set auto_explain.log_analyze = true;
> set auto_explain.log_nested_statements = true;
>
> create table test_plpgsql(c1 int, c2 char(1000));
> insert into test_plpgsql values(generate_series(1,10),'aaa');
>
> create or replace function parallel_test_set_sql() returns setof bigint as $$
> select count(*) from test_plpgsql;
> $$ language sql PARALLEL SAFE STRICT STABLE;
>
> Then execute the function as:
> select * from parallel_test_set_sql();
>
> You can see the plan below if the auto_explain module is loaded:
>
> Finalize Aggregate (cost=14806.85..14806.86 rows=1 width=8) (actual time=1094.966..1094.967 rows=1 loops=1)
>   -> Gather (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.216..1094.943 rows=3 loops=1)
>         Workers Planned: 2
>         Workers Launched: 2
>         -> Partial Aggregate (cost=14806.83..14806.84 rows=1 width=8) (actual time=177.867..177.868 rows=1 loops=3)
>               -> Parallel Seq Scan on test_plpgsql (cost=0.00..14702.67 rows=41667 width=0) (actual time=0.384..142.565 rows=3 loops=3)
> CONTEXT: SQL function "parallel_test_set_sql" statement 1
> LOG: duration: 2965.040 ms plan:
> Query Text: select * from parallel_test_set_sql();
> Function Scan on parallel_test_set_sql (cost=0.25..10.25 rows=1000 width=8) (actual time=2538.620..2776.955 rows=1 loops=1)
>
> In general, I think we should support the cases required (or written) by
> you from plpgsql or sql functions. We need more work to support such
> cases. There are probably two ways of supporting them: we can build some
> intelligence into plpgsql execution so that it can recognise such
> queries and allow parallelism, or we need to think about enabling
> parallelism for cases where we don't run the plan to completion. Most of
> the use cases from plpgsql or sql functions fall into the latter
> category, as they generally don't run the plan to completion.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com

Thank you for your suggestion! That works.
But what can we do with this function:

create or replace function parallel_test_sql(t int) returns setof bigint as $$
  select count(*) from (select a,b,c,d,e,sum(bytes) from test where a >= $1 group by a,b,c,d,e) t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze, buffers) select * from parallel_test_sql(2);

"Function Scan on parallel_test_sql (cost=0.25..10.25 rows=1000 width=8) (actual time=2410.789..2410.790 rows=1 loops=1)"
"  Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - Query Tool,client=127.0.0.1 LOG: duration: 2410.135 ms plan:
Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test where a >= $1 group by a,b,c,d,e) t;
Aggregate (cost=230701.42..230701.43 rows=1 width=8)
  -> HashAggregate (cost=230363.59..230513.74 rows=15015 width=28)
        Group Key: test.a, test.b, test.c, test.d, test.e
        -> Seq Scan on test (cost=0.00..188696.44 rows=372 width=20)
              Filter: (a >= $1)

No parallelism again. It looks like the "Filter: (a >= $1)" breaks parallelism.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Parallel sec scan in plpgsql
On 16.09.2016 16:50, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov wrote:
>> No it doesn't. Parallelism works in neither the sql function nor plpgsql.
>> Here is an example:
>>
>> ipdr=> show max_worker_processes;
>>  max_worker_processes
>> ----------------------
>>  128
>> (1 row)
>> ipdr=> set max_parallel_workers_per_gather to 128;
>> SET
>> ipdr=> set force_parallel_mode = on;
>> SET
>> ipdr=> set min_parallel_relation_size = 0;
>> SET
>> ipdr=> set parallel_tuple_cost = 0;
>> SET
>
> Can you try by setting force_parallel_mode = off;? I think it is
> sending the whole function execution to the worker due to
> force_parallel_mode.

No changes:

ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set min_parallel_relation_size = 0;
SET
ipdr=> set parallel_tuple_cost = 0;
SET
ipdr=> set force_parallel_mode = off;
SET
ipdr=> select name, setting from pg_settings where name in ('max_parallel_workers_per_gather',
ipdr(>                                                      'min_parallel_relation_size',
ipdr(>                                                      'parallel_tuple_cost',
ipdr(>                                                      'force_parallel_mode');
              name               | setting
---------------------------------+---------
 force_parallel_mode             | off
 max_parallel_workers_per_gather | 128
 min_parallel_relation_size      | 0
 parallel_tuple_cost             | 0
(4 rows)

ipdr=> explain (analyze, buffers) select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e) t;

Aggregate (cost=87702.33..87702.34 rows=1 width=8) (actual time=709.643..709.643 rows=1 loops=1)
  Buffers: shared hit=65015
  -> Finalize HashAggregate (cost=87364.49..87514.64 rows=15015 width=28) (actual time=706.382..708.456 rows=15015 loops=1)
        Group Key: test.a, test.b, test.c, test.d, test.e
        Buffers: shared hit=65015
        -> Gather (cost=85149.78..85299.93 rows=165165 width=20) (actual time=478.626..645.209 rows=180180 loops=1)
              Workers Planned: 11
              Workers Launched: 11
              Buffers: shared hit=65015
              -> Partial HashAggregate (cost=84149.78..84299.93 rows=15015 width=20) (actual time=473.890..478.309 rows=15015 loops=12)
                    Group Key: test.a, test.b, test.c, test.d, test.e
                    Buffers: shared hit=63695
                    -> Parallel Seq Scan on test (cost=0.00..72786.01 rows=909101 width=20) (actual time=0.021..163.120 rows=83 loops=12)
                          Buffers: shared hit=63695
Planning time: 0.318 ms
Execution time: 710.600 ms
(16 rows)

ipdr=> explain (analyze, buffers) select parallel_test_plpgsql();

Result (cost=0.00..0.26 rows=1 width=8) (actual time=4003.719..4003.720 rows=1 loops=1)
  Buffers: shared hit=63869
Planning time: 0.021 ms
Execution time: 4003.769 ms
(4 rows)

auto_explain:
2016-09-16 18:02:29 MSK [29353]: [53-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG: duration: 4001.275 ms plan:
Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
Aggregate (cost=289035.43..289035.44 rows=1 width=8)
  -> HashAggregate (cost=288697.59..288847.74 rows=15015 width=28)
        Group Key: test.a, test.b, test.c, test.d, test.e
        -> Seq Scan on test (cost=0.00..163696.15 rows=1115 width=20)
2016-09-16 18:02:29 MSK [29353]: [54-1] user=ipdr,db=ipdr,app=psql,client=[local] CONTEXT: SQL statement "select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t"
PL/pgSQL function parallel_test_plpgsql() line 5 at SQL statement

ipdr=> explain (analyze, buffers) select parallel_test_plpgsql();

Result (cost=0.00..0.26 rows=1 width=8) (actual time=4497.820..4497.822 rows=1 loops=1)
  Buffers: shared hit=63695
Planning time: 0.023 ms
Execution time: 4497.872 ms
(4 rows)

auto_explain:
2016-09-16 18:03:23 MSK [29353]: [57-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG: duration: 4497.050 ms plan:
Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
Aggregate (cost=289035.43..289035.44 rows=1 width=8)
  -> HashAggregate (cost=288697.59..288847.74 rows=15015 width=28)
        Group Key: test.a, test.b, test.c, test.d, test.e
Re: [HACKERS] Parallel sec scan in plpgsql
Aggregate (cost=289035.43..289035.44 rows=1 width=8)
  -> HashAggregate (cost=288697.59..288847.74 rows=15015 width=28)
        Group Key: test.a, test.b, test.c, test.d, test.e
        -> Seq Scan on test (cost=0.00..163696.15 rows=1115 width=20)

So, as we can see, parallel seq scan doesn't work in plpgsql and sql functions. Can somebody explain to me where I went wrong?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 16.09.2016 07:27, Ashutosh Bapat wrote:
> On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov wrote:
>> Hello!
>> Does parallel seq scan work in plpgsql?
>
> Parallel seq scan is a query optimization that works independent of the
> source of the query, i.e. whether it comes directly from a client or a
> procedural language like plpgsql. So, I guess, the answer to your
> question is yes. If you are expecting something else, more context will
> help.
[HACKERS] Parallel sec scan in plpgsql
Hello!

Does parallel seq scan work in plpgsql?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Block level parallel vacuum WIP
On 23.08.2016 15:41, Michael Paquier wrote:
> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada wrote:
>> As for the PoC, I implemented parallel vacuum so that each worker
>> processes both phases 1 and 2 for a particular block range. Suppose we
>> vacuum a 1000-block table with 4 workers: each worker processes 250
>> consecutive blocks in phase 1 and then reclaims dead tuples from the
>> heap and indexes (phase 2).
>
> So each worker is assigned a range of blocks, and processes them in
> parallel? This does not sound good performance-wise. I recall Robert's
> and Amit's emails on the matter for sequential scan, that this would
> suck performance out particularly for rotating disks.

Rotating disks are not a problem; you can always RAID them, etc. The problem is the 8k allocation per relation once per half an hour. A seq scan done this way becomes a random scan...

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Why we lost Uber as a user
On 28.07.2016 17:53, Vladimir Sitnikov wrote:
>> That's a recipe for runaway table bloat; VACUUM can't do much because
>> there's always some minutes-old transaction hanging around (and SNAPSHOT
>> TOO OLD doesn't really help, we're talking about minutes here), and
>> because of all of the indexes HOT isn't effective.
>
> Just curious: what if PostgreSQL supported an index that stores the
> "primary key" (or unique key) instead of tids? Am I right that this kind
> of index would not suffer from that bloat? I'm assuming the primary key
> is not updated, so secondary indices built that way should be much less
> prone to bloat when updates land on other columns (even if the tid
> moves, the PK does not change, so the secondary index row could be
> reused).
>
> If that works, it could reduce index bloat and reduce the amount of WAL
> (fewer indices would need to be updated). Of course it would make index
> scans a bit slower; however, it looks like at least Uber is fine with
> that extra cost of an index scan.
>
> Does it make sense to implement that kind of index as an access method?
>
> Vladimir

You mean an IOT (index-organized table), like Oracle has?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Strange behavior of some volatile function like random(), nextval()
On 29.06.2016 15:30, David G. Johnston wrote:
> More specifically...
>
> On Wed, Jun 29, 2016 at 7:34 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>> On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov <a.igna...@postgrespro.ru> wrote:
>>> Hello!
>>>
>>> Got some strange behavior of the random() function:
>>>
>>> postgres=# select (select random() ) from generate_series(1,10) as i;
>>>       random
>>> -------------------
>>>  0.831577288918197
>>> [...]
>>> (10 rows)
>>
>> I recall that this is treated as an implicit LATERAL, meaning that
>> random() is calculated only once.
>
> A non-correlated subquery (i.e., one that does not refer to outer
> variables) placed in the target list need only have its value computed
> once, so that is what happens. The fact that a volatile function can
> return different values given the same arguments doesn't mean much when
> the function is only ever called a single time.
>
>>> postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
>>>     ?column?
>>> ------------------
>>>  0.97471913928166
>>> [...]
>>> (10 rows)
>>
>> But not that. So those results do not surprise me.
>
> A correlated subquery, on the other hand, has to be called once for
> every row and is evaluated within the context supplied by said row.
> Each time random() is called it returns a new value.
>
> Section 4.2.11 (9.6 docs):
> https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES
>
> Maybe this could be worded better, but the first part talks about a
> single execution while "any one execution" is mentioned in reference to
> "the surrounding query". I do think that defining "correlated" and
> "non-correlated" subqueries within this section would be worthwhile.
>
> David J.

In this subquery (below) we have a reference to outer variables, but it does not work as it should (or I don't understand something):

postgres=# select id, (select string_agg('a','') from generate_series(1,trunc(10*random()+1)::int) where id=id) from generate_series(1,10) as id;
 id | string_agg
----+------------
  1 | aaa
  2 | aaa
...

But this query (with a reference to an outer var) works perfectly:

postgres=# select id, (select random() where id=id) from generate_series(1,10) as id;
 id |      random
----+-------------------
  1 | 0.974509597290307
  2 | 0.219822214450687
...

Also, this query works well (the (id-id) does the job):

postgres=# select id, (select string_agg('a','') from generate_series(1,trunc(10*random()+1)::int+(id-id)) ) from generate_series(1,10) as id;
 id | string_agg
----+------------
  1 | aaa
  2 | a
...

It means that even a reference to outer variables doesn't guarantee that the executor will execute a volatile function from a subquery every time. Or is there something else I should know?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] Strange behavior of some volatile function like random(), nextval()
Hello!

Got some strange behavior of the random() function:

postgres=# select (select random() ) from generate_series(1,10) as i;
      random
-------------------
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
 0.831577288918197
(10 rows)

postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
      ?column?
--------------------
   0.97471913928166
 0.0532126761972904
  0.331358563620597
 0.0573496259748936
  0.321165383327752
   0.48836630070582
  0.444201893173158
 0.0729857799597085
  0.661443184129894
  0.706566562876105
(10 rows)

postgres=# explain select (select random() ) from generate_series(1,10) as i;
                                QUERY PLAN
--------------------------------------------------------------------------
 Function Scan on generate_series i  (cost=0.02..10.01 rows=1000 width=0)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
(3 rows)

postgres=# explain select (select random()+i*0 ) from generate_series(1,10) as i;
                                QUERY PLAN
--------------------------------------------------------------------------
 Function Scan on generate_series i  (cost=0.00..30.00 rows=1000 width=4)
   SubPlan 1
     ->  Result  (cost=0.00..0.02 rows=1 width=0)
(3 rows)

postgres=# \df+ random
                                                       List of functions
   Schema   |  Name  | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language | Source code | Description
------------+--------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+--------------
 pg_catalog | random | double precision |                     | normal | invoker  | volatile   | postgres | internal | drandom     | random value
(1 row)

Also:

postgres=# create sequence test;
CREATE SEQUENCE
postgres=# SELECT (SELECT nextval('test')) FROM generate_series(1,10) as i;
 nextval
---------
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
(10 rows)

postgres=# SELECT (SELECT nextval('test')+i*0) FROM generate_series(1,10) as i;
 ?column?
----------
        2
        3
        4
        5
        6
        7
        8
        9
       10
       11
(10 rows)

postgres=# \df+ nextval
                                                          List of functions
   Schema   |  Name   | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language | Source code |     Description
------------+---------+------------------+---------------------+--------+----------+------------+----------+----------+-------------+---------------------
 pg_catalog | nextval | bigint           | regclass            | normal | invoker  | volatile   | postgres | internal | nextval_oid | sequence next value
(1 row)

Both functions are volatile, so from the docs:

"A VOLATILE function can do anything, including modifying the database. It can return different results on successive calls with the same arguments. The optimizer makes no assumptions about the behavior of such functions. A query using a volatile function will re-evaluate the function at every row where its value is needed."

Is something wrong with the executor? Is this a bug, or an executor feature related to subqueries?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
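The contrast can be seen without the `+i*0` trick. A direct call in the target list is re-evaluated per row, matching the quoted documentation; only the uncorrelated scalar subquery is collapsed into a single InitPlan evaluation. An illustrative sketch:

```sql
-- Direct call in the target list: re-evaluated for every row,
-- so ten different values come back.
SELECT random() FROM generate_series(1,10) AS i;

-- Uncorrelated scalar subquery: planned as an InitPlan and evaluated
-- once, so the same value is repeated for all ten rows.
SELECT (SELECT random()) FROM generate_series(1,10) AS i;
```

The EXPLAIN output above shows exactly this distinction: InitPlan (evaluated once) versus SubPlan (evaluated per row).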
Re: [HACKERS] Bug in to_timestamp().
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 20.06.2016 17:09, Albe Laurenz wrote:

Tom Lane wrote:
> I don't necessarily have an opinion yet. I would like to see more than
> just an unsupported assertion about what Oracle's behavior is.
> Also, how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------
2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------
2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-1315:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------
2016-06-13 15:43:36.0 AD

(to_timestamp_tz behaves the same way.)

So Oracle seems to make no difference between one or more spaces.

Yours,
Laurenz Albe

Guys, do we need to change this behavior, or can you tell me that it is normal? Because we have this:

postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');
      to_timestamp
------------------------
 2016-03-01 15:43:36+03
(1 row)

but on the other side we have:

postgres=# select '2016-02-30 15:43:36'::timestamp;
ERROR:  date/time field value out of range: "2016-02-30 15:43:36"
LINE 1: select '2016-02-30 15:43:36'::timestamp;

Another bug in to_timestamp()/to_date()?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in to_timestamp().
On 23.06.2016 20:40, Tom Lane wrote:

Robert Haas writes:
> On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston wrote:
>> My understanding is that is not going to change for 9.6.
> That's exactly what is under discussion here.

I would definitely agree with David on that point. Making to_timestamp noticeably better on this score seems like a nontrivial project, and post-beta is not the time for that sort of thing, even if we had full consensus on what to do. I'd suggest somebody work on a patch and put it up for review in the next cycle.

Now, if you were to narrowly define the problem as "whether to skip non-spaces for a space in the format", maybe that could be fixed post-beta, but I think that's a wrongheaded approach. to_timestamp's issues with input that doesn't match the format are far wider than that. IMO we should try to resolve the whole problem with one coherent change, not make incremental incompatible changes at the margins.

At the very least I'd want to see a thought-through proposal that addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested by the format

regards, tom lane

I totally agree that we need more discussion about error handling in this function! The same behavior is also observed in the to_date() and to_number() functions:

postgres=# SELECT TO_DATE('2!0!1!6!0!6-/-/-/-/-/-/-1!/-/-/-/-/-/-/-3!', 'YYYY-MM-DD');
  to_date
------------
 0002-01-01
(1 row)

postgres=# select to_number('1$#@!!,2,%,%4,5,@%5@4..8-', '999G999D9S');
 to_number
-----------
        12
(1 row)

On our side there has been some discussion about writing a patch to change this incorrect behavior. So stay tuned.
--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 23.06.2016 19:37, David G. Johnston wrote:

On Thu, Jun 23, 2016 at 12:16 PM, Alex Ignatov <a.igna...@postgrespro.ru> wrote:

> On 23.06.2016 16:30, Bruce Momjian wrote:
>> On Thu, Jun 23, 2016 at 07:41:26AM +0000, amul sul wrote:
>>> On Monday, 20 June 2016 8:53 PM, Alex Ignatov <a.igna...@postgrespro.ru> wrote:
>>>> On 13.06.2016 18:52, amul sul wrote:
>>>> And it does not stop at simple whitespace. By using to_timestamp you can
>>>> get any output result by providing illegal input parameter values:
>>>>
>>>> postgres=# SELECT TO_TIMESTAMP('2016-06-13 99 :99:99', 'YYYYMMDD HH24:MI:SS');
>>>>       to_timestamp
>>>> ------------------------
>>>>  2016-01-06 14:40:39+03
>>>> (1 row)
>>>
>>> We do consume extra space from the input string, but not if it is in the
>>> format string, see below:
>>>
>>> postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
>>>       to_timestamp
>>> ------------------------
>>>  2016-06-13 15:43:36-07
>>> (1 row)
>>>
>>> We should have the same treatment for the format string too. Thoughts? Comments?
>>
>> Well, the user specifies the format string, while the input string comes
>> from the data, so I don't see having them behave the same as necessary.
>
> To be honest, they do not just behave differently: to_timestamp simply
> handles input data incorrectly, nothing else. There is no excuse for such
> behavior:
>
> postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD HH24:MI:SS');
>         to_timestamp
> ------------------------------
>  0018-08-05 13:15:43+02:30:17
> (1 row)

To be honest, I don't see how this is relevant to the quoted content. And you've already made this point quite clearly - repeating it isn't constructive.

This behavior has existed for a long time and I don't see that changing it is a worthwhile endeavor. I believe a new function is required that has saner behavior. Otherwise, given good input and a well-formed parse string, the function does exactly what it is designed to do. Avoid giving it garbage and you will be fine. Maybe wrap the call in a function that also checks for the expected layout and RAISEs an EXCEPTION if it doesn't match.

David J.
Arguing like that, one could say that we don't even need exceptions like "division by zero" - just use well-formed numbers in the denominator... Input data can sometimes be generated automatically. Without an exception being thrown, debugging a stored function that contains to_timestamp can be painful.
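David's suggested wrapper could look something like the following sketch. The function name is hypothetical, and the round-trip check (format the parsed result back and compare it with the input) is only an approximation of "the input matches the layout" - it assumes a format without FM/FX modifiers and an input already in canonical form:

```sql
CREATE OR REPLACE FUNCTION strict_to_timestamp(input text, fmt text)
RETURNS timestamptz AS $$
DECLARE
    result timestamptz;
BEGIN
    result := to_timestamp(input, fmt);
    -- Round-trip check: if formatting the parsed value back with the
    -- same template does not reproduce the input, the liberal parser
    -- must have consumed garbage somewhere along the way.
    IF to_char(result, fmt) <> input THEN
        RAISE EXCEPTION 'input "%" does not match format "%"', input, fmt;
    END IF;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

With such a wrapper, strict_to_timestamp('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS') would raise an exception instead of silently returning 2016-03-01, because the round trip yields a different string.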
Re: [HACKERS] Bug in to_timestamp().
On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +0000, amul sul wrote:

> On Monday, 20 June 2016 8:53 PM, Alex Ignatov wrote:
>> On 13.06.2016 18:52, amul sul wrote:
>> And it does not stop at simple whitespace. By using to_timestamp you can
>> get any output result by providing illegal input parameter values:
>>
>> postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
>>       to_timestamp
>> ------------------------
>>  2016-01-06 14:40:39+03
>> (1 row)
>
> We do consume extra space from the input string, but not if it is in the
> format string, see below:
>
> postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
>       to_timestamp
> ------------------------
>  2016-06-13 15:43:36-07
> (1 row)
>
> We should have the same treatment for the format string too. Thoughts? Comments?

Well, the user specifies the format string, while the input string comes from the data, so I don't see having them behave the same as necessary.

To be honest, they do not just behave differently: to_timestamp simply handles input data incorrectly, nothing else. There is no excuse for such behavior:

postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD HH24:MI:SS');
        to_timestamp
------------------------------
 0018-08-05 13:15:43+02:30:17
(1 row)

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 13.06.2016 18:52, amul sul wrote:

Hi,

It looks like a bug in the to_timestamp() function when the format string has more whitespace than the input string, see below:

Ex.1: Two spaces before HH24, whereas one before the input time string:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS');
      to_timestamp
------------------------
 2016-06-13 05:43:36-07   <- incorrect time
(1 row)

Ex.2: One leading whitespace in the format string:

postgres=# SELECT TO_TIMESTAMP('2016/06/13 15:43:36', ' YYYY/MM/DD HH24:MI:SS');
         to_timestamp
------------------------------
 0016-06-13 15:43:36-07:52:58   <- incorrect year
(1 row)

If there is one or more consecutive whitespace characters in the format, we should skip them as long as we can still get an actual field. Thoughts?

Thanks & Regards,
Amul Sul

From the docs about to_timestamp() (https://www.postgresql.org/docs/9.5/static/functions-formatting.html):

"These functions interpret input liberally, with minimal error checking. While they produce valid output, the conversion can yield unexpected results. For example, input to these functions is not restricted by normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17 rather than causing an error. Casting does not have this behavior."

And it does not stop at simple whitespace. By using to_timestamp you can get any output result by providing illegal input parameter values:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 20.06.2016 16:36, Tom Lane wrote:

Robert Haas writes:
> On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas wrote:
>> I think a space in the format string should skip a whitespace character in
>> the input string, but not a non-whitespace character. It's my understanding
>> that these functions exist in no small part for compatibility with Oracle,
>> and Oracle declines to skip the digit '1' on the basis of an extra space in
>> the format string, which IMHO is the behavior any reasonable user would
>> expect.
> So Amul and I are of one opinion and Tom is of another. Anyone else have an opinion?

I don't necessarily have an opinion yet. I would like to see more than just an unsupported assertion about what Oracle's behavior is. Also, how should FM mode affect this?

regards, tom lane

Oracle:

SQL> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual;
SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual
*
ERROR at line 1:
ORA-01843: not a valid month

PG:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

I know about "These functions interpret input liberally, with minimal error checking. While they produce valid output, the conversion can yield unexpected results" from the docs, but when we provide illegal input parameters we get no exception or error at all. I think to_timestamp() needs more format checking than it has now.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] Why we don't have checksums on clog files
Hello!

Why don't we have checksums on clog files? We have a checksum on pg_control, optional checksumming on data files, and some form of checksumming on WAL. But we have no checksumming at all on clog, and clog corruption leads to transaction visibility problems and database consistency violations. Can anybody explain the situation with clog?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On 13.05.2016 9:39, Michael Paquier wrote:

Hi all,

Beginning a new thread because the ext4 issues are closed, and because pg_basebackup data durability merits a new thread. In short, the problem: pg_basebackup makes no effort to ensure that the data it backs up is on disk, which is bad... One possible recommendation is to use initdb -S after running pg_basebackup, but making sure that data is on disk should be done before pg_basebackup ends.

On Thu, May 12, 2016 at 8:09 PM, I wrote:

And actually this won't fly high if there is no equivalent of walkdir() or if the fsync()'s are not applied recursively. On master at least the refactoring had better be done cleanly first... For the back branches, we could just have some recursive call like fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do you think that this should be part of fe_utils or src/common/? I'd tend to think the latter is more suitable, as there is an equivalent in the backend. On back-branches, we could just have something like fsync_recursively that walks through the paths. An even simpler approach would be to fsync() individually the things that have been written, but that would hurt performance.

So, attached are two patches that apply on HEAD to address the problem of pg_basebackup not syncing the data it writes. As pg_basebackup cannot directly use initdb -S because, as a client-side utility, it may be installed while initdb is not (see Fedora and RHEL), I have refactored the code so that the routines in initdb.c doing the fsync of PGDATA and other fsync work live in src/fe_utils/; this is 0001. Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so that all the tablespaces are fsync'd at once. This also takes care of the case where pg_xlog is a symlink.
- In tar mode (no stdout), each tar file is synced individually, and the base directory is synced once at the end.
In both cases, failures are not considered fatal.
With pg_basebackup -X and pg_receivexlog, the manipulation of WAL files is made durable by using fsync and durable_rename where needed (credits to Andres mainly for this part).

This set of patches is aimed only at HEAD. Back-patchable versions of this patch would need to copy fsync_pgdata and friends into streamutil.c, for example. I am adding this to the next CF for review as a bug fix.

Regards,

Hi!

Do we have any confidence that a data file is not corrupted, i.e. does not contain some corrupted page? Can pg_basebackup check page checksums (for a cluster initialized with initdb -k) while backing up files?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Is pg_control file crashsafe?
On 05.05.2016 7:16, Amit Kapila wrote:

On Wed, May 4, 2016 at 8:03 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapil...@gmail.com> writes:
> > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov <a.igna...@postgrespro.ru> wrote:
> >> On 03.05.2016 2:17, Tom Lane wrote:
> >>> Writing a single sector ought to be atomic too.
>
> >> pg_control is 8k long (I think it is the length of one page in default
> >> PG compile settings).
>
> > The actual data written is always sizeof(ControlFileData) which should be
> > less than one sector.
>
> Yes. We don't care what happens to the rest of the file as long as the
> first sector's worth is updated atomically. See the comments for
> PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
>
> We could change to a different PG_CONTROL_SIZE pretty easily, and there's
> certainly room to argue that reducing it to 512 or 1024 would be more
> efficient. I think the motivation for setting it at 8K was basically
> "we're already assuming that 8K writes are efficient, so let's assume
> it here too". But since the file is only written once per checkpoint,
> efficiency is not really a key selling point anyway. If you could make
> an argument that some other size would reduce the risk of failures,
> it would be interesting --- but I suspect any such argument would be
> very dependent on the quirks of a specific file system.

How about using 512 bytes as a write size and performing direct writes rather than going via the OS buffer cache for the control file? Alex, is the issue reproducible (to ensure that if we try to solve it in some way, we have a way to test it as well)?

> One point worth considering is that on most file systems, rewriting
> a fraction of a page is *less* efficient than rewriting a full page,
> because the kernel first has to read in the old contents to fill
> the disk buffer it's going to partially overwrite with new data.
> This motivates against trying to reduce the write size too much.

Yes, you are very much right, and I have observed that recently during my work on WAL re-writes [1]. However, I think that won't be an issue if we use direct writes for the control file.

[1] - http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jbtjxbfyd67r5dlcqfyomj4f-qmfxbp1...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Hi!

No, the issue happened only once. Also, all attempts to reproduce it have been unsuccessful so far.
Re: [HACKERS] Is pg_control file crashsafe?
On 06.05.2016 0:42, Greg Stark wrote:

On 5 May 2016 12:32 am, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
>
> To repeat, I'm pretty hesitant to change this logic. While this is not
> the first report we've ever heard of loss of pg_control, I believe I could
> count those reports without running out of fingers on one hand --- and
> that's counting since the last century. It will take quite a lot of
> evidence to convince me that some other implementation will be more
> reliable. If you just come and present a patch to use direct write, or
> rename, or anything else for that matter, I'm going to reject it out of
> hand unless you provide very strong evidence that it's going to be more
> reliable than the current code across all the systems we support.

One thing we could do without much worry of being less reliable would be to keep two copies of pg_control. Write one, fsync, then write to the other and fsync that one.

Oracle keeps a copy of the old control file so that you can always go back to an older version if a hardware or software bug corrupts it. But they keep a lot more data in their control file and they can be quite large.

Oracle can create more than one copy of the control file. They are identical copies, not an old copy and a current one. And their advice is simply to store these copies on separate storage to be more fault tolerant.

PS: By the way, in my initial post about "is pg_control safe" I wrote, in point 3, some thoughts about multiple copies of the pg_control file. Glad to see we share the same view on this issue.
Re: [HACKERS] Is pg_control file crashsafe?
On 03.05.2016 2:17, Tom Lane wrote:

Alex Ignatov writes:
> I think that rename can help a little bit. At least on some FS it is an
> atomic operation.

Writing a single sector ought to be atomic too. I'm very skeptical that it'll be an improvement to just move the risk from one filesystem operation to another; especially not to one where there's not even a terribly portable way to request fsync.

regards, tom lane

pg_control is 8k long (I think that is the length of one page in the default PG compile settings), and I doubt that an 8k write can be assumed atomic. Even if writing one sector is atomic, nobody can say which sector of an 8k pg_control record gets written first - it could be the last sector, or, say, sector number 10 of 16. That is why I mentioned renaming from a tmp file to pg_control: renaming in a filesystem is usually an atomic operation, so after a power loss we would have either the old version of pg_control or the new one, but not a torn pg_control file.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Is pg_control file crashsafe?
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 03.05.2016 2:21, Andres Freund wrote:

Hi,

On 2016-04-28 21:58:00 +0000, Alex Ignatov wrote:
> We have some issue with truncated pg_control file on Windows after power
> failure. My questions is:
> 1) Is pg_control protected from say, power crash or partial write?

It should be. I think to make progress on this thread we're going to need a bit more detail about the exact corruption. Was the length of the file changed? Did the checksum fail? Did you just observe too-old contents?

Greetings,

Andres Freund

The length was 0 bytes after the crash. It was Windows with NTFS + SSD in RAID 1. The file was zeroed after the power loss.
Re: [HACKERS] Is pg_control file crashsafe?
On 01.05.2016 0:55, Bruce Momjian wrote:

On Thu, Apr 28, 2016 at 09:58:00PM +0000, Alex Ignatov wrote:
> Hello everyone!
> We have some issue with a truncated pg_control file on Windows after a
> power failure. My questions are:
> 1) Is pg_control protected from, say, a power crash or partial write?
> 2) How does PG update pg_control? By writing into it directly, or by
> writing into some temp file and then renaming it to pg_control to be
> atomic?

We write pg_controldata in one write() OS call:

    if (write(fd, buffer, PG_CONTROL_SIZE) != PG_CONTROL_SIZE)

> 3) Can PG keep multiple copies of pg_control to be more fault tolerant?
>
> PS During some experiments we found that at present there is no method to
> do crash recovery with a "restored" version of pg_control (based on some
> manipulations with pg_resetxlog). Only by using pg_resetxlog and setting
> its parameters to values taken from the WAL file (pg_xlogdump) could we at
> least start PG and see that PG's state is at the moment of the last
> checkpoint. But we have no real confidence that PG is in a consistent
> state (the docs on pg_resetxlog tell us about this too).

We have talked about improving the reliability of pg_control, but failures are so rare we have never done anything to improve it. I know Tatsuo has talked about making pg_control more reliable, so I am CC'ing him.

Oh! Good. Thank you! It is rare, but as we have now seen, it is our reality too: one of our customers hit this issue just last week =)

I think that a rename can help a little bit. At least on some filesystems it is an atomic operation.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
[HACKERS] Is pg_control file crashsafe?
Hello everyone!

We have some issue with a truncated pg_control file on Windows after a power failure. My questions are:

1) Is pg_control protected from, say, a power crash or partial write?
2) How does PG update pg_control? By writing into it directly, or by writing into some temp file and then renaming it to pg_control to be atomic?
3) Can PG keep multiple copies of pg_control to be more fault tolerant?

PS During some experiments we found that at present there is no method to do crash recovery with a "restored" version of pg_control (based on some manipulations with pg_resetxlog). Only by using pg_resetxlog and setting its parameters to values taken from the WAL file (pg_xlogdump) could we at least start PG and see that PG's state is at the moment of the last checkpoint. But we have no real confidence that PG is in a consistent state (the docs on pg_resetxlog tell us about this too).

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] More stable query plans via more predictable column statistics
On Mon, Apr 4, 2016 at 1:06 AM, Tom Lane wrote:

> Alex Shulgin writes:
> > On Sun, Apr 3, 2016 at 10:53 PM, Tom Lane wrote:
> >> The reason for checking toowide_cnt is that if it's greater than zero,
> >> then in fact the track list does NOT include all values seen, and it's
> >> flat-out wrong to claim that it is an exhaustive set of values.
>
> > But do we really state that with the short path?
>
> Well, yes: the point of the short path is that we're hypothesizing that
> the track list contains all values in the table, and that they should all
> be considered MCVs. Another way to think about it is that if we didn't
> have the width-limit implementation restriction, those values would appear
> in the track list, almost certainly with count 1, and so we would not
> have taken the short path anyway.
>
> Now you can argue that the long path would have accepted all the real
> track-list entries as MCVs, and have rejected all these hypothetical
> count-1 entries for too-wide values, and so the end result would be the
> same. But that gets back to the fact that that's not necessarily how
> the long path behaves, either today or with the proposed patch.
>
> Agreed. The design intention was that the short path would handle columns
> with a finite, small set of values (think booleans or enums) where the
> ideal thing is that the table population is completely represented by
> the MCV list. As soon as we have good reason to think that the MCV
> list won't represent the table contents fully, we should switch over
> to a different approach where we're trying to identify which sample
> values are common enough to justify putting in the MCV list.

This is a precious detail that I unfortunately couldn't find in any of the sources of information available to me online.
:-) I don't have a habit of hanging out on IRC channels, but now I wonder how likely it is that I could learn this by just asking around on #postgresql (or by mailing you directly as the committer of this early implementation -- is that OK at all?)

Again, having this type of design decision documented in the code might save some time and confusion for the sociopath^W introvert-type of folks like myself. ;-)

> In that situation there are good reasons to not blindly fill the MCV list
> all the way to the stats-target limit, but to try to cut it off at the
> point of diminishing returns, so that the planner isn't saddled with
> a huge MCV list that doesn't really contain a lot of useful information.

This came to be my understanding also at some point.

> So that's the logic behind there being two code paths with discontinuous
> behavior. I'm not sure whether we need to try to narrow the discontinuity
> or whether it's OK to act that way and we just need to refine the decision
> rule about which path to take. But anyway, comparisons of frequencies
> of candidate MCVs seem to me to make sense in a large-ndistinct scenario
> (where we have to be selective) but not a small-ndistinct scenario
> (where we should just take 'em all).

Yeah, this seems to be an open question. And a totally new one to me in the light of recent revelations.

> >> The point of the original logic was to try to decide whether the
> >> values in the sample are significantly more common than typical values
> >> in the whole table population. I think we may have broken that with
> >> 3d3bf62f3: you can't make any such determination if you consider only
> >> what's in the sample without trying to estimate what is not in the
> >> sample.
>
> > Speaking of rabbit holes...
>
> > I'm out of ideas, unfortunately. We badly need more eyes/brainpower on
> > this, which is why I have submitted a talk proposal on this topic to
> > PGDay.ru this summer in St.
> > Petersburg, fearing that it might be too late
> > to commit a satisfactory version during the current dev cycle for 9.6,
> > and in hope to draw at least some attention to it.
>
> If you're thinking it's too late to get more done for 9.6,

Not necessarily, but given the time constraints and some personal issues that just keep popping up, I'm not as optimistic as I was just 24h ago anymore.

> I'm inclined to
> revert the aspect of 3d3bf62f3 that made us work from "d" (the observed
> number of distinct values in the sample) rather than stadistinct (the
> extrapolated estimate for the table). On reflection I think that that's
> inconsistent with the theory behind the old MCV-cutoff rule. It wouldn't
> matter if we were going to replace the cutoff rule with something else,
> but it's beginning to sound like that won't happen for 9.6.

Please feel free to do what you think is in the best interest of the people preparing 9.6 for the freeze. I'm not all that familiar with the process, but I guess reverting this early might save some head-scratching if some interesting interactions of this change combined with some others show up.

Cheers!
--
Alex
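(For readers following along: the "take all MCVs" short path Tom describes - a small, finite set of values fully represented by the MCV list - can be observed from the pg_stats view. A minimal sketch on a scratch table, an illustration only, not from the original mail; the table name is made up:)

```sql
-- Three distinct values across 1000 rows: the ANALYZE sample sees all
-- of them, so the short path should publish all three as MCVs.
CREATE TABLE mcv_demo AS
    SELECT (i % 3) AS v FROM generate_series(1, 1000) AS i;
ANALYZE mcv_demo;

-- most_common_vals is expected to contain all three values (in some
-- frequency order) and n_distinct to be exactly 3, i.e. the MCV list
-- represents the table population completely.
SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE tablename = 'mcv_demo' AND attname = 'v';
```

A column with many distinct values (say, `generate_series(1, 1000)` itself) would instead exercise the long path, where only the sufficiently common sample values make it into the MCV list.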
Re: [HACKERS] More stable query plans via more predictable column statistics
On Sun, Apr 3, 2016 at 10:53 PM, Tom Lane wrote:

> Alex Shulgin writes:
> > This recalled observation can now also explain to me why in the
> > regression you've seen, the short path was not followed: my bet is that
> > stadistinct appeared negative.
>
> Yes, I think that's right. The table under consideration had just a few
> live rows (I think 3), so that even though there was only one value in
> the sample, the "if (stats->stadistinct > 0.1 * totalrows)" condition
> succeeded.

Yeah, this part of the logic can be really surprising at times.

> > Given that we change the logic in the complex path substantially, the
> > assumptions that lead to the "Take all MCVs" condition above might no
> > longer hold, and I see it as a pretty compelling argument to remove the
> > extra checks, thus keeping the only one: track_cnt == ndistinct. This
> > should also bring the patch's effect more close to the thread's topic,
> > which is "More stable query plans".
>
> The reason for checking toowide_cnt is that if it's greater than zero,
> then in fact the track list does NOT include all values seen, and it's
> flat-out wrong to claim that it is an exhaustive set of values.

But do we really state that with the short path?

If there were only one too-wide value, it might be the only thing left for the histogram in the end and would be discarded anyway, so from the end-result perspective there is no difference. If there are multiple too-wide values, they will be effectively discarded by the histogram calculation part as well, so again no difference from the perspective of the end result.

> The reason for the track_cnt <= num_mcv condition is that if that's not
> true, the track list has to be trimmed to meet the statistics target.
> Again, that's not optional.
> Yes, but this check we only need in compute_distinct_stats() and we are talking about compute_scalar_stats() now where track_cnt is always less than or equal to num_mcv (again, please see the bottom of the thread-starting email), or is my analysis broken on this part? I think the reasoning for having the stats->stadistinct > 0 test in there > was that if we'd set it negative, then we think that the set of distinct > values will grow --- which again implies that the set of values actually > seen should not be considered exhaustive. This is actually very neat. So the idea here, as I get it, is that if we have enough distinct values to suspect that more unique ones will be added later as the table grows (which is a natural tendency with most tables anyway), then by the time the statistics we produce are actually used by the planner, it is likely that the MCV list no longer covers all the distinct values, right? I would *love* to see that documented in code comments at the least. Of course, with a table as > small as that regression-test example, we have little evidence to support > either that conclusion or its opposite. > I think it might be possible to record historical ndistinct values between ANALYZE runs and use those as better evidence that the number of distinct values is actually growing, rather than basing that decision on the hard-coded 10% limit rule. What do you think? We do not support migration of the pg_statistic system table during major version upgrades (yet), so if we somehow achieve what I've just described, it might not be a compatibility-breaking change anyway. It's possible that what we should do to eliminate the sudden change > of behaviors is to drop the "track list includes all values seen, and all > will fit" code path entirely, and always go through the track list > one-at-a-time. > That could also be an option, one that I had considered initially. Now that I read your explanation of each check, I'm not that sure anymore.
If we do, though, the currently-proposed filter rules aren't going to > be too satisfactory: if we have a relatively small group of roughly > equally common MCVs, this logic would reject all of them, which is > surely not what we want. > Indeed. :-( > The point of the original logic was to try to decide whether the > values in the sample are significantly more common than typical values > in the whole table population. I think we may have broken that with > 3d3bf62f3: you can't make any such determination if you consider only > what's in the sample without trying to estimate what is not in the > sample. > Speaking of rabbit holes... I'm out of ideas, unfortunately. We badly need more eyes/brainpower on this, which is why I have submitted a talk proposal on this topic to PGDay.ru this summer in St. Petersburg, fearing that it might be too late to commit a satisfactory version during the current dev cycle for 9.6, and in hope to draw at least some attention to it. Regards, -- Alex
Re: [HACKERS] More stable query plans via more predictable column statistics
On Sun, Apr 3, 2016 at 8:24 AM, Alex Shulgin wrote: > > On Sun, Apr 3, 2016 at 7:49 AM, Tom Lane wrote: >> >> Alex Shulgin writes: >> > On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane wrote: >> >> Well, we have to do *something* with the last (possibly only) value. >> >> Neither "include always" nor "omit always" seem sane to me. What other >> >> decision rule do you want there? >> >> > Well, what implies that the last value is somehow special? I would think >> > we should just do with it whatever we do with the rest of the candidate >> > MCVs. >> >> Sure, but both of the proposed decision rules break down when there are no >> values after the one under consideration. We need to do something sane >> there. > > > Hm... There is indeed a case where it would be beneficial to have at least 2 values in the histogram (to have at least the low/high bounds for inequality comparison selectivity) instead of taking both to the MCV list or taking one to the MCVs and having to discard the other. I was thinking about this in the background... Popularity of the last sample value (which is not the only one) can be: a) As high as 50%, in case we have an even division between the only two values in the sample. Quite obviously, we should take this one into the MCV list (well, unless the user has specified statistics_target of 1 for some bizarre reason, but that should not be our problem). b) As low as 2/(statistics_target*300), which, with the target set to the maximum allowed value of 10,000, amounts to 2/(10,000*300) = 1 in 1,500,000. This seems like a really tiny number, but if your table has some tens of billions of rows, for example, seeing such a value at least twice means that it might correspond to some thousands of rows in the table, whereas seeing a value only once might mean just that: it's a unique value. In this case, putting such a duplicate value in the MCV list will allow a much better selectivity estimate for equality comparison, as I've mentioned earlier.
It also allows for a better estimate with inequality comparison, since MCVs are also consulted in this case. I see no good reason to discard such a value. c) Or anything in between the above figures. In my opinion that amounts to "include always" being the sane option. Do you see anything else as a problem here? > Obviously, we need a fresh idea on how to handle this. On reflection, the case where we have a duplicate value in the track list which is not followed by any other sample should be covered by the short path where we put all the tracked values in the MCV list, so there should be no point in even considering all of the above! But the exact short path condition is formulated like this:

if (track_cnt == ndistinct && toowide_cnt == 0 &&
    stats->stadistinct > 0 &&
    track_cnt <= num_mcv)
{
    /* Track list includes all values seen, and all will fit */

So the execution path here additionally depends on two factors: whether we've seen at least one too-wide sample, or whether the distinct estimation produced a number higher than 10% of the estimated total table size (which is yet another arbitrary limit, but that's not in the scope of this patch). I've been puzzled by these conditions a lot, as I have mentioned in the last section of this thread's starting email, and I could not find anything that would hint at why they exist there, in the documentation, code comments, or emails on hackers leading to the introduction of analyze.c in the form we know it today. Probably we will never know, unless Tom still has some notes on this topic from 15 years ago. ;-) This recalled observation can now also explain to me why in the regression you've seen, the short path was not followed: my bet is that stadistinct appeared negative.
Given that we change the logic in the complex path substantially, the assumptions that lead to the "Take all MCVs" condition above might no longer hold, and I see it as a pretty compelling argument to remove the extra checks, thus keeping only one: track_cnt == ndistinct. This should also bring the patch's effect closer to the thread's topic, which is "More stable query plans". Regards, -- Alex
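The short-path condition debated in this subthread, together with the sampling arithmetic from the message above, can be sketched as a standalone model. This is illustrative Python only, not PostgreSQL source; the names mirror those used in analyze.c, and the functions are hypothetical helpers:

```python
def take_all_mcvs(track_cnt, ndistinct, toowide_cnt, stadistinct, num_mcv):
    """Model of the compute_scalar_stats short path: every tracked value
    goes straight into the MCV list only when all four checks hold."""
    return (track_cnt == ndistinct      # track list covers all values seen
            and toowide_cnt == 0        # no over-wide values were skipped
            and stadistinct > 0         # distinct set not expected to keep growing
            and track_cnt <= num_mcv)   # everything fits within the target

def min_detectable_frequency(statistics_target):
    """A value must appear at least twice in the sample of
    300 * statistics_target rows to enter the track list at all."""
    return 2.0 / (300 * statistics_target)
```

With the maximum target of 10,000 this gives 2/3,000,000, i.e. the 1-in-1,500,000 figure mentioned above; the change under discussion amounts to keeping only the first of the four checks in take_all_mcvs().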
Re: [HACKERS] More stable query plans via more predictable column statistics
On Sun, Apr 3, 2016, 18:40 Tom Lane wrote: > Alex Shulgin writes: > > > Well, if it's the only value it will be accepted simply because we are > > checking that special case already and don't even bother to loop through > > the track list. > > That was demonstrably not the case in the failing regression test. > I forget what aspect of the test case allowed it to get past the short > circuit, but it definitely got into the scan-the-track-list code. > Hm, I'll have to see that for myself, probably there was something more to it. -- Alex
Re: [HACKERS] Add schema-qualified relnames in constraint error messages.
On Sun, Apr 3, 2016, 18:32 Tom Lane wrote: > Peter Geoghegan writes: > > On Thu, Feb 11, 2016 at 9:50 AM, Robert Haas > wrote: > >> Wow, that's a fabulous idea. I see Oleksandr has tried to implement > >> it, although I haven't looked at the patch. But I think this would be > >> REALLY helpful. > > > +1 > > Pushed. > Yay!
Re: [HACKERS] More stable query plans via more predictable column statistics
On Sun, Apr 3, 2016 at 7:49 AM, Tom Lane wrote: > Alex Shulgin writes: > > On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane wrote: > >> Well, we have to do *something* with the last (possibly only) value. > >> Neither "include always" nor "omit always" seem sane to me. What other > >> decision rule do you want there? > > > Well, what implies that the last value is somehow special? I would think > > we should just do with it whatever we do with the rest of the candidate > > MCVs. > > Sure, but both of the proposed decision rules break down when there are no > values after the one under consideration. We need to do something sane > there. > Hm... There is indeed a case where it would be beneficial to have at least 2 values in the histogram (to have at least the low/high bounds for inequality comparison selectivity) instead of taking both to the MCV list or taking one to the MCVs and having to discard the other. Obviously, we need a fresh idea on how to handle this. > For "the only value" case: we cannot build a histogram out of a single > > value, so omitting it from MCVs is not a good strategy, ISTM. > > From my point of view that amounts to "include always". > > If there is only one value, it will have 100% of the samples, so it would > get included under just about any decision rule (other than "more than > 100% of this value plus following values"). I don't think making sure > this case works is sufficient to get us to a reasonable rule --- it's > a necessary case, but not a sufficient case. > Well, if it's the only value it will be accepted simply because we are checking that special case already and don't even bother to loop through the track list. -- Alex
Re: [HACKERS] More stable query plans via more predictable column statistics
On Sun, Apr 3, 2016 at 7:18 AM, Tom Lane wrote: > Alex Shulgin writes: > > On Sun, Apr 3, 2016 at 3:43 AM, Alex Shulgin > wrote: > >> I'm not sure yet about the 1% rule for the last value, but would also > love > >> to see if we can avoid the arbitrary limit here. What happens with a > last > >> value which is less than 1% popular in the current code anyway? > > > Now that I think about it, I don't really believe this arbitrary > heuristic > > is any good either, sorry. > > Yeah, it was just a placeholder to produce a working patch. > > Maybe we could base this cutoff on the stats target for the column? > That is, "1%" would be the right number if stats target is 100, > otherwise scale appropriately. > > > What was your motivation to introduce some limit at the bottom anyway? > > Well, we have to do *something* with the last (possibly only) value. > Neither "include always" nor "omit always" seem sane to me. What other > decision rule do you want there? > Well, what implies that the last value is somehow special? I would think we should just do with it whatever we do with the rest of the candidate MCVs. For "the only value" case: we cannot build a histogram out of a single value, so omitting it from MCVs is not a good strategy, ISTM. From my point of view that amounts to "include always". What problems do you see with this approach exactly? -- Alex
Re: [HACKERS] More stable query plans via more predictable column statistics
On Sun, Apr 3, 2016 at 3:43 AM, Alex Shulgin wrote: > > I'm not sure yet about the 1% rule for the last value, but would also love > to see if we can avoid the arbitrary limit here. What happens with a last > value which is less than 1% popular in the current code anyway? > Tom, Now that I think about it, I don't really believe this arbitrary heuristic is any good either, sorry. What if you have a value that is just a bit under 1% popular, but is being used in 50% of your queries in WHERE clause with equality comparison? Without this value in the MCV list the planner will likely use SeqScan instead of an IndexScan that might be more appropriate here. I think we are much better off if we don't touch this aspect of the current code. What was your motivation to introduce some limit at the bottom anyway? If that was to prevent accidental division by zero, then an explicit check on denominator not being 0 seems to me like a better safeguard than this. Regards. -- Alex
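The risk described above can be made concrete with a toy model of equality-selectivity estimation. This is a simplification for illustration only: the exact fallback formula below is an assumption, not the planner's actual code, though it follows the general shape of MCV-based estimation:

```python
def eq_selectivity(value, mcv_freqs, ndistinct):
    """Toy estimate: use the stored frequency when the value is an MCV,
    otherwise spread the remaining frequency over the remaining values."""
    if value in mcv_freqs:
        return mcv_freqs[value]
    rest = 1.0 - sum(mcv_freqs.values())
    return rest / max(ndistinct - len(mcv_freqs), 1)

# A value with true frequency just under 1%:
with_mcv = eq_selectivity('x', {'x': 0.009}, ndistinct=1000)   # 0.009
without_mcv = eq_selectivity('x', {}, ndistinct=1000)          # 0.001
```

Here dropping the value from the MCV list changes the estimate by nearly an order of magnitude; with other data distributions the error can go in either direction, which is the point of not imposing a hard cutoff.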
Re: [HACKERS] Add schema-qualified relnames in constraint error messages.
On Sat, Apr 2, 2016 at 11:41 PM, Tom Lane wrote: > "Shulgin, Oleksandr" writes: > > On Mon, Mar 14, 2016 at 7:55 PM, Tom Lane wrote: > >> Yeah, I don't much like that either. But I don't think we can avoid > >> some refactoring there; as designed, conversion of an error message into > >> user-visible form is too tightly tied to receipt of the message. > > > True. Attached is a v2 which addresses all of the points raised earlier I > > believe. > > I took a closer look at what's going on here and realized that actually > it's not that hard to decouple the message-building routine from the > PGconn state, because mostly it works with fields it's extracting out > of the PGresult anyway. The only piece of information that's lacking > is conn->last_query. I propose therefore that instead of doing it like > this, we copy last_query into error PGresults. This is strictly less > added storage requirement than storing the whole verbose message would be, > and it saves time compared to the v2 patch in the typical case where > the application never does ask for an alternately-formatted error message. > Plus we can actually support requests for any variant format, not only > VERBOSE. > > Attached is a libpq-portion-only version of a patch doing it this way. > I've not yet looked at the psql part of your patch. > > Comments? > Ah, neat, that's even better. :-) What about regression tests? My assumption was that we won't be able to add them with the usual expected-file approach, but also that we don't need them that badly. Everyone's in favor? -- Alex
Re: [HACKERS] More stable query plans via more predictable column statistics
On Sat, Apr 2, 2016 at 8:57 PM, Shulgin, Oleksandr < oleksandr.shul...@zalando.de> wrote: > On Apr 2, 2016 18:38, "Tom Lane" wrote: > >> I did not like the fact that the compute_scalar_stats logic >> would allow absolutely anything into the MCV list once num_hist falls >> below 2. I think it's important that we continue to reject values that >> are only seen once in the sample, because there's no very good reason to >> think that they are MCVs and not just infrequent values that by luck >> appeared in the sample. > > In my understanding we only put a value in the track list if we've seen it > at least twice, no? This is actually the case for compute_scalar_stats, but not for compute_distinct_stats. In the latter case we can still have track[i].count == 1, but we can also break out of the loop if we see the first tracked item like that. >> Before I noticed the regression failure, I'd been thinking that maybe it'd >> be better if the decision rule were not "at least 100+x% of the average >> frequency of this value and later ones", but "at least 100+x% of the >> average frequency of values after this one". > > Hm, sounds pretty similar to what I wanted to achieve, but better > formalized. > >> With that formulation, we're >> not constrained as to the range of x. Now, if there are *no* values after >> this one, then this way needs an explicit special case in order not to >> compute 0/0; but the preceding para shows that we need a special case for >> the last value anyway. >> >> So, attached is a patch rewritten along these lines. I used 50% rather >> than 25% as the new cutoff percentage --- obviously it should be higher >> in this formulation than before, but I have no idea if that particular >> number is good or we should use something else. Also, the rule for the >> last value is "at least 1% of the non-null samples". That's a pure guess >> as well. >> >> I do not have any good corpuses of data to try this on. 
Can folks who >> have been following this thread try it on their data and see how it >> does? Also please try some other multipliers besides 1.5, so we can >> get a feeling for where that cutoff should be placed. > > Expect me to run it on my pet db early next week. :-) I was trying to come up with some examples where 50% could be a good or a bad choice, and then I noticed that we might be able to turn it the other way round: instead of inventing an arbitrary limit on the MCVs' frequency, we could use the histogram as the criterion for a candidate MCV to be considered "common enough". If we can prove that the value would produce duplicates in the histogram, we should rather put it in the MCV list (unless it's already fully occupied, in which case we can't do anything). A value is guaranteed to produce a duplicate if it has appeared at least 2*hfreq+1 times in the sample (hfreq from your version of the patch, which is recalculated on every loop iteration). I could produce an updated patch on Monday, or anyone else following this discussion should be able to do that. This approach would be a huge win in my opinion, because this way we can avoid all the arbitrariness of that .25 / .50 multiplier. Otherwise there might be (valid) complaints that for some data .40 (or .60) is a better fit, but we have already hard-coded something and there would be no easy way to improve the situation for some users while avoiding breaking it for the rest (unless we introduce a per-attribute configurable parameter like statistics_target for this multiplier, which I'd like to avoid even thinking about ;-) While we don't (well, can't) build a histogram in the compute_distinct_stats variant, we could also apply the above mind trick there for the same reason and to make the output of both functions more consistent (and to have less maintenance burden between the variants).
And anyway it would be rather surprising to see that depending on the presence of an order operator for the type, the resulting MCV lists after the ANALYZE would be different (I mean not only due to the random nature of the sample). I'm not sure yet about the 1% rule for the last value, but would also love to see if we can avoid the arbitrary limit here. What happens with a last value which is less than 1% popular in the current code anyway? Cheers! -- Alex
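The "guaranteed duplicate in the histogram" criterion proposed in this message can be sketched as follows. This is an illustration only, not the patch's code; hfreq stands for the number of sample rows per histogram bin, as in the version of the patch being discussed:

```python
def guaranteed_histogram_duplicate(value_count, hfreq):
    """A value spanning more than two full bins' worth of sample rows must
    appear at two consecutive bin boundaries, i.e. produce a duplicate."""
    return value_count >= 2 * hfreq + 1

def common_enough_for_mcv(value_count, hfreq, mcv_list_full):
    """The proposal above: prefer moving such values into the MCV list,
    unless the list is already fully occupied."""
    return guaranteed_histogram_duplicate(value_count, hfreq) and not mcv_list_full
```

The appeal of this rule, as the message argues, is that it is derived from the histogram geometry rather than from a hand-tuned .25/.50 multiplier.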
Re: [HACKERS] Sanity checking for ./configure options?
The following review has been posted through the commitfest application:

make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Looks good to me. It only allows valid numbers between 1 and 65535, and disallows leading zeros, empty strings, and non-digit characters. The error messages look good. Marking this Ready for Committer. -- Alex

The new status of this patch is: Ready for Committer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
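The validation rules the review above describes can be captured in a few lines. This is a hypothetical re-implementation for illustration, not the code under review:

```python
def valid_port(s):
    """Accept only "1".."65535": ASCII digits only, non-empty, no leading zero."""
    return (s.isascii() and s.isdigit()
            and not s.startswith('0')
            and 1 <= int(s) <= 65535)
```

Note that the leading-zero check also rejects "0" itself, and isdigit() rejects the empty string, so no separate cases are needed.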
Re: [HACKERS] empty array case in plperl_ref_from_pg_array not handled correctly
On Mon, Mar 7, 2016 at 11:32 PM, Andres Freund wrote: > Hi, > > Per the new valgrind animal we get: > > > http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-03-08%2004%3A22%3A00 > 2016-03-08 05:56:05.566 UTC [56de6971.723:5] LOG: statement: select > plperl_sum_array('{}'); > ==1827== Invalid write of size 4 > ==1827==at 0x14E35DD1: plperl_ref_from_pg_array (plperl.c:1459) > > [ I think you may have meant to CC me not Alexey K. I'm probably the person responsible :D. ] Indeed, I think the simplest fix is to make plperl_ref_from_pg_array() return an "empty" array in that case. The attached fixes the valgrind warning for me. (cassert enabled build on master). > > ISTM the assumption that an array always has a dimension is a bit more > widespread... E.g. split_array() looks like it'd not work nicely with a > zero dimensional array... > Hrm, plperl_ref_from_pg_array() is the only caller of split_array(), so I added an Assert() to make its contract more clear. Recursive calls to split_array() should be fine as nested 0 dim arrays get collapsed down to single 0 dim array (I hand tested it nonetheless): # select ARRAY[ARRAY[ARRAY[]]]::int[]; array ─── {} (1 row) Looking around, everything that makes an array seems to pass through plperl_ref_from_pg_array(), so I think once that is fixed all is good. I also looked for dimensions and ARR_NDIM() just to make sure (didn't find anything fishy). Thanks, plperl_array_valgrind.patch Description: Binary data
Re: [HACKERS] Remove Windows crash dump support?
On 22.12.2015 18:28, Magnus Hagander wrote: On Tue, Dec 22, 2015 at 3:53 PM, Craig Ringer wrote: On 22 December 2015 at 22:50, Craig Ringer wrote: Hi all Back in 2010 I submitted a small feature to allow the creation of minidumps when backends crashed; see commit dcb09b595f88a3bca6097a6acc17bf2ec935d55f . At the time Windows lacked useful support for postmortem debugging and crash-dump management in the operating system itself, especially for applications running as services. That has since improved considerably. The feature was also included in 9.4 Ahem. 9.1. This is what I get for multi-tasking between writing this and packaging an extension for 9.4. In which version(s) of Windows was this improvement added? I think that's really the part that matters here, not necessarily which version of PostgreSQL. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ Hi all! I think you have been able to debug crash dumps ever since windbg came into existence. Also, I think the number of Postgres installations on Windows is so tiny because even today people think it is not as solid as the Unix version; that's why you think that nobody uses your code ;). If my memory serves me right, this code currently cannot deal with buffer overflows. Am I right? Maybe we need to add this functionality instead of dropping support for it entirely? -- Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] error message diff with Perl 5.22.0
On Sat, Jun 6, 2015 at 7:03 PM, Peter Eisentraut wrote: > With the recently released Perl 5.22.0, the tests fail thus: > > -ERROR: Global symbol "$global" requires explicit package name at line 3. > -Global symbol "$other_global" requires explicit package name at line 4. > +ERROR: Global symbol "$global" requires explicit package name (did you > forget to declare "my $global"?) at line 3. > +Global symbol "$other_global" requires explicit package name (did you > forget to declare "my $other_global"?) at line 4. > CONTEXT: compilation of PL/Perl function "uses_global" > > FWIW the committed expected file seems fine to me. There is no perl option to toggle this behavior (and even if there were, I think the resulting changes to pl/perl functions would be quite ugly). (What about the back branches? :D)
Re: [HACKERS] Precedence of standard comparison operators
On Tue, Mar 10, 2015 at 10:11 AM, Tom Lane wrote: > I wrote: > > This thread seems to have died off without any clear resolution. I'd > hoped somebody would try the patch on some nontrivial application to > see if it broke anything or caused any warnings, but it doesn't seem > like that is happening. > > I tried it on a fairly typical web application. Around 5000 or so distinct statements according to pg_stat_statements. With operator_precedence_warning = on, no warnings yet.
[HACKERS] POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Trent Shipley writes: > On Friday 2007-12-14 16:22, Tom Lane wrote: >> Neil Conway writes: >> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY >> > to drop (and log) rows that contain malformed data. That is, rows with >> > too many or too few columns, rows that result in constraint violations, >> > and rows containing columns where the data type's input function raises >> > an error. The last case is the only thing that would be a bit tricky to >> > implement, I think: you could use PG_TRY() around the InputFunctionCall, >> > but I guess you'd need a subtransaction to ensure that you reset your >> > state correctly after catching an error. >> >> Yeah. It's the subtransaction per row that's daunting --- not only the >> cycles spent for that, but the ensuing limitation to 4G rows imported >> per COPY. > > You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This > would help with the 4G subtransaction limit. The cost to the ETL process is > that a simple rollback would not be guaranteed send the process back to it's > initial state. There are easy ways to deal with the rollback issue though. > > A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY > option is selected then the COPY FROM can run without subtransactions and in > excess of the 4G per transaction limit. NO RETRY should be the default since > it preserves the legacy behavior of COPY FROM. > > You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give > the > option of sending exceptions to a table since they are presumably malformed, > otherwise they would not be exceptions. (Users should re-process exception > files if they want an if good then table a else exception to table b ...) > > EXCEPTIONS TO and NO RETRY would be mutually exclusive. > > >> If we could somehow only do a subtransaction per failure, things would >> be much better, but I don't see how. Hello, Attached is a proof of concept patch for this TODO item. 
There are no docs yet; I just wanted to know if the approach is sane. The added syntax is like the following: COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout] The way it's done is by abusing Copy Both mode, and from my limited testing, that seems to just work. The error trapping itself is done using PG_TRY/PG_CATCH and can only catch formatting or before-insert trigger errors; no attempt is made to recover from a failed unique constraint, etc. Example in action:

postgres=# \d test_copy2
    Table "public.test_copy2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 val    | integer |

postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
1
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 1: "1"
2
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 2: "2"
3
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 3: "3"
NOTICE:  total exceptions ignored: 3

postgres=# \d test_copy1
    Table "public.test_copy1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null

postgres=# set client_min_messages to warning;
SET
postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
...
vmstat
zoneinfo
postgres=#

Limited performance testing shows no significant difference between the error-catching and plain code paths. For example, timing copy test_copy1 from program 'seq 100' [exceptions to stdout] shows similar numbers with or without the added "exceptions to" clause. Now that I'm sending this I wonder if the original comment about the need for a subtransaction around every loaded line still holds. Any example of what would not be properly rolled back by just PG_TRY? Happy hacking! -- Alex

From 50f7ab0a503a0d61776add8a138abf2622fc6c35 Mon Sep 17 00:00:00 2001
From: Alex Shulgin
Date: Fri, 19 Dec 2014 18:21:31 +0300
Subject: [PATCH] POC: COPY FROM ...
EXCEPTIONS TO
---
 contrib/file_fdw/file_fdw.c             |   4 +-
 src/backend/commands/copy.c             | 251 +---
 src/backend/parser/gram.y               |  26 +++-
 src/bin/psql/common.c                   |  14 +-
 src/bin/psql/copy.c                     | 119 ++-
 src/bin/psql/settings.h                 |   1 +
 src/bin/psql/startup.c                  |   1 +
 src/bin/psql/tab-complete.c             |  12 +-
 src/include/commands/copy.h             |   3 +-
 src/include/nodes/parsenodes.h          |   1 +
 src/include/parser/kwlist.h             |   1 +
 src/interfaces/ecpg/preproc/ecpg.addons |   2 +-
 12 files changed,
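The row-level error trapping that the patch implements in C with PG_TRY/PG_CATCH can be modeled in a few lines of Python. This is an illustration of the semantics only, not the patch itself:

```python
def copy_with_exceptions(lines, parse_row):
    """Load rows one at a time; divert rows whose parsing fails to an
    exceptions list instead of aborting the whole load."""
    loaded, exceptions = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            loaded.append(parse_row(line))   # analogous to the input-function call
        except ValueError as err:            # trap per-row formatting errors only
            exceptions.append((lineno, line, str(err)))
    return loaded, exceptions

rows, errs = copy_with_exceptions(
    ["1\t2", "oops", "3\t4"],
    lambda ln: tuple(int(f) for f in ln.split("\t")))
```

As in the patch, only per-row formatting failures are trapped; an error raised later (e.g. a unique-constraint violation at insert time) would correspond to an error outside parse_row() and would still abort the load.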
Re: [HACKERS] Turning recovery.conf into GUCs
Alex Shulgin writes: > Petr Jelinek writes: >> >> I had a quick look, the patch does not apply cleanly anymore but it's >> just release notes so nothing too bad. > > Yes, there were some ongoing changes that touched some parts of this and > I must have missed the latest one (or maybe I was waiting for it to > settle down). The rebased version is attached. >> - the StandbyModeRequested and StandbyMode should be lowercased like >> the rest of the GUCs > > Yes, except that standby_mode is linked to StandbyModeRequested, not the > other one. I can try see if there's a sane way out of this. As I see it, renaming these will only add noise to this patch, and there is also ArchiveRecoveryRequested / InArchiveRecovery. This is going to be tricky and I'm not sure it's really worth the effort. >> - The whole CopyErrorData and memory context logic is not needed in >> check_recovery_target_time() as the FlushErrorState() is not called >> there > > You must be right. I recall having some trouble with strings being > free'd prematurely, but if it's ERROR, then there should be no need for > that. I'll check again. Hrm, after going through this again I'm pretty sure that was correct: the only way to obtain the current error message is to use CopyErrorData(), but that requires you to switch back to non-error memory context (via Assert). The FlushErrorState() call is not there because it's handled by the hook caller (or it can exit via ereport() with elevel==ERROR). 
>> - The new code in StartupXLOG() like >> +if (recovery_target_xid_string != NULL) >> +InitRecoveryTarget(RECOVERY_TARGET_XID); >> + >> +if (recovery_target_time_string != NULL) >> +InitRecoveryTarget(RECOVERY_TARGET_TIME); >> + >> +if (recovery_target_name != NULL) >> +InitRecoveryTarget(RECOVERY_TARGET_NAME); >> + >> +if (recovery_target_string != NULL) >> +InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE); >> >> would probably be better in separate function that you then call >> StartupXLOG() as StartupXLOG() is crazy long already so I think it's >> preferable to not make it worse. > > We can move it at top of CheckStartingAsStandby() obviously. Moved. -- Alex recovery_guc_v5.6.patch.gz Description: application/gzip
Re: [HACKERS] Turning recovery.conf into GUCs
Petr Jelinek writes: > > I had a quick look, the patch does not apply cleanly anymore but it's > just release notes so nothing too bad. Yes, there were some ongoing changes that touched some parts of this and I must have missed the latest one (or maybe I was waiting for it to settle down). > I did not do any testing yet, but I have few comments about the code: > > - the patch mixes functionality change with the lowercasing of the > config variables, I wonder if those could be separated into 2 separate > diffs - it would make review somewhat easier, but I can live with it > as it is if it's too much work do separate into 2 patches If we can get the lowercasing committed soon, I'd be in favor of that; otherwise it doesn't sound as pleasing, and there's some renaming to be made too (that standby.enabled, trigger_file, etc.) > - the StandbyModeRequested and StandbyMode should be lowercased like > the rest of the GUCs Yes, except that standby_mode is linked to StandbyModeRequested, not the other one. I can try to see if there's a sane way out of this. > - I am wondering if StandbyMode and hot_standby should be merged into > one GUC if we are breaking backwards compatibility anyway I also have the feeling that there's room for merging some knobs together. > - Could you explain the reasoning behind: > + if ((*newval)[0] == 0) > + xid = InvalidTransactionId; > + else > > in check_recovery_target_xid() (and same check in > check_recovery_target_timeline()), isn't the strtoul which is called > later enough? Well, that makes it a bit more apparent to the reader, I think, and we're not abusing the fact that InvalidTransactionId equals zero. > - The whole CopyErrorData and memory context logic is not needed in > check_recovery_target_time() as the FlushErrorState() is not called > there You must be right. I recall having some trouble with strings being free'd prematurely, but if it's ERROR, then there should be no need for that. I'll check again.
> - The new code in StartupXLOG() like > + if (recovery_target_xid_string != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_XID); > + > + if (recovery_target_time_string != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_TIME); > + > + if (recovery_target_name != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_NAME); > + > + if (recovery_target_string != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE); > > would probably be better in a separate function that you then call from > StartupXLOG(), as StartupXLOG() is crazy long already, so I think it's > preferable not to make it worse. We can move it to the top of CheckStartingAsStandby(), obviously. > - I wonder if we should rename trigger_file to standby_trigger_file I was also suggesting that, just not sure if mixing it into the same patch is any good. Thank you for the review! -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload
Craig Ringer writes: > On 12/19/2014 11:41 PM, Alex Shulgin wrote: >> I don't think so. The scenario this patch relies on assumes that the >> DBA will remember to look in the log if something goes wrong > > Well, actually, the whole point was that the user who's connecting > (likely also the "DBA") will see a HINT telling them that there's more > in the logs. > > But that got removed. While it sounds useful at first glance, I believe Peter's arguments upthread provide enough justification to avoid sending the hint to the client. -- Alex
Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload
Steve Singer writes: > On 12/15/2014 11:38 AM, Alex Shulgin wrote: > >> These are all valid concerns IMHO. Attached is the modified version >> of the original patch by Craig, addressing the handling of the new >> hint_log error data field and removing the client-side HINT. I'm >> also moving this to the current CF. -- Alex >> >> > > > The updated patch removes the client message. I feel this addresses > Peter's concern. The updated patch also addresses the missing free I > mentioned in my original review. > > The patch applies cleanly to head. > > One thing I noticed while testing is that if you do the following: > > 1. Edit pg_hba.conf to allow a connection from somewhere > 2. Attempt to connect; you get an error plus the hint in the server log > 3. Make another change to pg_hba.conf and introduce an error in the file > 4. Attempt to reload the config files; the reload fails because of the above error > 5. Attempt to connect; you still can't connect since we have the > original pg_hba.conf loaded > > You don't get the warning in step 5 since we update PgReloadTime even > if the reload didn't work. > > Is this enough of a concern to justify the extra complexity in > tracking the reload times of pg_hba.conf and pg_ident.conf directly? I don't think so. The scenario this patch relies on assumes that the DBA will remember to look in the log if something goes wrong, and in your case there would be a message like the following: WARNING: pg_hba.conf not reloaded So an extra hint about the file timestamp is unneeded. -- Alex
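The timestamp comparison at the heart of this patch, and the caveat above about PgReloadTime advancing even on a failed reload, can be sketched as a toy model. This is Python for brevity, not the actual C implementation, and the function and parameter names are illustrative:

```python
# Toy model: emit a server-log hint when authentication fails and
# pg_hba.conf has been modified since the last configuration reload.
# `hba_mtime` and `pg_reload_time` stand in for file mtime and the
# server's reload timestamp; both are hypothetical names.
def stale_hba_hint(hba_mtime, pg_reload_time):
    """Return a hint string for the server log, or None."""
    if hba_mtime > pg_reload_time:
        return "pg_hba.conf has changed since last config reload"
    return None

# Auth failure before the DBA reloads: the hint fires.
assert stale_hba_hint(hba_mtime=200, pg_reload_time=100) is not None
# After any reload attempt (even a failed one, hence the caveat in
# step 5 above), the hint no longer fires.
assert stale_hba_hint(hba_mtime=200, pg_reload_time=300) is None
```

The second assertion is exactly the gap Steve describes: a failed reload still advances the reload time, silencing the hint.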
Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat
Alvaro Herrera writes: > Alex Shulgin wrote: > >> OK, I think I now have all bases covered, though the updated patch is >> not that "pretty". >> >> The problem is that we don't know in advance if the (sub)transaction is >> going to succeed or abort, and in case of an aborted truncate we need to >> use the stats gathered prior to truncate. Thus the need to track >> insert/update/deletes that happened before the first truncate separately. > > Ugh, this is messy indeed. I grant that TRUNCATE is a tricky case to > handle correctly, so some complexity is expected. Can you please > explain in detail how this works? The main idea is that an aborted transaction can leave dead tuples behind (every insert and update creates one), but when TRUNCATE is issued we need to reset the insert/update/delete counters to 0; otherwise we won't get accurate live and dead counts at commit time. If the transaction that issued TRUNCATE is instead aborted, the insert/update counters that we were incrementing *after* the truncate are not relevant to an accurate calculation of dead tuples in the original relfilenode we are now back to after the abort. We need the insert/update counts that happened *before* the first TRUNCATE, hence the need for separate counters. >> To the point of making a dedicated pgstat testing tool: let's have >> another TODO item? > > Sure. Added one. -- Alex
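The bookkeeping described above can be sketched as a small toy model (Python for brevity; the real patch is C inside pgstat.c, and the class and method names below are illustrative, not PostgreSQL's): on the first TRUNCATE in a transaction the counters accumulated so far are saved and then zeroed, and on abort the saved values are restored, because only the pre-truncate inserts/updates left dead tuples in the relfilenode we fall back to.

```python
# Toy model of per-transaction table stats with TRUNCATE tracking.
class XactTableStats:
    def __init__(self):
        self.inserted = self.updated = self.deleted = 0
        self.truncated = False
        self._pre = None  # counters saved at the first TRUNCATE

    def count(self, inserted=0, updated=0, deleted=0):
        self.inserted += inserted
        self.updated += updated
        self.deleted += deleted

    def count_truncate(self):
        # Save pre-truncate counters once, then reset for post-truncate work.
        if not self.truncated:
            self._pre = (self.inserted, self.updated, self.deleted)
            self.truncated = True
        self.inserted = self.updated = self.deleted = 0

    def at_abort(self):
        """Counters relevant for dead-tuple accounting after a rollback."""
        if self.truncated:
            self.inserted, self.updated, self.deleted = self._pre
        return self.inserted, self.updated, self.deleted

s = XactTableStats()
s.count(inserted=10, updated=5)
s.count_truncate()          # zeroes the counters, remembers (10, 5, 0)
s.count(inserted=3)         # post-truncate work
assert (s.inserted, s.truncated) == (3, True)
assert s.at_abort() == (10, 5, 0)   # rollback: pre-truncate counts matter
```

On commit, by contrast, the `truncated` flag would tell the collector to reset n_live_tup/n_dead_tup before applying the post-truncate deltas.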
Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat
Alvaro Herrera writes: > >> Even with the current approach of checking the stats after every >> isolated case it sometimes takes quite a bit more than a glance >> to verify correctness due to side-effects of rollback (ins/upd/del >> counters are still updated), and the way stats affect the dead >> tuples counter. > > Honestly I think pg_regress is not the right tool to test stat counter > updates. It kind-of works today, but only because we don't stress it > too much. If you want to create a new test framework for pgstats, and > include some tests for truncate, be my guest. OK, I think I now have all bases covered, though the updated patch is not that "pretty". The problem is that we don't know in advance if the (sub)transaction is going to succeed or abort, and in case of an aborted truncate we need to use the stats gathered prior to truncate. Thus the need to track insert/update/deletes that happened before the first truncate separately. To the point of making a dedicated pgstat testing tool: let's have another TODO item? -- Alex >From 0b3161191a3ddb999cd9d0da08e1b6088ce07a84 Mon Sep 17 00:00:00 2001 From: Alex Shulgin Date: Tue, 9 Dec 2014 16:35:14 +0300 Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats. The n_live_tup and n_dead_tup counters need to be set to 0 after a TRUNCATE on the relation. We can't issue a special message to the stats collector because the xact might be later aborted, so we track the fact that the relation was truncated during the xact (and reset this xact's insert/update/delete counters). When the xact is committed, we use the `truncated' flag to reset the n_live_tup and n_dead_tup counters. 
--- src/backend/commands/tablecmds.c | 3 + src/backend/postmaster/pgstat.c | 98 ++-- src/include/pgstat.h | 14 ++-- src/test/regress/expected/prepared_xacts.out | 50 ++ src/test/regress/expected/stats.out | 63 ++ src/test/regress/sql/prepared_xacts.sql | 27 src/test/regress/sql/stats.sql | 68 +++ 7 files changed, 315 insertions(+), 8 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c new file mode 100644 index 1e737a0..4f0e3d8 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 71,76 --- 71,77 #include "parser/parse_type.h" #include "parser/parse_utilcmd.h" #include "parser/parser.h" + #include "pgstat.h" #include "rewrite/rewriteDefine.h" #include "rewrite/rewriteHandler.h" #include "rewrite/rewriteManip.h" *** ExecuteTruncate(TruncateStmt *stmt) *** 1224,1229 --- 1225,1232 */ reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST); } + + pgstat_count_heap_truncate(rel); } /* diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c new file mode 100644 index f71fdeb..9d19cf9 *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *** typedef struct TwoPhasePgStatRecord *** 199,206 --- 199,210 PgStat_Counter tuples_inserted; /* tuples inserted in xact */ PgStat_Counter tuples_updated; /* tuples updated in xact */ PgStat_Counter tuples_deleted; /* tuples deleted in xact */ + PgStat_Counter inserted_pre_trunc; /* tuples inserted prior to truncate */ + PgStat_Counter updated_pre_trunc; /* tuples updated prior to truncate */ + PgStat_Counter deleted_pre_trunc; /* tuples deleted prior to truncate */ Oid t_id; /* table's OID */ bool t_shared; /* is it a shared catalog? */ + bool t_truncated; /* was the relation truncated? 
*/ } TwoPhasePgStatRecord; /* *** pgstat_count_heap_delete(Relation rel) *** 1863,1868 --- 1867,1919 } } + static void + record_xact_truncate_stats(PgStat_TableXactStatus *trans) + { + if (!trans->truncated) + { + trans->inserted_pre_trunc = trans->tuples_inserted; + trans->updated_pre_trunc = trans->tuples_updated; + trans->deleted_pre_trunc = trans->tuples_deleted; + } + } + + static void + restore_xact_truncate_stats(PgStat_TableXactStatus *trans) + { + if (trans->truncated) + { + trans->tuples_inserted = trans->inserted_pre_trunc; + trans->tuples_updated = trans->updated_pre_trunc; + trans->tuples_deleted = trans->deleted_pre_trunc; + } + } + + /* + * pgstat_count_heap_truncate - update tuple counters due to truncate + */ + void + pgstat_count_heap_truncate(Relation rel) + { + PgStat_TableStatus *pgstat_info = rel->pgstat_info; + + if (pgstat_info != NULL) + { + /* We have to log the effect at the proper transactional level */ + int nest_level = GetCurren
Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat
Alvaro Herrera writes: > Alex Shulgin wrote: > >> Alvaro Herrera writes: >> >> >> >> Another idea would be exposing pgstat_report_stat(true) at SQL level. >> >> That would eliminate the need for explicit pg_sleep(>=0.5), but we'll >> >> still need the wait_for_... call to make sure the collector has picked >> >> it up. >> > >> > We already have a stats test that sleeps. Why not add this stuff there, >> > to avoid making another test slow? >> >> Well, if we could come up with a set of statements to test that would >> produce the end result unambiguously, so that we can be certain the stats >> we check at the end cannot be the result of a neat interaction of buggy >> behaviors... > > It is always possible that things work just right because two bugs > cancel each other. > >> I'm not sure this is at all possible, but I know for sure it will make >> debugging the possible failures harder. > > Surely if some other patch introduces a failure, nobody will try to > debug it by looking only at the failures of this test. > >> Even with the current approach of checking the stats after every >> isolated case it sometimes takes quite a bit more than a glance >> to verify correctness due to side-effects of rollback (ins/upd/del >> counters are still updated), and the way stats affect the dead >> tuples counter. > > Honestly I think pg_regress is not the right tool to test stat counter > updates. It kind-of works today, but only because we don't stress it > too much. If you want to create a new test framework for pgstats, and > include some tests for truncate, be my guest. Yes, these are all good points. Actually, moving the test to stats.sql helped me realize the current patch behavior is not strictly correct: there's a corner case when you insert/update before truncate in a transaction, which is later aborted. I need to take a closer look. 
-- Alex
Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat
Alvaro Herrera writes: >> >> Another idea would be exposing pgstat_report_stat(true) at SQL level. >> That would eliminate the need for explicit pg_sleep(>=0.5), but we'll >> still need the wait_for_... call to make sure the collector has picked >> it up. > > We already have a stats test that sleeps. Why not add this stuff there, > to avoid making another test slow? Well, if we could come up with a set of statements to test that would produce the end result unambiguously, so that we can be certain the stats we check at the end cannot be the result of a neat interaction of buggy behaviors... I'm not sure this is at all possible, but I know for sure it will make debugging the possible failures harder. Even with the current approach of checking the stats after every isolated case it sometimes takes quite a bit more than a glance to verify correctness due to side-effects of rollback (ins/upd/del counters are still updated), and the way stats affect the dead tuples counter. I'll try to see if the checks can be converged though. -- Alex
Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat
Jim Nasby writes: > https://commitfest.postgresql.org/action/patch_view?id=1661 (apologies for > not replying to the thread; I can't find it in my inbox) > > Patch applies and passes make check. Code formatting looks good. Jim, > The regression test partially tests this. It does not cover 2PC, nor > does it test rolling back a subtransaction that contains a > truncate. The latter actually doesn't work correctly. Thanks for pointing out the missing 2PC test, I've added one. The test you've added for rolling back a subxact actually works correctly, if you consider the fact that aborted (sub)xacts still account for insert/update/delete in pgstat. I've added this test with the corrected expected results. > The test also adds 2.5 seconds of forced pg_sleep. I think that's both > bad and unnecessary. When I removed the sleeps I still saw times of > less than 0.1 seconds. Well, I never liked that part, but the stats don't get updated if we don't put the session to sleep for at least PGSTAT_STAT_INTERVAL (which is 500ms). Removing these extra sleep calls would theoretically not make a difference, as wait_for_trunc_test_stats() seems to have enough sleep calls itself, but since pgstat_report_stat() is called only from the main loop, there's no way short of placing an explicit pg_sleep at the top level if we want to check the effects reproducibly. Another idea would be exposing pgstat_report_stat(true) at SQL level. That would eliminate the need for explicit pg_sleep(>=0.5), but we'll still need the wait_for_... call to make sure the collector has picked it up. > Also, wait_for_trunc_test_stats() should display something if it times > out; otherwise you'll have a test failure and won't have any > indication why. Oh, good catch. Since I've copied this function from stats.sql, we might want to update that one too in a separate commit. > I've attached a patch that adds logging on timeout and contains a test > case that demonstrates the rollback to savepoint bug. 
I'm attaching the updated patch version. Thank you for the review! -- Alex PS: re: your CF comment: I'm producing the patches using git format-patch --ext-diff where diff.external is set to '/bin/bash src/tools/git-external-diff'. Now that I try to apply it using git, looks like git doesn't like the copied context diff very much... >From cc51823a01a194ef6fcd90bc763fa26498837322 Mon Sep 17 00:00:00 2001 From: Alex Shulgin Date: Tue, 9 Dec 2014 16:35:14 +0300 Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats. The n_live_tup and n_dead_tup counters need to be set to 0 after a TRUNCATE on the relation. We can't issue a special message to the stats collector because the xact might be later aborted, so we track the fact that the relation was truncated during the xact (and reset this xact's insert/update/delete counters). When xact is committed, we use the `truncated' flag to reset the n_live_tup and n_dead_tup counters. --- src/backend/commands/tablecmds.c | 3 + src/backend/postmaster/pgstat.c | 52 - src/include/pgstat.h | 3 + src/test/regress/expected/prepared_xacts.out | 50 src/test/regress/expected/truncate.out | 164 +++ src/test/regress/sql/prepared_xacts.sql | 27 + src/test/regress/sql/truncate.sql| 118 +++ 7 files changed, 414 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c new file mode 100644 index 1e737a0..4f0e3d8 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 71,76 --- 71,77 #include "parser/parse_type.h" #include "parser/parse_utilcmd.h" #include "parser/parser.h" + #include "pgstat.h" #include "rewrite/rewriteDefine.h" #include "rewrite/rewriteHandler.h" #include "rewrite/rewriteManip.h" *** ExecuteTruncate(TruncateStmt *stmt) *** 1224,1229 --- 1225,1232 */ reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST); } + + pgstat_count_heap_truncate(rel); } /* diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c new file 
mode 100644 index f71fdeb..b02e4a1 *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *** typedef struct TwoPhasePgStatRecord *** 201,206 --- 201,207 PgStat_Counter tuples_deleted; /* tuples deleted in xact */ Oid t_id; /* table's OID */ bool t_shared; /* is it a shared catalog? */ + bool t_truncated; /* was the relation truncated? */ } TwoPhasePgStatRecord; /* *** pgstat_count_heap_delete(Relation rel) *** 1864,
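The wait_for_..._stats() pattern discussed in this thread, including Jim's point that a timed-out wait should report something, can be sketched roughly as below. Python for brevity, with an illustrative predicate standing in for the real stats query; the function name and delays are assumptions, not the pg_regress helper itself:

```python
import time

def wait_for(predicate, timeout=30.0, interval=0.1, label="stats"):
    """Poll `predicate` until it returns True or `timeout` elapses.

    On timeout, print a diagnostic so a failing test is debuggable
    rather than silently wrong.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    print(f"timed out waiting for {label}")
    return False

# Simulated stats collector: the condition becomes true on the third poll,
# the way pgstat counters appear only after the collector catches up.
calls = {"n": 0}
def ready():
    calls["n"] += 1
    return calls["n"] >= 3

assert wait_for(ready, timeout=5.0, interval=0.01) is True
```

The key design point matches the thread: polling with a bounded deadline replaces a fixed pg_sleep, and the timeout path must emit a diagnostic instead of failing silently.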
Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload
Peter Eisentraut writes: > On 10/16/14 11:34 PM, Craig Ringer wrote: >> psql: FATAL: Peer authentication failed for user "fred" >> HINT: See the server error log for additional information. > > I think this is wrong for many reasons. > > I have never seen an authentication system that responds with, hey, what > you just did didn't get you in, but the administrators are currently in > the process of making a configuration change, so why don't you check > that out. > > We don't know whether the user has access to the server log. They > probably don't. Also, it is vastly more likely that the user really > doesn't have access in the way they chose, so throwing in irrelevant > hints will be distracting. > > Moreover, it will be confusing to regular users if this message > sometimes shows up and sometimes doesn't, independent of their own state > and actions. > > Finally, the fact that a configuration change is in progress is > privileged information. Unprivileged users can deduce from the presence > of this message that administrators are doing something, and possibly > that they have done something wrong. > > I think it's fine to log a message in the server log if the pg_hba.conf > file needs reloading. But the client shouldn't know about this at all. These are all valid concerns IMHO. Attached is the modified version of the original patch by Craig, addressing the handling of the new hint_log error data field and removing the client-side HINT. I'm also moving this to the current CF. -- Alex >From 702e0ac31f3d8023ad8c064d90bdf5a8441fddea Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Fri, 17 Oct 2014 11:18:18 +0800 Subject: [PATCH 1/2] Add an errhint_log, akin to errdetail_log This allows a different HINT to be sent to the server error log and to the client, which will be useful where there's security-sensitive information that's more appropriate for a HINT than a DETAIL message. 
--- src/backend/utils/error/elog.c | 59 -- src/include/utils/elog.h | 7 + 2 files changed, 53 insertions(+), 13 deletions(-) diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c new file mode 100644 index 2316464..da55207 *** a/src/backend/utils/error/elog.c --- b/src/backend/utils/error/elog.c *** errfinish(int dummy,...) *** 503,508 --- 503,510 pfree(edata->detail_log); if (edata->hint) pfree(edata->hint); + if (edata->hint_log) + pfree(edata->hint_log); if (edata->context) pfree(edata->context); if (edata->schema_name) *** errhint(const char *fmt,...) *** 1015,1020 --- 1017,1042 return 0; /* return value does not matter */ } + /* + * errhint_log --- add a hint_log error message text to the current error + */ + int + errhint_log(const char *fmt,...) + { + ErrorData *edata = &errordata[errordata_stack_depth]; + MemoryContext oldcontext; + + recursion_depth++; + CHECK_STACK_DEPTH(); + oldcontext = MemoryContextSwitchTo(edata->assoc_context); + + EVALUATE_MESSAGE(edata->domain, hint_log, false, true); + + MemoryContextSwitchTo(oldcontext); + recursion_depth--; + return 0; /* return value does not matter */ + } + /* * errcontext_msg --- add a context error message text to the current error *** CopyErrorData(void) *** 1498,1503 --- 1520,1527 newedata->detail_log = pstrdup(newedata->detail_log); if (newedata->hint) newedata->hint = pstrdup(newedata->hint); + if (newedata->hint_log) + newedata->hint_log = pstrdup(newedata->hint_log); if (newedata->context) newedata->context = pstrdup(newedata->context); if (newedata->schema_name) *** FreeErrorData(ErrorData *edata) *** 1536,1541 --- 1560,1567 pfree(edata->detail_log); if (edata->hint) pfree(edata->hint); + if (edata->hint_log) + pfree(edata->hint_log); if (edata->context) pfree(edata->context); if (edata->schema_name) *** ThrowErrorData(ErrorData *edata) *** 1607,1612 --- 1633,1640 newedata->detail_log = pstrdup(edata->detail_log); if (edata->hint) newedata->hint = pstrdup(edata->hint); + if 
(edata->hint_log) + newedata->hint_log = pstrdup(edata->hint_log); if (edata->context) newedata->context = pstrdup(edata->context); if (edata->schema_name) *** ReThrowError(ErrorData *edata) *** 1669,1674 --- 1697,1704 newedata->detail_log = pstrdup(newedata->detail_log); if (newedata->hint) newedata->hint = pstrdup(newedata->hint); + if (newedata->hint_log) + newedata->hint_log = pstrdup(newedata->hint_log);
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Michael Paquier writes: >>> >>> Perhaps ssloptions.[ch], unless you plan to add non-option-related code >>> there later? >> >> I don't think anything else than common options-related code fits in >> there, so ssloptions.c makes sense to me. >> >>> BTW, there is no Regent code in your openssl.c, so the copyright >>> statement is incorrect. >> >> Good catch, I just blindly copied that from some other file. > There have been arguments in favor and against this patch... But > marking it as returned with feedback because of a lack of activity in > the last couple of weeks. Feel free to update if you think that's not > correct, and please move this patch to commit fest 2014-12. Attached is a new version that addresses the earlier feedback: renamed the added *.[ch] files and removed incorrect copyright line. I'm moving this to the current CF. -- Alex >From 18388300f9ed34fa47c66b8a2da098aeb19a7f71 Mon Sep 17 00:00:00 2001 From: Alex Shulgin Date: Wed, 19 Nov 2014 19:49:29 +0300 Subject: [PATCH] DRAFT: ssl_protocols GUC --- doc/src/sgml/config.sgml | 29 ++ doc/src/sgml/libpq.sgml | 25 ++ src/backend/libpq/Makefile| 2 +- src/backend/libpq/be-secure-openssl.c | 29 -- src/backend/libpq/be-secure.c | 3 +- src/backend/libpq/ssloptions.c| 123 ++ src/backend/utils/misc/guc.c | 15 src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/libpq/libpq.h | 1 + src/include/libpq/ssloptions.h| 48 ++ src/interfaces/libpq/Makefile | 8 +- src/interfaces/libpq/fe-connect.c | 4 + src/interfaces/libpq/fe-secure-openssl.c | 18 +++- src/interfaces/libpq/libpq-int.h | 1 + 14 files changed, 297 insertions(+), 10 deletions(-) create mode 100644 src/backend/libpq/ssloptions.c create mode 100644 src/include/libpq/ssloptions.h diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml new file mode 100644 index 48ae3e4..699f0f9 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** include_dir 'conf.d' *** 1055,1060 --- 1055,1089 + + ssl_protocols (string) + +ssl_protocols 
configuration parameter + + + + + Specifies a colon-separated list of SSL protocols that are + allowed to be used on secure connections. Each entry in the list can + be prefixed by a + (add to the current list) + or - (remove from the current list). If no prefix is used, + the list is replaced with the specified protocol. + + + The full list of supported protocols can be found in the + the openssl manual page. In addition to the names of + individual protocols, the following keywords can be + used: ALL (all protocols supported by the underlying + crypto library), SSL (all supported versions + of SSL) and TLS (all supported versions + of TLS). + + + The default is ALL:-SSL. + + + + ssl_ciphers (string) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml new file mode 100644 index d829a4b..62ee0b4 *** a/doc/src/sgml/libpq.sgml --- b/doc/src/sgml/libpq.sgml *** postgresql://%2Fvar%2Flib%2Fpostgresql/d *** 1230,1235 --- 1230,1250 + + sslprotocols + + + Specifies a colon-separated list of SSL protocols that are + allowed to be used on secure connections. + See server configuration parameter + for format. + + + Defaults to ALL:-SSL. + + + + sslcompression *** myEventProc(PGEventId evtId, void *evtIn *** 6693,6698 --- 6708,6723 + + + +PGSSLPROTOCOLS + + PGSSLPROTOCOLS behaves the same as the connection parameter. + + + diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile new file mode 100644 index 09410c4..62abd46 *** a/src/backend/libpq/Makefile --- b/src/backend/libpq/Makefile *** OBJS = be-fsstubs.o be-secure.o auth.o c *** 18,24 pqformat.o pqmq.o pqsignal.o ifeq ($(with_openssl),yes) ! OBJS += be-secure-openssl.o endif include $(top_srcdir)/src/backend/common.mk --- 18,24 pqformat.o pqmq.o pqsignal.o ifeq ($(with_openssl),yes) ! OBJS += be-secure-openssl.o ssloptions.o endif include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-s
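The list syntax the patch documents for ssl_protocols (colon-separated entries, where "+" adds to the current list, "-" removes from it, a bare name replaces the list, and ALL/SSL/TLS are keywords) could be modeled roughly as follows. This is a Python sketch with an assumed keyword expansion table, not OpenSSL's actual protocol list:

```python
# Illustrative expansions; the real set comes from the crypto library.
KEYWORDS = {
    "ALL": {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"},
    "SSL": {"SSLv2", "SSLv3"},
    "TLS": {"TLSv1", "TLSv1.1", "TLSv1.2"},
}

def expand(name):
    """Expand a keyword to its protocol set, or wrap a single protocol."""
    return set(KEYWORDS.get(name, {name}))

def parse_protocols(spec):
    enabled = set()
    for entry in spec.split(":"):
        if entry.startswith("+"):
            enabled |= expand(entry[1:])    # add to the current list
        elif entry.startswith("-"):
            enabled -= expand(entry[1:])    # remove from the current list
        else:
            enabled = expand(entry)         # replace the list
    return enabled

# The documented default "ALL:-SSL" leaves only the TLS protocols enabled.
assert parse_protocols("ALL:-SSL") == {"TLSv1", "TLSv1.1", "TLSv1.2"}
```

This mirrors the left-to-right, stateful evaluation the documentation describes: order matters, so "ALL:-SSL" and "-SSL:ALL" would yield different results.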
Re: [HACKERS] Turning recovery.conf into GUCs
Alex Shulgin writes: > Alex Shulgin writes: >> >> Here's an attempt to revive this patch. > > Here's the patch rebased against current HEAD, that is including the > recently committed action_at_recovery_target option. > > The default for the new GUC is 'pause', as in HEAD, and > pause_at_recovery_target is removed completely in favor of it. > > I've also taken the liberty to remove that part that errors out when > finding $PGDATA/recovery.conf. Now get your rotten tomatoes ready. ;-) This was rather short-sighted, so I've restored that part. Also, rebased on current HEAD, following the rename of action_at_recovery_target to recovery_target_action. -- Alex recovery_guc_v5.5.patch.gz Description: application/gzip
Re: [HACKERS] SSL information view
Magnus Hagander writes: >> >> You should add it to the next CF for proper tracking, there are already many >> patches in the queue waiting for reviews :) > > Absolutely - I just wanted those that were already involved in the > thread to get a chance to look at it early :) didn't want to submit it > until it was complete. > > Which it is now - attached is a new version that includes docs. Here's my review of the patch: * Applies to the current HEAD, no failed hunks. * Compiles and works as advertised. * Docs included. The following catches my eye: psql (9.5devel) SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off) Type "help" for help. postgres=# select * from pg_stat_ssl; pid | ssl | bits | compression | version | cipher | clientdn --+-+--+-+-+-+-- 1343 | t | 256 | f | TLSv1.2 | ECDHE-RSA-AES256-GCM-SHA384 | (1 row) I think the order of details in the psql prompt makes more sense, because it puts more important details first. I suggest we reorder the view columns, while also renaming 'version' to 'protocol' (especially since we have another patch in the works that adds a GUC named 'ssl_protocols'): pid, ssl, protocol, cipher, bits, compression, clientdn. Next, this one: + be_tls_get_cipher(Port *port, char *ptr, size_t len) + { + if (port->ssl) + strlcpy(ptr, SSL_get_cipher(port->ssl), NAMEDATALEN); should be this: + strlcpy(ptr, SSL_get_cipher(port->ssl), len); The same goes for this one: + be_tls_get_peerdn_name(Port *port, char *ptr, size_t len) + { + if (port->peer) + strlcpy(ptr, X509_NAME_to_cstring(X509_get_subject_name(port->peer)), NAMEDATALEN); The NAMEDATALEN constant is passed in the calling code in pgstat_bestart(). Other than that, looks good to me. -- Alex
Re: [HACKERS] Small TRUNCATE glitch
Bruce Momjian writes: > On Wed, Dec 10, 2014 at 10:32:42AM +0200, Heikki Linnakangas wrote: >> >I don't think we need to have 2PC files survive a pg_upgrade. It seems >> >perfectly okay to remove them from the new cluster at some appropriate >> >time, *if* they are copied from the old cluster at all (I don't think >> >they should be.) >> >> I think pg_upgrade should check if there are any prepared >> transactions pending, and refuse to upgrade if there are. It could >> be made to work, but it's really not worth the trouble. If there are >> any pending prepared transactions in the system when you run >> pg_upgrade, it's more likely to be a mistake or oversight in the >> first place, than on purpose. > > pg_upgrade already checks for prepared transactions and errors out if > they exist; see check_for_prepared_transactions(). Alright, that's good to know. So I'm just adding a new bool field to the 2PC pgstat record instead of the bit magic. Attached is v0.2, now with a regression test included. -- Alex >From 4c8fae27ecd9c94e7c3da0997f03099045a152d9 Mon Sep 17 00:00:00 2001 From: Alex Shulgin Date: Tue, 9 Dec 2014 16:35:14 +0300 Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats. The n_live_tup and n_dead_tup counters need to be set to 0 after a TRUNCATE on the relation. We can't issue a special message to the stats collector because the xact might be later aborted, so we track the fact that the relation was truncated during the xact (and reset this xact's insert/update/delete counters). When xact is committed, we use the `truncated' flag to reset the n_live_tup and n_dead_tup counters. 
--- src/backend/commands/tablecmds.c | 2 + src/backend/postmaster/pgstat.c| 52 - src/include/pgstat.h | 3 + src/test/regress/expected/truncate.out | 136 + src/test/regress/sql/truncate.sql | 98 5 files changed, 288 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c new file mode 100644 index 1e737a0..192d033 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** ExecuteTruncate(TruncateStmt *stmt) *** 1224,1229 --- 1224,1231 */ reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST); } + + pgstat_count_heap_truncate(rel); } /* diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c new file mode 100644 index c7f41a5..88c83d2 *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *** typedef struct TwoPhasePgStatRecord *** 201,206 --- 201,207 PgStat_Counter tuples_deleted; /* tuples deleted in xact */ Oid t_id; /* table's OID */ bool t_shared; /* is it a shared catalog? */ + bool t_truncated; /* was the relation truncated? 
*/ } TwoPhasePgStatRecord; /* *** pgstat_count_heap_delete(Relation rel) *** 1864,1869 --- 1865,1894 } /* + * pgstat_count_heap_truncate - update tuple counters due to truncate + */ + void + pgstat_count_heap_truncate(Relation rel) + { + PgStat_TableStatus *pgstat_info = rel->pgstat_info; + + if (pgstat_info != NULL) + { + /* We have to log the effect at the proper transactional level */ + int nest_level = GetCurrentTransactionNestLevel(); + + if (pgstat_info->trans == NULL || + pgstat_info->trans->nest_level != nest_level) + add_tabstat_xact_level(pgstat_info, nest_level); + + pgstat_info->trans->tuples_inserted = 0; + pgstat_info->trans->tuples_updated = 0; + pgstat_info->trans->tuples_deleted = 0; + pgstat_info->trans->truncated = true; + } + } + + /* * pgstat_update_heap_dead_tuples - update dead-tuples count * * The semantics of this are that we are reporting the nontransactional *** AtEOXact_PgStat(bool isCommit) *** 1927,1932 --- 1952,1959 tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted; if (isCommit) { + tabstat->t_counts.t_truncated = trans->truncated; + /* insert adds a live tuple, delete removes one */ tabstat->t_counts.t_delta_live_tuples += trans->tuples_inserted - trans->tuples_deleted; *** AtEOSubXact_PgStat(bool isCommit, int ne *** 1991,1999 { if (trans->upper && trans->upper->nest_level == nestDepth - 1) { ! trans->upper->tuples_inserted += trans->tuples_inserted; ! trans->upper->tuples_updated += trans->tuples_updated; ! trans->upper->tuples_deleted += trans->tuples_deleted; tabstat->trans = trans->upper; pfree(trans); } --- 2018,2037 { if (trans-
Re: [HACKERS] Small TRUNCATE glitch
Alex Shulgin writes:
>
> Bruce Momjian writes:
>>
>> Added to TODO:
>>
>>   o Clear table counters on TRUNCATE
>>
>>   http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php
>
> Hello,
>
> Attached is a WIP patch for this TODO.

This part went as an attachment, which wasn't my intent:

It does the trick by tracking whether a TRUNCATE command was issued
inside a (sub)transaction and uses this knowledge to reset the live/dead
tuple counters later if the transaction is committed.  Testing in simple
cases shows that this clears the counters correctly, including the use
of savepoints.

The 2PC part requires extending the bool flag to also hold the trunc
flag; is this approach sane?  Given that a 2PC transaction should
survive a server restart, it's reasonable to expect it to also survive
an upgrade, so I see no clean way of adding another bool field to the
TwoPhasePgStatRecord struct (unless someone would like to add checks on
record length, etc.).

I'm going to add some regression tests, but I'm not sure where they
would best live.  truncate.sql seems like the natural choice, but stats
are not updated in real time, so I'd need to borrow some tricks from
stats.sql; or would it be better to put the new tests in stats.sql
itself?

-- Alex

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small TRUNCATE glitch
Bruce Momjian writes:
>
> Added to TODO:
>
>   o Clear table counters on TRUNCATE
>
>   http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php

Hello,

Attached is a WIP patch for this TODO.

>From 97665ef1ca7d1847e90d4dfab38562135f01fb2b Mon Sep 17 00:00:00 2001
From: Alex Shulgin
Date: Tue, 9 Dec 2014 16:35:14 +0300
Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats.

The n_live_tup and n_dead_tup counters need to be set to 0 after a
TRUNCATE on the relation.  We can't issue a special message to the stats
collector because the xact might be later aborted, so we track the fact
that the relation was truncated during the xact (and reset this xact's
insert/update/delete counters).  When xact is committed, we use the
`truncated' flag to reset the n_live_tup and n_dead_tup counters.
---
 src/backend/commands/tablecmds.c |  2 ++
 src/backend/postmaster/pgstat.c  | 70
 src/include/pgstat.h             |  3 ++
 3 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index 1e737a0..192d033
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ExecuteTruncate(TruncateStmt *stmt)
*** 1224,1229 ****
--- 1224,1231 ----
  		 */
  		reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
  	}
+
+ 	pgstat_count_heap_truncate(rel);
  }

  /*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index c7f41a5..7ff66b5
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** typedef struct TwoPhasePgStatRecord
*** 200,208 ****
  	PgStat_Counter tuples_updated;	/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;	/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	bool		t_shared;		/* is it a shared catalog? */
  } TwoPhasePgStatRecord;

  /*
   * Info about current "snapshot" of stats file
   */
--- 200,211 ----
  	PgStat_Counter tuples_updated;	/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;	/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	char		t_flags;		/* see TWOPHASE_PGSTAT_RECORD_*_FLAGs */
  } TwoPhasePgStatRecord;

+ #define TWOPHASE_PGSTAT_RECORD_SHARED_FLAG	0x01	/* is it a shared catalog? */
+ #define TWOPHASE_PGSTAT_RECORD_TRUNC_FLAG	0x02	/* was the relation truncated? */
+
  /*
   * Info about current "snapshot" of stats file
   */
*** pgstat_count_heap_delete(Relation rel)
*** 1864,1869 ****
--- 1867,1896 ----
  }

  /*
+  * pgstat_count_heap_truncate - update tuple counters due to truncate
+  */
+ void
+ pgstat_count_heap_truncate(Relation rel)
+ {
+ 	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+
+ 	if (pgstat_info != NULL)
+ 	{
+ 		/* We have to log the effect at the proper transactional level */
+ 		int			nest_level = GetCurrentTransactionNestLevel();
+
+ 		if (pgstat_info->trans == NULL ||
+ 			pgstat_info->trans->nest_level != nest_level)
+ 			add_tabstat_xact_level(pgstat_info, nest_level);
+
+ 		pgstat_info->trans->tuples_inserted = 0;
+ 		pgstat_info->trans->tuples_updated = 0;
+ 		pgstat_info->trans->tuples_deleted = 0;
+ 		pgstat_info->trans->truncated = true;
+ 	}
+ }
+
+ /*
   * pgstat_update_heap_dead_tuples - update dead-tuples count
   *
   * The semantics of this are that we are reporting the nontransactional
*** AtEOXact_PgStat(bool isCommit)
*** 1927,1932 ****
--- 1954,1961 ----
  			tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
  			if (isCommit)
  			{
+ 				tabstat->t_counts.t_truncated = trans->truncated;
+
  				/* insert adds a live tuple, delete removes one */
  				tabstat->t_counts.t_delta_live_tuples +=
  					trans->tuples_inserted - trans->tuples_deleted;
*** AtEOSubXact_PgStat(bool isCommit, int ne
*** 1991,1999 ****
  		{
  			if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  			{
! 				trans->upper->tuples_inserted += trans->tuples_inserted;
! 				trans->upper->tuples_updated += trans->tuples_updated;
! 				trans->upper->tuples_deleted += trans->tuples_deleted;
  				tabstat->trans = trans->upper;
  				pfree(trans);
  			}
--- 2020,2039 ----
  		{
  			if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  			{
! 				if (trans->truncated)
! 				{
! 					trans->upper->truncated = true;
! 					/* replace upper xact stats with ours */
! 					trans->upper->tuples_inserted = trans->tuples_inserted;
! 					trans->upper->tuples_updated = trans->tuples_updated;
! 					trans->upper->tuples_deleted = trans->tuples_deleted;
! 				}
! 				else
! 				{
! 					trans->upper->tu
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Alvaro Herrera writes:
> Alex Shulgin wrote:
>
>> DEBUG: inserting column 7 value "varchar_transform"
>>
>> Breakpoint 1, GetSnapshotData (snapshot=0xdb2d60 )
>> at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1413
>> 1413 xmax = ShmemVariableCache->latestCompletedXid;
>>
>> (gdb) p ShmemVariableCache->latestCompletedXid
>> $1 = 4294967295
>
> I think you might need to "make distclean", then recompile.  If you're
> not passing --enable-depend to configure, I suggest you do so; that
> greatly reduces such problems.

Yeah, that did it.  Sorry for the noise, I must have messed it up with
the recompilations (note to self to always use --enable-depend).

-- Alex
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Alvaro Herrera writes:
> Alex Shulgin wrote:
>
>> DEBUG: inserting column 7 value "varchar_transform"
>>
>> Breakpoint 1, GetSnapshotData (snapshot=0xdb2d60 )
>> at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1413
>> 1413 xmax = ShmemVariableCache->latestCompletedXid;
>>
>> (gdb) p ShmemVariableCache->latestCompletedXid
>> $1 = 4294967295
>
> I think you might need to "make distclean", then recompile.  If you're
> not passing --enable-depend to configure, I suggest you do so; that
> greatly reduces such problems.

Well, I tried that before posting the complaint; let me try again though.

-- Alex
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Alex Shulgin writes:
>
> Figured it out with a pg_usleep in bootstrap.c anyway.  Does this look sane?
>
> DEBUG: inserting column 6 value "0"
> DEBUG: inserted -> 0
> DEBUG: inserting column 7 value "varchar_transform"
> TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))", File:
> "/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c", Line: 1414)

I've tried to debug this and I feel really dumbfounded...

DEBUG: inserting column 7 value "varchar_transform"

Breakpoint 1, GetSnapshotData (snapshot=0xdb2d60 )
    at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1413
1413    xmax = ShmemVariableCache->latestCompletedXid;
(gdb) p ShmemVariableCache->latestCompletedXid
$1 = 4294967295
(gdb) p *ShmemVariableCache
$2 = {nextOid = 1, oidCount = 0, nextXid = 3, oldestXid = 3,
  xidVacLimit = 20003, xidWarnLimit = 2136483650, xidStopLimit = 2146483650,
  xidWrapLimit = 2147483650, oldestXidDB = 1, oldestCommitTs = 1,
  newestCommitTs = 0, latestCompletedXid = 4294967295}
(gdb) p xmax
$3 = 0
(gdb) n
1414    Assert(TransactionIdIsNormal(xmax));
(gdb) p xmax
$4 = 1
(gdb) p *ShmemVariableCache
$5 = {nextOid = 1, oidCount = 0, nextXid = 3, oldestXid = 3,
  xidVacLimit = 20003, xidWarnLimit = 2136483650, xidStopLimit = 2146483650,
  xidWrapLimit = 2147483650, oldestXidDB = 1, oldestCommitTs = 1,
  newestCommitTs = 0, latestCompletedXid = 4294967295}
(gdb) p ShmemVariableCache->latestCompletedXid
$6 = 4294967295
(gdb)

How?  Is there another concurrent process with an old view of the
VariableCacheData struct where latestCompletedXid still points to
oldestCommitTs?

This only happens with the CommitTs commit in effect.

-- Alex
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Craig Ringer writes: > On 12/04/2014 10:50 PM, Alex Shulgin wrote: >> Is there a way to pause the bootstrap process so I can attach gdb to it? > > With a newer gdb, you can instead tell gdb to follow all forks. I wrote > some notes on it recently. > > http://blog.2ndquadrant.com/processes-breakpoints-watchpoints-postgresql/ > > I've found it really handy when debugging crashes in regression tests. > > However, it's often simpler to just: > > ulimit -c unlimited > make check > > (or whatever your crashing test is) then look at the core files that are > produced. Good one, it didn't occur to me that assert makes core files. Figured it out with a pg_usleep in bootstrap.c anyway. Does this look sane? DEBUG: inserting column 6 value "0" DEBUG: inserted -> 0 DEBUG: inserting column 7 value "varchar_transform" TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))", File: "/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c", Line: 1414) Program received signal SIGABRT, Aborted. 0x7f2757128d27 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. 
(gdb) bt
#0  0x7f2757128d27 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f275712a418 in __GI_abort () at abort.c:89
#2  0x009088b2 in ExceptionalCondition (
    conditionName=0xac0710 "!(((xmax) >= ((TransactionId) 3)))",
    errorType=0xac01d8 "FailedAssertion",
    fileName=0xac0178 "/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c",
    lineNumber=1414)
    at /home/ash/src/postgresql/src/backend/utils/error/assert.c:54
#3  0x0079e125 in GetSnapshotData (snapshot=0xdb2d60 )
    at /home/ash/src/postgresql/src/backend/storage/ipc/procarray.c:1414
#4  0x0094e02d in GetNonHistoricCatalogSnapshot (relid=1255)
    at /home/ash/src/postgresql/src/backend/utils/time/snapmgr.c:298
#5  0x0094dfdd in GetCatalogSnapshot (relid=1255)
    at /home/ash/src/postgresql/src/backend/utils/time/snapmgr.c:272
#6  0x004c8f0d in systable_beginscan (heapRelation=0x1d0e5c0, indexId=2691,
    indexOK=1 '\001', snapshot=0x0, nkeys=1, key=0x7fff201bbc40)
    at /home/ash/src/postgresql/src/backend/access/index/genam.c:275
#7  0x00885070 in regprocin (fcinfo=0x7fff201bbce0)
    at /home/ash/src/postgresql/src/backend/utils/adt/regproc.c:106
#8  0x00914fe7 in InputFunctionCall (flinfo=0x7fff201bc0c0,
    str=0x1d349b8 "varchar_transform", typioparam=24, typmod=-1)
    at /home/ash/src/postgresql/src/backend/utils/fmgr/fmgr.c:1914
#9  0x0091533e in OidInputFunctionCall (functionId=44,
    str=0x1d349b8 "varchar_transform", typioparam=24, typmod=-1)
    at /home/ash/src/postgresql/src/backend/utils/fmgr/fmgr.c:2045
#10 0x0052af91 in InsertOneValue (value=0x1d349b8 "varchar_transform", i=7)
    at /home/ash/src/postgresql/src/backend/bootstrap/bootstrap.c:840
#11 0x00527409 in boot_yyparse ()
    at /home/ash/src/postgresql/src/backend/bootstrap/bootparse.y:455
#12 0x0052a26b in BootstrapModeMain ()
    at /home/ash/src/postgresql/src/backend/bootstrap/bootstrap.c:494
#13 0x0052a177 in AuxiliaryProcessMain (argc=6, argv=0x1cc8378)
    at /home/ash/src/postgresql/src/backend/bootstrap/bootstrap.c:414
#14 0x006a327c in main (argc=7, argv=0x1cc8370)
    at /home/ash/src/postgresql/src/backend/main/main.c:212
(gdb)
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Alvaro Herrera writes:
>
> Uh, that's odd.  Can you please get a stack trace?  Do you have unusual
> settings or a patched build?

Is there a way to pause the bootstrap process so I can attach gdb to it?

-- Alex
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Alvaro Herrera writes:
> Alex Shulgin wrote:
>
>> Also this commit breaks initdb of `make check' for me:
>>
>> creating template1 database in
>> /home/ash/build/postgresql/HEAD/src/test/regress/./tmp_check/data/base/1
>> ... TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))",
>> File: "/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c",
>> Line: 1414)
>> Aborted (core dumped)
>> child process exited with exit code 134
>
> Uh, that's odd.  Can you please get a stack trace?  Do you have unusual
> settings or a patched build?

Not really, and I would have mentioned that.  Will get a trace.

-- Alex
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Heikki Linnakangas writes:
> On 12/03/2014 04:54 PM, Alvaro Herrera wrote:
>> ir commit timestamp directly as they commit,
>> or an external transaction c
>
> Sorry, I'm late to the party, but here's some random comments on this
> after a quick review:

Also, this commit breaks initdb during `make check' for me:

creating template1 database in
/home/ash/build/postgresql/HEAD/src/test/regress/./tmp_check/data/base/1
... TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))", File:
"/home/ash/src/postgresql/src/backend/storage/ipc/procarray.c", Line: 1414)
Aborted (core dumped)
child process exited with exit code 134

-- Alex
Re: [HACKERS] Turning recovery.conf into GUCs
Andres Freund writes:
> On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote:
>> I'd be in favor of a solution that works the same way as before the
>> patch, without the need for extra trigger files, etc., but that doesn't
>> seem to be nearly possible.  Whatever tricks we might employ will likely
>> be defeated by the fact that the oldschool user will fail to *include*
>> recovery.conf in the main conf file.
>
> I think removing the ability to define a trigger file is pretty much
> unacceptable.  It's not *too* bad to rewrite recovery.conf's contents
> into actual valid postgresql.conf format, but it's harder to change an
> existing complex failover setup that relies on the existance of such a
> trigger.  I think aiming for removal of that is a sure way to prevent
> the patch from getting in.

To be clear, I was not talking about trigger_file, but about the
standby.enabled file, which triggers the recovery/PITR/standby scenario
in the current form of the patch and replaces the original check for the
presence of recovery.conf.

-- Alex
Re: [HACKERS] Turning recovery.conf into GUCs
Michael Paquier writes:
> On Tue, Dec 2, 2014 at 1:58 AM, Alex Shulgin wrote:
>> Here's the patch rebased against current HEAD, that is including the
>> recently committed action_at_recovery_target option.
> If this patch gets in, it gives a good argument to jump to 10.0 IMO.
> That's not a bad thing, only the cost of making recovery params as
> GUCs which is still a feature wanted.
>
>> The default for the new GUC is 'pause', as in HEAD, and
>> pause_at_recovery_target is removed completely in favor of it.
> Makes sense.  Another idea that popped out was to rename this parameter
> as recovery_target_action as well, but that's not really something
> this patch should care about.

Indeed, but changing the name after the fact is straightforward.

>> I've also taken the liberty to remove that part that errors out when
>> finding $PGDATA/recovery.conf.
> I am not in favor of this part.  It may be better to let the users know
> that their old configuration is not valid anymore with an error.  This
> patch cuts in the flesh with a huge axe, let's be sure that users do
> not ignore the side pain effects, or recovery.conf would be simply
> ignored and users would not be aware of that.

Yeah, that is a good point.

I'd be in favor of a solution that works the same way as before the
patch, without the need for extra trigger files, etc., but that hardly
seems possible.  Whatever tricks we might employ will likely be defeated
by the fact that the old-school user will fail to *include*
recovery.conf in the main conf file.

-- Alex
Re: [HACKERS] Turning recovery.conf into GUCs
Alex Shulgin writes: > > Here's an attempt to revive this patch. Here's the patch rebased against current HEAD, that is including the recently committed action_at_recovery_target option. The default for the new GUC is 'pause', as in HEAD, and pause_at_recovery_target is removed completely in favor of it. I've also taken the liberty to remove that part that errors out when finding $PGDATA/recovery.conf. Now get your rotten tomatoes ready. ;-) -- Alex recovery_guc_v5.4.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
Christoph Berg writes:
> Re: Petr Jelinek 2014-11-25 <5474efea.2040...@2ndquadrant.com>
>> > Patch committed.  Before I go and rebase that recovery.conf -> GUC
>> > patch on top of this... is it final?
>>
>> Thanks!
>
> I'm a bit late to the party, but wouldn't
>
> recovery_target_action = ...
>
> have been a better name for this?  It'd be in line with the other
> recovery_target_* parameters, and also a bit shorter than the imho
> somewhat ugly "action_at_recovery_target".

FWIW, I too think that "recovery_target_action" is a better name.

-- Alex
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
Julien Tachoires writes:
>
> 2011/12/15 Alvaro Herrera :
>>
>> Uhm, surely you could compare the original toast tablespace to the heap
>> tablespace, and if they differ, handle appropriately when creating the
>> new toast table?  Just pass down the toast tablespace into
>> AlterTableCreateToastTable, instead of having it assume that
>> rel->rd_rel->relnamespace is sufficient.  This should be done in all
>> cases where a toast tablespace is created, which shouldn't be more than
>> a handful of them.
>
> Thank you, that way seems right.
> Now, I distinguish before each creation of a TOAST table with
> AlterTableCreateToastTable() : if it will create a new one or recreate
> an existing one.
> Thus, in create_toast_table() when toastTableSpace is equal to
> InvalidOid, we are able :
> - to fallback to the main table tablespace in case of new TOAST table creation
> - to keep it previous tablespace in case of recreation.
>
> Here's a new version rebased against HEAD.

Almost 3 years later, here's a version rebased against current HEAD. :-)
It compiles and even passes the included regression test.

There were doubts originally if this is actually a useful feature, but
there is at least one plausible use case (main table on SSD, TOAST on HDD):

http://www.postgresql.org/message-id/4f3267ee.80...@deltasoft.no

I didn't add anything on top of the latest version, but I notice we've
added mat. views since then, so it looks like we now also need this:

ALTER MATERIALIZED VIEW SET [VIEW | TOAST] TABLESPACE

Should I add this patch to the next CommitFest?

-- Alex

set_toast_tablespace_v0.11.patch.gz
Description: application/gzip
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Dag-Erling Smørgrav writes: > Alex Shulgin writes: >> OK, looks like I've come up with something workable: I've added >> sslprotocol connection string keyword similar to pre-existing >> sslcompression, etc. Please see attached v2 of the original patch. >> I'm having doubts about the name of openssl.h header though, >> libpq-openssl.h? > > Perhaps ssloptions.[ch], unless you plan to add non-option-related code > there later? I don't think anything else than common options-related code fits in there, so ssloptions.c makes sense to me. > BTW, there is no Regent code in your openssl.c, so the copyright > statement is incorrect. Good catch, I just blindly copied that from some other file. -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Alex Shulgin writes: >>> >>> I can do that too, just need a hint where to look at in libpq/psql to >>> add the option. >> >> The place to *enforce* the option is src/interfaces/libpq/fe-secure.c >> (look for SSLv23_method() and SSL_CTX_set_options()). I haven't looked >> into how to set it. > > Yes, I've figured it out. Guess we'd better share the ssl_protocol > value parser code between libpq and the backend. Any precedent? OK, looks like I've come up with something workable: I've added sslprotocol connection string keyword similar to pre-existing sslcompression, etc. Please see attached v2 of the original patch. I'm having doubts about the name of openssl.h header though, libpq-openssl.h? -- Alex >From 7cd60e962ce5e7fc10dc52ed9c92b0b2b5c4c7f1 Mon Sep 17 00:00:00 2001 From: Alex Shulgin Date: Wed, 19 Nov 2014 19:49:29 +0300 Subject: [PATCH] DRAFT: ssl_protocols GUC --- doc/src/sgml/config.sgml | 29 ++ doc/src/sgml/libpq.sgml | 25 ++ src/backend/libpq/Makefile| 2 +- src/backend/libpq/be-secure-openssl.c | 29 -- src/backend/libpq/be-secure.c | 3 +- src/backend/libpq/openssl.c | 124 ++ src/backend/utils/misc/guc.c | 15 src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/libpq/libpq.h | 1 + src/include/libpq/openssl.h | 50 +++ src/interfaces/libpq/Makefile | 8 +- src/interfaces/libpq/fe-connect.c | 4 + src/interfaces/libpq/fe-secure-openssl.c | 18 +++- src/interfaces/libpq/libpq-int.h | 1 + 14 files changed, 300 insertions(+), 10 deletions(-) create mode 100644 src/backend/libpq/openssl.c create mode 100644 src/include/libpq/openssl.h diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml new file mode 100644 index ab8c263..8b701df *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** include_dir 'conf.d' *** 1027,1032 --- 1027,1061 + + ssl_protocols (string) + +ssl_protocols configuration parameter + + + + + Specifies a colon-separated list of SSL protocols that are + allowed to be used on secure connections. 
Each entry in the list can + be prefixed by a + (add to the current list) + or - (remove from the current list). If no prefix is used, + the list is replaced with the specified protocol. + + + The full list of supported protocols can be found in the + the openssl manual page. In addition to the names of + individual protocols, the following keywords can be + used: ALL (all protocols supported by the underlying + crypto library), SSL (all supported versions + of SSL) and TLS (all supported versions + of TLS). + + + The default is ALL:-SSL. + + + + ssl_ciphers (string) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml new file mode 100644 index e23e91d..9ea71a4 *** a/doc/src/sgml/libpq.sgml --- b/doc/src/sgml/libpq.sgml *** postgresql://%2Fvar%2Flib%2Fpostgresql/d *** 1230,1235 --- 1230,1250 + + sslprotocols + + + Specifies a colon-separated list of SSL protocols that are + allowed to be used on secure connections. + See server configuration parameter + for format. + + + Defaults to ALL:-SSL. + + + + sslcompression *** myEventProc(PGEventId evtId, void *evtIn *** 6711,6716 --- 6726,6741 + + + +PGSSLPROTOCOLS + + PGSSLPROTOCOLS behaves the same as the connection parameter. + + + diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile new file mode 100644 index 09410c4..1c058ec *** a/src/backend/libpq/Makefile --- b/src/backend/libpq/Makefile *** OBJS = be-fsstubs.o be-secure.o auth.o c *** 18,24 pqformat.o pqmq.o pqsignal.o ifeq ($(with_openssl),yes) ! OBJS += be-secure-openssl.o endif include $(top_srcdir)/src/backend/common.mk --- 18,24 pqformat.o pqmq.o pqsignal.o ifeq ($(with_openssl),yes) ! OBJS += be-secure-openssl.o openssl.o endif include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c new file mode 100644 index b05364c..3801455 *** a/src/backend/libpq/be-secure-openssl.c --- b/src/backend/libpq/be-secure-openssl.c *** *** 7
Re: [HACKERS] Follow up to irc on CREATE INDEX vs. maintenance_work_mem on 9.3
Andrew Gierth writes:
>>>>>> "Alex" == Alex Shulgin writes:
>
> > Tom Lane writes:
> >> Must've been my evil twin.
>
> Alex> Sorry, I must be under false impression that RhodiumToad is
> Alex> *your* nick on #postgresql at freenode.  I don't recall who
> Alex> told me that, but I was pretty sure it's you.  :-p
>
> ... what
>
> People do occasionally make jokes on IRC about me being Tom's clone; I
> know they mean it in a positive way but I still find it *extremely*
> annoying, so I do try and discourage it.  (If they're making those same
> jokes elsewhere, I haven't been aware of it, but please consider this
> a polite public request to stop.)
>
> My first name is easily visible in the irc gecos field:
>
> *** RhodiumToad is ~andrew@[my hostname] (Andrew)
>
> and there is also the IRC users list on the wiki:
> http://wiki.postgresql.org/wiki/IRC2RWNames

Andrew, Tom,

Sorry for the confusion.  And, Andrew, thanks again for the help! :-)

-- Alex
Re: [HACKERS] Follow up to irc on CREATE INDEX vs. maintenance_work_mem on 9.3
Tom Lane writes:
> Stephen Frost writes:
>> * Alex Shulgin (a...@commandprompt.com) wrote:
>>> Tom,
>>>
>>> First of all, thanks for your help on IRC last time with that CREATE
>>> INDEX memory consumption problem.
>
>> Doubt it was Tom, but if it was, wanna share what channel on IRC it was?
>> :D
>
> Must've been my evil twin.

Sorry, I must have been under the false impression that RhodiumToad is
*your* nick on #postgresql at freenode.  I don't recall who told me
that, but I was pretty sure it was you.  :-p

-- Alex
[HACKERS] Follow up to irc on CREATE INDEX vs. maintenance_work_mem on 9.3
Tom,

First of all, thanks for your help on IRC last time with that CREATE
INDEX memory consumption problem.

As has been pointed out in a Stack Exchange answer to my question[1], it
is indeed a limitation of pre-9.4 versions, but the limit is imposed on
the memtuples array rather than on the total memory the sort in CREATE
INDEX may allocate.  The memtuples array won't grow beyond MaxAllocSize,
and I've got 24 x 50x10^6 bytes = 1200 MB, which just doesn't fit.

We've got a customer who is testing a migration to PostgreSQL 9.3 (from
$some_other_db), so they load the tables first (some of their tables
have 10-100 million rows), then create the indexes, and they constantly
see a disk sort being used despite lots of available RAM and
maintenance_work_mem set to increasingly higher values.

Now my question: is it feasible to back-patch this to 9.3?  Or should we
tell the customer to wait for the 9.4 release?

Thanks.

-- Alex

[1] http://dba.stackexchange.com/questions/83600/postgresql-create-index-memory-requirement
Re: [HACKERS] Replication connection URI?
Heikki Linnakangas writes: >>> >>> The second one is a self-contained fix, but the third one which is the >>> actual patch depends on the second one, because it specifies the dbname >>> keyword two times: first to parse the conninfo/URI, then to override any >>> dbname provided by the user with "replication" pseudo-database name. >> >> Hmm. Should we backpatch the second patch? It sure seems like an >> oversight rather than deliberate that you can't override dbname from the >> connection string with a later dbname keyword. I'd say "yes". >> >> How about the third patch? Probably not; it was an oversight with the >> connection URI patch that it could not be used in primary_conninfo, but >> it's not a big deal in practice as you can always use a non-URI >> connection string instead. > > Ok, committed the second patch to all stable branches, and the third > patch to master. > > In the second patch, I added a sentence to the docs to mention that > only the first "dbname" parameter is expanded. It's debatable if > that's what we actually want. It would be handy to be able to merge > multiple connection strings, by specifying multiple dbname > parameters. But now the docs at least match the code, changing the > behavior would be a bigger change. > > From the third patch, I removed the docs changes. It's necessary to > say "connection string or URI" everywhere, the URI format is just one > kind of a connection string. I also edited the code that builds the > keyword/value array, to make it a bit more readable. Yay, many thanks! :-) -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Missing OOM checks in libpq (was Re: [HACKERS] Replication connection URI?)
Heikki Linnakangas writes:
>>>
>>> I think we need to fix all of those, and backpatch.  Per attached.
>>
>> Yikes!  Looks sane to me.
>
> Ok thanks, committed.  It didn't apply cleanly to 9.0, 9.1 and 9.2, so
> the patch for those branches looks a bit different.

Great.  Are you looking at the actual replication URI patch?  Or should
I add it to commitfest so we don't lose track of it?

-- Alex
Re: Missing OOM checks in libpq (was Re: [HACKERS] Replication connection URI?)
Heikki Linnakangas writes: > On 11/24/2014 06:05 PM, Alex Shulgin wrote: >> The first patch is not on topic, I just spotted this missing check. > >> *** a/src/interfaces/libpq/fe-connect.c >> --- b/src/interfaces/libpq/fe-connect.c >> *** conninfo_array_parse(const char *const * >> *** 4402,4407 >> --- 4402,4415 >> if >> (options[k].val) >> >> free(options[k].val); >> options[k].val >> = strdup(str_option->val); >> +if >> (!options[k].val) >> +{ >> + >> printfPQExpBuffer(errorMessage, >> + >> libpq_gettext("out of memory\n")); >> + >> PQconninfoFree(options); >> + >> PQconninfoFree(dbname_options); >> +return >> NULL; >> +} >> break; >> } >> } > > Oh. There are actually many more places in connection option parsing > that don't check the return value of strdup(). The one in fillPGConn > even has an XXX comment saying it probably should check it. You can > get quite strange behavior if one of them fails. If for example the > strdup() on dbname fails, you might end up connecting to different > database than intended. And if the "conn->sslmode = > strdup(DefaultSSLMode);" call in connectOptions2 fails, you'll get a > segfault later because at least connectDBstart assumes that sslmode is > not NULL. > > I think we need to fix all of those, and backpatch. Per attached. Yikes! Looks sane to me. I've checked that patches #2 and #3 can be applied on top of this, w/o rebasing. -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication connection URI?
Alex Shulgin writes:
> Heikki Linnakangas writes:
>>> It appears that replication connection doesn't support URI but only the
>>> traditional conninfo string.
>>>
>>> src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:99: in
>>> libpqrcv_connect():
>>>
>>>     snprintf(conninfo_repl, sizeof(conninfo_repl),
>>>              "%s dbname=replication replication=true fallback_application_name=walreceiver",
>>>              conninfo);
>>>
>>> A patch to fix this welcome?
>>
>> Yeah, seems like an oversight. Hopefully you can fix that without
>> teaching libpqwalreceiver what connection URIs look like..
>
> Please see attached. We're lucky that PQconnectdbParams has an option
> to parse and expand the first dbname parameter if it looks like a
> connection string (or a URI).
>
> The first patch is not on topic, I just spotted this missing check.
>
> The second one is a self-contained fix, but the third one which is the
> actual patch depends on the second one, because it specifies the dbname
> keyword two times: first to parse the conninfo/URI, then to override any
> dbname provided by the user with the "replication" pseudo-database name.

These patches are really simple, I hope a committer will pick them up?
Or should I add them to the commitfest?

Also, I'd rather get this committed first, then rebase that
recovery.conf->GUC patch onto it and submit an updated version.

Thanks.

--
Alex
Re: [HACKERS] Turning recovery.conf into GUCs
Josh Berkus writes:
>>> Before I go into my ideas, though, what does the current patch do
>>> regarding non-replication PITR?
>>
>> It removes that $PGDATA/standby.enable trigger file it relies on to
>> start the PITR in the first place.
>
> OK, and that's required for replication too? I'm OK with that if it
> gets the patch in.

In the current form of the patch, yes. Though I don't think I like it.

--
Alex
Re: [HACKERS] Turning recovery.conf into GUCs
Josh Berkus writes:
>> only that you need to start in recovery mode to start replication
>
> Right, but my point is that having a trigger file *is not necessary for
> replication, only for PITR* -- and maybe not necessary even for PITR.
> That is, in a streaming replication configuration, having a
> "standby_mode = on|off" parameter is 100% sufficient to control
> replication with the small detail that "pg_ctl promote" needs to set it
> in pg.auto.conf or conf.d.
>
> And, now, having given it some thought, I'm going to argue that it's not
> required for PITR either, provided that we can use the auto.conf method.
>
> Before I go into my ideas, though, what does the current patch do
> regarding non-replication PITR?

It removes that $PGDATA/standby.enable trigger file it relies on to
start the PITR in the first place.

--
Alex
Re: [HACKERS] Turning recovery.conf into GUCs
Jaime Casanova writes:
>>> Either way, from the code it is clear that we only stay in recovery if
>>> standby_mode is directly turned on. This makes the whole check for a
>>> specially named file unnecessary, IMO: we should just check the value of
>>> standby_mode (which is off by default).
>
> no. currently we enter in recovery mode when postgres sees a
> recovery.conf and stays in recovery mode when standby_mode is on or an
> appropriate restore_command is provided.
>
> which means recovery.conf has two uses:
> 1) start in recovery mode (not continuous)
> 2) provide parameters for recovery mode and for streaming
>
> we still need a "recovery trigger" file that forces postgres to start
> in recovery mode and acts according to the recovery GUCs

Yes, these were my latest findings also, but if instead of removing the
trigger file upon successful recovery the server would set
standby_mode=off, that would also work.

--
Alex
Re: [HACKERS] Replication connection URI?
Heikki Linnakangas writes: >> >> It appears that replication connection doesn't support URI but only the >> traditional conninfo string. >> >> src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:99: in >> libpqrcv_connect(): >> >> snprintf(conninfo_repl, sizeof(conninfo_repl), >> "%s dbname=replication replication=true >> fallback_application_name=walreceiver", >> conninfo); >> >> A patch to fix this welcome? > > Yeah, seems like an oversight. Hopefully you can fix that without > teaching libpqwalreceiver what connection URIs look like.. Please see attached. We're lucky that PQconnectdbParams has an option to parse and expand the first dbname parameter if it looks like a connection string (or a URI). The first patch is not on topic, I just spotted this missing check. The second one is a self-contained fix, but the third one which is the actual patch depends on the second one, because it specifies the dbname keyword two times: first to parse the conninfo/URI, then to override any dbname provided by the user with "replication" pseudo-database name. Have a nice day! -- Alex >From 156e6faa96ad6a2ce58055ad72883ed78c576e5b Mon Sep 17 00:00:00 2001 From: Alex Shulgin Date: Mon, 24 Nov 2014 16:55:50 +0300 Subject: [PATCH 1/3] Add missing check on OOM in expand_dbname path of conninfo_array_parse(). 
--- src/interfaces/libpq/fe-connect.c | 8 1 file changed, 8 insertions(+) diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c new file mode 100644 index 3fe8c21..d7f2ec2 *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *** conninfo_array_parse(const char *const * *** 4402,4407 --- 4402,4415 if (options[k].val) free(options[k].val); options[k].val = strdup(str_option->val); + if (!options[k].val) + { + printfPQExpBuffer(errorMessage, + libpq_gettext("out of memory\n")); + PQconninfoFree(options); + PQconninfoFree(dbname_options); + return NULL; + } break; } } -- 2.1.0 >From 44d9d6a2c9cf5af83988f9d3b6eeb39c36104ef9 Mon Sep 17 00:00:00 2001 From: Alex Shulgin Date: Mon, 24 Nov 2014 18:12:51 +0300 Subject: [PATCH 2/3] Allow further dbname=value to override conninfo parsed from an expanded dbname in conninfo_array_parse(). --- src/interfaces/libpq/fe-connect.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c new file mode 100644 index d7f2ec2..5b45128 *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *** conninfo_parse(const char *conninfo, PQE *** 4302,4311 * Defaults are supplied (from a service file, environment variables, etc) * for unspecified options, but only if use_defaults is TRUE. * ! * If expand_dbname is non-zero, and the value passed for keyword "dbname" is a ! * connection string (as indicated by recognized_connection_string) then parse ! * and process it, overriding any previously processed conflicting ! * keywords. Subsequent keywords will take precedence, however. */ static PQconninfoOption * conninfo_array_parse(const char *const * keywords, const char *const * values, --- 4302,4313 * Defaults are supplied (from a service file, environment variables, etc) * for unspecified options, but only if use_defaults is TRUE. * ! 
* If expand_dbname is non-zero, and the value passed for the first occurrence ! * of "dbname" keyword is a connection string (as indicated by ! * recognized_connection_string) then parse and process it, overriding any ! * previously processed conflicting keywords. Subsequent keywords will take ! * precedence, however. In particular, a further occurrence of "dbname" may ! * override the dbname provided in the connection string. */ static PQconninfoOption * conninfo_array_parse(const char *const * keywords, const char *const * values, *** conninfo_array_parse(const char *const * *** 4381,4387 } /* ! * If we are on the dbname parameter, and we have a parsed * connection string, copy those parameters across, overriding any * existing previous settings. */ --- 4383,4389 } /* ! * If we are on the *first* dbname parameter, and we have a parsed * connection string, copy those parameters across, overriding any * existing previous settings. */ *** conninfo_array_parse(const char *const * *** 4415,4420 --- 4417,4428 } } } + /* + * Clear out the parsed dbname, so that a possible further + * occu
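Patch 2's comment above describes the precedence rule for the keyword/value arrays passed to PQconnectdbParams: later occurrences of a keyword win, so a further `dbname` can override the one parsed from an expanded connection string. A self-contained sketch of that last-one-wins lookup (plain C over parallel arrays, not the actual libpq implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the value of the LAST occurrence of 'key' in a NULL-terminated
 * keyword/value array pair, mimicking how a later dbname=... entry
 * overrides one parsed from an expanded connection string/URI.
 */
const char *
last_keyword_value(const char *const *keywords,
                   const char *const *values,
                   const char *key)
{
    const char *result = NULL;

    for (int i = 0; keywords[i] != NULL; i++)
    {
        if (strcmp(keywords[i], key) == 0)
            result = values[i]; /* later entries take precedence */
    }
    return result;
}
```

This is the property the third patch relies on: passing `dbname` twice, first with the conninfo/URI to expand, then with the literal "replication" pseudo-database name.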
[HACKERS] Replication connection URI?
Hackers,

It appears that replication connection doesn't support URI but only the
traditional conninfo string.

src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:99: in
libpqrcv_connect():

    snprintf(conninfo_repl, sizeof(conninfo_repl),
             "%s dbname=replication replication=true fallback_application_name=walreceiver",
             conninfo);

A patch to fix this welcome?

--
Alex

PS: I wrote the original URI parser used in libpq.
Re: [HACKERS] Turning recovery.conf into GUCs
Josh Berkus writes:
>> Well, the latest version of this patch fails to start when it sees
>> 'recovery.conf' in PGDATA:
>>
>> FATAL: "recovery.conf" is not supported anymore as a recovery method
>> DETAIL: Refer to appropriate documentation about migration methods
>>
>> I've missed all the discussion behind this decision and after reading
>> the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like
>> someone more knowledgeable to speak up on the status of this.
>
> The argument was that people wanted to be able to have an empty
> recovery.conf file as a "we're in standby" trigger so that they could
> preserve backwards compatibility with external tools. I don't agree
> with this argument, but several people championed it.

It doesn't look like we can provide for 100% backwards compatibility
with existing tools, here's why:

a) The only sensible way to use recovery.conf as a GUC source is via
   include/include_dir from postgresql.conf.

b) The server used to rename $PGDATA/recovery.conf to
   $PGDATA/recovery.done so that when it is re-started it doesn't
   accidentally start in recovery mode.

We can't keep doing (b) because that will make postgres fail to start on
a missing include. Well, it won't error out if we place the file under
include_dir, as only files with the ".conf" suffix are included, but
then again the server wouldn't know which file to rename unless we tell
it or hard-code the filename. Either way this is not 100% backwards
compatible.

Now the question is if we are going to break compatibility anyway:
should we try to minimize the difference, or would that do the user a
disservice by providing a false sense of compatibility?

For one, if we're breaking things, the trigger_file GUC should be
renamed to end_recovery_trigger_file or something like that, IMO.
That's if we decide to keep it at all.

>> The docs use the term "continuous recovery".
>>
>> Either way, from the code it is clear that we only stay in recovery if
>> standby_mode is directly turned on. This makes the whole check for a
>> specially named file unnecessary, IMO: we should just check the value of
>> standby_mode (which is off by default).
>
> So, what happens when someone does "pg_ctl promote"? Somehow
> standby_mode needs to get set to "off". Maybe we write "standby_mode =
> off" to postgresql.auto.conf?

Well, standby_mode is only consulted at startup after crash recovery.
But suddenly, a tool that *writes* recovery.conf needs to also
consult/edit postgresql.auto.conf... Maybe that's inevitable, but still
a point to consider.

>> By the way, is there any use in setting standby_mode=on and any of the
>> recovery_target* GUCs at the same time?
>
> See my thread on this topic from last week. Short answer: No.
>
>> I think it can only play together if you set the target farther than the
>> latest point you've got in the archive locally. So that's sort of
>> "Point-in-Future-Recovery". Does that make any sense at all?
>
> Again, see the thread. This doesn't work in a useful way, so there's no
> reason to write code to enable it.

Makes sense. Should we incorporate the actual tech and doc fix in this
patch?

>>>> /* File path names (all relative to $PGDATA) */
>>>> -#define RECOVERY_COMMAND_FILE "recovery.conf"
>>>> -#define RECOVERY_COMMAND_DONE "recovery.done"
>>>> +#define RECOVERY_ENABLE_FILE "standby.enabled"
>>>
>>> Imo the file and variable names should stay coherent.
>>
>> Yes, once we settle on the name (and if we really need that extra
>> trigger file.)
>
> Personally, if we have three methods of promotion:
>
> 1) pg_ctl promote
> 2) edit postgresql.conf and reload
> 3) ALTER SYSTEM SET and reload
>
> ... I don't honestly think we need a 4th method for promotion.

Me neither. It is tempting to make everything spin around GUCs without
the need for any external trigger files.

> The main reason to want a "we're in recovery file" is for PITR rather
> than for replication, where it has a number of advantages as a method,
> the main one being that recovery.conf is unlikely to be overwritten by
> the contents of the backup.
>
> HOWEVER, there's a clear out for this with conf.d. If we enable conf.d
> by default, then we can simply put recovery settings in conf.d, and
> since that specific file (which can have whatever name the user chooses)
> will not be part of backups, it would have the same advantage as
> recovery.conf, without the drawbacks.
>
> Discuss?

Well, I don't readily see how conf.d is special with regard to base
backup, wouldn't you need to exclude it explicitly still?

--
Alex
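For readers unfamiliar with the arrangement being debated, a hypothetical illustration of the conf.d-based setup Josh describes might look like the following. The file names, setting names, and connection parameters are only examples following the thread's discussion, not a committed design:

```
# postgresql.conf (with conf.d enabled by default, as proposed)
include_dir = 'conf.d'

# conf.d/recovery.conf -- the operator excludes this file from base
# backups, giving it the same "won't be overwritten by a restore"
# property that recovery.conf in $PGDATA used to have
standby_mode = on
primary_conninfo = 'host=primary port=5432 user=replicator'
restore_command = 'cp /mnt/archive/%f "%p"'
```

The open question raised in the reply still applies: unlike the old recovery.conf, nothing about conf.d automatically keeps such a file out of a base backup.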
Re: [HACKERS] Turning recovery.conf into GUCs
Alex Shulgin writes:
>>>> * Why do you include xlog_internal.h in guc.c and not xlog.h?
>>>
>>> we actually need both but including xlog_internal.h also includes xlog.h
>>> i added xlog.h and if someone thinks it is enough only putting
>>> xlog_internal.h let me know
>>
>> What's required from xlog_internal.h?
>
> Looks like this was addressed before me.

Actually, it's there to bring in MAXFNAMELEN which is used in
check_recovery_target_name. Not sure how it even compiled without that,
but after some branch switching it doesn't anymore. :-p

Maybe we should move these check/assign hooks to xlog.c, but that's
likely going to create a header file dependency problem due to the use
of GucSource in the hook prototype...

--
Alex
Re: [HACKERS] Turning recovery.conf into GUCs
mptz_in() and even > DateTimeParseError(), replacing the ereport()s by the guc error > reporting. The use of PG_TRY/CATCH does protect from FATALs in SIGHUP indeed. Using CopyErrorData() we can also fetch the actual error message from timestamptz_in, though I wonder we really have to make a full copy. >> > * Why do you include xlog_internal.h in guc.c and not xlog.h? > >> we actually need both but including xlog_internal.h also includes xlog.h >> i added xlog.h and if someone things is enough only putting >> xlog_internal.h let me know > > What's required from xlog_internal.h? Looks like this was addressed before me. >> diff --git a/src/backend/access/transam/xlog.c >> b/src/backend/access/transam/xlog.c >> index b53ae87..54f6a0d 100644 >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -64,11 +64,12 @@ >> extern uint32 bootstrap_data_checksum_version; >> >> /* File path names (all relative to $PGDATA) */ >> -#define RECOVERY_COMMAND_FILE "recovery.conf" >> -#define RECOVERY_COMMAND_DONE "recovery.done" >> +#define RECOVERY_ENABLE_FILE"standby.enabled" > > Imo the file and variable names should stay coherent. Yes, once we settle on the name (and if we really need that extra trigger file.) >> +/* recovery.conf is not supported anymore */ >> +#define RECOVERY_COMMAND_FILE "recovery.conf" > >> +boolStandbyModeRequested = false; >> +static TimestampTz recoveryDelayUntilTime; > > This imo should be lowercase now, the majority of GUC variables are. This is not a GUC variable, though it's calculated based on a GUC recovery_min_apply_delay. >> +/* are we currently in standby mode? */ >> +bool StandbyMode = false; > > Why did you move this? It was easy to move it back though. >> -if (rtliGiven) >> +if (strcmp(recovery_target_timeline_string, "") != 0) >> { > > Why not have the convention that NULL indicates a unset target_timeline > like you use for other GUCs? Mixing things like this is confusing. 
> > Why is recovery_target_timeline stored as a string? Because it's a > unsigned int? If so, you should have an assign hook setting up a) > rtliGiven, b) properly typed variable. Yes, I believe setting these to NULL by default makes a lot more sense. Then one can check if there was a non-default setting by checking the *_string variable, which is not possible with int or TimestampTz. >> -if (rtli) >> +if (recovery_target_timeline) >> { >> /* Timeline 1 does not have a history file, all else >> should */ >> -if (rtli != 1 && !existsTimeLineHistory(rtli)) >> +if (recovery_target_timeline != 1 && >> + >> !existsTimeLineHistory(recovery_target_timeline)) >> ereport(FATAL, >> (errmsg("recovery target >> timeline %u does not exist", >> -rtli))); >> -recoveryTargetTLI = rtli; >> + >> recovery_target_timeline))); >> +recoveryTargetTLI = recovery_target_timeline; >> recoveryTargetIsLatest = false; > > So, now we have a recoveryTargetTLI and recovery_target_timeline > variable? Really? Why do we need recoveryTargetTLI at all now? Looks like we still do need both of them. The initial value of recoveryTargetTLI is derived from pg_control, but later it can be overriden by this setting. However, if the recovery_target_timeline was "latest" we need to find the latest TLI, based on the initial value from pg_control. But we can't set final recoveryTargetTLI value in the GUC assign hook, because we might need to fetch some history files first. >> +static void >> +assign_recovery_target_time(const char *newval, void *extra) >> +{ >> +recovery_target_time = *((TimestampTz *) extra); >> + >> +if (recovery_target_xid != InvalidTransactionId) >> +recovery_target = RECOVERY_TARGET_XID; >> +else if (recovery_target_name[0]) >> +recovery_target = RECOVERY_TARGET_NAME; >> +else if (recovery_target_time != 0) >> +recovery_target = RECOVERY_TARGET_TIME; >> +else >> +recovery_target = RECOVERY_TARGET_UNSET; >> +} >> + > > I don't think it's correct to do such han
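The review above debates how recovery_target_timeline should be represented: as a string (to allow "latest" and an unset NULL default) versus a typed unsigned int set from an assign hook. A self-contained sketch of the parsing such a check hook needs, with illustrative names only (this is not the PostgreSQL hook code, which also has to defer "latest" resolution until history files are available):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a timeline setting: "latest" requests following the newest
 * timeline; otherwise the value must be a positive integer TLI.
 * Returns false on invalid input; *latest and *tli report the result.
 * Note that a "latest" request cannot be resolved here -- as the mail
 * above points out, finding the actual newest TLI may require fetching
 * history files first, so it must happen later in recovery startup.
 */
bool
parse_target_timeline(const char *val, bool *latest, unsigned int *tli)
{
    char       *end;
    unsigned long n;

    *latest = false;
    *tli = 0;

    if (strcmp(val, "latest") == 0)
    {
        *latest = true;
        return true;
    }

    errno = 0;
    n = strtoul(val, &end, 10);
    if (errno != 0 || end == val || *end != '\0' || n == 0 || n > UINT_MAX)
        return false;           /* not a valid timeline ID */

    *tli = (unsigned int) n;
    return true;
}
```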
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Dag-Erling Smørgrav writes:
> Alex Shulgin writes:
>> I can do that too, just need a hint where to look in libpq/psql to
>> add the option.
>
> The place to *enforce* the option is src/interfaces/libpq/fe-secure.c
> (look for SSLv23_method() and SSL_CTX_set_options()). I haven't looked
> into how to set it.

Yes, I've figured it out. Guess we'd better share the ssl_protocol value
parser code between libpq and the backend. Any precedent?

--
Alex
Re: [HACKERS] Merging recovery.conf into PostgreSQL.conf -- where did this go?
Andres Freund writes:
> On 2014-11-19 15:09:10 -0800, Josh Berkus wrote:
>> One patch that got deferred from 9.4 was the merger of recovery.conf and
>> postgresql.conf, due to conflicts with ALTER SYSTEM SET. However, this
>> is still a critical TODO, because recovery.conf is still an ongoing
>> major management problem for PostgreSQL DBAs.
>>
>> So, what happened to this? I kinda expected it to get committed right
>> after we opened 9.5.
>
> Nobody did the remaining work.

I'd like to work on this. AFAICT, this CommitFest entry is the latest on
the subject, right?

https://commitfest.postgresql.org/action/patch_view?id=1293

Should/may I move it to the next open CommitFest?

--
Alex
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Dag-Erling Smørgrav writes:
> Alex Shulgin writes:
>> * The patch works as advertised, though the only way to verify that
>> connections made with the protocol disabled by the GUC are indeed
>> rejected is to edit fe-secure-openssl.c to only allow specific TLS
>> versions. Adding configuration on the libpq side as suggested in the
>> original discussion might help here.
>
> I can easily do that, but I won't have time until next week or so.

I can do that too, just need a hint where to look in libpq/psql to add
the option. For SSL we have sslmode and sslcompression, etc. in
conninfo, so adding sslprotocols seems to be an option.

As an aside: should we also expose a parameter to choose SSL ciphers
(that would be a separate patch)?

--
Alex
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Dag-Erling Smørgrav writes:
> The attached patches add an ssl_protocols configuration option which
> controls which versions of SSL or TLS the server will use. The syntax is
> similar to Apache's SSLProtocols directive, except that the list is
> colon-separated instead of whitespace-separated, although that is easy
> to change if it proves unpopular.

Hello,

Here is my review of the patch against the master branch:

* The patch applies and compiles cleanly, except for the sgml docs:

    postgres.xml:66141: element varlistentry: validity error : Element
    varlistentry content does not follow the DTD, expecting (term+ ,
    listitem), got (term indexterm listitem)

  under the tag in the material added to config.sgml by the patch.

* The patch works as advertised, though the only way to verify that
  connections made with the protocol disabled by the GUC are indeed
  rejected is to edit fe-secure-openssl.c to only allow specific TLS
  versions. Adding configuration on the libpq side as suggested in the
  original discussion might help here.

* The code allows specifying SSLv2 and SSLv3 in the GUC, but removes
  them forcibly after parsing the complete string (a warning is issued).
  Should we also add a note about this to the documentation?

* In case the list of allowed protocols comes up empty, a FATAL error
  message is logged: "could not set the protocol list (no valid
  protocols available)". I think it's worth changing that to "could not
  set SSL protocol list...", since all other related error/warning logs
  include "SSL".

* The added code follows conventions and looks good to me. I didn't
  spot any problems in the parser. I've tried a good deal of different
  values and all seem to be parsed correctly.

I would try to apply the patches for older branches if there is
consensus that we really need this patch and we want to back-patch it.

Thanks.

--
Alex
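The review mentions that the GUC accepts a colon-separated list and forcibly drops SSLv2/SSLv3 after parsing. A self-contained sketch of that parse-then-filter step, with illustrative names and a simplified protocol table (not the actual patch code, which issues a warning where this sketch only skips):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a colon-separated protocol list, dropping SSLv2 and SSLv3
 * the way the patch does after parsing the complete string.  Returns
 * the number of protocols kept, or -1 on an unknown token.  Kept
 * entries point at the static table, so they outlive the call.
 */
int
parse_ssl_protocols(const char *setting, const char **kept, int maxkept)
{
    static const char *const known[] =
        {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"};
    char        buf[256];
    int         nkept = 0;

    snprintf(buf, sizeof(buf), "%s", setting);

    for (char *tok = strtok(buf, ":"); tok != NULL; tok = strtok(NULL, ":"))
    {
        int         match = -1;

        for (int i = 0; i < (int) (sizeof(known) / sizeof(known[0])); i++)
            if (strcmp(tok, known[i]) == 0)
                match = i;
        if (match < 0)
            return -1;          /* unknown protocol name */
        if (match <= 1)
            continue;           /* SSLv2/SSLv3 forcibly removed; the
                                 * real code warns here */
        if (nkept < maxkept)
            kept[nkept++] = known[match];
    }
    return nkept;
}
```

Note the review's point that if everything is filtered out, the list "comes up empty" and the server reports a FATAL error rather than silently accepting no protocols.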
Re: [HACKERS] Removing unreferenced files
Bruce Momjian writes:
> Here is a cleaned-up version of the unreferenced file patch that was
> discussed extensively in May of 2005. I want to get it into the
> archives in case someone else wants to work on it.
>
> Here is a reference to the work still needed on the patch:
>
> http://archives.postgresql.org/pgsql-patches/2005-05/msg00024.php

Hello,

This is a TODO item, but I'd like to know if there's general interest in
this feature at the moment.

I think it can be useful in a situation when the DBA knows that the
database was shut down or crashed during an extensive data load and has
a way to proceed without the need to start over with a clean database.
In such a case having a tool to report or remove any stale files makes
perfect sense.

My other concern is that if at some point there is a bug in DROP
TABLE/INDEX that leaves behind the files it's supposed to remove, we
would likely want to know about it.

Regardless of the outcome of this discussion, it is interesting to know
whether such functionality can only be programmed into the backend's
startup, or whether making it a separate tool is feasible (the tool
would only run against a stopped server). There was once a tool named
pgfsck[1], which was never updated after 8.2, so the name is already
taken...

Yet another point is that one might be interested not only in reporting
stale files, but any *missing* ones too. Currently, you would only find
out that a relation data file is missing when actually trying to read
from it; no attempt is made to verify this on startup. This might not
be a very useful check, since if the file is not missing but corrupted
the check doesn't buy you much. (I am likely beating a dead horse here.)

Thank you.

--
Alex

[1] http://svana.org/kleptog/pgsql/pgfsck.html
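Detecting stale files comes down to comparing relfilenode-style names found on disk against the set the catalogs reference. A tiny self-contained sketch of the name-matching half of such a tool (purely illustrative, not the referenced patch; fork suffixes like "_fsm" and temp files are deliberately out of scope here):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * Does 'name' look like a relation data file: all digits, optionally
 * followed by a ".N" segment suffix (e.g. "16384" or "16384.1")?
 */
bool
looks_like_relfilenode(const char *name)
{
    const char *p = name;

    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;
    if (*p == '\0')
        return true;            /* plain "16384" */
    if (*p != '.')
        return false;
    p++;                        /* segment suffix must be digits too */
    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;
    return *p == '\0';
}

/* Is the numeric prefix of 'name' present in the referenced list? */
bool
is_referenced(const char *name, const unsigned int *refs, int nrefs)
{
    unsigned int oid = 0;

    for (const char *p = name; isdigit((unsigned char) *p); p++)
        oid = oid * 10 + (unsigned int) (*p - '0');
    for (int i = 0; i < nrefs; i++)
        if (refs[i] == oid)
            return true;
    return false;
}
```

A standalone tool would scan each database directory with these checks against a list pulled from pg_class, which is why it can only run safely against a stopped (or at least quiesced) server.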
Re: [HACKERS] Escaping from blocked send() reprised.
Andres Freund writes:
> I've invested some more time in this:
> 0002 now makes sense on its own and doesn't change anything around the
> interrupt handling.

Oh, and it compiles without 0003.

In this patch, the #endif appears to be misplaced in PostgresMain: the
closing brace sits inside the #else branch, so a WIN32 build ends up
with unbalanced braces; the #endif should come before the brace:

+	if (MyProcPort != NULL)
+	{
+#ifdef WIN32
+		pgwin32_noblock = true;
+#else
+		if (!pg_set_noblock(MyProcPort->sock))
+			ereport(COMMERROR,
+					(errmsg("could not set socket to nonblocking mode: %m")));
+	}
+#endif
+
+	pqinitmask();

> 0003 Sinval/notify processing got simplified further. There really isn't
>      any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>      anymore. Also begin_client_read/client_read_ended don't make much
>      sense anymore. Instead introduce ProcessClientReadInterrupt (which
>      wants a better name).
>
> There's also a very WIP
>
> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>      set. All of that happens via the latch mechanism, nothing happens
>      inside signal handlers. So I do think it's quite an improvement
>      over what's been discussed in this thread.
>
>      But it (and the other approaches) do noticeably increase the
>      likelihood of clients not getting the error message if the client
>      isn't actually dead. The likelihood of write() being blocked
>      *temporarily* due to normal bandwidth constraints is quite high
>      when you consider COPY FROM and similar. Right now we'll wait till
>      we can put the error message into the socket afaics.
>
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

After re-reading these I don't see the rest of the items I wanted to
inquire about anymore, so it just makes more sense now.

One thing I did try is sending a NOTICE to the client when in
ProcessInterrupts() and DoingCommandRead is true. I think[1] it was
expected to be delivered instantly, but actually the client (psql) only
displays it after sending the next statement.

While I'm reading up on the FE/BE protocol, someone might want to share
his wisdom on this subject. My guess: psql blocks on a readline/libedit
call and can't effectively poll the server socket until the user
completes the input.

--
Alex

[1] http://www.postgresql.org/message-id/1262173040.19367.5015.camel@ebony
``AFAIK, NOTICE was suggested because it can be sent at any time, whereas
ERRORs are only associated with statements.''
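The guess at the end — psql is stuck inside readline and never looks at the server socket — points at the usual remedy: multiplex both file descriptors with select() instead of blocking on one. A self-contained sketch of the non-blocking readiness check a client would need (plain POSIX; a pipe stands in for the server socket in the test):

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Return true if 'fd' has data ready without blocking -- the check a
 * client would have to run on the server socket while simultaneously
 * waiting for user input, instead of blocking inside readline() and
 * only noticing asynchronous NOTICE messages on the next statement.
 */
bool
fd_readable_now(int fd)
{
    fd_set      rfds;
    struct timeval tv = {0, 0}; /* zero timeout: poll, don't block */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv) == 1;
}
```

In a real client the select() call would watch stdin and PQsocket(conn) together with no timeout, dispatching to readline input handling or PQconsumeInput as each becomes ready.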
Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited
Andres Freund writes:
> On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote:
>> After reading up through archives on the two $subj related TODO items
>> I'm under the impression that the patches[1,2] didn't make it mainly
>> because of the risk of breaking SSL internals if we try to longjmp out
>> of the signal handler in the middle of a blocking SSL read and/or if
>> we try to report that final "transaction/connection aborted" notice to
>> the client.
>
> I've written a patch that allows that - check
> http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de
>
> I feel pretty confident that that's the way to go. I just need some time
> to polish it.
>
>> What if instead of trying to escape from the signal handler we would set
>> a flag and use it to simulate EOF after the recv() is restarted due to
>> EINTR? From the backend perspective it should look like the client has
>> just hung up.
>
> That's pretty much what the above does. Except that it actually deals
> with blocking syscalls by not using them and relying on latches/select
> instead.

Yay, that's nice, because my EOF approach is sort of fishy.

I've checked the patches: they apply and compile on top of current HEAD
(one failed hunk in pqcomm.c), so I can help with testing if needed.

There is a (short) list of items that caught my attention. I will post
it in reply to your patches thread.

--
Alex
Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited
Andres Freund writes:
> On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote:
>> After reading up through archives on the two $subj related TODO items
>> I'm under the impression that the patches[1,2] didn't make it mainly
>> because of the risk of breaking SSL internals if we try to longjmp out
>> of the signal handler in the middle of a blocking SSL read and/or if
>> we try to report that final "transaction/connection aborted" notice to
>> the client.
>
> I've written a patch that allows that - check
> http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de
>
> I feel pretty confident that that's the way to go. I just need some time
> to polish it.
>
>> What if instead of trying to escape from the signal handler we would set
>> a flag and use it to simulate EOF after the recv() is restarted due to
>> EINTR? From the backend perspective it should look like the client has
>> just hung up.
>
> That's pretty much what the above does. Except that it actually deals
> with blocking syscalls by not using them and relying on latches/select
> instead.

Neat, I must have missed it.

--
Alex