Re: [GENERAL] Rounding Double Precision or Numeric
This is documented in section 8.1.2 in the manual. ( https://www.postgresql.org/docs/9.6/static/datatype-numeric.html) NUMERIC rounds away from zero. IEEE 754 based data types (FLOAT, DOUBLE PRECISION) round to the closest even number. On Thu, Jun 1, 2017 at 6:26 PM, Louis Battuellowrote: > Is the round() function implemented differently for double precision than > for numeric? Forgive me if this exists somewhere in the documentation, but > I can't seem to find it. > > I've noticed with 9.6 on OSX, the .5 rounding is handled differently > between the types. (I haven't tested other versions, yet.) For double > precision values, even whole numbers are rounded down, yet for odds they > are rounded up. For numeric values, all .5 numbers are rounded up. > > psql (9.6.3) > Type "help" for help. > > postgres=# \x > Expanded display is on. > postgres=# select round(cast(1230.5 as double precision)) as > round_double_even_0 > postgres-# ,round(cast(1231.5 as double precision)) as > round_double_odd_1 > postgres-# ,round(cast(1232.5 as double precision)) as > round_double_even_2 > postgres-# ,round(cast(1233.5 as double precision)) as > round_double_odd_3 > postgres-# ,round(cast(1234.5 as double precision)) as > round_double_even_4 > postgres-# ,round(cast(1235.5 as double precision)) as > round_double_odd_5 > postgres-# ; > > -[ RECORD 1 ]---+- > round_double_even_0 | 1230 > round_double_odd_1 | 1232 > round_double_even_2 | 1232 > round_double_odd_3 | 1234 > round_double_even_4 | 1234 > round_double_odd_5 | 1236 > > postgres=# select round(cast(1230.5 as numeric)) as round_numeric_even_0 > postgres-# ,round(cast(1231.5 as numeric)) as round_numeric_odd_1 > postgres-# ,round(cast(1232.5 as numeric)) as round_numeric_even_2 > postgres-# ,round(cast(1233.5 as numeric)) as round_numeric_odd_3 > postgres-# ,round(cast(1234.5 as numeric)) as round_numeric_even_4 > postgres-# ,round(cast(1235.5 as numeric)) as round_numeric_odd_5 > postgres-# ; > > -[ RECORD 1 ]+- > round_numeric_even_0 | 1231 > round_numeric_odd_1 | 1232 > round_numeric_even_2 | 1233 > round_numeric_odd_3 | 1234 > round_numeric_even_4 | 1235 > round_numeric_odd_5 | 1236 > > postgres=# select round(1230.5) as round_even_0 > ,round(1231.5) as round_odd_1 > ,round(1232.5) as round_even_2 > ,round(1233.5) as round_odd_3 > ,round(1234.5) as round_even_4 > ,round(1235.5) as round_odd_5 > ; > > -[ RECORD 1 ]+- > round_even_0 | 1231 > round_odd_1 | 1232 > round_even_2 | 1233 > round_odd_3 | 1234 > round_even_4 | 1235 > round_odd_5 | 1236 > > postgres=# \q > > Why does the algorithm vary by data type? > > Or is something entirely different happening? > >
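A quick way to see both behaviours side by side, and to force round-half-away-from-zero for a float value, is to cast to numeric before rounding. A minimal sketch (not from the original thread):
select round(1230.5::double precision)            as ties_to_even,        -- 1230
       round((1230.5::double precision)::numeric) as ties_away_from_zero; -- 1231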
[GENERAL] How to assign default values to psql variables?
Hi, I have a psql script relying on variables passed in on the command line with -v. Is there any way to assign a default value in case the -v option is forgotten? Here is an example with pgtap: select diag('should be printed only in verbose mode') where :testverbose; This can be called with -v testverbose=true or -v testverbose=false. However, if the assignment is forgotten, it breaks. I'd like to behave it as if testverbose=false was passed. Thanks.
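One workaround, sketched below under the assumption that psql 9.3 or later is available (for \gset): re-assign the variable to itself, which leaves the literal text ':testverbose' behind when the -v option was forgotten, then replace that sentinel with the default.
\set testverbose :testverbose
-- if -v testverbose=... was not given, the variable now contains the literal ':testverbose'
select case when :'testverbose' = ':testverbose' then 'false' else :'testverbose' end as testverbose \gset
After this, :testverbose can be used as in the pgtap example and defaults to false.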
Re: [GENERAL] change type from NUMERIC(14,4) to NUMERIC(24,12)
I found that myself. But ... postgres=# create table x(n14_4 NUMERIC(14,4), n24_12 NUMERIC(24,12), n NUMERIC); CREATE TABLE postgres=# insert into x select i+.4, i+.12, i+.5234543 from generate_series(1,100) i; INSERT 0 100 postgres=# select * from x order by n limit 5; n14_4 | n24_12 | n ++--- 1.4000 | 1.1200 | 1.5234543 2.4000 | 2.1200 | 2.5234543 3.4000 | 3.1200 | 3.5234543 4.4000 | 4.1200 | 4.5234543 5.4000 | 5.1200 | 5.5234543 (5 rows) postgres=# \d x Table "tf.x" Column | Type | Modifiers ++--- n14_4 | numeric(14,4) | n24_12 | numeric(24,12) | n | numeric| postgres=# begin; alter table x alter column n14_4 type NUMERIC(24,12); select * from x order by n limit 5; abort; BEGIN ALTER TABLE n14_4 | n24_12 | n ++--- 1.4000 | 1.1200 | 1.5234543 2.4000 | 2.1200 | 2.5234543 3.4000 | 3.1200 | 3.5234543 4.4000 | 4.1200 | 4.5234543 5.4000 | 5.1200 | 5.5234543 (5 rows) ROLLBACK postgres=# \d x Table "tf.x" Column | Type | Modifiers ++--- n14_4 | numeric(14,4) | n24_12 | numeric(24,12) | n | numeric| postgres=# select * from x order by n limit 5; n14_4 | n24_12 | n ++--- 1.4000 | 1.1200 | 1.5234543 2.4000 | 2.1200 | 2.5234543 3.4000 | 3.1200 | 3.5234543 4.4000 | 4.1200 | 4.5234543 5.4000 | 5.1200 | 5.5234543 (5 rows) postgres=# begin; update pg_attribute set atttypmod=1572880 where attrelid='x'::regclass::oid and attname='n14_4'; select * from x order by n limit 5; BEGIN UPDATE 1 n14_4 | n24_12 | n ++--- 1.4000 | 1.1200 | 1.5234543 2.4000 | 2.1200 | 2.5234543 3.4000 | 3.1200 | 3.5234543 4.4000 | 4.1200 | 4.5234543 5.4000 | 5.1200 | 5.5234543 (5 rows) postgres=# \d x Table "tf.x" Column | Type | Modifiers ++--- n14_4 | numeric(24,12) | n24_12 | numeric(24,12) | n | numeric| postgres=# abort; ROLLBACK As you can see, after the ALTER TABLE command the n14_4 column is shown with 12 places after the dot. If I just update atttypmod, it's still only 4 places. Why is that so? I checked ctid. The ALTER TABLE version does not actually update the tuple. On Tue, Jan 24, 2017 at 11:48 AM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote: > Torsten Förtsch wrote: > > we have a large table and want to change the type of one column from > NUMERIC(14,4) to NUMERIC(24,12). > > If the new type is just NUMERIC without any boundaries, the operation is > fast. If (24,12) is > > specified, it takes ages. > > > > I think it takes so long because the database wants to check that all > data in the table is compatible > > with the new type. But the old type has stricter boundaries both before > and after the dot. So, it is > > compatible. It has to be. > > > > Is there a way to change the type skipping the additional check? > > > > This is 9.6. > > If you don't mind doing something unsupported, you could just modify > the attribute metadata in the catalog: > > test=# CREATE TABLE a(x numeric(14,4)); > CREATE TABLE > test=# INSERT INTO a VALUES (1234567890.1234); > INSERT 0 1 > test=# UPDATE pg_attribute >SET atttypmod = atttypmod + (24 - 14) * 65536 + (12 - 4) >WHERE attrelid = 'a'::regclass AND attname = 'x'; > UPDATE 1 > test=# \d a > Table "public.a" > Column | Type | Modifiers > ++--- > x | numeric(24,12) | > > test=# SELECT * FROM a; > x > - > 1234567890.1234 > (1 row) > > Yours, > Laurenz Albe >
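For reference, the magic number 1572880 follows from how numeric encodes its typmod: ((precision << 16) | scale) plus the 4-byte varlena header offset. A small sketch to compute it instead of hard-coding it:
select ((24 << 16) | 12) + 4 as typmod_numeric_24_12,  -- 1572880
       ((14 << 16) |  4) + 4 as typmod_numeric_14_4;   -- 917512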
[GENERAL] change type from NUMERIC(14,4) to NUMERIC(24,12)
Hi, we have a large table and want to change the type of one column from NUMERIC(14,4) to NUMERIC(24,12). If the new type is just NUMERIC without any boundaries, the operation is fast. If (24,12) is specified, it takes ages. I think it takes so long because the database wants to check that all data in the table is compatible with the new type. But the old type has stricter boundaries both before and after the dot. So, it is compatible. It has to be. Is there a way to change the type skipping the additional check? This is 9.6. Thanks, Torsten
Re: [GENERAL] psql error (encoding related?)
This hex string decodes to something sensible: $ perl -le 'print pack "H*", shift' 246c69626469722f757466385f616e645f69736f383835395f31 $libdir/utf8_and_iso8859_1 Maybe it rings a bell. On Thu, Jan 5, 2017 at 7:57 PM, BRUSSER Michaelwrote: > I see this with PostgreSQL 9.4.7 and some 8.x versions running on Linux > Red Hat. > Older versions "supposedly" do not exhibit this behavior, but I didn't > check. > > $ psql > Password: > psql: FATAL: could not access file > "\x246c69626469722f757466385f616e645f69736f383835395f31": > No such file or directory > > The same error is written to the database log: > FATAL: could not access file > "\x246c69626469722f757466385f616e645f69736f383835395f31": > No such file or directory > > This is the environment where this problem was reported > $ locale > LANG=en_US > LC_CTYPE="en_US" > LC_NUMERIC="en_US" > LC_COLLATE="en_US" > . . . . . .. . . > > This is the "normal" environment where I don't see any problems launching > psql > $ locale > LANG=en_US.UTF-8 > LC_CTYPE="en_US.UTF-8" > LC_NUMERIC="en_US.UTF-8" > LC_COLLATE="en_US.UTF-8" > . . . . . .. . . > > The database was created using these parameters "initdb -E UTF8 > --locale=C -D ... " > > When I display the list of databases with \l I see that all of them have > these parameters: > EncodingUTF8 > CollateC > Ctype C > > I can work around this error by doing this "setenv PGCLIENTENCODING UTF-8" > > I would like to understand what happens... is this some kind of a bug or > just an inherent incompatibility between the client (psql) and the database > settings? > Is there a different and maybe a better remedy than using PGCLIENTENCODING? > > And if anyone from the Postgres team listening... in the old tradition of > whining I would add that the error message referring to a long hex string > is not helpful! > Thanks in advance. > Michael. > > This email and any attachments are intended solely for the use of the > individual or entity to whom it is addressed and may be confidential and/or > privileged. > > If you are not one of the named recipients or have received this email in > error, > > (i) you should not read, disclose, or copy it, > > (ii) please notify sender of your receipt by reply email and delete this > email and all attachments, > > (iii) Dassault Systemes does not accept or assume any liability or > responsibility for any use of or reliance on this email. > > For other languages, go to http://www.3ds.com/terms/email-disclaimer > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
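The same decoding works directly in SQL, for instance (a small sketch):
select convert_from(decode('246c69626469722f757466385f616e645f69736f383835395f31', 'hex'), 'UTF8');
-- $libdir/utf8_and_iso8859_1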
[GENERAL] streaming replication protocol in Perl?
Hi, is there a perl module that allows to speak the streaming replication protocol? Can DBD::Pg do that anyhow? I think I could just pipe from pg_recvlogical. But would be cool to have it directly in Perl. Thanks, Torsten
Re: [GENERAL] Checking data checksums...
I use this: create extension pageinspect; SELECT count(*) AS pages_read FROM ( SELECT c.oid::regclass::text AS rel, f.fork, ser.i AS blocknr, page_header(get_raw_page(c.oid::regclass::text, f.fork, ser.i)) FROM pg_class c CROSS JOIN (values ('main'::text), ('fsm'::text), ('vm'::text)) f(fork) CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz) CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i) WHERE sz.sz>0 ) t1; The idea is to read just everything. Since a select works only inside one database, this works only for that database. If you have multiple databases in a cluster, you need to run it in every one of them. Note this only works if your page size is the usual 8k. If you have compiled your postgres otherwise then change 8192 to whatever it is. Also, PG verifies the checksum when it reads a page from storage. So, this will miss pages that are present in shared_buffers. But assuming that they came there from storage in the first place, that should be good enough. Alternatives are something like pg_dumpall >/dev/null. This reads all data files but won't probably detect problems in indexes. Still it's a good idea to do once in a while to check toasted data for instance. On Fri, Dec 16, 2016 at 11:07 AM,wrote: > Hi, > > I enabled data checksums (initdb --data-checksums) on a new instance and > was wandering is there a command in the psql console, or from the linux > console, to force a checksum check on the entire cluster and get error > reports if it finds some corrupted pages. > > Regards, > Mladen Marinović > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
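If you prefer not to hard-code the page size, it can be read from the server instead of assuming 8192. A minimal sketch:
select current_setting('block_size')::int as block_size;  -- 8192 on a default build
-- in the query above, replace 8192 with current_setting('block_size')::int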
Re: [GENERAL] SQL query problem of a Quiz program
Did you try DISTINCT ON? postgres=# table x; id | qid | uid +-+ 1 | 25 | 1 2 | 25 | 1 3 | 25 | 1 4 | 26 | 1 5 | 26 | 1 6 | 27 | 1 7 | 27 | 1 8 | 25 | 2 9 | 25 | 2 10 | 25 | 2 11 | 26 | 2 12 | 26 | 2 13 | 27 | 2 14 | 27 | 2 15 | 25 | 16 | 25 | 17 | 25 | 18 | 26 | 19 | 26 | 20 | 27 | 21 | 27 | (21 rows) postgres=# select distinct on (qid, uid) * from x order by uid, qid, id desc; id | qid | uid +-+ 3 | 25 | 1 5 | 26 | 1 7 | 27 | 1 10 | 25 | 2 12 | 26 | 2 14 | 27 | 2 17 | 25 | 19 | 26 | 21 | 27 | (9 rows) Is that what you need? On Sat, Dec 17, 2016 at 4:25 PM, Arup Rakshitwrote: > Hi, > > Here is a sample data from table "quiz_results": > > id | question_id | user_id > +-+ > 2 | 25 | 5142670086 > 3 | 26 | > 4 | 26 | > 5 | 27 | > 6 | 25 | 5142670086 > 7 | 25 | 5142670086 > 8 | 25 | 5142670086 > 9 | 26 | > 10 | 40 | 5142670086 > 11 | 29 | 5142670086 > > > As you see above question id 25 appeared more than once. This is basically > a quiz result table where for users as they answered. question_id 25 always > the first questions. Any user can go though the quiz N number of time. So, > I want to find the last occurrence of the question_id 25 for any specific > user in the table, and select that and all answers the users gave after > this till the end of the quiz. Any idea how to solve it in a single > efficient query. My all try didn't work out. > > > > > Regards, > Arup Rakshit > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
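For the question as originally asked, everything from the last occurrence of question 25 for one user up to the end of that user's answers, a sketch along these lines should also work (table and column names as in the thread, the user id taken from the sample data):
select *
  from quiz_results
 where user_id = 5142670086
   and id >= (select max(id)
                from quiz_results
               where user_id = 5142670086
                 and question_id = 25)
 order by id;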
Re: [GENERAL] vacuum freeze in 96
On Wed, Dec 14, 2016 at 5:59 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Wed, Dec 14, 2016 at 5:00 AM, Torsten Förtsch <tfoertsch...@gmail.com> > wrote: > > one of the major enhancements in 96 is skipping completely frozen pages > in > > vacuum freeze. I assume that requires a special bit on the page. > > The freeze map uses an additional bit in the vm, and pg_upgrade would > take care of the conversion. > Thanks. That makes sense.
[GENERAL] vacuum freeze in 96
Hi, one of the major enhancements in 96 is skipping completely frozen pages in vacuum freeze. I assume that requires a special bit on the page. If I upgrade from 93 using pg_upgrade, that is re-using the data files, can it still do that? Or do I have to recreate the table? Thanks, Torsten
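Related to vacuum freeze in general, a quick way to see how far each table is from a forced anti-wraparound freeze (a sketch, works the same on 9.3 and 9.6):
select relname, age(relfrozenxid) as xid_age
  from pg_class
 where relkind = 'r'
 order by 2 desc
 limit 10;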
Re: [GENERAL] is this a known bug in 9.6?
Thanks Tom On Tue, Dec 13, 2016 at 7:22 PM, Tom Lanewrote: > I wrote: > > Yup, sure looks like a bug to me, especially since it seems to work as > > expected before 9.5. No idea offhand what broke it. > > The answer is, I broke it, through some ill-advised neatnik-ism :-(, > ie clearing a field I thought would be unused but it wasn't. > > Fix pushed. Thanks for the report! > > regards, tom lane >
Re: [GENERAL] is this a known bug in 9.6?
Thanks for confirming. Here are a few more examples that also work: with i(x) as (values (1::int)) select x from (select x from i union all select 3::int) b order by x desc limit 1; with i(x) as (values (1::int)) select max(x) from (select x from i union select 3::int) b; It also works with EXCEPT or INTERSECT, both with or without ALL. The UNION ALL version fails with MIN and MAX but it works with all other aggregates that I have tested. On Tue, Dec 13, 2016 at 4:31 PM, Tom Lanewrote: > =?UTF-8?Q?Torsten_F=C3=B6rtsch?= writes: > > postgres=# with i(x) as (values (1::int)), j(y) as (values (2::int)) > select > > x from (select x from i union all select y from j) b; > > x > > --- > > 1 > > 2 > > (2 rows) > > > postgres=# with i(x) as (values (1::int)), j(y) as (values (2::int)) > select > > max(x) from (select x from i union all select y from j) b; > > ERROR: could not find plan for CTE "i" > > Yup, sure looks like a bug to me, especially since it seems to work as > expected before 9.5. No idea offhand what broke it. > > regards, tom lane >
[GENERAL] is this a known bug in 9.6?
Hi, this is a stock PGDG 9.6: postgres=# with i(x) as (values (1::int)), j(y) as (values (2::int)) select x from (select x from i union all select y from j) b; x --- 1 2 (2 rows) postgres=# with i(x) as (values (1::int)), j(y) as (values (2::int)) select max(x) from (select x from i union all select y from j) b; ERROR: could not find plan for CTE "i" The same on 9.5: postgres=# with i(x) as (values (1::int)), j(y) as (values (2::int)) select x from (select x from i union all select y from j) b; x --- 1 2 (2 rows) postgres=# with i(x) as (values (1::int)), j(y) as (values (2::int)) select max(x) from (select x from i union all select y from j) b; max - 2 (1 row) Is this a bug or is my assumption that this should work wrong? Both the aggregate and the UNION are required to trigger the bug: postgres=# with i(x) as (values (1::int)) select x from (select x from i union all select 3::int) b; x --- 1 3 (2 rows) postgres=# with i(x) as (values (1::int)) select max(x) from (select x from i) b; max - 1 (1 row) postgres=# with i(x) as (values (1::int)) select max(x) from (select x from i union all select 3::int) b; ERROR: could not find plan for CTE "i" Thanks, Torsten
Re: [GENERAL] WAL
On Mon, Dec 12, 2016 at 12:37 PM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote: > Torsten Förtsch wrote: > > if I do something like this: > > > > BEGIN; > > UPDATE tbl SET data='something' WHERE pkey='selector'; > > UPDATE tbl SET data=NULL WHERE pkey='selector'; > > COMMIT; > > > > Given 'selector' actually exists, I get a separate WAL entry for each of > the updates. My question is, > > does the first update actually hit the data file? > > It should, yes. > > > If I am only interested in the first update hitting the WAL, does it > make sense to do something like > > the above in a transaction? Would that help to keep the table small in a > high concurrency situation? > > The table itself has a small fillfactor. So, in most cases there should > be enough space to do a HOT > > update. For that HOT update, is that second update setting data to NULL > beneficial or rather adverse? > > How could the second update *not* be WAL logged? > > Maybe you could explain what you are trying to achieve. > I am working on a logical decoder plugin, something similar to wal2json. I don't need 'something' to be permanently in the database. I just need it in the WAL stream. So, I was thinking that maybe keeping data=NULL permanently in the table might help to keep the table small. This table has only about 500 rows in my current case. But it is updated very often by different processes. By now I have tried it out for short strings of 'something'. The size of tbl remains fixed and the same with or without the data=NULL update. Also, I ran both version each 3 times for 200 transactions. The run time difference is marginal. In the final version 'something' will be a string up to 1kb, I think. Torsten
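A quick way to confirm that the updates stay HOT and that the table is not growing (a sketch; 'tbl' stands for the table from the example):
select n_tup_upd, n_tup_hot_upd, pg_size_pretty(pg_relation_size('tbl')) as table_size
  from pg_stat_user_tables
 where relname = 'tbl';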
[GENERAL] WAL
Hi, if I do something like this: BEGIN; UPDATE tbl SET data='something' WHERE pkey='selector'; UPDATE tbl SET data=NULL WHERE pkey='selector'; COMMIT; Given 'selector' actually exists, I get a separate WAL entry for each of the updates. My question is, does the first update actually hit the data file? If I am only interested in the first update hitting the WAL, does it make sense to do something like the above in a transaction? Would that help to keep the table small in a high concurrency situation? The table itself has a small fillfactor. So, in most cases there should be enough space to do a HOT update. For that HOT update, is that second update setting data to NULL beneficial or rather adverse? Thanks, Torsten
Re: [GENERAL] logical decoding output plugin
On Sun, Dec 11, 2016 at 12:10 AM, Andres Freund <and...@anarazel.de> wrote: > On 2016-12-10 22:34:02 +0100, Torsten Förtsch wrote: > > I am working on a logical decoding output plugin. I am only interested in > > the changes in one particular table. > > > > One way to check for the table would be to compare class_form->relname > > and get_namespace_name(class_form->relnamespace). But I'd much prefer to > > just compare the OID of the table. > > > > Is there a cheap way to do something like "SELECT > > 'schema.table'::regclass::oid" that can be called in the startup > callback? > > You shouldn't do it in the startup callback (no catalog access there), > but you can do it the first time through the change callback. > > For lookups the most complete lookup is to use RangeVarGetRelid() to get > the oid. There's other variants, but that's the easiest approach. > > Do you have to care about the table being renamed? > Thanks, Andres. That would be nice to have but not really necessary, at least not at the beginning. Is it possible to catch CREATE TABLE and ALTER TABLE RENAME in 9.5? Thanks, Torsten
[GENERAL] logical decoding output plugin
Hi, I am working on a logical decoding output plugin. I am only interested in the changes in one particular table. One way to check for the table would be to compare class_form->relname and get_namespace_name(class_form->relnamespace). But I'd much prefer to just compare the OID of the table. Is there a cheap way to do something like "SELECT 'schema.table'::regclass::oid" that can be called in the startup callback? Thanks, Torsten
Re: [GENERAL] high transaction rate
Think I found it. classid 1262 is pg_database and I seem to remember that NOTIFY takes that lock. I dropped pg_notify from my function and got immediately >3500 tx/sec. On Wed, Dec 7, 2016 at 11:31 AM, Torsten Förtsch <tfoertsch...@gmail.com> wrote: > On Wed, Dec 7, 2016 at 11:21 AM, Torsten Förtsch <tfoertsch...@gmail.com> > wrote: > >> Hi, >> >> I need to tune my database for a high update rate of a single small >> table. A little simplified it looks like this: >> >> CREATE TABLE temp_agg( >> topic TEXT PRIMARY KEY, >> tstmp TIMESTAMP, >> cnt BIGINT, >> sum NUMERIC >> ) >> >> The table has 500 rows. >> >> A transaction looks simplified like this: >> >> 1) select * from temp_agg where topic=$1 for update >> >> 2) if date_trunc('second', tstmp)=date_trunc('second', $3) then: >> 2a) update temp_agg set cnt=cnt+1, sum=sum+$2 where topic=$1 >> 2b) set local synchronous_commit to off >> >> 3) if a new second has started: >> 3a) insert into other_table select * from temp_agg where topic=$1 >> 3b) update temp_agg set tstmp=date_trunc('second', $3) cnt=1, sum=$2 >> where topic=$1 >> 3c) emit a notification (pg_notify(...)) with the new data >> >> sorry, hit the wrong button. > > As a first test my program starts 10 writers each of which serves 50 > topics. The timestamps are generated in a way that on average 10 timestamps > per second per topic are produced. That means on average the 2) branch is > hit 10 times more often. > > These 10 writers then flood the database. At first I see a really good > transaction rate of more than 6500 tx/sec. But after a while it suddenly > drops to less than 1/10 of that. I also monitored the number of processes > waiting for locks. As can be seen in this picture, this is a locking > problem. > > > Here is a snapshot of not granted locks: > > locktype | database | relation | page | tuple | virtualxid | > transactionid | classid | objid | objsubid | virtualtransaction | pid | >mode | granted | fastpath > --+--+--+++- > ---+---+-+---+--+--- > -+---+-+-+-- > object |0 || | | | > |1262 | 0 |0 | 15/6381185 | 11468 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 13/6375058 | 11465 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 9/6373397 | 11463 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 11/6380027 | 11464 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 3/447139 | 11133 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 7/6375244 | 11461 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 16/6381714 | 11467 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 10/6380578 | 11460 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 8/6374490 | 11459 | > AccessExclusiveLock | f | f > object |0 || | | | > |1262 | 0 |0 | 12/6377255 | 11462 | > AccessExclusiveLock | f | f > > What are these object locks here? How can I prevent contention? > > This is pgdg postgres 9.5 > > Thanks, > Torsten >
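For what it's worth, the classid of such an "object" lock can be resolved directly to the catalog it refers to, which is how pg_database (oid 1262) shows up here. A sketch:
select classid::regclass as catalog, objid, mode, granted
  from pg_locks
 where locktype = 'object';
-- classid 1262 resolves to pg_database, consistent with NOTIFY taking a lock on the current database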
Re: [GENERAL] high transaction rate
On Wed, Dec 7, 2016 at 11:21 AM, Torsten Förtsch <tfoertsch...@gmail.com> wrote: > Hi, > > I need to tune my database for a high update rate of a single small table. > A little simplified it looks like this: > > CREATE TABLE temp_agg( > topic TEXT PRIMARY KEY, > tstmp TIMESTAMP, > cnt BIGINT, > sum NUMERIC > ) > > The table has 500 rows. > > A transaction looks simplified like this: > > 1) select * from temp_agg where topic=$1 for update > > 2) if date_trunc('second', tstmp)=date_trunc('second', $3) then: > 2a) update temp_agg set cnt=cnt+1, sum=sum+$2 where topic=$1 > 2b) set local synchronous_commit to off > > 3) if a new second has started: > 3a) insert into other_table select * from temp_agg where topic=$1 > 3b) update temp_agg set tstmp=date_trunc('second', $3) cnt=1, sum=$2 where > topic=$1 > 3c) emit a notification (pg_notify(...)) with the new data > > sorry, hit the wrong button. As a first test my program starts 10 writers each of which serves 50 topics. The timestamps are generated in a way that on average 10 timestamps per second per topic are produced. That means on average the 2) branch is hit 10 times more often. These 10 writers then flood the database. At first I see a really good transaction rate of more than 6500 tx/sec. But after a while it suddenly drops to less than 1/10 of that. I also monitored the number of processes waiting for locks. As can be seen in this picture, this is a locking problem. Here is a snapshot of not granted locks: locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath --+--+--++++---+-+---+--++---+-+-+-- object |0 || | | | |1262 | 0 |0 | 15/6381185 | 11468 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 13/6375058 | 11465 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 9/6373397 | 11463 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 11/6380027 | 11464 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 3/447139 | 11133 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 7/6375244 | 11461 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 16/6381714 | 11467 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 10/6380578 | 11460 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 8/6374490 | 11459 | AccessExclusiveLock | f | f object |0 || | | | |1262 | 0 |0 | 12/6377255 | 11462 | AccessExclusiveLock | f | f What are these object locks here? How can I prevent contention? This is pgdg postgres 9.5 Thanks, Torsten
[GENERAL] high transaction rate
Hi, I need to tune my database for a high update rate of a single small table. A little simplified it looks like this: CREATE TABLE temp_agg( topic TEXT PRIMARY KEY, tstmp TIMESTAMP, cnt BIGINT, sum NUMERIC ) The table has 500 rows. A transaction looks simplified like this: 1) select * from temp_agg where topic=$1 for update 2) if date_trunc('second', tstmp)=date_trunc('second', $3) then: 2a) update temp_agg set cnt=cnt+1, sum=sum+$2 where topic=$1 2b) set local synchronous_commit to off 3) if a new second has started: 3a) insert into other_table select * from temp_agg where topic=$1 3b) update temp_agg set tstmp=date_trunc('second', $3) cnt=1, sum=$2 where topic=$1 3c) emit a notification (pg_notify(...)) with the new data As a first test my program starts 10 writers each of which serves 50 topics. The timestamps are generated in a way that on average 10 timestamps per second per topic a
[GENERAL] hot_standby_feedback
Hi, I am in the process of reviewing our configs for a number of 9.3 databases and found a replica with hot_standby_feedback=on. I remember when we set it long ago we were fighting cancelled queries. I also remember that it never really worked for us. In the end we set up 2 replicas, one suitable for short queries where we prefer low replication lag, and another one where we allow for long running queries but sacrifice timeliness (max_standby_*_delay=-1). I have a hunch why hot_standby_feedback=on didn't work. But I never verified it. So, here it is. The key is this sentence: "Feedback messages will not be sent more frequently than once per wal_receiver_status_interval." That interval is 10 sec. So, assuming a transaction on the replica uses a row right after the message has been sent. Then there is a 10 sec window in which the master cannot know that the row is needed on the replica and can vacuum it. If then the transaction on the replica takes longer than max_standby_*_delay, the only option is to cancel it. Is that explanation correct? What is the correct way to use hot_standby_feedback to prevent cancellations reliably? (and accepting the bloat) Thanks, Torsten
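The settings involved in that reasoning can be checked on the standby like this (a sketch; lowering wal_receiver_status_interval should narrow the window described above, though it cannot remove it entirely):
select name, setting, unit
  from pg_settings
 where name in ('hot_standby_feedback',
                'wal_receiver_status_interval',
                'max_standby_streaming_delay',
                'max_standby_archive_delay');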
Re: [GENERAL] min/max_wal_size
On Tue, Nov 22, 2016 at 10:34 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote: > On 11/22/2016 12:51 PM, Torsten Förtsch wrote: > >> >> Now, I have a database with very low write activity. Archive_command is >> called about once per hour to archive one segment. When the database was >> moved to PG 9.5, it was initially configured with insanely high settings >> for max_wal_size, min_wal_size and wal_keep_segments. I reset >> min/max_wal_size to the default settings of 80MB and 1GB and reduced >> wal_keep_segments to 150. >> >> I am seeing in pg_xlog the WAL segments from >> >> -rw--- 1 postgres postgres 16777216 Nov 17 04:01 >> pg_xlog/0001000400F9 >> ... >> -rw--- 1 postgres postgres 16777216 Nov 22 20:00 >> pg_xlog/00010005008E >> -rw--- 1 postgres postgres 16777216 Nov 22 20:19 >> pg_xlog/00010005008F >> -rw--- 1 postgres postgres 16777216 Nov 15 07:50 >> pg_xlog/000100050090 >> ... >> -rw--- 1 postgres postgres 16777216 Nov 15 07:52 >> pg_xlog/000100060017 >> >> As you can see, the files from 1/4/F9 to 1/5/8E are old. That are 150 >> files which matches exactly wal_keep_segments. If I understand >> correctly, the file 1/5/8F is currently written. Further, the files from >> 1/5/90 to 1/6/17 seem to be old WAL files that have been renamed to be >> reused in the future. Their count is 136. >> >> Why does a database that generates a little more than 1 WAL file per >> hour and has a checkpoint_timeout of 30 minutes with a >> completion_target=0.7 need so many of them? The default value for >> min_wal_size is 80MB which amounts to 5 segments. That should be totally >> enough for this database. >> >> Is this because of the previously insanely high setting (min=1GB, >> max=9GB)? Should I expect this value to drop in a week's time? Or is >> there anything that I am not aware of? >> > > Are you talking about the recycled files? Yes, I was talking about recycled files and I think the suspicion that the number of recycled files will drop over time to min_wal_size is correct. Over night the number of recycled files dropped to 123 and according to the log, many checkpoints have removed a file and none was added.
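To watch the number of segments actually present in pg_xlog drop over time, something like this can be run as superuser (a sketch):
select count(*) as wal_segments
  from pg_ls_dir('pg_xlog') as f
 where f ~ '^[0-9A-F]{24}$';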
[GENERAL] min/max_wal_size
Hi, I am a bit confused about min_wal_size and max_wal_size. Previously, there was this formula to estimate the max number of WAL segment files in pg_xlog/ (https://www.postgresql.org/docs/9.4/static/wal-configuration.html ): (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 I don't exactly know what the operation "or" means. Before writing this email I always thought of wal_keep_segments as a parameter that configures how many segments to keep that would otherwise be deleted and checkpoint_segments as the number of WAL files the database is allowed to work with within a checkpoint_timeout interval. The formula above makes more or less sense. The database is allowed to write one set of WAL files during the checkpoint interval. While performing the checkpoint it needs the previous set of WAL files. I don't know where that checkpoint_completion_target comes in. But I trust the wisdom of the author of the documentation. Now, I have a database with very low write activity. Archive_command is called about once per hour to archive one segment. When the database was moved to PG 9.5, it was initially configured with insanely high settings for max_wal_size, min_wal_size and wal_keep_segments. I reset min/max_wal_size to the default settings of 80MB and 1GB and reduced wal_keep_segments to 150. I am seeing in pg_xlog the WAL segments from -rw--- 1 postgres postgres 16777216 Nov 17 04:01 pg_xlog/0001000400F9 ... -rw--- 1 postgres postgres 16777216 Nov 22 20:00 pg_xlog/00010005008E -rw--- 1 postgres postgres 16777216 Nov 22 20:19 pg_xlog/00010005008F -rw--- 1 postgres postgres 16777216 Nov 15 07:50 pg_xlog/000100050090 ... -rw--- 1 postgres postgres 16777216 Nov 15 07:52 pg_xlog/000100060017 As you can see, the files from 1/4/F9 to 1/5/8E are old. That are 150 files which matches exactly wal_keep_segments. If I understand correctly, the file 1/5/8F is currently written. Further, the files from 1/5/90 to 1/6/17 seem to be old WAL files that have been renamed to be reused in the future. Their count is 136. Why does a database that generates a little more than 1 WAL file per hour and has a checkpoint_timeout of 30 minutes with a completion_target=0.7 need so many of them? The default value for min_wal_size is 80MB which amounts to 5 segments. That should be totally enough for this database. Is this because of the previously insanely high setting (min=1GB, max=9GB)? Should I expect this value to drop in a week's time? Or is there anything that I am not aware of? Thanks, Torsten
Re: [GENERAL] Pgbouncer
On 30/11/15 22:50, Steve Crawford wrote: > Do you have any clients connected that are idle in transaction? No, the backend database is a streaming replica. Also, as I said, the backend process is in "idle" state not "idle in transaction". backend_start| 2015-11-30 18:58:30.639962+00 xact_start | query_start | 2015-11-30 20:56:36.479003+00 state_change | 2015-11-30 20:56:36.480199+00 waiting | f state| idle
[GENERAL] Pgbouncer
Hi, I am not sure if this is the right place to ask this question. If not, please point me to it. I am trying out the new pgbouncer (latest git). "SHOW SERVERS" is telling me 2 connections in "active" state. Both show connect_time=2015-11-30 18:58:30. Request_time is 19:01:35 and 20:56:36. Both use pool_mode=transaction. Server_lifetime=600. And now is 21:38:55. Shouldn't those connections have been closed long ago? I also checked on the backend. The backend processes are there, are idle and query_start and state_change times match the request_times above. Also, since pool_mode=transaction, why are these connections "active"? Thanks, Torsten
Re: [GENERAL] postgres function
On 15/10/15 14:32, Ramesh T wrote: > select position('-' in '123-987-123') > position > --- > 4 > But I want second occurrence, > position > - > 8 > > plz any help..? For instance: # select char_length(substring('123-987-123' from '^[^-]*-[^-]*-')); char_length - 8 Best, Torsten
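Another way to get at the n-th occurrence, using split_part (a sketch):
select char_length(split_part('123-987-123', '-', 1))
     + char_length(split_part('123-987-123', '-', 2))
     + 2 as second_dash_position;   -- 8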
Re: [GENERAL] xmin horizon?
On 29/07/15 21:13, CS DBA wrote: The documentation for pg_stat_activity lists this column: backend_xmin xid The current backend's xmin horizon. Can someone point me to a better understanding on xmin horizon? https://momjian.us/main/writings/pgsql/mvcc.pdf you can find this talk also on youtube. It's worth watching. Torsten
[GENERAL] Question about timelines
Hi, we have a complex structure of streaming replication (PG 9.3) like: master -- replica1 | +- replica2 -- replica21 | +-- replica22 -- replica221 Now I want to retire master and make replica2 the new master: +-- replica1 | replica2 -- replica21 | +-- replica22 -- replica221 replica2 is currently a synchronous replica. If I promote replica2 a new timeline is created. Hence, I have to instruct all other replicas to follow that new timeline (recovery_target_timeline = 'latest' in recovery.conf). On the other hand, since replica2 is synchronous it should have all transactions when master is shut down. So, I am thinking do I really need a new timeline? Can't I just remove recovery.conf on replica2 and restart it as a master. The only thing I then have to do is to point replica1 to it. Is that a way to go? If promoting is better than why? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
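Whichever way you go, it is worth confirming on replica2, after the old master has been stopped, that everything received has also been replayed before promoting or restarting it as a master. A sketch using the 9.3 function names:
select pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       pg_last_xlog_receive_location() = pg_last_xlog_replay_location() as fully_replayed;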
Re: [GENERAL] pg_xlog Concern
On 18/05/15 13:44, Sachin Srivastava wrote: But currently my pg_xlog size is 60 GB and there are 3740 WAL files in this folder, and last week this was 400 GB (pg_xlog folder) and the WAL files were approx. 3. Due to archiving the pg_xlog folder size is decreasing now, but it’s taking one week to come back to a normal size. Any chance you have unfinished transactions running for a week? pg_stat_activity should be able to tell you that. Torsten
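The check suggested above boils down to something like this (a sketch):
select pid, usename, xact_start, state
  from pg_stat_activity
 where xact_start < now() - interval '1 week'
 order by xact_start;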
[GENERAL] Locking question
Hi, given a query like this: select * from account a cross join lateral ( select rate from exchange where target='USD' and source=a.currency order by date desc limit 1) e where a.id=19 for update; If I understand the documentation correctly, both rows, the one from exchange and the one from account are locked, right? In fact, I have tried it. This query blocks (currency is 'AUD' for account #19): select * from exchange where target='USD' and source='AUD' order by date desc limit 1 for update; However, if I create a SQL function like this: CREATE OR REPLACE FUNCTION exchangetousd_rate( cur CHAR(3), tm TIMESTAMP DEFAULT now() ) RETURNS TABLE(rate NUMERIC) AS $def$ SELECT rate FROM exchange WHERE source = $1 AND target = 'USD' AND date = $2::TIMESTAMP ORDER BY date DESC LIMIT 1 $def$ LANGUAGE sql STABLE; and use it here: select * from account a cross join exchangeToUSD_rate(a.currency) e where a.id=19 for update; Then the 2nd query above does not block. So, the row from the exchange table is not locked. Is that documented somewhere? Can I rely on it? The plan for the last query tells me the function call is inlined. So, in principle it's not different from the first one. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 9.3: bug related to json
On 25/02/15 07:22, David G Johnston wrote: I'm doubting you intended to join a bunch of commas using the field value as the delimiter... methinks you got the argument order reversed for string_agg. OMG, I am so stupid. Thanks.
Re: [GENERAL] 9.3: bug related to json
On 25/02/15 07:34, David G Johnston wrote: Torsten Förtsch wrote: Is there anything similar for JSON scalars? IDK, but have you tried ::text? Yes. Here is the difference: select * from (values (('{"a":"b"}'::json -> 'a')::text), ('{"a":"b"}'::json ->> 'a')) t; column1 --------- "b" b Torsten
[GENERAL] 9.3: bug related to json
Hi, I think I found a json related bug in 9.3. Given this query: select * from json_array_elements('[{"s":[{"x":1},{"x":2},{"x":5}]}, {"s":[{"x":3},{"x":4},{"x":6}]}]') t(el) cross join lateral ( select syms.sym ->> 'x' as x from json_array_elements(t.el -> 's') syms(sym) ) s; It gives me this table: el | x ---+--- {"s":[{"x":1},{"x":2},{"x":5}]} | 1 {"s":[{"x":1},{"x":2},{"x":5}]} | 2 {"s":[{"x":1},{"x":2},{"x":5}]} | 5 {"s":[{"x":3},{"x":4},{"x":6}]} | 3 {"s":[{"x":3},{"x":4},{"x":6}]} | 4 {"s":[{"x":3},{"x":4},{"x":6}]} | 6 (6 rows) So far so good. Now I want to aggregate all the x's: select * from json_array_elements('[{"s":[{"x":1},{"x":2},{"x":5}]}, {"s":[{"x":3},{"x":4},{"x":6}]}]') t(el) cross join lateral ( select array_agg(syms.sym ->> 'x') as xx from json_array_elements(t.el -> 's') syms(sym) ) s; el | xx ---+--------- {"s":[{"x":1},{"x":2},{"x":5}]} | {1,2,5} {"s":[{"x":3},{"x":4},{"x":6}]} | {3,4,6} (2 rows) Still works. But if I want to string_agg them, I get this: select * from json_array_elements('[{"s":[{"x":1},{"x":2},{"x":5}]}, {"s":[{"x":3},{"x":4},{"x":6}]}]') t(el) cross join lateral ( select string_agg(', ', syms.sym ->> 'x') as xx from json_array_elements(t.el -> 's') syms(sym) ) s; el | xx ---+---------- {"s":[{"x":1},{"x":2},{"x":5}]} | , 2, 5, {"s":[{"x":3},{"x":4},{"x":6}]} | , 4, 6, (2 rows) Note, the first element of the resulting string is always missing. If the xx is first aggregated as array and then converted to a string, it works as expected: select * from json_array_elements('[{"s":[{"x":1},{"x":2},{"x":5}]}, {"s":[{"x":3},{"x":4},{"x":6}]}]') t(el) cross join lateral ( select array_to_string(array_agg(syms.sym ->> 'x'), ', ') as xx from json_array_elements(t.el -> 's') syms(sym) ) s; el | xx ---+--------- {"s":[{"x":1},{"x":2},{"x":5}]} | 1, 2, 5 {"s":[{"x":3},{"x":4},{"x":6}]} | 3, 4, 6 (2 rows) One more question. Originally, my JSON data looked like this: select * from json_array_elements('[{"s":[1,2,5]}, {"s":[3,4,6]}]') t(el) cross join lateral ( select syms.sym as x -- problem from json_array_elements(t.el -> 's') syms(sym) ) s; el | x ---+--- {"s":[1,2,5]} | 1 {"s":[1,2,5]} | 2 {"s":[1,2,5]} | 5 {"s":[3,4,6]} | 3 {"s":[3,4,6]} | 4 {"s":[3,4,6]} | 6 (6 rows) The syms.sym field in the x column is a JSON scalar. How do I convert that to simple TEXT? For JSON objects there is the ->> operator. Is there anything similar for JSON scalars? Torsten
[GENERAL] PG user group in the Kuala Lumpur area?
Hi, I was looking for a PG user group around KL. I know there is one in Singapore. As it happens, Chris Travers, the PG contact for Malaysia, is a friend of mine. So, I asked him. He wasn't aware of one either. However, he very much appreciated the idea of founding one. I know there are lots of PG users in the area. But is there enough demand for a user group? If you are interested, please contact me. My idea behind this whole thing is to eventually have a regular PG conference in South East Asia. I have been to PGconf.eu several times and I know from experience that it is a great opportunity to learn new stuff, meet people and also have much fun. I think esp. Malaysia is a good place for such an event. There are many people out there that could never come to PGconf.eu or similar in the US and in many other places because of their passport. Getting a visa to Malaysia is possible for almost everyone. I don't know about North Korea, but there are many Iranians around here. About myself, I am German, currently traveling back and forth between Germany and Malaysia. Torsten
Re: [GENERAL] Planet Postgres
On 31/10/14 09:38, Magnus Hagander wrote: I think in this particular case, much of the blame can be placed on everybody being at pgconf.eu last week, and struggling under backlogs. But as Stephen says, the backlog will eventually get worked through, and the moderation happen. I see. No problem, I'll just wait. BTW, the reason or rather the last trigger to start the blog was pgconf.eu. Thanks for organizing it. Torsten
[GENERAL] Planet Postgres
Hi, I don't know if this is the right mailing list for my question but I have already emailed pla...@postgresql.org without any response. What is the correct way to get a blog listed on planet postgres? The documented way does not work. I registered my blog on Monday. It is still in the "Not approved yet" state. Yesterday I asked planet@... no response. So, what did I do wrong? Thanks, Torsten
Re: [GENERAL] some queries on standby preventing replication updates
On 28/10/14 19:37, Joe Van Dyk wrote: On Mon, Oct 27, 2014 at 6:22 AM, Emanuel Calvo emanuel.ca...@2ndquadrant.com mailto:emanuel.ca...@2ndquadrant.com wrote: El 23/10/14 a las 17:40, Joe Van Dyk escibió: Hi, I have a master and a slave database. I've got hot_standby_feedback turned on, max_standby_streaming_delay=-1. I've configured the master and slave to keep a few days of WALs around. I've noticed that when some large queries are run on the standby machine (ones that take more than a minute or so), replication updates are paused. Is there a way to fix this? You may need to set a value on max_standby_streaming_delay, which controls the time before cancelling the standby queries when a conflict occurs on a wal-records-about-to-be-applied. Source: http://www.postgresql.org/docs/9.3/static/runtime-config-replication.html I'm using -1 for that option, would using something different be better? It depends on what you want to achieve. If you want to sacrifice your long-running query to keep replication going, set the value to 0. If you (like me) are using the slave to run analytical queries that can take many hours or even days, I'd rather live with the current behaviour. When the long-running query is over the wal receiver automatically reconnects to the master. The only thing you should make sure is to keep enough wal segments. With 9.4 even that gets easier. There you can assign a replication slot to the replica and the master then knows which segments are still needed when the slave reconnects. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_stat_replication in 9.3
Hi, I noticed a strange behaviour regarding pg_stat_replication in 9.3. If called from psql using the \watch command, I see all my replicas. From time to time one of them drops out and reconnects in a short period of time, typically ~30 sec. If I use the same select in plpgsql like: FOR r in SELECT application_name, client_addr, flush_location, clock_timestamp() AS lmd FROM pg_stat_replication ORDER BY application_name, client_addr LOOP RAISE NOTICE 'aname=%, ca=%, lmd=%, loc=%, cur=%, lag=%', r.application_name, r.client_addr, r.lmd, r.flush_location, pg_current_xlog_location(), pg_size_pretty( pg_xlog_location_diff( pg_current_xlog_location(), r.flush_location ) ); END LOOP; I see one of the replicas dropping out but never coming back again while in a parallel session using psql and \watch it indeed does come back. Is that intended? Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_stat_replication in 9.3
On 14/09/14 16:24, Andy Colson wrote: I wonder if it's a transaction thing? Maybe \watch is using a transaction for each (or isn't using transactions at all), whereas the plpgsql is one long transaction? Also if one of your replicas is far away, it doesn't really surprise me that it might lose connection every once in a while. On the other hand, if the box is on the same subnet, right next to the master, and it was losing connection, that would be a bad thing. So, how far away is the replica? And does 'ps ax|grep postgr' show 'idle' or 'idle in transaction' on the \watch and the plpgsql? The replicas are far away, intercontinental far. I am not complaining that the replica loses the connection. What makes me wonder is that within a transaction, pg_stat_replication can forget rows but cannot acquire new ones. I'd think it should either report the state at the beginning of the transaction like now() or the current state like clock_timestamp(). But currently it's reporting half the current state. Torsten
Re: [GENERAL] pg_stat_replication in 9.3
On 14/09/14 18:55, Tom Lane wrote: Are you watching the state in a loop inside a single plpgsql function? If so, I wonder whether the problem is that the plpgsql function's snapshot isn't changing. From memory, marking the function VOLATILE would help if that's the issue. The function is VOLATILE. I attached 2 versions of it. fn-old.sql does not work because once a slave has disconnected it drops out and does not come back. fn.sql uses dblink to work around the problem. But it consumes 2 db connections. The intent of the function is to be called between operations that may cause slaves to lag behind. If the lag is below a certain limit, it simply returns. Otherwise, it waits until the lag drops below a second limit. If it were a VOLATILE problem, the functions would not be able to see when a slave drops out nor changes in the data. But it does see these changes. Only when a slave comes back online, it is not seen in the current transaction. Torsten CREATE OR REPLACE FUNCTION wait_for_streaming_lag(low_water_mark BIGINT DEFAULT 100, high_water_mark BIGINT DEFAULT 2000, tmout INTERVAL DEFAULT '4h') RETURNS BIGINT AS $def$ DECLARE r RECORD; water_mark BIGINT; BEGIN SET LOCAL client_min_messages TO ERROR; CREATE TEMP TABLE IF NOT EXISTS lag ( gen INT, application_name TEXT, client_addr INET, flush_location TEXT, lmd TIMESTAMP ); SET LOCAL client_min_messages TO NOTICE; water_mark := $2; -- use high_water_mark for the first loop LOOP WITH g AS (SELECT max(gen) AS gen FROM lag), r AS (SELECT 1 AS ord, application_name, client_addr, flush_location, clock_timestamp() AS lmd FROM pg_stat_replication UNION ALL SELECT 2 AS ord, application_name, client_addr, flush_location, lmd FROM lag) INSERT INTO lag SELECT coalesce(g.gen+1, 1), rx.* FROM (SELECT DISTINCT ON (application_name, client_addr) application_name, client_addr, flush_location, lmd FROM r ORDER BY application_name, client_addr, ord ASC, pg_xlog_location_diff(flush_location, '0/0') ASC) rx CROSS JOIN g; DELETE FROM lag WHERE gen(SELECT max(gen) FROM lag); DELETE FROM lag WHERE lmdclock_timestamp() - '5min'::INTERVAL; SELECT INTO r coalesce(max(pg_xlog_location_diff(pg_current_xlog_location(), flush_location)), 0) AS lag, clock_timestamp()-now() AS tm FROM lag; EXIT WHEN r.lag = water_mark; IF r.tm$3 THEN RAISE EXCEPTION USING MESSAGE='Timeout while waiting for streaming lag to drop below ' || $1, ERRCODE='TF001'; END IF; water_mark := $1; PERFORM pg_sleep(1); END LOOP; RETURN r.lag; END; $def$ LANGUAGE plpgsql VOLATILE SECURITY invoker; BEGIN; CREATE OR REPLACE FUNCTION wait_for_streaming_lag(low_water_mark BIGINT DEFAULT 100, high_water_mark BIGINT DEFAULT 2000, tmout INTERVAL DEFAULT '4h') RETURNS BIGINT AS $def$ DECLARE r RECORD; water_mark BIGINT; BEGIN -- we need dblink here because pg_stat_replication at least in 9.3, -- although it does report replicas dropping out, it does not report -- replicas reconnecting if called in a transaction. 
PERFORM dblink_connect('wait_for_streaming_lag', 'dbname=' || current_database() || ' application_name=wait_for_streaming_lag') WHERE NOT EXISTS (SELECT 1 FROM unnest(dblink_get_connections()) c(c) WHERE c='wait_for_streaming_lag'); SET LOCAL client_min_messages TO ERROR; CREATE TEMP TABLE IF NOT EXISTS lag ( gen INT, application_name TEXT, client_addr INET, flush_location TEXT, lmd TIMESTAMP ); SET LOCAL client_min_messages TO NOTICE; water_mark := $2; -- use high_water_mark for the first loop LOOP WITH g AS (SELECT max(gen) AS gen FROM lag), r AS (SELECT 1 AS ord, application_name, client_addr, flush_location, clock_timestamp() AS lmd FROM dblink('wait_for_streaming_lag', $$ SELECT application_name, client_addr, flush_location FROM pg_stat_replication $$) repl(application_name TEXT, client_addr INET, flush_location TEXT) UNION ALL SELECT 2 AS ord, application_name, client_addr, flush_location, lmd FROM lag) INSERT INTO lag SELECT coalesce(g.gen+1, 1), rx.* FROM (SELECT DISTINCT ON (application_name, client_addr) application_name, client_addr, flush_location, lmd
Re: [GENERAL] check database integrity
On 21/07/14 16:17, Tom Lane wrote: db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1)); ERROR: block number 1 is out of range for relation pg_toast_1255 db=# select pg_relation_size(2836::oid::regclass, 'fsm'); pg_relation_size -- 24576 That's bizarre. AFAICS, pg_relation_size() reduces to a stat() call, while the other error looks like it's coming from rawpage.c's check on RelationGetNumberOfBlocks() which depends on mdnblocks() which prefers to look at the result of lseek(SEEK_END). But both of those should surely get the same answer, if the file's not changing. Could you trace through it and see where the results diverge? Also, what's the actual size of the file on disk? # select pg_relation_filepath(2836::oid::regclass); pg_relation_filepath -- base/25317/11790 # ls -l data/base/25317/11790* -rw--- 1 postgres postgres 8192 Jul 21 07:31 data/base/25317/11790 -rw--- 1 postgres postgres 24576 Jul 21 07:33 data/base/25317/11790_fsm -rw--- 1 postgres postgres 8192 Jul 21 07:33 data/base/25317/11790_vm You see, main and vm forks of the relation are one page. Only fsm is 3 pages. After a fresh restart of the database I attached strace to the backend. There are only 2 lines in the output that mention that relation: open(base/25317/11790, O_RDWR)= 35 lseek(35, 0, SEEK_END) = 8192 This happened during this query: select get_raw_page(2836::oid::regclass::text, 'fsm', 1); Shouldn't it rather open 11790_fsm? Or is there something that first checks the main fork to see if the fsm page makes sense? It seems so because here is the same query for a relation where it works: open(base/25317/60966, O_RDWR)= 39 lseek(39, 0, SEEK_END) = 1490944 open(base/25317/60966_fsm, O_RDWR)= 40 lseek(40, 8192, SEEK_SET) = 8192 read(40, \37\1\0\0\360\371\275\212\305\35\0\0\30\0\0 \0 \4 \0\0\0\0\0\0\0\0\372\372\0\372..., 8192) = 8192 First it opens the main fork, then *_fsm where it reads a page at offset 8192. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] check database integrity
On 22/07/14 16:58, Tom Lane wrote: Doh. I looked right at this code in get_raw_page yesterday: if (blkno >= RelationGetNumberOfBlocks(rel)) elog(ERROR, "block number %u is out of range for relation \"%s\"", blkno, RelationGetRelationName(rel)); RelationGetNumberOfBlocks reports the length of the main fork ... but this check is applied regardless of which fork we're reading. Should be using RelationGetNumberOfBlocksInFork, of course. Thanks for fixing it. I saw your GIT commit. Do you know how long it takes for the change to make it into a new pgdg debian package? Thanks, Torsten
[GENERAL] check database integrity
Hi, we are using 9.3 with data checksums enabled. Now I am looking for a way to check if all database blocks are still intact. First I tried pg_filedump. In many cases it simply ignored tampered data blocks. It is probably not made for this task. Then I remembered about the pageinspect extension. The following select is a bit too verbose but it seems to do the job for everything except fsm files. SELECT c.oid::regclass::text as rel, f.fork, ser.i as blocknr, pg.* FROM pg_class c CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork) CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz) CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i) CROSS JOIN page_header(get_raw_page(c.oid::regclass::text, f.fork, ser.i)) pg WHERE sz.sz > 0 Am I right? The problem with the select above is that either page_header() or get_raw_page() seems to allocate the memory for the page without freeing it again. The process size grew to ~12.5 GB and the query returned ~1,500,000 rows. And 1.5E6 * 8 kB gives roughly 12 GB. Shared buffers is ~120 MB for this database. I ran this query in a separate transaction. The memory was freed only when the backend process exited. Is there a way to work around this memory leak? Is there a better way to do what I want? I also thought about pg_dump. But that does not read indexes, AFAIK. Best if the solution would avoid expensive locks. Then I could also use it in production. But currently I need it only to verify a backup. Thanks, Torsten
Re: [GENERAL] check database integrity
On 20/07/14 17:35, Tom Lane wrote: =?ISO-8859-1?Q?Torsten_F=F6rtsch?= torsten.foert...@gmx.net writes: Then I remembered about the pageinspect extension. The following select is a bit too verbose but it seems to do the job for everything except fsm files. SELECT c.oid::regclass::text as rel, f.fork, ser.i as blocknr, pg.* FROM pg_class c CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork) CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz) CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i) CROSS JOIN page_header(get_raw_page(c.oid::regclass::text, f.fork, ser.i)) pg WHERE sz.sz0 The problem with the select above is that either page_header() or get_raw_page() seems to allocate the memory for the page without freeing it again. Probably commit 45b0f3572 will help you with that. Thanks, Tom. At least the patch description helped. I moved the page_header() call to output column list and now it works perfectly. I'll try the patch next weekend. I ran this query in a separate transaction. The memory was freed only when the backend process exited. AFAIK such memory is released at end of query, even without the patch. Are you sure you aren't looking at shared-buffer usage? Or maybe you're on a platform where libc doesn't release freed memory back to the OS. You are right here. When I wrote the email I restored the behaviour from my faulty memory. Today I tried it again and the memory is indeed freed at the end of the query. Another question, just out of curiosity, for vm and main forks I use pg_relation_size to figure out the highest page number. That does not work for fsm. I have at least one fsm file that it 24 kb. Fetching page 0 works, page 1 and above gives an error: db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 0)); page_header -- (114/23485F78,19084,0,24,8192,8192,8192,4,0) (1 row) db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1)); ERROR: block number 1 is out of range for relation pg_toast_1255 db=# select pg_relation_size(2836::oid::regclass, 'fsm'); pg_relation_size -- 24576 For other relations it works: db=# select page_header(get_raw_page(60966::oid::regclass::text, 'fsm', i)) from generate_series(0,2) i; page_header --- (11F/76884610,-4342,0,24,8192,8192,8192,4,0) (11F/768825C0,22465,0,24,8192,8192,8192,4,0) (11F/83E9EC38,-29015,0,24,8192,8192,8192,4,0) (3 rows) db=# select pg_relation_size(60966::oid::regclass, 'fsm'); pg_relation_size -- 24576 Is there a way to figure out the highest page number for fsm forks? Is there perhaps a common way that works for all forks? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
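For readers who want to try the rewrite described above, here is a minimal, untested sketch of what moving page_header() into the select list might look like; it simply restructures the query quoted in this thread (block size 8192 and the main/vm fork list are taken from that query):

SELECT c.oid::regclass::text AS rel, f.fork, ser.i AS blocknr,
       page_header(get_raw_page(c.oid::regclass::text, f.fork, ser.i)) AS header
FROM pg_class c
CROSS JOIN (VALUES ('main'::text), ('vm'::text)) f(fork)
CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
CROSS JOIN generate_series(0, (sz.sz/8192)::int - 1) ser(i)
WHERE sz.sz > 0;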
Re: [GENERAL] check database integrity
On 20/07/14 16:02, Andrew Sullivan wrote: Then I could also use it in production. But currently I need it only to verify a backup. If you need to verify a backup, why isn't pg_dump acceptable? Or is it that you are somehow trying to prove that what you have on the target (backup) machine is in fact production-ready? I guess I don't really understand what you are trying to do. Sorry, for kind-of misusing the word backup. What I am doing is this. I took a base backup and replayed a few xlogs. This is what I meant with backup. What I want to verify is whether all pages in all files match their checksums. So, I have to make postgres read all pages at least once. Pg_dump does this for normal tables and toast. But it does not read index relations as far as I know. A select count(*) from all tables would also do the job, again without indexes. The sentence about the backup was only to point out that I don't really care if the query locks the database for concurrent transactions. But better if it would not acquire an exclusive lock on all tables. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] updates not causing changes
Hi, our developers use a ORM tool which generates updates that write all columns of a table. In most cases, however, very few columns actually change. So, those updates mostly write the same value that already is there in the column. Now, if there is an index on such columns, does Postgres recognize the situation and skip the index update? Similarly, does it skip the FK or UNIQUE check? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
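Not an answer to the index/FK question above, but a hedged aside: for updates that change nothing at all, the suppress_redundant_updates_trigger() function shipped with PostgreSQL can skip the write entirely when the new row is byte-for-byte identical to the old one. A sketch (the table name is a placeholder):

CREATE TRIGGER z_suppress_redundant_updates
    BEFORE UPDATE ON some_table            -- hypothetical table name
    FOR EACH ROW
    EXECUTE PROCEDURE suppress_redundant_updates_trigger();

This only helps for fully unchanged rows; it says nothing about rows where at least one column really changes.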
[GENERAL] locking order
Hi, when a row is updated a RowExclusiveLock is obtained on the updated row and on every related index. Is the order in which these locks are acquired defined in some way? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] WAL bandwidth
Hi, time and again I need to build indexes. If they are big, that generates a lot of WAL data that needs to be replicated to streaming replication slaves. Usually these slaves don't lag behind noticeably. So, the application often reads from them. Well, unless I build indexes and, thus, create a huge amount of WAL in a short period of time. What I'd like to have is something where I can set the max. bandwidth with which the index generating backend may generate WAL data. I seem to remember to have seen a discussion about something similar but can't recall where. Is there anything I can do about that problem in 9.3 or 9.4? I already have a function that waits for the streaming slaves to catch up. But that mitigates the problem only at a very crude level. I'd like to be able to set that bandwidth to, say, 10mbit/sec. Then I can be sure that all my replicas are fine. How long the index creation takes, does not matter. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WAL bandwidth
On 22/05/14 21:05, Jeff Janes wrote: time and again I need to build indexes. If they are big, that generates a lot of WAL data that needs to be replicated to streaming replication slaves. Usually these slaves don't lag behind noticeably. So, the application often reads from them. Well, unless I build indexes and, thus, create a huge amount of WAL in a short period of time. Are these built CONCURRENTLY? yes What I'd like to have is something where I can set the max. bandwidth with which the index generating backend may generate WAL data. I seem to remember to have seen a discussion about something similar but can't recall where. Is there anything I can do about that problem in 9.3 or 9.4? I already have a function that waits for the streaming slaves to catch up. But that mitigates the problem only at a very crude level. I'd like to be able to set that bandwidth to, say, 10mbit/sec. Then I can be sure that all my replicas are fine. How long the index creation takes, does not matter. This does not appear the domain of PostgreSQL as much as the domain of your OS and network layer. The OS and network have little choice but to process the WAL in the order it is generated. If you want to throttle the generation of WAL by background maintenance operations so they don't interfere with the processing of WAL generated by bread-and-butter transaction processing, that is something that only PostgreSQL can do. That's what I want, to throttle the rate at which WAL is generated by maintenance operations. I take it, there is no such thing by now. Would it be a useful addition? I am not sure if I have the time to implement it. I have had a cursory look at the code before, just to find out how things work, but never changed something. What do you think, is it complicated to implement? Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
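As an aside, the "wait for the streaming slaves to catch up" function mentioned above presumably polls something along these lines (a sketch using the 9.3/9.4 function names; the looping and threshold would live in the client):

SELECT application_name, client_addr, state,
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
FROM pg_stat_replication;
-- e.g. sleep and re-run until replay_lag_bytes drops below a chosen threshold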
Re: [GENERAL] break table into portions for writing to separate files
On 01/05/14 19:50, Seb wrote: Hello, I've been looking for a way to write a table into multiple files, and am wondering if there are some clever suggestions. Say we have a table that is too large (several Gb) to write to a file that can be used for further analyses in other languages. The table consists of a timestamp field and several numeric fields, with records every 10th of a second. It could be meaningfully broken down into subsets of say 20 minutes worth of records. One option is to write a shell script that loops through the timestamp, selects the corresponding subset of the table, and writes it as a unique file. However, this would be extremely slow because each select takes several hours, and there can be hundreds of subsets. Is there a better way? # copy (select * from generate_series(1,1000)) to program 'split -l 100 - /tmp/xxx'; COPY 1000 # \q $ ls -l /tmp/xxxa* -rw--- 1 postgres postgres 292 May 1 19:08 /tmp/xxxaa -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxab -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxac -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxad -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxae -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxaf -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxag -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxah -rw--- 1 postgres postgres 400 May 1 19:08 /tmp/xxxai -rw--- 1 postgres postgres 401 May 1 19:08 /tmp/xxxaj Each of those contains 100 lines. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] arrays of rows and dblink
Hi, we have the ROW type and we have arrays. We also can create arrays of rows like: select array_agg(r) from (values (1::int, 'today'::timestamp, 'a'::text), (2, 'yesterday', 'b')) r(a,b,c); array_agg --- {(1,\2014-04-30 00:00:00\,a),(2,\2014-04-29 00:00:00\,b)} Now I want to execute that query via dblink on a remote server. How do I specify the result type? select tb.* from dblink( 'dbname=postgres', $$ select array_agg(r) from (values (1::int, 'today'::timestamp, 'a'::text), $$) tb( WHAT DO I PUT HERE? ) Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] arrays of rows and dblink
On 30/04/14 20:19, David G Johnston wrote: ISTM that you have to CREATE TYPE ... as appropriate then ... tb ( col_alias type_created_above[] ) There is only so much you can do with anonymous types (which is what the ROW construct creates; ROW is not a type but an expression anchor - like ARRAY[...]) that tells the parser how to interpret what follows. I thought so. Do I have to create the type in both databases or only on the receiving site? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
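An untested sketch of what that could look like; the type name remote_row is made up, and since dblink ships results as text the composite type presumably only needs to exist on the receiving side, where the array literal is parsed:

CREATE TYPE remote_row AS (a integer, b timestamp, c text);

SELECT tb.*
FROM dblink('dbname=postgres', $$
    SELECT array_agg(r)
    FROM (VALUES (1::int, 'today'::timestamp, 'a'::text),
                 (2, 'yesterday', 'b')) r(a, b, c)
$$) AS tb(arr remote_row[]);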
[GENERAL] How to find out PIDs of transactions older than the current?
Hi, I think I can find out the transaction ids of concurrent transactions older than the current one by: select * from txid_snapshot_xip(txid_current_snapshot()) union select * from txid_snapshot_xmax(txid_current_snapshot()); Now, I want to map these transaction ids to backend process ids. pg_stat_activity does not provide the transaction id. So, I turned to pg_locks. select l.pid from ( select * from txid_snapshot_xip(txid_current_snapshot()) union select * from txid_snapshot_xmax(txid_current_snapshot())) tx(id) join pg_locks l on (l.locktype='transactionid' and l.transactionid::TEXT::BIGINT=tx.id); This works. But my transaction ids are still far less than 2^32. Will it also work after the wraparound? I am worried because there is no default cast from XID to INT or BIGINT. Is there a better way? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to find out PIDs of transactions older than the current?
On 25/04/14 13:26, Torsten Förtsch wrote: I think I can find out the transaction ids of concurrent transactions older than the current one by: select * from txid_snapshot_xip(txid_current_snapshot()) union select * from txid_snapshot_xmax(txid_current_snapshot()); Now, I want to map these transaction ids to backend process ids. pg_stat_activity does not provide the transaction id. So, I turned to pg_locks. select l.pid from ( select * from txid_snapshot_xip(txid_current_snapshot()) union select * from txid_snapshot_xmax(txid_current_snapshot())) tx(id) join pg_locks l on (l.locktype='transactionid' and l.transactionid::TEXT::BIGINT=tx.id); This works. But my transaction ids are still far less than 2^32.

I think I got it. pg_locks.transactionid is a 4-byte quantity. But I can safely ignore the upper half of the BIGINT that comes out of txid_snapshot_xip(). So, the query becomes:

select l.pid
from ( select * from txid_snapshot_xip(txid_current_snapshot())
       union
       select * from txid_snapshot_xmax(txid_current_snapshot())) tx(id)
join pg_locks l on (l.locktype='transactionid' and
                    l.transactionid::TEXT::BIGINT = tx.id & (1::BIGINT << 32) - 1)

Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
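For what it's worth, on later releases (9.4 and up) this mapping is easier because pg_stat_activity exposes the backend's transaction id directly; a sketch:

-- 9.4+: no need to go through pg_locks
SELECT pid, backend_xid, backend_xmin, query
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL;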
[GENERAL] Is this a planner bug?
Hi, I got this plan: Limit (cost=0.00..1.12 rows=1 width=0) - Seq Scan on fmb (cost=0.00..6964734.35 rows=6237993 width=0) Filter: ... The table has ~80,000,000 rows. So, the filter, according to the plan, filters out 90% of the rows. Although the cost for the first row to come out of the seqscan might be 0, the cost for the first row to pass the filter and, hence, to hit the limit node is probably higher. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is this a planner bug?
On 22/04/14 14:24, Pavel Stehule wrote: what is your effective_cache_size in postgresql.conf? What is random_page_cost and seq_page_cost? 8GB, 4, 1 But I am not asking about how to get a different plan or how to optimize the query. I know that. What I'm asking is the following. Assuming node without any filter has a startup cost C1, a total cost of C2 and produces N rows. Now, a filter is applied which passes through M rows. Then the startup cost for the node *with* the filter applied should be different from C1 because a certain amount of rows from the beginning is filtered out, right? I think the startup cost should be something like C1 + (C2+N*F-C1)*M/N or C1 + 0.5*(C2+N*F-C1)*M/N where F is the cost to apply the filter to one row. On average only one out of N/M rows matches the filter. So we need to fetch N/M rows to produce the first row out of the filter. Now, you can argue that we don't know where in that set the first matching row is. On average it would probably in the middle. That's where the 0.5 comes from. I certainly got it wrong somewhere. But I think you got the idea. If not the seqscan node, but the limit node should have a startup cost 0 (depending where the filter is taken into account). In my case the startup cost for the limit node should be somewhere between 25 and 30. Torsten 2014-04-22 14:10 GMT+02:00 Torsten Förtsch torsten.foert...@gmx.net mailto:torsten.foert...@gmx.net: Hi, I got this plan: Limit (cost=0.00..1.12 rows=1 width=0) - Seq Scan on fmb (cost=0.00..6964734.35 rows=6237993 width=0) Filter: ... The table has ~80,000,000 rows. So, the filter, according to the plan, filters out 90% of the rows. Although the cost for the first row to come out of the seqscan might be 0, the cost for the first row to pass the filter and, hence, to hit the limit node is probably higher. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is this a planner bug?
On 22/04/14 16:39, Albe Laurenz wrote: Could you run EXPLAIN ANALYZE for the query with enable_seqscan on and off? I'd be curious a) if the index can be used b) if it can be used, if that is actually cheaper c) how the planner estimates compare with reality. Using the index: Limit (cost=0.57..2.95 rows=1 width=0) (actual time=0.095..0.095 rows=1 loops=1) - Index Scan ... (cost=0.57..14857285.83 rows=6240539 width=0) (actual time=0.095..0.095 rows=1 loops=1) Index Cond:... Filter: ... Rows Removed by Filter: 4 Total runtime: 0.147 ms seq scan: Limit (cost=0.00..1.12 rows=1 width=0) (actual time=0.943..0.944 rows=1 loops=1) - Seq Scan ... (cost=0.00..6967622.77 rows=6240580 width=0) (actual time=0.940..0.940 rows=1 loops=1) Filter: ... Rows Removed by Filter: 215 Total runtime: 0.997 ms In these cases all the stuff comes from cache hits. When I first tried the query it used a seq scan and it took several seconds. In this case only setting random_page_cost less than seq_page_cost would make the planner use the index. I think if we had separate filter nodes, just like SORT nodes, then it would be clearer that the setup cost of the seq scan with filter cannot be 0. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is this a planner bug?
On 22/04/14 16:45, Tom Lane wrote: No. The model is that startup cost is what's expended before the scan can start, and then the run cost (total_cost - startup_cost) is expended while scanning. Applying a filter increases the run cost and also reduces the number of rows returned, but that's got nothing to do with startup cost. As a comparison point, imagine an index scan that has a filter condition in addition to the indexable condition (which let's assume selects multiple rows). The startup cost for such a plan corresponds to the index descent costs. The run cost corresponds to scanning the index entries matching the indexable condition, fetching the heap rows, and applying the filter condition. Or in other words, time to get the first result row is not just startup cost; it's startup cost plus run_cost/N, if the plan is estimated to return N rows altogether. Ok, I understand that's the way the model is. The point is that especially in presence of a LIMIT 1 there is a difference between a seq scan that has to fetch a few 10MB to find the first and only row and an index scan that has to process perhaps a few kb. And in this case even setting random_page_cost=seq_page_cost didn't help. If that query were part of a larger one, I wouldn't want to fiddle with the cost parameters to get one part of the query fast only to sacrifice performance in another part. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_stat_replication.state: streaming/catchup
Hi, just out of curiosity, what's the difference between streaming and catchup state in pg_stat_replication. According to the documentation this field is Current WAL sender state. But that does not tell me anything. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_stat_replication.state: streaming/catchup
On 21/04/14 13:18, Michael Paquier wrote: When a standby connects for the first time to a primary, it is not yet synchronized, this is the catchup phase. Once the lag between the standby and the master is reduced to zero for the first time, replication state changes to streaming. Thanks. I am seeing several standbys changing from streaming to catchup and back. Sometimes they also get completely lost. This happens when the lag becomes high, hundreds of MB or even GB. The standby servers are relatively far away on the internet. And the operation to generate this kind of lag is index creation on large tables. Is there a difference in the protocol used in those phases? Maybe the catchup phase is optimized for bulk throughput? Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Disable an index temporarily
On 20/04/14 03:02, Sergey Konoplev wrote: Thanks for you reply. an index can be INVALID (pg_index.indisvalid=false). I want to temporarily disable an index so that it won't be used to access data but will still be updated. Can I simply set pg_index.indisvalid=false and later turn it true again? It works on a quick test, but I'm not sure how safe it is. If you need to test a query without the index use a transaction: Begin; Drop index ...; Explain ... select ...; Rollback; I know about that. The problem is I have a number of indexes in a large system that are very similar. And I suspect some of them are superfluous. Example: btree (fmb_id, action_type) btree (fmb_id) Action_type in this case is one out of a handful of values (should perhaps be an ENUM but is TEXT) and for most of the table the combination of (fmb_id, action_type) is unique. The table itself has ~2E8 rows. So it takes a while to build these indexes from scratch. Now, we have several performance counters in place. I want to disable these indexes one by one and see what happens. I am probably not able to find all of the queries that use them. But I believe that nothing much happens if I drop one of them (preferably the former?). Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Disable an index temporarily
On 20/04/14 12:08, Thomas Kellerer wrote: Example: btree (fmb_id, action_type) btree (fmb_id) [...] Btw: in the above example the second one is definitely not needed. Any query that uses the second one (single column) can also use the first one. I know. But the single column index is 3534 MB the 2-column one 4963 MB. The number of rows per distinct fmb_id is very small, usually 1 or 2. So, if a query looks for fmb_id=A and action_type=B, it has to filter out only one row if the first index is not available. Hence, I thought maybe the lower size of the index and the higher compactness per page could outweigh the more direct access provided by the 2-column index. I am quite sure there is no query that qualifies for an index-only scan on the 2-column index. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Disable an index temporarily
Hi, an index can be INVALID (pg_index.indisvalid=false). I want to temporarily disable an index so that it won't be used to access data but will still be updated. Can I simply set pg_index.indisvalid=false and later turn it true again? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
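For reference, the catalog hack the question alludes to would look roughly like this. It is unsupported, so only on a test system, the index name is hypothetical, and it should not be done for indexes that back primary key, unique or exclusion constraints:

-- planner stops considering the index, but it is still maintained on writes
UPDATE pg_index SET indisvalid = false
WHERE indexrelid = 'some_schema.some_index'::regclass;

-- later, re-enable it
UPDATE pg_index SET indisvalid = true
WHERE indexrelid = 'some_schema.some_index'::regclass;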
[GENERAL] Why do we need an AccessExclusiveLock to validate a FK constraint marked as NOT VALID?
Hi, currently, ALTER TABLE VALIDATE CONSTRAINT for foreign key constraints acquires an AccessExclusiveLock on the referencing table. Why? If the constraint is in place but not validated (ADD CONSTRAINT ... NOT VALID) it already prevents new modifications from violating the constraint. The code that is called to validate the constraint, RI_Initial_Check, contains this comment: * We expect that the caller has made provision to prevent any problems * caused by concurrent actions. This could be either by locking rel and * pkrel at ShareRowExclusiveLock or higher, or by otherwise ensuring * that triggers implementing the checks are already active. * Hence, we do not need to lock individual rows for the check. Doesn't the presence of the NOT VALID constraint qualify as otherwise ensuring that triggers implementing the checks are already active? Is there any deeper reason? Or is it simply not implemented yet? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why do we need an AccessExclusiveLock to validate a FK constraint marked as NOT VALID?
On 13/04/14 13:34, Vik Fearing wrote: Actually, it is implemented already. http://www.postgresql.org/message-id/e1wwovd-0004ts...@gemulon.postgresql.org It'll be in 9.4.

That's good news. So, I could validate a FK constraint this way:

UPDATE pg_constraint
SET convalidated = NOT EXISTS(
    SELECT 1
    FROM ONLY fkrel a
    LEFT JOIN ONLY pkrel b ON (a.fkcol1=b.pkcol1 AND ...)  -- all fk columns
    WHERE b.pkcol1 IS NULL                                 -- inner join failed
      AND (a.fkcol1 IS NOT NULL OR/AND                     -- MATCH SIMPLE: AND; FULL: OR
           a.fkcol2 IS NOT NULL ...)
)
WHERE contype='f' AND ...

fkrel is conrelid::regclass (the referencing table) and pkrel is confrelid::regclass (the referenced table). That's essentially what ALTER TABLE ... VALIDATE CONSTRAINT does.

Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Indices and Foreign Tables
On 04/04/14 17:14, Kohler Manuel (ID SIS) wrote: Here is the same query with fdw: db=# EXPLAIN VERBOSE select ds.code, count(*), sum(dsf.size_in_bytes) as raw_size,pg_size_pretty(sum(dsf.size_in_bytes)) as size from data_set_files_fdw dsf, data_sets_fdw ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc; QUERY PLAN --- Sort (cost=327.81..327.97 rows=64 width=40) Output: ds.code, (count(*)), (sum(dsf.size_in_bytes)), (pg_size_pretty(sum(dsf.size_in_bytes))) Sort Key: (sum(dsf.size_in_bytes)) - HashAggregate (cost=325.09..325.89 rows=64 width=40) Output: ds.code, count(*), sum(dsf.size_in_bytes), pg_size_pretty(sum(dsf.size_in_bytes)) - Hash Join (cost=270.61..324.45 rows=64 width=40) Output: dsf.size_in_bytes, ds.code Hash Cond: (ds.id = (dsf.dase_id)::bigint) - Foreign Scan on public.data_sets_fdw ds (cost=100.00..148.40 rows=1280 width=40) Output: ds.id, ds.code, ds.location Remote SQL: SELECT id, code FROM public.data_sets - Hash (cost=170.48..170.48 rows=10 width=16) Output: dsf.size_in_bytes, dsf.dase_id - Foreign Scan on public.data_set_files_fdw dsf (cost=100.00..170.48 rows=10 width=16) Output: dsf.size_in_bytes, dsf.dase_id Filter: (dsf.parent_id IS NULL) Remote SQL: SELECT dase_id, parent_id, size_in_bytes FROM public.data_set_files (17 rows) And also with ANALYZE: db=# EXPLAIN ANALYZE select ds.code, count(*), sum(dsf.size_in_bytes) as raw_size,pg_size_pretty(sum(dsf.size_in_bytes)) as size from data_set_files_fdw dsf, data_sets_fdw ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc; QUERY PLAN -- Sort (cost=327.81..327.97 rows=64 width=40) (actual time=219401.864..219414.641 rows=13839 loops=1) Sort Key: (sum(dsf.size_in_bytes)) Sort Method: quicksort Memory: 2283kB - HashAggregate (cost=325.09..325.89 rows=64 width=40) (actual time=219327.664..219363.709 rows=13839 loops=1) - Hash Join (cost=270.61..324.45 rows=64 width=40) (actual time=219127.848..219277.308 rows=13839 loops=1) Hash Cond: (ds.id = (dsf.dase_id)::bigint) - Foreign Scan on data_sets_fdw ds (cost=100.00..148.40 rows=1280 width=40) (actual time=1.057..77.415 rows=13839 loops=1) - Hash (cost=170.48..170.48 rows=10 width=16) (actual time=219126.713..219126.713 rows=13839 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 649kB - Foreign Scan on data_set_files_fdw dsf (cost=100.00..170.48 rows=10 width=16) (actual time=1082.614..219083.326 rows=13839 loops=1) Filter: (parent_id IS NULL) Rows Removed by Filter: 35726596 Total runtime: 219438.925 ms (1 Do you know that you can combine VERBOSE and ANALYZE in one EXPLAIN? EXPLAIN (ANALYZE, VERBOSE) SELECT ... The problem is that your qualifier is not pushed down to the backend database. So, you pull (35726596+13839) rows (which is the complete table) from data_set_files in the remote database, filter them locally and throw 35726596 of them away. Then you copy the remote data_sets table (13839 rows). And then you join them. I don't think use_remote_estimate can help a lot in this situation. We tried to use postgres_fdw in a similar approach last year. Then switched to dblink because, although it is possible for FDW to push qualifiers to the backend, it too often does not do so. Then it copies large tables over the network and scans them sequentially on the local machine. Also, LIMIT for instance is never pushed to the backend. Now, we create functions like this: CREATE OR REPLACE FUNCTION __get_client_metrics_on(srv TEXT) RETURNS TABLE(loginid TEXT, ...) 
AS $def$ SELECT tb.* FROM dblink($1, $$ -- remote query starts here SELECT c.loginid, ... FROM clients c JOIN ... -- remote query ends here $$) AS tb(loginid TEXT, ...) $def$ LANGUAGE sql VOLATILE SECURITY definer ROWS ... COST ...; CREATE OR REPLACE FUNCTION get_client_metrics() RETURNS TABLE(srv TEXT, loginid TEXT, ...) AS $def$ SELECT s.srvname, rem.* FROM production_servers() s CROSS JOIN
Re: [GENERAL] Pagination count strategies
On 03/04/14 15:34, Leonardo M. Ramé wrote: Hi, in one of our systems, we added a kind of pagination feature, that shows N records of Total records. To do this, we added a count(*) over() as Total field in our queries in replacement of doing two queries, one for fetching the records, and other for getting the count. This improved the performance, but we aren't happy with the results yet; by removing the count, the query takes 200ms vs 2000ms with it. We are thinking of removing the count, but if we do that, the system will lack an interesting feature. What strategy for showing the total number of records returned do you recommend?

If you need only an estimated number and if your planner statistics are up to date, you can use the planner. Here is my implementation of the explain function. The COMMENT below shows how to use it:

CREATE OR REPLACE FUNCTION explain(VARIADIC TEXT[]) RETURNS JSON AS $$
DECLARE
    tmp TEXT;
BEGIN
    EXECUTE 'EXPLAIN ('
         || array_to_string(array_append($1[2:array_upper($1, 1)], 'FORMAT JSON'), ', ')
         || ') ' || $1[1]
    INTO tmp;
    RETURN tmp::JSON;
END;
$$ LANGUAGE plpgsql VOLATILE;

COMMENT ON FUNCTION explain(VARIADIC TEXT[]) IS $def$
This function is a SQL interface to the planner. It returns the plan (result of EXPLAIN) of the query passed as TEXT string as the first parameter as JSON object. The remaining parameters are EXPLAIN-modifiers, like ANALYZE or BUFFERS. The function can be used to store plans in the database.

Another interesting usage is when you need only an estimated row count for a query. You can use SELECT count(*) ... This gives you an exact number but is usually slow. If your planner statistics are up to date and the query is not too complicated, the planner usually gives a good estimate and is much faster.

SELECT explain('SELECT 1 FROM tb WHERE id > 8000')->0->'Plan'->'Plan Rows';
$def$;

Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
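A small usage sketch on top of that function, pulling the estimate out as a number (tb and id are the placeholder names from the comment above; the JSON operators used are available in 9.3):

SELECT (explain('SELECT 1 FROM tb WHERE id > 8000')->0->'Plan'->>'Plan Rows')::bigint AS estimated_rows;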
Re: [GENERAL] Pagination count strategies
On 03/04/14 17:44, Leonardo M. Ramé wrote: Nice!, do you know if this will work on 8.4?. no way -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] SQL advice needed
Hi, I have a volatile function that returns multiple rows. It may also return nothing. Now, I want to write an SQL statement that calls this function until it returns an empty result set and returns all the rows. So, in principle I want to: WITH RECURSIVE t AS ( SELECT * FROM xx() UNION ALL SELECT * FROM xx() ) SELECT * FROM t; But that's not recursive because the union all part lacks a reference to t. Next I tried this: WITH RECURSIVE t AS ( SELECT * FROM xx() UNION ALL SELECT * FROM xx() WHERE EXISTS (SELECT 1 FROM t) ) SELECT * FROM t; But the reference to t is not allowed in a subquery. What's the best (or at least a working) way to achieve what I want? I can do it in plpgsql. But that would mean to accumulate the complete result in memory first, right? I need to avoid that. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SQL advice needed
On 17/03/14 21:42, Merlin Moncure wrote: I can do it in plpgsql. But that would mean to accumulate the complete result in memory first, right? I need to avoid that. I would test that assumption. This is better handled in loop IMO. LOOP RETURN QUERY SELECT * FROM xx(); IF NOT found THEN RETURN; END IF; END LOOP;

At least according to the manual it is stored in memory:

"Note: The current implementation of RETURN NEXT and RETURN QUERY stores the entire result set before returning from the function, as discussed above. That means that if a PL/pgSQL function produces a very large result set, performance might be poor: data will be written to disk to avoid memory exhaustion, but the function itself will not return until the entire result set has been generated. A future version of PL/pgSQL might allow users to define set-returning functions that do not have this limitation. Currently, the point at which data begins being written to disk is controlled by the work_mem configuration variable. Administrators who have sufficient memory to store larger result sets in memory should consider increasing this parameter."

I didn't test that, though.

Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to make PG use work_mem?
Hi, I have a query that involves an external sort: - Sort (cost=13662680.01..13850498.48 rows=75127389 width=16) (actual time=980098.397..1021411.862 rows=74181544 loops=1) Sort Key: (ROW(account_id, (purchase_time)::date)) Sort Method: external merge Disk: 3118088kB Buffers: shared hit=1568637 read=1327223, temp read=389763 written=389763 What puzzles me is that this happens even when I set work_mem to 50GB in the session. Why does it still use the external merge? The query runs on a streaming replica if that matters. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to make PG use work_mem?
On 11/03/14 14:36, Tom Lane wrote: Perhaps you fat-fingered the SET somehow? I just repeated it: # select * from pg_settings where name='work_mem'; -[ RECORD 1 ] name | work_mem setting| 52428800 unit | kB ... # explain (analyze,buffers) select distinct(account_id, purchase_time::date) from fmb; QUERY PLAN - Unique (cost=13666972.01..14042722.46 rows=12894641 width=16) (actual time=1000989.364..1058273.210 rows=2200442 loops=1) Buffers: shared hit=1570088 read=1326647, temp read=389842 written=389842 - Sort (cost=13666972.01..13854847.24 rows=75150089 width=16) (actual time=1000989.362..1035694.670 rows=74196802 loops=1) Sort Key: (ROW(account_id, (purchase_time)::date)) Sort Method: external merge Disk: 3118720kB Buffers: shared hit=1570088 read=1326647, temp read=389842 written=389842 - Seq Scan on fmb (cost=0.00..3836111.11 rows=75150089 width=16) (actual time=0.021..35520.901 rows=74196802 loops=1) Buffers: shared hit=1570088 read=1326647 Total runtime: 1059324.646 ms # show work_mem; work_mem -- 50GB This is 9.3.3 from the pgdg debian repository. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to make PG use work_mem?
On 11/03/14 16:03, Tom Lane wrote: [ thinks for awhile... ] Oh, I know what's happening: your sort is so large that it's being constrained by the MaxAllocSize limit on the tuple pointer array. This has been fixed in HEAD, but it's not yet in any shipping release. According to the log entry for commit 263865a48973767ce8ed7b7788059a38a24a9f37, the previous limit on the number of tuples that could be sorted in memory was INT_MAX/48 or about 44 million; I've not done the arithmetic to check that, but it seems about right seeing that you're having trouble with 75 million. Thanks, that makes sense. BTW, I solved my problem w/o that sort. I was just curious what happened here. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to continue streaming replication after this error?
On 22/02/14 03:21, Torsten Förtsch wrote: 2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32 2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN 2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated by signal 6: Aborted 2014-02-21 05:17:11 UTC LOG: terminating any other active server processes 2014-02-21 05:17:11 UTC WARNING: terminating connection because of crash of another server process 2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2014-02-21 05:17:11 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command. Any idea what that means? Updating the replica to 9.3.3 cured it. The master was already on 9.3.3. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to continue streaming replication after this error?
Hi, one of our streaming replicas died with 2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32 2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN 2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated by signal 6: Aborted 2014-02-21 05:17:11 UTC LOG: terminating any other active server processes 2014-02-21 05:17:11 UTC WARNING: terminating connection because of crash of another server process 2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2014-02-21 05:17:11 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command. Now, if I try to restart it, I get this: The PostgreSQL server failed to start. Please check the log output: 2014-02-21 07:42:53 UTC LOG: database system was interrupted while in recovery at log time 2014-02-21 05:02:45 UTC 2014-02-21 07:42:53 UTC HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2014-02-21 07:42:53 UTC LOG: incomplete startup packet 2014-02-21 07:42:53 UTC LOG: entering standby mode 2014-02-21 07:42:53 UTC LOG: redo starts at 11C/B2211778 2014-02-21 07:42:53 UTC FATAL: the database system is starting up 2014-02-21 07:42:54 UTC LOG: consistent recovery state reached at 11C/B4234108 2014-02-21 07:42:54 UTC LOG: database system is ready to accept read only connections 2014-02-21 07:42:54 UTC PANIC: heap2_redo: unknown op code 32 2014-02-21 07:42:54 UTC CONTEXT: xlog redo UNKNOWN 2014-02-21 07:42:54 UTC LOG: startup process (PID 38187) was terminated by signal 6: Aborted 2014-02-21 07:42:54 UTC LOG: terminating any other active server processes This is 9.3.2. What is the supposed way to continue replication? Or do I need to start from a fresh base backup? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to continue streaming replication after this error?
On 21/02/14 09:17, Torsten Förtsch wrote: one of our streaming replicas died with 2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32 2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN 2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated by signal 6: Aborted 2014-02-21 05:17:11 UTC LOG: terminating any other active server processes 2014-02-21 05:17:11 UTC WARNING: terminating connection because of crash of another server process 2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2014-02-21 05:17:11 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command. Any idea what that means? I have got a second replica dying with the same symptoms. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to get rid of superfluous WAL segments?
Hi, we decreased wal_keep_segments quite a lot. What is the supposed way to get rid of the now superfluous files in pg_xlog? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get rid of superfluous WAL segments?
On 06/02/14 06:46, Torsten Förtsch wrote: we decreased wal_keep_segments quite a lot. What is the supposed way to get rid of the now superfluous files in pg_xlog? Nothing special. The database did it for me. Sorry for the noise. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Help with details of what happens when I create a constraint NOT VALID
On 23/01/14 14:46, Bill Moran wrote: Some quickie background: I'm on a project to migrate a fairly large database from MySQL to PostgreSQL (~2T). As a result of a number of factors, I have to do it in one shot and I have a limited time window in which things can be down while I switch it over. As one of many, many things I'm considering to make this work, I'm looking at adding constraints after the data move using NOT VALID to allow them to be applied quickly. This seems pretty straight forward, but I'm trying to understand if there are any troublesome side-effects to leaving the constraints unvalidated. Because of the uptime requirements, there are some very large tables with may foreign keys that I will never be allowed to take a lock on long enough to validate all the constraints. It was suggested that leaving the constraints as NOT VALID might affect the planner, causing it to use less optimal plans because it doesn't think it can trust the constraint. Is this true? It has also been suggested that manually changing the status to valid in the catalog without going through the validation process could cause problems, although I haven't found an explanation of what those problems might be. I understand that the best way is to go through and do all the steps, but that may simply be impossible for me because of the lock it requires and the time involved. Is there any negative effect to leaving the constraint unvalidated? Is there any actual danger in manually flipping the value in the catalog (The constraint can be consider safe because it was previously enforced on the source database system) I had a similar problem some time ago. The way I solved it is as follows. First, add the constraint as NOT VALID. That prevents further changes to violate it. Then make sure the constraint is met. Then update pg_constraint. UPDATE pg_constraint SET convalidated = true WHERE conrelid='schema.table'::regclass::oid AND conname='constraintname' Not sure if that way can be recommended but it worked for me. In my case it was a check constraint ensuring an interdependence between the columns in a row. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Is http://www.postgresql.org/docs/9.3/static/row-estimation-examples.html up to date?
Hi, I am asking because ... I have a table with

relpages  | 19164
reltuples | 194775

pg_relation_size / 8192 yields the same number as relpages. So, there is no need to scale reltuples. Relcardinality is therefore 194775. Statistics target is the default, 100. So, I assume each of the 100 buckets contains 1947.75 tuples. Now, I have a timestamp column and a query for col > '2013-01-01'. There are 27 buckets out of the 100 where col is > '2013-01-01'. The bucket boundaries where 2013-01-01 falls into are

hist | 2013-01-08 20:48:52
hist | 2012-12-13 12:36:30

There is no / operation for INTERVAL types. So, I calculate in seconds:

select (27 + extract('epoch' from '2013-01-08 20:48:52'::timestamp - '2013-01-01'::timestamp)
           / extract('epoch' from '2013-01-08 20:48:52'::timestamp - '2012-12-13 12:36:30'::timestamp)) * 1947.75;

That results in 53170.9642980797 and would be rounded to 53171. However, EXPLAIN shows:

-> Seq Scan on client (cost=0.00..21731.03 rows=52939 width=29)
   Filter: (date_joined > '2013-01-01 00:00:00'::timestamp without time zone)

The numbers are of the same order of magnitude, but they are too different to be rounding errors. So, what did I do wrong?

Thanks, Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
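For anyone redoing this arithmetic, the histogram the calculation relies on can be inspected directly; a sketch (assuming the table lives in the public schema):

SELECT n_distinct, histogram_bounds
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'client'
  AND attname    = 'date_joined';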
Re: [GENERAL] question about checksum in 9.3
On 13/01/14 17:50, Mike Broers wrote: Is there a built-in method of scanning the server to check for corruption or will I have to wait for a corrupt object to be accessed to see the log entry? This presentation: http://www.youtube.com/watch?v=TovZ0lb16-Q suggests pg_filedump. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] window function and order by
Hi, assuming I have a query that computes a running sum like this:

select id, buy_price, sum(buy_price) over (order by id) sum
from fmb
where 202300 <= id and id <= 202400
order by id;

Do I need the ORDER BY clause at the end? Or does the ORDER BY in the window function already define the order of the result set?

Thanks, Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Is it possible to avoid the VACUUM FREEZE when upgrading to 9.3.2?
Hi, can the VACUUM FREEZE be skipped if a) txid_current vacuum_freeze_min_age or if b) txid_current vacuum_freeze_table_age or if c) txid_current autovacuum_freeze_max_age and no manual VACUUM has been done? Or is the operation perhaps only necessary for tables matching select n.nspname, c.relname, c.relfrozenxid from pg_class c join pg_namespace n on c.relnamespace=n.oid where c.relfrozenxid::text::int=(select setting::int from pg_settings where name='vacuum_freeze_min_age') or similar? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PG replication across DataCenters
On 22/11/13 11:57, Albe Laurenz wrote: Don't use synchronous replication if you have a high transaction rate and a noticable network latency between the sites. Wait for the next bugfix release, since a nasty bug has just been discovered. Can you please explain or provide a pointer for more information? We have recently started to use sync replication over a line with 80ms latency. It works for small transactions with a relatively low transaction rate. Avoid transactions using NOTIFY. Those acquire an exclusive lock during commit that is released only when the remote host has also done its commit. So, only one such transaction can be committing at time. Async replication works just fine. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Partitioned table question
Hi, we have a table partitioned by time. Each month goes into a separate child table. Primary key in each table is (underlying, ts). The resulting index is perfect for ordering like in the query below. Each child table has a constraint like:

CHECK(ts >= '2011-1-1' and ts < '2011-1-1'::DATE + interval '1 month')

Now, we have queries of this type:

SELECT * FROM tick WHERE underlying = 'R_50' AND ts <= '2013-05-02' ORDER BY ts DESC LIMIT 100

The query plan for this is at http://explain.depesz.com/s/fB6 According to this plan it fetches all the result tuples from tick_2013_4 which is fine because tick_2013_5 obviously does not contain matches. My question is, why does it then try to fetch one row from every other index? Can that be avoided without a lower bound on ts?

Thanks, Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Partitioned table question
On 13/11/13 13:49, Gabriel Sánchez Martínez wrote: My question is, why does it then try to fetch one row from every other index? Can that be avoided without a lower bound on ts? If you don't set a lower bound, since every other table has dates below 2013-05-01, they have to be scanned too. I'm not sure what happens on actual execution if it searches in '2013_4' first and finds 100 or more rows. I don't know if the query planner uses constraint exclusion rules to figure out the order in which tables will be scanned.

It probably does. According to the analyze part of the query plan it does not find any match in 2013_5. But from 2013_4 it fetches 100 rows.

-> Index Scan Backward using tick_2013_4_pkey on tick_2013_4 tick (cost=0.00..5025184.53 rows=1336481 width=40) (actual time=0.047..0.124 rows=100 loops=1)   <== rows=100

Of course, it's a good idea to add a lower bound to the query. I also know that the planner does not know how many rows it can fetch from each table (it can have a quite accurate guess though). So, the plan must include all tables before and including 2013_5. The question, however, was why does the executor try to fetch rows from the other tables at all.

Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
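For completeness, a sketch of the lower-bound variant discussed here (the one-month window is an arbitrary choice for illustration):

SELECT *
FROM tick
WHERE underlying = 'R_50'
  AND ts <= '2013-05-02'
  AND ts >  '2013-05-02'::date - interval '1 month'
ORDER BY ts DESC
LIMIT 100;

With the lower bound in place, constraint exclusion can prune the older child tables at plan time instead of the executor probing one row from each of their indexes.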
Re: [GENERAL] Partitioned table question
On 13/11/13 20:21, Jeff Janes wrote: The planner uses the check constraints to reason about the relation between each partition separately and the query, not between the different partitions. So while it may be possible to know that all rows in 2013_4 must be greater than all in 2013_3, it doesn't make use of that, instead taking the greatest value from each partition and putting it in a priority queue. So the one row from each table acts as a sentinel to prove that more rows from that table are not needed. That makes perfect sense. Thank you very much. Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] locks held during commit with synchronous replication
On 21/10/13 20:46, Tom Lane wrote: =?ISO-8859-1?Q?Torsten_F=F6rtsch?= torsten.foert...@gmx.net writes: I noticed that with synchronous replication I often see locks like this: [ AccessExclusiveLock on database 0 ] You did not say what PG version you're talking about, but if it's recent It's 9.3.1 then this must be coming from PreCommit_Notify, which takes such a lock while pushing entries into the shared notification-event queue: /* * Serialize writers by acquiring a special lock that we hold till * after commit. This ensures that queue entries appear in commit * order, and in particular that there are never uncommitted queue * entries ahead of committed ones, so an uncommitted transaction * can't block delivery of deliverable notifications. * * We use a heavyweight lock so that it'll automatically be released * after either commit or abort. This also allows deadlocks to be * detected, though really a deadlock shouldn't be possible here. * * The lock is on database 0, which is pretty ugly but it doesn't * seem worth inventing a special locktag category just for this. * (Historical note: before PG 9.0, a similar lock on database 0 was * used by the flatfiles mechanism.) */ LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock); This has nothing to do with synchronous replication, only with use of LISTEN/NOTIFY. Does that mean that only one transaction can be committed at a time? If they're sending notifies, yes. Thanks, Tom! I think that explains it. We are also using bucardo. So when a transaction commits the bucardo notification acquires the lock. But now with synchronous replication and synchronous_commit=on commit waits for the remote host to complete. Thus, the lock is held longer than before and I see it. The problem is we use synchronous replication over long distances on the internet to bring the data closer to the customer. Turns out, this combined with bucardo creates quite a bottleneck. Best, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] locks held during commit with synchronous replication
Hi, I noticed that with synchronous replication I often see locks like this: -[ RECORD 10 ]-+ locktype | object database | 0 relation | page | tuple | virtualxid | transactionid | classid| 1262 objid | 0 objsubid | 0 virtualtransaction | 13/42806 pid| 9794 mode | AccessExclusiveLock granted| t fastpath | f According to the manual classid is an oid in pg_class. If so, the 1262 resolves to pg_database. What objid=0 means, I have no idea. For how long is this lock held? Am I right in assuming that it is held from the start of the commit until the transaction is streamed and written to disk by the synchronous replica? Does that mean that only one transaction can be committed at a time? I have to admit I haven't thought much about that before. Lock contention has never been a problem here. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
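A sketch for watching which sessions hold or wait on that particular lock (classid 1262 is pg_database's OID, matching the lock record shown above):

SELECT l.pid, l.mode, l.granted, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a USING (pid)
WHERE l.locktype = 'object'
  AND l.classid  = 1262
  AND l.objid    = 0;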
Re: [GENERAL] like optimization
On 12/10/13 20:08, Scott Ribe wrote: select * from test where tz >= start and tz < end and colb like '%foobar%'

I think you can use an index only for wildcard expressions that are anchored at the beginning. So,

select * from test where tz >= start and tz < end and colb like 'foobar%'

can use an index on colb. You could perhaps

select * from test where tz >= start and tz < end and colb like 'foobar%'
union all
select * from test where tz >= start and tz < end and reverse(colb) like 'raboof%'

Then you need 2 indexes, one on colb the other on reverse(colb). You can have duplicates in the result set if the table contains rows where colb='foobar'. If that's a problem, use union distinct. Alternatively, if foobar is kind of a word (with boundaries), you could consider full-text search.

Just my 2¢, Torsten
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
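Sketches of the indexes that suggestion relies on, plus the pg_trgm alternative for genuinely unanchored '%foobar%' searches (table and column names come from the example, index names are made up):

-- for the anchored LIKE 'foobar%' / reverse() trick
CREATE INDEX test_colb_prefix_idx ON test (colb text_pattern_ops);
CREATE INDEX test_colb_suffix_idx ON test (reverse(colb) text_pattern_ops);

-- for unanchored '%foobar%' (pg_trgm supports LIKE since 9.1)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX test_colb_trgm_idx ON test USING gin (colb gin_trgm_ops);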
[GENERAL] declare constraint as valid
Hi, assuming a constraint is added to a table as NOT VALID. Now I know it IS valid. Can I simply declare it as valid by update pg_constraint set convalidated='t' where conrelid=(select c.oid from pg_class c join pg_namespace n on (n.oid=c.relnamespace) where c.relname='tablename' and n.nspname='schemaname') and conname='constraintname'; instead of alter table tablename validate constraint ... Or does the latter have other side effects? I am asking because I want to avoid the ACCESS EXCLUSIVE lock required by the ALTER TABLE. I am sure there are no rows violating the constraint. Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] ALTER TABLE VALIDATE CONSTRAINT w/o lock
Hi, I want to add a new column named sell_time to a table which already has a boolean field named is_sold. Then I want to create a new check constraint: ALTER TABLE ... CHECK(is_sold AND sell_time IS NOT NULL OR NOT is_sold AND sell_time IS NULL) NOT VALID; The constraint is added as NOT VALID for 2 reasons. 1) it is not valid at that point and 2) I cannot afford the lock required to validate the it. According to the docs subsequent operations on the table must pass the constraint. Now I can slowly populate the new column. After a while I'll reach a point where the constraint is valid for all rows. How can I tell that the database without the re-validation? Alternatively, is there a way to make the constraint validation really fast (a few seconds at most for 50.000.000 rows)? (this is 9.3) Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] the new checksum feature
Hi, is it possible to turn on checksums in an existing database? Or do I have to dump/initdb/restore? Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general