Re: [GENERAL] pg_basebackup on slave running for a long time
On 11/20/2016 11:00 PM, Subhankar Chattopadhyay wrote:
> Yes, so if the slave is behind I need to start over pg_basebackup. I saw
> that, according to the documentation, this query gives us the replication
> state. Can somebody tell me if this would be sufficient to know whether I
> need to start the backup over?

if the slave is behind but is catching up, no, restarting replication would
be overkill. only if the slave gets so far behind that it can't catch up,
and in that case, a wal archive would be a better choice than a new base
backup.

I've never run into these problems as I run on dedicated hardware servers,
which don't have all these reliability and performance problems. a complete
server failure requiring a full rebuild is something that would happen less
than annually.

--
john r pierce, recycling bits in santa cruz
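For what it's worth, a lag check run on the master might look like this. This is a sketch using the 9.4 catalog columns discussed in the thread; the threshold logic is up to you:

```sql
-- On the master: each standby's state and how far behind it is, in bytes
-- of WAL.  state = 'streaming' means the standby is connected and replaying.
SELECT application_name,
       state,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             replay_location) AS replay_lag_bytes
FROM pg_stat_replication;
```

A standby that is absent from this view, or whose lag approaches wal_keep_segments * 16MB, is at risk of needing the WAL archive (or, failing that, a new base backup).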
Re: [GENERAL] pg_basebackup on slave running for a long time
Yes, so if the slave is behind I need to start over pg_basebackup. I saw
that, according to the documentation, this query gives us the replication
state. Can somebody tell me if this would be sufficient to know whether I
need to start the backup over?

On 21 Nov 2016 12:18, "John R Pierce" wrote:
> On 11/20/2016 10:37 PM, Subhankar Chattopadhyay wrote:
>> We are using the database in a cloud setup and the nodes are running
>> in VMs. The slave can fall behind for various reasons as you have
>> stated, like slave is shut down or high write workload.
>>
>> We don't use replication slot but the wal_keep_segments is high enough
>> for us, 5000 to be exact.
>>
>> In that case, we can do pg_basebackup only when necessary, like only
>> when slave is behind. We can check that from the query "SELECT state
>> FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be
>> correct way to do it?
>
> maintaining a wal archive, and configuring the slaves so they can recover
> from it when they are restarted, will likely take care of things if they
> are getting so far behind that 5000 wal segments is insufficient.
>
> I'm not that familiar with pg_stat_replication, I'm not sure what that
> query is telling you. others can probably chime in with more info on this.
>
> --
> john r pierce, recycling bits in santa cruz
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_basebackup on slave running for a long time
On 11/20/2016 10:37 PM, Subhankar Chattopadhyay wrote:
> We are using the database in a cloud setup and the nodes are running
> in VMs. The slave can fall behind for various reasons as you have
> stated, like slave is shut down or high write workload.
>
> We don't use replication slot but the wal_keep_segments is high enough
> for us, 5000 to be exact.
>
> In that case, we can do pg_basebackup only when necessary, like only
> when slave is behind. We can check that from the query "SELECT state
> FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be
> correct way to do it?

maintaining a wal archive, and configuring the slaves so they can recover
from it when they are restarted, will likely take care of things if they
are getting so far behind that 5000 wal segments is insufficient.

I'm not that familiar with pg_stat_replication, and I'm not sure what that
query is telling you. others can probably chime in with more info on this.

--
john r pierce, recycling bits in santa cruz
Re: [GENERAL] pg_basebackup on slave running for a long time
Hi John,

We are using the database in a cloud setup and the nodes are running in
VMs. The slave can fall behind for various reasons as you have stated,
like the slave being shut down or a high write workload.

We don't use a replication slot, but wal_keep_segments is high enough for
us, 5000 to be exact.

In that case, we can do pg_basebackup only when necessary, i.e. only when
the slave is behind. We can check that from the query "SELECT state FROM
pg_stat_replication WHERE pid <> pg_backend_pid();" Would that be the
correct way to do it?

On Mon, Nov 21, 2016 at 11:42 AM, John R Pierce wrote:
> On 11/20/2016 10:00 PM, Subhankar Chattopadhyay wrote:
>> Thanks for the reply. In situations where the slave is behind the
>> master, if I don't start over, will it catch up automatically?
>> I am using version 9.4.
>
> it should stay within a few seconds under normal conditions. why is it
> falling behind? is your write workload too high for the speed of the
> connection between the hosts? or is the slave shut down for some period
> of time?
>
> If the slave is intermittently offline, and if you're using replication
> slots (that was a new feature in 9.4), then the master will hold a queue
> of pending data as long as is needed until the slave catches up again.
>
> Alternately, you can increase wal_keep_segments for a long enough
> interval to cover the worst case time the slave is offline, or you can
> implement a wal archive that the slave can recover from when resuming
> streaming replication.
>
> See https://www.postgresql.org/docs/9.4/static/warm-standby.html for
> discussions of these various options.
> --
> john r pierce, recycling bits in santa cruz

--
Subhankar Chattopadhyay
Bangalore, India
Re: [GENERAL] pg_basebackup on slave running for a long time
On 11/20/2016 10:00 PM, Subhankar Chattopadhyay wrote:
> Thanks for the reply. In situations where the slave is behind the
> master, if I don't start over, will it catch up automatically?
> I am using version 9.4.

it should stay within a few seconds under normal conditions. why is it
falling behind? is your write workload too high for the speed of the
connection between the hosts? or is the slave shut down for some period of
time?

If the slave is intermittently offline, and if you're using replication
slots (that was a new feature in 9.4), then the master will hold a queue
of pending data as long as is needed until the slave catches up again.

Alternately, you can increase wal_keep_segments for a long enough interval
to cover the worst case time the slave is offline, or you can implement a
wal archive that the slave can recover from when resuming streaming
replication.

See https://www.postgresql.org/docs/9.4/static/warm-standby.html for
discussions of these various options.

--
john r pierce, recycling bits in santa cruz
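A sketch of the WAL-archive option mentioned above, for 9.4. The paths and connection details are placeholders; any shared, durable location the standby can read works:

```conf
# postgresql.conf on the master: keep an archive the standby can fall back
# to when streaming falls too far behind (requires wal_level >= archive)
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

# recovery.conf on the standby: stream normally, but pull missing segments
# from the archive when resuming after an outage
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
restore_command = 'cp /mnt/wal_archive/%f %p'
```

With this in place, a standby that outruns wal_keep_segments recovers from the archive instead of needing a fresh pg_basebackup.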
Re: [GENERAL] pg_basebackup on slave running for a long time
Hi John,

Thanks for the reply. In situations where the slave is behind the master,
if I don't start over, will it catch up automatically? I am using version
9.4.

On Mon, Nov 21, 2016 at 11:22 AM, John R Pierce wrote:
> On 11/20/2016 9:29 PM, Subhankar Chattopadhyay wrote:
>> We have set up a PostgreSQL master-slave topology with streaming
>> replication.
>> One of the steps for setting up streaming replication is to do
>> pg_basebackup on the slave from the master.
>>
>> For subsequent updates of this database, this step is repeated every
>> time, deleting the existing data copy on the slave and running
>> pg_basebackup again.
>>
>> For a huge data size of over 500GB, it takes a lot of time to copy
>> the data from master to slave.
>> We were looking for some optimization technique so that it doesn't have
>> to copy the whole data in every update of the system.
>>
>> Is there a way to do that? Can somebody throw some light on this?
>
> if you have streaming replication, why do you delete it and start
> over??!?
>
> streaming replication should replicate all updates of the master in near
> realtime to the slave(s).
>
> --
> john r pierce, recycling bits in santa cruz

--
Subhankar Chattopadhyay
Bangalore, India
Re: [GENERAL] pg_basebackup on slave running for a long time
On 11/20/2016 9:29 PM, Subhankar Chattopadhyay wrote:
> We have set up a PostgreSQL master-slave topology with streaming
> replication.
> One of the steps for setting up streaming replication is to do
> pg_basebackup on the slave from the master.
>
> For subsequent updates of this database, this step is repeated every
> time, deleting the existing data copy on the slave and running
> pg_basebackup again.
>
> For a huge data size of over 500GB, it takes a lot of time to copy
> the data from master to slave.
> We were looking for some optimization technique so that it doesn't have
> to copy the whole data in every update of the system.
>
> Is there a way to do that? Can somebody throw some light on this?

if you have streaming replication, why do you delete it and start over??!?

streaming replication should replicate all updates of the master in near
realtime to the slave(s).

--
john r pierce, recycling bits in santa cruz
[GENERAL] pg_basebackup on slave running for a long time
Hi Team,

We have set up a PostgreSQL master-slave topology with streaming
replication. One of the steps for setting up streaming replication is to
do pg_basebackup on the slave from the master.

For subsequent updates of this database, this step is repeated every time,
deleting the existing data copy on the slave and running pg_basebackup
again.

For a huge data size of over 500GB, it takes a lot of time to copy the
data from master to slave. We were looking for some optimization technique
so that it doesn't have to copy the whole data in every update of the
system.

Is there a way to do that? Can somebody throw some light on this?

Subhankar Chattopadhyay
Bangalore, India
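For reference, the step being repeated looks roughly like this (host, user, and data directory are placeholders):

```shell
# Full base backup of the master, run on the standby.  -X stream carries
# the WAL generated during the copy alongside the data; -R (9.3+) writes a
# minimal recovery.conf so the standby starts streaming on boot.
pg_basebackup -h master.example.com -U replicator \
    -D /var/lib/postgresql/9.4/main -X stream -P -R
```

The point of the thread is that, with streaming replication healthy, this full copy should only ever be needed once per standby.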
Re: [GENERAL] Feature request: separate logging
On Sun, Nov 20, 2016 at 12:56 PM, Guillaume Lelarge wrote:
> On 18 Nov 2016 2:03 PM, "otheus uibk" wrote:
>> Finally, if these changes can be implemented, is it impossible to
>> backport them to prior versions, say 9.1 and up? If I wrote a patch,
>> under what conditions would the patch be accepted for inclusion in
>> official releases of older versions?
>
> Only bug fixes are accepted in older releases. What you're proposing
> isn't a bug fix.

To put it in other words: if you are willing to write a patch that would
be accepted by upstream, the first version where it could land is Postgres
10. Currently this version is still in development, and the last commit
fest will be in March. See more details here:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

If you want to get something done in older versions, it may be interesting
to look at the hook in elog.c and try to implement what you are looking
for as a plugin loaded by the server at startup using
shared_preload_libraries. Reading roughly what you wrote above, I guess
that it could be possible: the elog hook offers ways to redirect message
strings generated by the server the way you want. I wrote such a thing
some time ago to generate logs in JSON format; you may want to look at it
for more ideas:
https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
--
Michael
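A minimal sketch of what such a plugin can look like, for orientation. This illustrates the emit_log_hook exposed via elog.h, not Michael's jsonlog code, and the filtering rule shown (suppressing LOG-level lines from the server log) is an invented example:

```c
/* Build as a shared library and load via shared_preload_libraries. */
#include "postgres.h"
#include "utils/elog.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static emit_log_hook_type prev_log_hook = NULL;

static void
my_log_hook(ErrorData *edata)
{
    /* Example rule: keep LOG-level messages out of the server log while
     * still sending them to the client, if any. */
    if (edata->elevel == LOG)
        edata->output_to_server = false;

    /* Chain to any previously installed hook. */
    if (prev_log_hook)
        prev_log_hook(edata);
}

void
_PG_init(void)
{
    prev_log_hook = emit_log_hook;
    emit_log_hook = my_log_hook;
}
```

The hook fires for every message the server emits, so anything like the separate-logging behaviour the original poster asked about would be implemented inside my_log_hook.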
Re: [GENERAL] Strict min and max aggregate functions
Jeff Janes writes:
> On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule wrote:
>> you can use composite type instead of array too.
>
> I tried a composite type of (flag int, value anyelement) but you can't
> use anyelement in a composite type. So the aggregate function couldn't
> be polymorphic. Or, that was my conclusion after making a few attempts.
> Maybe I need to give up on polymorphism if I want to get performance?

I would bet on a composite type being as slow or slower than the array
solution. You could do a quick test with a non-polymorphic definition
just to see, but I doubt it's a promising avenue.

regards, tom lane
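A quick way to run the non-polymorphic test Tom suggests, with polymorphism given up for plain int. All names here are invented, and this is a sketch intended to match the semantics of the array version upthread, not a vetted implementation:

```sql
-- State: has a NULL been seen, and the minimum so far.  The aggregate
-- starts with a NULL state (no initcond), meaning "no rows yet".
CREATE TYPE strict_min_state_int AS (seen_null boolean, val int);

CREATE OR REPLACE FUNCTION strict_min_agg_int (strict_min_state_int, int)
RETURNS strict_min_state_int LANGUAGE sql IMMUTABLE AS $$
SELECT CASE
    WHEN $1 IS NULL THEN ROW($2 IS NULL, $2)::strict_min_state_int
    WHEN ($1).seen_null OR $2 IS NULL
        THEN ROW(true, NULL::int)::strict_min_state_int
    ELSE ROW(false, least(($1).val, $2))::strict_min_state_int END;
$$;

CREATE OR REPLACE FUNCTION strict_min_final_int (strict_min_state_int)
RETURNS int LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN ($1).seen_null THEN NULL ELSE ($1).val END;
$$;

CREATE AGGREGATE strict_min_int (int) (
    sfunc = strict_min_agg_int,
    stype = strict_min_state_int,
    finalfunc = strict_min_final_int
);
```

Timing this against the anyarray version on the same data would settle whether the composite state buys anything.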
Re: [GENERAL] Strict min and max aggregate functions
2016-11-20 20:18 GMT+01:00 Jeff Janes:
> On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule wrote:
>> 2016-11-19 22:12 GMT+01:00 Jeff Janes:
>>> I need "strict" MIN and MAX aggregate functions, meaning they return
>>> NULL upon any NULL input, and behave like the built-in aggregates if
>>> none of the input values are NULL.
>>>
>>> This doesn't seem like an outlandish thing to want, and I'm surprised
>>> I can't find other discussion of it. Perhaps because none of the
>>> words here are very effective as search terms, as they are so
>>> individually common.
>>>
>>> I've hit upon a solution that works, but it is both ugly and slow
>>> (about 50-fold slower than the built-ins; for my current purpose this
>>> is not a big problem, but I would love it to be faster if that could
>>> be done easily).
>>>
>>> So here is my approach. Any suggestions to improve it? Or are there
>>> better canned solutions I've failed to find?
>>>
>>> [strict_min_agg, strict_min_final, and strict_min definitions snipped]
>>
>> can you use plpgsql instead of sql?
>
> I can. Would there be an advantage?

PL/pgSQL uses prepared statements, so the expressions should be evaluated
faster. In this case SQL inlining is not possible anyway.

>> you can use composite type instead of array too.
>
> I tried a composite type of (flag int, value anyelement) but you can't
> use anyelement in a composite type. So the aggregate function couldn't
> be polymorphic. Or, that was my conclusion after making a few attempts.
> Maybe I need to give up on polymorphism if I want to get performance?
>
> Cheers,
>
> Jeff
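Pavel's suggestion, sketched concretely. This is the same logic as the SQL transition function posted upthread, moved into PL/pgSQL so each expression is planned once and reused; untested here:

```sql
CREATE OR REPLACE FUNCTION strict_min_agg (anyarray, anyelement)
RETURNS anyarray LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    IF $1 IS NULL THEN
        RETURN ARRAY[$2];            -- first value seen
    ELSIF $1[1] IS NULL THEN
        RETURN $1;                   -- a NULL was already seen; stay NULL
    ELSIF $2 IS NULL THEN
        RETURN ARRAY[$2];            -- use $2, not NULL, to preserve type
    ELSE
        RETURN ARRAY[least($1[1], $2)];
    END IF;
END;
$$;
```

The final function and CREATE AGGREGATE from the original post can stay as they are; only the sfunc changes.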
Re: [GENERAL] Strict min and max aggregate functions
On Sun, Nov 20, 2016 at 2:45 AM, Pavel Stehule wrote:
> 2016-11-19 22:12 GMT+01:00 Jeff Janes:
>> I need "strict" MIN and MAX aggregate functions, meaning they return
>> NULL upon any NULL input, and behave like the built-in aggregates if
>> none of the input values are NULL.
>>
>> This doesn't seem like an outlandish thing to want, and I'm surprised
>> I can't find other discussion of it. Perhaps because none of the words
>> here are very effective as search terms, as they are so individually
>> common.
>>
>> I've hit upon a solution that works, but it is both ugly and slow
>> (about 50-fold slower than the built-ins; for my current purpose this
>> is not a big problem, but I would love it to be faster if that could
>> be done easily).
>>
>> So here is my approach. Any suggestions to improve it? Or are there
>> better canned solutions I've failed to find?
>>
>> -- If no values have been delivered to the aggregate, the internal
>> -- state is the NULL array. If a null value has been delivered, the
>> -- internal state is an array with one element, which is NULL.
>> -- Otherwise, it is an array with one element, the least/greatest seen
>> -- so far.
>>
>> CREATE OR REPLACE FUNCTION strict_min_agg (anyarray, anyelement)
>> RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
>> SELECT CASE
>>     WHEN $1 IS NULL THEN ARRAY[$2]
>>     WHEN $1[1] IS NULL THEN $1
>>     WHEN $2 IS NULL THEN ARRAY[$2]  -- use $2 not NULL to preserve type
>>     ELSE ARRAY[least($1[1],$2)] END;
>> $$;
>>
>> CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
>> RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
>> SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END;
>> $$;
>>
>> CREATE AGGREGATE strict_min (x anyelement) (
>>     sfunc = strict_min_agg,
>>     stype = anyarray,
>>     finalfunc = strict_min_final
>> );
>
> can you use plpgsql instead of sql?

I can. Would there be an advantage?

> you can use composite type instead of array too.

I tried a composite type of (flag int, value anyelement) but you can't use
anyelement in a composite type. So the aggregate function couldn't be
polymorphic. Or, that was my conclusion after making a few attempts. Maybe
I need to give up on polymorphism if I want to get performance?

Cheers,

Jeff
Re: [GENERAL] [HACKERS] How to change order sort of table in HashJoin
Man writes:
> Additional information.
> In 9.6 the second table (with fewer tuples) was chosen (the same test
> data).
> There is something (cost estimation?) different in previous versions.

I'd bet on different statistics in the two installations (either you
forgot to ANALYZE, or the random sample came up quite a bit different).
And I'm a little suspicious that these tests weren't all done with the
same work_mem setting.

regards, tom lane
Re: R: [GENERAL] Autovacuum and frequent pg_bulkload
On 11/20/2016 07:20 AM, Job wrote:
> Hi Adrian,
> Thank you for your fast reply!
>
>> FYI, Postgres 8.4 is over two years past EOL.
>
> Yes, I am aware; we will migrate next year, but for application
> compatibility reasons we have to remain here for the moment..
>
>> What are the steps in the load/delete cycle?
>
> We need to load, with pg_bulkload, log data for reporting. We load about
> one million rows every hour, and twice a day we generate aggregated
> reports. Then we delete old rows (we cannot use TRUNCATE because we only
> delete rows older than 24 hours).
>
> I think, but I can be wrong, that pg_bulkload does not reuse the free
> space previously marked by the delete..

Can you be more specific? In other words, show the actual commands you
use.

Also, how are you defining free space: space the OS sees or space Postgres
sees? How are you determining the free space?

> Only VACUUM FULL empties free space, but the table becomes locked.

VACUUM FULL returns space to the OS, if that is what you mean?

> Thank you
> Francesco

--
Adrian Klaver
adrian.kla...@aklaver.com
R: [GENERAL] Autovacuum and frequent pg_bulkload
Hi Adrian,

Thank you for your fast reply!

> FYI, Postgres 8.4 is over two years past EOL.

Yes, I am aware; we will migrate next year, but for application
compatibility reasons we have to remain here for the moment..

> What are the steps in the load/delete cycle?

We need to load, with pg_bulkload, log data for reporting. We load about
one million rows every hour, and twice a day we generate aggregated
reports. Then we delete old rows (we cannot use TRUNCATE because we only
delete rows older than 24 hours).

I think, but I can be wrong, that pg_bulkload does not reuse the free
space previously marked by the delete.. Only VACUUM FULL empties free
space, but the table becomes locked.

Thank you
Francesco
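One pattern worth considering here, since the constraint is "delete only rows older than 24 hours": inheritance partitioning by day, which 8.4 supports, turns the expiry into a cheap TRUNCATE or DROP of one child instead of a bloating DELETE. A sketch only; the table and column names are invented, and routing/loading details depend on how pg_bulkload targets the child:

```sql
-- Parent holds no rows itself; each day's logs go into a child table
-- constrained to that day, so constraint exclusion can skip it in reports.
CREATE TABLE logs (logged_at timestamp NOT NULL, message text);

CREATE TABLE logs_2016_11_20 (
    CHECK (logged_at >= '2016-11-20' AND logged_at < '2016-11-21')
) INHERITS (logs);

-- Hourly loads target the current day's child directly; expiring a day is
-- then instantaneous and leaves no dead tuples behind:
DROP TABLE logs_2016_11_19;
```

This sidesteps both the autovacuum question and the multi-hour VACUUM FULL, at the cost of managing the daily children.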
Re: [GENERAL] Autovacuum and frequent pg_bulkload
On 11/20/2016 04:51 AM, Job wrote:
> Hello,
>
> I use PostgreSQL 8.4.22 and, two times a day, I import logs with
> pg_bulkload and I delete old data. I move some millions of records in a
> day.

FYI, Postgres 8.4 is over two years past EOL.

> I noticed that autovacuum alone seems not to be able to free unused
> space.

Do you run ANALYZE at any point in your procedure?

What are the steps in the load/delete cycle?

> I need a periodical VACUUM FULL, but this operation takes many hours.
>
> Do you think it is because pg_bulkload is not able to "see" free-marked
> space to use when loading new data?
>
> Thank you,
> /F

--
Adrian Klaver
adrian.kla...@aklaver.com
[GENERAL] Autovacuum and frequent pg_bulkload
Hello,

I use PostgreSQL 8.4.22 and, two times a day, I import logs with
pg_bulkload and I delete old data. I move some millions of records in a
day.

I noticed that autovacuum alone seems not to be able to free unused space.
I need a periodical VACUUM FULL, but this operation takes many hours.

Do you think it is because pg_bulkload is not able to "see" free-marked
space to use when loading new data?

Thank you,
/F
Re: [GENERAL] Strict min and max aggregate functions
Jeff Janes wrote on 19.11.2016 at 22:12:
> I need "strict" MIN and MAX aggregate functions, meaning they return
> NULL upon any NULL input, and behave like the built-in aggregates if
> none of the input values are NULL.
>
> This doesn't seem like an outlandish thing to want, and I'm surprised I
> can't find other discussion of it. Perhaps because none of the words
> here are very effective as search terms, as they are so individually
> common.
>
> I've hit upon a solution that works, but it is both ugly and slow
> (about 50-fold slower than the built-ins; for my current purpose this
> is not a big problem, but I would love it to be faster if that could be
> done easily).

This is not really pretty either, but it might be faster:

    select a,
           case when group_count = nn_count then min_b end as min_b
    from (
      select a,
             min(b) as min_b,
             count(b) as nn_count,
             count(*) as group_count
      from x
      group by a
    ) t;

As the expensive part is the GROUP BY, I wouldn't expect the additional
aggregates to make a big difference.

Alternatively:

    select a,
           case when no_nulls then min_b end as min_b
    from (
      select a,
             min(b) as min_b,
             bool_and(b is not null) as no_nulls
      from x
      group by a
    ) t;
Re: [GENERAL] Strict min and max aggregate functions
2016-11-19 22:12 GMT+01:00 Jeff Janes:
> I need "strict" MIN and MAX aggregate functions, meaning they return
> NULL upon any NULL input, and behave like the built-in aggregates if
> none of the input values are NULL.
>
> This doesn't seem like an outlandish thing to want, and I'm surprised I
> can't find other discussion of it. Perhaps because none of the words
> here are very effective as search terms, as they are so individually
> common.
>
> I've hit upon a solution that works, but it is both ugly and slow
> (about 50-fold slower than the built-ins; for my current purpose this
> is not a big problem, but I would love it to be faster if that could be
> done easily).
>
> So here is my approach. Any suggestions to improve it? Or are there
> better canned solutions I've failed to find?
>
> -- If no values have been delivered to the aggregate, the internal
> -- state is the NULL array. If a null value has been delivered, the
> -- internal state is an array with one element, which is NULL.
> -- Otherwise, it is an array with one element, the least/greatest seen
> -- so far.
>
> CREATE OR REPLACE FUNCTION strict_min_agg (anyarray, anyelement)
> RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
> SELECT CASE
>     WHEN $1 IS NULL THEN ARRAY[$2]
>     WHEN $1[1] IS NULL THEN $1
>     WHEN $2 IS NULL THEN ARRAY[$2]  -- use $2 not NULL to preserve type
>     ELSE ARRAY[least($1[1],$2)] END;
> $$;
>
> CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
> RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
> SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END;
> $$;
>
> CREATE AGGREGATE strict_min (x anyelement) (
>     sfunc = strict_min_agg,
>     stype = anyarray,
>     finalfunc = strict_min_final
> );

can you use plpgsql instead of sql?

you can use composite type instead of array too.

Regards

Pavel
Re: [GENERAL] [HACKERS] How to change order sort of table in HashJoin
Thanks for the response, sir.

On 11/20/2016 1:18 AM, Tom Lane wrote:
> Man Trieu writes:
>> As in the example below, I think the plan in which the hash table is
>> created on testtbl2 (the one with fewer tuples) should be chosen.
>
> The planner usually prefers to hash on the table that has a flatter MCV
> histogram, since a hash table with many key collisions will be
> inefficient. You might find it illuminating to read the comments around
> estimate_hash_bucketsize().

Thanks, I will read it.

Additional information: in 9.6 the second table (fewer tuples) was chosen
(the same test data). There is something (cost estimation?) different in
previous versions.

--- In 9.6.1 ---
postgres=# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=6935.57..60389.58 rows=1 width=60) (actual time=80.214..1165.762 rows=142857 loops=1)
   Hash Cond: ((testtbl1.c1 = testtbl2.c1) AND (testtbl1.c2 = testtbl2.c2) AND (testtbl1.c3 = testtbl2.c3))
   ->  Seq Scan on testtbl1  (cost=0.00..21276.00 rows=100 width=56) (actual time=0.038..226.324 rows=100 loops=1)
   ->  Hash  (cost=3039.57..3039.57 rows=142857 width=56) (actual time=79.632..79.632 rows=142857 loops=1)
         Buckets: 65536  Batches: 4  Memory Usage: 3658kB
         ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.028..20.646 rows=142857 loops=1)
 Planning time: 0.252 ms
 Execution time: 1174.588 ms
(8 rows)

--- In 9.4.10 ---
postgres=# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=48542.00..67353.86 rows=1 width=60) (actual time=880.580..1277.611 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.016..24.421 rows=142857 loops=1)
   ->  Hash  (cost=21276.00..21276.00 rows=100 width=56) (actual time=878.296..878.296 rows=100 loops=1)
         Buckets: 8192  Batches: 32  Memory Usage: 2839kB
         ->  Seq Scan on testtbl1  (cost=0.00..21276.00 rows=100 width=56) (actual time=0.025..258.193 rows=100 loops=1)
 Planning time: 2.683 ms
 Execution time: 1285.868 ms
(8 rows)

> In general, given a hashtable that fits in memory and light bucket
> loading, a hash join is more or less O(M) + O(N); it doesn't matter so
> much whether the larger table is on the inside. It does matter if the
> table gets big enough to force batching of the join, but that's not
> happening in your example (at least not the first one; it's unclear to
> me why it did happen in the second one).
>
> The key thing that will drive the choice, then, is avoiding a skewed
> bucket distribution that causes lots of comparisons for common values.
>
> regards, tom lane

Thanks and best regards,
Re: [GENERAL] Strict min and max aggregate functions
> CREATE OR REPLACE FUNCTION strict_min_agg (anyarray, anyelement)
> RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
> SELECT CASE
>     WHEN $1 IS NULL THEN ARRAY[$2]
>     WHEN $1[1] IS NULL THEN $1
>     WHEN $2 IS NULL THEN ARRAY[$2]  -- use $2 not NULL to preserve type
>     ELSE ARRAY[least($1[1],$2)] END;
> $$;
>
> CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
> RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
> SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END;
> $$;
>
> CREATE AGGREGATE strict_min (x anyelement) (
>     sfunc = strict_min_agg,
>     stype = anyarray,
>     finalfunc = strict_min_final
> );

It seems like this should be possible to do in something closer to
O(log n). But I'm not sure how to fix the semantics with aggregates.

SELECT max() FROM ;
SELECT true FROM WHERE IS NULL LIMIT 1;

Both these queries can be resolved with an index lookup (if one is
available).
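That idea can be expressed as a plain query rather than an aggregate. A sketch, assuming a table t with a b-tree index on column b (both names invented); each subquery can then be answered by an index probe, since b-tree indexes store NULLs and max() over an indexed column is an index-end lookup:

```sql
-- Strict max over t.b: NULL if any b is NULL, otherwise max(b).
SELECT CASE
         WHEN EXISTS (SELECT 1 FROM t WHERE b IS NULL) THEN NULL
         ELSE (SELECT max(b) FROM t)
       END AS strict_max;
```

This loses the generality of an aggregate (no GROUP BY without more work), which is the semantic difficulty the post alludes to.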
Re: [GENERAL] Trim performance on 9.5
On 18/11/2016 at 16:58, William Ivanski wrote:
> I just ran EXPLAIN ANALYZE, please see the images attached. The field
> doesn't have an index.
>
> On Fri, 18 Nov 2016 at 12:16, vinny wrote:
>> On 2016-11-18 15:06, William Ivanski wrote:
>>> Hi,
>>>
>>> I recently made major performance improvements in our routines by
>>> simply removing the calls to trim functions at specific bottlenecks.
>>> Please see the images attached for a simple example.
>>>
>>> I'm using PostgreSQL version 9.5.5-1.pgdg80+1 on Debian 8.6. Does
>>> someone know if it's a bug in the trim function? Thanks in advance.
>>>
>>> --
>>> William Ivanski
>>
>> Did you run EXPLAIN on these queries? I'm guessing that you have an
>> index on the field, but not on TRIM(field), which would mean that the
>> database is forced to seqscan: fetch every row value, trim it, and then
>> compare it.
>
> --
> William Ivanski

Neither execution time is really "fast"; I'd suggest creating an index on
the TRIM() version of the field.
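The suggestion above, sketched with invented table and column names. An expression index lets the planner satisfy comparisons against TRIM(field) from the index instead of seq-scanning and trimming every row:

```sql
-- Index the trimmed value, not the raw column:
CREATE INDEX idx_mytable_field_trim ON mytable (trim(field));

-- This query can now use the index, provided the query writes exactly the
-- same expression as the index definition:
SELECT * FROM mytable WHERE trim(field) = 'some value';
```

A plain index on (field) would not help here, because the planner only matches an index to a query when the indexed expression appears verbatim.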