[GENERAL] Very suspicious plan difference for select and corresponding delete queries PostgreSQL 9.6.2
Hi, I found a case with a very curious plan difference between this select:

explain select * from jobs_to_delete WHERE EXISTS(SELECT 1 FROM job_stats_new_201411 jsm WHERE jsm.job_reference = jobs_to_delete.job_reference);

 QUERY PLAN
 Hash Join  (cost=239386.25..376872.49 rows=111372 width=41)
   Hash Cond: ((jobs_to_delete.job_reference)::text = (jsm.job_reference)::text)
   ->  Seq Scan on jobs_to_delete  (cost=0.00..101547.10 rows=9286780 width=41)
   ->  Hash  (cost=237994.10..237994.10 rows=111372 width=18)
         ->  HashAggregate  (cost=236880.38..237994.10 rows=111372 width=18)
               Group Key: (jsm.job_reference)::text
               ->  Index Only Scan using job_stats_master_201411_job_reference_idx_ebs on job_stats_new_201411 jsm  (cost=0.56..214784.97 rows=8838161 width=18)

and the corresponding delete (which I suspect should have the same plan):

explain delete from jobs_to_delete WHERE EXISTS(SELECT 1 FROM job_stats_new_201411 jsm WHERE jsm.job_reference = jobs_to_delete.job_reference);

 QUERY PLAN
 Delete on jobs_to_delete  (cost=266351.88..403838.13 rows=111372 width=12)
   ->  Hash Join  (cost=266351.88..403838.13 rows=111372 width=12)
         Hash Cond: ((jobs_to_delete.job_reference)::text = (jsm.job_reference)::text)
         ->  Seq Scan on jobs_to_delete  (cost=0.00..101547.10 rows=9286780 width=43)
         ->  Hash  (cost=264959.73..264959.73 rows=111372 width=24)
               ->  HashAggregate  (cost=263846.01..264959.73 rows=111372 width=24)
                     Group Key: (jsm.job_reference)::text
                     ->  Seq Scan on job_stats_new_201411 jsm  (cost=0.00..241750.61 rows=8838161 width=24)

A manual ANALYZE of both tables didn't change the result. I can see no possible/logical/realistic reason for the database to switch between an INDEX ONLY SCAN and a SEQ SCAN for the EXISTS part of the query in these two cases. I'm not sure it's a bug, so I'm posting to -general first.
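One way to narrow this down (a diagnostic sketch, not a diagnosis; it reuses the table and index names from the plans above) is to disable seq scans for a single session and see whether the planner then picks the index-only scan for the DELETE too, and at what estimated cost:

```sql
-- Session-local experiment: penalize seq scans so the planner reveals
-- whether it considers the index-only scan path viable for the DELETE.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN DELETE FROM jobs_to_delete
WHERE EXISTS (SELECT 1 FROM job_stats_new_201411 jsm
              WHERE jsm.job_reference = jobs_to_delete.job_reference);
ROLLBACK;  -- SET LOCAL is discarded with the transaction
```

If the plan still avoids the index-only scan even at a much higher cost, the path is being rejected outright rather than merely losing on cost.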
-- Maxim Boguk Senior Postgresql DBA http://dataegret.com/ <http://www.postgresql-consulting.com/> Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b Skype: maxim.boguk Jabber: maxim.bo...@gmail.com "People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage."
Re: [GENERAL] text_pattern_ops index not being used for prefix query
On Fri, Oct 21, 2016 at 8:42 AM, Alexander Staubo <a...@purefiction.net> wrote:
> Indexing on a text column:
>
> create index index_documents_on_id_with_pattern_ops (id text_pattern_ops);
>
> This works fine:
>
> explain select id from documents where id like 'dingbat%';
>  Index Only Scan using index_documents_on_id_with_pattern_ops on documents  (cost=0.56..8.58 rows=736 width=19)
>    Index Cond: ((id >= 'dingbat'::text) AND (id < 'dingbau'::text))
>    Filter: (id ~~ 'dingbat%'::text)
>
> But for some reason, if an underscore character appears in my search string, it falls back to a disastrously slow seqscan:
>
> explain select id from documents where id like '_dingbat%';
>  Seq Scan on documents  (cost=0.00..779238.28 rows=736 width=19)
>    Filter: (id ~~ '_dingbat%'::text)
>
> Is this because of PostgreSQL's collation system? Using "C" doesn't work either.

An underscore in a LIKE pattern has a special meaning: it matches any single character, so '_dingbat%' has no fixed prefix the index could use. From the documentation at https://www.postgresql.org/docs/9.4/static/functions-matching.html : "An underscore (_) in pattern stands for (matches) any single character;" This could be useful as well: "To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character."
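A minimal sketch of the fix (assuming the same documents table and index as above): escaping the underscore restores a fixed prefix that the planner can turn into an index range condition.

```sql
-- '_' is a wildcard here: no usable prefix, so the planner seq-scans.
SELECT id FROM documents WHERE id LIKE '_dingbat%';

-- Backslash-escape the underscore to match it literally; the pattern now
-- has the fixed prefix '_dingbat' and can use the text_pattern_ops index.
SELECT id FROM documents WHERE id LIKE '\_dingbat%';

-- Equivalently, with an explicit escape character:
SELECT id FROM documents WHERE id LIKE '#_dingbat%' ESCAPE '#';
```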
Re: [GENERAL] Insert only table and size of GIN index JSONB field.
On Wed, May 4, 2016 at 3:45 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, May 3, 2016 at 3:09 AM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> > Hi,
> >
> > I started with an empty table with an index over a
> > custom_fields | jsonb
> > field, defined as:
> > "idx_learners_custom_fields" gin (custom_fields)
> > Globally gin_pending_list_limit is set to 2MB.
> > Database version is 9.5.2.
> >
> > Now the question:
> > if the table is populated with 1M records in a single transaction, the final size of the GIN index is 4265 MB, but after I performed reindex index idx_learners_custom_fields; the index size was reduced 15x, to 295 MB.
> >
> > Is this behavior expected?
>
> This sounds like a known issue, fixed in 9.6, where the pages used for the pending list are not eligible for recycling until the table is vacuumed. Autovacuum does not run on insert-only tables, so the pages just accumulate as empty pages in the index.
>
> Cheers,
>
> Jeff

Hi Jeff,

Yes, that looks like the cause. Is any workaround possible for 9.5.2? At the moment I see a palliative cure (palliative because it doesn't shrink an already bloated index) via a manual vacuum on the table after each batch insert, or a very slow (for a large table) full-scale fix via create new index / drop old index.
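The two workarounds described above can be sketched like this (a sketch for 9.5, reusing the learners table and index names from this thread; VACUUM is what lets the emptied pending-list pages become recyclable):

```sql
-- Palliative: vacuum after each batch so pages freed when the GIN pending
-- list is flushed are recycled instead of accumulating as empty pages.
BEGIN;
INSERT INTO learners (custom_fields) SELECT ... ;  -- batch load
COMMIT;
VACUUM learners;  -- must run outside a transaction block

-- For an index that is already bloated, rebuilding is the only fix on 9.5:
CREATE INDEX CONCURRENTLY idx_learners_custom_fields_new
    ON learners USING gin (custom_fields);
DROP INDEX idx_learners_custom_fields;
ALTER INDEX idx_learners_custom_fields_new RENAME TO idx_learners_custom_fields;
```

CONCURRENTLY avoids blocking writes during the rebuild, at the cost of a slower build.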
Re: [GENERAL] Insert only table and size of GIN index JSONB field.
On Tue, May 3, 2016 at 8:09 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> Hi,
>
> I started with an empty table with an index over a
> custom_fields | jsonb
> field, defined as:
> "idx_learners_custom_fields" gin (custom_fields)
> Globally gin_pending_list_limit is set to 2MB.
> Database version is 9.5.2.
>
> Now the question:
> if the table is populated with 1M records in a single transaction, the final size of the GIN index is 4265 MB, but after I performed reindex index idx_learners_custom_fields; the index size was reduced 15x, to 295 MB.
>
> Is this behavior expected?

It looks easy to reproduce. I can send a sample dataset for analysis.

drop table if exists test;
create table test (custom_fields jsonb);
create index test_gin_key on test USING GIN(custom_fields);
insert into test select custom_fields from public.learners;
INSERT 0 100

\di+ test_gin_key
                          List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size   | Description
--------+--------------+-------+----------+-------+---------+-------------
 public | test_gin_key | index | postgres | test  | 4211 MB |

reindex index test_gin_key;

\di+ test_gin_key
                          List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size  | Description
--------+--------------+-------+----------+-------+--------+-------------
 public | test_gin_key | index | postgres | test  | 295 MB |
[GENERAL] Insert only table and size of GIN index JSONB field.
Hi,

I started with an empty table with an index over a
custom_fields | jsonb
field, defined as:
"idx_learners_custom_fields" gin (custom_fields)
Globally gin_pending_list_limit is set to 2MB. Database version is 9.5.2.

Now the question: if the table is populated with 1M records in a single transaction, the final size of the GIN index is 4265 MB, but after I performed reindex index idx_learners_custom_fields; the index size was reduced 15x, to 295 MB.

Is this behavior expected?
[GENERAL] 2x Time difference between first and subsequent run of the same query on fresh established connection (on linux, with perf data included, all query data in the shared buffers) on postgresql
Hi, I found a strange case where a query (which works through a large amount of shared buffers) runs 2x faster on the second and subsequent runs in a new connection to the db. For example:

postgres@base1:~$ psql
psql (9.3.10)

***=# explain (analyze, costs, buffers, timing) select * from transactions where "timestamp" >= '2016-02-20' and "timestamp" < '2016-02-23'::date;
 QUERY PLAN
 Index Scan using i_transactions_timestamp on transactions  (cost=0.57..138824.52 rows=1955459 width=790) (actual time=0.146..3416.477 rows=1950630 loops=1)
   Index Cond: (("timestamp" >= '2016-02-20 00:00:00'::timestamp without time zone) AND ("timestamp" < '2016-02-23'::date))
   Buffers: shared hit=1965635
 Total runtime: 3481.322 ms
(4 rows)

***=# explain (analyze, costs, buffers, timing) select * from transactions where "timestamp" >= '2016-02-20' and "timestamp" < '2016-02-23'::date;
 QUERY PLAN
 Index Scan using i_transactions_timestamp on transactions  (cost=0.57..138824.52 rows=1955459 width=790) (actual time=0.030..1812.361 rows=1950630 loops=1)
   Index Cond: (("timestamp" >= '2016-02-20 00:00:00'::timestamp without time zone) AND ("timestamp" < '2016-02-23'::date))
   Buffers: shared hit=1965635
 Total runtime: 1878.503 ms

Every run after that takes 1.8-1.9s, but if I establish a new connection to the database, the first query runs in 3.5s again. The time difference and the timing of each run are quite repeatable (+/- 100ms).

Here are perf report data for the first and the second runs.

The first run (something fishy with the kernel calls):
 19,60% postgres [kernel.kallsyms] [k] filemap_map_pages
 15,86% postgres postgres          [.] hash_search_with_hash_value
  8,20% postgres postgres          [.] heap_hot_search_buffer
  8,20% postgres postgres          [.] heap_page_prune_opt
  5,72% postgres postgres          [.] PinBuffer
  4,38% postgres [kernel.kallsyms] [k] page_fault
  4,04% postgres [kernel.kallsyms] [k] page_waitqueue
  3,55% postgres [kernel.kallsyms] [k] __wake_up_bit
  2,95% postgres postgres          [.] LWLockAcquire
  2,31% postgres [kernel.kallsyms] [k] unlock_page
  1,96% postgres [vdso]            [.] __vdso_gettimeofday
  1,83% postgres [kernel.kallsyms] [k] radix_tree_next_chunk
  1,77% postgres [kernel.kallsyms] [k] page_add_file_rmap
  1,66% postgres postgres          [.] _bt_checkkeys
  1,27% postgres postgres          [.] LWLockRelease

The second run (looks perfectly ok for this kind of query):
 27,74% postgres postgres          [.] hash_search_with_hash_value
 15,51% postgres postgres          [.] heap_hot_search_buffer
 14,28% postgres postgres          [.] heap_page_prune_opt
  9,22% postgres postgres          [.] PinBuffer
  3,94% postgres [vdso]            [.] __vdso_gettimeofday
  3,32% postgres postgres          [.] _bt_checkkeys
  3,29% postgres postgres          [.] LWLockAcquire
  2,86% postgres postgres          [.] LWLockRelease
  1,54% postgres postgres          [.] UnpinBuffer

So it looks like something strange is going on inside the Linux kernel 3.16.0 memory management (it's a vanilla kernel on bare hardware, no virtualization, swap off). The question is: does this work as expected (in which case it is probably a good idea to use pgbouncer even for one-shot analytical queries), or is it a sign of potential issues with the kernel?
[GENERAL] How to analyze locking issues of the startup process on hot standby replica?
Hi,

I am analyzing a strange case where the startup process on a replica spends almost all of its time in the waiting state: "postgres: startup process recovering 00014A3E00BF waiting". As a result, recovery lags seriously behind the master (in the range of hours). The replica has hot_standby=on and almost always has some long-running (minutes to tens of minutes) select queries. I expected to find some not-granted locks in pg_locks on the standby server, but select * from pg_locks where not granted; shows no results. Is there any way to look into the startup process and see what it is waiting for (less invasive than gdb)?

PS: no DDL is routinely run on the master (so there should be no need for an AccessExclusiveLock on the standby to apply DDL).

Kind Regards, Maksym
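One hedged way to look for the blocker from SQL (a sketch; on 9.x the startup process's waits don't always surface as ungranted pg_locks entries, e.g. buffer-pin conflicts won't show up there) is to cross-check the standby's recovery-conflict counters against its long-running queries:

```sql
-- Cumulative recovery-conflict counters on this standby, per database.
SELECT datname, confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;

-- Long-running standby queries that could be holding back replay.
SELECT pid, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
ORDER BY xact_age DESC NULLS LAST
LIMIT 10;
```

A steadily climbing counter combined with very old xact_age values points at query/replay conflict rather than anything in pg_locks.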
[GENERAL] Weird behaviour in planner (PostgreSQL v 9.2.14)
Hi,

I found very weird behaviour on the planner side, with an estimation error of ~700,000,000. The situation (with explain analyze):

EXPLAIN ANALYZE select * from person2obj WHERE p2o_id IN (SELECT p2o_id::bigint FROM (SELECT * FROM (SELECT column1 AS p2o_id FROM ( VALUES ('2056892'), ... up to 199 values total ) AS __CDP_VALUES__) AS __CDP_DATA__) AS __TARGET__ );

 QUERY PLAN
 Nested Loop  (cost=2.99..16316.34 rows=199 width=58) (actual time=0.196..1.202 rows=198 loops=1)
   ->  HashAggregate  (cost=2.99..4.98 rows=199 width=32) (actual time=0.160..0.205 rows=199 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..2.49 rows=199 width=32) (actual time=0.003..0.088 rows=199 loops=1)
   ->  Index Scan using pk_person2obj on person2obj  (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=199)
         Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)

That estimate looks quite reasonable. However, with a value list of length 200 (or more), the database switches to a completely different (and very weird) estimate of 714,143,698 rows:

 QUERY PLAN
 Nested Loop  (cost=3.00..16398.33 rows=714143698 width=58) (actual time=0.200..1.239 rows=200 loops=1)
   ->  HashAggregate  (cost=3.00..5.00 rows=200 width=32) (actual time=0.165..0.201 rows=200 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=32) (actual time=0.004..0.090 rows=200 loops=1)
   ->  Index Scan using pk_person2obj on person2obj  (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=200)
         Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)

All the estimates look ok until the final nested loop estimate of ~700,000,000.

PS: the person2obj table contains ~1.4 billion tuples; p2o_id is the primary key.
Re: [GENERAL] After configuring remote access,server can't be started
On Sun, Nov 22, 2015 at 8:54 PM, Alex Luya <alexander.l...@gmail.com> wrote:
> My postgresql 9.4 is installed in centos 6.7, and I have followed this:
>
> http://www.cyberciti.biz/faq/postgresql-remote-access-or-connection/
>
> 1, cd /var/libpgsql/9.4/data
> 2, cp postgresql.conf.sample postgresql.conf
> 3, sudo vi postgresql.conf and add two lines, and save it:
>
> listen_addresses = "*"
> tcpip_socket = true

There is no such option as tcpip_socket anymore (it was removed around 2005). So very likely your server doesn't start because your config file is no longer valid.
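For reference, a minimal remote-access setup would look something like this (a sketch; paths and the subnet are invented, adjust to your installation). Note the single quotes — postgresql.conf string values take 'single quotes' — and that tcpip_socket is simply dropped:

```
# postgresql.conf
listen_addresses = '*'    # or a comma-separated list of addresses
port = 5432

# pg_hba.conf — allow the remote subnet (example values)
# TYPE  DATABASE  USER  ADDRESS         METHOD
host    all       all   192.168.0.0/24  md5
```

After editing, restart the server and check the server log for any remaining config errors.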
Re: [GENERAL] Grant SELECT/Execute to View/Function but not underlying Table
On Thu, Sep 24, 2015 at 9:28 PM, Alex Magnum <magnum11...@gmail.com> wrote:
> Hi,
> is it possible to grant select to views and functions without the need to also grant the user the SELECT privileges to the Tables used in the views or functions?
>
> That way I could create read only users on a website and limit their access to the bare minimum.
>
> Thanks in advance for any advice on this
>
> Alex

Hi,

For functions it's possible (read about SECURITY DEFINER); for views, no, it isn't possible (a view is just query text stored in the database for future use, and nothing more).
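A minimal sketch of the SECURITY DEFINER approach (all names invented for illustration): the function body runs with the function owner's privileges, so the caller needs only EXECUTE on the function and no rights on the underlying table.

```sql
-- Created by a role that CAN read secret_table; callers don't need that right.
CREATE FUNCTION public.readonly_report()
RETURNS TABLE (id int, label text)
LANGUAGE sql
SECURITY DEFINER
-- Pin the search path so a caller can't hijack object resolution.
SET search_path = public
AS $$
    SELECT id, label FROM secret_table;
$$;

REVOKE ALL ON FUNCTION public.readonly_report() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION public.readonly_report() TO web_readonly;
```

The REVOKE matters: by default EXECUTE on new functions is granted to PUBLIC.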
Re: [GENERAL] Curious case of huge simple btree indexes bloat.
On Mon, Jun 1, 2015 at 3:18 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.bo...@gmail.com> writes:
> > On one of the databases under my support I found a very curious case of almost endless index bloat (index size stabilises around 100x of the original size). The table has 5 indexes and they all have the same bloating behaviour (growth to almost 100x and stabilisation around that amount). The original index size is 4-8MB (after manual reindex); over 5 days they all monotonically grow to 300-900MB. At the same time the table size stays pretty constant at 30-50MB (and the row count doesn't vary widely, staying between 200k and 500k).
>
> At least for the index you gave stats for, it seems like it's stabilizing at one index entry per page. This is a known possible pathological behavior if the application's usage involves heavy decimation of original entries; say, you insert sequential timestamps and then later remove all but every one-thousandth one, leaving at most one live entry on every index page. Btree can recover the totally-empty leaf pages but it has no provision for merging non-empty leaf pages, so those all stay as they are indefinitely. It would be pretty unusual for all the indexes on a table to be used like that, though.
>
> regards, tom lane

Thank you very much for the explanation. This table is part of a complicated 3-table session-info structure with a lot of short-living sessions and some very long-living ones, and the most-used ids are bigserials. That is why every indexed field on this table shows the same bad behaviour.
[GENERAL] Curious case of huge simple btree indexes bloat.
Hi,

On one of the databases under my support I found a very curious case of almost endless index bloat (index size stabilises around 100x of the original size). A graph of one index's size history is attached (the other indexes have similar time/size graphs). The table has 5 indexes and they all have the same bloating behaviour (growth to almost 100x and stabilisation around that amount). The original index size is 4-8MB (after manual reindex); over 5 days they all monotonically grow to 300-900MB. At the same time the table size stays pretty constant at 30-50MB (and the row count doesn't vary widely, staying between 200k and 500k). The table has a large amount of inserts/updates/deletes, but autovacuum is tuned to be pretty aggressive and I am sure there are no long transactions (longer than a few minutes). Also, there is no standby replica with hot_standby=on, no prepared transactions are used, and no batch deletes/inserts/updates are used. The server has plenty of RAM (the database fits into shared buffers), IO and CPU available, so there is no visible resource starvation.

Background information: PostgreSQL version 9.4.2, 64-bit, on Linux.
Table structure:

\d+ clientsession
                                  Table "public.clientsession"
     Column      |           Type           |                                Modifiers                                 | Storage
-----------------+--------------------------+--------------------------------------------------------------------------+----------
 globalsessionid | bigint                   | not null default nextval('clientsession_globalsessionid_seq'::regclass)  | plain
 deviceuid       | text                     |                                                                          | extended
 localsessionid  | bigint                   |                                                                          | plain
 createddate     | timestamp with time zone |                                                                          | plain
 lastmodified    | timestamp with time zone |                                                                          | plain
 keypairid       | bigint                   |                                                                          | plain
 sessiondataid   | bigint                   |                                                                          | plain
Indexes:
    "clientsession_pkey" PRIMARY KEY, btree (globalsessionid) CLUSTER
    "clientsession_ukey" UNIQUE CONSTRAINT, btree (deviceuid, localsessionid)
    "clientsession_keypairid_key" btree (keypairid)
    "clientsession_sessiondataid_key" btree (sessiondataid)
    "clientsession_uduid_localid_idx" btree (upper(deviceuid), localsessionid)
Foreign-key constraints:
    "clientsession_keypair_fkey" FOREIGN KEY (keypairid) REFERENCES keypair(id) ON DELETE CASCADE
    "clientsession_sessiondata_id" FOREIGN KEY (sessiondataid) REFERENCES sessiondata(id) ON DELETE CASCADE
Referenced by:
    TABLE "remotecommand" CONSTRAINT "remotecommand_clientsessionid_fkey" FOREIGN KEY (clientsessionid) REFERENCES clientsession(globalsessionid) ON DELETE CASCADE
Options: fillfactor=50, autovacuum_vacuum_scale_factor=0.01

Results of pgstatindex for one of the bloated indexes:

select * from pgstatindex('clientsession_pkey');
-[ RECORD 1 ]------+----------
version            | 2
tree_level         | 2
index_size         | 552640512
root_block_no      | 290
internal_pages     | 207
leaf_pages         | 67224
empty_pages        | 0
deleted_pages      | 29
avg_leaf_density   | 1.08
leaf_fragmentation | 3.02

List of current index sizes (they stabilized 1 day ago):

\di+ clientsession*
                                List of relations
 Schema |              Name               | Type  |  Owner  |     Table     |  Size  | Description
--------+---------------------------------+-------+---------+---------------+--------+-------------
 public | clientsession_keypairid_key     | index | phoenix | clientsession | 545 MB |
 public | clientsession_pkey              | index | phoenix | clientsession | 527 MB |
 public | clientsession_sessiondataid_key | index | phoenix | clientsession | 900 MB |
 public | clientsession_uduid_localid_idx | index | phoenix | clientsession | 254 MB |
 public | clientsession_ukey              | index | phoenix | clientsession | 254 MB |

I have never seen such behaviour on other databases, and all my attempts to get this index bloat under control have had no effect. If anyone has any ideas (even crazy ones) - welcome.
Re: [GENERAL] Documentation bug?
On Tue, May 19, 2015 at 7:39 PM, Thomas Kellerer <spam_ea...@gmx.net> wrote:
> Hello all,
>
> I just noticed that you can do something like this (using 9.4.1):
>
>     select array[1,2,3] - 3
>
> which is doing the same thing as:
>
>     select array_remove(array[1,2,3], 3)
>
> but the minus is not documented as an array operator:
> http://www.postgresql.org/docs/current/static/functions-array.html
>
> Is that an oversight in the documentation, or isn't the minus supposed to work like that in the first place?

You very likely have the intarray extension installed: http://www.postgresql.org/docs/9.4/interactive/intarray.html
The int[] - int operator is documented in the extension's documentation, as it is part of the extension and not part of the PostgreSQL core.
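A quick way to confirm this on your own system (a sketch; the output depends on what is installed) is to ask the catalogs which '-' operators exist for integer arrays and where they live:

```sql
-- Which '-' operators take an integer array on the left, and in what schema?
SELECT o.oprname,
       o.oprleft::regtype  AS left_type,
       o.oprright::regtype AS right_type,
       n.nspname           AS schema
FROM pg_operator o
JOIN pg_namespace n ON n.oid = o.oprnamespace
WHERE o.oprname = '-'
  AND o.oprleft = 'integer[]'::regtype;

-- In psql, \dx lists installed extensions; intarray should appear there.
```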
Re: [GENERAL] moving to PostgreSQL from MS-SQL and from Oracle, looking for feature comparison information
On Sun, May 10, 2015 at 12:30 PM, Yuri Budilov <yuri.budi...@hotmail.com> wrote:
> MANY THANKS to everyone who replied! Keep up the great work!
>
> more things (critical for very large and mission critical databases)
>
> - database row/page compression - it looks to me that there is no page/block compression available on PostgreSQL 9.4 along the lines of MS-SQL/Oracle row/page compression features? I realize that there is some compression of individual varchar/text data type columns but there is nothing like a complete row compression, index page compression and page/dictionary compression? Is that correct?

Yes, that's correct. Only individual field compression is supported (usually for fields longer than 2KB).

> database and transaction log backup compression? not available?

Transaction log backup compression is not available out of the box (however it can easily be achieved via external utilities like bzip2). Both built-in backup utilities (pg_dump and pg_basebackup) support compression.

> - recovery from hardware or software corruption - suppose I am running a mission critical database (which is also relatively large, say 1TB) and I encounter a corruption of some sort (say, due to hardware or software bug) on individual database pages or a number of pages in a database. How do I recover quickly and without losing any transactions? MS-SQL and Oracle can restore individual pages (or sets of pages) or restore individual database files and then allow me to roll forward the transaction log to bring back every last transaction. It can be done on-line or off-line. How do I achieve the same in PostgreSQL 9.4? One solution I see may be via complete synchronous replication of the database to another server. I am not sure what happens to the corrupt page(s) - do they get transmitted corrupt to the mirror server, so I end up with the same corruption on both databases, or is there some protection against this?

It depends on where the corruption happens: if pages become corrupted due to problems with the physical storage (filesystem), the replica's data should be OK. There is no facility to recover individual database files and/or page ranges from a base backup and roll the transaction log forward (not even offline). From my practice, using PostgreSQL for terabyte-scale and/or mission-critical databases is definitely possible, but it requires very careful design and planning (and good hardware).
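The external-compression approach mentioned above for WAL archiving can be sketched as a postgresql.conf fragment (an illustration, not a recommendation; the paths are invented):

```
# postgresql.conf — compress each 16MB WAL segment as it is archived
wal_level = archive            # 'replica' on 9.6 and later
archive_mode = on
archive_command = 'bzip2 -c %p > /backup/wal/%f.bz2'

# restore_command for recovery.conf on the restoring side:
# restore_command = 'bunzip2 -c /backup/wal/%f.bz2 > %p'
```

%p and %f are expanded by the server to the segment's path and file name; the command must return zero only on success, or the segment will be retried.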
Re: [GENERAL] moving to PostgreSQL from MS-SQL and from Oracle, looking for feature comparison information
Hi Yuri,

I will try to answer your questions one by one.

> In the meantime, I have scanned the manual for PostgreSQL 9.4 and there are a few things I was not able to find in the manual, my apologies if I missed it:
>
> 1. does PostgreSQL have parallel query capability like MS-SQL 2008+ and Oracle 11g+ ? Or does a single query only run on 1 CPU?

The latest stable version will execute a single query on a single CPU. The next version (9.5) will have some limited parallel query execution facilities.

> 2. does PostgreSQL have ability to apply query optimizer hints to individual queries - such as use a particular index, join type, join order, plan guides, etc ?

No, that is not available. In most (but not all) cases the PostgreSQL query optimizer will select a reasonably good plan.

> 3. does PostgreSQL have Column-Store capability?

In the community version - no, but there are some external addons available which add column storage (however, somewhat limited).

> 4. does PostgreSQL have anything resembling Microsoft SQL Server Profiler Trace or Extended Events ?

No, but the statistical views in 9.2+ provide a pretty good overview of what is going on in the database (pg_stat_statements can be especially useful: http://www.postgresql.org/docs/9.4/interactive/pgstatstatements.html ).

> 5. does PostgreSQL have a database backup capability such that not a single transaction is lost in case of hardware failure? Some of our target databases are several TeraBytes in size with several hundred concurrent connections and transactions are financial in their nature. So we need extremely robust backup/restore capability, 100% on-line.

Yes, PostgreSQL can use built-in synchronous replication with zero committed-transaction loss after failover.

> 6. does PostgreSQL support NUMA on Intel based X64 servers and does it support Hyper-Threading ?

No NUMA support. Yes, PostgreSQL will work on HT-enabled servers (whether it will be efficient depends on workload and CPU type).

> 7. does PostgreSQL support in-memory store (similar to Oracle 12c in-memory and SQL Server 2014 in-memory OLTP) ?

No.

> 8. does PostgreSQL have temporary tables support?

Yes, full temporary tables support since the beginning.
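The zero-loss behaviour in answer 5 relies on synchronous replication; a minimal sketch of the primary-side configuration (the standby name is invented) could look like this:

```
# postgresql.conf on the primary (9.4-era settings)
wal_level = hot_standby                 # 'replica' on 9.6 and later
max_wal_senders = 3
synchronous_standby_names = 'standby1'  # commit waits for this standby's confirmation
synchronous_commit = on
```

With this in place, a COMMIT is not acknowledged to the client until the named standby has confirmed receipt of the WAL, so a failover to that standby loses no committed transactions.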
Re: [GENERAL] Improving performance of merging data between tables
On Wed, Jan 7, 2015 at 8:49 PM, Pawel Veselov <pawel.vese...@gmail.com> wrote:
>> PPPS: and the last suggestion: after you have finished writing all the data into its own tables, the application should perform ANALYZE of these own tables (or you could get weird/inefficient plans during the last stage).
>
> Any references to back this up? I don't particularly mind doing it, but I wonder if the analysis can be more expensive than the processing. These tables get a few hundred records inserted/updated, then are entirely processed (with expected full scans), and then deleted...

If these own tables are used only in full-table selects and never in joins, there should be no issues. However, once you start joining these tables with anything else, you could get very inefficient/weird plans, because without ANALYZE the database does not know how many rows these tables contain.

>> PS: your setup looks pretty complicated and hard to analyze without seeing all involved table structures, transaction/query flow, and (especially) the involved procedures' source code.
>
> Sure :) At this point, I've put together the bulk merge code as well. I can't quite see much of a difference, actually, but it's hard to trust the execution times, as on the same amount of data they vary from, say, 0.5s to 2s, and the sample data is not stepping on any other locks. In general, I'm afraid all those left joins and multiple scans, even over a small amount of data, are nullifying any positive effect.

Now some ideas to check. High CPU usage usually isn't related to locking; it is usually related to seq scans, wrong plans, or simply inefficient pl/pgsql code - locked processes usually don't use much CPU.

1) On the test database perform select pg_stat_reset(); then perform a full round of merges, then check select * from pg_stat_user_tables where seq_scan > 0 order by seq_tup_read; and if you find a lot of seq_scan and seq_tup_read on a particular table, try to find where they are coming from (it could be the reason for the high CPU usage).
2) Enable track_functions in postgresql.conf and perform the same sequence (SELECT pg_stat_reset(); plus a full round of merges), then check SELECT * FROM pg_stat_user_functions ORDER BY self_time DESC; and see which functions are using the most time.

3) Old/lost prepared transactions can have a deadly effect on the performance of the database as a whole. So check SELECT * FROM pg_prepared_xacts; and verify that you don't have hours- (or weeks-) old prepared transactions lying around.

PS: btw, I still don't fully understand the relation between "merges data into its own node tables (using merge_xxx PL/pgSQL functions)" and the provided code for public.merge_all02-9A-46-8B-C1-DD and PUBLIC.merge_agrio. As I see it, public.merge_all02-9A-46-8B-C1-DD calls PUBLIC.merge_agrio, and PUBLIC.merge_agrio updates the global table R_AGRIO (not the own node table).

I think the best implementation of such a task is asynchronous processing of these changes via a background process. The application only inserts events into a queue table (a lockless process), and some background process reads the data from the queue table and merges it into the main table (again lockless, because it is single-threaded, so there are no concurrent writes), and then deletes the merged data from the queue table.
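The three checks above can be sketched as one psql session (a sketch only: it assumes a live server and, for step 2, that track_functions is already enabled in postgresql.conf):

```sql
-- 1) reset statistics, run a full round of merges, then look for heavy seq scans
SELECT pg_stat_reset();
-- ... perform the full round of merges here ...
SELECT relname, seq_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;

-- 2) with track_functions = 'pl' (or 'all'), find the most expensive functions
SELECT funcname, calls, total_time, self_time
FROM pg_stat_user_functions
ORDER BY self_time DESC;

-- 3) look for old prepared transactions lying around
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```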
Re: [GENERAL] Improving performance of merging data between tables
On Wed, Dec 31, 2014 at 11:10 AM, Pawel Veselov pawel.vese...@gmail.com wrote: [skipped]

>> 2) Try pg_stat_statements, setting pg_stat_statements.track = all. See: http://www.postgresql.org/docs/9.4/static/pgstatstatements.html I have used this to profile some functions, and it worked pretty well. Mostly I use it on a test box, but once I ran it on the live one, which was scary, but it worked great.

> That looks promising. Turned it on, waiting for when I can restart the server at the next quiet time. I have to say this turned out to be a bit of a disappointment for this use case. It only measures total time spent in a call. So it surfaces operations that waited a lot on some lock. It's good, but it would be great if total_time were provided along with wait_time (and maybe io_time as well, since I also see operations that just naturally have to fetch a lot of data).

1) pg_stat_statements provides information about the io_time of each statement, but you need track_io_timing enabled for that.

2) About locking: I suggest enabling log_lock_waits and setting deadlock_timeout to, say, 100ms (just for testing purposes); then any lock waiting more than 100ms will be logged with some useful additional info.

PS: your setup looks pretty complicated and hard to analyze without seeing all involved table structures, the transaction/query flow, and (especially) the source code of the involved procedures.

PPS: btw, please check the database logs for deadlock messages; your "setup around ... and then call a pgsql function to merge the data from its tables into the common tables" part could easily be deadlock prone.

PPPS: and the last suggestion: after you have finished writing all the data into its own tables, the application should perform ANALYZE on these tables (or you could get weird/inefficient plans during the last stage).
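The knobs discussed above, collected into one postgresql.conf fragment (values are illustrative; loading pg_stat_statements requires a server restart):

```
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all    # also track statements executed inside functions
track_io_timing = on              # fills blk_read_time/blk_write_time in pg_stat_statements
log_lock_waits = on               # log any lock waited on longer than deadlock_timeout
deadlock_timeout = 100ms          # testing value only; the default is 1s
```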
Re: [GENERAL] Drastic select count performance hit when jsonb GIN indices are present
> Bitmap Heap Scan on myevents (cost=35.80..3615.09 rows=3716 width=0) (actual time=351.510..77669.907 rows=1417152 loops=1)
>   Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)
>   Heap Blocks: exact=298362
>   Buffers: shared hit=1 read=298589
> ...
> Execution time: 80986.340 ms

> Bitmap Heap Scan on myevents (cost=42.80..3622.09 rows=3716 width=0) (actual time=534.816..78526.944 rows=1417152 loops=1)
>   Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)
>   Heap Blocks: exact=298362
>   Buffers: shared hit=383 read=299133
> ...
> Execution time: 81898.578 ms

Hi Anton,

What you see there (I think) is the performance hit of random disk reads on a non-cached database. Try increasing shared_buffers to a value the table and index can fit into, and redo the queries a few times until you see something like "Buffers: shared hit=bigvalue read=0"; then compare the performance. It might change the timing quite a lot.

Also, I recommend setting track_io_timing=on in postgresql.conf and then using EXPLAIN (ANALYZE, BUFFERS, TIMING) to check how much time the database spent doing IO operations.

Also try performing VACUUM ANALYZE myevents; before testing, because it seems that you don't have an up-to-date visibility map on the table.

However, even in the fully cached case, selecting 40% of the table rows will almost always be faster via a sequential scan, so I don't expect miracles.
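The suggested checks as a sketch (the table name and containment predicate come from the thread; this needs a live server, and SET track_io_timing requires superuser):

```sql
SET track_io_timing = on;                 -- or set it globally in postgresql.conf
VACUUM ANALYZE myevents;                  -- refresh stats and the visibility map
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT count(*)
FROM myevents
WHERE event @> '{"event": "delivered"}'::jsonb;
-- repeat a few times; once the data is cached the Buffers lines should look
-- like "shared hit=<big number> read=0", and the I/O Timings lines show
-- how much of the runtime went to disk reads
```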
Re: [GENERAL] Drastic select count performance hit when jsonb GIN indices are present
> Getting back to my original point - you pointed out that for queries that need a decent % of the table it will be cheaper to do a scan, which is exactly what the query planner does for the relational version. If it only needs a small % of the values it looks at the index, and for a large % it goes for a scan (it also puts everything in shared buffers and is lightning quick!). Is this just a lack of maturity in the jsonb planner or am I missing something?

Hi Anton,

Good selectivity estimators exist only for scalar data types. For complex data types such as json/jsonb, introducing a reasonable selectivity estimator is a very complicated task, so the database can only guess in these cases. In your case the database guessed the number of returned rows with a three-orders-of-magnitude error (estimated 3716 rows, actual 1417152 rows).

Personally, I don't expect serious progress in json/jsonb selectivity estimators in the near future, so it is better to avoid low-selectivity queries against indexed json/jsonb fields.
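The guessing can be observed directly: with no per-value statistics for jsonb containment, the planner's row estimate for @> stays roughly constant no matter which value you probe (a sketch against the thread's table; exact estimates vary by version):

```sql
-- both of these get essentially the same estimated row count, even if one
-- value matches 40% of the table and the other matches nothing at all
EXPLAIN SELECT * FROM myevents WHERE event @> '{"event": "delivered"}'::jsonb;
EXPLAIN SELECT * FROM myevents WHERE event @> '{"event": "no_such_event"}'::jsonb;
```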
[GENERAL] Question about forced immediate checkpoints during create database
Hi everyone,

I have a few questions about checkpoints during CREATE DATABASE. First, an extract from the log of my test database (9.2.4):

2013-11-12 03:48:31 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint starting: immediate force wait
2013-11-12 03:48:31 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint complete: wrote 168 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.314 s, sync=0.146 s, total=0.462 s; sync files=104, longest=0.040 s, average=0.001 s
2013-11-12 03:48:32 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint starting: immediate force wait
2013-11-12 03:48:32 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint complete: wrote 6 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.311 s, sync=0.002 s, total=0.315 s; sync files=6, longest=0.000 s, average=0.000 s
2013-11-12 03:48:32 MSK 13609 postgres@hh_data from [local] [vxid:502/0 txid:0] [CREATE DATABASE] LOG: duration: 1160.409 ms statement: create database _tmp;

So two immediate force checkpoints were performed while creating the database. Now the questions:

1) Why are these checkpoints performed at all? I understand why a checkpoint is performed during DROP DATABASE (to clean the dropped db's data out of shared buffers), but why issue a checkpoint during CREATE DATABASE?
2) Why are two checkpoints performed one after another?
3) Is there any good way to perform a spread checkpoint during CREATE DATABASE (similar to --checkpoint=spread for pg_basebackup)? I'm ready to wait 30 minutes for CREATE DATABASE in that case... I'm asking because performing an immediate checkpoint on a large, heavily loaded database is a good recipe for downtime (IO becomes overloaded to the point of a total stall)... Is there any workaround for this problem?
4) Does the idea of adding an option to the CREATE/DROP DATABASE syntax to control checkpoint behaviour sound reasonable?
Kind Regards, Maksym
Re: [GENERAL] Question about PostgreSQL logging configuration
>> I want to log all 'mod' statements with their execution times, and all statements longer than 10ms (also with their execution times).

> You cannot combine things as you want. However, it seems a fairly minor loss - why would you care about how fast sub-10ms mods ran?

The trouble is that if I try it that way, the database log for mod statements over 10ms will be written in the wrong style:

2012-11-30 08:06:28 MSK [vxid:229/5138506 txid:1345713884] [INSERT] LOG: duration: 10.158 ms
2012-11-30 08:06:28 MSK [vxid:325/5420118 txid:0] [UPDATE] LOG: execute S_5: update applicant_adv_subscription ...

instead of the more usual:

2012-11-30 08:08:21 MSK [vxid:307/5334684 txid:0] [SELECT] LOG: duration: 16.085 ms execute S_42: select ...

So in the end I get logs which are incompatible with pgFouine (and with my custom log analyzer as well). Is there any serious reason why log_duration = on uses such a strange output style? I cannot think of any reasonable case for using log_duration = on without log_statement = all. And log_duration = on with log_statement = all produces results equivalent to log_min_duration_statement = 0, but log_min_duration_statement = 0 produces logs which are much easier to parse and read in general.

Has anyone consciously used log_duration = on without log_statement = all? And if so, what for? I might be very wrong, but log_duration seems close to a useless knob in reality.

Kind Regards, Maksym
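For reference, the two configurations being compared (illustrative postgresql.conf fragments):

```
# variant A: one parseable line per slow statement, e.g.
#   LOG: duration: 16.085 ms execute S_42: select ...
log_min_duration_statement = 10ms

# variant B: all mod statements plus all durations - but the duration and
# the statement text then arrive as two separate log lines, which log
# analyzers such as pgFouine cannot pair back together
log_statement = 'mod'
log_duration = on
```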
[GENERAL] Is it possible to create row-wise indexable condition for special case...
Hi,

I know that for a condition like ((field1 > value1) OR (field1 = value1 AND field2 > value2)) I could build an index on (field1, field2) and use an indexable row-wise condition like (field1, field2) > (value1, value2).

However, I have a very tricky query which requires an indexable condition like (field1 > value1) OR (field1 = value1 AND field2 < value2). I guess that I could use an index definition like ON (field1, field2 DESC); however, I cannot build a query condition which would use that index effectively.

Is it possible, or has my mind just gone blank on a simple task?

PS: field2 is of varchar type, so I couldn't use the negative-values trick: build an index ON (field1, (-field2)) and write something like WHERE (field1, -field2) > (value1, -value2).
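The working (both-ascending) case can be sketched with placeholder names; the mixed-direction OR from the question is shown last to make the asymmetry visible - it has no single row-wise equivalent, which is exactly the problem:

```sql
-- placeholder table; field2 is varchar as in the question
CREATE TABLE t (field1 int, field2 varchar);
CREATE INDEX t_f1_f2_idx ON t (field1, field2);

-- both directions ascending: one indexable row-wise comparison,
-- which the planner can turn into a single continuous index range
SELECT * FROM t
WHERE (field1, field2) > (42, 'abc')
ORDER BY field1, field2;

-- the tricky mixed-direction condition; written as a plain OR, the planner
-- cannot treat it as one continuous range over an (field1, field2 DESC) index
SELECT * FROM t
WHERE field1 > 42
   OR (field1 = 42 AND field2 < 'abc');
```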
[GENERAL] Inefficient plan selected by PostgreSQL 9.0.7
... the index (sb_messages_special4_key) will read exactly the same number of rows from the table as a scan over sb_messages_special3_key. And very likely the scan over the related index will win.

What can I do to fight that issue? (I'm looking to keep both indexes on that table, for fast queries with different orderings.)
Re: [GENERAL] Inefficient plan selected by PostgreSQL 9.0.7
On Wed, May 2, 2012 at 2:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Maxim Boguk maxim.bo...@gmail.com writes: I got a very inefficient plan for a simple query.
> It looks like the problem is with the estimate of the antijoin size:
> -> Nested Loop Anti Join (cost=0.00..24576.82 rows=1 width=206) (actual time=0.043..436.386 rows=20761 loops=1)
> that is, only about 20% of the rows in sb_messages are eliminated by the NOT EXISTS condition, but the planner thinks that nearly all of them will be (and that causes it to not think that the LIMIT is going to affect anything, so it doesn't prefer a fast-start plan). Since you've not told us anything about the statistics of these tables, it's hard to speculate as to why the estimate is off. regards, tom lane

Hi,

Is there any particular stats data I need to provide beyond these two?

SELECT * FROM pg_stats WHERE tablename='users' AND attname='blocked';
-[ RECORD 1 ]-----+---------------------
schemaname        | public
tablename         | users
attname           | blocked
inherited         | f
null_frac         | 0
avg_width         | 1
n_distinct        | 2
most_common_vals  | {f,t}
most_common_freqs | {0.573007,0.426993}
histogram_bounds  |
correlation       | 0.900014

and

SELECT schemaname,tablename,attname,inherited,null_frac,avg_width,n_distinct,correlation FROM pg_stats WHERE tablename='sb_messages' AND attname='from_user';
-[ RECORD 1 ]-----+------------
schemaname        | public
tablename         | sb_messages
attname           | from_user
inherited         | f
null_frac         | 0
avg_width         | 4
n_distinct        | 103473
correlation       | 0.512214

(most_common_vals, most_common_freqs and histogram_bounds are very long values from default_statistics_target=1000; the top most_common_freqs entry is only 0.00282333.)

Kind Regards, Maksym
Re: [GENERAL] Inefficient plan selected by PostgreSQL 9.0.7
On Wed, May 2, 2012 at 2:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Maxim Boguk maxim.bo...@gmail.com writes: I got a very inefficient plan for a simple query.
> It looks like the problem is with the estimate of the antijoin size:
> -> Nested Loop Anti Join (cost=0.00..24576.82 rows=1 width=206) (actual time=0.043..436.386 rows=20761 loops=1)
> that is, only about 20% of the rows in sb_messages are eliminated by the NOT EXISTS condition, but the planner thinks that nearly all of them will be (and that causes it to not think that the LIMIT is going to affect anything, so it doesn't prefer a fast-start plan). Since you've not told us anything about the statistics of these tables, it's hard to speculate as to why the estimate is off. regards, tom lane

The most interesting part is that the NOT EXISTS estimate is way off, while LEFT JOIN ... WHERE ... IS NULL is estimated correctly.

Good estimate (estimated rows=20504 vs real rows=20760):

Game2=# EXPLAIN ANALYZE SELECT * FROM sb_messages messages_tbl LEFT JOIN users users_tbl ON users_tbl.id = messages_tbl.from_user WHERE messages_tbl.type IN (0, 9) AND messages_tbl.visibility_status = 0 AND messages_tbl.not_show_on_air = 'f' AND messages_tbl.clan_id IS NULL AND users_tbl.blocked IS DISTINCT FROM 't';
QUERY PLAN
---
Nested Loop Left Join (cost=0.00..24577.74 rows=20504 width=1037) (actual time=0.045..532.012 rows=20760 loops=1)
  Filter: (users_tbl.blocked IS DISTINCT FROM true)
  -> Index Scan using sb_messages_special3_key on sb_messages messages_tbl (cost=0.00..3793.75 rows=35784 width=208) (actual time=0.019..67.746 rows=24937 loops=1)
  -> Index Scan using sb_users_pkey on users users_tbl (cost=0.00..0.53 rows=1 width=829) (actual time=0.007..0.009 rows=1 loops=24937)
       Index Cond: (users_tbl.id = messages_tbl.from_user)
Total runtime: 563.944 ms

Bad estimate (estimated rows=1 vs real rows=20760):

Game2=# EXPLAIN (ANALYZE, COSTS) SELECT * FROM sb_messages messages_tbl WHERE (messages_tbl.type IN (0, 9) AND messages_tbl.visibility_status=0 AND
messages_tbl.not_show_on_air='f' AND messages_tbl.clan_id IS NULL) AND NOT EXISTS (SELECT 1 FROM users users_tbl WHERE blocked='t' and users_tbl.id = messages_tbl.from_user);
QUERY PLAN
---
Nested Loop Anti Join (cost=0.00..24488.28 rows=1 width=208) (actual time=0.044..430.645 rows=20760 loops=1)
  -> Index Scan using sb_messages_special3_key on sb_messages messages_tbl (cost=0.00..3793.75 rows=35784 width=208) (actual time=0.020..67.810 rows=24937 loops=1)
  -> Index Scan using sb_users_pkey on users users_tbl (cost=0.00..0.53 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=24937)
       Index Cond: (users_tbl.id = messages_tbl.from_user)
       Filter: users_tbl.blocked
Total runtime: 461.296 ms

What is curious is that NOT EXISTS always performs about 20% faster (I ran both explains about 10 times each, and each time NOT EXISTS was close to 20% faster).
[GENERAL] Postgresql 9.0.7 weird planner decision (rows in plan close to reality but plan suboptimal)
Hi,

Today on one of the databases under my management I found a very strange plan for a simple query. PostgreSQL 9.0.7 on Linux, random_page_cost=4, seq_page_cost=1.

The query and plan:

db=# EXPLAIN (ANALYZE, COSTS, BUFFERS) select obj_id, obj_commented, p2o_id FROM blog_post as obj JOIN person2obj ON p2o_obj_obj_id = obj_id JOIN person2obj_counters ON p2oc_id = p2o_id WHERE obj_status_did = 1 AND obj_commented IS NOT NULL AND obj_commented > now() - '7 days'::interval AND obj_commented > p2o_notified AND p2o_notify = 't';
QUERY PLAN
Merge Join (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
  Merge Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
  Buffers: shared hit=1140491
  -> Index Scan using pk_person2obj_counters on person2obj_counters (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
       Buffers: shared hit=1108452
  -> Sort (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)
       Sort Key: person2obj.p2o_id
       Sort Method: quicksort Memory: 807kB
       Buffers: shared hit=32039
       -> Nested Loop (cost=0.00..6537.82 rows=221 width=24) (actual time=0.097..80.129 rows=7870 loops=1)
            Buffers: shared hit=32039
            -> Index Scan using i_blog_post_commented_active on blog_post obj (cost=0.00..225.73 rows=1726 width=16) (actual time=0.028..17.957 rows=6010 loops=1)
                 Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
                 Buffers: shared hit=6207
            -> Index Scan using i_person2obj_obj_notified_subscribed on person2obj (cost=0.00..3.64 rows=1 width=24) (actual time=0.006..0.009 rows=1 loops=6010)
                 Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
                 Buffers: shared hit=25832
Total runtime: 1349.767 ms

I don't understand why the database chose a merge join against the 1.2M-entry table. person2obj_counters has an index on p2oc_id (it's the primary key field).
Switching to the fast inner-loop plan can be managed with random_page_cost=10:

db=# set random_page_cost to 10;
SET
db=# EXPLAIN (ANALYZE, COSTS, BUFFERS) select obj_id, obj_commented, p2o_id FROM blog_post as obj JOIN person2obj ON p2o_obj_obj_id = obj_id JOIN person2obj_counters ON p2oc_id = p2o_id WHERE obj_status_did = 1 AND obj_commented IS NOT NULL AND obj_commented > now() - '7 days'::interval AND obj_commented > p2o_notified AND p2o_notify = 't';
QUERY PLAN
--
Nested Loop (cost=0.00..14810.38 rows=3 width=24) (actual time=16.910..115.110 rows=1758 loops=1)
  Buffers: shared hit=57403
  -> Nested Loop (cost=0.00..14616.37 rows=221 width=24) (actual time=0.088..82.342 rows=7858 loops=1)
       Buffers: shared hit=32046
       -> Index Scan using i_blog_post_commented_active on blog_post obj (cost=0.00..273.70 rows=1725 width=16) (actual time=0.029..16.260 rows=6009 loops=1)
            Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
            Buffers: shared hit=6222
       -> Index Scan using i_person2obj_obj_notified_subscribed on person2obj (cost=0.00..8.30 rows=1 width=24) (actual time=0.007..0.010 rows=1 loops=6009)
            Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
            Buffers: shared hit=25824
  -> Index Scan using pk_person2obj_counters on person2obj_counters (cost=0.00..0.87 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=7858)
       Index Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
       Buffers: shared hit=25357
Total runtime: 115.465 ms
(14 rows)

Ok... so what I see: the database thinks it will need to join 221 rows from the previous level to the person2obj_counters table. And somehow Pg manages to calculate that a full index scan over the 1.2M entries of the person2obj_counters table is faster than nested-loop probes over 221 values.
What looks very suspicious is that the full index scan + merge join part adds only ~40 points to the total cost:

Merge Join (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
...
  -> Index Scan using pk_person2obj_counters on person2obj_counters (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
...
  -> Sort (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)

How could that be?
[GENERAL] Why checkpoint_timeout had maximum value of 1h?
Hi all,

Is there any real reason why checkpoint_timeout is limited to 1 hour?

In my case I have some replicas with WAL on a SAS raid and PGDATA on SSDs with limited write endurance, and I don't worry about a possibly long recovery after a power failure in that case. What's more, the working dataset fits in shared buffers, so almost no dirty-buffer evictions by the bgwriter or backends happen. In that case, having checkpoint_timeout=10hours could reduce the amount of writes to the SSDs by a factor of 10, and increase the planned SSD lifetime by the same amount.

I would like to have the ability to set checkpoint_timeout to a high value and (what's even better) checkpoint_timeout=0 - in that case a checkpoint would happen only when all checkpoint_segments have been used.

Are there any serious drawbacks to that idea? Is it safe to increase that limit in the source and rebuild the database? (9.0 and 9.1 case)
[GENERAL] Question about warning: invalid resource manager ID 128 at ... on hot standby
Hi all.

Database version 9.0.4 on FreeBSD 7.3. Today, after a restart of the replica db, I got the following warning in the log:

2012-03-23 03:10:08.221 MSK 55096 @ from [vxid:1/0 txid:0] []LOG: invalid resource manager ID 128 at 44E/4E7303B0

I searched the mailing lists but I'm still not sure whether it is harmless or not... The full log looks like:

2012-03-23 03:08:46.465 MSK 38622 @ from [vxid: txid:0] []LOG: received fast shutdown request
2012-03-23 03:08:46.465 MSK 38622 @ from [vxid: txid:0] []LOG: aborting any active transactions
2012-03-23 03:08:46.465 MSK 38627 @ from [vxid: txid:0] []FATAL: terminating walreceiver process due to administrator command
2012-03-23 03:08:46.693 MSK 38718 @ from [vxid: txid:0] []LOG: shutting down
2012-03-23 03:08:46.696 MSK 38718 @ from [vxid: txid:0] []LOG: database system is shut down
2012-03-23 03:08:53.850 MSK 55096 @ from [vxid: txid:0] []LOG: database system was shut down in recovery at 2012-03-23 03:08:46 MSK
2012-03-23 03:08:53.874 MSK 55096 @ from [vxid: txid:0] []LOG: entering standby mode
2012-03-23 03:08:53.905 MSK 55096 @ from [vxid:1/0 txid:0] []LOG: redo starts at 44E/114060E8
2012-03-23 03:10:08.221 MSK 55096 @ from [vxid:1/0 txid:0] []LOG: consistent recovery state reached at 44E/4E7303B0
2012-03-23 03:10:08.221 MSK 55096 @ from [vxid:1/0 txid:0] []LOG: invalid resource manager ID 128 at 44E/4E7303B0
2012-03-23 03:10:08.222 MSK 55093 @ from [vxid: txid:0] []LOG: database system is ready to accept read only connections
2012-03-23 03:10:08.239 MSK 56317 @ from [vxid: txid:0] []LOG: streaming replication successfully connected to primary

Is that warning harmless on 9.0.4, or should I start to worry about it?
[GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
One of the servers under my support produced the following error 2 days ago:

ERROR: could not read block 135 in file base/16404/118881486: read only 0 of 8192 bytes

Server version 9.0.6. No db or server crashes or db recovery have happened on that server since setup. The server is the db backend for a very large web project under quite a heavy load (10k+ requests per second to the db). No other strange errors have been found in the logs since the server was put into production half a year ago.

Now, what happened (from the DB log):

2012-02-17 22:35:58 MSK 14333 [vxid:340/1822 txid:2341883282] [DELETE] LOG: duration: 5.669 ms execute unnamed: delete from agency_statistics
2012-02-17 22:35:58 MSK 14333 [vxid:340/1823 txid:0] [BIND] LOG: duration: 2.787 ms bind unnamed: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number) select e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id) from vacancy v inner join employer e on v.employer_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id = vb.vacancy_body_id inner join vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id inner join specialization s on s.specialization_id = vs.specialization_id inner join area a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled = false and e.category = $1 and e.state = $2 and e.manager_id > 0 group by e.employer_id, vb.area_id, a.path, s.professional_area_id
2012-02-17 22:35:59 MSK 14333 [vxid:340/1823 txid:2341883336] [INSERT] LOG: duration: 726.700 ms execute unnamed: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number) select e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id) from vacancy v inner join employer e on v.employer_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id = vb.vacancy_body_id inner join vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id inner join
specialization s on s.specialization_id = vs.specialization_id inner join area a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled = false and e.category = $1 and e.state = $2 and e.manager_id > 0 group by e.employer_id, vb.area_id, a.path, s.professional_area_id
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230070 txid:0] [BIND] LOG: duration: 30.195 ms bind S_35: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT] ERROR: could not read block 135 in file base/16404/118881486: read only 0 of 8192 bytes
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT] STATEMENT: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)

No concurrent writes happened to that table outside of that connection. Maybe autovacuum fired on that table between the delete and the first or second insert (unfortunately it wasn't logged).

I recreated the table from scratch and kept the damaged table under another name (via ALTER TABLE agency_statistics RENAME TO agency_statistics_old), so I have the files to dig into. What I see in the file system:

hh=# SELECT relfilenode FROM pg_class WHERE relname='agency_statistics_old';
 relfilenode
-------------
   118881486

postgres@db10:~/tmp$ ls -la /var/lib/postgresql/9.0/main/base/16404/118881486
-rw------- 1 postgres postgres 0 2012-02-20 12:04 /var/lib/postgresql/9.0/main/base/16404/118881486

So the table file size is zero bytes (it seems autovacuum truncated the table to 0 bytes).
But the index shows:

hh=# SELECT relfilenode FROM pg_class WHERE relname='agency_statistics_pkey';
 relfilenode
-------------
   118881489

postgres@db10:~/tmp$ ls -la /var/lib/postgresql/9.0/main/base/16404/118881489
-rw------- 1 postgres postgres 4956160 2012-02-20 12:07 /var/lib/postgresql/9.0/main/base/16404/118881489

hh=# SELECT * FROM pgstatindex('agency_statistics_pkey');
-[ RECORD 1 ]------+--------
version            | 2
tree_level         | 2
index_size         | 4947968
root_block_no      | 295
internal_pages     | 1
leaf_pages         | 601
empty_pages        | 0
deleted_pages      | 1
avg_leaf_density   | 0.45
leaf_fragmentation | 13.14

I'm out of ideas now. Any suggestions on where and what I should look at next?

Kind Regards, Maksym
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
>> So the table file size is zero bytes (it seems autovacuum truncated the table to 0 bytes).

> Hmmm ... something did, but I see no clear evidence that it was autovacuum. Do you know why the mod date on the file is 2012-02-20 12:04? That's more than two days after the error in your logs, so it's not clear to me that the current state of the file tells us much about what happened on the 17th. If autovacuum had truncated the table then, and the table wasn't touched otherwise, the file mod date shouldn't have increased. regards, tom lane

Hi,

A cron job attempted to populate the table once per hour after the problem happened, and each time it produced the same error. So the table did see attempted write activity after the error occurred:

2012-02-18 00:36:01 MSK 27743 hh.app@hh from [local] [vxid:95/19965879 txid:2342867908] [INSERT] ERROR: could not read block 171 in file base/16404/118881486: read only 0 of 8192 bytes
2012-02-18 00:36:01 MSK 27743 hh.app@hh from [local] [vxid:95/19965879 txid:2342867908] [INSERT] STATEMENT: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number) select e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id) from vacancy v inner join employer e on v.employer_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id = vb.vacancy_body_id inner join vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id inner join specialization s on s.specialization_id = vs.specialization_id inner join area a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled = false and e.category = $1 and e.state = $2 and e.manager_id > 0 group by e.employer_id, vb.area_id, a.path, s.professional_area_id

... and so on until Monday, when I was informed about the problem.
For information, the table has a quite simple structure without any unusual features:

hh=# \d+ agency_statistics_old
                      Table "public.agency_statistics_old"
        Column        |          Type          | Modifiers | Storage  | Description
----------------------+------------------------+-----------+----------+-------------
 employer_id          | integer                | not null  | plain    |
 area_id              | integer                | not null  | plain    |
 area_path            | character varying(255) | not null  | extended |
 professional_area_id | integer                | not null  | plain    |
 vacancies_number     | integer                | not null  | plain    |
 rank                 | integer                |           | plain    |
 normalised_rank      | integer                |           | plain    |
Indexes:
    "agency_statistics_pkey" PRIMARY KEY, btree (employer_id, area_id, professional_area_id) CLUSTER
Has OIDs: no
Options: fillfactor=50

I'm almost sure I will get an error if I try to insert something into agency_statistics_old. Should I perform that test?

Kind Regards, Maxim
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
On Tue, Feb 21, 2012 at 1:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: Do you know why the mod date on the file is 2012-02-20 12:04? A cron job was attempting to populate the table once per hour after the problem happened, and each time it produced the same error.

That's interesting ... is there any possibility that the insertions were attempting to insert values that matched a previously-existing primary key value? I'm thinking there's no reason for the INSERT per se to be touching nonexistent blocks, but if for some reason the pkey index still had entries pointing at vanished rows (as it seems to) then the errors could be coming from uniqueness checks attempting to fetch those rows to see if they're live. regards, tom lane

Hi,

It isn't just a possibility: close to 100% of the newly inserted values matched previously-existing primary key values. The table is a hand-made 'materialized view'-type statistics table which gets recalculated via cron.
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
On Tue, Feb 21, 2012 at 1:46 PM, Maxim Boguk maxim.bo...@gmail.com wrote:

[snip]

It isn't just a possibility: close to 100% of the newly inserted values matched previously-existing primary key values. The table is a hand-made 'materialized view'-type statistics table which gets recalculated via cron.

To be clear: the newly inserted values do match previously-existing primary key values almost always. Sorry for not being clear.
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
OK, so that pretty much explains where the visible symptoms are coming from: somehow, the table got truncated but its pkey index did not get cleared out. So an insert creates an empty page zero, inserts a heap tuple there, tries to insert an index entry. The btree code sees there is an index entry for that key already, and tries to fetch the heap tuple for that index entry to see if it's dead (which would allow the insertion to proceed). But the block number the index is pointing at isn't there, so you get the quoted error message. The insertion rolls back, leaving a dead tuple that can be garbage-collected by autovacuum, after which it truncates the table again --- but of course without removing any index entries, except maybe one for TID (0,1) if that's still there. Lather rinse repeat. But this brings us little closer to understanding the cause of the problem. How could the table have gotten truncated without vacuuming out the index? Are you sure the only operations happening on that table are INSERT, DELETE, autovacuum? No DDL of any kind? regards, tom lane

Not that I know of. There could be some updates as well, but they are performed by the same cron as the initial delete, and they didn't happen in this case because the cron script exits on the first database error. I.e., no one to my knowledge could write into that table except that single cron job and autovacuum. And I have logging of all queries with duration over 1 ms.

Kind Regards, Maksym
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
On Tue, Feb 21, 2012 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: OK, so that pretty much explains where the visible symptoms are coming from: somehow, the table got truncated but its pkey index did not get cleared out. So an insert creates an empty page zero, inserts a heap tuple there, tries to insert an index entry. The btree code sees there is an index entry for that key already, and tries to fetch the heap tuple for that index entry to see if it's dead (which would allow the insertion to proceed). But the block number the index is pointing at isn't there, so you get the quoted error message. The insertion rolls back, leaving a dead tuple that can be garbage-collected by autovacuum, after which it truncates the table again --- but of course without removing any index entries, except maybe one for TID (0,1) if that's still there. Lather rinse repeat. Hmm ... actually there is a point that this theory doesn't explain entirely. If the probability of a collision with an existing index entry was near 100%, then each hourly cron job should only have been able to insert one or a few heap tuples before failing. That would not trigger an autovacuum right away. Eventually the number of dead tuples would build up to the point where autovacuum got interested, but it strains credulity a bit to assume that this happened exactly after the last hourly run before you renamed the table. Yet, if that didn't happen just that way, how come the size of the table is exactly zero now? The theory would be more satisfactory if we could expect that an hourly run would be able to insert some thousands of tuples before failing, enough to trigger an autovacuum run. So I'm wondering if maybe the index is *partially* cleaned out, but not completely. Does this materialized view have a fairly predictable number of rows, and if so how does that compare to the number of entries in the index? 
(If you have no other way to get the number of entries in the index, try inserting a dummy row, deleting it, and then VACUUM VERBOSE.) regards, tom lane

There are some funny results:

hh=# VACUUM verbose agency_statistics_old;
INFO: vacuuming "public.agency_statistics_old"
INFO: index "agency_statistics_pkey" now contains 0 row versions in 605 pages
DETAIL: 0 index row versions were removed. 595 index pages have been deleted, 595 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: "agency_statistics_old": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.01 sec.
VACUUM

However, when I try to populate the table with production data I get an error:

hh=# insert into agency_statistics_old select * from agency_statistics;
ERROR: could not read block 228 in file base/16404/118881486: read only 0 of 8192 bytes

I.e. the database sees that the index has zero rows, but an insert still fails. Maybe I should use the pageinspect addon to see the actual index page contents?
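Since pageinspect came up: a minimal sketch of how the index pages could be examined with it. The bt_* functions ship in contrib/pageinspect; the leaf block number below is illustrative, while block 295 is the root block reported by pgstatindex earlier.

```sql
-- Load pageinspect (on 9.0, run the contrib SQL script instead of CREATE EXTENSION).
CREATE EXTENSION pageinspect;

-- B-tree metapage: confirms the root block and tree level.
SELECT * FROM bt_metap('agency_statistics_pkey');

-- Summary statistics for one page, e.g. the root page from pgstatindex.
SELECT * FROM bt_page_stats('agency_statistics_pkey', 295);

-- Raw items of a leaf page: the ctid column shows which heap block each
-- index entry points at; any entry pointing past block 0 of the truncated
-- heap is an orphan that can trigger the "could not read block" error.
SELECT itemoffset, ctid, itemlen, data
FROM bt_page_items('agency_statistics_pkey', 1);
```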
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
On Tue, Feb 21, 2012 at 3:47 PM, Maxim Boguk maxim.bo...@gmail.com wrote: On Tue, Feb 21, 2012 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

[snip]

So I'm wondering if maybe the index is *partially* cleaned out, but not completely. Does this materialized view have a fairly predictable number of rows, and if so how does that compare to the number of entries in the index?

[snip]

What makes your idea that the index is *partially* cleaned out, but not completely, highly probable is that the following query produces no error:

hh=# insert into agency_statistics_old select * from agency_statistics limit 1;
INSERT 0 1

-- Maxim Boguk Senior Postgresql DBA.
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
On Tue, Feb 21, 2012 at 4:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: [snip]

Wow. That seems to blow my theory to small pieces. If the index contains no entries then it shouldn't be causing any uniqueness check probes. But at the same time, if the index is empty then how come pgstatindex showed avg_leaf_density = 0.45?

Maybe I should use the pageinspect addon to see the actual index page contents?

That or pg_filedump would be interesting. But your experiments with adding data from the other table will probably have produced some new index entries, which will confuse the situation. Did you save a physical copy of the index before that? Another idea is to attach to the backend with gdb, set a breakpoint at errfinish, and get a stack trace from the point of the "could not read block" error. That would show definitively if this is coming from a uniqueness check or something else entirely. regards, tom lane

Yes, I saved a physical copy of the file before I started playing with it. Unfortunately, that is a production server with no gcc or gdb available, so no pg_filedump or gdb yet (but I'm going to work on it). While I wait for gdb/gcc on that server, I built pg_filedump on a development server using the same PostgreSQL version and created a pg_filedump dump of the index file. It can be downloaded here: http://maximboguk.com/static/etc/agency_statistics_pkey.pg_filedump.gz

I don't have enough knowledge of the b-tree index structure to extract any useful information from that file.

Kind Regards, Maksym
Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again
On Tue, Feb 21, 2012 at 5:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: While I wait for gdb/gcc on that server, I built pg_filedump on a development server using the same PostgreSQL version and created a pg_filedump dump of the index file. It can be downloaded here: http://maximboguk.com/static/etc/agency_statistics_pkey.pg_filedump.gz

If the index key values are not private information, could we see that with "pg_filedump -i -f", not just bare? regards, tom lane

There it is: http://maximboguk.com/static/etc/agency_statistics_pkey.pg_full_filedump.gz

-- Maxim Boguk Senior Postgresql DBA.
Re: [GENERAL] Question about (probably wrong) index scan cost for conditional indexes
On Mon, Jan 30, 2012 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: It seems my previous test case didn't clearly demonstrate the problem I've been stuck with. Here is a much better and closer-to-reality test case:

AFAICT, these behaviors all boil down to the fact that contrib/intarray doesn't provide a real cost estimator for its && operator. It's using the contsel stub function, which provides a fixed selectivity of 0.001. In your test case, with 1000000 rows in the table, the estimate for the number of rows satisfying sections && '{2}' thus comes out to exactly 1000. Unfortunately, the true number is around 100000, and it's that discrepancy that is leading to all of these bad cost estimates. What I'd like to see done about this is for somebody to adapt the work Jan Urbanski did on tsvector stats collection and estimation so that it works for the anyarray operators. It's a bit too late to imagine that that'll get done for 9.2, but maybe for 9.3. regards, tom lane

Hi,

Thank you very much for the answer. I know there is an issue with statistics over intarrays (it has been there a very long time, and sometimes it complicates things a lot). However, the 100x cost difference between:

SELECT * from test order by id limit 100;
(over the primary key (id) btree index)
Limit (cost=0.00..3.43 rows=100 width=37)

vs

SELECT * from test where sections && '{2}' order by value limit 100;
(over the test_value_in2section_key on test(value) where sections && '{2}' btree index)
Limit (cost=0.00..539.29 rows=100 width=37)

seems wrong to me. Both queries perform exactly the same task: fetch 100 entries from the table based on an ideally suitable index (no post-processing/filtering done at all... just return 100 sorted tuples based on a single index scan). I don't understand where the two-plus orders of magnitude cost difference comes from.
And even if I drop the intarray index completely, then I still get a wrong plan (bitmap scan + sort), because the planner cost for the index scan over the conditional index is 100 times more than it should be (i.e. there is still an issue even in the absence of the intarray index). Is the absence of frequency statistics over intarrays somehow linked to the wrong planner cost estimates for conditional index scans?

Kind Regards, Maxim Boguk

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Question about (probably wrong) index scan cost for conditional indexes
Hi.

It seems my previous test case didn't clearly demonstrate the problem I've been stuck with. Here is a much better and closer-to-reality test case.

Preparation:

set random_page_cost to 4;
set seq_page_cost to 1;
create table test (id integer primary key, sections integer[], value float);
insert into test
select id,
       ('{'||((random()*10)::integer)||'}')::integer[] as sections,
       random() as value
from generate_series(1,1000000) as g(id);
--generic gist index for array
CREATE INDEX test_sections_gist on test using gist(sections);
--specialized index on value for sections && '{2}'
CREATE INDEX test_value_in2section_key on test(value) where sections && '{2}';
analyze test;

Now the actual tests. Good query, but the cost is definitely wrong:

postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 100;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=0.00..539.29 rows=100 width=37) (actual time=0.043..0.499 rows=100 loops=1)
  -> Index Scan using test_value_in2section_key on test (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.040..0.434 rows=100 loops=1)
Total runtime: 0.570 ms

Compare with an almost equivalent query:

postgres=# EXPLAIN ANALYZE SELECT * from test order by id limit 100;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=0.00..3.43 rows=100 width=37) (actual time=0.057..0.192 rows=100 loops=1)
  -> Index Scan using test_pkey on test (cost=0.00..34317.36 rows=1000000 width=37) (actual time=0.054..0.115 rows=100 loops=1)
Total runtime: 0.258 ms

Actual speed is almost the same, but the costs differ 100 times.
Now if I increase the limit, I start getting slow plans because it switches to the GIST index and a bitmap scan (because the cost of the plain index scan is too high):

postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 1000;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=2941.68..2944.18 rows=1000 width=37) (actual time=175.301..175.766 rows=1000 loops=1)
  -> Sort (cost=2941.68..2944.18 rows=1000 width=37) (actual time=175.298..175.541 rows=1000 loops=1)
        Sort Key: value
        Sort Method: top-N heapsort  Memory: 127kB
        -> Bitmap Heap Scan on test (cost=56.48..2891.85 rows=1000 width=37) (actual time=80.230..132.479 rows=99641 loops=1)
              Recheck Cond: (sections && '{2}'::integer[])
              -> Bitmap Index Scan on test_sections_gist (cost=0.00..56.23 rows=1000 width=0) (actual time=78.112..78.112 rows=99641 loops=1)
                    Index Cond: (sections && '{2}'::integer[])
Total runtime: 175.960 ms
(9 rows)

Even if I drop the GIST index, I still get a wrong plan:

postgres=# drop index test_sections_gist;
DROP INDEX
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 1000;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=4489.88..4492.38 rows=1000 width=37) (actual time=116.637..117.088 rows=1000 loops=1)
  -> Sort (cost=4489.88..4492.38 rows=1000 width=37) (actual time=116.635..116.857 rows=1000 loops=1)
        Sort Key: value
        Sort Method: top-N heapsort  Memory: 127kB
        -> Bitmap Heap Scan on test (cost=1604.68..4440.05 rows=1000 width=37) (actual time=22.175..74.556 rows=99641 loops=1)
              Recheck Cond: (sections && '{2}'::integer[])
              -> Bitmap Index Scan on test_value_in2section_key (cost=0.00..1604.43 rows=1000 width=0) (actual time=20.248..20.248 rows=99641 loops=1)
Total runtime: 117.261 ms

And only if I completely disable bitmap scans do I get the good fast plan (but with an exceptionally high cost):

postgres=# set enable_bitmapscan to 0;
SET
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 1000;
QUERY PLAN
Limit (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.047..4.123 rows=1000 loops=1)
  -> Index Scan using test_value_in2section_key on test (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.044..3.552 rows=1000 loops=1)
Total runtime: 4.460 ms

I hope this test case makes my issue more clear.

Regards, Maksym
[GENERAL] Question about (probably wrong) index scan cost for conditional indexes
I'm not sure whether it is a bug or just the planner working that way. PostgreSQL 9.1.2 on Linux. But it seems that the index scan cost for very narrow/selective conditional indexes is greatly overestimated, at least in some cases.

In my case I have a special conditional index like:

"news_dtime_in208section_active_key2" btree (dtime) WHERE status = 1 AND class::text = 'Sports::News'::text AND sections && '{208}'::integer[]

And the query:

db=# EXPLAIN ANALYZE select * from news where (status = 1) and (class = 'Sports::News') and (sections && '{208}') order by dtime limit 10;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=0.00..26.38 rows=10 width=1262) (actual time=0.040..0.082 rows=10 loops=1)
  -> Index Scan using news_dtime_in208section_active_key2 on news (cost=0.00..1429.55 rows=542 width=1262) (actual time=0.038..0.073 rows=10 loops=1)
Total runtime: 0.142 ms
(3 rows)

I see no reason why the cost of that query should be that high... I think it should be a very close cost equivalent of a query over "news_pkey" PRIMARY KEY, btree (id):

db=# EXPLAIN ANALYZE select * from news order by id limit 10;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=0.00..0.33 rows=10 width=1262) (actual time=0.043..0.085 rows=10 loops=1)
  -> Index Scan using news_pkey on news (cost=0.00..25944.34 rows=775090 width=1262) (actual time=0.041..0.077 rows=10 loops=1)
Total runtime: 0.147 ms
(3 rows)

(and if you compare the real execution times, they are the same, but the costs differ by two orders of magnitude). No changes to cost settings have any effect on that difference.
That problem leads to switching to a very slow plan for medium limits:

db=# EXPLAIN ANALYZE select * from news where (status = 1) and (class = 'Sports::News') and (sections && '{208}') order by dtime limit 40;
QUERY PLAN
Limit (cost=91.97..92.07 rows=40 width=1262) (actual time=630.865..630.889 rows=40 loops=1)
  -> Sort (cost=91.97..93.32 rows=542 width=1262) (actual time=630.862..630.872 rows=40 loops=1)
        Sort Key: dtime
        Sort Method: top-N heapsort  Memory: 89kB
        -> Bitmap Heap Scan on news (cost=6.18..74.83 rows=542 width=1262) (actual time=145.816..412.254 rows=262432 loops=1)
              Recheck Cond: ((sections && '{208}'::integer[]) AND (status = 1) AND ((class)::text = 'Sports::News'::text))
              -> Bitmap Index Scan on news_sections_gin2_special (cost=0.00..6.05 rows=542 width=0) (actual time=98.954..98.954 rows=262754 loops=1)
                    Index Cond: (sections && '{208}'::integer[])
Total runtime: 632.049 ms
(9 rows)

Kind regards, Maksym

-- Maxim Boguk Senior Postgresql DBA.
Re: [GENERAL] Question about (probably wrong) index scan cost for conditional indexes
On Mon, Jan 23, 2012 at 11:28 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: But it seems that index scan cost for very narrow/selective conditional indexes is greatly overestimated at least in some cases.

I realized in connection with http://archives.postgresql.org/pgsql-general/2012-01/msg00459.php that btcostestimate is not correctly estimating numIndexTuples for partial indexes. But it's impossible to tell from this amount of information whether you're seeing an effect of that, or something else. Can you provide a self-contained test case? regards, tom lane

Probably the simplest test case:

set random_page_cost to 4;
set seq_page_cost to 1;
drop table if exists test;
CREATE TABLE test (id integer primary key, value1 float, value2 float, value3 float, value4 float);
INSERT into test select id, random() as value1, random() as value2, random() as value3, random() as value4 from generate_series(1,1000000) as g(id);
CREATE INDEX test_special_key on test(value1) where value2*2 < 0.01 and value3*2 < 0.01 and value4*2 < 0.01;
ANALYZE test;

postgres=# EXPLAIN ANALYZE select * from test order by id limit 100;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=0.00..3.43 rows=100 width=36) (actual time=0.042..0.170 rows=100 loops=1)
  -> Index Scan using test_pkey on test (cost=0.00..34317.36 rows=1000000 width=36) (actual time=0.040..0.108 rows=100 loops=1)
Total runtime: 0.243 ms
(3 rows)

vs

postgres=# EXPLAIN ANALYZE select * from test where value2*2 < 0.01 and value3*2 < 0.01 and value4*2 < 0.01 order by value1 limit 100;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=0.00..92.52 rows=100 width=36) (actual time=0.072..0.072 rows=0 loops=1)
  -> Index Scan using test_special_key on test (cost=0.00..34264.97 rows=37037 width=36) (actual time=0.070..0.070 rows=0 loops=1)
Total runtime: 0.113 ms
(3 rows)

cost difference: (cost=0.00..3.43 rows=100 width=36) vs (cost=0.00..92.52 rows=100 width=36)

Actual speed (and theoretical performance) is almost the same.
The more selective the conditions added to the conditional index, the worse the situation with the wrong costing becomes.

Kind Regards, Maksym

-- Maxim Boguk Senior Postgresql DBA.
[GENERAL] Question about HoT updates and conditional indexes
Let's assume I have a table with an index defined as:

create index test_key on test_table(mtime) where status=1;

Will an update such as:

update test_table set mtime=NOW() where id=10;

be treated as a HOT update if the entry with id=10 has status=0? That update seems a valid candidate for HOT, however in practice it seems to take the full-update path.

-- Maxim Boguk Senior Postgresql DBA.
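One way to answer this empirically: pg_stat_user_tables keeps separate counters for total and HOT updates, so the effect of a single update can be observed directly. A sketch, assuming the table and index from the question:

```sql
-- Counters before the update.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'test_table';

UPDATE test_table SET mtime = NOW() WHERE id = 10;

-- Counters after: if n_tup_hot_upd did not advance, the update was not HOT.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'test_table';
```

The observed behavior is consistent with how the HOT check works: HOT is skipped whenever any column used by any index on the table changes, and that includes the key and predicate columns of partial indexes regardless of whether the particular row satisfies the predicate. Since mtime is the key column of test_key, an update to mtime cannot be HOT even for rows with status=0.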
[GENERAL] WITH and WITH RECURSIVE in single query
Hi. Is there any way to combine WITH and WITH RECURSIVE into a single query? Something like:

WITH t AS (some complicated select to speed up the recursive part),
RECURSIVE r AS ( ... UNION ALL ... )

?

-- Maxim Boguk Senior Postgresql DBA.
[GENERAL] Questions about setting an array element value outside of the update
Let's say I have a subquery which produces an array, a position, and a new value. Is there a less clumsy way to set _array[_pos] to the new value (not an UPDATE, just changing an element inside an array) than:

SELECT _array[1:_pos-1] || _newval || _array[_pos+1:array_length(_array, 1)]
FROM (
    SELECT _array, _pos, _newval FROM some_complicated_logic
);

The _array[1:_pos-1] || _newval || _array[_pos+1:array_length(_array, 1)] part is very clumsy to my eyes.

PS: this is just a small part of a complicated WITH RECURSIVE iterator in the real task.

-- Maxim Boguk Senior Postgresql DBA.
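A worked example of the same splice on concrete values may make the idiom easier to check (the names _array, _pos, _newval are from the question; the values are illustrative):

```sql
-- Replace the element at position 3 of a five-element array with 99.
SELECT _array[1:_pos-1]
       || _newval
       || _array[_pos+1:array_length(_array, 1)] AS patched
FROM (SELECT ARRAY[10,20,30,40,50] AS _array,
             3  AS _pos,
             99 AS _newval) AS s;
-- patched = {10,20,99,40,50}
```

Inside pl/pgsql, a plain subscripted assignment (_array[_pos] := _newval;) does the same thing without the splice, but in pure SQL the slice-and-concatenate form seems to be what is available.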
Re: [GENERAL] WITH and WITH RECURSIVE in single query
On Mon, Dec 5, 2011 at 2:45 PM, David Johnston pol...@yahoo.com wrote:

On Dec 4, 2011, at 22:28, Maxim Boguk maxim.bo...@gmail.com wrote: Hi. Is there any way to combine WITH and WITH RECURSIVE into a single query? Something like: WITH t AS (some complicated select to speed up the recursive part), RECURSIVE r AS ( ... UNION ALL ... ) ?

WITH RECURSIVE q1 AS (), q2 AS () ... Add RECURSIVE after the WITH; it then applies to any/all the CTEs. Look at the specification (and description) in the SELECT documentation closely. David J.

The trouble is that I'm trying to precalculate some data through the WITH syntax (non-recursive), to be used later in the WITH RECURSIVE part (and keep a single copy of that data instead of N). Something like:

WITH _t AS (some complicated select to speed up the recursive part),
RECURSIVE r AS (
    ...
    UNION ALL
    SELECT * FROM r JOIN _t ON ...
)

So I need to have the precalculated _t table before I start the iterator. Now, instead of _t, I'm using record[] + unnest, but that approach is very memory hungry for long iterations:

WITH RECURSIVE r AS (
    SELECT ...,
           ARRAY(SELECT ROW(t.*) FROM some complicated select to speed up the recursive part) as _t_array
    FROM ...
    UNION ALL
    SELECT ..., _t_array
    FROM r JOIN (unnest(_t_array) ...) ON something
)

However, that approach leads to having a copy of _t_array for each final row, so it can use a lot of memory.

PS: Yes, I know about pl/pgsql, but WITH RECURSIVE iterators can give 2-10x performance gains over implementing the same algorithm inside pl/pgsql.

-- Maxim Boguk Senior Postgresql DBA.
[GENERAL] pl/pgsql and arrays[]
Some questions about pl/pgsql and arrays[]. Are constructions such as:

RETURN NEXT array[1];

or

SELECT val INTO array[1] FROM ...;

not supposed to work? At least the documentation about RETURN NEXT says: RETURN NEXT expression; and I think array[1] is a valid expression.

-- Maxim Boguk Senior Postgresql DBA.
Re: [GENERAL] WITH and WITH RECURSIVE in single query
On Mon, Dec 5, 2011 at 3:15 PM, David Johnston pol...@yahoo.com wrote: On Dec 4, 2011, at 22:58, Maxim Boguk maxim.bo...@gmail.com wrote:

[snip]

Read the documentation closely; the syntax definition for WITH is precise and accurate. No matter how many queries you want to create, you write the word WITH one time. If ANY of your queries require iterative behavior, you put the word RECURSIVE after the word WITH.
Between individual queries you may only put the name, an optional column alias, and the required comma. As a side benefit, once you add RECURSIVE the order in which the queries appear is no longer relevant. Without RECURSIVE you indeed must list the queries in order of use. David J.

Thank you very much, David. That works like a charm; another 30% of the runtime is gone.

-- Maxim Boguk Senior Postgresql DBA.
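For reference, a minimal sketch of the shape David describes: a single WITH RECURSIVE keyword, one plain CTE evaluated once, and a recursive CTE that joins against it (the table and column names below are illustrative, not from the original query):

```sql
WITH RECURSIVE
t AS (                 -- plain CTE: the "complicated select", computed once
    SELECT id, parent_id
    FROM nodes
    WHERE some_expensive_condition
),
r AS (                 -- recursive CTE: may reference t freely
    SELECT id, parent_id, 1 AS depth
    FROM t
    WHERE parent_id IS NULL
  UNION ALL
    SELECT t.id, t.parent_id, r.depth + 1
    FROM r
    JOIN t ON t.parent_id = r.id
)
SELECT * FROM r;
```

RECURSIVE is written once after WITH and applies to the whole CTE list, so the non-recursive t needs no special marking; the planner evaluates it a single time rather than once per iteration.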
Re: [GENERAL] pl/pgsql and arrays[]
On Mon, Dec 5, 2011 at 3:53 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

Hello, it works on my pc:

postgres=# \sf fx
CREATE OR REPLACE FUNCTION public.fx()
 RETURNS SETOF integer
 LANGUAGE plpgsql
AS $function$
declare g int[] = '{20}';
begin
  return next g[1];
  return;
end;
$function$

postgres=# select fx();
 fx
----
 20
(1 row)

regards Pavel Stehule

Oh sorry, it seems I didn't test the simple cases. The error happens when you work with record[] types and RETURN SETOF:

create table test (id serial);
insert into test select generate_series(1,10);

CREATE OR REPLACE FUNCTION _test_array()
RETURNS SETOF test
LANGUAGE plpgsql
AS $$
DECLARE
    _array test[];
    _row   test%ROWTYPE;
BEGIN
    SELECT array(SELECT test FROM test) INTO _array;
    --works
    --_row := _array[1];
    --RETURN NEXT _row;
    --also works
    --RETURN QUERY SELECT (_array[1]).*;
    --error
    --RETURN NEXT _array[1];
    --error
    --RETURN NEXT (_array[1]);
    --error
    --RETURN NEXT (_array[1]).*;
    RETURN;
END;
$$;

2011/12/5 Maxim Boguk maxim.bo...@gmail.com: [snip]

-- Maxim Boguk Senior Postgresql DBA.
Re: [GENERAL] Problem with custom aggregates and record pseudo-type
On Fri, Dec 2, 2011 at 3:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:

  Maxim Boguk maxim.bo...@gmail.com writes:
  > I created a special custom aggregate function to append arrays, defined as:
  > CREATE AGGREGATE array_accum (anyarray) ( sfunc = array_cat, stype = anyarray, initcond = '{}' );
  > On arrays of common types it works without any problems:
  > SELECT array_accum(i) from (values (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
  >  array_accum
  > -------------
  >  {1,2,3,4}
  > (1 row)
  > However, once I try to use it with the record[] type I get an error:
  > SELECT array_accum(i) from (values (ARRAY[row(1,2),row(2,3)]), (ARRAY[row(1,2),row(2,3)])) as t(i);
  > ERROR: cannot concatenate incompatible arrays

  Hm ... it looks like this case can be made to work with a simple adjustment to getTypeIOData, but in the meantime you might be able to get away with leaving the initial value as NULL (ie, leave off the initcond clause). The only behavioral difference would be that you'd get NULL, not an empty array, for zero rows of input.

  regards, tom lane

Thank you very much for the idea; your suggestion works excellently, as usual. And again, thank you for committing a fix. Kind Regards, Maksym
[GENERAL] Problem with custom aggregates and record pseudo-type
I created a special custom aggregate function to append arrays, defined as:

CREATE AGGREGATE array_accum (anyarray) (
    sfunc = array_cat,
    stype = anyarray,
    initcond = '{}'
);

On arrays of common types it works without any problems:

SELECT array_accum(i) from (values (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
 array_accum
-------------
 {1,2,3,4}
(1 row)

However, once I try to use it with the record[] type I get an error:

SELECT array_accum(i) from (values (ARRAY[row(1,2),row(2,3)]), (ARRAY[row(1,2),row(2,3)])) as t(i);
ERROR: cannot concatenate incompatible arrays
DETAIL: Arrays with element types record[] and record are not compatible for concatenation.

The base function of the aggregate, array_cat, works with record[] without any complaints:

SELECT array_cat(ARRAY[row(1,2),row(2,3)], ARRAY[row(1,2),row(2,3)]);
         array_cat
---------------------------
 {(1,2),(2,3),(1,2),(2,3)}

What am I doing wrong? Or how do I create a correct version of such an aggregate function? -- Maxim Boguk Senior Postgresql DBA.
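The workaround suggested in the reply (drop the initcond so the transition state starts as NULL) can be written out as follows; the only behavioral difference is that zero input rows yield NULL instead of an empty array:

```sql
-- Same aggregate without initcond: the state starts as NULL, and the first
-- array_cat(NULL, value) call fixes the state's concrete array type, so
-- record[] inputs no longer collide with an untyped '{}' initial value.
CREATE AGGREGATE array_accum (anyarray) (
    sfunc = array_cat,
    stype = anyarray
);

SELECT array_accum(i)
FROM (VALUES (ARRAY[row(1,2)]), (ARRAY[row(3,4)])) AS t(i);
```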
[GENERAL] Quite a fast lockless vacuum full implementation
reports. -- Maxim Boguk Senior Postgresql DBA.

clear_table_tail.sql  Description: Binary data
vacuum_table.pl       Description: Binary data

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Some strange bug with drop table with slony cluster
DB version: PostgreSQL 8.3.6 (under Linux). No server/db crashes happened before. The server was a slave in Slony replication.

Now the problem: a table was unsubscribed from replication (without any errors) and then dropped from the master without any errors. But when I try to drop the table on the slave I get a very strange error:

hh=# drop TABLE metro_station_old;
ERROR: "area_pk" is an index

hh=# \d+ metro_station_old
                            Table "public.metro_station_old"
      Column       |         Type          |           Modifiers           | Description
-------------------+-----------------------+-------------------------------+-------------
 metro_station_id  | integer               | not null                      |
 city_id           | integer               | not null                      |
 metro_line_id     | integer               | not null                      |
 city_district_id  | integer               | not null                      |
 name              | character varying(64) | not null                      |
 image_point_x     | integer               |                               |
 image_point_y     | integer               |                               |
 acronym_line      | character varying(10) | default ''::character varying |
 colocation        | integer               |                               |
Foreign-key constraints:
    "metro_station_ibfk_1" FOREIGN KEY (city_id) REFERENCES area(area_id)
    "metro_station_ibfk_2" FOREIGN KEY (metro_line_id) REFERENCES metro_line(metro_line_id)
    "metro_station_ibfk_3" FOREIGN KEY (city_district_id) REFERENCES tables_to_drop.city_district(city_district_id)
Has OIDs: no

hh=# \d+ area_pk
    Index "public.area_pk"
 Column  |  Type   | Description
---------+---------+-------------
 area_id | integer |
primary key, btree, for table "public.area"

i.e. completely independent things. Also:

hh=# ALTER TABLE metro_station_old drop constraint metro_station_ibfk_1;
ERROR: "area_pk" is an index
hh=# ALTER TABLE metro_station_old drop constraint metro_station_ibfk_2;
ERROR: "metro_line_pk" is an index
hh=# ALTER TABLE metro_station_old drop constraint metro_station_ibfk_3;
ERROR: "city_district_pk" is an index

The first thing I suspected was a corrupted cache in shared memory or something like that... but a fresh restart of the db server does not change the situation. The second suspect was broken system indexes...
but "reindex system hh;" does not help either.

hh=# SELECT oid from pg_class where relname='metro_station_old';
  oid
-------
 17542
(1 row)

hh=# SELECT oid from pg_class where relname='area_pk';
  oid
-------
 18933
(1 row)

Now I am out of ideas. The main thing that worries me is that a second slave shows the same issue, so it is not a single random error. Because one of the slaves is a pure backup I can easily experiment with it, but I need hints on what to look at next.

PS: one additional bit of info... I can rename the table via ALTER without errors, but I still can't drop it. -- SY, Maxim Boguk
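One diagnostic sketch (not from the thread) for this kind of error is to look directly at the dependency catalog for stale rows attached to the table; 17542 is the oid of metro_station_old from the post, and the columns are the standard pg_depend columns:

```sql
-- List everything pg_depend records as depending on the table, or that the
-- table depends on; a row pointing at an unrelated index (such as area_pk)
-- would explain the bogus "is an index" error during DROP.
SELECT classid::regclass, objid, objsubid,
       refclassid::regclass, refobjid, refobjsubid, deptype
FROM pg_depend
WHERE objid = 17542 OR refobjid = 17542;
```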
[GENERAL] Issue with Inheritance and Partitioning and grants
I have one large table partitioned via timestamp into monthly parts, using inheritance. So the structure looks like:

Parent table: entity_log ...
And lots of child tables like: entity_log_2002_01 ...
Check constraints: "chk_entity_log_2002_01" CHECK (ctime >= '2002-01-01 00:00:00'::timestamp without time zone AND ctime < '2002-02-01 00:00:00'::timestamp without time zone)
Inherits: entity_log
... and so on up to entity_log_2009_06 ...

Business logic requires that past months' entries be read-only. Doing it on the application side looks impossible (way too much old spaghetti code). So after some thinking about alternatives, I chose to do it by limiting grants on past months' tables: I removed write access for the application user on all old entity_log_* tables (on a test system). And the testers found an unexpected bug:

12:15:18 MSD b...@billing 67842 ERROR: permission denied for relation entity_log_2002_01
12:15:18 MSD b...@billing 67842 STATEMENT: UPDATE entity_log SET usermsg = 'some text' WHERE id = 46248962

In reality the entry with id=46248962 is located in the entity_log_2009_06 table, where write grants are set (i.e. only the entity_log_2009_06 table needs to be updated). I am not sure whether this is a bug or intentional. Probably it would be good to first check where the entries to update are located and check grants only after that (though I am not sure it is possible to teach PostgreSQL such tricks). I am really looking for a working solution to my problem. I cannot solve this issue by adding an 'on update' trigger on the entity_log table. -- SY, Maxim Boguk
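The grant arrangement described above can be sketched as follows (the role name app_user is hypothetical): closed months lose write privileges while keeping SELECT, and the current month stays fully writable.

```sql
-- Sketch: make a closed month's partition read-only for the application
-- role while leaving it readable. Repeat per closed partition.
REVOKE INSERT, UPDATE, DELETE ON entity_log_2002_01 FROM app_user;
GRANT SELECT ON entity_log_2002_01 TO app_user;

-- The current month keeps full access.
GRANT SELECT, INSERT, UPDATE, DELETE ON entity_log_2009_06 TO app_user;
```

As the post notes, this runs into the quirk that an UPDATE against the parent is permission-checked on every child the planner touches, not only the children that actually hold matching rows.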
[GENERAL] Postgresql selecting strange index for simple query
without time zone) Total runtime: 1386.074 ms (4 rows) Thanks for help. -- SY, Maxim Boguk
Re: [GENERAL] Postgresql selecting strange index for simple query
Tom Lane wrote:

  Maxim Boguk mbo...@masterhost.ru writes:
  > Somehow postgres thinks an index scan on the single-column index is slower than a scan on the 4th field of a 4-column index.
  It does know better than that. I'm wondering if the single-column index has become very bloated or something. Have you compared the physical index sizes?

The table was freshly loaded from a dump on a test server, so no index bloat for sure. As for comparing physical sizes, the right single-column index is indeed smaller than the wrong one:

Right index:
hh=# SELECT pg_size_pretty(pg_relation_size('resume_last_change_time_idx'));
 pg_size_pretty
 125 MB
(1 row)

Wrong index:
hh=# SELECT pg_size_pretty(pg_relation_size('resume_user_id_disabled_is_finished_last_change_time_idx'));
 pg_size_pretty
 226 MB
(1 row)

Regards, Maxim Boguk
Re: [GENERAL] Postgresql selecting strange index for simple query
Maxim Boguk mbo...@masterhost.ru writes:

  Tom Lane wrote:
  > It does know better than that. I'm wondering if the single-column index has become very bloated or something. Have you compared the physical index sizes?
  The table was freshly loaded from a dump on a test server, so no index bloat for sure. As for comparing physical sizes, the right single-column index is indeed smaller than the wrong one.

Huh. I get sane-looking choices when I try a similar case here. Can you put together a self-contained test case?

Not a fully self-contained test case, but some minimal setup which may give you an idea of what's going wrong. The tests confirm my theory that the wrong index selection is linked to long rows in the table. My tests contain queries like this:

OK, TEST 1:

set random_page_cost=1;
drop TABLE if exists test_table;
SELECT user_id, last_change_time, rpad('a',1,'b') as f1 into test_table from resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

Result:
 Index Scan using right_idx on test_table (cost=0.00..42763.35 rows=388718 width=0) (actual time=0.020..342.653 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without time zone)
 Total runtime: 368.699 ms

OK, TEST 2 (but note: the cost increased 4x while the real run time increased only by 30%): the same, but with rpad('a',200,'b') instead of rpad('a',1,'b') when populating test_table. Result:
 Index Scan using right_idx on test_table (cost=0.00..179346.09 rows=392268 width=0) (actual time=0.089..422.439 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without time zone)
 Total runtime: 448.717 ms

!! Not OK, TEST 3: !!
The same, but with rpad('a',500,'b') instead of rpad('a',1,'b') when populating test_table. Oops, the wrong index is used:
 Index Scan using wrong_idx on test_table (cost=0.00..254918.19 rows=392231 width=0) (actual time=0.067..730.097 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without time zone)
 Total runtime: 757.930 ms
(3 rows)

(btw, if I drop wrong_idx the query becomes almost 2x faster:
drop INDEX wrong_idx;
 Index Scan using right_idx on test_table (cost=0.00..259709.09 rows=392231 width=0) (actual time=0.019..416.108 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without time zone)
 Total runtime: 442.790 ms
)

Again OK, TEST 4 (here the TOAST engine moved all long values out of test_table): the same, but with rpad('a',2000,'b') instead of rpad('a',1,'b') when populating test_table:
 Index Scan using right_idx on test_table (cost=0.00..64606.50 rows=393002 width=0) (actual time=0.058..371.723 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without time zone)
 Total runtime: 397.929 ms

So I have two theories (just waving hands, of course):
1) integer overflow somewhere in the cost calculation;
2) floating-point rounding errors (because the costs are very close in the wrong situations: 254918.19 vs 259709.09).

PS: a second issue is that the cost increases with increasing row length completely independently of the real run time (probably I need to tune other *_cost parameters to get more exact cost values).
PPS: sorry for my poor English. Regards, Maxim Boguk
Re: [GENERAL] Postgresql selecting strange index for simple query
Tom Lane wrote:

  Maxim Boguk mbo...@masterhost.ru writes:
  > So I have two theories (just waving hands, of course): 1) integer overflow somewhere in the cost calculation
  Costs are floats, and in any case you're not showing costs anywhere near the integer overflow limit...
  > 2) floating-point rounding errors (because the costs are very close in the wrong situations: 254918.19 vs 259709.09)
  The planner is intentionally set up to consider costs within a percent or so of each other as being effectively equal. If the estimated costs are that close then it doesn't surprise me if it sometimes picks the wrong plan. The real question is why are the estimates so close? They should not be, since AFAICS you are talking about a situation where we'd have to scan all of the multicol index versus only about a fifth of the single-col one.

OK, I explored some more. Just one thing first:

hh=# SHOW default_statistics_target;
 default_statistics_target
 10
(1 row)

(btw, increasing the statistics target to 1000 does not fix the situation).

I tried to simplify the test case. Now using a sequential user_id, and last_change_time truncated to a date:

SELECT nextval('test_seq') as user_id, last_change_time::date, rpad('a',500,'b') as f1 into test_table from resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

 Index Scan using wrong_idx on test_table (cost=0.00..182623.51 rows=316522 width=0) (actual time=0.056..534.620 rows=382671 loops=1)
   Index Cond: (last_change_time > '2009-01-10'::date)

DROP INDEX wrong_idx;
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';

 Index Scan using right_idx on test_table (cost=0.00..221765.19 rows=316522 width=0) (actual time=0.023..346.213 rows=382671 loops=1)
   Index Cond: (last_change_time > '2009-01-10'::date)

A full scan over the wrong index is costed noticeably lower than scanning a fifth of the single-column index (182623 vs 221765)!

Some figures on last_change_time:

hh=# SELECT count(distinct last_change_time) from test_table;
 count
-------
 2133
(1 row)

And the statistics values for last_change_time are:

hh=# SELECT * from pg_stats where tablename='test_table' and attname='last_change_time';
-[ RECORD 1 ]-+---
schemaname        | public
tablename         | test_table
attname           | last_change_time
null_frac         | 0
avg_width         | 4
n_distinct        | 1211
most_common_vals  | {2004-02-27,2009-01-26,2008-03-11,2009-01-27,2004-01-15,2008-10-13,2009-01-19,2009-01-22,2009-01-21,2008-09-29}
most_common_freqs | {0.0083,0.0077,0.0073,0.007,0.0067,0.0053,0.0053,0.0053,0.005,0.0047}
histogram_bounds  | {2003-03-31,2005-08-26,2006-10-04,2007-04-29,2007-09-27,2008-01-24,2008-05-01,2008-07-29,2008-10-01,2008-11-27,2009-01-25}
correlation       | 0.261512

I think that is all the planner can use when choosing the plan, because user_id contains unique sequential values.

regards, Maxim Boguk
[GENERAL] Indexes on NULL's and order by ... limit N queries
When I read that 8.3 supports indexed queries on NULL values (like "rubric_id IS NULL") I was really happy. But reality strikes again... it looks like IS NULL in WHERE does not allow effective use of an index on (rubric_id, pos) for queries like:
... WHERE rubric_id IS NULL ORDER BY pos LIMIT 5

Here are some details about my issue (all tests on a freshly loaded/analyzed empty 8.3.5 DB):

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id=8 order by pos limit 5;
 Limit (cost=1337.02..1337.03 rows=5 width=28) (actual time=27.556..27.575 rows=5 loops=1)
   -> Sort (cost=1337.02..1340.77 rows=1501 width=28) (actual time=27.552..27.558 rows=5 loops=1)
        Sort Key: pos
        Sort Method: top-N heapsort Memory: 25kB
        -> Seq Scan on cluster_weight (cost=0.00..1312.09 rows=1501 width=28) (actual time=0.058..25.008 rows=1501 loops=1)
             Filter: (rubric_id = 8)
 Total runtime: 27.638 ms
(7 rows)

OK, so we need an index on (rubric_id, pos); let's add it:

mboguk_billing=# CREATE INDEX cluster_weight_2 on cluster_weight(rubric_id, pos);
CREATE INDEX

And try again:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id=8 order by pos limit 5;
 Limit (cost=0.00..1.70 rows=5 width=28) (actual time=0.095..0.122 rows=5 loops=1)
   -> Index Scan using cluster_weight_2 on cluster_weight (cost=0.00..509.31 rows=1501 width=28) (actual time=0.090..0.104 rows=5 loops=1)
        Index Cond: (rubric_id = 8)
 Total runtime: 0.176 ms
(4 rows)

OK... so now the query works as intended. Let's check whether the index is used when searching for NULL values:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL;
 Index Scan using cluster_weight_2 on cluster_weight (cost=0.00..1034.21 rows=26435 width=28) (actual time=0.053..48.123 rows=26435 loops=1)
   Index Cond: (rubric_id IS NULL)
 Total runtime: 85.210 ms
(3 rows)

Yes, it works.
Now let's try the main query over NULL:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT 5;
 Limit (cost=1473.29..1473.30 rows=5 width=28) (actual time=92.220..92.239 rows=5 loops=1)
   -> Sort (cost=1473.29..1539.37 rows=26435 width=28) (actual time=92.216..92.223 rows=5 loops=1)
        Sort Key: pos
        Sort Method: top-N heapsort Memory: 25kB
        -> Index Scan using cluster_weight_2 on cluster_weight (cost=0.00..1034.21 rows=26435 width=28) (actual time=0.033..47.333 rows=26435 loops=1)
             Index Cond: (rubric_id IS NULL)
 Total runtime: 92.310 ms
(7 rows)

Oops... that is a surprise... I wasn't ready to see that plan here, and the performance difference is over 1000x. It looks like when indexed searches over NULLs were added, the planner was not taught the right way to use such an index for "WHERE something IS NULL ORDER BY ... LIMIT ..." queries.

Thanks for any responses, and sorry for my not-so-good English. -- SY, Maxim Boguk
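One possible workaround (a sketch, not something suggested in the thread): since the interesting rows are exactly those with rubric_id IS NULL, a partial index ordered by pos can satisfy the ORDER BY ... LIMIT directly, without a sort node.

```sql
-- Partial index: only rows with NULL rubric_id, physically ordered by pos.
-- A scan of this index returns rows already sorted by pos, so the planner
-- can stop after the first 5 tuples instead of sorting all 26435.
CREATE INDEX cluster_weight_null_rubric
    ON cluster_weight (pos)
    WHERE rubric_id IS NULL;

EXPLAIN ANALYZE
SELECT * FROM cluster_weight
WHERE rubric_id IS NULL
ORDER BY pos LIMIT 5;
```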
Re: [GENERAL] Indexes on NULL's and order by ... limit N queries
Sorry, with all my respect to you, it looks like you are wrong. Here is an example. With NULLs:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT 5;
 Limit (cost=1503.75..1503.76 rows=5 width=28) (actual time=93.334..93.353 rows=5 loops=1)
   -> Sort (cost=1503.75..1569.84 rows=26435 width=28) (actual time=93.329..93.335 rows=5 loops=1)
        Sort Key: pos
        Sort Method: top-N heapsort Memory: 25kB
        -> Bitmap Heap Scan on cluster_weight (cost=314.32..1064.67 rows=26435 width=28) (actual time=7.519..48.678 rows=26435 loops=1)
             Recheck Cond: (rubric_id IS NULL)
             -> Bitmap Index Scan on cluster_weight_2 (cost=0.00..307.72 rows=26435 width=0) (actual time=7.350..7.350 rows=26435 loops=1)
                  Index Cond: (rubric_id IS NULL)
 Total runtime: 93.433 ms
(9 rows)

Now let's change the NULLs to -1:

mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435

And ANALYZE:

mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

And try the same query with -1 instead of NULL:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id=-1 ORDER BY pos LIMIT 5;
 Limit (cost=0.00..0.25 rows=5 width=28) (actual time=0.056..0.080 rows=5 loops=1)
   -> Index Scan using cluster_weight_2 on cluster_weight (cost=0.00..1334.41 rows=26435 width=28) (actual time=0.053..0.065 rows=5 loops=1)
        Index Cond: (rubric_id = (-1))
 Total runtime: 0.133 ms
(4 rows)

And the plan becomes normal. So the issue is not too many NULLs in my dataset. -- SY, Maxim Boguk

Tom Lane wrote:
  Maxim Boguk [EMAIL PROTECTED] writes:
  > It looks like when indexed searches over NULLs were added, the planner was not taught the right way to use such an index for "where something is NULL order by ... limit ..." queries.
  There's nothing wrong with the plan; you've just got too many NULLs to make it worth using the index for that.
  regards, tom lane
Re: [GENERAL] Indexes on NULL's and order by ... limit N queries
Alvaro Herrera wrote:

  Maxim Boguk wrote:
  > Sorry, with all my respect to you, it looks like you are wrong.
  The difference is that the plan with -1 does not need to sort the output, because it comes sorted out of the index; so the execution can be stopped as soon as 5 tuples have come out. With NULL, that can't be done.

But why? Do NULLs have some special representation in the index which doesn't behave the same as normal values? I.e. does output for "rubric_id IS NULL" not come sorted from the index? Really, my tests show the same behavior for -1 and NULL values:

mboguk_billing=# SELECT pos from cluster_weight where rubric_id=-1 limit 20;
 pos: 20 20 25 40 40 50 60 60 80 80 100 120 140 160 180 200 220 240 260 280
(20 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=NULL where rubric_id=-1;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

mboguk_billing=# SELECT pos from cluster_weight where rubric_id is NULL limit 20;
 pos: 20 20 25 40 40 50 60 60 80 80 100 120 140 160 180 200 220 240 260 280
(20 rows)

I.e. output for "rubric_id IS NULL" does come ordered by pos from the index (rubric_id, pos). Here are the explains:

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where rubric_id is NULL limit 20;
 Limit (cost=0.00..0.99 rows=20 width=2) (actual time=0.050..0.144 rows=20 loops=1)
   -> Index Scan using cluster_weight_2 on cluster_weight (cost=0.00..1314.94 rows=26435 width=2) (actual time=0.045..0.082 rows=20 loops=1)
        Index Cond: (rubric_id IS NULL)
 Total runtime: 0.214 ms
(4 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where rubric_id=-1 limit 20;
 Limit (cost=0.00..0.95 rows=20 width=2) (actual time=0.050..0.141 rows=20 loops=1)
   -> Index Scan using cluster_weight_2 on cluster_weight (cost=0.00..1259.05 rows=26435 width=2) (actual time=0.045..0.081 rows=20 loops=1)
        Index Cond: (rubric_id = (-1))
 Total runtime: 0.214 ms
(4 rows)

The plans look the same.

PS: REINDEX does not change the situation. -- SY, Maxim Boguk
Re: [GENERAL] Postgresql optimisator deoptimise queries sometime...
)
   SubPlan
     -> Aggregate (cost=849.50..849.51 rows=1 width=0) (actual time=16.072..16.073 rows=1 loops=50)
          -> Seq Scan on table2 (cost=0.00..847.00 rows=1000 width=0) (actual time=0.020..14.599 rows=1000 loops=50)
               Filter: (fk = $0)
 Total runtime: 804.552 ms
(9 rows)

The performance difference is 20 times... :((( I think this is simply a mis-optimization on the db side.

PS: in the real world the query goes through a view:

CREATE VIEW test_view as SELECT table1.id, (select count(*) from table2 where table2.fk=table1.id) as total from table1;

and I have no way to put OFFSET 0 into the query:

select * from test_view where total=990 or total=991 or total=992 or total=993 or total=994 or total=995 or total=996 or total=997 or total=998 or total=999 or total=1000 or total=1001 or total=1002 or total=1003 or total=1004 or total=1005 or total=1006 or total=1007 or total=1008 or total=1009 or total=1010;

-- Maxim Boguk
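One possible workaround for the view case (a sketch, not something suggested in the thread): put the OFFSET 0 optimization fence inside the view definition itself, so callers need not, and cannot, add it.

```sql
-- Sketch: bake the fence into the view, so WHERE clauses applied by callers
-- are not pushed down into the subselect and the count() runs once per row.
CREATE VIEW test_view AS
SELECT id, total
FROM (
    SELECT table1.id,
           (SELECT count(*) FROM table2 WHERE table2.fk = table1.id) AS total
    FROM table1
    OFFSET 0   -- optimization fence: prevents subquery flattening
) AS t;
```

The trade-off is that the fence also blocks beneficial pushdown, so every query through the view pays the full per-row subselect cost even when a selective WHERE could have helped.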
[GENERAL] Postgresql optimisator deoptimise queries sometime...
testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id, (select count(*) from table2 where table2.fk=table1.id and random() < 0.1) as count from table1) as t1 where count > 900;

 Seq Scan on table1 (cost=0.00..73557.24 rows=17 width=4) (actual time=47.385..1346.824 rows=32 loops=1)
   Filter: ((subplan) > 900)
   SubPlan
     -> Aggregate (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.333..16.334 rows=1 loops=50)
          -> Seq Scan on table2 (cost=0.00..1097.00 rows=333 width=0) (actual time=0.025..14.995 rows=900 loops=50)
               Filter: ((fk = $0) AND (random() < 0.1::double precision))
     -> Aggregate (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.537..16.539 rows=1 loops=32)
          -> Seq Scan on table2 (cost=0.00..1097.00 rows=333 width=0) (actual time=0.028..15.141 rows=934 loops=32)
               Filter: ((fk = $0) AND (random() < 0.1::double precision))
 Total runtime: 1346.972 ms
(10 rows)

This plan can produce a simply wrong result (which is a clear bug): the volatile subselect is evaluated twice per row, once in the filter and once in the output, and the two evaluations can disagree. Versus the right plan:

testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id, (select count(*) from table2 where table2.fk=table1.id and random() < 0.1) as count from table1 offset 0) as t1 where count > 900;

 Subquery Scan t1 (cost=0.00..54894.38 rows=17 width=12) (actual time=31.181..800.898 rows=35 loops=1)
   Filter: (t1.count > 900)
   -> Limit (cost=0.00..54893.75 rows=50 width=4) (actual time=14.992..800.754 rows=50 loops=1)
        -> Seq Scan on table1 (cost=0.00..54893.75 rows=50 width=4) (actual time=14.988..800.602 rows=50 loops=1)
             SubPlan
               -> Aggregate (cost=1097.84..1097.85 rows=1 width=0) (actual time=16.003..16.004 rows=1 loops=50)
                    -> Seq Scan on table2 (cost=0.00..1097.00 rows=333 width=0) (actual time=0.025..14.725 rows=898 loops=50)
                         Filter: ((fk = $0) AND (random() < 0.1::double precision))
 Total runtime: 801.021 ms
(9 rows)

-- Maxim Boguk
[GENERAL] Postgresql 8.3 statistic stadistinct value dead wrong even with default_statistics_target=1000
Some time ago I found that one simple SQL statement over a large table ate all RAM+swap and almost killed the server (PostgreSQL 8.3.3 on a 4 GB FreeBSD server). After some exploring I found what happened. The query was over a simple table:

profiles=# \d+ counter_vis
                  Table "counter_vis"
    Column    |  Type   | Modifiers | Description
--------------+---------+-----------+-------------
 counter_id   | bigint  | not null  |
 visitor_id   | bigint  | not null  |
 ts           | bigint  | not null  |
 sessionstart | bigint  | not null  |
 sessionend   | bigint  | not null  |
 numpageviews | integer | not null  |
Indexes:
    "counter_vis_counter" btree (counter_id)
    "counter_vis_vis" btree (visitor_id)
Has OIDs: no

which contains around 648M entries (according to freshly analyzed stats from pg_stat_user_tables). The query was:

select count(*) from (select visitor_id, sum(numpageviews) as s from counter_vis group by visitor_id having sum(numpageviews) > 1) as foo;

with plan:

 Aggregate (cost=17429989.40..17429989.41 rows=1 width=0)
   -> HashAggregate (cost=17422096.40..17426700.65 rows=263100 width=12)
        Filter: (sum(counter_vis.numpageviews) > 1)
        -> Seq Scan on counter_vis (cost=0.00..12554826.80 rows=648969280 width=12)

The plan looks OK... but how did the query eat over 4 GB of RAM? Looking closer I found one strange point: rows=263100, whereas I know the table must have around 70M unique visitor_ids. A manual ANALYZE on the table did not change the plan.
Here is the pg_statistic entry after ANALYZE (with default_statistics_target=10):

profiles=# SELECT * from pg_statistic where starelid=25488 and staattnum=2 order by 2 asc;
-[ RECORD 1 ]---
starelid    | 25488
staattnum   | 2
stanullfrac | 0
stawidth    | 8
stadistinct | 263100   (there are about 70M distinct values in reality)
stakind1    | 1
stakind2    | 2
stakind3    | 3
stakind4    | 0
staop1      | 410
staop2      | 412
staop3      | 412
staop4      | 0
stanumbers1 | {0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067}
stanumbers2 |
stanumbers3 | {-0.0443004}
stanumbers4 |
stavalues1  | {413866965,489514660,624858316,753063164,790095243,1279713644,1628857812,2104294292,2726728837,2771123172}
stavalues2  | {-9035671468843485583,184524075,555699387,921684844,1329929495,1897558820,2602632340,3083805588,3511488708,3963719633,9173061477162286552}
stavalues3  |
stavalues4  |

So stadistinct is about 300 times off from reality. Strange already... and I needed to run that query anyway... so I changed default_statistics_target to 1000 and analyzed the table again:

profiles=# ANALYZE verbose counter_vis;
INFO: analyzing counter_vis
INFO: counter_vis: scanned 30 of 6065134 pages, containing 3210 live rows and 0 dead rows; 30 rows in sample, 648969338 estimated total rows
ANALYZE

Afterwards the statistics were better:

starelid    | 25488
staattnum   | 2
stanullfrac | 0
stawidth    | 8
stadistinct | 7.12958e+06
stakind1    | 1
stakind2    | 2
stakind3    | 3
stakind4    | 0
staop1      | 410
staop2      | 412
staop3      | 412
staop4      | 0
... long stats skipped...

but stadistinct is still 10 times off from reality:

profiles=# SELECT count(distinct visitor_id) from counter_vis;
  count
----------
 69573318
(1 row)

Is there any way to deal with such situations? Because a 10x difference, if postgres chooses a hashed plan, will easily kill the server with OOM (the query will use 10x more RAM than postgres expected). Are there perhaps some strange effects in the stadistinct estimation algorithm? Or is there any way to remove the limit on default_statistics_target?

Thanks for any help. PS: sorry for my bad English.
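In 8.3 there is no per-column n_distinct override (ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...) only arrived in PostgreSQL 9.0); the closest per-column knob is a larger statistics target, which raises the ANALYZE sample size for just that column instead of globally. A sketch:

```sql
-- Raise the sample size for visitor_id only, leaving the global
-- default_statistics_target alone, then re-analyze.
ALTER TABLE counter_vis ALTER COLUMN visitor_id SET STATISTICS 1000;
ANALYZE counter_vis;
```

This helps because the n_distinct estimator's accuracy depends heavily on sample size; for a 648M-row table even a target of 1000 samples only a tiny fraction of the pages, which is consistent with the residual 10x error seen above.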
-- Maxim Boguk
[GENERAL] Cannot drop user (PostgreSQL 8.1.11)
I am trying to drop an old user but ran into some strange issues:

template1=# drop USER szhuchkov;
ERROR: role szhuchkov cannot be dropped because some objects depend on it
DETAIL: 1 objects in database billing
2 objects in database shop

OK... let's look closer at these two DBs:

shop=# drop USER szhuchkov;
ERROR: role szhuchkov cannot be dropped because some objects depend on it
DETAIL: owner of type pg_toast.pg_toast_406750
owner of type pg_toast.pg_toast_1770195
1 objects in database billing

hm, damn strange...

shop=# SELECT * from pg_user where usename='szhuchkov';
  usename  | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil |          useconfig
-----------+----------+-------------+----------+-----------+--------+----------+------------------------------
 szhuchkov |    16387 | f           | f        | f         |        |          | {search_path=bill, billstat}
(1 row)

Let's look in pg_type for these two types:

shop=# SELECT typname,typowner from pg_type where typname IN ('pg_toast_406750', 'pg_toast_1770195');
     typname      | typowner
------------------+----------
 pg_toast_1770195 |       10
 pg_toast_406750  |       10
(2 rows)

The owner is right (pgsql). Let's look at the billing db:

billing=# drop USER szhuchkov;
ERROR: role szhuchkov cannot be dropped because some objects depend on it
DETAIL: owner of function vz_vds_ip_add(integer,bigint)
2 objects in database shop

billing=# SELECT proname,proowner from pg_proc where proname like '%vz_vds_ip_add%';
    proname    | proowner
---------------+----------
 vz_vds_ip_add |       10
(1 row)

OK... again the right owner... no sign of szhuchkov. Last test... let's try pg_dumpall -s:

[EMAIL PROTECTED] /home/mboguk]$ pg_dumpall -s | grep szhuchkov
CREATE ROLE szhuchkov;
ALTER ROLE szhuchkov WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB LOGIN PASSWORD '***';
ALTER ROLE szhuchkov SET search_path TO bill, billstat;
GRANT bill1c_r TO szhuchkov GRANTED BY pgsql;
GRANT bill_r TO szhuchkov GRANTED BY pgsql;
GRANT billexch_r TO szhuchkov GRANTED BY pgsql;
GRANT billstat_r TO szhuchkov GRANTED BY pgsql;
GRANT shop_r TO szhuchkov GRANTED BY pgsql;
GRANT templar_r TO szhuchkov GRANTED BY pgsql;

Nothing more...
So according to pg_dumpall, szhuchkov doesn't have any active objects in the DB either. In all other respects the DB works fine 24x7 without any other issues (and because of the 24x7 requirement I cannot stop the DB and drop the user from single-user mode). Any ideas? Or what else should I check? -- Maxim Boguk
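Since this error message is driven by the shared-dependency catalog rather than by the per-database pg_depend, one diagnostic sketch (not from the thread; assuming the pg_shdepend catalog present in this server line) is to look directly for rows referencing the role:

```sql
-- Find the shared-dependency rows that make PostgreSQL believe szhuchkov
-- still owns or uses objects; dbid tells you which database each row
-- belongs to, and deptype 'o' marks ownership dependencies.
SELECT d.dbid, d.classid::regclass, d.objid, d.objsubid, d.deptype
FROM pg_shdepend d
JOIN pg_authid a ON a.oid = d.refobjid
WHERE a.rolname = 'szhuchkov';
```

If stale rows show up here while the objects' actual owners (per pg_type/pg_proc) are correct, that mismatch would explain why DROP USER refuses while pg_dumpall sees nothing.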
[GENERAL] Postgresql 8.3: stats collector process eat all CPU all time
Details: db version: postgres (PostgreSQL) 8.3.0; OS version: FreeBSD 7.0-RELEASE.

The stats collector has been eating a whole single CPU for the last week or so. TOP looks like:

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
69360 pgsql      1 124    0 22380K  5548K CPU0   1 258.6H 99.02% postgres

where 69360 is:

USER   PID  %CPU %MEM   VSZ  RSS  TT  STAT STARTED     TIME COMMAND
pgsql 69360 100,0 0,1 22380 5548  ??  Rs   20мар08 15513:55,99 postgres: stats collector process (postgres)

The server is a dedicated DB server under small load, so the stats collector eats 5-10x more CPU than all other db activity combined. The stats collector itself does its work (i.e. I see all required requests in pg_stat_activity and the counters work well too). And the stats collector adds a lot of system load on the server. ktrace/kdump of this process shows just one repeating sequence:

69360 postgres CALL poll(0x7fffd4e0,0x1,0x7d0)
69360 postgres RET  poll -1 errno 4 Interrupted system call
69360 postgres CALL poll(0x7fffd4e0,0x1,0x7d0)
69360 postgres RET  poll -1 errno 4 Interrupted system call
69360 postgres CALL poll(0x7fffd4e0,0x1,0x7d0)
69360 postgres RET  poll -1 errno 4 Interrupted system call
69360 postgres CALL poll(0x7fffd4e0,0x1,0x7d0)

A kdump.out for 1 second is over 10 MB in size. It looks like something wrong is going on. Any idea what happened and how to fix the situation? (After a postgres restart the situation becomes bad again within a few days.) -- Maxim Boguk