[GENERAL] Very suspicious plan difference for select and corresponding delete queries PostgreSQL 9.6.2

2017-05-08 Thread Maxim Boguk
Hi,

I found a case with very curious plan difference between:

explain select * from jobs_to_delete WHERE EXISTS(SELECT 1 FROM
job_stats_new_201411  jsm WHERE jsm.job_reference =
jobs_to_delete.job_reference);

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Hash Join  (cost=239386.25..376872.49 rows=111372 width=41)
   Hash Cond: ((jobs_to_delete.job_reference)::text = (jsm.job_reference)::text)
   ->  Seq Scan on jobs_to_delete  (cost=0.00..101547.10 rows=9286780 width=41)
   ->  Hash  (cost=237994.10..237994.10 rows=111372 width=18)
         ->  HashAggregate  (cost=236880.38..237994.10 rows=111372 width=18)
               Group Key: (jsm.job_reference)::text
               ->  Index Only Scan using job_stats_master_201411_job_reference_idx_ebs on job_stats_new_201411 jsm  (cost=0.56..214784.97 rows=8838161 width=18)

and corresponding delete (which I suspect should have the same plan)

explain delete from jobs_to_delete WHERE EXISTS(SELECT 1 FROM
job_stats_new_201411  jsm WHERE jsm.job_reference =
jobs_to_delete.job_reference);
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Delete on jobs_to_delete  (cost=266351.88..403838.13 rows=111372 width=12)
   ->  Hash Join  (cost=266351.88..403838.13 rows=111372 width=12)
         Hash Cond: ((jobs_to_delete.job_reference)::text = (jsm.job_reference)::text)
         ->  Seq Scan on jobs_to_delete  (cost=0.00..101547.10 rows=9286780 width=43)
         ->  Hash  (cost=264959.73..264959.73 rows=111372 width=24)
               ->  HashAggregate  (cost=263846.01..264959.73 rows=111372 width=24)
                     Group Key: (jsm.job_reference)::text
                     ->  Seq Scan on job_stats_new_201411 jsm  (cost=0.00..241750.61 rows=8838161 width=24)

A manual ANALYZE of both tables didn't change the result.

I cannot see any possible/logical/realistic reason for the database to
switch between an INDEX ONLY SCAN and a SEQ SCAN for the EXISTS part of the query
in these two cases.
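
A hedged side check (the literal value below is just a placeholder): whether an
index-only scan is even considered depends on which columns the scan has to
return, so comparing these two variants of the inner lookup can help narrow down
where the difference comes from:

explain select job_reference from job_stats_new_201411 where job_reference = 'some-ref';        -- only an indexed column: index-only scan possible
explain select ctid, job_reference from job_stats_new_201411 where job_reference = 'some-ref';  -- ctid requested too: no index-only scan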

I'm not sure that it's a bug, so I'd better post to -general first.

-- 
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: [GENERAL] text_pattern_ops index not being used for prefix query

2016-10-20 Thread Maxim Boguk
On Fri, Oct 21, 2016 at 8:42 AM, Alexander Staubo <a...@purefiction.net>
wrote:

> Indexing on a text column:
>
> create index index_documents_on_id_with_pattern_ops (id
> text_pattern_ops);
>
> This works fine:
>
> > explain select id from documents where id like 'dingbat%';
> Index Only Scan using index_documents_on_id_with_pattern_ops on
> documents  (cost=0.56..8.58 rows=736 width=19)
>   Index Cond: ((id >= 'dingbat'::text) AND (id < 'dingbau'::text))
>   Filter: (id ~~ 'dingbat%'::text)
>
> But for some reason, if an underscore character appears in my search
> string, it falls back to a disastrously slow seqscan:
>
> > explain select id from documents where id like '_dingbat%';
> Seq Scan on documents  (cost=0.00..779238.28 rows=736 width=19)
>   Filter: (id ~~ '_dingbat%'::text)
>
> Is this because of PostgreSQL’s collation system? Using “C” doesn’t work
> either.

An underscore in a LIKE pattern has the special meaning of "any single character".
From the documentation at
https://www.postgresql.org/docs/9.4/static/functions-matching.html :
"An underscore (_) in pattern stands for (matches) any single character;"
This could be useful as well: "To match a literal underscore or percent
sign without matching other characters, the respective character in pattern
must be preceded by the escape character."
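
A hedged sketch of the two usual ways to match that leading underscore literally,
so the pattern has a fixed prefix again and the text_pattern_ops index can be used:

explain select id from documents where id like '\_dingbat%';              -- backslash is the default LIKE escape character
explain select id from documents where id like '#_dingbat%' escape '#';   -- or pick an explicit escape character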

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: [GENERAL] Insert only table and size of GIN index JSONB field.

2016-05-04 Thread Maxim Boguk
On Wed, May 4, 2016 at 3:45 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Tue, May 3, 2016 at 3:09 AM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> > Hi,
> >
> > I started with empty table with index over
> >  custom_fields | jsonb
> > field
> > defined as:
> > "idx_learners_custom_fields" gin (custom_fields)
> > Globally gin_pending_list_limit set to 2MB.
> > Database version is 9.5.2.
> >
> > Now question:
> > If table populated with 1M records in single transaction then the final
> size
> > of the GIN index is:
> > 4265 MB
> > but after I performed reindex index idx_learners_custom_fields;
> > the index size had been reduced 15x to 295 MB.
> >
> > Is this behavior expected?
>
> This sounds like a known issue, fixed in 9.6, where the pages used for
> the pending list do not eligible for recycling until the table is
> vacuumed.  Autovacuum does not run on insert only tables, so they just
> accumulate as empty pages in the index.
>
> Cheers,
>
> Jeff
>


Hi Jeff,

Yes, that looks like the cause.
Is there any workaround possible for 9.5.2?
At the moment I see either a palliative cure (it doesn't reduce an already
bloated index) of performing a manual VACUUM on the table after each batch
insert, or a very slow (for a large table) full-scale fix of creating a new
index and dropping the old one.
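
A hedged sketch of the palliative options on 9.5 (assuming, as earlier in the
thread, that the table behind idx_learners_custom_fields is called learners):

-- make new inserts bypass the pending list entirely (slower per-row inserts,
-- but no further pending-list growth); this does not shrink the bloated index:
ALTER INDEX idx_learners_custom_fields SET (fastupdate = off);

-- and/or flush and recycle the pending list after each batch load:
VACUUM learners;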
​

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: [GENERAL] Insert only table and size of GIN index JSONB field.

2016-05-03 Thread Maxim Boguk
On Tue, May 3, 2016 at 8:09 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:

> Hi,
>
> I started with empty table with index over
>  custom_fields | jsonb
> field
> defined as:
> "idx_learners_custom_fields" gin (custom_fields)
> Globally gin_pending_list_limit set to 2MB.
> Database version is 9.5.2.
>
> Now question:
> If table populated with 1M records in single transaction then the final
> size of the GIN index is:
> 4265 MB
> but after I performed reindex index idx_learners_custom_fields;
> the index size had been reduced 15x to 295 MB.
>
> Is this behavior expected?
>

It looks easy to reproduce.
I can send a sample dataset for analysis.

drop table if exists test;
create table test (custom_fields jsonb);
create index test_gin_key on test USING GIN(custom_fields);
insert into test select custom_fields from public.learners;
INSERT 0 100
\di+ test_gin_key
                         List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size   | Description
--------+--------------+-------+----------+-------+---------+-------------
 public | test_gin_key | index | postgres | test  | 4211 MB |

reindex index test_gin_key;

\di+ test_gin_key
                         List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size  | Description
--------+--------------+-------+----------+-------+--------+-------------
 public | test_gin_key | index | postgres | test  | 295 MB |
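
If your pgstattuple version provides pgstatginindex(), it is a hedged way to see
how much of the index is just the fastupdate pending list, before and after the
VACUUM/REINDEX:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatginindex('test_gin_key');   -- pending_pages / pending_tuples show the backlog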


​


-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


[GENERAL] Insert only table and size of GIN index JSONB field.

2016-05-03 Thread Maxim Boguk
Hi,

I started with an empty table with an index over a
 custom_fields | jsonb
field, defined as:
"idx_learners_custom_fields" gin (custom_fields)
gin_pending_list_limit is set globally to 2MB.
The database version is 9.5.2.

Now the question:
if the table is populated with 1M records in a single transaction, the final
size of the GIN index is:
4265 MB
but after I performed reindex index idx_learners_custom_fields;
the index size was reduced 15x, to 295 MB.

Is this behavior expected?

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


[GENERAL] 2x Time difference between first and subsequent run of the same query on fresh established connection (on linux, with perf data included, all query data in the shared buffers) on postgresql

2016-02-24 Thread Maxim Boguk
Hi,

I found a strange case where a query (which works through a large amount of
shared buffers) runs 2x faster on the second and subsequent runs in a new
connection to the db.
For example:

postgres@base1:~$ psql ***
psql (9.3.10)

***=# explain (analyze, costs, buffers, timing) select * from
transactions where "timestamp" >= '2016-02-20' and "timestamp"  <
'2016-02-23'::date;

QUERY PLAN
-
 Index Scan using i_transactions_timestamp on transactions
(cost=0.57..138824.52 rows=1955459 width=790) (actual time=0.146..3416.477
rows=1950630 loops=1)
   Index Cond: (("timestamp" >= '2016-02-20 00:00:00'::timestamp without
time zone) AND ("timestamp" < '2016-02-23'::date))
   Buffers: shared hit=1965635
 Total runtime: 3481.322 ms
(4 rows)

***=# explain (analyze, costs, buffers, timing) select * from
transactions where "timestamp" >= '2016-02-20' and "timestamp"  <
'2016-02-23'::date;

QUERY PLAN
-
 Index Scan using i_transactions_timestamp on transactions
(cost=0.57..138824.52 rows=1955459 width=790) (actual time=0.030..1812.361
rows=1950630 loops=1)
   Index Cond: (("timestamp" >= '2016-02-20 00:00:00'::timestamp without
time zone) AND ("timestamp" < '2016-02-23'::date))
   Buffers: shared hit=1965635
 Total runtime: 1878.503 ms

Every run after that takes 1.8-1.9s, but if I establish a new
connection to the database, the first query runs for 3.5s again.

The time difference and the timing of each run are pretty repeatable (+/- 100ms).
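
A hedged way to double-check that the scanned range really is fully resident in
shared buffers before timing (assumes the pg_buffercache extension is available):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT count(*) AS buffers_cached
FROM pg_buffercache
WHERE reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
  AND relfilenode = pg_relation_filenode('transactions');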

Here is the perf report data for the first and the second runs:

The first run (something fishy with kernel calls):
  19,60%  postgres  [kernel.kallsyms]  [k] filemap_map_pages
  15,86%  postgres  postgres   [.] hash_search_with_hash_value
   8,20%  postgres  postgres   [.] heap_hot_search_buffer
   8,20%  postgres  postgres   [.] heap_page_prune_opt
   5,72%  postgres  postgres   [.] PinBuffer
   4,38%  postgres  [kernel.kallsyms]  [k] page_fault
   4,04%  postgres  [kernel.kallsyms]  [k] page_waitqueue
   3,55%  postgres  [kernel.kallsyms]  [k] __wake_up_bit
   2,95%  postgres  postgres   [.] LWLockAcquire
   2,31%  postgres  [kernel.kallsyms]  [k] unlock_page
   1,96%  postgres  [vdso] [.] __vdso_gettimeofday
   1,83%  postgres  [kernel.kallsyms]  [k] radix_tree_next_chunk
   1,77%  postgres  [kernel.kallsyms]  [k] page_add_file_rmap
   1,66%  postgres  postgres   [.] _bt_checkkeys
   1,27%  postgres  postgres   [.] LWLockRelease

The second run (look perfectly ok for such kind of query):
  27,74%  postgres  postgres   [.] hash_search_with_hash_value
  15,51%  postgres  postgres   [.] heap_hot_search_buffer
  14,28%  postgres  postgres   [.] heap_page_prune_opt
   9,22%  postgres  postgres   [.] PinBuffer
   3,94%  postgres  [vdso] [.] __vdso_gettimeofday
   3,32%  postgres  postgres   [.] _bt_checkkeys
   3,29%  postgres  postgres   [.] LWLockAcquire
   2,86%  postgres  postgres   [.] LWLockRelease
   1,54%  postgres  postgres   [.] UnpinBuffer


So it looks like something strange is going on inside Linux kernel 3.16.0
memory management (it's a vanilla kernel on bare hardware, no
virtualization, swap off).
The question is: does this work as expected (in which case it's probably a good
idea to use pgbouncer even for one-shot analytical queries), or is it a sign of
potential issues with the kernel?



-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


[GENERAL] How to analyze locking issues of the startup process on hot standby replica?

2016-02-19 Thread Maxim Boguk
Hi,

I'm analyzing a strange case where the startup process on a replica spends almost
all of its time in the waiting state:
"postgres: startup process   recovering 00014A3E00BF waiting"
As a result, recovery seriously lags behind the master (in the range of hours).

The replica has hot_standby=on and almost always has some long-running (in the
range of minutes to tens of minutes) select queries.
I expected to find some not-granted locks in pg_locks on the standby
server, but select * from pg_locks where not granted; shows no results.

Is there any way to look into the startup process and see what it is waiting
for (less invasive than gdb)?

PS: no DDL is routinely run on the master (so there is no need for an
AccessExclusiveLock on the standby to apply DDL).
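
A hedged checklist I would run on the standby while the startup process shows
"waiting" (nowhere near as detailed as gdb, but non-invasive; column names
assume 9.2+):

-- have recovery conflicts been cancelling/blocking anything so far?
SELECT * FROM pg_stat_database_conflicts;

-- long-running read-only queries that replay may be waiting behind:
SELECT pid, state, xact_start, query
FROM pg_stat_activity
ORDER BY xact_start;

-- how long replay is allowed to wait before cancelling conflicting queries:
SHOW max_standby_streaming_delay;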

​Kind Regards,
Maksym​

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


[GENERAL] Weird behaviour in planner (PostgreSQL v 9.2.14)

2015-12-11 Thread Maxim Boguk
Hi,

I found some very weird behaviour on the planner side, with an estimation error
of 700,000,000.

Situation (with explain analyze):

EXPLAIN ANALYZE
select * from person2obj
WHERE
p2o_id IN (SELECT p2o_id::bigint FROM (SELECT * FROM (SELECT column1 AS
p2o_id FROM (
VALUES ('2056892'), up to 199 values total
) AS __CDP_VALUES__) AS __CDP_DATA__) AS __TARGET__ );

;

 QUERY PLAN

 Nested Loop  (cost=2.99..16316.34 rows=199 width=58) (actual
time=0.196..1.202 rows=198 loops=1)
   ->  HashAggregate  (cost=2.99..4.98 rows=199 width=32) (actual
time=0.160..0.205 rows=199 loops=1)
 ->  Values Scan on "*VALUES*"  (cost=0.00..2.49 rows=199 width=32)
(actual time=0.003..0.088 rows=199 loops=1)
   ->  Index Scan using pk_person2obj on person2obj  (cost=0.00..81.96
rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=199)
 Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)


The estimate looks pretty reasonable.


However, with a value list of length 200 (or more), the database switches
to a completely different (and very weird) estimate of 700,000,000:

 QUERY PLAN

 Nested Loop  (cost=3.00..16398.33 rows=714143698 width=58) (actual
time=0.200..1.239 rows=200 loops=1)
   ->  HashAggregate  (cost=3.00..5.00 rows=200 width=32) (actual
time=0.165..0.201 rows=200 loops=1)
 ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=32)
(actual time=0.004..0.090 rows=200 loops=1)
   ->  Index Scan using pk_person2obj on person2obj  (cost=0.00..81.96
rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=200)
 Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)


All the estimates look OK until the final nested loop plan estimate of
700,000,000.

PS: the person2obj table contains ~1.4 billion tuples; p2o_id is the primary key.
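
A hedged workaround sketch (the second literal below is a made-up placeholder):
materializing the id list in an analyzed temp table gives the planner real
statistics instead of the defaults it falls back to once the VALUES list
reaches 200 entries:

CREATE TEMP TABLE target_ids (p2o_id bigint);

INSERT INTO target_ids
SELECT column1::bigint
FROM (VALUES ('2056892'), ('2056893')) AS v(column1);   -- the full 200+ id list goes here

ANALYZE target_ids;

EXPLAIN ANALYZE
SELECT p.*
FROM person2obj p
JOIN target_ids t USING (p2o_id);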

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: [GENERAL] After configuring remote access,server can't be started

2015-11-22 Thread Maxim Boguk
On Sun, Nov 22, 2015 at 8:54 PM, Alex Luya <alexander.l...@gmail.com> wrote:

> My postgresql 9.4 is installed in centos 6.7,and I have followed this:
>
> ​​
>> http://www.cyberciti.biz/faq/postgresql-remote-access-or-connection/
>
>
> 1,cd /var/libpgsql/9.4/data
>
> 2,cp  postgresql.conf.sample postgresql.conf
>
> 3,sudo vi postgresql.conf and add two lines,and save it:
>
>
>> listen_addresses = "*"
>> tcpip_socket = true
>
>
There is no such option as tcpip_socket anymore (it was removed sometime
around 2005).
So very likely your server doesn't start because your config file is no longer
valid.
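
For reference, a minimal hedged sketch of what the relevant postgresql.conf
lines look like on 9.4 (remote access additionally needs a matching
pg_hba.conf entry):

listen_addresses = '*'
port = 5432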

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: [GENERAL] Grant SELECT/Execute to View/Function but not underlying Table

2015-09-24 Thread Maxim Boguk
On Thu, Sep 24, 2015 at 9:28 PM, Alex Magnum <magnum11...@gmail.com> wrote:

> Hi,
> is it possible to grant select to views and functions without the need to
> also grant the user the SELECT privileges to the Tables used in the views
> or functions?
>
> That way I could create read only users on a website and limit their
> access to the bare minimum.
>
> Thanks in advance for any advise on this
>
> Alex
>


Hi,

For functions it's possible (read about SECURITY DEFINER); for a view it isn't
possible (a view is just query text stored in the database for future use and
nothing more).
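
A minimal hedged sketch of the SECURITY DEFINER approach mentioned above (all
names here are hypothetical):

CREATE FUNCTION read_only_report()
RETURNS SETOF some_table
LANGUAGE sql
SECURITY DEFINER
SET search_path = public
AS $$ SELECT * FROM some_table $$;

REVOKE ALL ON FUNCTION read_only_report() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION read_only_report() TO readonly_user;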
​



-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/


Re: [GENERAL] Curious case of huge simple btree indexes bloat.

2015-06-01 Thread Maxim Boguk
On Mon, Jun 1, 2015 at 3:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  On the one of databases under my support I found very curious case of the
  almost endless index bloat (index size stabilises around 100x of the
  original size).

  The table have 5 indexes and they all have the same bloating behaviour
  (growth to almost 100x and stabilisation around that amount). An original
  index size 4-8Mb (after manual reindex), over time of the 5 days they all
  monotonically growth to 300-900MB. In the same time table size staying
  pretty constant at 30-50Mb (and amount of rows in the same don't vary
  widely and stays between 200k and 500k).

 At least for the index you gave stats for, it seems like it's stabilizing
 at one index entry per page.  This is a known possible pathological
 behavior if the application's usage involves heavy decimation of original
 entries; say, you insert sequential timestamps and then later remove all
 but every one-thousandth one, leaving at most one live entry on every
 index page.  Btree can recover the totally-empty leaf pages but it has no
 provision for merging non-empty leaf pages, so those all stay as they are
 indefinitely.

 It would be pretty unusual for all the indexes on a table to be used like
 that, though.

 regards, tom lane


Thank you very much for the explanation.
This table is part of a complicated 3-table session info structure with a lot of
short-lived sessions and some very long-lived ones.
Most of the ids used are bigserials, and yet every indexed field on that table
shows the same bad behaviour.
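
A hedged sketch of the usual online rebuild for one of these indexes (9.4 has no
REINDEX CONCURRENTLY, so it is done with a second index; the primary key index
would need the ALTER TABLE ... ADD CONSTRAINT ... USING INDEX dance instead):

CREATE INDEX CONCURRENTLY clientsession_keypairid_key_new ON clientsession (keypairid);
DROP INDEX CONCURRENTLY clientsession_keypairid_key;
ALTER INDEX clientsession_keypairid_key_new RENAME TO clientsession_keypairid_key;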

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/


[GENERAL] Curious case of huge simple btree indexes bloat.

2015-05-30 Thread Maxim Boguk
Hi,

On one of the databases under my support I found a very curious case of
almost endless index bloat (the index size stabilises at around 100x the
original size).
A graph of one index's size history is attached (other indexes have similar
time/size graphs).

The table has 5 indexes and they all show the same bloating behaviour
(growth to almost 100x and stabilisation around that amount). The original
index size is 4-8MB (after a manual reindex); over 5 days they all
monotonically grow to 300-900MB. At the same time the table size stays
pretty constant at 30-50MB (and the row count doesn't vary widely, staying
between 200k and 500k).

The table has a large amount of inserts/updates/deletes, but autovacuum
is tuned to be pretty aggressive and I'm sure that there are no long
transactions (longer than a few minutes). Also there is no standby replica
with hot_standby=on, no prepared transactions are used, and no batch
deletes/inserts/updates are used. The server has plenty of RAM (the database
fits into shared buffers), IO and CPU available, so there is no visible
resource starvation.

Background information:
PostgreSQL version 9.4.2, 64-bit, on Linux.
Table structure:

\d+ clientsession
                                               Table "public.clientsession"
     Column      |           Type           |                                Modifiers                                 | Storage  | Stats target | Description
-----------------+--------------------------+--------------------------------------------------------------------------+----------+--------------+-------------
 globalsessionid | bigint                   | not null default nextval('clientsession_globalsessionid_seq'::regclass) | plain    |              |
 deviceuid       | text                     |                                                                          | extended |              |
 localsessionid  | bigint                   |                                                                          | plain    |              |
 createddate     | timestamp with time zone |                                                                          | plain    |              |
 lastmodified    | timestamp with time zone |                                                                          | plain    |              |
 keypairid       | bigint                   |                                                                          | plain    |              |
 sessiondataid   | bigint                   |                                                                          | plain    |              |
Indexes:
    "clientsession_pkey" PRIMARY KEY, btree (globalsessionid) CLUSTER
    "clientsession_ukey" UNIQUE CONSTRAINT, btree (deviceuid, localsessionid)
    "clientsession_keypairid_key" btree (keypairid)
    "clientsession_sessiondataid_key" btree (sessiondataid)
    "clientsession_uduid_localid_idx" btree (upper(deviceuid), localsessionid)
Foreign-key constraints:
    "clientsession_keypair_fkey" FOREIGN KEY (keypairid) REFERENCES keypair(id) ON DELETE CASCADE
    "clientsession_sessiondata_id" FOREIGN KEY (sessiondataid) REFERENCES sessiondata(id) ON DELETE CASCADE
Referenced by:
    TABLE "remotecommand" CONSTRAINT "remotecommand_clientsessionid_fkey" FOREIGN KEY (clientsessionid) REFERENCES clientsession(globalsessionid) ON DELETE CASCADE
Options: fillfactor=50, autovacuum_vacuum_scale_factor=0.01

Results of pgstatindex() for one of the bloated indexes:
select * from pgstatindex('clientsession_pkey');
-[ RECORD 1 ]--+--
version| 2
tree_level | 2
index_size | 552640512
root_block_no  | 290
internal_pages | 207
leaf_pages | 67224
empty_pages| 0
deleted_pages  | 29
avg_leaf_density   | 1.08
leaf_fragmentation | 3.02

List of current index sizes (they stabilized 1 day ago):
\di+ clientsession*
                                     List of relations
 Schema |              Name               | Type  |  Owner  |     Table     |  Size  | Description
--------+---------------------------------+-------+---------+---------------+--------+-------------
 public | clientsession_keypairid_key     | index | phoenix | clientsession | 545 MB |
 public | clientsession_pkey              | index | phoenix | clientsession | 527 MB |
 public | clientsession_sessiondataid_key | index | phoenix | clientsession | 900 MB |
 public | clientsession_uduid_localid_idx | index | phoenix | clientsession | 254 MB |
 public | clientsession_ukey              | index | phoenix | clientsession | 254 MB |



I have never seen such behaviour on other databases, and all my attempts to get
this index bloat under control have had no effect.
If anyone has any ideas (even crazy ones) - welcome.


-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru

Re: [GENERAL] Documentation bug?

2015-05-19 Thread Maxim Boguk
On Tue, May 19, 2015 at 7:39 PM, Thomas Kellerer spam_ea...@gmx.net wrote:

 Hello all,

 I just noticed that you can do something like this (using 9.4.1):

select array[1,2,3] - 3

 which is doing the same thing as:

select array_remove(array[1,2,3],3)

 but the minus is not documented as an array operator:
 http://www.postgresql.org/docs/current/static/functions-array.html

 Is that an oversight in the documentation or isn't the minus supposed to
 work like that in the first place?


You very likely have the intarray extension installed:
http://www.postgresql.org/docs/9.4/interactive/intarray.html
The int[] - int operator is documented in the extension documentation, as it's
part of the extension and not part of the PostgreSQL core.
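
A quick hedged way to check in psql where the operator comes from:

\dx intarray                            -- is the extension installed?
SELECT array[1,2,3] - 3;                -- intarray operator, only with the extension
SELECT array_remove(array[1,2,3], 3);   -- core function, works without intarray (9.3+)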



-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] moving to PostgreSQL from MS-SQL and from Oracle, looking for feature comparison information

2015-05-09 Thread Maxim Boguk
On Sun, May 10, 2015 at 12:30 PM, Yuri Budilov yuri.budi...@hotmail.com
wrote:

 MANY THANKS to everyone who replied !
 Keep up great work!

 more things (critical for very large and mission critical databases)

 - database row/page compression -

 it looks to me that there is no page/block compression available on
 PostgreSQL 9.4 along the lines of MS-SQL/Oracle row/page compression
 features?
 I realize that there is some compression of individual varchar/text data
 type columns but there is nothing like a complete row compression, index
 page compression and page/dictionary compression? Is that correct?


Yes, that's correct. Only individual field compression is supported (usually for
fields longer than ~2KB).



 database and transaction log backup compression? not available?


Transaction log backup compression is not available (however, WAL can easily be
compressed during archiving via external utilities like bzip2).
Both built-in backup utilities (pg_dump and pg_basebackup) support
compression.



 - recovery from hardware or software corruption -

 suppose I am running a mission critical database (which is also relatively
 large, say  1TB) and I encounter a corruption of some sort (say, due to
 hardware or software bug) on individual database pages or a number of pages
 in a database

 How do I recover quickly and without losing any transactions? MS-SQL and
 Oracle can restore individual pages (or sets of pages) or restore
 individual database files and then allow me to roll forward transaction log
 to bring back every last transaction. It can be done on-line or off-line.
 How do I achieve the same in PostgreSQL 9.4? One solution I see may be via
 complete synchronous replication of the database to another server. I am
 but sure what happens to the corrupt page(s) - does it get transmitted
 corrupt to the mirror server so I end up with same corruption on both
 databases or is there some protection against this?


It depends on where the corruption happens: if pages become corrupted due to
problems with the physical storage (filesystem), then the replica data should
be OK.
There is no facility to recover individual database files and/or page
ranges from a base backup and roll forward the transaction log (not even
offline).

From my practice, using PostgreSQL for terabyte-scale and/or
mission-critical databases is definitely possible, but it requires very careful
design and planning (and good hardware).

​
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/
​Melbourne, Australia​


Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] moving to PostgreSQL from MS-SQL and from Oracle, looking for feature comparison information

2015-05-09 Thread Maxim Boguk
Hi Yuri,

I will try to answer your questions one by one.


In the meantime, I have scanned the manual for PostgreSQL 9.4 and there are
 a few things I was not able to find in the manual, my apologies if I missed
 it:

  1. does PostgreSQL have parallel query capability like MS-SQL 2008+ and
 Oracle 11g+ ? Or does a single query only run on 1 CPU?

The latest stable version will execute a single query on a single CPU. The next
version (9.5) will have some limited parallel query execution facilities.



 2. does PostgreSQL have ability to apply query optimizer hints to
 individual queries - such as use a particular index, join type, join order,
 plan guides, etc ?

No, it's not available. In most (but not all) cases the PostgreSQL query
optimizer will select a reasonably good plan.


 3. does PostgreSQL have Column-Store capability?

In the community version, no, but there are some external addons available
which add column storage (however, a bit limited).



 4. does PostgreSQL have anything resembling Microsoft SQL Server Profiler
 Trace or Extended Events ?

No, but the statistics views in 9.2+ provide a pretty good overview of
what's going on in the database (pg_stat_statements can be especially useful:
http://www.postgresql.org/docs/9.4/interactive/pgstatstatements.html )
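
A minimal hedged sketch of using it (pg_stat_statements must first be added to
shared_preload_libraries, which requires a restart; column names are the 9.4 ones):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;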



 5. does PostgreSQL have a database backup capability such that not a
 single transaction is lost in case of hardware failure? Some of our target
 databases are several TeraBytes in size with several hundred concurrent
 connections and transactions are financial in their nature. So we need
 extremely robust backup/restore capability, 100% on-line.

Yes, PostgreSQL can use built-in synchronous replication with zero
committed transactions lost after failover.



 6. does PostgreSQL support NUMA on Intel based X64 servers and does it
 support Hyper-Threading ?

No NUMA support. Yes, PostgreSQL will work on HT-enabled servers (whether it
will be efficient depends on workload and CPU type).


7. does PostgreSQL support in-memory store (similar to Oracle 12c in-memory
 and SQL Server 2014 in-memory OLTP) ?

No.


8. does PostgreSQL have temporary tables support?

Yes, full temporary table support since the beginning.


Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Melbourne, Australia


Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Improving performance of merging data between tables

2015-01-07 Thread Maxim Boguk
On Wed, Jan 7, 2015 at 8:49 PM, Pawel Veselov pawel.vese...@gmail.com
wrote:


 PPPS: and the last suggestion, after you finished with the write all the
 data into its own tables, then application should perform analyze of these
 own tables (or you could have weird/inefficient plans during last stage).


 Any references to back this up? I don't particularly mind doing it, but I
 wonder if analysis can be more expensive the processing. These tables get a
 few hundreds of records inserted/updated, then are entirely processed (with
 expected full scans), and then deleted...

​
If these own tables are used only in full-table selects and never in
joins, then there should be no issues.
However, once you start joining these tables with anything else, you could
get very inefficient/weird plans because the database doesn't know
(without analyze) how many rows you have in these tables.





 PS: your setup look pretty complicated and hard to analyze without seeing
 all involved table structures, transaction/query flow, and (especially)
 involved procedures source code.


 Sure :) At this point, I've put together the bulk merge code as well. I
 can't quite see much of a difference, actually, but it's hard to trust the
 execution times, as on the same amount of data they vary from, say, 0.5s to
 2s, and the sample data is not stepping on any other locks. In general, I'm
 afraid all those left joins and multiple scans, even over small amount of
 data, is nullifying any positive effect.


​
Now some ideas to check.
High CPU usage usually isn't related to locking, but rather to seq scans,
wrong plans, or simply inefficient pl/pgsql code; locked processes usually
don't use much CPU.

1) On the test database perform select pg_stat_reset();, then perform a full
round of merges, then check
select * from pg_stat_user_tables where seq_scan > 0 order by seq_tup_read;
and if you find a lot of seq_scan and seq_tup_read on a particular table, try
to find where they are coming from (that could be the reason for the high CPU
usage).

2) Enable track_functions in postgresql.conf and perform the same sequence
(select pg_stat_reset() + a full round of merges), then check
select * FROM pg_stat_user_functions order by self_time desc;
and see which function uses the most time.

3) Old/lost prepared transactions can have a deadly effect on database
performance as a whole. So check select * from pg_prepared_xacts; and verify
that you don't have hours- (or weeks-) old prepared transactions lying around.

PS: btw, I still don't fully understand the relation between

- "merges data into its own node tables (using merge_xxx PL/pgSQL functions)"

and the provided code for public.merge_all02-9A-46-8B-C1-DD and
PUBLIC.merge_agrio.
As I see it, public.merge_all02-9A-46-8B-C1-DD calls PUBLIC.merge_agrio, and
PUBLIC.merge_agrio updates a global table R_AGRIO (but not the own node
table).


I think the best implementation of such a task is asynchronous processing of
these changes via a background process. The application only inserts events
into a queue table (a lockless process), and some background process reads
this data from the queue table, merges it into the main table (again lockless,
because it is single-threaded so there are no concurrent writes), and then
deletes the merged data from the queue table.
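
A minimal hedged sketch of that queue-table pattern (all table and column names
are hypothetical):

CREATE TABLE event_queue (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    created timestamptz NOT NULL DEFAULT now()
);

-- the application only inserts:
INSERT INTO event_queue (payload) VALUES ('{"k": 1}');

-- a single background worker merges and removes a batch in one transaction:
BEGIN;
WITH batch AS (
    DELETE FROM event_queue
    WHERE id IN (SELECT id FROM event_queue ORDER BY id LIMIT 1000)
    RETURNING payload
)
INSERT INTO main_table (payload)
SELECT payload FROM batch;
COMMIT;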


-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Improving performance of merging data between tables

2014-12-30 Thread Maxim Boguk
On Wed, Dec 31, 2014 at 11:10 AM, Pawel Veselov pawel.vese...@gmail.com
wrote


 [skipped]

 2) try pg_stat_statements, setting pg_stat_statements.track = all.  see:
 http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

 I have used this to profile some functions, and it worked pretty well.
 Mostly I use it on a test box, but once ran it on the live, which was
 scary, but worked great.


 That looks promising. Turned it on, waiting for when I can turn the
 server at the next quiet time.


 I have to say this turned out into a bit of a disappointment for this use
 case. It only measures total time spent in a call. So, it sends up
 operations that waited a lot on some lock. It's good, but it would be great
 if total_time was provided along with wait_time (and io_time may be as
 well, since I also see operations that just naturally have to fetch a lot
 of data)


1) pg_stat_statements provides information about the I/O time of each
statement, but you need track_io_timing enabled for that.

2) About locking: I suggest enabling log_lock_waits and setting deadlock_timeout
to, say, 100ms (just for testing purposes), and then any lock wait longer than
100ms will be logged with some useful additional info.
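
A hedged sketch of flipping those settings on a 9.4 server (all three can be
changed without a restart):

ALTER SYSTEM SET track_io_timing = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '100ms';
SELECT pg_reload_conf();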

PS: your setup looks pretty complicated and is hard to analyze without seeing
all the involved table structures, the transaction/query flow, and (especially)
the source code of the involved procedures.

PPS: btw, please check the database logs for deadlock messages; the part of your
setup around "and then call a pgsql function to merge the data from its tables
into the common tables" could easily be deadlock prone.

PPPS: and the last suggestion: after you have finished writing all the data
into its own tables, the application should perform an analyze of these own
tables (or you could get weird/inefficient plans during the last stage).



-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Drastic select count performance hit when jsonb GIN indices are present

2014-12-26 Thread Maxim Boguk
   ->  Bitmap Heap Scan on myevents  (cost=35.80..3615.09 rows=3716 width=0) (actual time=351.510..77669.907 rows=1417152 loops=1)
         Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)
         Heap Blocks: exact=298362
         Buffers: shared hit=1 read=298589
...
   Execution time: 80986.340 ms


   ->  Bitmap Heap Scan on myevents  (cost=42.80..3622.09 rows=3716 width=0) (actual time=534.816..78526.944 rows=1417152 loops=1)
         Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)
         Heap Blocks: exact=298362
         Buffers: shared hit=383 read=299133
...
   Execution time: 81898.578 ms


Hi Anton,

What you see there (I think) is the performance hit of random disk reads
on a non-cached database.
Try increasing shared_buffers to a value the table and index can fit into,
and redo the queries a few times until you see something like
"Buffers: shared hit=bigvalue read=0", then compare performance; it might
change the timing quite a lot.

Also, I recommend setting track_io_timing=on in postgresql.conf and after that
using explain (analyze, buffers, timing) to check how much time the database
spends doing IO operations.
Also try performing vacuum analyze myevents; before testing, because it seems
that you have no up-to-date visibility map on the table.
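
A hedged sketch of that measurement loop (track_io_timing can be set per session
by a superuser for testing):

SET track_io_timing = on;
VACUUM ANALYZE myevents;
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT count(*) FROM myevents WHERE event @> '{"event": "delivered"}'::jsonb;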


However, even in the fully cached case, selecting 40% of the table's rows will
almost always be faster via a sequential scan, so I don't expect miracles.

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Drastic select count performance hit when jsonb GIN indices are present

2014-12-26 Thread Maxim Boguk



 Getting back to my original point - you pointed out that for queries that
 need a decent % of the table it will be cheaper to do a scan, which is
 exactly what the query planner does for the relational version. If it only
 needs a small % of the values it looks at the index and for a large % it
 goes for a scan (it also puts everything in shared buffers and is
 lightening quick!). Is this just a lack of maturity in the jsonb planner or
 am I missing something?


Hi Anton,

Good selectivity estimators exist only for scalar data types.
For complex data types such as json/jsonb, introducing a reasonable
selectivity estimator is a very complicated task, so the database can only
guess in these cases.
In your case the database guessed the number of returned rows with a
3-orders-of-magnitude error (estimated 3716 rows, actually 1417152 rows).
Personally, I don't expect serious progress in json/jsonb selectivity
estimators in the near future, so it's better to avoid low-selectivity
queries against indexed json/jsonb fields.

​

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


[GENERAL] Question about forced immediate checkpoints during create database

2013-11-11 Thread Maxim Boguk
Hi everyone,

I have a few questions about checkpoints during CREATE DATABASE.

First just extract from log on my test database 9.2.4:

2013-11-12 03:48:31 MSK 1717 @ from  [vxid: txid:0] [] LOG:  checkpoint
starting: immediate force wait
2013-11-12 03:48:31 MSK 1717 @ from  [vxid: txid:0] [] LOG:  checkpoint
complete: wrote 168 buffers (0.0%); 0 transaction log file(s) added, 0
removed, 0 recycled; write=0.314 s, sync=0.146 s, total=0.462 s; sync
files=104, longest=0.040 s, average=0.001 s
2013-11-12 03:48:32 MSK 1717 @ from  [vxid: txid:0] [] LOG:  checkpoint
starting: immediate force wait
2013-11-12 03:48:32 MSK 1717 @ from  [vxid: txid:0] [] LOG:  checkpoint
complete: wrote 6 buffers (0.0%); 0 transaction log file(s) added, 0
removed, 0 recycled; write=0.311 s, sync=0.002 s, total=0.315 s; sync
files=6, longest=0.000 s, average=0.000 s
2013-11-12 03:48:32 MSK 13609 postgres@hh_data from [local] [vxid:502/0
txid:0] [CREATE DATABASE] LOG:  duration: 1160.409 ms  statement: create
database _tmp;

So during the creation of the database, two 'immediate force' checkpoints were
performed.

Now the questions:

1) Why are these checkpoints performed at all? I understand why a checkpoint is
performed during DROP DATABASE (to clean the dropped db's data out of shared
buffers), but why issue a checkpoint during CREATE DATABASE?

2) Why are two checkpoints performed one after another?

3) Is there any good way to perform a spread checkpoint during CREATE DATABASE
(similar to --checkpoint=spread for pg_basebackup)?
I'm ready to wait 30 minutes for CREATE DATABASE in that case...
I'm asking because performing an immediate checkpoint on a large, heavily loaded
database is a good recipe for downtime (IO becomes overloaded to the point of a
total stall)...
Is there any workaround for this problem?

4) Does the idea of adding an option to the CREATE/DROP DATABASE syntax to
control checkpoint behaviour sound reasonable?

Kind Regards,
Maksym

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Question about PostgreSQL logging configuration

2012-11-29 Thread Maxim Boguk
 I want log all 'mod' statements with their execution times and all
 statements longer than 10ms (also with their execution times).

 You cannot combine things as you want. However, it seems a fairly
 minor loss - why would you care about how fast sub-10ms mods
 ran?

The trouble is, if I try it that way, the database log for mod statements over
10ms duration will be written in the wrong style:
2012-11-30 08:06:28 MSK [vxid:229/5138506 txid:1345713884] [INSERT]
LOG:  duration: 10.158 ms
2012-11-30 08:06:28 MSK [vxid:325/5420118 txid:0] [UPDATE] LOG:
execute S_5: update applicant_adv_subscription ...

instead of the more usual way:
2012-11-30 08:08:21 MSK [vxid:307/5334684 txid:0] [SELECT] LOG:
duration: 16.085 ms  execute S_42: select ...

So in the end I'm getting logs which are incompatible with pgFouine (and
with my custom log analyzer as well).

Is there any serious reason why log_duration = on uses such a strange output style?

I cannot think of any reasonable case for using log_duration = on
without log_statement = 'all'.

And log_duration = on with log_statement = 'all' produces results which
are equivalent to log_min_duration_statement = 0,
but log_min_duration_statement = 0 produces logs which are much easier
to parse and read in general.

Is there anyone who has consciously used log_duration = on without
log_statement = 'all'?
And if yes, what for?

I might be very wrong, but log_duration seems like a close-to-useless knob in reality.
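
For reference, a hedged postgresql.conf sketch of the alternative described
above (one parse-friendly log line per statement):

log_min_duration_statement = 0    # one line per statement, duration and statement text together
log_duration = off
log_statement = 'none'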

Kind Regards,
Maksym




[GENERAL] Is it possible to create row-wise indexable condition for special case...

2012-08-05 Thread Maxim Boguk
Hi,

I know that for a condition like ((field1 > value1) or (field1 = value1 and
field2 > value2)) I could build an index on (field1, field2) and use an
indexable condition like (field1, field2) > (value1, value2).
However, I have a very tricky query which requires an indexable condition
like (field1 > value1) or (field1 = value1 and field2 < value2).
I guess that I could use an index definition like ON (field1, field2 DESC);
however, I cannot build a query condition which would use that index
effectively.

Is it possible, or has my mind just gone blank on a simple task?

PS: field2 is of varchar type, so I couldn't use the negative-values trick of
building an index ON (field1, (-field2))
and writing something like WHERE (field1, -field2) > (value1, -value2).
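
One hedged partial workaround (not a true two-column row-wise index condition;
table and index names are placeholders, value1/value2 stand for the boundary
literals): let the first column drive the index ON (field1, field2 DESC) and
filter the rest, which works well when field1 alone is reasonably selective:

CREATE INDEX some_idx ON some_table (field1, field2 DESC);

SELECT *
FROM some_table
WHERE field1 >= value1
  AND (field1 > value1 OR field2 < value2)
ORDER BY field1, field2 DESC
LIMIT 100;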

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


[GENERAL] Inefficient plan selected by PostgreSQL 9.0.7

2012-05-01 Thread Maxim Boguk
 index (sb_messages_special4_key) will read exactly the same number of rows from
the table as a scan over sb_messages_special3_key.
And very likely the scan over the related index will win.

What can I do to fight this issue (I'm looking to keep both indexes on that
table for fast queries with different orderings)?

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Inefficient plan selected by PostgreSQL 9.0.7

2012-05-01 Thread Maxim Boguk
On Wed, May 2, 2012 at 2:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  I got very inefficient plan for a simple query.

 It looks like the problem is with the estimate of the antijoin size:

   -  Nested Loop Anti Join  (cost=0.00..24576.82 rows=1
 width=206)
  (actual time=0.043..436.386 rows=20761 loops=1)

 that is, only about 20% of the rows in sb_messages are eliminated by the
 NOT EXISTS condition, but the planner thinks that nearly all of them
 will be (and that causes it to not think that the LIMIT is going to
 affect anything, so it doesn't prefer a fast-start plan).

 Since you've not told us anything about the statistics of these tables,
 it's hard to speculate as to why the estimate is off.

regards, tom lane



Hi,

Is there any particular stats data I need to provide besides these two:

SELECT * from pg_stats where tablename='users' and attname='blocked';
-[ RECORD 1 ]-+
schemaname| public
tablename | users
attname   | blocked
inherited | f
null_frac | 0
avg_width | 1
n_distinct| 2
most_common_vals  | {f,t}
most_common_freqs | {0.573007,0.426993}
histogram_bounds  |
correlation   | 0.900014

and

SELECT
schemaname,tablename,attname,inherited,null_frac,avg_width,n_distinct,correlation
from pg_stats where tablename='sb_messages' and attname='from_user';
-[ RECORD 1 ]
schemaname  | public
tablename   | sb_messages
attname | from_user
inherited   | f
null_frac   | 0
avg_width   | 4
n_distinct  | 103473
correlation | 0.512214

(most_common_vals, most_common_freqs and histogram_bounds are very long
values from default_statistics_target=1000; the top most_common_freqs value is
only 0.00282333).

Kind Regards,
Maksym


Re: [GENERAL] Inefficient plan selected by PostgreSQL 9.0.7

2012-05-01 Thread Maxim Boguk
On Wed, May 2, 2012 at 2:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  I got very inefficient plan for a simple query.

 It looks like the problem is with the estimate of the antijoin size:

   -  Nested Loop Anti Join  (cost=0.00..24576.82 rows=1
 width=206)
  (actual time=0.043..436.386 rows=20761 loops=1)

 that is, only about 20% of the rows in sb_messages are eliminated by the
 NOT EXISTS condition, but the planner thinks that nearly all of them
 will be (and that causes it to not think that the LIMIT is going to
 affect anything, so it doesn't prefer a fast-start plan).

 Since you've not told us anything about the statistics of these tables,
 it's hard to speculate as to why the estimate is off.

regards, tom lane


The most interesting part is that the NOT EXISTS estimate is way off, while the
LEFT JOIN ... WHERE ... IS NULL version is estimated correctly:

good estimate (estimated rows=20504 vs real rows=20760):
Game2=# EXPLAIN ANALYZE
SELECT
*
FROM sb_messages messages_tbl
LEFT JOIN users users_tbl ON users_tbl.id = messages_tbl.from_user
WHERE
messages_tbl.type IN (0, 9) AND
messages_tbl.visibility_status = 0 AND
messages_tbl.not_show_on_air = 'f' AND
messages_tbl.clan_id IS NULL AND
users_tbl.blocked IS DISTINCT FROM 't';

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..24577.74 rows=20504 width=1037) (actual time=0.045..532.012 rows=20760 loops=1)
   Filter: (users_tbl.blocked IS DISTINCT FROM true)
   ->  Index Scan using sb_messages_special3_key on sb_messages messages_tbl  (cost=0.00..3793.75 rows=35784 width=208) (actual time=0.019..67.746 rows=24937 loops=1)
   ->  Index Scan using sb_users_pkey on users users_tbl  (cost=0.00..0.53 rows=1 width=829) (actual time=0.007..0.009 rows=1 loops=24937)
         Index Cond: (users_tbl.id = messages_tbl.from_user)
 Total runtime: 563.944 ms


bad estimate (estimated 1 vs real rows=20760):
Game2=# EXPLAIN (ANALYZE, COSTS) SELECT * FROM sb_messages messages_tbl
WHERE
   (messages_tbl.type IN (0, 9) AND messages_tbl.visibility_status=0 AND
messages_tbl.not_show_on_air='f' AND messages_tbl.clan_id IS NULL)
   AND  NOT EXISTS (SELECT 1 FROM users users_tbl WHERE blocked='t' and
users_tbl.id = messages_tbl.from_user);

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=0.00..24488.28 rows=1 width=208) (actual time=0.044..430.645 rows=20760 loops=1)
   ->  Index Scan using sb_messages_special3_key on sb_messages messages_tbl  (cost=0.00..3793.75 rows=35784 width=208) (actual time=0.020..67.810 rows=24937 loops=1)
   ->  Index Scan using sb_users_pkey on users users_tbl  (cost=0.00..0.53 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=24937)
         Index Cond: (users_tbl.id = messages_tbl.from_user)
         Filter: users_tbl.blocked
 Total runtime: 461.296 ms


What is curious is that NOT EXISTS always performs about 20% faster (I ran both
explains about 10 times each, and each time NOT EXISTS is close to 20% faster).


-- 
Maxim Boguk
Senior Postgresql DBA.


[GENERAL] Postgresql 9.0.7 weird planner decision (rows in plan close to reality but plan suboptimal)

2012-04-06 Thread Maxim Boguk
Hi,

Today, on one of the databases under my management, I found a very strange plan
for a simple query.
Postgresql 9.0.7 on Linux,
random_page_cost=4
seq_page_cost=1

The query and plan:

db=# EXPLAIN (ANALYZE, COSTS, BUFFERS)  select obj_id, obj_commented,p2o_id
FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';

QUERY PLAN

 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
   Merge Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
   Buffers: shared hit=1140491
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
         Buffers: shared hit=1108452
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)
         Sort Key: person2obj.p2o_id
         Sort Method:  quicksort  Memory: 807kB
         Buffers: shared hit=32039
         ->  Nested Loop  (cost=0.00..6537.82 rows=221 width=24) (actual time=0.097..80.129 rows=7870 loops=1)
               Buffers: shared hit=32039
               ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..225.73 rows=1726 width=16) (actual time=0.028..17.957 rows=6010 loops=1)
                     Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
                     Buffers: shared hit=6207
               ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..3.64 rows=1 width=24) (actual time=0.006..0.009 rows=1 loops=6010)
                     Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
                     Buffers: shared hit=25832
 Total runtime: 1349.767 ms

I don't understand why the database chooses a merge join with the 1.2M-entry table.
person2obj_counters has an index on p2oc_id (it's the primary key field).

Switching to the fast nested loop plan can be achieved with set random_page_cost=10:

db=# set random_page_cost to 10;
SET
db=# EXPLAIN (ANALYZE, COSTS, BUFFERS)  select obj_id, obj_commented,p2o_id
FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';

QUERY PLAN
--
 Nested Loop  (cost=0.00..14810.38 rows=3 width=24) (actual time=16.910..115.110 rows=1758 loops=1)
   Buffers: shared hit=57403
   ->  Nested Loop  (cost=0.00..14616.37 rows=221 width=24) (actual time=0.088..82.342 rows=7858 loops=1)
         Buffers: shared hit=32046
         ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..273.70 rows=1725 width=16) (actual time=0.029..16.260 rows=6009 loops=1)
               Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
               Buffers: shared hit=6222
         ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..8.30 rows=1 width=24) (actual time=0.007..0.010 rows=1 loops=6009)
               Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
               Buffers: shared hit=25824
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..0.87 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=7858)
         Index Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
         Buffers: shared hit=25357
 Total runtime: 115.465 ms
(14 rows)

OK... so what I see: the database thinks it will need to join 221 rows from
the previous level to the person2obj_counters table.
And somehow Pg manages to calculate that a full index scan over the 1.2M
entries of the person2obj_counters table is faster than nested loop probes
over 221 values.

What looks very suspicious is that the full-index-scan + merge-join part
adds only about 40 points to the total cost (
 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
...
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
...
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)
)... how can that be?

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU

[GENERAL] Why checkpoint_timeout had maximum value of 1h?

2012-03-29 Thread Maxim Boguk
Hi all,

Is there any real reason why checkpoint_timeout is limited to 1 hour?

In my case I have some replicas with WAL on a SAS raid and PGDATA on an SSD with
limited write endurance.
And I don't worry about possibly long recovery after a power failure in
that case.
What's more, the working dataset fits in shared buffers, so almost no dirty
buffer evictions by the bgwriter or backends happen.

In that case, having checkpoint_timeout=10 hours could reduce the amount of writes
to the SSD by a factor of 10, and increase the planned SSD lifetime by the same
amount.

I would like to have the ability to set checkpoint_timeout to a high value,
and (what's even better) checkpoint_timeout=0 - in that case a checkpoint would
happen only when all checkpoint_segments were used.

Are there any serious drawbacks to that idea?

Is it safe to increase that limit in the source and rebuild the database? (9.0 and
9.1 case)
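
For reference, a quick way (just a sketch) to see how often checkpoints are driven by the
timeout versus by running out of checkpoint_segments on a given box:

SHOW checkpoint_timeout;
SHOW checkpoint_segments;
-- checkpoints_timed vs checkpoints_req shows which limit actually triggers checkpoints
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;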

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


[GENERAL] Question about warning: invalid resource manager ID 128 at ... on hot stanby

2012-03-22 Thread Maxim Boguk
Hi all.

Database version used 9.0.4 on FreeBSD 7.3.

Today ,after restart of replica db, I got the next warning in log:
2012-03-23 03:10:08.221 MSK 55096 @ from  [vxid:1/0 txid:0] []LOG:
invalid resource manager ID 128 at 44E/4E7303B0

I searched the mailing lists but I'm still not sure whether it is harmless or not...
Full log looks like:

2012-03-23 03:08:46.465 MSK 38622 @ from  [vxid: txid:0] []LOG:  received
fast shutdown request
2012-03-23 03:08:46.465 MSK 38622 @ from  [vxid: txid:0] []LOG:  aborting
any active transactions
2012-03-23 03:08:46.465 MSK 38627 @ from  [vxid: txid:0] []FATAL:
terminating walreceiver process due to administrator command
2012-03-23 03:08:46.693 MSK 38718 @ from  [vxid: txid:0] []LOG:  shutting
down
2012-03-23 03:08:46.696 MSK 38718 @ from  [vxid: txid:0] []LOG:  database
system is shut down
2012-03-23 03:08:53.850 MSK 55096 @ from  [vxid: txid:0] []LOG:  database
system was shut down in recovery at 2012-03-23 03:08:46 MSK
2012-03-23 03:08:53.874 MSK 55096 @ from  [vxid: txid:0] []LOG:  entering
standby mode
2012-03-23 03:08:53.905 MSK 55096 @ from  [vxid:1/0 txid:0] []LOG:  redo
starts at 44E/114060E8
2012-03-23 03:10:08.221 MSK 55096 @ from  [vxid:1/0 txid:0] []LOG:
consistent recovery state reached at 44E/4E7303B0
2012-03-23 03:10:08.221 MSK 55096 @ from  [vxid:1/0 txid:0] []LOG:  invalid
resource manager ID 128 at 44E/4E7303B0
2012-03-23 03:10:08.222 MSK 55093 @ from  [vxid: txid:0] []LOG:  database
system is ready to accept read only connections
2012-03-23 03:10:08.239 MSK 56317 @ from  [vxid: txid:0] []LOG:  streaming
replication successfully connected to primary

Is that warning harmless on 9.0.4, or should I start to worry?
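
In case it helps others reading the archives, a minimal sanity check (a sketch only) that
the standby keeps receiving and replaying WAL past the reported position, using the 9.0
function names:

-- run on the standby
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();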

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


[GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk
One of servers under my support 2 days ago produced the next error:

ERROR:  could not read block 135 in file base/16404/118881486: read only
0 of 8192 bytes

Server version 9.0.6. No db or server crashes or db recovery have happened on that
server since setup.
The server is the db backend for a very large web project under quite a heavy load
(10k+ requests per second to the db).
No other strange errors have been found in the logs since the server was put into
production half a year ago.

Now what happened (from DB log):

2012-02-17 22:35:58 MSK 14333 [vxid:340/1822 txid:2341883282] [DELETE]
LOG:  duration: 5.669 ms  execute unnamed: delete from agency_statistics
2012-02-17 22:35:58 MSK 14333 [vxid:340/1823 txid:0] [BIND] LOG:
duration: 2.787 ms  bind unnamed: insert into agency_statistics
(employer_id, area_id, area_pa
th, professional_area_id, vacancies_number) select e.employer_id,
vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id)
from vacancy v inner join employer e on v.employer
_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id =
vb.vacancy_body_id inner join vacancy_body_specialization vs on
vb.vacancy_body_id = vs.vacancy_body_id inner join spec
ialization s on s.specialization_id = vs.specialization_id inner join area
a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled =
false and e.category = $1 and e.state
= $2 and e.manager_id  0 group by e.employer_id, vb.area_id, a.path,
s.professional_area_id
2012-02-17 22:35:59 MSK 14333 [vxid:340/1823 txid:2341883336] [INSERT]
LOG:  duration: 726.700 ms  execute unnamed: insert into
agency_statistics (employer_id,
area_id, area_path, professional_area_id, vacancies_number) select
e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct
v.vacancy_id) from vacancy v inner join employer
 e on v.employer_id = e.employer_id inner join vacancy_body vb on
v.vacancy_body_id = vb.vacancy_body_id inner join
vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id
 inner join specialization s on s.specialization_id = vs.specialization_id
inner join area a on vb.area_id = a.area_id where v.archived_status in (0)
and v.disabled = false and e.category =
 $1 and e.state = $2 and e.manager_id  0 group by e.employer_id,
vb.area_id, a.path, s.professional_area_id
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230070 txid:0] [BIND] LOG:
duration: 30.195 ms  bind S_35: insert into agency_statistics (employer_id,
area_id, area_path,
professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT]
ERROR:  could not read block 135 in file base/16404/118881486: read only
0 of 8192 bytes
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT]
STATEMENT:  insert into agency_statistics (employer_id, area_id, area_path,
professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)

No concurrent writes happened to that table outside of that connection.
Maybe autovacuum fired on that table between the delete and the first or
second insert (unfortunately it wasn't logged).

I recreated the table from scratch and kept the damaged table under another
name (through alter table agency_statistics rename to
agency_statistics_old).
So I have files to dig into.

What I see in file system:
hh=# SELECT relfilenode from pg_class where relname='agency_statistics_old';
 relfilenode
-
   118881486
postgres@db10:~/tmp$ ls -la
/var/lib/postgresql/9.0/main/base/16404/118881486
-rw--- 1 postgres postgres 0 2012-02-20 12:04
/var/lib/postgresql/9.0/main/base/16404/118881486

So the table file size is zero bytes (it seems autovacuum truncated that table to 0
bytes).

But the index shows:
hh=#  SELECT relfilenode from pg_class where
relname='agency_statistics_pkey';
 relfilenode
-
   118881489
postgres@db10:~/tmp$ ls -la
/var/lib/postgresql/9.0/main/base/16404/118881489
-rw--- 1 postgres postgres 4956160 2012-02-20 12:07
/var/lib/postgresql/9.0/main/base/16404/118881489

hh=# SELECT * from pgstatindex('agency_statistics_pkey');
-[ RECORD 1 ]--+
version| 2
tree_level | 2
index_size | 4947968
root_block_no  | 295
internal_pages | 1
leaf_pages | 601
empty_pages| 0
deleted_pages  | 1
avg_leaf_density   | 0.45
leaf_fragmentation | 13.14

I'm out of ideas now.
Any suggestions on where and what I should look at next?

Kind Regards,
Maksym

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk

  So table file size zero bytes (seems autovacuum truncated that table to 0
  bytes).

 Hmmm  something did, but I see no clear evidence that it was
 autovacuum.

 Do you know why the mod date on the file is 2012-02-20 12:04?  That's
 more than two days after the error in your logs, so it's not clear to me
 that the current state of the file tells us much about what happened on
 the 17th.  If autovacuum had truncated the table then, and the table
 wasn't touched otherwise, the file mod date shouldn't have increased.

regards, tom lane



Hi,

Cron attempted to populate the table once per hour after that problem
happened.
And each time it produced the same error.
So the table had attempted write activity after the error happened:
(
2012-02-18 00:36:01 MSK 27743 hh.app@hh from [local] [vxid:95/19965879
txid:2342867908] [INSERT] ERROR:  could not read block 171 in file
base/16404/118881486: read only 0 of 8192 bytes
2012-02-18 00:36:01 MSK 27743 hh.app@hh from [local] [vxid:95/19965879
txid:2342867908] [INSERT] STATEMENT:  insert into agency_statistics
(employer_id, area_id, area_path, professional_area_id, vacancies_number)
select e.employer_id, vb.area_id, a.path, s.professional_area_id,
count(distinct v.vacancy_id) from vacancy v inner join employer e on
v.employer_id = e.employer_id inner join vacancy_body vb on
v.vacancy_body_id = vb.vacancy_body_id inner join
vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id
inner join specialization s on s.specialization_id = vs.specialization_id
inner join area a on vb.area_id = a.area_id where v.archived_status in (0)
and v.disabled = false and e.category = $1 and e.state = $2 and
e.manager_id  0 group by e.employer_id, vb.area_id, a.path,
s.professional_area_id
...
and so on until Monday, when I was informed about that problem.
).

For information, the table has a quite simple structure without any unusual
features:

hh=# \d+ agency_statistics_old
Table public.agency_statistics_old
Column|  Type  | Modifiers | Storage  |
Description
--++---+--+-
 employer_id  | integer| not null  | plain|
 area_id  | integer| not null  | plain|
 area_path| character varying(255) | not null  | extended |
 professional_area_id | integer| not null  | plain|
 vacancies_number | integer| not null  | plain|
 rank | integer|   | plain|
 normalised_rank  | integer|   | plain|
Indexes:
agency_statistics_pkey PRIMARY KEY, btree (employer_id, area_id,
professional_area_id) CLUSTER
Has OIDs: no
Options: fillfactor=50


I'm almost sure I will get an error if I try to insert something into
agency_statistics_old.
Should I perform that test?

Kind Regards,
Maxim


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk
On Tue, Feb 21, 2012 at 1:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  Do you know why the mod date on the file is 2012-02-20 12:04?

  Cron was attempt to populate the table once per hour after that problem
  happened.
  And each time it was produced the same error.

 That's interesting ... is there any possibility that the insertions were
 attempting to insert values that matched a previously-existing primary
 key value?  I'm thinking there's no reason for the INSERT per se to be
 touching nonexistent blocks, but if for some reason the pkey index still
 had entries pointing at vanished rows (as it seems to) then the errors
 could be coming from uniqueness checks attempting to fetch those rows to
 see if they're live.

regards, tom lane


Hi,

There isn't just a possibility - close to 100% of the newly inserted values matched
previously-existing primary key values.
The table is a hand-made 'materialized view'-type statistics table which is
recalculated via cron.

-- 
Maxim Boguk
Senior Postgresql DBA.


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk
On Tue, Feb 21, 2012 at 1:46 PM, Maxim Boguk maxim.bo...@gmail.com wrote:



 On Tue, Feb 21, 2012 at 1:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  Do you know why the mod date on the file is 2012-02-20 12:04?

  Cron was attempt to populate the table once per hour after that problem
  happened.
  And each time it was produced the same error.

 That's interesting ... is there any possibility that the insertions were
 attempting to insert values that matched a previously-existing primary
 key value?  I'm thinking there's no reason for the INSERT per se to be
 touching nonexistent blocks, but if for some reason the pkey index still
 had entries pointing at vanished rows (as it seems to) then the errors
 could be coming from uniqueness checks attempting to fetch those rows to
 see if they're live.

regards, tom lane


 Hi,

 There isn't possibility but close to 100% new inserted values were matched
 a previously-existing primary
 key value.
 The table is hand-made 'materialyzed view'-type statistic table which is
 getting recalculated via cron.


To be clear - the newly inserted values do match previously-existing
primary key values almost always.
Sorry for not being clear.


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk

 OK, so that pretty much explains where the visible symptoms are coming
 from: somehow, the table got truncated but its pkey index did not get
 cleared out.  So an insert creates an empty page zero, inserts a heap
 tuple there, tries to insert an index entry.  The btree code sees there
 is an index entry for that key already, and tries to fetch the heap
 tuple for that index entry to see if it's dead (which would allow the
 insertion to proceed).  But the block number the index is pointing at
 isn't there, so you get the quoted error message.  The insertion rolls
 back, leaving a dead tuple that can be garbage-collected by autovacuum,
 after which it truncates the table again --- but of course without
 removing any index entries, except maybe one for TID (0,1) if that's
 still there.  Lather rinse repeat.

 But this brings us little closer to understanding the cause of the
 problem.  How could the table have gotten truncated without vacuuming
 out the index?

 Are you sure the only operations happening on that table are INSERT,
 DELETE, autovacuum?  No DDL of any kind?

regards, tom lane


Not that I know of. There could be some updates as well, but they are
performed by the same cron as the initial delete, and they didn't happen in
that case because the cron script exits on the first database error. I.e. no one,
to my knowledge, could write into that table except that single cron and
autovacuum.
And I have logging of all queries with duration over 1ms.

Kind Regards,
Maksym


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk
On Tue, Feb 21, 2012 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  OK, so that pretty much explains where the visible symptoms are coming
  from: somehow, the table got truncated but its pkey index did not get
  cleared out.  So an insert creates an empty page zero, inserts a heap
  tuple there, tries to insert an index entry.  The btree code sees there
  is an index entry for that key already, and tries to fetch the heap
  tuple for that index entry to see if it's dead (which would allow the
  insertion to proceed).  But the block number the index is pointing at
  isn't there, so you get the quoted error message.  The insertion rolls
  back, leaving a dead tuple that can be garbage-collected by autovacuum,
  after which it truncates the table again --- but of course without
  removing any index entries, except maybe one for TID (0,1) if that's
  still there.  Lather rinse repeat.

 Hmm ... actually there is a point that this theory doesn't explain
 entirely.  If the probability of a collision with an existing index
 entry was near 100%, then each hourly cron job should only have been
 able to insert one or a few heap tuples before failing.  That would
 not trigger an autovacuum right away.  Eventually the number of dead
 tuples would build up to the point where autovacuum got interested,
 but it strains credulity a bit to assume that this happened exactly
 after the last hourly run before you renamed the table.  Yet, if that
 didn't happen just that way, how come the size of the table is exactly
 zero now?

 The theory would be more satisfactory if we could expect that an hourly
 run would be able to insert some thousands of tuples before failing,
 enough to trigger an autovacuum run.  So I'm wondering if maybe the
 index is *partially* cleaned out, but not completely.  Does this
 materialized view have a fairly predictable number of rows, and if so
 how does that compare to the number of entries in the index?  (If you
 have no other way to get the number of entries in the index, try
 inserting a dummy row, deleting it, and then VACUUM VERBOSE.)

regards, tom lane


There are some funny results:

hh=# VACUUM verbose agency_statistics_old;
INFO:  vacuuming public.agency_statistics_old
INFO:  index agency_statistics_pkey now contains 0 row versions in 605
pages
DETAIL:  0 index row versions were removed.
595 index pages have been deleted, 595 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  agency_statistics_old: found 0 removable, 0 nonremovable row
versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
VACUUM

However, when I try to populate that table with production data I get an error:

hh=# insert into agency_statistics_old select * from agency_statistics;
ERROR:  could not read block 228 in file base/16404/118881486: read only
0 of 8192 bytes

I.e. the database sees that the index has zero rows, but an insert still fails.

Maybe I should use the pageinspect addon to see the actual content of the index pages?
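
A possible starting point with pageinspect (a sketch, assuming the contrib module is
installed) would be the btree metapage and a few individual pages of the damaged index:

SELECT * FROM bt_metap('agency_statistics_pkey');
SELECT * FROM bt_page_stats('agency_statistics_pkey', 1);
SELECT * FROM bt_page_items('agency_statistics_pkey', 1);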

-- 
Maxim Boguk
Senior Postgresql DBA.


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk
On Tue, Feb 21, 2012 at 3:47 PM, Maxim Boguk maxim.bo...@gmail.com wrote:



 On Tue, Feb 21, 2012 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  OK, so that pretty much explains where the visible symptoms are coming
  from: somehow, the table got truncated but its pkey index did not get
  cleared out.  So an insert creates an empty page zero, inserts a heap
  tuple there, tries to insert an index entry.  The btree code sees there
  is an index entry for that key already, and tries to fetch the heap
  tuple for that index entry to see if it's dead (which would allow the
  insertion to proceed).  But the block number the index is pointing at
  isn't there, so you get the quoted error message.  The insertion rolls
  back, leaving a dead tuple that can be garbage-collected by autovacuum,
  after which it truncates the table again --- but of course without
  removing any index entries, except maybe one for TID (0,1) if that's
  still there.  Lather rinse repeat.

 Hmm ... actually there is a point that this theory doesn't explain
 entirely.  If the probability of a collision with an existing index
 entry was near 100%, then each hourly cron job should only have been
 able to insert one or a few heap tuples before failing.  That would
 not trigger an autovacuum right away.  Eventually the number of dead
 tuples would build up to the point where autovacuum got interested,
 but it strains credulity a bit to assume that this happened exactly
 after the last hourly run before you renamed the table.  Yet, if that
 didn't happen just that way, how come the size of the table is exactly
 zero now?

 The theory would be more satisfactory if we could expect that an hourly
 run would be able to insert some thousands of tuples before failing,
 enough to trigger an autovacuum run.  So I'm wondering if maybe the
 index is *partially* cleaned out, but not completely.  Does this
 materialized view have a fairly predictable number of rows, and if so
 how does that compare to the number of entries in the index?  (If you
 have no other way to get the number of entries in the index, try
 inserting a dummy row, deleting it, and then VACUUM VERBOSE.)

regards, tom lane


 There is some funny results:

 hh=# VACUUM verbose agency_statistics_old;
 INFO:  vacuuming public.agency_statistics_old
 INFO:  index agency_statistics_pkey now contains 0 row versions in 605
 pages
 DETAIL:  0 index row versions were removed.
 595 index pages have been deleted, 595 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.01 sec.
 INFO:  agency_statistics_old: found 0 removable, 0 nonremovable row
 versions in 0 out of 0 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 There were 0 unused item pointers.
 0 pages are entirely empty.
 CPU 0.00s/0.00u sec elapsed 0.01 sec.
 VACUUM

 However when I try populate that table with production data I get an error:

 hh=# insert into agency_statistics_old select * from agency_statistics;
 ERROR:  could not read block 228 in file base/16404/118881486: read only
 0 of 8192 bytes

 E.g. the database see that index have zero rows, but an insert still fail.

 May be I should use pageinspect addon to see an actual index pages content?



What makes your idea that the index is *partially* cleaned out, but not
completely, highly probable is that the following query produces no error:

hh=# insert into agency_statistics_old select * from agency_statistics
limit 1;
INSERT 0 1


-- 
Maxim Boguk
Senior Postgresql DBA.


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk
On Tue, Feb 21, 2012 at 4:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  There is some funny results:

  hh=# VACUUM verbose agency_statistics_old;
  INFO:  vacuuming public.agency_statistics_old
  INFO:  index agency_statistics_pkey now contains 0 row versions in 605
  pages
  DETAIL:  0 index row versions were removed.

 Wow.  That seems to blow my theory to small pieces.  If the index
 contains no entries then it shouldn't be causing any uniqueness check
 probes.  But at the same time, if the index is empty then how come
 pgstatindex showed avg_leaf_density = 0.45 ?

  May be I should use pageinspect addon to see an actual index pages
 content?

 That or pg_filedump would be interesting.  But your experiments with
 adding data from the other table will probably have produced some new
 index entries, which will confuse the situation.  Did you save a
 physical copy of the index before that?

 Another idea is to attach to the backend with gdb, set a breakpoint at
 errfinish, and get a stack trace from the point of the could not read
 block error.  That would show definitively if this is coming from a
 uniqueness check or something else entirely.

regards, tom lane


Yes, I saved a physical copy of the file before I started playing with it.

Unfortunately that is a production server with no gcc or gdb available, so no
pg_filedump or gdb yet (but I'm going to work on it).

While I'm waiting for gdb/gcc on that server, I built pg_filedump on the
development server using the same PostgreSQL version and created a pg_filedump of
the index file.
It can be downloaded there:
http://maximboguk.com/static/etc/agency_statistics_pkey.pg_filedump.gz

I don't have enough knowledge of the b-tree index structure to extract any
useful information from that file.

Kind Regards,
Maksym


Re: [GENERAL] Unfamous 'could not read block ... in file ...: read only 0 of 8192 bytes' again

2012-02-20 Thread Maxim Boguk
On Tue, Feb 21, 2012 at 5:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  While I waiting for gdb/gcc on that server I had built pg_filedump on the
  development server using same postgresql version and created pg_filedump
 of
  the index file.
  It can be downloaded there:
  http://maximboguk.com/static/etc/agency_statistics_pkey.pg_filedump.gz

 If the index key values are not private information, could we see that
 with pg_filedump -i -f not just bare?

regards, tom lane


There it is:
http://maximboguk.com/static/etc/agency_statistics_pkey.pg_full_filedump.gz

-- 
Maxim Boguk
Senior Postgresql DBA.


Re: [GENERAL] Question about (probably wrong) index scan cost for conditional indexes

2012-01-29 Thread Maxim Boguk
On Mon, Jan 30, 2012 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Maxim Boguk maxim.bo...@gmail.com writes:
 Seems previous test case not clear demonstrate the problem which i have
 stuck with.
 Now much better and close to reality test case:

 AFAICT, these behaviors all boil down to the fact that contrib/intarray
 doesn't provide a real cost estimator for its && operator.  It's using
 the contsel stub function, which provides a fixed selectivity of
 0.001.  In your test case, with 1000000 rows in the table, the estimate
 for the number of rows satisfying sections && '{2}' thus comes out to
 exactly 1000.  Unfortunately, the true number is around 100000, and it's
 that discrepancy that is leading to all of these bad cost estimates.

 What I'd like to see done about this is for somebody to adapt the
 work Jan Urbanski did on tsvector stats collection and estimation
 so that it works for the anyarray operators.  It's a bit too late
 to imagine that that'll get done for 9.2, but maybe for 9.3.

                        regards, tom lane

Hi,

Thank you very much for the answer.
I know there is an issue with statistics over intarrays (it has been there a
very long time and sometimes it complicates things a lot).

However, the 100x cost difference between:
SELECT * from test order by id limit 100;  (over the primary key (id) btree index)
Limit  (cost=0.00..3.43 rows=100 width=37)
vs
SELECT * from test where sections && '{2}' order by value limit 100;
(over the test_value_in2section_key on test(value) where sections && '{2}'
btree index)
Limit  (cost=0.00..539.29 rows=100 width=37)
seems wrong to me.

Both queries perform absolutely the same task: fetch 100 entries from
the table based on an ideally suitable index (no post-processing/filtering
done at all... just return 100 sorted tuples based on a single index scan).

I don't understand where the 2+ orders of magnitude cost difference comes from.

And even if I drop the intarray index completely, I still get a wrong plan
(bitmap scan + sort), because the planner cost for the index scan over the
conditional index is 100x more than it should be.
(I.e. there is still an issue even in the absence of the intarray index.)

Is the absence of frequency statistics over intarrays somehow linked to
the wrong planner cost estimates for the conditional index scan?

Kind Regards,
Maxim Boguk

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question about (probably wrong) index scan cost for conditional indexes

2012-01-23 Thread Maxim Boguk
Hi.

It seems the previous test case did not clearly demonstrate the problem I am
stuck with.

Now a much better and closer-to-reality test case:

Preparation:
set random_page_cost to 4;
set seq_page_cost to 1;

create table test (id integer primary key, sections integer[], value float);
insert into test select id, ('{'||((random()*10)::integer)||'}')::integer[]
as value, random() as value from generate_series(1,1000000) as g(id);
--generic gist index for array
CREATE INDEX test_sections_gist on test using gist(sections);
--specialized index on value for sections && '{2}'
CREATE INDEX test_value_in2section_key on test(value) where sections &&
'{2}';
analyze test;

Now actual tests:

A good query, but the cost is definitely wrong:
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 100;
  QUERY PLAN
---
 Limit  (cost=0.00..539.29 rows=100 width=37) (actual time=0.043..0.499
rows=100 loops=1)
   ->  Index Scan using test_value_in2section_key on test
(cost=0.00..5392.87 rows=1000 width=37) (actual time=0.040..0.434 rows=100
loops=1)
 Total runtime: 0.570 ms

Compare with almost equivalent query:
postgres=#  EXPLAIN ANALYZE SELECT * from test order by id limit 100;
QUERY PLAN
---
 Limit  (cost=0.00..3.43 rows=100 width=37) (actual time=0.057..0.192
rows=100 loops=1)
   ->  Index Scan using test_pkey on test  (cost=0.00..34317.36
rows=1000000 width=37) (actual time=0.054..0.115 rows=100 loops=1)
 Total runtime: 0.258 ms

The actual speed is almost the same, but the cost differs by 100 times.



Now if I increase the limit I start getting slow plans, because it switches to the
GiST index and a bitmap scan (the cost of the plain index scan is too high):

postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 1000;
   QUERY
PLAN
-
 Limit  (cost=2941.68..2944.18 rows=1000 width=37) (actual
time=175.301..175.766 rows=1000 loops=1)
   ->  Sort  (cost=2941.68..2944.18 rows=1000 width=37) (actual
time=175.298..175.541 rows=1000 loops=1)
 Sort Key: value
 Sort Method: top-N heapsort  Memory: 127kB
 ->  Bitmap Heap Scan on test  (cost=56.48..2891.85 rows=1000
width=37) (actual time=80.230..132.479 rows=99641 loops=1)
   Recheck Cond: (sections && '{2}'::integer[])
   ->  Bitmap Index Scan on test_sections_gist
(cost=0.00..56.23 rows=1000 width=0) (actual time=78.112..78.112 rows=99641
loops=1)
 Index Cond: (sections && '{2}'::integer[])
 Total runtime: 175.960 ms
(9 rows)

Even if I drop the GiST index I'm still getting the wrong plan:
postgres=# drop index test_sections_gist;
DROP INDEX
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 1000;

QUERY PLAN
--
 Limit  (cost=4489.88..4492.38 rows=1000 width=37) (actual
time=116.637..117.088 rows=1000 loops=1)
   ->  Sort  (cost=4489.88..4492.38 rows=1000 width=37) (actual
time=116.635..116.857 rows=1000 loops=1)
 Sort Key: value
 Sort Method: top-N heapsort  Memory: 127kB
 ->  Bitmap Heap Scan on test  (cost=1604.68..4440.05 rows=1000
width=37) (actual time=22.175..74.556 rows=99641 loops=1)
   Recheck Cond: (sections && '{2}'::integer[])
   ->  Bitmap Index Scan on test_value_in2section_key
(cost=0.00..1604.43 rows=1000 width=0) (actual time=20.248..20.248
rows=99641 loops=1)
 Total runtime: 117.261 ms


And only if I completely disable bitmap scans do I get the good, fast plan (but
with an exceptionally high cost):

postgres=# set enable_bitmapscan to 0;
SET
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order
by value limit 1000;
   QUERY
PLAN

 Limit  (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.047..4.123
rows=1000 loops=1)
   ->  Index Scan using test_value_in2section_key on test
(cost=0.00..5392.87 rows=1000 width=37) (actual time=0.044..3.552 rows=1000
loops=1)
 Total runtime: 4.460 ms


I hope this test case makes my issue clearer.

Regards,
Maksym

On Mon, Jan 23, 2012 at 11:46 AM, Maxim Boguk maxim.bo...@gmail.com wrote:



[GENERAL] Question about (probably wrong) index scan cost for conditional indexes

2012-01-22 Thread Maxim Boguk
I'm not sure whether it is a bug or just the planner working that way.
PostgreSQL 9.1.2 on Linux.

But it seems that index scan cost for very narrow/selective conditional
indexes is greatly overestimated at least in some cases.

In my case I have a special conditional index like:

news_dtime_in208section_active_key2 btree (dtime) WHERE status = 1 AND
class::text = 'Sports::News'::text AND sections && '{208}'::integer[]

And query:

db=# EXPLAIN ANALYZE select * from news  where  (status = 1)  and  (class =
'Sports::News')  and  (sections && '{208}')  order by dtime  limit 10;

QUERY PLAN
-
 Limit  (cost=0.00..26.38 rows=10 width=1262) (actual time=0.040..0.082
rows=10 loops=1)
   ->  Index Scan using news_dtime_in208section_active_key2 on news
(cost=0.00..1429.55 rows=542 width=1262) (actual time=0.038..0.073 rows=10
loops=1)
 Total runtime: 0.142 ms
(3 rows)

I see no reason why the cost of that query is that high... I think it should be
very close to the cost of this query:

news_pkey PRIMARY KEY, btree (id)

db=# EXPLAIN ANALYZE select * from news  order by id limit 10;
QUERY PLAN
---
 Limit  (cost=0.00..0.33 rows=10 width=1262) (actual time=0.043..0.085
rows=10 loops=1)
   ->  Index Scan using news_pkey on news  (cost=0.00..25944.34 rows=775090
width=1262) (actual time=0.041..0.077 rows=10 loops=1)
 Total runtime: 0.147 ms
(3 rows)

(and if you compare the real execution times - they are the same, but the costs
differ by 2 orders of magnitude).
No change of the costing settings has any effect on that difference.


That problem leads to switching to a very slow plan for medium limits:

db=# EXPLAIN ANALYZE select * from news  where  (status = 1)  and  (class =
'Sports::News')  and  (sections && '{208}')  order by dtime  limit 40;

QUERY PLAN

 Limit  (cost=91.97..92.07 rows=40 width=1262) (actual
time=630.865..630.889 rows=40 loops=1)
   ->  Sort  (cost=91.97..93.32 rows=542 width=1262) (actual
time=630.862..630.872 rows=40 loops=1)
 Sort Key: dtime
 Sort Method: top-N heapsort  Memory: 89kB
 ->  Bitmap Heap Scan on news  (cost=6.18..74.83 rows=542
width=1262) (actual time=145.816..412.254 rows=262432 loops=1)
   Recheck Cond: ((sections && '{208}'::integer[]) AND (status
= 1) AND ((class)::text = 'Sports::News'::text))
   ->  Bitmap Index Scan on news_sections_gin2_special
(cost=0.00..6.05 rows=542 width=0) (actual time=98.954..98.954 rows=262754
loops=1)
 Index Cond: (sections && '{208}'::integer[])
 Total runtime: 632.049 ms
(9 rows)

Kind regards,
Maksym

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


Re: [GENERAL] Question about (probably wrong) index scan cost for conditional indexes

2012-01-22 Thread Maxim Boguk
On Mon, Jan 23, 2012 at 11:28 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  But it seems that index scan cost for very narrow/selective conditional
  indexes is greatly overestimated at least in some cases.

 I realized in connection with
 http://archives.postgresql.org/pgsql-general/2012-01/msg00459.php
 that btcostestimate is not correctly estimating numIndexTuples for
 partial indexes.  But it's impossible to tell from this amount of
 information whether you're seeing an effect of that, or something else.
 Can you provide a self-contained test case?

regards, tom lane


Probably the simplest test case:

set random_page_cost to 4;
set seq_page_cost to 1;
drop table  if exists test;
CREATE TABLE test (id integer primary key, value1 float, value2 float,
value3 float, value4 float);
INSERT into test select id,random() as value1,random() as value2, random()
as value3,random() as value4 from generate_series(1,1000000) as g(id);
CREATE INDEX test_special_key on test(value1) where value2*2<0.01 and
value3*2<0.01 and value4*2<0.01;
ANALYZE test;

postgres=# EXPLAIN ANALYZE select * from test order by id limit 100;
QUERY PLAN
---
 Limit  (cost=0.00..3.43 rows=100 width=36) (actual time=0.042..0.170
rows=100 loops=1)
   ->  Index Scan using test_pkey on test  (cost=0.00..34317.36
rows=1000000 width=36) (actual time=0.040..0.108 rows=100 loops=1)
 Total runtime: 0.243 ms
(3 rows)

vs

postgres=# EXPLAIN ANALYZE select * from test where value2*2<0.01 and
value3*2<0.01 and value4*2<0.01 order by value1 limit 100;
  QUERY PLAN
--
 Limit  (cost=0.00..92.52 rows=100 width=36) (actual time=0.072..0.072
rows=0 loops=1)
   ->  Index Scan using test_special_key on test  (cost=0.00..34264.97
rows=37037 width=36) (actual time=0.070..0.070 rows=0 loops=1)
 Total runtime: 0.113 ms
(3 rows)

cost difference:
(cost=0.00..3.43 rows=100 width=36)
vs
(cost=0.00..92.52 rows=100 width=36)

The actual speed (and theoretical performance) is almost the same.

The more selective the conditions added to the conditional index, the worse the
situation with the wrong costing gets.
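
One more data point that may be worth checking (a sketch): for a partial index,
pg_class.reltuples should reflect only the rows that satisfy the index predicate, so
it shows what size the planner thinks the conditional index has:

SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('test_pkey', 'test_special_key');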

Kind Regards,
Maksym

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage.


[GENERAL] Question about HoT updates and conditional indexes

2011-12-13 Thread Maxim Boguk
Let's assume I have a table with an index defined as:
create index test_key on test_table(mtime) where status=1;

Will an update such as:
update test_table set mtime=NOW() where id=10;
be treated as a HOT update if the entry with id=10 has status=0?

That update seems a valid candidate for HOT, however in practice it seems to take
the full (non-HOT) update path.
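
A simple way to check this empirically (just a sketch) is to watch the HOT counters in
pg_stat_user_tables before and after the update (the statistics are reported with a
small delay):

SELECT n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_table';
update test_table set mtime=NOW() where id=10;
SELECT n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_table';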


-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
Strength breaks straw, but not everything in our life is straw, and strength is far
from everything.


[GENERAL] WITH and WITH RECURSIVE in single query

2011-12-04 Thread Maxim Boguk
Hi.

Is there any way to combine WITH and WITH RECURSIVE in a single query?

Something like:

WITH t AS (some complicated select to speed up recursive part),
RECURSIVE r AS
(
...
UNION ALL
...
)

?

-- 
Maxim Boguk
Senior Postgresql DBA.


[GENERAL] Questions about setting an array element value outside of the update

2011-12-04 Thread Maxim Boguk
Let's say I have a subquery which produces an array[], a position and a new_value.

Is there a less clumsy way to set array[position] to the new_value (not an update,
just changing an element inside an array) than:

SELECT
_array[1:pos-1]
||newval
||_array[_pos+1:array_length(_array, 1)]
FROM
(
SELECT _array,
   pos,
   newval
 FROM
   some_colmplicated_logic
);

The:
_array[1:pos-1]
||newval
||_array[_pos+1:array_length(_array, 1)]
part is very clumsy to my eyes.
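
The best I've come up with so far (a sketch only; array_set is just a name I made up)
is to hide the splicing behind a small pl/pgsql function, since element assignment is
allowed there:

CREATE OR REPLACE FUNCTION array_set(_array anyarray, _pos integer, _newval anyelement)
RETURNS anyarray
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    -- element assignment works on plpgsql variables, including parameters
    _array[_pos] := _newval;
    RETURN _array;
END;
$$;

-- usage: SELECT array_set(_array, pos, newval) FROM ...;

But I'd prefer a pure SQL way if one exists.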

PS: that is just a small part of a complicated WITH RECURSIVE iterator in the
real task.

-- 
Maxim Boguk
Senior Postgresql DBA.


Re: [GENERAL] WITH and WITH RECURSIVE in single query

2011-12-04 Thread Maxim Boguk
On Mon, Dec 5, 2011 at 2:45 PM, David Johnston pol...@yahoo.com wrote:

 On Dec 4, 2011, at 22:28, Maxim Boguk maxim.bo...@gmail.com wrote:

  Hi.
 
  Is here any way to combine WITH and WITH RECURSIVE into single query?
 
  Something like:
 
  WITH t AS (some complicated select to speed up recursive part),
  RECURSIVE r AS
  (
  ...
  UNION ALL
  ...
  )
 
  ?
 
  --
  Maxim Boguk
  Senior Postgresql DBA.

 WITH RECURSIVE q1 As (), q2 AS () ...

 Add RECURSIVE after the WITH; it then applies to any/all the CTEs.

 Look at the specification (and description) in the SELECT documentation
 closely.

 David J.


The trouble is that I'm trying to precalculate some data through the WITH syntax
(non-recursive), to be used later in the WITH RECURSIVE part (and keep a single
copy of that data instead of N).

Something like:

WITH _t AS (some complicated select to speed up recursive part),
RECURSIVE r AS
(
 ...
UNION ALL
SELECT * FROM r
JOIN t ON ...
)

So I need to have the precalculated t table before I start the iterator.

Now instead of _t I'm using record[] + unnest, but that approach is very memory
hungry for long iterations:

WITH RECURSIVE r AS
(
  SELECT ...
  ARRAY(SELECT ROW(t.*) FROM some complicated select to speed up
recursive part) as _t_array
  FROM ...

UNION ALL
  SELECT
  ...,
  _t_array
  FROM r
  JOIN (unnest(_t_array) ...)  ON something
)

However, that approach leads to having a copy of _t_array for each final
row, so it can use a lot of memory.

PS: Yes, I know about pl/pgsql, but WITH RECURSIVE iterators can give 2-10x
performance gains over an implementation of the same algorithm inside
pl/pgsql.

-- 
Maxim Boguk
Senior Postgresql DBA.


[GENERAL] pl/pgsql and arrays[]

2011-12-04 Thread Maxim Boguk
Some questions about pl/pgsql and arrays[].

Are constructions such as:

RETURN NEXT array[1];

OR

SELECT val INTO array[1] FROM ...;

not supposed to work?

At least the documentation about RETURN NEXT says:
RETURN NEXT expression;

I think array[1] is a valid expression.

-- 
Maxim Boguk
Senior Postgresql DBA.


Re: [GENERAL] WITH and WITH RECURSIVE in single query

2011-12-04 Thread Maxim Boguk
On Mon, Dec 5, 2011 at 3:15 PM, David Johnston pol...@yahoo.com wrote:

 On Dec 4, 2011, at 22:58, Maxim Boguk maxim.bo...@gmail.com wrote:



 On Mon, Dec 5, 2011 at 2:45 PM, David Johnston  pol...@yahoo.com
 pol...@yahoo.com wrote:

 On Dec 4, 2011, at 22:28, Maxim Boguk  maxim.bo...@gmail.com
 maxim.bo...@gmail.com wrote:

  Hi.
 
  Is here any way to combine WITH and WITH RECURSIVE into single query?
 
  Something like:
 
  WITH t AS (some complicated select to speed up recursive part),
  RECURSIVE r AS
  (
  ...
  UNION ALL
  ...
  )
 
  ?
 
  --
  Maxim Boguk
  Senior Postgresql DBA.

 WITH RECURSIVE q1 As (), q2 AS () ...

 Add RECURSIVE after the WITH; it then applies to any/all the CTEs.

 Look at the specification (and description) in the SELECT documentation
 closely.

 David J.


 Trouble is I trying to precalculate some data through WITH syntax (non
 recursive).
 To be used later in WITH RECURSIVE part (and keep a single of that data
 instead of N).

 Something like:

 WITH _t AS (some complicated select to speed up recursive part),
 RECURSIVE r AS
 (
  ...
 UNION ALL
 SELECT * FROM r
 JOIN t ON ...
 )

 So I need have precalculated t table before I start an iterator.

 Now instead of _t  I using record[] + unnest  but that appoach very memory
 hungry for long iterations:

 WITH RECURSIVE r AS
 (
   SELECT ...
   ARRAY(SELECT ROW(t.*) FROM some complicated select to speed up
 recursive part) as _t_array
   FROM ...

 UNION ALL
   SELECT
   ...,
   _t_array
   FROM r
   JOIN (unnest(_t_array) ...)  ON something
 )

 However that approach lead to having copy of the _t_array per each final
 row, so can use a lot of memory.

 PS: Yes I know about pl/pgsql but WITH RECURSIVE iterators can give 2-10
 time performance gains over implemenation of the same algorythm inside
 pl/pgsql.

 --
 Maxim Boguk
 Senior Postgresql DBA.


 Read the documentation closely, the syntax definition for WITH is precise
 and accurate.

 No matter how many queries you want to create you write the word WITH one
 time.  If ANY of your queries require iterative behavior you put the word
 RECURSIVE after the word WITH.  Between individual queries you may only put
 the name, and optional column alias, along with the required comma.

 As a side benefit to adding RECURSIVE the order in which the queries
 appear is no longer relevant.  Without RECURSIVE you indeed must list the
 queries in order of use.

 David J.


Thank you very much, David.
That works like a charm.
Another 30% of runtime gone.
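
For anyone finding this thread later, the shape that works is roughly the following
(a self-contained sketch with toy data; t stands in for the expensive precalculation):

WITH RECURSIVE
t AS (
    SELECT generate_series(1, 3) AS n          -- evaluated once
),
r AS (
    SELECT 1 AS i                              -- non-recursive term
    UNION ALL
    SELECT r.i + 1                             -- recursive term may join the plain CTE
    FROM r
    JOIN t ON t.n = r.i
    WHERE r.i < 10
)
SELECT * FROM r;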

-- 
Maxim Boguk
Senior Postgresql DBA.


Re: [GENERAL] pl/pgsql and arrays[]

2011-12-04 Thread Maxim Boguk
On Mon, Dec 5, 2011 at 3:53 PM, Pavel Stehule pavel.steh...@gmail.comwrote:

 Hello

 it work on my pc

 postgres=# \sf fx
 CREATE OR REPLACE FUNCTION public.fx()
  RETURNS SETOF integer
  LANGUAGE plpgsql
 AS $function$ declare g int[] = '{20}';
 begin
  return next g[1];
  return;
 end;
 $function$
 postgres=# select fx();
  fx
 
  20
 (1 row)

 regards

 Pavel Stehule


Oh, sorry.
It seems I didn't test the simple cases.

The error happens when you work with record[] types and return setof:

create table test (id serial);
insert into test select generate_series(1,10);

CREATE OR REPLACE FUNCTION _test_array()
RETURNS SETOF test
LANGUAGE plpgsql
AS $$
DECLARE
_array test[];
_row   test%ROWTYPE;
BEGIN
 SELECT array(SELECT test FROM test) INTO _array;

 --work
 --_row := _array[1];
 --RETURN NEXT _row;

 --also work
 --RETURN QUERY SELECT (_array[1]).*;

 --error
 --RETURN NEXT _array[1];

 --error
 --RETURN NEXT (_array[1]);

 --error
 --RETURN NEXT (_array[1]).*;

 RETURN;
END;
$$;






 2011/12/5 Maxim Boguk maxim.bo...@gmail.com:
  Some quetions about pl/pgsql and arrays[].
 
  Is such constructions as:
 
  RETURN NEXT array[1];
 
  OR
 
  SELECT val INTO array[1] FROM ...;
 
  Should not work?
 
  At least documentation about RETURN NEXT  says:
  RETURN NEXT expression;
 
  I think array[1] is a valid expression.
 
  --
  Maxim Boguk
  Senior Postgresql DBA.




-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
Strength breaks straw, but not everything in our life is straw, and strength is far
from everything.


Re: [GENERAL] Problem with custom aggregates and record pseudo-type

2011-12-01 Thread Maxim Boguk
On Fri, Dec 2, 2011 at 3:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maxim Boguk maxim.bo...@gmail.com writes:
  I created special custom aggregate function to append arrays defined as:
  CREATE AGGREGATE array_accum (anyarray)
  (
  sfunc = array_cat,
  stype = anyarray,
  initcond = '{}'
  );


  On arrays of common types it work without any problems:
  SELECT array_accum(i) from (values (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
   array_accum
  -
   {1,2,3,4}
  (1 row)


  However once I try use it with record[] type I get an error:
  SELECT array_accum(i) from (values (ARRAY[row(1,2),row(2,3)]),
  (ARRAY[row(1,2),row(2,3)])) as t(i);
  ERROR:  cannot concatenate incompatible arrays

 Hm ... it looks like this case can be made to work with a simple adjustment
 to getTypeIOData, but in the meantime you might be able to get away with
 leaving the initial value as NULL (ie, leave off the initcond clause).
 The only behavioral difference would be that you'd get NULL not an empty
 array for zero rows of input.

regards, tom lane


Thank you very much for the idea; your suggestion works excellently, as usual.
And again, thank you for committing a fix.
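
For the archives, the variant without initcond that was suggested above is simply
(the only behavioral difference is that zero input rows yield NULL instead of '{}'):

CREATE AGGREGATE array_accum (anyarray)
(
    sfunc = array_cat,
    stype = anyarray
);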

Kind Regards,
Maksym


[GENERAL] Problem with custom aggregates and record pseudo-type

2011-11-30 Thread Maxim Boguk
I created special custom aggregate function to append arrays defined as:
CREATE AGGREGATE array_accum (anyarray)
(
sfunc = array_cat,
stype = anyarray,
initcond = '{}'
);


On arrays of common types it work without any problems:
SELECT array_accum(i) from (values (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
 array_accum
-
 {1,2,3,4}
(1 row)


However once I try use it with record[] type I get an error:
SELECT array_accum(i) from (values (ARRAY[row(1,2),row(2,3)]),
(ARRAY[row(1,2),row(2,3)])) as t(i);
ERROR:  cannot concatenate incompatible arrays
DETAIL:  Arrays with element types record[] and record are not compatible
for concatenation.


The base function of the aggregate:
array_cat work with record[] without any complains:
SELECT array_cat(ARRAY[row(1,2),row(2,3)], ARRAY[row(1,2),row(2,3)]);
 array_cat
---
 {(1,2),(2,3),(1,2),(2,3)}


What am I doing wrong? Or how do I create a correct version of such an
aggregate function?


-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
Strength breaks straw, but not everything in our life is straw, and strength is far
from everything.


[GENERAL] Quite a fast lockless vacuum full implemenation

2010-12-09 Thread Maxim Boguk
 reports.


-- 
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
МойКруг: http://mboguk.moikrug.ru/

Strength breaks straw, but not everything in our life is straw, and strength is far from everything.


clear_table_tail.sql
Description: Binary data


vacuum_table.pl
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Some strange bug with drop table with slony cluster

2009-06-17 Thread Maxim Boguk

DB version: PostgreSQL 8.3.6 (under linux)
no server/db crashes happen before.

Server was slave in slony replication.

Now the problem:
the table was unsubscribed from replication (without any errors)
and then dropped from the master without any errors.

But when I try to drop the table from the slave I get a very strange error:

hh=# drop TABLE metro_station_old;
ERROR:  area_pk is an index

hh=# \d+ metro_station_old
Table public.metro_station_old
  Column  | Type  |   Modifiers   | 
Description
--+---+---+-
 metro_station_id | integer   | not null  |
 city_id  | integer   | not null  |
 metro_line_id| integer   | not null  |
 city_district_id | integer   | not null  |
 name | character varying(64) | not null  |
 image_point_x| integer   |   |
 image_point_y| integer   |   |
 acronym_line | character varying(10) | default ''::character varying |
 colocation   | integer   |   |
Foreign-key constraints:
metro_station_ibfk_1 FOREIGN KEY (city_id) REFERENCES area(area_id)
metro_station_ibfk_2 FOREIGN KEY (metro_line_id) REFERENCES 
metro_line(metro_line_id)
metro_station_ibfk_3 FOREIGN KEY (city_district_id) REFERENCES 
tables_to_drop.city_district(city_district_id)
Has OIDs: no

hh=# \d+ area_pk
 Index public.area_pk
 Column  |  Type   | Description
-+-+-
 area_id | integer |
primary key, btree, for table public.area

Completely independent things.
Also:
hh=# ALTER TABLE metro_station_old drop constraint metro_station_ibfk_1;
ERROR:  area_pk is an index
hh=# ALTER TABLE metro_station_old drop constraint metro_station_ibfk_2;
ERROR:  metro_line_pk is an index
hh=# ALTER TABLE metro_station_old drop constraint metro_station_ibfk_3;
ERROR:  city_district_pk is an index

The first thing I thought of was a corrupted cache in shared memory or something
like that...
but a fresh restart of the db server does not change the situation.

The second thing is possibly broken system indexes... but
reindex system hh;
does not help either.

hh=# SELECT oid from pg_class where relname='metro_station_old';
  oid
---
 17542
(1 row)

hh=# SELECT oid from pg_class where relname='area_pk';
  oid
---
 18933
(1 row)

Now I'm out of ideas.

The main thing that makes me worry is that a second slave shows the same issue. So
that is not a single random error.

Because one of the slaves is a pure backup I can easily experiment with it.
But I need some hints on what to look at next.

PS: one additional bit of info... I can rename the table via ALTER without errors. But
I still can't drop it.
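
One diagnostic sketch (in case it gives anyone a hint) would be to check what
pg_constraint and pg_depend think those constraints reference, using the OIDs above:

SELECT conname, conrelid::regclass, confrelid::regclass
FROM pg_constraint
WHERE conrelid = 17542;   -- metro_station_old

SELECT classid::regclass, objid, refclassid::regclass, refobjid
FROM pg_depend
WHERE objid = 17542 OR refobjid = 17542;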

--
SY, Maxim Boguk

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Issue with Inheritance and Partitioning and grants

2009-06-02 Thread Maxim Boguk

I have one large table partitioned via timestamp into monthly parts.
And I use inheritance on it.

So structure look like:
Parent table:
entity_log
...

And lots child tables like:
entity_log_2002_01
...
Check constraints:
chk_entity_log_2002_01 CHECK (ctime >= '2002-01-01 00:00:00'::timestamp without time zone AND ctime < '2002-02-01 00:00:00'::timestamp without
time zone)

Inherits: entity_log

... and so on
entity_log_2009_06
...

Business logic requires that past months' entries be read-only. Doing it on the
application side looks impossible (way too much old spaghetti code).
So after thinking about alternatives I chose to do it via limiting grants on
past months' tables.

So I removed write access from the application user on all old entity_log_* tables
(on a test system).
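
The grants were limited with statements along these lines (a sketch; app_user stands in
for the real application role):

REVOKE INSERT, UPDATE, DELETE ON entity_log_2002_01 FROM app_user;
-- ...repeated for every past month's partition
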
And the testers found an unexpected bug:

12:15:18 MSD b...@billing 67842 ERROR:  permission denied for relation 
entity_log_2002_01
12:15:18 MSD b...@billing 67842 STATEMENT:  UPDATE entity_log SET usermsg = 
'some text' WHERE id = 46248962

In reality the entry with id=46248962 is located in the entity_log_2009_06 table,
where write grants are set (i.e. only the entity_log_2009_06 table needs to be
updated).

I'm not sure whether this is a bug or intentional. It would probably be good to first
check where the entries to update are located and check grants only after that (I'm
not sure whether it is possible to teach PostgreSQL such tricks).


I'm really looking for some working solution to my problem. I cannot solve this
issue by adding an 'on update' trigger on the entity_log table.


--
SY, Maxim Boguk

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Postgresql selecting strange index for simple query

2009-02-26 Thread Maxim Boguk
 
without time zone)
 Total runtime: 1386.074 ms
(4 rows)

Thanks for the help.

--
SY, Maxim Boguk

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql selecting strange index for simple query

2009-02-26 Thread Maxim Boguk

Tom Lane wrote:

Maxim Boguk mbo...@masterhost.ru writes:

Somehow postgres think index scan on singlecolumn index slower comparing to 
scan on 4th field of 4column index.


It does know better than that.  I'm wondering if the single-column index
has become very bloated or something.  Have you compared the physical
index sizes?


The table was freshly loaded from a dump onto a test server... so no index bloat
for sure.
As for comparing physical sizes, the right single-column index is indeed smaller
than the wrong one:

Right index:
hh=# SELECT pg_size_pretty(pg_relation_size('resume_last_change_time_idx'));
 pg_size_pretty

 125 MB
(1 row)

Wrong index:
hh=# SELECT 
pg_size_pretty(pg_relation_size('resume_user_id_disabled_is_finished_last_change_time_idx'));
 pg_size_pretty

 226 MB
(1 row)

Regards, Maxim Boguk

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql selecting strange index for simple query

2009-02-26 Thread Maxim Boguk

Maxim Boguk mbo...@masterhost.ru writes:

Tom Lane wrote:

It does know better than that.  I'm wondering if the single-column index
has become very bloated or something.  Have you compared the physical
index sizes?



Table fresh loaded from dump on test server... So no index bloat for sure...
As for comparing physical sizes, right single column index indeed smaller then 
wrong one:


Huh.  I get sane-looking choices when I try a similar case here.  Can
you put together a self-contained test case?


Not a full self-contained test case, but some minimal setup data which can give
you an idea of what's going wrong:
the test confirms my theory that the wrong index selection is linked to long rows
in the table.

My tests contain the following queries:

Ok TEST1:
set random_page_cost=1;
drop TABLE if exists test_table ;
SELECT user_id,last_change_time,rpad('a',1,'b') as f1 into test_table from 
resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10
00:00:00';
result:
 Index Scan using right_idx on test_table  (cost=0.00..42763.35 rows=388718 
width=0) (actual time=0.020..342.653 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without
time zone)
 Total runtime: 368.699 ms



OK TEST2 (but note: the cost increased 4x while the real work time increased only
by 30%):
same, but with rpad('a',200,'b') instead of rpad('a',1,'b') when populating
test_table:
result:
 Index Scan using right_idx on test_table  (cost=0.00..179346.09 rows=392268 
width=0) (actual time=0.089..422.439 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without
time zone)
 Total runtime: 448.717 ms


!!Not OK, TEST3:!!
same, but with rpad('a',500,'b') instead of rpad('a',1,'b') when populating
test_table:
Oops, the wrong index is used:
 Index Scan using wrong_idx on test_table  (cost=0.00..254918.19 rows=392231 
width=0) (actual time=0.067..730.097 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without 
time zone)
 Total runtime: 757.930 ms
(3 rows)
(btw, if wrong_idx is dropped the query becomes almost 2x faster:
drop INDEX wrong_idx;
 Index Scan using right_idx on test_table  (cost=0.00..259709.09 rows=392231 
width=0) (actual time=0.019..416.108 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without 
time zone)
 Total runtime: 442.790 ms
)


Again Ok, TEST4 (here the TOAST engine moved all the long values out of test_table):
same as TEST1 but with rpad('a',2000,'b') instead of rpad('a',1,'b') when populating 
test_table:
 Index Scan using right_idx on test_table  (cost=0.00..64606.50 rows=393002 
width=0) (actual time=0.058..371.723 rows=390370 loops=1)
   Index Cond: (last_change_time > '2009-01-10 00:00:00'::timestamp without 
time zone)
 Total runtime: 397.929 ms


So I have two theories (just waving hands, of course):
1) integer overflow somewhere in the cost calculation
2) floating-point rounding errors (because the costs are very close in the wrong 
situations: 254918.19 vs 259709.09)

PS: a second issue is that the estimated cost keeps growing with the row length of the 
table, completely independent of the real run time.
(Probably I need to tune the other _cost parameters to get more exact cost values.)
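
For reference, a quick session-level experiment with the cost GUCs the planner combines
into the totals above could look like this (a minimal sketch; the values are illustrative
only, not recommendations):

SHOW random_page_cost;
SHOW cpu_tuple_cost;
SHOW cpu_index_tuple_cost;
SHOW cpu_operator_cost;
-- change one knob for this session only and re-check the estimate
SET random_page_cost = 2;
EXPLAIN SELECT 1 from test_table where last_change_time > '2009-01-10 00:00:00';
RESET random_page_cost;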

PPS: sorry for my poor English

Regards, Maxim Boguk



Re: [GENERAL] Postgresql selecting strange index for simple query

2009-02-26 Thread Maxim Boguk

Tom Lane wrote:

Maxim Boguk mbo...@masterhost.ru writes:

So I have two theories (just waving hands, of course):
1) integer overflow somewhere in the cost calculation


Costs are floats, and in any case you're not showing costs anywhere near
the integer overflow limit...


2) floating-point rounding errors (because the costs are very close in the wrong 
situations: 254918.19 vs 259709.09)


The planner is intentionally set up to consider costs within a percent
or so of each other as being effectively equal.  If the estimated costs
are that close then it doesn't surprise me if it sometimes picks the
wrong plan.  The real question is why are the estimates so close?
They should not be, since AFAICS you are talking about a situation
where we'd have to scan all of the multicol index versus only about
a fifth of the single-col one.


Ok, I explored some more.

Just one thing first:

hh=# SHOW default_statistics_target ;
 default_statistics_target
---
 10
(1 row)
(btw, increasing the statistics target to 1000 does not fix the situation).
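
For reference, the usual ways to raise the sample for just this column or for the whole
session (a minimal sketch; 1000 is simply the value mentioned above):

-- per-column, takes effect at the next ANALYZE
ALTER TABLE test_table ALTER COLUMN last_change_time SET STATISTICS 1000;
ANALYZE test_table;
-- or session-wide before ANALYZE
SET default_statistics_target = 1000;
ANALYZE test_table;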

I tried to simplify the test case:
now user_id is sequential, and last_change_time is truncated to a date:

SELECT nextval('test_seq') as user_id,last_change_time::date ,rpad('a',500,'b') 
as f1 into test_table from resume;
ANALYZE test_table;
SELECT count(*) from test_table;
CREATE INDEX right_idx on test_table(last_change_time);
CREATE INDEX wrong_idx on test_table(user_id, last_change_time);
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 
00:00:00';

 Index Scan using wrong_idx on test_table  (cost=0.00..182623.51 rows=316522 
width=0) (actual time=0.056..534.620 rows=382671 loops=1)
   Index Cond: (last_change_time > '2009-01-10'::date)

DROP INDEX wrong_idx;
EXPLAIN ANALYZE SELECT 1 from test_table where last_change_time > '2009-01-10 
00:00:00';

 Index Scan using right_idx on test_table  (cost=0.00..221765.19 rows=316522 
width=0) (actual time=0.023..346.213 rows=382671 loops=1)
   Index Cond: (last_change_time > '2009-01-10'::date)

A full index scan over the wrong index is costed noticeably lower than scanning 
roughly 1/17 of the single-column index (182623 vs 221765)!

So the last_change_time data still cannot be generated synthetically... but:

hh=# SELECT count(distinct last_change_time) from test_table;
 count
---
  2133
(1 row)

And the statistics for last_change_time are:

hh=# SELECT * from pg_stats where tablename='test_table' and 
attname='last_change_time';
-[ RECORD 1 
]-+---
schemaname| public
tablename | test_table
attname   | last_change_time
null_frac | 0
avg_width | 4
n_distinct| 1211
most_common_vals  | 
{2004-02-27,2009-01-26,2008-03-11,2009-01-27,2004-01-15,2008-10-13,2009-01-19,2009-01-22,2009-01-21,2008-09-29}
most_common_freqs | 
{0.0083,0.0077,0.0073,0.007,0.0067,0.0053,0.0053,0.0053,0.005,0.0047}
histogram_bounds  | 
{2003-03-31,2005-08-26,2006-10-04,2007-04-29,2007-09-27,2008-01-24,2008-05-01,2008-07-29,2008-10-01,2008-11-27,2009-01-25}
correlation   | 0.261512

I think that is all the planner can use when choosing the plan... because user_id 
contains unique sequential values.

Regards, Maxim Boguk









[GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Maxim Boguk

When I read that 8.3 supports indexed queries on NULL values (like rubric_id IS 
NULL) I was really happy.

But reality strikes again... it looks like a NULL in WHERE doesn't allow effective use of
an index on (rubric_id, pos) for queries like:
... WHERE rubric_id IS NULL ORDER BY pos LIMIT 5


Here are some details about my issue (all tests on a freshly loaded and analyzed 
empty 8.3.5 DB):


mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id=8 order by pos limit 5;
  QUERY PLAN
---
 Limit  (cost=1337.02..1337.03 rows=5 width=28) (actual time=27.556..27.575 
rows=5 loops=1)
   ->  Sort  (cost=1337.02..1340.77 rows=1501 width=28) (actual 
time=27.552..27.558 rows=5 loops=1)
 Sort Key: pos
 Sort Method:  top-N heapsort  Memory: 25kB
         ->  Seq Scan on cluster_weight  (cost=0.00..1312.09 rows=1501 
width=28) (actual time=0.058..25.008 rows=1501 loops=1)
   Filter: (rubric_id = 8)
 Total runtime: 27.638 ms
(7 rows)

ok, so we need an index on (rubric_id, pos); let's add it:


mboguk_billing=# CREATE INDEX cluster_weight_2 on cluster_weight(rubric_id, 
pos);
CREATE INDEX


And try again:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id=8 order by pos limit 5;
 QUERY PLAN
-
 Limit  (cost=0.00..1.70 rows=5 width=28) (actual time=0.095..0.122 rows=5 
loops=1)
   ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..509.31 
rows=1501 width=28) (actual time=0.090..0.104 rows=5 loops=1)
 Index Cond: (rubric_id = 8)
 Total runtime: 0.176 ms
(4 rows)

Ok... so now the query works as intended...

Let's check whether the index is used when searching for NULL values:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id IS NULL;
  QUERY PLAN
--
 Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1034.21 
rows=26435 width=28) (actual time=0.053..48.123 rows=26435 loops=1)
   Index Cond: (rubric_id IS NULL)
 Total runtime: 85.210 ms
(3 rows)

Yes it is working...

Now let's try the main query over NULL:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT 5;
QUERY 
PLAN
--
 Limit  (cost=1473.29..1473.30 rows=5 width=28) (actual time=92.220..92.239 
rows=5 loops=1)
   ->  Sort  (cost=1473.29..1539.37 rows=26435 width=28) (actual 
time=92.216..92.223 rows=5 loops=1)
 Sort Key: pos
 Sort Method:  top-N heapsort  Memory: 25kB
         ->  Index Scan using cluster_weight_2 on cluster_weight  
(cost=0.00..1034.21 rows=26435 width=28) (actual time=0.033..47.333 rows=26435 
loops=1)
   Index Cond: (rubric_id IS NULL)
 Total runtime: 92.310 ms
(7 rows)

Ooops... that is a surprise...
I wasn't ready to see that plan here... and the performance difference is over 1000x.

It looks like when indexed searches over NULLs were added, the planner wasn't taught the 
right way to use such an index for "where something IS NULL order by ... limit ..." 
queries.
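
A possible workaround (just a sketch, not tested here; the index name is arbitrary) is a
partial index that keeps only the NULL-rubric rows ordered by pos, so the LIMIT query can
read the first few entries without a sort:

CREATE INDEX cluster_weight_null_pos on cluster_weight(pos) WHERE rubric_id IS NULL;
EXPLAIN ANALYZE SELECT * from cluster_weight where
cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT 5;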



Thanks for any responses and sorry for not so good English.

--
SY, Maxim Boguk



Re: [GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Maxim Boguk

Sorry, with all due respect, it looks like you are wrong here.
Here is an example:

With NULL's:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT 5;
  QUERY PLAN
---
 Limit  (cost=1503.75..1503.76 rows=5 width=28) (actual time=93.334..93.353 
rows=5 loops=1)
   ->  Sort  (cost=1503.75..1569.84 rows=26435 width=28) (actual 
time=93.329..93.335 rows=5 loops=1)
 Sort Key: pos
 Sort Method:  top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on cluster_weight  (cost=314.32..1064.67 
rows=26435 width=28) (actual time=7.519..48.678 rows=26435 loops=1)
   Recheck Cond: (rubric_id IS NULL)
               ->  Bitmap Index Scan on cluster_weight_2  (cost=0.00..307.72 
rows=26435 width=0) (actual time=7.350..7.350 rows=26435 loops=1)
 Index Cond: (rubric_id IS NULL)
 Total runtime: 93.433 ms
(9 rows)

Now let's change the NULLs to -1:
mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435

And ANALYZE
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

And try the same query with -1 instead of NULL:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where 
cluster_weight.rubric_id=-1 ORDER BY pos LIMIT 5;
  QUERY PLAN
---
 Limit  (cost=0.00..0.25 rows=5 width=28) (actual time=0.056..0.080 rows=5 
loops=1)
   ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1334.41 
rows=26435 width=28) (actual time=0.053..0.065 rows=5 loops=1)
 Index Cond: (rubric_id = (-1))
 Total runtime: 0.133 ms
(4 rows)


And the plan becomes normal. So the issue is not too many NULLs in my dataset.


--
SY, Maxim Boguk

Tom Lane wrote:

Maxim Boguk [EMAIL PROTECTED] writes:
It looks like when indexed searches over NULLs were added, the planner wasn't taught the right way to use such an index for "where something IS NULL order by ... limit ..." 
queries.


There's nothing wrong with the plan; you've just got too many NULLs to
make it worth using the index for that.

regards, tom lane




Re: [GENERAL] Indexes on NULL's and order by ... limit N queries

2008-12-01 Thread Maxim Boguk

Alvaro Herrera wrote:

Maxim Boguk wrote:

Sorry, with all due respect, it looks like you are wrong here.


The difference is that the plan with -1 does not need to sort the
output, because it comes sorted out of the index; so the execution can
be stopped as soon as 5 tuples have come out.  With NULL, that can't be
done.


But why? Do NULLs have some special representation in the index that doesn't behave the 
same way as normal values?
I.e., does the output for rubric_id IS NULL not come out of the index sorted?

Actually, my tests show the same behavior for -1 and NULL values:

mboguk_billing=# SELECT pos from cluster_weight where rubric_id=-1 limit 20;
 pos
-
  20
  20
  25
  40
  40
  50
  60
  60
  80
  80
 100
 120
 140
 160
 180
 200
 220
 240
 260
 280
(20 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=NULL where rubric_id=-1;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE
mboguk_billing=# SELECT pos from cluster_weight where rubric_id is NULL limit 
20;
 pos
-
  20
  20
  25
  40
  40
  50
  60
  60
  80
  80
 100
 120
 140
 160
 180
 200
 220
 240
 260
 280
(20 rows)

I.e., the output for rubric_id IS NULL does come out ordered by pos from the index (rubric_id, pos).

(
Here are the explains:

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where rubric_id 
is NULL limit 20;
  QUERY PLAN
---
 Limit  (cost=0.00..0.99 rows=20 width=2) (actual time=0.050..0.144 rows=20 
loops=1)
   ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1314.94 
rows=26435 width=2) (actual time=0.045..0.082 rows=20 loops=1)
 Index Cond: (rubric_id IS NULL)
 Total runtime: 0.214 ms
(4 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where 
rubric_id=-1 limit 20;
  QUERY PLAN
---
 Limit  (cost=0.00..0.95 rows=20 width=2) (actual time=0.050..0.141 rows=20 
loops=1)
   ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1259.05 
rows=26435 width=2) (actual time=0.045..0.081 rows=20 loops=1)
 Index Cond: (rubric_id = (-1))
 Total runtime: 0.214 ms
(4 rows)

The plans look the same.
)

PS: REINDEX does not change the situation.

--
SY Maxim Boguk



Re: [GENERAL] Postgresql optimisator deoptimise queries sometime...

2008-09-05 Thread Maxim Boguk
)
   SubPlan
         ->  Aggregate  (cost=849.50..849.51 rows=1 width=0) (actual 
time=16.072..16.073 rows=1 loops=50)
               ->  Seq Scan on table2  (cost=0.00..847.00 rows=1000 
width=0) (actual time=0.020..14.599 rows=1000 loops=50)
 Filter: (fk = $0)
 Total runtime: 804.552 ms
(9 rows)


The performance difference is 20x... :(((

I think this is just a mis-optimisation on the DB side.

PS: in the real world the query goes through a view:
CREATE VIEW test_view as SELECT table1.id,(select count(*) from table2 where 
table2.fk=table1.id) as total from table1;

and I have no way to put OFFSET 0 into a query like:
select * from test_view where total=990 or total=991 or total=992 or total=993 or total=994 or total=995 or total=996 or total=997 or total=998 or total=999 or total=1000 or 
total=1001 or total=1002 or total=1003 or total=1004 or total=1005 or total=1006 or total=1007 or total=1008 or total=1009 or total=1010;
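
(A possible workaround, just a sketch: since the callers cannot add OFFSET 0 themselves,
the fence could be baked into the view definition instead, assuming that is acceptable for
every user of the view; it mirrors the OFFSET 0 form shown in the plans below:

CREATE OR REPLACE VIEW test_view as
SELECT * from (SELECT table1.id,(select count(*) from table2 where
table2.fk=table1.id) as total from table1 OFFSET 0) as fenced;

Selecting from the view should then get the non-pulled-up plan.)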



--
Maxim Boguk



[GENERAL] Postgresql optimisator deoptimise queries sometime...

2008-09-03 Thread Maxim Boguk
(*) from table2 
where table2.fk=table1.id and random()>0.1) as count from table1) as t1 where 
count>900;
  QUERY PLAN
---
 Seq Scan on table1  (cost=0.00..73557.24 rows=17 width=4) (actual 
time=47.385..1346.824 rows=32 loops=1)
   Filter: ((subplan) > 900)
   SubPlan
     ->  Aggregate  (cost=1097.84..1097.85 rows=1 width=0) (actual 
time=16.333..16.334 rows=1 loops=50)
           ->  Seq Scan on table2  (cost=0.00..1097.00 rows=333 width=0) 
(actual time=0.025..14.995 rows=900 loops=50)
                 Filter: ((fk = $0) AND (random() > 0.1::double precision))
     ->  Aggregate  (cost=1097.84..1097.85 rows=1 width=0) (actual 
time=16.537..16.539 rows=1 loops=32)
           ->  Seq Scan on table2  (cost=0.00..1097.00 rows=333 width=0) 
(actual time=0.028..15.141 rows=934 loops=32)
                 Filter: ((fk = $0) AND (random() > 0.1::double precision))
 Total runtime: 1346.972 ms
(10 rows)
This plan can produce a plainly wrong result (which is a clear bug).

vs. the right plan:

testdb=# EXPLAIN ANALYZE select * from (SELECT table1.id,(select count(*) from table2 
where table2.fk=table1.id and random()>0.1) as count from table1 offset 0) as t1 
where count>900;
QUERY PLAN
---
 Subquery Scan t1  (cost=0.00..54894.38 rows=17 width=12) (actual 
time=31.181..800.898 rows=35 loops=1)
   Filter: (t1.count  900)
   ->  Limit  (cost=0.00..54893.75 rows=50 width=4) (actual 
time=14.992..800.754 rows=50 loops=1)
         ->  Seq Scan on table1  (cost=0.00..54893.75 rows=50 width=4) (actual 
time=14.988..800.602 rows=50 loops=1)
               SubPlan
                 ->  Aggregate  (cost=1097.84..1097.85 rows=1 width=0) (actual 
time=16.003..16.004 rows=1 loops=50)
                       ->  Seq Scan on table2  (cost=0.00..1097.00 rows=333 
width=0) (actual time=0.025..14.725 rows=898 loops=50)
                             Filter: ((fk = $0) AND (random() > 0.1::double 
precision))
 Total runtime: 801.021 ms
(9 rows)


--
Maxim Boguk



[GENERAL] Postgresql 8.3 statistic stadistinct value dead wrong even with default_statistics_target=1000

2008-08-22 Thread Maxim Boguk

Some time ago I found that one simple SQL query over a large table ate all RAM+swap and 
almost killed the server (postgresql 8.3.3 on a 4 GB FreeBSD server).
After some exploring I found out what happened:

Query was over simple table:

profiles=# \d+ counter_vis
  Table counter_vis
Column|  Type   | Modifiers | Description
--+-+---+-
 counter_id   | bigint  | not null  |
 visitor_id   | bigint  | not null  |
 ts   | bigint  | not null  |
 sessionstart | bigint  | not null  |
 sessionend   | bigint  | not null  |
 numpageviews | integer | not null  |
Indexes:
counter_vis_counter btree (counter_id)
counter_vis_vis btree (visitor_id)
Has OIDs: no

It contains around 648M entries (according to freshly analyzed stats from 
pg_stat_user_tables).

The query was: select count(*) from (select visitor_id, sum(numpageviews) as s from 
counter_vis group by visitor_id having sum(numpageviews)>1) as foo;

With plan:
QUERY PLAN
---
 Aggregate  (cost=17429989.40..17429989.41 rows=1 width=0)
   ->  HashAggregate  (cost=17422096.40..17426700.65 rows=263100 width=12)
         Filter: (sum(counter_vis.numpageviews) > 1)
         ->  Seq Scan on counter_vis (cost=0.00..12554826.80 rows=648969280 
width=12)


The plan looks ok... but how can the query eat over 4 GB of RAM?

After looking closer I found one strange point: rows=263100, while I know the table 
must have around 70M unique visitor_id's.
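
(That estimate is also what makes the HashAggregate dangerous here: the hash table is
sized for ~263k groups, but with ~70M real groups each entry (bigint key, the sum state
and per-entry overhead) costs on the order of a few tens of bytes; assuming roughly
60 bytes per entry, 70M * 60 bytes is already around 4 GB, and in 8.3 a HashAggregate
cannot spill to disk once it overruns work_mem.)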

A manual ANALYZE on the table did not change the plan.
Here is the pg_statistic entry after ANALYZE (with default_statistics_target=10):

profiles=# SELECT * from pg_statistic where starelid=25488 and staattnum=2 
order by 2 asc;
-[ RECORD 1 
]---
starelid| 25488
staattnum   | 2
stanullfrac | 0
stawidth| 8
stadistinct | 263100  (<-- in reality there are around 70M distinct values)
stakind1| 1
stakind2| 2
stakind3| 3
stakind4| 0
staop1  | 410
staop2  | 412
staop3  | 412
staop4  | 0
stanumbers1 | 
{0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067}
stanumbers2 |
stanumbers3 | {-0.0443004}
stanumbers4 |
stavalues1  | 
{413866965,489514660,624858316,753063164,790095243,1279713644,1628857812,2104294292,2726728837,2771123172}
stavalues2  | 
{-9035671468843485583,184524075,555699387,921684844,1329929495,1897558820,2602632340,3083805588,3511488708,3963719633,9173061477162286552}
stavalues3  |
stavalues4  |


So stadistinct is about 300 times off from reality.

Already strange... and I need to run that query anyway... so I changed 
default_statistics_target to 1000 and analyzed the table again:

profiles=# ANALYZE verbose counter_vis;
INFO:  analyzing counter_vis
INFO:  counter_vis: scanned 30 of 6065134 pages, containing 3210 live 
rows and 0 dead rows; 30 rows in sample, 648969338 estimated total rows
ANALYZE

After that the statistics were better:

starelid| 25488
staattnum   | 2
stanullfrac | 0
stawidth| 8
stadistinct | 7.12958e+06
stakind1| 1
stakind2| 2
stakind3| 3
stakind4| 0
staop1  | 410
staop2  | 412
staop3  | 412
staop4  | 0
... long stats skipped...

...but stadistinct is still about 10 times off from reality:

profiles=# SELECT count(distinct visitor_id) from counter_vis;
  count
--
 69573318
(1 row)

Is there any way to deal with such situations? With a 10x difference, if postgres chooses 
the hashed plan it will easily kill the server with OOM (because the query will use 10x 
more RAM than postgres expected).

Probably some strange effect in the stadistinct estimation algorithm?
Or is there any way to remove the limit on default_statistics_target?
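
One way I can probably run the query anyway (a sketch only, assuming the sorted
GroupAggregate path, which can spill to disk, is acceptable) is to disable hash
aggregation for this session:

SET enable_hashagg = off;
SELECT count(*) from (select visitor_id, sum(numpageviews) as s from
counter_vis group by visitor_id having sum(numpageviews)>1) as foo;
RESET enable_hashagg;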

Thanks for any help.

PS: sorry for the bad English.

--
Maxim Boguk



[GENERAL] Cannot drop user (PostgreSQL 8.1.11)

2008-06-02 Thread Maxim Boguk

I'm trying to drop an old user but ran into some strange issues:

template1=# drop USER szhuchkov;
ERROR:  role "szhuchkov" cannot be dropped because some objects depend on it
DETAIL:  1 objects in database billing
2 objects in database shop

ok... let's look closer at these two DBs:


shop=# drop USER szhuchkov;
ERROR:  role "szhuchkov" cannot be dropped because some objects depend on it
DETAIL:  owner of type pg_toast.pg_toast_406750
owner of type pg_toast.pg_toast_1770195
1 objects in database billing

hm damn strange...

shop=# SELECT * from pg_user where usename='szhuchkov';
  usename  | usesysid | usecreatedb | usesuper | usecatupd |  passwd  | 
valuntil |   useconfig
---+--+-+--+---+--+--+
 szhuchkov |16387 | f   | f| f |  |  | 
{search_path=bill, billstat}
(1 row)

Let's look at pg_type for these two types:

shop=# SELECT typname,typowner from pg_type where typname IN 
('pg_toast_406750', 'pg_toast_1770195');
 typname  | typowner
--+--
 pg_toast_1770195 |   10
 pg_toast_406750  |   10
(2 rows)

The owner is right (pgsql).



Let's look at the billing DB:

billing=# drop USER szhuchkov;
ERROR:  role "szhuchkov" cannot be dropped because some objects depend on it
DETAIL:  owner of function vz_vds_ip_add(integer,bigint)
2 objects in database shop

billing=# SELECT proname,proowner from pg_proc where proname like 
'%vz_vds_ip_add%';
proname| proowner
---+--
 vz_vds_ip_add |   10
(1 row)

ok... again the right owner... no signs of szhuchkov.

Last test... let's try pg_dumpall -s:


[EMAIL PROTECTED] /home/mboguk]$ pg_dumpall -s | grep szhuchkov
CREATE ROLE szhuchkov;
ALTER ROLE szhuchkov WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB LOGIN 
PASSWORD '***';
ALTER ROLE szhuchkov SET search_path TO bill, billstat;
GRANT bill1c_r TO szhuchkov GRANTED BY pgsql;
GRANT bill_r TO szhuchkov GRANTED BY pgsql;
GRANT billexch_r TO szhuchkov GRANTED BY pgsql;
GRANT billstat_r TO szhuchkov GRANTED BY pgsql;
GRANT shop_r TO szhuchkov GRANTED BY pgsql;
GRANT templar_r TO szhuchkov GRANTED BY pgsql;

Nothing more... so according to pg_dumpall, szhuchkov doesn't have any active 
objects in the databases either.

In all other respects the DB works fine 24x7 without any other issues
(and because of the 24x7 requirement I cannot stop the DB and drop the user from 
single-user mode).

Any ideas? Or what else should I check?
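
One more thing worth checking (a sketch; pg_shdepend is the shared catalog that DROP ROLE
consults, so this should show exactly which rows it is complaining about):

SELECT d.datname, s.classid::regclass as catalog, s.objid, s.deptype
from pg_shdepend s
left join pg_database d on d.oid = s.dbid
where s.refobjid = (SELECT oid from pg_roles where rolname = 'szhuchkov');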

--
Maxim Boguk



[GENERAL] Postgresql 8.3: stats collector process eat all CPU all time

2008-04-09 Thread Maxim Boguk

Details:

db version: postgres (PostgreSQL) 8.3.0
OS version: FreeBSD 7.0-RELEASE

The stats collector has been eating a whole CPU core for the last week or so...

top looks like this:

  PID USERNAME  THR PRI NICE   SIZERES STATE  C   TIME   WCPU COMMAND
69360 pgsql   1 1240 22380K  5548K CPU0   1 258.6H 99.02% postgres

where 69360 is:
USERPID %CPU %MEM   VSZ   RSS  TT  STAT STARTED  TIME COMMAND
pgsql 69360 100,0  0,1 22380  5548  ??  Rs   20Mar08 15513:55,99 postgres: 
stats collector process(postgres)

The server is a dedicated DB server under small load, so the stats collector eats 
5-10x more CPU than all the other DB activity combined.

The stats collector itself is doing its work (e.g. I see all the expected queries in 
pg_stat_activity, and the counters are updated correctly too).

And the stats collector adds a lot of system load to the server.

ktrace/kdump on this process shows just one repeating sequence:

 69360 postgres CALL  poll(0x7fffd4e0,0x1,0x7d0)
 69360 postgres RET   poll -1 errno 4 Interrupted system call
 69360 postgres CALL  poll(0x7fffd4e0,0x1,0x7d0)
 69360 postgres RET   poll -1 errno 4 Interrupted system call
 69360 postgres CALL  poll(0x7fffd4e0,0x1,0x7d0)
 69360 postgres RET   poll -1 errno 4 Interrupted system call
 69360 postgres CALL  poll(0x7fffd4e0,0x1,0x7d0)
 69360 postgres RET   poll -1 errno 4 Interrupted system call
 69360 postgres CALL  poll(0x7fffd4e0,0x1,0x7d0)

The kdump.out for 1 second is over 10 MB in size.


Looks like something wrong is going on.

Any idea what happened and how to fix the situation? (After a postgres restart it 
becomes bad again within a few days.)

--
Maxim Boguk
