Re: [PERFORM] : PostgreSQL Index behavior

2012-09-13 Thread Venkat Balaji
On Wed, Sep 12, 2012 at 7:42 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Wed, Sep 12, 2012 at 12:57 AM, Venkat Balaji venkat.bal...@verse.in
 wrote:

  We are using PostgreSQL-9.0.1.

 You are missing almost 2 years of updates, bug fixes, and security fixes.


Thank you, Scott. We are planning to upgrade to the latest version (9.2) in
the near future.

Regards,
VB




Re: [PERFORM] : PostgreSQL Index behavior

2012-09-12 Thread Venkat Balaji
Thank you Jeff !

My comments are inline.

 explain  SELECT tv.short_code, tv.chn as pkg_subscription_chn,
 tv.vert as pkg_vert, ubs.campaign_id as campaign,
  'none'::varchar as referer,
 CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,
  count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
 FROM campaign_base ubs
 JOIN tab_current_day_v2 tv
 ON ubs.ubs_seq_id = tv.ubs_seq_id
   AND tv.dt = CAST('2012-09-08' AS DATE)
   GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id,
 vn;
 ...
 
  The above plan shows a seq scan on the tab_current_day_v2 table, though
  there is an index on the ubs_seq_id column, which is a unique column.
 
  Can anyone please help us understand why the PostgreSQL optimizer is not
  prioritizing the unique column and using the ubs_seq_id_idx index here?

 The query where clause does not specify a constant value for
 ubs_seq_id.  So it is likely that the only way to use that index would
 be to reverse the order of the nested loop and seq scan the other
 table.  Is there any reason to think that doing that would be faster?


I believe I missed an important point here. Yes, since a constant value is not
provided for ubs_seq_id, an index scan is not the preferred choice. Makes
sense. Further analysis is explained below.


  Later -
 
  We have created composite Index on dt (one distinct value) and
  ubs_seq_id (no duplicate values) and the index has been picked up.

 Postgres seems to think that dt has no duplicate values, the
 opposite of having one distinct value.
 That is based on the estimates given in the explain plan, that the seq
 scan will return only one row after the filter on Filter: (dt =
 '2012-09-08'::date).   This does seem to conflict with what you
 report from pg_stats, but I'm not familiar with that view, and you
 haven't told us what version of pgsql you are using.


We are using PostgreSQL-9.0.1.

Yes, dt always has one distinct value, as the table is generated on a daily
basis. '2012-09-08' is a non-existent value, so Postgres seems to think there
are no duplicates.

If I pass a value which does exist in the table, '2012-09-11', the PostgreSQL
optimizer picks a Seq Scan (whatever index exists).

In our scenario, we cannot expect an index scan to happen because, I believe,
of the following reasons -

The ubs_seq_id column in the campaign_base table has 1.2 M rows -- all distinct.
The ubs_seq_id column in the tab_current_day_v2 table has 1.9 M rows -- all distinct.
dt has only 1 distinct value.

With all of these combined using the AND operator, the extracted rows will be
at least 1.2 M. So, I believe, the seq scan is the best choice and PG is
opting for it.
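
A quick way to sanity-check this kind of decision is to disable seq scans for
one session and compare the actual run times -- a rough sketch below, using a
simplified form of the query above:

set enable_seqscan = off;
explain analyze
select tv.short_code, tv.chn, tv.vert, ubs.campaign_id, count(tv.msisdn)
from campaign_base ubs
join tab_current_day_v2 tv
  on ubs.ubs_seq_id = tv.ubs_seq_id
 and tv.dt = date '2012-09-11'
group by 1, 2, 3, 4;
reset enable_seqscan;

If the forced index plan is not actually faster, then the seq scan really is
the right choice.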

I got the point. Thanks !

Regards,
Venkat




[PERFORM] : PostgreSQL Index behavior

2012-09-10 Thread Venkat Balaji
Hello Community,

I would like to understand further how PostgreSQL uses indexes for a SELECT
statement.

We have a situation where an index on a unique column is not being picked up
as expected when that column is used within the WHERE clause along with other
non-unique columns using the AND operator.

explain  SELECT tv.short_code, tv.chn as pkg_subscription_chn,
   tv.vert as pkg_vert, ubs.campaign_id as campaign,
'none'::varchar as referer,
   CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,
count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
   FROM campaign_base ubs
   JOIN tab_current_day_v2 tv
   ON ubs.ubs_seq_id = tv.ubs_seq_id
 AND tv.dt = CAST('2012-09-08' AS DATE)
 GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;

 QUERY PLAN

 HashAggregate  (cost=77754.57..77754.58 rows=1 width=38)
   ->  Nested Loop  (cost=0.00..77754.56 rows=1 width=38)
         ->  Seq Scan on tab_current_day_v2 tv  (cost=0.00..77746.26 rows=1 width=39)
               Filter: (dt = '2012-09-08'::date)
         ->  Index Scan using cb_ubs_id_idx on campaign_base ubs  (cost=0.00..8.28 rows=1 width=15)
               Index Cond: (ubs.ubs_seq_id = tv.ubs_seq_id)
(6 rows)

The above plan shows a seq scan on the tab_current_day_v2 table, though there
is an index on the ubs_seq_id column, which is a unique column.

Can anyone please help us understand why the PostgreSQL optimizer is not
prioritizing the unique column and using the ubs_seq_id_idx index here?

Later -

We have created a composite index on dt (one distinct value) and
ubs_seq_id (no duplicate values), and the index has been picked up.

Below is the scenario in which the same query's plan picks up the
composite index.

prod-db=# create index concurrently tab_dt_ubs_seq_id_idx on
tab_current_day_v2(dt,ubs_seq_id);
CREATE INDEX

prod-db=# explain  SELECT tv.short_code, tv.chn as pkg_subscription_chn,
   tv.vert as pkg_vert, ubs.campaign_id as campaign,
'none'::varchar as referer,
   CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,
count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
FROM campaign_base ubs
JOIN tab_current_day_v2 tv
  ON ubs.ubs_seq_id = tv.ubs_seq_id
 AND tv.dt = CAST('2012-09-08' AS DATE)
 GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;

 QUERY PLAN
-
 HashAggregate  (cost=16.88..16.89 rows=1 width=38)
   ->  Nested Loop  (cost=0.00..16.86 rows=1 width=38)
         ->  Index Scan using tab_dt_ubs_seq_id_idx on tab_current_day_v2 tv  (cost=0.00..8.57 rows=1 width=39)
               Index Cond: (dt = '2012-09-08'::date)
         ->  Index Scan using cb_ubs_id_idx on campaign_base ubs  (cost=0.00..8.28 rows=1 width=15)
               Index Cond: (ubs.ubs_seq_id = tv.ubs_seq_id)
(6 rows)

I was expecting the above behavior without a composite index. Shouldn't the
column with the most unique values be picked when multiple columns are used in
the WHERE clause with the AND operator? Any thoughts?

prod-db# \d tab_current_day_v2

 Table public.tab_current_day_v2
  Column  |   Type   | Modifiers
------------------+--------------------------+-----------
 dt   | date |
 chn  | character varying(10)|
 vert | character varying(20)|
 isdn | character varying|
 bc  | character varying(40)|
 status   | text |
 is_rene  | boolean  |
 age_in_sys   | integer  |
 age_in_grace | integer  |
 has_prof | boolean  |
 short_code   | character varying|
 sub_vert | character varying(30)|
 mode | character varying|
 ubs_seq_id   | bigint   |
 pkg_name | character varying(200)   |
 pkg_id   | integer  |
 subs_charge  | money|
 subs_time| timestamp with time zone |
 ulq_seq_id   | bigint   |
 valid_till_time  | timestamp with time zone |
 valid_from_time  | timestamp with time zone |
 latest_ube_seq_id| bigint   |
 latest_pkg_id| integer  |
 price| integer  |
Indexes:
tab_dt_ubs_seq_id_idx btree (dt, ubs_seq_id)
tab_isdn_idx btree (msisdn)
tab_status_idx btree 

Re: [PERFORM] : Cost calculation for EXPLAIN output

2012-02-26 Thread Venkat Balaji

  The cost is 13.88 to fetch 1 row by scanning a Primary-Key-indexed
  column.
 
  Isn't the cost for fetching 1 row too high ?

 I don't know, how many index pages will need to be randomly accessed
 in addition to the random heap access?  How many dead versions of
 the row will need to be visited besides the row which is actually
 visible?  How many of these pages are in shared_buffers?  How many
 of these pages are in OS cache?


The total number of index pages is 140310. Yes, I suspect that most of the time
the required page is found in either the OS cache or on disk (shared_buffers is
0.9 GB), as we have a 200+ GB highly active database and the index is on a 10 GB
table.


Re: [PERFORM] : Cost calculation for EXPLAIN output

2012-02-26 Thread Venkat Balaji
Thanks for your valuable inputs !

The cost is 13.88 to fetch 1 row by scanning a Primary-Key-indexed
 column.

 Isn't the cost for fetching 1 row too high ?


 Not really. The cost is really just an estimate to rank alternate query
 plans so the database picks the least expensive plan. The number '13.88' is
 basically meaningless. It doesn't translate to any real-world equivalent.
 What you actually care about is the execution time. If it takes 0.25ms or
 something per row, that's what really matters.

 For what it's worth, it looks like you have the right query plan, there.
 Scan the primary key for one row. What's wrong with that? Our systems have
 tables far larger than yours, handling 300M queries per day that are far
 more expensive than a simple primary key index scan. You'll be fine. :)


The execution time is 0.025 ms per row. I am quite happy with the execution
time, and the plan is going the correct way. The total execution time
reduced from 2.5 hrs to 5.5 seconds. I am looking for room for further
improvement -- probably quite ambitious :-) ..

But when I reduced random_page_cost to 2, the cost reduced to 7.03 and the
execution time also dropped by almost 50%. Is this a gain ?
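
For reference, such a setting can be tried per session before changing it
globally -- a minimal sketch, reusing the query from my earlier mail:

set random_page_cost = 2;
explain analyze select * from table_event where seq_id = 8520960;
reset random_page_cost;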

Please comment !

Thanks,
VB




[PERFORM] : Cost calculation for EXPLAIN output

2012-02-23 Thread Venkat Balaji
Hello,

I am trying to understand the analysis behind the cost attribute in
EXPLAIN output.

postgres=# explain select * from table_event where seq_id=8520960;


QUERY PLAN
-
 Index Scan using te_pk on table_event  (cost=0.00..13.88  rows=1  width=62)
   Index Cond: (sequence_id = 8520960)

The cost is 13.88 to fetch 1 row by scanning a Primary-Key-indexed
column.

Isn't the cost for fetching 1 row too high ?

On the same table, the cost calculation for scanning the full table looks
justified --

postgres=# explain select * from table_event;

  QUERY PLAN

 Seq Scan on table_event  (cost=0.00..853043.44 rows=38679544 width=62)
(1 row)

(disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost)
  = (466248 * 1) + (38679544 * 0.01) = 853043.44

By the way, below are the details -

Version - Postgres 9.0

Table size - 3643 MB
Size including indexes - 8898 MB

I am looking for a way to reduce the cost as much as possible because the query
executes 10+ times a day.

Any thoughts ?

Thanks,
VB


Re: [PERFORM] : bg_writer overloaded ?

2011-11-13 Thread Venkat Balaji
On Wed, Nov 9, 2011 at 8:16 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Wed, Nov 9, 2011 at 2:25 AM, Venkat Balaji venkat.bal...@verse.in
 wrote:
  Hello Everyone,
  I could see the following in the production server (result of the top M
  command) -
   PIDUSER PR  NI  VIRTRES   SHR   S  %CPU   %MEM   TIME+
  COMMAND
  25265 postgres  15   0  3329m   2.5g   1.9g   S 0.0  4.0
   542:47.83   postgres: writer process
  The writer process refers to bg_writer ? and we have shared_buffers
 set to
  1920 MB (around 1.9 GB).

 So it is using 2.5G of mem of which 1.9G is shared memory (i.e. shared
 buffers) so the actual amount of RAM it's using is ~600Megs.

 I see no problem.


Is this not an indication that shared_buffers is undersized ?



  In an other similar situation, we have postgres writer process using
 up 7
  - 8 GB memory constantly.

 I doubt it.  Sounds more like you're misreading the output of top.


Below is the output taken directly from our production box. As per your
explanation, the writer process is taking 1.9 GB from shared_buffers and the
remaining (around 5.6 GB) from RAM.

Mem:  65980808k total, 65620700k used,   360108k free,   210792k buffers
Swap:  1052248k total,   321144k used,   731104k free, 51721468k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10306 postgres  15   0 15.7g 7.5g 1.9g S  1.9 12.0   1037:05 postgres: writer process

Is this not a problem ?


  pg_tune is suggesting to increase the shared_buffers to 8 GB.

 Reasonable.

  If the shared_buffer is not enough, Postgres uses OS cache ?

 Not really how things work.  The OS uses all spare memory as cache.
 PostgreSQL uses shared_buffers as a cache.  The OS is much more
 efficient about caching in dozens of gigabytes than pgsql is.


What if the shared_buffers is not enough to cache the data being read from
the database ?
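
One rough way to gauge this (a sketch, not a definitive measure) is the buffer
hit ratio from pg_stat_database:

select datname, blks_hit, blks_read,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) as hit_pct
from pg_stat_database
order by blks_read desc;

A low hit percentage means many reads are going past shared_buffers to the OS
cache or disk.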

Thanks for your help !

Regards,
VB


[PERFORM] : bg_writer overloaded ?

2011-11-09 Thread Venkat Balaji
Hello Everyone,

I could see the following in the production server (result of the top M
command) -

  PID USER      PR  NI  VIRT   RES   SHR  S %CPU %MEM    TIME+  COMMAND
25265 postgres  15   0  3329m  2.5g  1.9g  S  0.0  4.0  542:47.83  postgres: writer process

Does the writer process refer to the bgwriter? We have shared_buffers set
to 1920 MB (around 1.9 GB).

In another similar situation, we have the postgres writer process using up 7
- 8 GB of memory constantly.

pg_tune is suggesting to increase the shared_buffers to 8 GB.

If shared_buffers is not enough, does Postgres use the OS cache ?

We have 64 GB of RAM.

We have decided the following -

1. We have 20 databases running in one cluster and all are more or less
highly active databases.
2. We will split the databases across multiple clusters to
have multiple writer processes working across databases.

Please help us if you have any other solutions around this.

Thanks
VB


Re: [PERFORM] : PG9.0 - Checkpoint tuning and pg_stat_bgwriter

2011-10-24 Thread Venkat Balaji
Thanks Greg !

Sorry for delayed response.

We are actually waiting to change the checkpoint_segments in our production
systems (waiting for the downtime).

Thanks
VB

On Wed, Oct 5, 2011 at 11:02 AM, Greg Smith g...@2ndquadrant.com wrote:

 On 10/04/2011 07:50 PM, Venkat Balaji wrote:

 I was thinking of increasing checkpoint_segments to around 16 or 20.

 I think 50 is a bit high.


 Don't be afraid to increase that a lot.  You could set it to 1000 and that
 would be probably turn out fine; checkpoints will still happen every 5
 minutes.

 Checkpoints represent a lot of the I/O in a PostgreSQL database.  The main
 downside to making them less frequent is that recovery after a crash will
 take longer; a secondary one is that WAL files in pg_xlog will take up more
 space.  Most places don't care much about either of those things.  The
 advantage to making them happen less often is that you get less total
 writes.  People need to be careful about going a long *time* between
 checkpoints.  But there's very few cases where you need to worry about the
 segment count going too high before another one is triggered.


 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [PERFORM] : PG9.0 - Checkpoint tuning and pg_stat_bgwriter

2011-10-24 Thread Venkat Balaji
Oh yes.

Thanks a lot Robert !

Regards
VB

On Tue, Oct 25, 2011 at 7:47 AM, Robert Haas robertmh...@gmail.com wrote:

 On Oct 24, 2011, at 8:16 AM, Venkat Balaji venkat.bal...@verse.in wrote:
  Thanks Greg !
 
  Sorry for delayed response.
 
  We are actually waiting to change the checkpoint_segments in our
 production systems (waiting for the downtime).

 That setting can be changed without downtime.

 ...Robert
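
For reference, checkpoint_segments only needs a configuration reload, not a
restart -- a minimal sketch:

-- after editing checkpoint_segments in postgresql.conf
select pg_reload_conf();
show checkpoint_segments;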


Re: [PERFORM] : Performance Improvement Strategy

2011-10-05 Thread Venkat Balaji
Hello,

I was attempting to calculate the actual space occupied by a table.

Below is what i did -

I summed up the avg_width of each column of a table from pg_stats, which
gives me the average size of a row (277 bytes).

select sum(avg_width) as average_row_size from pg_stats where
tablename='tablename';

 average_row_size
------------------
              277
(1 row)

Calculated the actual space occupied by rows in the table as below -

Took the average_row_size * number_of_rows from pg_class:

select 277*reltuples/1024 as occupied_space from pg_class where
relname='tablename'; == 552 KB

 occupied_space
----------------
 552.6474609375

Calculated the actual Table size (600 kb)

select pg_size_pretty(pg_relation_size('tablename'));


pg_size_pretty

 600 KB

(1 row)

Calculated the free space within the table (by scanning the pages - as
suggested by Shaun Thomas) -- 14 KB

SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('tablename');

 mb_free
-
 14 KB

(1 row)

600 KB is the size of the table (taken through pg_size_pretty).
14 KB is the free space (taken through the contrib module).
600 - 14 = 586 KB -- is the occupied space by normal calculation through the
contrib module. This is based on the number of pages allocated to the table.
552 KB is the actual size occupied by the rows (taken by calculating the avg
row size). This is based on the number of rows within the pages.
586 - 552 = 34 KB -- is still free somewhere within the occupied pages
(calculated through pg_stats and pg_class).
That 34 KB is free space within the pages (each 8 KB), which is basically
counted as occupied space.
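
Putting the same numbers into a single query makes the comparison repeatable --
a sketch, with 'tablename' as a placeholder:

select pg_relation_size('tablename') as table_bytes,
       (select free_space from pgstattuple('tablename')) as free_bytes,
       (select sum(avg_width)::bigint from pg_stats
         where tablename = 'tablename')
       * (select reltuples::bigint from pg_class
           where relname = 'tablename') as row_data_bytes;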

This is a similar concept to one I successfully applied in another RDBMS
technology to calculate space-usage metrics on production.
This is all calculated after the VACUUM and ANALYZE jobs have been executed.

Please comment !

Sorry if this is too confusing and too long.

Thanks
VB

On Wed, Sep 21, 2011 at 6:33 PM, Shaun Thomas stho...@peak6.com wrote:

 On 09/20/2011 11:22 AM, Venkat Balaji wrote:

  Please help me understand how to calculate free space in Tables and
 Indexes even after vacuuming and analyzing is performed.


 Besides the query Mark gave you using freespacemap, there's also the
 pgstattuple contrib module. You'd use it like this:

 SELECT pg_size_pretty(free_space) AS mb_free
  FROM pgstattuple('some_table');

 Query must be run as a super-user, and I wouldn't recommend running it on
 huge tables, since it scans the actual data files to get its information.
 There's a lot of other useful information in that function, such as the
 number of dead rows.


  What i understand is that, even if we perform VACUUM ANALYZE
 regularly, the free space generated is not filled up.


 VACUUM does not actually generate free space. It locates and marks reusable
 tuples. Any future updates or inserts on that table will be put in those
 newly reclaimed spots, instead of being bolted onto the end of the table.


  I see lot of free spaces or free pages in Tables and Indexes. But, I
 need to give an exact calculation on how much space will be reclaimed
 after VACUUM FULL and RE-INDEXING.


 Why? If your database is so desperate for space, VACUUM and REINDEX won't
 really help you. A properly maintained database will still have a certain
 amount of bloat equal to the number of rows that change between
 maintenance intervals. One way or another, that space is going to be used by
 *something*.

 It sounds more like you need to tweak your autovacuum settings to be more
 aggressive if you're seeing significant enough turnover that your tables are
 bloating significantly. One of our tables, for instance, gets vacuumed more
 than once per hour because it experiences 1,000% turnover daily.

 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
 312-676-8870
 stho...@peak6.com




[PERFORM] : Column Performance in a query

2011-10-04 Thread Venkat Balaji
Hello Everyone,

Generally when it comes to query performance, I check how the vacuuming and
statistics collection is performed on Tables and Indexes hit by the query.

Apart from the above, I check the code logic (for any bad joins) and column
statistics as well.

I got hold of two catalog relations, pg_stats and pg_class.

The avg_width and n_distinct columns in pg_stats give me a lot of useful
information about column values and column sizes.

Can someone help me know when the values in these columns are bound to
change? Is it only when ANALYZE runs?

I am about to calculate a lot of metrics depending on the above values. Please
help !
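
For reference, these are the columns in question -- a sketch, with the table
name as a placeholder:

select attname, avg_width, n_distinct, null_frac
from pg_stats
where schemaname = 'public'
  and tablename = 'some_table';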

Thanks
Venkat


Re: [PERFORM] : PG9.0 - Checkpoint tuning and pg_stat_bgwriter

2011-10-04 Thread Venkat Balaji
Hello,

Sorry. I should have put some more details in the email.

I have got a situation where I see the production system loaded with
checkpoints, and at least 1000+ buffers are being written for every
checkpoint.

A checkpoint occurs every 3 to 4 minutes, and every checkpoint takes a minimum
of 150 seconds to write out the buffers and 150+ seconds for the checkpoint
sync. Warning messages can be seen in the db server logs: checkpoints are
occurring too frequently.

I had a look at pg_stat_bgwriter as well. Below is what I see.

 select * from pg_stat_bgwriter;

 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
              9785 |           36649 |          493002109 |     282600872 |          1276056 |       382124461 |    7417638175
(1 row)

I am thinking of increasing the checkpoint_segments.

Below are our current settings -

checkpoint_segments = 8
checkpoint_timeout = 5 mins
checkpoint_completion_target = 0.5
bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2

Looking forward to your suggestions.

Thanks
VB




On Thu, Sep 29, 2011 at 12:40 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 Hello Everyone,

 We are experience a huge drop in performance for one of our production
 servers.

 I suspect this is because of high IO due to frequent Checkpoints. Attached
 is the excel sheet with checkpoint information we tracked.

 Below is the configuration we have

 checkpoint_segments = default
 checkpoint_timeout = default

 I suspect archive data generation to be around 250 MB.

 Please share your thoughts !

 Thanks
 VB






Re: [PERFORM] : PG9.0 - Checkpoint tuning and pg_stat_bgwriter

2011-10-04 Thread Venkat Balaji
Thanks Heikki !

Regards,
VB

On Tue, Oct 4, 2011 at 4:38 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 04.10.2011 13:50, Venkat Balaji wrote:

 I have got a situation where in i see the production system is loaded with
 the checkpoints and at-least 1000+ buffers are being written for every
 checkpoint.


 1000 buffers isn't very much, that's only 8 MB, so that's not alarming
 itself.


  I am thinking of increasing the checkpoint_segments.

 Below are our current settings -

 checkpoint_segments = 8
 checkpoint_timeout = 5 mins
 checkpoint_completion_target = 0.5
 bgwriter_delay = 200ms
 bgwriter_lru_maxpages = 100
 bgwriter_lru_multiplier = 2

 Looking forward for suggestions.


 Yep, increase checkpoint_segments. And you probably want to raise
 checkpoint_timeout too.

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] : PG9.0 - Checkpoint tuning and pg_stat_bgwriter

2011-10-04 Thread Venkat Balaji
I was thinking of increasing checkpoint_segments to around 16 or 20.

I think 50 is a bit high.

Greg,

Sure. I will collect the info from pg_stat_bgwriter at regular intervals.

As we have a lot of transactions going on, I am thinking of collecting the info
every 6 or 8 hrs.
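
A rough sketch of how that collection could be done (the snapshot table name is
just an assumption):

create table bgwriter_snapshots as
  select now() as snap_time, * from pg_stat_bgwriter limit 0;

-- run on a schedule, e.g. every 6 or 8 hrs
insert into bgwriter_snapshots
  select now(), * from pg_stat_bgwriter;

The deltas between two snapshots then give the rates you mentioned.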

Thanks
VB

On Wed, Oct 5, 2011 at 4:02 AM, Greg Smith g...@2ndquadrant.com wrote:

 On 10/04/2011 03:50 AM, Venkat Balaji wrote:

 I had a look at the pg_stat_bgwriter as well.


 Try saving it like this instead:

 select now(),* from pg_stat_bgwriter;

 And collect two data points, space a day or more apart.  That gives a lot
 more information about the rate at which things are actually happening.  The
 long-term totals are less interesting than that.

 Generally the first round of tuning work here is to increase
 checkpoint_segments until most checkpoints appear in checkpoints_timed
 rather than checkpoints_req.  After that, increasing checkpoint_timeout
 might also be useful.

 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us






Re: [PERFORM] : Performance Improvement Strategy

2011-10-03 Thread Venkat Balaji
Hello,

Thanks for your suggestions !

We CLUSTERED a table using its most heavily used index. The application is
performing better now.
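
A sketch of the pattern, with placeholder names: find the most heavily used
index from pg_stat_user_indexes, then CLUSTER on it.

select indexrelname, idx_scan
from pg_stat_user_indexes
where relname = 'some_table'
order by idx_scan desc;

-- takes an ACCESS EXCLUSIVE lock while it runs
cluster some_table using some_index;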

Thanks
VB

On Tue, Sep 27, 2011 at 6:01 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 Forgot to mention -

 Kevin,

 CLUSTER seems to be an very interesting concept to me.

 I am thinking to test the CLUSTER TABLE on our production according to the
 Index usage on the table.

 Will let you know once i get the results.

 Regards,
 VB

  On Tue, Sep 27, 2011 at 5:59 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 We had performed VACUUM FULL on our production and performance has
 improved a lot !

 I started using pg_stattuple and pg_freespacemap for tracking freespace in
 the tables and Indexes and is helping us a lot.

 Thanks for all your inputs and help !

 Regards,
 VB


 On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner 
 kevin.gritt...@wicourts.gov wrote:

 Venkat Balaji venkat.bal...@verse.in wrote:

  If i got it correct, CLUSTER would do the same what VACUUM FULL
  does (except being fast)

 CLUSTER copies the table (in the sequence of the specified index) to
 a new set of files, builds fresh indexes, and then replaces the
 original set of files with the new ones.  So you do need room on
 disk for a second copy of the table, but it tends to be much faster
  than VACUUM FULL in PostgreSQL versions before 9.0.  (Starting in
 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
 the table data rather than using an index.)  REINDEX is not needed
 when using CLUSTER or 9.x VACUUM FULL.  Older versions of VACUUM
 FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
 generally a good idea.

 When choosing an index for CLUSTER, pick one on which you often
 search for a *range* of rows, if possible.  Like a name column if
 you do a lot of name searches.

 -Kevin






Re: [PERFORM] : Performance Improvement Strategy

2011-10-03 Thread Venkat Balaji
Thanks a lot Kevin !

This email has deepened my understanding of the clustering concept.

Keeping this in mind, I have recommended a new disk layout at the OS level
for our production servers so that IO will be balanced across the disks as
well.

Currently, we do not have mount points divided according to the type of IO.

I will share my recommended plan in a different email thread.

Thanks again for this detailed explanation.

Regards,
VB

On Mon, Oct 3, 2011 at 9:45 PM, Kevin Grittner
kevin.gritt...@wicourts.govwrote:

 Venkat Balaji venkat.bal...@verse.in wrote:

  We CLUSTERED a table using mostly used Index. Application is
  performing better now.

 CLUSTER can help in at least four ways:

 (1)  It eliminates bloat in the table heap.

 (2)  It eliminates bloat in the indexes.

 (3)  It can correct fragmentation in the underlying disk files.

 (4)  It can put tuples which are accessed by the same query into
 adjacent locations on disk, reducing physical disk access.

 An aggressive autovacuum configuration can generally prevent the
 first two from coming back to haunt you, and the third may not be a
 big problem (depending on your OS and file system), but that last
 one is a benefit which will degrade over time in most use cases --
 the order in the heap is set by the cluster, but not maintained
 after that.  If this ordering is a significant part of the
 performance improvement you're seeing, you may want to schedule some
 regular CLUSTER run.  It's hard to say what frequency would make
 sense, but if performance gradually deteriorates and a CLUSTER fixes
 it, you'll get a sense of how often it pays to do it.

 -Kevin



Re: [PERFORM] PostgreSQL-9.0 Monitoring System to improve performance

2011-09-30 Thread Venkat Balaji
Thanks Greg !

Sorry, I should have put it the other way.

Actually, I am looking for a tool (if one exists) which gets me the following
information with a single installation or so.

Please see my replies below.

Thanks
VB

On Wed, Sep 28, 2011 at 12:35 PM, Greg Smith g...@2ndquadrant.com wrote:

 Venkat Balaji wrote:


 1. Big Full Table Scans
 2. Table with high IOs (hot tables)
 3. Highly used Indexes
 4. Tables undergoing high DMLs with index scans 0 (with unused indexes)
 5. Index usage for heap blk hits
 6. Tracking Checkpoints


 This is fairly easy to collect and analyze.  You might take a look at
 pgstatspack to see how one program collects snapshots of this sort of
 information:  
  http://pgfoundry.org/projects/pgstatspack/

  I am in the process of installing pgstatspack (I have used it before).
 We are waiting for the downtime (to load this through shared preload
 libraries).




 8. Buffer cache usage


 High-level information about this can be collected by things like the
 pg_statio* views.  If you want to actually look inside the buffer cache and
 get detailed statistics on it, that's a harder problem.  I have some sample
 queries for that sort of thing in my book.

  I do have the pgstattuple contrib module installed, and it is collecting the
  data and loading it into the auditing tables.




  9. Tables, Indexes and Database growth statistics


 This is valuable information to monitor over time, but I'm not aware of any
 existing tools that track it well.  It won't be hard to collect it on your
 own though.

  We are getting this done on a daily basis, and we also have metrics of data
  growth.

  7. Tracking CPU, IO and memory usage ( by PG processes ) -- desperately
 needed


 I'm not aware of any open-source tool that tracks this information yet.
  PostgreSQL has no idea what CPU, memory, and I/O is being done by the OS
 when you execute a query.  The operating system knows some of that, but has
 no idea what the database is doing.  You can see a real-time snapshot
 combining the two pieces of info using the pg_top program:
  http://ptop.projects.postgresql.org/ but I suspect what you want is a
  historical record of it instead.

 Writing something that tracks both at once and logs all the information for
 later analysis is one of the big missing pieces in PostgreSQL management.  I
 have some ideas for how to build such a thing.  But I expect it will take a
 few months of development time to get right, and I haven't come across
 someone yet who wants to fund that size of project for this purpose yet.

As of now I am relying on mpstat and will be testing the NMON analyzer (this
gets me the graphs).

 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [PERFORM] PostgreSQL-9.0 Monitoring System to improve performance

2011-09-30 Thread Venkat Balaji
Hi Tomas,

I will let you know about check_postgres.pl.

We will explore pgmonitor as well.

The other tool we are working on is pgwatch; we found this very useful.

Thanks
VB

On Wed, Sep 28, 2011 at 5:44 PM, Tomas Vondra t...@fuzzy.cz wrote:

  On 28 September 2011, 9:05, Greg Smith wrote:
  Venkat Balaji wrote:
 
  1. Big Full Table Scans
  2. Table with high IOs (hot tables)
  3. Highly used Indexes
  4. Tables undergoing high DMLs with index scans 0 (with unused indexes)
  5. Index usage for heap blk hits
  6. Tracking Checkpoints
 
  This is fairly easy to collect and analyze.  You might take a look at
  pgstatspack to see how one program collects snapshots of this sort of
  information:  http://pgfoundry.org/projects/pgstatspack/

 It's definitely fairly easy to collect, and pgstatspack help a lot. But
 interpreting the collected data is much harder, especially when it comes
 to indexes. For example UNIQUE indexes often have idx_scan=0, because
 checking the uniqueness is not an index scan. Other indexes may be created
 for rare queries (e.g. a batch running once a year), so you need a very
 long interval between the snapshots.

  8. Buffer cache usage
 
  High-level information about this can be collected by things like the
  pg_statio* views.  If you want to actually look inside the buffer cache
  and get detailed statistics on it, that's a harder problem.  I have some
  sample queries for that sort of thing in my book.

 There's an extension pg_buffercache for that (the queries are using it
 IIRC).

  9. Tables, Indexes and Database growth statistics
 
  This is valuable information to monitor over time, but I'm not aware of
  any existing tools that track it well.  It won't be hard to collect it
  on your own though.

 What about check_postgres.pl script?

  7. Tracking CPU, IO and memory usage ( by PG processes ) --
  desperately needed

 What about using check_postgres.pl and other plugins? Never used that
 though, so maybe there are issues I'm not aware of.

  I'm not aware of any open-source tool that tracks this information yet.
  PostgreSQL has no idea what CPU, memory, and I/O is being done by the OS
  when you execute a query.  The operating system knows some of that, but
  has no idea what the database is doing.  You can see a real-time
  snapshot combining the two pieces of info using the pg_top program:
  http://ptop.projects.postgresql.org/ but I suspect what you want is a
  historical record of it instead.
 
  Writing something that tracks both at once and logs all the information
  for later analysis is one of the big missing pieces in PostgreSQL
  management.  I have some ideas for how to build such a thing.  But I
  expect it will take a few months of development time to get right, and I
  haven't come across someone yet who wants to fund that size of project
  for this purpose yet.

 A long (long long long) time ago I wrote something like this, it's called
 pgmonitor and is available here:

  http://sourceforge.net/apps/trac/pgmonitor/

 But the development stalled (not a rare thing for projects developed by a
 single person) and I'm not quite sure about the right direction. Maybe
 it's worthless, maybe it would be a good starting point - feel free to
 comment.

 Tomas




Re: [PERFORM] : Create table taking time

2011-09-30 Thread Venkat Balaji
CPU load was hitting 100% constantly with high IOs.

We tuned some queries to decrease the CPU usage and everything is normal
now.

Thanks
VB

On Fri, Sep 30, 2011 at 10:52 AM, Venkat Balaji venkat.bal...@verse.in wrote:

 I did not calculate the IO behavior of the server.

  What I noticed from the logs is that the checkpoints are occurring too
  frequently; each checkpoint is taking a minimum of 80 - 200+ seconds to
  complete the write, and the checkpoint sync is taking 80 - 200+ seconds,
  which I believe is IO intensive.

 Thanks
 VB



  On Thu, Sep 29, 2011 at 10:22 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Sep 28, 2011 at 12:06 PM, Venkat Balaji venkat.bal...@verse.in
 wrote:
  Hello Everyone,
  I am back with an issue (likely).
  I am trying to create a table in our production database, and is taking
 5
  seconds.
  We have executed VACUUM FULL and yet to run ANALYZE. Can i expect the
 CREATE
  TABLE to be faster after ANALYZE finishes ?
  Or is there anything serious ?

 just ruling out something obvious -- this is vanilla create table, not
 CREATE TABLE AS SELECT...?

  also, what's i/o wait -- are you sure you're not i/o bound and waiting
 on transaction commit?

 merlin





Re: [PERFORM] : Create table taking time

2011-09-29 Thread Venkat Balaji
We had performed VACUUM FULL and ANALYZE on the whole database.

Yes, the CPU is ticking at 99-100% when I look at top.

But, we have 8 CPUs with 6 cores each.

Thanks
VB


On Thu, Sep 29, 2011 at 12:44 AM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Wed, Sep 28, 2011 at 11:06 AM, Venkat Balaji venkat.bal...@verse.in
 wrote:
  Hello Everyone,
  I am back with an issue (likely).
  I am trying to create a table in our production database, and is taking 5
  seconds.
  We have executed VACUUM FULL and yet to run ANALYZE. Can i expect the
 CREATE
  TABLE to be faster after ANALYZE finishes ?
  Or is there anything serious ?
  Please share your thoughts.

 Are your system tables heavily bloated?



Re: [PERFORM] : Create table taking time

2011-09-29 Thread Venkat Balaji
I did not calculate the IO behavior of the server.

What I noticed from the logs is that the checkpoints are occurring too
frequently; each checkpoint is taking a minimum of 80 - 200+ seconds to
complete the write, and the checkpoint sync is taking 80 - 200+ seconds,
which I believe is IO intensive.

Thanks
VB



On Thu, Sep 29, 2011 at 10:22 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Sep 28, 2011 at 12:06 PM, Venkat Balaji venkat.bal...@verse.in
 wrote:
  Hello Everyone,
  I am back with an issue (likely).
  I am trying to create a table in our production database, and is taking 5
  seconds.
  We have executed VACUUM FULL and yet to run ANALYZE. Can i expect the
 CREATE
  TABLE to be faster after ANALYZE finishes ?
  Or is there anything serious ?

 just ruling out something obvious -- this is vanilla create table, not
 CREATE TABLE AS SELECT...?

  also, what's i/o wait -- are you sure you're not i/o bound and waiting
 on transaction commit?

 merlin



Re: [PERFORM] : Tracking Full Table Scans

2011-09-28 Thread Venkat Balaji
Thanks Kevin !!

I will have a look at the source tree.
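
Meanwhile, a sketch of how the two counters can be compared per table:

select relname, n_tup_upd, n_tup_hot_upd,
       round(n_tup_hot_upd * 100.0 / nullif(n_tup_upd, 0), 2) as hot_pct
from pg_stat_user_tables
order by n_tup_upd desc
limit 20;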

Regards
VB

On Tue, Sep 27, 2011 at 10:45 PM, Kevin Grittner 
kevin.gritt...@wicourts.gov wrote:

 Venkat Balaji venkat.bal...@verse.in wrote:

  I would like to know the difference between n_tup_upd and
  n_tup_hot_upd.

 A HOT update is used when none of the updated columns are used in an
 index and there is room for the new tuple (version of the row) on
 the same page as the old tuple.  This is faster for a number of
 reasons, and cleanup of the old tuple is a little different.

 If you want the gory implementation details, take a look at this
 file in the source tree:

 src/backend/access/heap/README.HOT

 -Kevin



Re: [PERFORM] : Tracking Full Table Scans

2011-09-28 Thread Venkat Balaji
Yes, I am looking for the justified full table scans.

If bigger tables are getting scanned, I would like to know the percentage of
rows scanned against the percentage of rows returned as output.

If the query needs 80% of the rows as the output, then a full table scan is
always better.

I believe there is a possibility to get this in Postgres. I think we can get it
using pg_stat_user_tables, pg_statio_user_tables and pg_stats.

I will post the calculation once I get it.
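
As a starting point, something along these lines -- a rough sketch only, since
seq_tup_read is a running total across all scans:

select relname, seq_scan, seq_tup_read, n_live_tup,
       round(seq_tup_read / nullif(seq_scan, 0)::numeric, 0) as avg_tup_per_scan,
       round(seq_tup_read * 100.0 / nullif(seq_scan * n_live_tup, 0), 2) as pct_per_scan
from pg_stat_user_tables
where seq_scan > 0
order by seq_tup_read desc;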

Thanks
VB

On Wed, Sep 28, 2011 at 6:25 AM, Craig Ringer ring...@ringerc.id.au wrote:

 On 09/28/2011 12:26 AM, Venkat Balaji wrote:

 Thanks a lot Kevin !!

 Yes. I intended to track full table scans first to ensure that only
 small tables or tables with very less pages are (as you said) getting
 scanned full.


 It can also be best to do a full table scan of a big table for some
 queries. If the query needs to touch all the data in a table - for example,
 for an aggregate - then the query will often complete fastest and with less
 disk use by using a sequential scan.

 I guess what you'd really want to know is to find out about queries that do
 seqscans to match relatively small fractions of the total tuples scanned, ie
 low-selectivity seqscans. I'm not sure whether or not it's possible to
 gather this data with PostgreSQL's current level of stats detail.

 --
 Craig Ringer



[PERFORM] : Create table taking time

2011-09-28 Thread Venkat Balaji
Hello Everyone,

I am back with an issue (likely).

I am trying to create a table in our production database, and it is taking 5
seconds.

We have executed VACUUM FULL and have yet to run ANALYZE. Can I expect the
CREATE TABLE to be faster after ANALYZE finishes ?

Or is there anything serious ?

Please share your thoughts.

Thanks
VB


[PERFORM] : Looking for PG Books

2011-09-28 Thread Venkat Balaji
Hello Everyone,

I have been working on PostgreSQL for quite a while (2 yrs) now.

I got the PostgreSQL 9.0 High Performance book today and am quite excited to
go through it.

Please let me know any sources where I can get more books on PG. I am
especially looking for books on PG internals, architecture, Backup &
Recovery, and HA.

(This will help me do thorough research and testing on production. I would
like to come back with documentation on results, learnings, findings and
suggestions.)

Looking forward for the information.

Regards,
VB


Re: [PERFORM] : Performance Improvement Strategy

2011-09-27 Thread Venkat Balaji
We had performed VACUUM FULL on our production and performance has improved
a lot !

I started using pgstattuple and pg_freespacemap for tracking free space in
the tables and indexes, and it is helping us a lot.

Thanks for all your inputs and help !

Regards,
VB

On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner 
kevin.gritt...@wicourts.gov wrote:

 Venkat Balaji venkat.bal...@verse.in wrote:

  If i got it correct, CLUSTER would do the same what VACUUM FULL
  does (except being fast)

 CLUSTER copies the table (in the sequence of the specified index) to
 a new set of files, builds fresh indexes, and then replaces the
 original set of files with the new ones.  So you do need room on
 disk for a second copy of the table, but it tends to be much faster
  than VACUUM FULL in PostgreSQL versions before 9.0.  (Starting in
 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
 the table data rather than using an index.)  REINDEX is not needed
 when using CLUSTER or 9.x VACUUM FULL.  Older versions of VACUUM
 FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
 generally a good idea.

 When choosing an index for CLUSTER, pick one on which you often
 search for a *range* of rows, if possible.  Like a name column if
 you do a lot of name searches.

 -Kevin



Re: [PERFORM] : Performance Improvement Strategy

2011-09-27 Thread Venkat Balaji
Forgot to mention -

Kevin,

CLUSTER seems to be a very interesting concept to me.

I am thinking of testing CLUSTER on our production tables according to the
index usage on each table.

Will let you know once I get the results.

Regards,
VB

On Tue, Sep 27, 2011 at 5:59 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 We had performed VACUUM FULL on our production and performance has improved
 a lot !

 I started using pg_stattuple and pg_freespacemap for tracking freespace in
 the tables and Indexes and is helping us a lot.

 Thanks for all your inputs and help !

 Regards,
 VB


 On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner 
 kevin.gritt...@wicourts.gov wrote:

 Venkat Balaji venkat.bal...@verse.in wrote:

  If i got it correct, CLUSTER would do the same what VACUUM FULL
  does (except being fast)

 CLUSTER copies the table (in the sequence of the specified index) to
 a new set of files, builds fresh indexes, and then replaces the
 original set of files with the new ones.  So you do need room on
 disk for a second copy of the table, but it tends to be much faster
  than VACUUM FULL in PostgreSQL versions before 9.0.  (Starting in
 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
 the table data rather than using an index.)  REINDEX is not needed
 when using CLUSTER or 9.x VACUUM FULL.  Older versions of VACUUM
 FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
 generally a good idea.

 When choosing an index for CLUSTER, pick one on which you often
 search for a *range* of rows, if possible.  Like a name column if
 you do a lot of name searches.

 -Kevin





[PERFORM] : Tracking Full Table Scans

2011-09-27 Thread Venkat Balaji
Hello Everyone,

I am preparing a plan to track the tables undergoing full table scans the
greatest number of times.

If I track seq_scan from pg_stat_user_tables, will that help (considering the
latest analyzed ones) ?

Please help !

Thanks
VB


Re: [PERFORM] : Tracking Full Table Scans

2011-09-27 Thread Venkat Balaji
Thanks a lot Kevin !!

Yes. I intended to track full table scans first to ensure that only small
tables or tables with very few pages are (as you said) getting scanned in
full.

I have yet to identify slow-running queries. Will surely come back with them in
the future.

Thanks
VB



On Tue, Sep 27, 2011 at 8:02 PM, Kevin Grittner kevin.gritt...@wicourts.gov
 wrote:

 Venkat Balaji venkat.bal...@verse.in wrote:

  I am preparing a plan to track the tables undergoing Full Table
  Scans for most number of times.
 
  If i track seq_scan from the pg_stat_user_tables, will that help
  (considering the latest analyzed ones) ?

 Well, yeah; but be careful not to assume that a sequential scan is
 always a bad thing.  Here's our top ten tables for sequential scans
 in a database which is performing quite well:

 cc= select seq_scan, n_live_tup, relname
 cc-   from pg_stat_user_tables
 cc-   order by seq_scan desc
 cc-   limit 10;
  seq_scan | n_live_tup |  relname
 ----------+------------+---------------------
  81264339 | 20 | MaintCode
  16840299 |  3 | DbTranImageStatus
  14905181 | 18 | ControlFeature
  11908114 | 10 | AgingBoundary
  8789288 | 22 | CtofcTypeCode
  7786110 |  6 | PrefCounty
  6303959 |  9 | ProtOrderHistEvent
  5835430 |  1 | ControlRecord
  5466806 |  1 | ControlAccounting
  5202028 | 12 | ProtEventOrderType
 (10 rows)

 You'll notice that they are all very small tables.  In all cases the
 entire heap fits in one page, so any form of indexed scan would at
 least double the number of pages visited, and slow things down.

 If you have queries which are not performing to expectations, your
 best bet might be to pick one of them and post it here, following
 the advice on this page:

 http://wiki.postgresql.org/wiki/SlowQueryQuestions

 -Kevin



[PERFORM] PostgreSQL-9.0 Monitoring System to improve performance

2011-09-27 Thread Venkat Balaji
Hello Everyone,

I am implementing a PostgreSQL performance monitoring system (to monitor the
below) which would help us understand the database behavior -

1. Big Full Table Scans
2. Table with high IOs (hot tables)
3. Highly used Indexes
4. Tables undergoing high DMLs with index scans 0 (with unused indexes)
5. Index usage for heap blk hits
6. Tracking Checkpoints
7. Tracking CPU, IO and memory usage ( by PG processes ) -- desperately
needed
8. Buffer cache usage
9. Tables, Indexes and Database growth statistics

and more..
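
For item 9 above, a minimal sketch of the kind of daily snapshot that could
cover it (the audit table name is just an assumption):

create table relation_growth (snap_time timestamptz, relname name, total_bytes bigint);

insert into relation_growth
select now(), c.relname, pg_total_relation_size(c.oid)
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r' and n.nspname = 'public';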

I am stuck at building a script or monitoring tool which gets us CPU usage,
IO metrics and RAM usage of the database server.

Can someone please help me achieve this ?

I need to monitor a 12 processor system with 6 cores. I need to know how
each CPU is performing.

Please help me know the availability of any open source monitoring tools or
scripts for PG-9.0 on RHEL5.

I will hit back with questions regarding monitoring in coming days.

Thanks
VB


Re: [PERFORM] : Tracking Full Table Scans

2011-09-27 Thread Venkat Balaji
I would like to know the difference between n_tup_upd and n_tup_hot_upd.

Thanks
VB

On Tue, Sep 27, 2011 at 9:56 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 Thanks a lot Kevin !!

 Yes. I intended to track full table scans first to ensure that only small
 tables or tables with very less pages are (as you said) getting scanned
 full.

 I am yet to identify slow running queries. Will surely hit back with them
 in future.

 Thanks
 VB



 On Tue, Sep 27, 2011 at 8:02 PM, Kevin Grittner 
 kevin.gritt...@wicourts.gov wrote:

 Venkat Balaji venkat.bal...@verse.in wrote:

  I am preparing a plan to track the tables undergoing Full Table
  Scans for most number of times.
 
  If i track seq_scan from the pg_stat_user_tables, will that help
  (considering the latest analyzed ones) ?

 Well, yeah; but be careful not to assume that a sequential scan is
 always a bad thing.  Here's our top ten tables for sequential scans
 in a database which is performing quite well:

 cc= select seq_scan, n_live_tup, relname
 cc-   from pg_stat_user_tables
 cc-   order by seq_scan desc
 cc-   limit 10;
  seq_scan | n_live_tup |  relname
 ----------+------------+---------------------
  81264339 | 20 | MaintCode
  16840299 |  3 | DbTranImageStatus
  14905181 | 18 | ControlFeature
  11908114 | 10 | AgingBoundary
  8789288 | 22 | CtofcTypeCode
  7786110 |  6 | PrefCounty
  6303959 |  9 | ProtOrderHistEvent
  5835430 |  1 | ControlRecord
  5466806 |  1 | ControlAccounting
  5202028 | 12 | ProtEventOrderType
 (10 rows)

 You'll notice that they are all very small tables.  In all cases the
 entire heap fits in one page, so any form of indexed scan would at
 least double the number of pages visited, and slow things down.

 If you have queries which are not performing to expectations, your
 best bet might be to pick one of them and post it here, following
 the advice on this page:

 http://wiki.postgresql.org/wiki/SlowQueryQuestions

 -Kevin





Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Venkat Balaji
Can you please help me understand what the blkno column refers to ?

Thanks
Venkat

On Wed, Sep 21, 2011 at 11:08 AM, Venkat Balaji venkat.bal...@verse.in wrote:

 Thank Everyone for your inputs !

 Mark,

 We are using 9.0, so I should be able to make use of this freespacemap
 contrib module, and I will get back to you with the results.

 I was using the below query (which I got by googling)..

 But I was not sure if it is picking up the correct information. I want to
 avoid the cost of a mis-prediction after the whole production system has been
 scheduled for downtime for maintenance.

 SELECT
   current_database(), schemaname, tablename, /*reltuples::bigint,
 relpages::bigint, otta,*/
   ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS
 tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END
 AS wastedbytes,
   iname, /*ituples::bigint, ipages::bigint, iotta,*/
   ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric
 END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS
 wastedibytes
 FROM (
   SELECT
 schemaname, tablename, cc.reltuples, cc.relpages, bs,
 CEIL((cc.reltuples*((datahdr+ma-
   (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma
 END))+nullhdr2+4))/(bs-20::float)) AS otta,
 COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples,
 COALESCE(c2.relpages,0) AS ipages,
 COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta
 -- very rough approximation, assumes all cols
   FROM (
 SELECT
   ma,bs,schemaname,tablename,
   (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma
 END)))::numeric AS datahdr,
   (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE
 nullhdr%ma END))) AS nullhdr2
 FROM (
   SELECT
 schemaname, tablename, hdr, ma, bs,
 SUM((1-null_frac)*avg_width) AS datawidth,
 MAX(null_frac) AS maxfracsum,
 hdr+(
   SELECT 1+count(*)/8
   FROM pg_stats s2
  WHERE null_frac <> 0 AND s2.schemaname = s.schemaname AND
 s2.tablename = s.tablename
 ) AS nullhdr
   FROM pg_stats s, (
 SELECT
   (SELECT current_setting('block_size')::numeric) AS bs,
   CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE
 23 END AS hdr,
   CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
 FROM (SELECT version() AS v) AS foo
   ) AS constants
   GROUP BY 1,2,3,4,5
 ) AS foo
   ) AS rs
   JOIN pg_class cc ON cc.relname = rs.tablename
   JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname =
  rs.schemaname AND nn.nspname <> 'information_schema'
   LEFT JOIN pg_index i ON indrelid = cc.oid
   LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
 ) AS sml
 ORDER BY wastedbytes DESC

 Thanks
 Venkat


 On Wed, Sep 21, 2011 at 3:40 AM, Mark Kirkwood 
 mark.kirkw...@catalyst.net.nz wrote:

 On 21/09/11 10:05, Mark Kirkwood wrote:


 ...then using the freespacemap contrib module should give very accurate
 results:


 Sorry, should have said - for 8.4 and later!







Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Venkat Balaji
Could you please let us know if you ran ANALYZE after the re-indexing was done ?

This should show a difference only for the indexes, not the tables.

For the tables, you need to do VACUUM FULL to see a difference.

Thanks
Venkat

On Wed, Sep 21, 2011 at 12:31 PM, AI Rumman rumman...@gmail.com wrote:

 I am using Postgresql 9.0.1.

 Using the query http://wiki.postgresql.org/wiki/Show_database_bloat, I got
 the following result for a table:

 -[ RECORD 1 ]+---
 current_database | crm
 schemaname   | public
 tablename| _attachments
 tbloat   | 0.9
 wastedbytes  | 0
 iname| attachments_description_type_attachmentsid_idx
 ibloat   | 2.3
 wastedibytes | 5439488
 -[ RECORD 2 ]+---
 current_database | crm
 schemaname   | public
 tablename| _attachments
 tbloat   | 0.9
 wastedbytes  | 0
 iname| attachments_attachmentsid_idx
 ibloat   | 0.2
 wastedibytes | 0
 -[ RECORD 3 ]+---
 current_database | crm
 schemaname   | public
 tablename| _attachments
 tbloat   | 0.9
 wastedbytes  | 0
 iname| _attachments_pkey
 ibloat   | 0.2
 wastedibytes | 0

 I REINDEXED  both the indexes and table, but I did not find any change in
 wastedspace or wastedispace.
 Could you please tell me why?



Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Venkat Balaji
Thank you very much for your detailed explanation !

I will be working on our existing auto-vacuuming strategy to see
if that's optimal. But, we do have VACUUM VERBOSE ANALYZE running at the
cluster level every day and auto-vacuum is aggressive for highly active
tables.

Today, we have vacuumed a 10GB table and the table size decreased to 5 GB.

I understand that it would be very expensive for the table to reclaim the
space back from the filesystem. We decided to do the maintenance after a
thorough analysis, and our databases had not been subjected to any kind of
maintenance activity (with downtime) for 2 years.

As a DBA, I suggested performing VACUUM FULL and RE-INDEXING + ANALYZE to
ensure that IO performance and indexing performance would be good and that
the PG optimizer would pick the optimal plan. As said earlier, our databases
have not been part of any re-organization for 2 years and are highly
transactional databases. I believe that performing VACUUM FULL and
RE-INDEXING would tightly pack the rows in every page and ensure good IO.

I might not have put across the explanation in an understandable manner.

Please help me know the following -

1. When will pg_stat_user_tables be updated, and what will the information
show ?
2. Will the information about dead rows and live rows vanish after VACUUM,
ANALYZE or VACUUM FULL ?

I am just preparing a monitoring system which would help us know the rate of
bloat and data generation on a daily basis.
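
A sketch of the daily snapshot I have in mind, straight from pg_stat_user_tables:

select now() as snap_time, relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
from pg_stat_user_tables
order by n_dead_tup desc;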

Sorry for the long email !

Looking forward for your help !

Thanks
Venkat




On Wed, Sep 21, 2011 at 7:27 PM, Kevin Grittner kevin.gritt...@wicourts.gov
 wrote:

 Shaun Thomas stho...@peak6.com wrote:
  Venkat Balaji wrote:

  I see lot of free spaces or free pages in Tables and Indexes.
  But, I need to give an exact calculation on how much space will
  be reclaimed after VACUUM FULL and RE-INDEXING.
 
  Why?

 I've been wondering that, too.  And talking about the space being
 reclaimed seems to be at odds with your subject line.  The space
 is given up by the database engine to the file system free space,
 where reuse by the database will be much more expensive.  For good
 performance you want some free space in the tables and indexes,
 where it can be allocated to new tuples without going out through OS
 calls to the file system.

 Clearly, if free space gets higher than necessary to support
 creation of new tuples, it can start to harm performance, and you
 may need to take aggressive action (such as CLUSTER) to reclaim it;
 but any time you find it necessary to do *that* you should be
 investigating what went wrong to put you in such a spot.  Either
 your autovacuum is (as Shaun suggested) not aggressive enough, or
 you have some long running transaction (possibly idle in
 transaction) which is preventing vacuums from doing their work
 effectively.  Investigating that is going to help more than
 calculating just how much space the database is going to give up to
 file system free space.

 -Kevin
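
Following up on the long-running-transaction point above, a quick check
against pg_stat_activity on 9.0 might look like this (a sketch only; the
procpid and current_query columns were renamed in later releases):

SELECT procpid, usename, client_addr,
       now() - xact_start AS xact_age,
       current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
   OR now() - xact_start > interval '1 hour'
ORDER BY xact_start;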



Re: [PERFORM] REINDEX not working for wastedspace

2011-09-21 Thread Venkat Balaji
It is very important to remove it from the WIKI page.

I ran it on production PG 9.0; it does not error out and displays numbered
output.

I noticed that this works till PG 8.2 (as per the message).

Venkat

On Wed, Sep 21, 2011 at 8:25 PM, Shaun Thomas stho...@peak6.com wrote:

 On 09/21/2011 09:12 AM, Tom Lane wrote:

  The PG wiki is editable by anyone who signs up for an account.  Feel
 free to put in an appropriate disclaimer, or improve the sample
 query.


 Ah, well then. I do have an account, but thought there were more granular
 page restrictions than that. I may have to start wading into them when I see
 stuff like this. :)


 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
 312-676-8870
 stho...@peak6.com




Re: [PERFORM] : Performance Improvement Strategy

2011-09-21 Thread Venkat Balaji
Thanks Greg !

If I got it correct, CLUSTER does the same thing VACUUM FULL does (except
that it is faster).

Is CLUSTER recommended only because it is faster? As per the link, the table
would be unavailable (for a shorter period compared to VACUUM FULL) when
CLUSTER is executed as well. Hope I got it correct!

Thanks
Venkat

On Wed, Sep 21, 2011 at 11:27 PM, Greg Smith g...@2ndquadrant.com wrote:

 On 09/21/2011 12:13 PM, Venkat Balaji wrote:

 I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE to
 ensure that IO performance and Indexing performance would be good



 Read http://wiki.postgresql.org/wiki/VACUUM_FULL before
 you run VACUUM FULL.  You probably don't want to do that.  A
 multi-gigabyte table can easily be unavailable for several hours if you
 execute VACUUM FULL against it.  CLUSTER is almost always faster.
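
 As a minimal sketch with hypothetical table and index names, the CLUSTER
 route would be:

 CLUSTER big_table USING big_table_pkey;
 ANALYZE big_table;
 -- Like VACUUM FULL, CLUSTER takes an exclusive lock for the duration; it
 -- rewrites the table in the physical order of the named index and rebuilds
 -- the indexes as part of the same operation.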

 --
 Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us






[PERFORM] : Performance Improvement Strategy

2011-09-20 Thread Venkat Balaji
Hello Everyone,

I had posted a query in the GENERAL category; not sure if that was the
correct category to post in.

Please help me understand how to calculate the free space in tables and
indexes even after vacuuming and analyzing have been performed.

What I understand is that, even if we perform VACUUM ANALYZE regularly, the
free space generated is not filled up.

I see a lot of free space or free pages in tables and indexes. But I need to
give an exact calculation of how much space will be reclaimed after VACUUM
FULL and RE-INDEXING.

Is there any standard procedure or process to calculate this?

Please help !

Thanks
Venkat
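
One way to get exact free-space and dead-space figures for a single table,
assuming the pgstattuple contrib module is installed (it scans the whole
table, so it is not cheap on large relations), is:

SELECT table_len, tuple_len, dead_tuple_len, free_space, free_percent
FROM pgstattuple('public.my_table');
-- 'public.my_table' is a hypothetical name; free_space is the total number
-- of reusable bytes inside the table's pages.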


Re: [PERFORM] : Performance Improvement Strategy

2011-09-20 Thread Venkat Balaji
Thanks everyone for your inputs!

Mark,

We are using 9.0, so I should be able to make use of the pg_freespacemap
contrib module; I will get back to you with the results.
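
For instance, assuming the module is loaded and using a hypothetical table
name, a rough per-table total can be had with:

SELECT count(*) AS pages_with_free_space,
       sum(avail) AS approx_free_bytes
FROM pg_freespace('my_table')
WHERE avail > 0;
-- pg_freespace() returns one row per page with the free space the FSM has
-- recorded for it, so the sum is an approximation, not an exact byte count.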

I was using the query below (which I got by googling).

But I was not sure if it is picking up the correct information. I want to
avoid the cost of a mis-prediction once the whole production system has been
scheduled for downtime for maintenance.

SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint,
relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS
tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END
AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric
END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS
wastedibytes
FROM (
  SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
  (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma
END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples,
COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta --
very rough approximation, assumes all cols
  FROM (
SELECT
  ma,bs,schemaname,tablename,
  (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma
END)))::numeric AS datahdr,
  (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE
nullhdr%ma END))) AS nullhdr2
FROM (
  SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
  SELECT 1+count(*)/8
  FROM pg_stats s2
  WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND
s2.tablename = s.tablename
) AS nullhdr
  FROM pg_stats s, (
SELECT
  (SELECT current_setting('block_size')::numeric) AS bs,
  CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23
END AS hdr,
  CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
  ) AS constants
  GROUP BY 1,2,3,4,5
) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname =
rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC

Thanks
Venkat

On Wed, Sep 21, 2011 at 3:40 AM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 21/09/11 10:05, Mark Kirkwood wrote:


 ...then using the freespacemap contrib module should give very accurate
 results:


 Sorry, should have said - for 8.4 and later!





Re: [PERFORM] Re: How to track number of connections and hosts to Postgres cluster

2011-09-04 Thread Venkat Balaji
Hi Scott,

Yes, we are logging connections and disconnections with duration as well.

We have a process of rolling the log over at every 500MB, and old log files
are deleted after a certain period of time.

Thanks a lot for your help !

Regards,
Venkat

On Fri, Sep 2, 2011 at 12:12 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Thu, Sep 1, 2011 at 11:46 PM, Venkat Balaji venkat.bal...@verse.in
 wrote:
  Hi Scott,
  Log generation rate -
  500MB size of log file is generated within minimum 3 mins to maximum of
 20
  mins depending on the database behavior.
  I did not understand the fsync stuff you mentioned. Please help me know
  how would fsync is related to log generation or logging host IPs in the
 log

 So you're generating logs at a rate of about 166MB a minute, or 2.7MB/s.
  Seagates from the early 90s are faster than that.  Are you logging
 more than just connections and disconnections? If you log just those
 what's the rate?

 fsync is when the OS says to write to disk and the disk confirms the
 write is complete.  It probably doesn't matter here whether the file
 system is using a journaling method that's real safe or not, and you
 can go to something like ext2 where there's no journaling and probably
 do fine on a dedicated SATA drive or pair if you want them redundant.

 The real issue then is what to do with old log files.  Right now
 you're creating them at 10G an hour, or 240G a day.  So you'll need
 some cron job to go in and delete the old ones.  Still with a 1TB
 drive it'll take about 4 days to fill up, so it's not like you're
 gonna run out of space in a few minutes or anything.

 Since log files are pretty much written sequentially they don't need
 the fastest drives ever made.  Most modern 7200RPM 3.5 SATA drives
 can write at least at 50 or 60 MB/s on their slowest portions.  Just
 rotate them hourly or daily or whatever and process them and delete
 them.
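
As a quick way to confirm how logging is currently configured before changing
anything (a plain catalog query, nothing version-specific):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('logging_collector', 'log_connections', 'log_disconnections',
               'log_rotation_size', 'log_directory');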



Re: [PERFORM] Re: How to track number of connections and hosts to Postgres cluster

2011-09-01 Thread Venkat Balaji
Hi Scott,

Log generation rate -

A 500MB log file is generated within a minimum of 3 minutes to a maximum of 20
minutes, depending on database behavior.

I did not understand the fsync point you mentioned. Please help me understand
how fsync is related to log generation or to logging host IPs in the log
file.

Thanks
Venkat

On Tue, Aug 30, 2011 at 12:09 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Mon, Aug 29, 2011 at 11:55 PM, Venkat Balaji venkat.bal...@verse.in
 wrote:
  If i notice high IO's and huge log generation, then i think Greg
 Spileburg
  has suggested a good idea of using tcpdump on a different server. I would
  use this utility and see how it works (never used it before). Greg
  Spileburg, please  help me with any sources of documents you have to use
  tcpdump.

 There's also a lot to be said for dumping to a dedicated local drive
 with fsync turned off.  They're logs so you can chance losing them by
 putting them on a cheap fast 7200 rpm SATA drive.  If your logs take
 up more than a few megs a second then they are coming out really fast.
  Do you know what your log generation rate in bytes/second is?



Re: [PERFORM] Query performance issue

2011-08-31 Thread Venkat Balaji
I missed looping in the community...

On Wed, Aug 31, 2011 at 5:01 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 Could you help us know the tables and columns on which the indexes are built?

 The query is performing sorting based on the key upper(column), and that is
 where I believe the cost is high.

 Is the 'upper' function used in the WHERE clause?

 Thanks
 Venkat
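
 If the predicate (or the sort key) really is upper(column), the usual remedy
 is an expression index; a sketch with hypothetical table and column names:

 CREATE INDEX idx_customers_upper_name ON customers (upper(name));
 -- Lets a condition such as WHERE upper(name) = 'SMITH', or an
 -- ORDER BY upper(name), use an index scan rather than a full sort.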


 On Wed, Aug 31, 2011 at 4:49 PM, Jayadevan M jayadevan.maym...@ibsplc.com
  wrote:

 Hello,

  
   Please run EXPLAIN ANALYZE on the query and post that, it's hard to
 say
   what's wrong from just the query plan, without knowing where the time
 is
   actually spent.
  Here is the explain analyze
  http://explain.depesz.com/s/MY1

 Going through the url tells me that statistics may be off. I will try
 analyzing the tables. That should help?
 Regards,
 Jayadevan







Re: [PERFORM] Re: How to track number of connections and hosts to Postgres cluster

2011-08-29 Thread Venkat Balaji
Thanks to all for your very helpful replies !

As Greg Smith rightly said, I faced the problem of missing connections between
runs. I even ran the cron job at less than one-second intervals, but that still
becomes too many runs per second, and later I would need to take on the burden
of calculating everything from the log.

I did not really measure the IO load while logging is on. I will switch on
log_connections and log_disconnections to log the number of connections and
the duration of each connection.

If I notice high IO and huge log generation, then I think Greg Spileburg
has suggested a good idea of using tcpdump on a different server. I will
use this utility and see how it works (I have never used it before). Greg
Spileburg, please help me with any documentation you have on using tcpdump.

Thanks again and sorry for replying late on this !

Regards,
Venkat

On Thu, Aug 25, 2011 at 6:02 AM, MirrorX mirr...@gmail.com wrote:

 Lately I did something similar on one of our servers, to keep track of
 active, idle and idle-in-transaction connections so as to do some optimization
 of the connection pooling, and I didn't notice any serious IO activity there
 (I had the cron job run every minute). So IMHO, unless the server is seriously
 IO bound at the moment, you won't notice any difference.

 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/How-to-track-number-of-connections-and-hosts-to-Postgres-cluster-tp4729546p4732518.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.




[PERFORM] How to track number of connections and hosts to Postgres cluster

2011-08-24 Thread Venkat Balaji
Hello Everyone,

I am working on an alert script to track the number of connections, along with
the host IPs, to the Postgres cluster.

1. I need all the host IPs making a connection to the Postgres cluster (even
for a fraction of a second).
2. I would also want to track the number of IDLE connections, IDLE IN
TRANSACTION connections, and the length of the connections as well.

I would be making use of pg_stat_activity, and I also thought of enabling
logging of the host IPs in the DB server log files, which seems expensive
to me (in terms of IO and log file size).

Please let me know if there are any alternatives.

Thanks
Venkat
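
A sketch of the pg_stat_activity side of such a script (9.0 column names;
current_query was renamed in later releases):

SELECT client_addr,
       count(*) AS connections,
       sum(CASE WHEN current_query = '<IDLE>' THEN 1 ELSE 0 END) AS idle,
       sum(CASE WHEN current_query = '<IDLE> in transaction' THEN 1 ELSE 0 END)
           AS idle_in_transaction
FROM pg_stat_activity
GROUP BY client_addr
ORDER BY connections DESC;

-- This only sees sessions alive at the moment of the snapshot, which is why
-- log_connections/log_disconnections are still needed to catch very
-- short-lived connections.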


Re: [PERFORM] How to track number of connections and hosts to Postgres cluster

2011-08-24 Thread Venkat Balaji
Thanks Guillaume !!

But if we set log_connections to on and log_disconnections to on, wouldn't
Postgres be logging a lot of data?

Will this not be IO intensive? I understand that this is the best way, but I
would want to know if there is another way to reduce IO (maybe through queries
to the catalog tables).

Thanks
Venkat

On Wed, Aug 24, 2011 at 1:19 PM, Guillaume Lelarge
guilla...@lelarge.info wrote:

 On Wed, 2011-08-24 at 13:05 +0530, Venkat Balaji wrote:
  Hello Everyone,
 
  I am working on an alert script to track the number of connections with
 the
  host IPs to the Postgres cluster.
 
  1. I need all the host IPs making a connection to Postgres Cluster (even
 for
  a fraction of second).

 You should set log_connections to on.

  2. I would also want to track number of IDLE connections, IDLE IN
  TRANSACTION connections and length of the connections as well.
 

 IDLE and IDLE in transaction are the kind of information you get in
 pg_stat_activity.

 Length of connections, you can get it with log_disconnections.

  I would be making use of pg_stat_activity and also thought of enabling
  logging the host ips in the db server log files which seems to be
 expensive
  for me (in terms of IO and logfile size).
 

 Using pg_stat_activity won't catch really short-lived connections. You need
 log_connections for that, and log_disconnections for the duration of
 connections. So you'll have to work on a tool that gathers some information
 with queries on pg_stat_activity, and that can also read the
 PostgreSQL log files.


 --
 Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com




Re: [PERFORM] How to track number of connections and hosts to Postgres cluster

2011-08-24 Thread Venkat Balaji
But the information vanishes if the application logs off.

I am looking for an alternative way to track the total number of connections,
with the host IPs, through a cron job.

What should the frequency of the cron job be?

I know the best way is using the log_connections and log_disconnections
parameters, but the amount of information logged would be too high and is also
IO intensive.

Thanks
Venkat

On Wed, Aug 24, 2011 at 4:39 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 pg_stat_activity keeps track of all this information.

 select * from pg_stat_activity where datname='databasename';




 Venkat Balaji wrote:

 Thanks Guillaume !!

  But, if put log_connections to on and log_disconnections to on wouldn't
 the Postgres be logging in lot of data ?

  Will this not be IO intensive ? I understand that this is the best way,
 but, would want to know if there is an other way to reduce IO ( may be
 through queries to catalog tables ).

  Thanks
 Venkat

 On Wed, Aug 24, 2011 at 1:19 PM, Guillaume Lelarge guilla...@lelarge.info
  wrote:

 On Wed, 2011-08-24 at 13:05 +0530, Venkat Balaji wrote:
  Hello Everyone,
 
  I am working on an alert script to track the number of connections with
 the
  host IPs to the Postgres cluster.
 
  1. I need all the host IPs making a connection to Postgres Cluster (even
 for
  a fraction of second).

  You should set log_connections to on.

  2. I would also want to track number of IDLE connections, IDLE IN
  TRANSACTION connections and length of the connections as well.
 

  IDLE and IDLE in transactions are the kind of informations you get in
 pg_stat_activity.

 Length of connections, you can get it with log_disconnections.

  I would be making use of pg_stat_activity and also thought of enabling
  logging the host ips in the db server log files which seems to be
 expensive
  for me (in terms of IO and logfile size).
 

  Using pg_stat_activity won't get you really small connections. You need
 log_connections for that, and log_disconnections for the duration of
 connections. So you'll have to work on a tool that could get some
 informations with queries on pg_stat_activity, and that could read
 PostgreSQL log files.


 --
 Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com