Re: [PERFORM] Ineffective autovacuum

2011-09-27 Thread Royce Ausburn

On 27/09/2011, at 8:29 PM, Marti Raudsepp wrote:

> 1. First things first: vacuum cannot delete tuples that are still
> visible to any old running transactions. You might have some very long
> queries or transactions that prevent it from cleaning properly:
> 
> select * from pg_stat_activity where xact_start < now()-interval '10 minutes';

Thanks -- that query is very handy.  I suspect this might be the cause of our 
woes, as it turns up a handful of long-lived connections; however, they're 
connections to databases other than the one I'm having trouble with.
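
For anyone else searching the archives later, here's a slightly expanded sketch 
of that query for identifying the culprits (column names as in 8.3's 
pg_stat_activity, where the query text lives in current_query and '<IDLE> in 
transaction' marks sessions holding a snapshot open; adjust for newer versions):

select datname, procpid, usename, xact_start, current_query
from pg_stat_activity
where xact_start < now() - interval '10 minutes'
order by xact_start;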

I've checked up on the FSM as you suggested; I don't think that's the problem, 
as there are no warnings in the verbose output or the logs.  But another clue:

DETAIL:  93 dead row versions cannot be removed yet.

After clearing those stuffed transactions, vacuum verbose manages to clear away 
all the dead rows… That's confirmation enough for me - now to find the 
application bugs.  Thanks Tom, Marti & Scott for your help!

--Royce



Re: [PERFORM] Ineffective autovacuum

2011-09-26 Thread Royce Ausburn



On 27/09/2011, at 2:21 PM, Tom Lane wrote:

> Royce Ausburn  writes:
>> I have a problem with autovacuum apparently not doing the job I need it to 
>> do.
> 
> Hm, I wonder whether you're getting bit by bug #5759, which was fixed
> after 8.3.12.

If this were the case, would I see lots of autovacuum worker processes in ps 
that are essentially doing nothing because they're sleeping all the time?  If 
so, then I think we're perhaps not affected.
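
For what it's worth, here's a quick sketch for double-checking what the workers 
are actually running with (the relevant settings are all still at the factory 
defaults here):

select name, setting
from pg_settings
where name like 'autovacuum%' or name like 'vacuum_cost%'
order by name;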

> 
>> I have a table named datasession that is frequently inserted, updated and 
>> deleted from.  Typically the table will have a few thousand rows in it.  
>> Each row typically survives a few days and is updated every 5 - 10 mins.  
>> The application receives unreliable, potentially duplicate data from its 
>> source, so this table is heavily used for synchronising application threads 
>> as well.  A typical access pattern is:
> 
>> - tx begin
>> - SELECT FOR UPDATE on a single row
>> - Do some application processing (1 - 100 ms)
>> - Possibly UPDATE the row
>> - tx commit
> 
> Transactions of that form would not interfere with autovacuum.  You'd
> need something that wants exclusive lock, like a schema change.
> 
>> I've read some recent threads and found a discussion (below) on auto vacuum 
>> that mentions auto vacuum will be cancelled when a client requests a lock 
>> that auto vacuum is using… My questions:
>> 1) Does it look like I'm affected by the same problem as in the below 
>> discussion?
> 
> Not unless you're seeing a lot of "canceling autovacuum task" messages
> in the postmaster log.

Okay - This is not the case.

Since sending this first email I've upped the autovacuum log level, and I've 
noticed that the same tables seem to be autovacuumed over and over again… Some 
of the tables are a bit surprising in that they're updated semi-regularly, but 
not enough (I'd think) to warrant an autovacuum every few minutes… Is this 
unusual?
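
In case it's useful, this is roughly the sort of query that shows which tables 
autovacuum keeps coming back to (column names per 8.3's pg_stat_user_tables; 
worth double-checking on other versions):

select relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
from pg_stat_user_tables
order by last_autovacuum desc nulls last
limit 20;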


Perhaps unrelated: I've done some digging around and happened across a nightly 
task doing:

select pg_stat_reset()

on each of the databases in the cluster… I've no idea why we're doing that 
(and our usual sysadmin / DBA has resigned, so I doubt I'll ever know).  There 
must have been a reason at the time, but I wonder if this might be interfering 
with things?

At any rate, I think I might glean some more interesting information from the 
logs; I'll leave it alone for a few hours and hopefully I'll have something more 
useful to report.

--Royce




[PERFORM] Ineffective autovacuum

2011-09-26 Thread Royce Ausburn
Hi all,

I have a problem with autovacuum apparently not doing the job I need it to do.

I have a table named datasession that is frequently inserted, updated and 
deleted from.  Typically the table will have a few thousand rows in it.  Each 
row typically survives a few days and is updated every 5 - 10 mins.  The 
application receives unreliable, potentially duplicate data from its source, so 
this table is heavily used for synchronising application threads as well.  A 
typical access pattern is:

- tx begin
- SELECT FOR UPDATE on a single row
- Do some application processing (1 - 100 ms)
- Possibly UPDATE the row
- tx commit
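
In SQL terms the pattern is roughly the following (a sketch only; the real key 
column and update list are application-specific):

begin;
select * from datasession where id = 42 for update;  -- "id" is a hypothetical key column
-- 1 - 100 ms of application processing happens here
update datasession set last_seen = now() where id = 42;  -- hypothetical column
commit;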

In a few instances of our application we're seeing this table grow obscenely to 
the point where our monitoring servers get us out of bed to manually vacuum.  I 
like sleep, so I want to fix this =D

I've read some recent threads and found a discussion (below) on auto vacuum 
that mentions auto vacuum will be cancelled when a client requests a lock that 
auto vacuum is using… My questions:

1) Does it look like I'm affected by the same problem as in the below 
discussion?

2) Are there better solutions to this problem than a periodic task that 
vacuums/truncates-and-rebuilds the table?  


Perhaps relevant info:


# select version();
                                            version
--------------------------------------------------------------------------------------------------
 PostgreSQL 8.3.12 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
(1 row)

Auto vacuum and vacuum parameters are set to the factory defaults.
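
One thing I've wondered about is making autovacuum more aggressive for just this 
table.  A sketch of the intent is below; note the ALTER TABLE ... SET (...) form 
is 8.4+ syntax (on 8.3 the equivalent is a row in the pg_autovacuum system 
catalog), and the thresholds are guesses:

alter table datasession set (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 200
);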

Cheers,

--Royce

> From: Tom Lane 
> Subject: Re: [GENERAL] Vacuum as "easily obtained" locks 
> Date: 4 August 2011 1:52:02 AM AEST
> To: Michael Graham 
> Cc: Pavan Deolasee , pgsql-gene...@postgresql.org
> 
> Michael Graham  writes:
>> On Wed, 2011-08-03 at 11:40 -0400, Tom Lane wrote:
>>> The other problem is that once autovacuum has gotten the lock, it has
>>> to keep it for long enough to re-scan the truncatable pages (to make
>>> sure they're still empty).  And it is set up so that any access to the
>>> table will kick autovacuum off the lock.  An access pattern like that
>>> would very likely prevent it from ever truncating, if there are a lot
>>> of pages that need to be truncated.  (There's been some discussion of
>>> modifying this behavior, but nothing's been done about it yet.)
>> Ah!  This looks like it is very much the issue.  Since I've got around
>> 150GB of data that should be truncatable and a select every ~2s.
> 
>> Just to confirm would postgres write:
> 
>> 2011-08-03 16:09:55 BST ERROR:  canceling autovacuum task
>> 2011-08-03 16:09:55 BST CONTEXT:  automatic vacuum of table
>> "traffic.public.logdata5queue"
> 
>> Under those circumstances?
> 
> Yup ...
> 
> If you do a manual VACUUM, it won't allow itself to get kicked off the
> lock ... but as noted upthread, that will mean your other queries get
> blocked till it's done.  Not sure there's any simple fix for this that
> doesn't involve some downtime.
> 
>   regards, tom lane
> 




Re: [PERFORM] Prepared statements and suboptimal plans

2011-09-21 Thread Royce Ausburn

Sorry all - this was a duplicate from another of my addresses =(  Thanks to all 
that have helped out on both threads.





On 21/09/2011, at 8:44 AM, Royce Ausburn wrote:

> Hi all,
> 
> It looks like I've been hit with this well known issue.  I have a complicated 
> query that is intended to run every few minutes, I'm using JDBC's 
> Connection.prepareStatement() mostly for nice parameterisation, but postgres 
> produces a suboptimal plan due to its lack of information when the statement 
> is prepared.
> 
> I've been following the mailing list for a few years and I've seen this topic 
> come up a bit.  I've just done a quick google and I'm not quite sure how to 
> fix this short of manually substituting my query parameters into a query 
> string -- avoiding prepared statements… An alternative might be to re-write 
> the query and hope that the planner's general plan is a bit closer to 
> optimal… but are these my only options?  
> 
> I notice that the non-prepared-statement (both below my sig) plan estimates 
> 5500 rows output.  I think that's out by a factor of up to 100, suggesting 
> that I might want to increase my statistics and re-analyse… but as I 
> understand the prepared-statement problem, this probably won't help here.  
> Correct?
> 
> We've been worst hit by this query on an 8.3 site.  Another site is running 
> 8.4.  Have there been improvements in this area recently?  Upgrading to 9.0 
> might be viable for us.
> 
> Any tips would be appreciated,
> 
> --Royce
> 
> 
> test=# PREPARE test (integer) as 
>   select 
> sid, 
> role, 
> starttime::date, 
> nasid, importer, 
> max(eventbinding.biid) as biid, 
> sum(bytesin) as bytesin, 
> sum(bytesout) as bytesout,  
> sum(seconds) as seconds, 
> sum(coalesce(pages, 0)) as pages, 
> sum(coalesce(count, 0)) as count,  
> sum(coalesce(rate, 0.0)) as rate, 
> sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
> sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
> count(*) as entries  
>   from billingItem, eventBinding , fqun  
>   where eventBinding.biid > $1 and eventBinding.biid = billingItem.biid and 
> fqun.uid = eventBinding.uid  
>   group by sid, starttime::date, nasid, importer, role;
> PREPARE
> test=# explain EXECUTE test(57205899);
>   QUERY PLAN  
> 
> ---
>  GroupAggregate  (cost=12338998.78..13770049.38 rows=18465169 width=148)
>->  Sort  (cost=12338998.78..12385161.70 rows=18465169 width=148)
>  Sort Key: fqun.sid, ((billingitem.starttime)::date), 
> billingitem.nasid, billingitem.importer, eventbinding.role
>  ->  Hash Join  (cost=1498473.48..7333418.55 rows=18465169 width=148)
>Hash Cond: (eventbinding.uid = fqun.uid)
>->  Hash Join  (cost=1496916.06..6916394.83 rows=18465169 
> width=148)
>  Hash Cond: (billingitem.biid = eventbinding.biid)
>  ->  Seq Scan on billingitem  (cost=0.00..1433087.88 
> rows=56222688 width=142)
>  ->  Hash  (cost=1175939.45..1175939.45 rows=18465169 
> width=10)
>->  Bitmap Heap Scan on eventbinding  
> (cost=427409.84..1175939.45 rows=18465169 width=10)
>  Recheck Cond: (biid > $1)
>  ->  Bitmap Index Scan on 
> eventbinding_biid_uid_role_idx  (cost=0.00..422793.55 rows=18465169 width=0)
>Index Cond: (biid > $1)
>->  Hash  (cost=943.85..943.85 rows=49085 width=8)
>  ->  Seq Scan on fqun  (cost=0.00..943.85 rows=49085 
> width=8)
> (15 rows)
> 
> 
> 
> 
> As a query on the psql command line:
> 
> test=# explain 
>   select 
> sid, 
> role, 
> starttime::date, 
> nasid, 
> importer, 
> max(eventbinding.biid) as biid, 
> sum(bytesin) as bytesin, 
> sum(bytesout) as bytesout,  
> sum(seconds) as seconds, 
> sum(coalesce(pages, 0)) as pages, 
> sum(coalesce(count, 0)) as count,  
> sum(coalesce(rate, 0.0)) as rate, 
> sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
> sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
> count(*) as entries  
>   from billingItem, eventBinding , fqun  
>   where eventBinding.biid > 57205899 and eventBinding.b

[PERFORM] Prepared statements and suboptimal plans

2011-09-21 Thread Royce Ausburn
Hi all,

It looks like I've been hit with this well known issue.  I have a complicated 
query that is intended to run every few minutes, I'm using JDBC's 
Connection.prepareStatement() mostly for nice parameterisation, but postgres 
produces a suboptimal plan due to its lack of information when the statement is 
prepared.

I've been following the mailing list for a few years and I've seen this topic 
come up a bit.  I've just done a quick google and I'm not quite sure how to fix 
this short of manually substituting my query parameters into a query string -- 
avoiding prepared statements… An alternative might be to re-write the query and 
hope that the planner's general plan is a bit closer to optimal… but are these 
my only options?  

I notice that the non-prepared-statement (both below my sig) plan estimates 
5500 rows output.  I think that's out by a factor of up to 100, suggesting that 
I might want to increase my statistics and re-analyse… but as I understand the 
prepared-statement problem, this probably won't help here.  Correct?
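
For concreteness, the sort of thing I mean by increasing statistics, assuming 
the bad estimate is on eventbinding.biid and picking 1000 as an arbitrary 
target:

alter table eventbinding alter column biid set statistics 1000;
analyze eventbinding;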

We've been worst hit by this query on an 8.3 site.  Another site is running 
8.4.  Have there been improvements in this area recently?  Upgrading to 9.0 
might be viable for us.

Any tips would be appreciated,

--Royce


test=# PREPARE test (integer) as 
  select 
sid, 
role, 
starttime::date, 
nasid, importer, 
max(eventbinding.biid) as biid, 
sum(bytesin) as bytesin, 
sum(bytesout) as bytesout,  
sum(seconds) as seconds, 
sum(coalesce(pages, 0)) as pages, 
sum(coalesce(count, 0)) as count,  
sum(coalesce(rate, 0.0)) as rate, 
sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
count(*) as entries  
  from billingItem, eventBinding , fqun  
  where eventBinding.biid > $1 and eventBinding.biid = billingItem.biid and 
fqun.uid = eventBinding.uid  
  group by sid, starttime::date, nasid, importer, role;
PREPARE
test=# explain EXECUTE test(57205899);
  QUERY PLAN
  
---
 GroupAggregate  (cost=12338998.78..13770049.38 rows=18465169 width=148)
   ->  Sort  (cost=12338998.78..12385161.70 rows=18465169 width=148)
 Sort Key: fqun.sid, ((billingitem.starttime)::date), 
billingitem.nasid, billingitem.importer, eventbinding.role
 ->  Hash Join  (cost=1498473.48..7333418.55 rows=18465169 width=148)
   Hash Cond: (eventbinding.uid = fqun.uid)
   ->  Hash Join  (cost=1496916.06..6916394.83 rows=18465169 
width=148)
 Hash Cond: (billingitem.biid = eventbinding.biid)
 ->  Seq Scan on billingitem  (cost=0.00..1433087.88 
rows=56222688 width=142)
 ->  Hash  (cost=1175939.45..1175939.45 rows=18465169 
width=10)
   ->  Bitmap Heap Scan on eventbinding  
(cost=427409.84..1175939.45 rows=18465169 width=10)
 Recheck Cond: (biid > $1)
 ->  Bitmap Index Scan on 
eventbinding_biid_uid_role_idx  (cost=0.00..422793.55 rows=18465169 width=0)
   Index Cond: (biid > $1)
   ->  Hash  (cost=943.85..943.85 rows=49085 width=8)
 ->  Seq Scan on fqun  (cost=0.00..943.85 rows=49085 
width=8)
(15 rows)




As a query on the psql command line:

test=# explain 
  select 
sid, 
role, 
starttime::date, 
nasid, 
importer, 
max(eventbinding.biid) as biid, 
sum(bytesin) as bytesin, 
sum(bytesout) as bytesout,  
sum(seconds) as seconds, 
sum(coalesce(pages, 0)) as pages, 
sum(coalesce(count, 0)) as count,  
sum(coalesce(rate, 0.0)) as rate, 
sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
count(*) as entries  
  from billingItem, eventBinding , fqun  
  where eventBinding.biid > 57205899 and eventBinding.biid = billingItem.biid 
and fqun.uid = eventBinding.uid  
  group by sid, starttime::date, nasid, importer, role;
 QUERY PLAN 


 HashAggregate  (cost=102496.80..102704.55 rows=5540 width=148)
   ->  Hash Join  (cost=1697.13..102289.05 rows=5540 width=148)
 Hash Cond: (eventbinding.uid = fqun.uid)
 ->  Nested Loop  (cost=139.71..100606.99 rows=5540 width=148)
   ->  Bitmap Heap Scan on eventbinding  (cost=139.71..20547.20 
rows=5540 width=10)
 Recheck Cond: (biid > 57205899)
 ->  Bitmap Index Sc

Re: [PERFORM] Prepared statements and suboptimal plans

2011-09-20 Thread Royce Ausburn

On 21/09/2011, at 9:39 AM, Craig Ringer wrote:

> On 21/09/2011 7:27 AM, Royce Ausburn wrote:
>> Hi all,
>> 
>> It looks like I've been hit with this well known issue.  I have a 
>> complicated query that is intended to run every few minutes, I'm using 
>> JDBC's Connection.prepareStatement() mostly for nice parameterisation, but 
>> postgres produces a suboptimal plan due to its lack of information when the 
>> statement is prepared.
>> 
>> [snip]
>> 
>> We've been worst hit by this query on an 8.3 site.  Another site is running 
>> 8.4.  Have there been improvements in this area recently?  Upgrading to 9.0 
>> might be viable for us.
> 
> Tom just mentioned that 9.1 will be able to re-plan parameterized prepared 
> statements, so this issue will go away. In the mean time you can only really 
> use the standard workaround of setting the prepare threshold to 0 to disable 
> server-side prepare, so you can continue to use JDBC prepared statements and 
> have the driver do the parameter substitution for you.

Thanks Craig -- that trick helps a lot.  
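
For anyone finding this thread later: the knob Craig means is the driver's 
prepareThreshold, settable per connection on the JDBC URL or per statement via 
org.postgresql.PGStatement.setPrepareThreshold(0); worth double-checking the 
exact spelling against your driver version.  Something like:

jdbc:postgresql://dbhost:5432/mydb?prepareThreshold=0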

--Royce


[PERFORM] Prepared statements and suboptimal plans

2011-09-20 Thread Royce Ausburn
Hi all,

It looks like I've been hit with this well known issue.  I have a complicated 
query that is intended to run every few minutes, I'm using JDBC's 
Connection.prepareStatement() mostly for nice parameterisation, but postgres 
produces a suboptimal plan due to its lack of information when the statement is 
prepared.

I've been following the mailing list for a few years and I've seen this topic 
come up a bit.  I've just done a quick google and I'm not quite sure how to fix 
this short of manually substituting my query parameters into a query string -- 
avoiding prepared statements… An alternative might be to re-write the query and 
hope that the planner's general plan is a bit closer to optimal… but are these 
my only options?  

I notice that the non-prepared-statement (both below my sig) plan estimates 
5500 rows output.  I think that's out by a factor of up to 100, suggesting that 
I might want to increase my statistics and re-analyse… but as I understand the 
prepared-statement problem, this probably won't help here.  Correct?

We've been worst hit by this query on an 8.3 site.  Another site is running 
8.4.  Have there been improvements in this area recently?  Upgrading to 9.0 
might be viable for us.

Any tips would be appreciated,

--Royce


test=# PREPARE test (integer) as 
  select 
sid, 
role, 
starttime::date, 
nasid, importer, 
max(eventbinding.biid) as biid, 
sum(bytesin) as bytesin, 
sum(bytesout) as bytesout,  
sum(seconds) as seconds, 
sum(coalesce(pages, 0)) as pages, 
sum(coalesce(count, 0)) as count,  
sum(coalesce(rate, 0.0)) as rate, 
sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
count(*) as entries  
  from billingItem, eventBinding , fqun  
  where eventBinding.biid > $1 and eventBinding.biid = billingItem.biid and 
fqun.uid = eventBinding.uid  
  group by sid, starttime::date, nasid, importer, role;
PREPARE
test=# explain EXECUTE test(57205899);
  QUERY PLAN
  
---
 GroupAggregate  (cost=12338998.78..13770049.38 rows=18465169 width=148)
   ->  Sort  (cost=12338998.78..12385161.70 rows=18465169 width=148)
 Sort Key: fqun.sid, ((billingitem.starttime)::date), 
billingitem.nasid, billingitem.importer, eventbinding.role
 ->  Hash Join  (cost=1498473.48..7333418.55 rows=18465169 width=148)
   Hash Cond: (eventbinding.uid = fqun.uid)
   ->  Hash Join  (cost=1496916.06..6916394.83 rows=18465169 
width=148)
 Hash Cond: (billingitem.biid = eventbinding.biid)
 ->  Seq Scan on billingitem  (cost=0.00..1433087.88 
rows=56222688 width=142)
 ->  Hash  (cost=1175939.45..1175939.45 rows=18465169 
width=10)
   ->  Bitmap Heap Scan on eventbinding  
(cost=427409.84..1175939.45 rows=18465169 width=10)
 Recheck Cond: (biid > $1)
 ->  Bitmap Index Scan on 
eventbinding_biid_uid_role_idx  (cost=0.00..422793.55 rows=18465169 width=0)
   Index Cond: (biid > $1)
   ->  Hash  (cost=943.85..943.85 rows=49085 width=8)
 ->  Seq Scan on fqun  (cost=0.00..943.85 rows=49085 
width=8)
(15 rows)




As a query on the psql command line:

test=# explain 
  select 
sid, 
role, 
starttime::date, 
nasid, 
importer, 
max(eventbinding.biid) as biid, 
sum(bytesin) as bytesin, 
sum(bytesout) as bytesout,  
sum(seconds) as seconds, 
sum(coalesce(pages, 0)) as pages, 
sum(coalesce(count, 0)) as count,  
sum(coalesce(rate, 0.0)) as rate, 
sum(coalesce(bytesSentRate, 0.0)) as bytesSentRate,  
sum(coalesce(bytesReceivedRate, 0.0)) as bytesReceivedRate, 
count(*) as entries  
  from billingItem, eventBinding , fqun  
  where eventBinding.biid > 57205899 and eventBinding.biid = billingItem.biid 
and fqun.uid = eventBinding.uid  
  group by sid, starttime::date, nasid, importer, role;
 QUERY PLAN 


 HashAggregate  (cost=102496.80..102704.55 rows=5540 width=148)
   ->  Hash Join  (cost=1697.13..102289.05 rows=5540 width=148)
 Hash Cond: (eventbinding.uid = fqun.uid)
 ->  Nested Loop  (cost=139.71..100606.99 rows=5540 width=148)
   ->  Bitmap Heap Scan on eventbinding  (cost=139.71..20547.20 
rows=5540 width=10)
 Recheck Cond: (biid > 57205899)
 ->  Bitmap Index Sc

Re: [PERFORM] Which RAID Controllers to pick/avoid?

2011-02-06 Thread Royce Ausburn

> On Wed, Feb 2, 2011 at 7:00 PM, Craig Ringer
>  wrote:
>> Whatever RAID controller you get, make sure you have a battery backup
>> unit (BBU) installed so you can safely enable write-back caching.
>> Without that, you might as well use software RAID - it'll generally be
>> faster (and cheaper) than HW RAID w/o a BBU.
> 
> Recently we had to pull our RAID controllers and go to plain SAS
> cards.  While random access dropped a bit, sequential throughput
> skyrocketed, saturating the 4 lane cable we use.  4x300MB/s =
> 1200MB/s, or right around 1GB of data a second off the array.  VERY
> impressive.


This is really surprising.  Software RAID generally outperforms hardware RAID 
without a BBU?  Why is that?  My company uses hardware RAID quite a bit without 
a BBU and we've never thought to compare it with software RAID =/

Thanks!

--Royce


Re: [PERFORM] Auto-clustering?

2010-12-18 Thread Royce Ausburn

On 17/12/2010, at 9:20 PM, Pierre C wrote:

> 
>> fc=# explain  analyse select collection, period, tariff, sum(bytesSent), 
>> sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as 
>> startchunk from sample_20101001 where starttime between 1287493200 and 
>> 1290171599 and collection=128 and ip = '10.9.125.207' group by 
>> startchunk, tariff, collection, period;
> 
> If CLUSTER locks bother you, and you don't do UPDATEs, you might consider 
> doing something like this :
> 
> - accumulate the rows in a "recent" table
> - every hour, INSERT INTO archive SELECT * FROM recent ORDER BY (your cluster 
> fields)
> - DELETE FROM recent the rows you just inserted
> - VACUUM recent
> 
> The cluster in your archive table will not be perfect but at least all rows 
> from 1 source in 1 hour will be stored close together. But clustering doesn't 
> need to be perfect either, if you get 100x better locality, that's already 
> good !

That's a really decent idea and can slot in perfectly well with how the 
application already works!  We have existing DBAO code that handles monthly 
tables; it'll happily pop data into a "recent" table.  In fact we can probably 
tolerate having a "today" table.  Thanks!
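
A rough sketch of how the hourly rotation Pierre describes might look, assuming 
hypothetical tables sample_recent and sample_archive with identical definitions, 
and using the sample primary key so that rows arriving mid-copy aren't lost:

begin;
-- snapshot the rows to move so concurrent inserts are left alone
create temporary table batch as select * from sample_recent;
insert into sample_archive
    select * from batch
    order by collection, starttime;  -- the intended cluster order
delete from sample_recent
    where sample in (select sample from batch);
drop table batch;
commit;
vacuum sample_recent;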

--Royce





Re: [PERFORM] Auto-clustering?

2010-12-18 Thread Royce Ausburn

On 17/12/2010, at 8:27 PM, Filip Rembiałkowski wrote:

> 
> 2010/12/17 Royce Ausburn 
> Hi all,
> 
> I have a table that in the typical case holds two minute sample data for a 
> few thousand sources.  Often we need to report on these data for a particular 
> source over a particular time period and we're finding this query tends to 
> get a bit slow.
> 
> 
> how about (auto)vacuuming?

A key piece of information I left out: we almost never update rows in this 
table.

>  
> 
> I figure at most there should only be ~20,000 rows to be read from disk, and 
> I expect that the index is doing a pretty good job of making sure only the 
> rows that need reading are read. inclusion of the ip in the query is almost 
> redundant as most of the time an ip has its own collection  My suspicion 
> is that the rows that we're interested in are very sparsely distributed on 
> disk, so we're having to read too many pages for the query...
> 
> 
> you can test this suspicion in very simple way:
> - create test table (like yours including indexes including constraints, but 
> with no data)
> - insert into test select * from yours order by 
> - analyze the test table
> - test the query on the new table
> 
> If new query is much faster, and if you have intensive random UPD/DEL/INS 
> activity, periodic CLUSTER could be a good idea... 
> but it depends on actual usage patterns (SELECT/modify ratio, types of 
> updates, and so on).

Good idea!  This vastly improves query times.
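
For the archives, roughly what that test looks like; a sketch against the 
sample_20101001 table from my earlier mail, with a hypothetical test table name 
(the real indexes would need to be recreated to match whatever the queries use):

create table sample_test (like sample_20101001);
insert into sample_test
    select * from sample_20101001
    order by collection, starttime;  -- group each collection together on disk
create index sample_test_collection_starttime_idx
    on sample_test (collection, starttime);
analyze sample_test;
-- then re-run the reporting query against sample_test and compare timings
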
> 
> 
> and finally, you did not specify what PostgreSQL version are you using.

In the case I've been working with it's 8.1 =(  But we have a few instances of 
this database... I believe the rest are a mixture of 8.4s and they all have the 
same problem.

--Royce

[PERFORM] Auto-clustering?

2010-12-17 Thread Royce Ausburn
Hi all,

I have a table that in the typical case holds two minute sample data for a few 
thousand sources.  Often we need to report on these data for a particular 
source over a particular time period and we're finding this query tends to get 
a bit slow.

The structure of the table:

 Table "public.sample"
  Column   |   Type   |Modifiers

---+--+-
 client| integer  | not null
 aggregateid   | bigint   | not null
 sample| bigint   | not null default 
nextval('samplekey'::regclass)
 customer  | integer  | 
 period| integer  | not null
 starttime | integer  | not null
 duration  | integer  | not null
 ip| text | 
 tariff| integer  | 
 bytessentrate | bigint   | 
 bytessent | bigint   | 
 bytesreceived | bigint   | 
 packets   | integer  | not null
 queuetype | integer  | not null default 0
 collection| integer  | 
 bytesreceivedrate | bigint   | 
 greatestrate  | bigint   | 
 invalidated   | timestamp with time zone | 
Indexes:
"sample_pkey" PRIMARY KEY, btree (sample)
"sample_collection_starttime_idx" btree (collection, starttime)
"sample_customer_starttime_idx" btree (customer, starttime)
"sample_sample_idx" btree (client, sample)
Foreign-key constraints:
"sample_client_fkey" FOREIGN KEY (client) REFERENCES client(client)


fc=# explain  analyse select collection, period, tariff, sum(bytesSent), 
sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as 
startchunk from sample_20101001 where starttime between 1287493200 and 
1290171599 and collection=128 and ip = '10.9.125.207' group by startchunk, 
tariff, collection, period; 
QUERY PLAN  

-
 HashAggregate  (cost=34959.01..34959.03 rows=1 width=44) (actual 
time=67047.850..67047.850 rows=0 loops=1)
   ->  Bitmap Heap Scan on sample_20101001  (cost=130.56..34958.91 rows=5 
width=44) (actual time=67047.847..67047.847 rows=0 loops=1)
 Recheck Cond: ((collection = 128) AND (starttime >= 1287493200) AND 
(starttime <= 1290171599))
 Filter: (ip = '10.9.125.207'::text)
 ->  Bitmap Index Scan on sample_20101001_collection_starttime_idx  
(cost=0.00..130.56 rows=9596 width=0) (actual time=9806.115..9806.115 rows=6830 
loops=1)
   Index Cond: ((collection = 128) AND (starttime >= 1287493200) 
AND (starttime <= 1290171599))
 Total runtime: 67048.201 ms
(7 rows)


I figure at most there should only be ~20,000 rows to be read from disk, and I 
expect that the index is doing a pretty good job of making sure only the rows 
that need reading are read.  Inclusion of the ip in the query is almost 
redundant, as most of the time an ip has its own collection.  My suspicion is 
that the rows that we're interested in are very sparsely distributed on disk, 
so we're having to read too many pages for the query...

All of the queries on this table are reporting on a single collection, so 
ideally a collection's data would all be stored in the same part of the disk... 
or at least clumped together.  This can be achieved using CLUSTER; however, as 
far as I know there's no automated, non-cronesque means of clustering, and 
having the table become unusable during the cluster is not ideal.
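
Concretely that would be something like the following; the USING form is 8.3+ 
syntax (older releases spell it CLUSTER indexname ON tablename), and the index 
name is the one from the plan above:

cluster sample_20101001 using sample_20101001_collection_starttime_idx;
analyze sample_20101001;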

I've considered partitioning, but I don't think that's going to give the effect 
I need.  Apparently partitioning only scales to a few dozen child tables, so 
that's only going to buy one order of magnitude of improvement for significant 
complexity.

Are there any other options?

Cheers!

--Royce



Re: [PERFORM] CPU bound

2010-12-14 Thread Royce Ausburn
Thanks guys - interesting.  


On 14/12/2010, at 5:59 AM, Josh Berkus wrote:

> On 12/12/10 6:43 PM, Royce Ausburn wrote:
>> Hi all,
>> 
>> I notice that when restoring a DB on a laptop with an SSD, typically 
>> postgres is maxing out a CPU - even during a COPY.  I wonder, what is 
>> postgres usually doing with the CPU?  I would have thought the disk would 
>> usually be the bottleneck in the DB, but occasionally it's not.  We're 
>> embarking on a new DB server project and it'd be helpful to understand where 
>> the CPU is likely to be the bottleneck.
> 
> That's pretty normal; as soon as you get decent disk, especially
> something like an SSD with a RAM cache, you become CPU-bound.  COPY does
> a LOT of parsing and data manipulation.  Index building, of course, is
> almost pure CPU if you have a decent amount of RAM available.
> 
> If you're restoring from a pg_dump file, and have several cores
> available, I suggest using parallel pg_restore.
> 
> 
> -- 
>  -- Josh Berkus
> PostgreSQL Experts Inc.
> http://www.pgexperts.com
> 




[PERFORM] CPU bound

2010-12-12 Thread Royce Ausburn
Hi all,

I notice that when restoring a DB on a laptop with an SSD, typically postgres 
is maxing out a CPU - even during a COPY.  I wonder, what is postgres usually 
doing with the CPU?  I would have thought the disk would usually be the 
bottleneck in the DB, but occasionally it's not.  We're embarking on a new DB 
server project and it'd be helpful to understand where the CPU is likely to be 
the bottleneck.

Cheers,

--Royce

