Re: [PERFORM] Why creating GIN table index is so slow than inserting data into empty table with the same index?

2009-03-24 Thread Heikki Linnakangas

Tom Lane wrote:

Sergey Burladyan eshkin...@gmail.com writes:

show maintenance_work_mem ;
 maintenance_work_mem
--
 128MB



create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
insert into a select n, n, n, n, n, n from generate_series(1, 10) as n;
create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );


[ takes forever ]

Seems the reason this is so awful is that the incoming data is exactly
presorted, meaning that the tree structure that ginInsertEntry() is
trying to build degenerates to a linear list (ie, every new element
becomes the right child of the prior one).  So the processing becomes
O(N^2) up till you reach maintenance_work_mem and flush the tree.  With
a large setting for maintenance_work_mem it gets spectacularly slow.


Yes, this is probably the same issue I bumped into a while ago:

http://archives.postgresql.org/message-id/49350a13.3020...@enterprisedb.com


I think a reasonable solution for this might be to keep an eye on
maxdepth and force a flush if that gets too large (more than a few
hundred, perhaps?).  Something like this:

/* If we've maxed out our available memory, dump everything to the index */
+   /* Also dump if the tree seems to be getting too unbalanced */
-   if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
+   if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
+       buildstate->accum.maxdepth > DEPTH_LIMIT)
{

The new fast-insert code likely will need a similar defense.


I fooled around with a balanced tree, which solved the problem but 
unfortunately made the unsorted case slower. Limiting the depth like 
that should avoid that problem, so it's worth trying.
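
Until something along those lines is committed, a workaround that follows 
directly from Tom's analysis is to build the index with a deliberately 
small maintenance_work_mem, so the accumulator tree is flushed before it 
degenerates too far. A rough sketch against the test case above (the 16MB 
value is only illustrative, not a tested recommendation):

-- Workaround sketch: shrink maintenance_work_mem for this session only,
-- forcing earlier flushes of the GIN build accumulator on presorted input.
BEGIN;
SET LOCAL maintenance_work_mem = '16MB';   -- illustrative value
CREATE INDEX arr_gin ON a USING gin ( (ARRAY[i1, i2, i3, i4, i5, i6]) );
COMMIT;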


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Joe Uhl


On Mar 20, 2009, at 4:58 PM, Scott Marlowe wrote:


On Fri, Mar 20, 2009 at 2:49 PM, Joe Uhl joe...@gmail.com wrote:


On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote:



What does the cs entry on vmstat say at this time?  If your cs is
skyrocketing then you're getting a context switch storm, which is
usually a sign that there are just too many things going on at once /
you've got an old kernel, things like that.


cs column (plus cpu columns) of vmstat 1 30 reads as follows:

cs    us  sy  id  wa
11172 95  4  1  0
12498 94  5  1  0
14121 91  7  1  1
11310 90  7  1  1
12918 92  6  1  1
10613 93  6  1  1
9382  94  4  1  1
14023 89  8  2  1
10138 92  6  1  1
11932 94  4  1  1
15948 93  5  2  1
12919 92  5  3  1
10879 93  4  2  1
14014 94  5  1  1
9083  92  6  2  0
11178 94  4  2  0
10717 94  5  1  0
9279  97  2  1  0
12673 94  5  1  0
8058  82 17  1  1
8150  94  5  1  1
11334 93  6  0  0
13884 91  8  1  0
10159 92  7  0  0
9382  96  4  0  0
11450 95  4  1  0
11947 96  3  1  0
8616  95  4  1  0
10717 95  3  1  0

We are running on the 2.6.28.7-2 kernel.  I am unfamiliar with vmstat 
output, but reading the man page (and that cs = context switches per 
second) makes my numbers seem very high.


No, those aren't really all that high.  If you were hitting cs
contention, I'd expect it to be in the 25k to 100k range.  10k
average under load is pretty reasonable.

Our JDBC pools combined currently top out at 400 connections (and we 
are doing work on all 400 right now).  I may try dropping those pools 
down even smaller.  Are there any general rules of thumb for figuring 
out how many connections you should service at maximum?  I know of the 
memory constraints, but I'm thinking more along the lines of 
connections per CPU core.


Well, maximum efficiency is usually somewhere in the range of 1 to 2
times the number of cores you have, so trying to get the pool down to
a dozen or two connections would be the direction to generally head.
May not be reasonable or doable though.
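
A quick way to see how many of those 400 pooled connections are actually 
doing work at any instant is to query pg_stat_activity; a sketch using the 
column names from 8.x-era pg_stat_activity (idle backends report '<IDLE>' 
there):

-- Rough count of busy vs. idle backends
SELECT (current_query <> '<IDLE>') AS busy, count(*)
FROM pg_stat_activity
GROUP BY 1;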


Turns out we may have an opportunity to purchase a new database server  
with this increased load.  Seems that the best route, based on  
feedback to this thread, is to go whitebox, get quad opterons, and get  
a very good disk controller.


Can anyone recommend a whitebox vendor?

Is there a current controller anyone on this list has experience with  
that they could recommend?


This will be a bigger purchase so will be doing research and  
benchmarking but any general pointers to a vendor/controller greatly  
appreciated.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Greg Smith

On Tue, 24 Mar 2009, Joe Uhl wrote:


Can anyone recommend a whitebox vendor?


I dumped a list of recommended vendors from a discussion here a while back 
at http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks you could get 
started with.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Scott Marlowe
On Tue, Mar 24, 2009 at 1:29 PM, Greg Smith gsm...@gregsmith.com wrote:
 On Tue, 24 Mar 2009, Joe Uhl wrote:

 Can anyone recommend a whitebox vendor?

 I dumped a list of recommended vendors from a discussion here a while back
 at http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks you could get
 started with.

I'd add Aberdeen Inc to that list.  They supply quality white box
servers with 3ware, areca, or LSI controllers, and provide a 5 year
all inclusive warranty.  Their customer service is top notch too.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan

2009-03-24 Thread Bryan Murphy
I hate to nag, but could anybody help me with this?  We have a few
related queries that are causing noticeable service delays in our
production system.  I've tried a number of different things, but I'm
running out of ideas and don't know what to do next.

Thanks,
Bryan

On Mon, Mar 23, 2009 at 2:03 PM, Bryan Murphy bmurphy1...@gmail.com wrote:
 Hey Guys,

 I've got a query on our production system that isn't choosing a good
 plan.  I can't see why it's choosing to do a sequential scan on the
 ItemExperienceLog table.  That table is about 800mb and has about 2.5
 million records.  This example query only returns 4 records.  I've
 tried upping the statistics for ItemExperienceLog.VisitorId and
 ItemExperienceLog.ItemId to 1000 (from our default of 100) with no
 success.

 Our primary keys are guids stored as char(32) not null.  Our database
 uses UTF-8 encoding and is currently version v8.3.5.

 The queries:



 --SET enable_seqscan = off
 --SET enable_seqscan = on

 --ALTER TABLE ItemExperienceLog ALTER COLUMN VisitorId SET STATISTICS 1000
 --ALTER TABLE ItemExperienceLog ALTER COLUMN ItemId SET STATISTICS 1000
 --ANALYZE ItemExperienceLog

 SELECT MAX(l.Id) as Id, l.ItemId
 FROM ItemExperienceLog l
 INNER JOIN Items_Primary p ON p.Id = l.ItemId
 INNER JOIN Feeds f ON f.Id = p.FeedId
 INNER JOIN Visitors v ON v.Id = l.VisitorId
 WHERE
        v.UserId = 'fbe2537f21d94f519605612c0bf7c2c5'
        AND LOWER(f.Slug) = LOWER('Wealth_Building_by_NightingaleConant')
 GROUP BY l.ItemId



 Explain verbose output (set enable_seqscan = on):



 HashAggregate  (cost=124392.54..124392.65 rows=9 width=37) (actual
 time=7765.650..7765.654 rows=4 loops=1)
   ->  Nested Loop  (cost=2417.68..124392.49 rows=9 width=37) (actual
 time=1706.703..7765.611 rows=11 loops=1)
         ->  Nested Loop  (cost=2417.68..123868.75 rows=1807 width=70)
 (actual time=36.374..7706.677 rows=3174 loops=1)
               ->  Hash Join  (cost=2417.68..119679.50 rows=1807
 width=37) (actual time=36.319..7602.221 rows=3174 loops=1)
                     Hash Cond: (l.visitorid = v.id)
                     ->  Seq Scan on itemexperiencelog l
 (cost=0.00..107563.09 rows=2581509 width=70) (actual
 time=0.010..4191.251 rows=2579880 loops=1)
                     ->  Hash  (cost=2401.43..2401.43 rows=1300
 width=33) (actual time=3.673..3.673 rows=897 loops=1)
                           ->  Bitmap Heap Scan on visitors v
 (cost=22.48..2401.43 rows=1300 width=33) (actual time=0.448..2.523
 rows=897 loops=1)
                                 Recheck Cond: (userid =
 'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
                                 ->  Bitmap Index Scan on
 visitors_userid_index2  (cost=0.00..22.16 rows=1300 width=0) (actual
 time=0.322..0.322 rows=897 loops=1)
                                       Index Cond: (userid =
 'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
               ->  Index Scan using items_primary_pkey on items_primary
 p  (cost=0.00..2.31 rows=1 width=66) (actual time=0.027..0.029 rows=1
 loops=3174)
                     Index Cond: (p.id = l.itemid)
         ->  Index Scan using feeds_pkey on feeds f  (cost=0.00..0.28
 rows=1 width=33) (actual time=0.016..0.016 rows=0 loops=3174)
               Index Cond: (f.id = p.feedid)
               Filter: (lower((f.slug)::text) =
 'wealth_building_by_nightingaleconant'::text)
 Total runtime: 7765.767 ms



 Explain verbose output (set enable_seqscan = off):



 HashAggregate  (cost=185274.71..185274.82 rows=9 width=37) (actual
 time=185.024..185.028 rows=4 loops=1)
   ->  Nested Loop  (cost=0.00..185274.67 rows=9 width=37) (actual
 time=0.252..184.989 rows=11 loops=1)
         ->  Nested Loop  (cost=0.00..184751.21 rows=1806 width=70)
 (actual time=0.223..134.943 rows=3174 loops=1)
               ->  Nested Loop  (cost=0.00..180564.28 rows=1806
 width=37) (actual time=0.192..60.214 rows=3174 loops=1)
                     ->  Index Scan using visitors_userid_index2 on
 visitors v  (cost=0.00..2580.97 rows=1300 width=33) (actual
 time=0.052..2.342 rows=897 loops=1)
                           Index Cond: (userid =
 'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
                     ->  Index Scan using
 itemexperiencelog__index__visitorid on itemexperiencelog l
 (cost=0.00..134.04 rows=230 width=70) (actual time=0.013..0.040 rows=4
 loops=897)
                           Index Cond: (l.visitorid = v.id)
               ->  Index Scan using items_primary_pkey on items_primary
 p  (cost=0.00..2.31 rows=1 width=66) (actual time=0.019..0.020 rows=1
 loops=3174)
                     Index Cond: (p.id = l.itemid)
         ->  Index Scan using feeds_pkey on feeds f  (cost=0.00..0.28
 rows=1 width=33) (actual time=0.014..0.014 rows=0 loops=3174)
               Index Cond: (f.id = p.feedid)
               Filter: (lower((f.slug)::text) =
 'wealth_building_by_nightingaleconant'::text)
 Total runtime: 185.117 ms



 The relevent portions of postgresql.conf:


 max_connections = 100
 shared_buffers = 2GB
 temp_buffers = 32MB
 work_mem = 64MB
 

Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Ron

At 02:47 PM 3/24/2009, Joe Uhl wrote:

Turns out we may have an opportunity to purchase a new database 
server with this increased load.  Seems that the best route, based 
on feedback to this thread, is to go whitebox, get quad opterons, 
and get a very good disk controller.


Can anyone recommend a whitebox vendor?

I'll 2nd the Aberdeen recommendation.  I'll add Pogolinux to that list as well.


Is there a current controller anyone on this list has experience 
with that they could recommend?
The 2 best performing RAID controller vendors at this time are AMCC 
(AKA 3Ware) and Areca.
In general, the 8+ port Areca's with their BB cache maxed outperform 
every other controller available.



This will be a bigger purchase so will be doing research and 
benchmarking but any general pointers to a vendor/controller greatly 
appreciated.


Be =very= careful to thoroughly bench both the AMD and Intel CPU 
options.  It is far from clear which is the better purchase.


I'd be very interested to see the results of your research and 
benchmarks posted here on pgsql-performance.


Ron Peacetree 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Scott Marlowe
On Tue, Mar 24, 2009 at 4:58 PM, Ron rjpe...@earthlink.net wrote:
 At 02:47 PM 3/24/2009, Joe Uhl wrote:

 Turns out we may have an opportunity to purchase a new database server
 with this increased load.  Seems that the best route, based on feedback to
 this thread, is to go whitebox, get quad opterons, and get a very good disk
 controller.

 Can anyone recommend a whitebox vendor?

 I'll 2nd the Aberdeen recommendation.  I'll add Pogolinux to that list as
 well.


 Is there a current controller anyone on this list has experience with that
 they could recommend?

 The 2 best performing RAID controller vendors at this time are AMCC (AKA
 3Ware) and Areca.
 In general, the 8+ port Areca's with their BB cache maxed outperform every
 other controller available.


 This will be a bigger purchase so will be doing research and benchmarking
 but any general pointers to a vendor/controller greatly appreciated.

 Be =very= careful to thoroughly bench both the AMD and Intel CPU options.
  It is far from clear which is the better purchase.

My anecdotal experience has been that the Opterons stay afloat longer
as load increases, but I haven't had machines with similar enough
hardware to really test that.

 I'd be very interested to see the results of your research and benchmarks
 posted here on pgsql-performance.

Me too.  I'm gonna spend some time this summer benchmarking and tuning
the database servers that I pretty much had to burn in and put in
production this year due to time pressures.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Scott Carey

On 3/24/09 4:16 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Tue, Mar 24, 2009 at 4:58 PM, Ron rjpe...@earthlink.net wrote:
 At 02:47 PM 3/24/2009, Joe Uhl wrote:
 
 Turns out we may have an opportunity to purchase a new database server
 with this increased load.  Seems that the best route, based on feedback to
 this thread, is to go whitebox, get quad opterons, and get a very good disk
 controller.
 
 Can anyone recommend a whitebox vendor?
 
 I'll 2nd the Aberdeen recommendation.  I'll add Pogolinux to that list as
 well.
 
 
 Is there a current controller anyone on this list has experience with that
 they could recommend?
 
 The 2 best performing RAID controller vendors at this time are AMCC (AKA
 3Ware) and Areca.
 In general, the 8+ port Areca's with their BB cache maxed outperform every
 other controller available.

I personally have had rather bad performance experiences with 3Ware
9550/9650 SATA cards.  I have no experience with the AMCC SAS stuff though.
Adaptec demolished the 9650 on arrays larger than 4 drives, and Areca will
do better at the very high end.

However, if CPU is the issue for this particular case, then the RAID
controller details are less significant.

I don't know how much data you have, but don't forget the option of SSDs, or
a mix of hard drives and SSDs for different data.  Ideally, you would want
the OS to just extend its pagecache onto a SSD, but only OpenSolaris can do
that right now and it is rather new (needs to be persistent across reboots).

http://blogs.sun.com/brendan/entry/test
http://blogs.sun.com/brendan/entry/l2arc_screenshots
 

 
 
 This will be a bigger purchase so will be doing research and benchmarking
 but any general pointers to a vendor/controller greatly appreciated.
 
 Be =very= careful to thoroughly bench both the AMD and Intel CPU options.
  It is far from clear which is the better purchase.
 
 My anecdotal experience has been that the Opterons stay afloat longer
 as load increases, but I haven't had machines with similar enough
 hardware to really test that.
 

One may want to note that Intel's next generation servers are due out within
45 days from what I can sense ('Q2' traditionally means ~April 1 for Intel
when on time).  These should be a rather significant bump for a database as
they adopt the AMD / Alpha style memory-controller-on-CPU architecture and
add a lot of cache.  Other relevant improvements:  increased performance on
compare-and-swap operations, the return of hyper threading, and ridiculous
memory bandwidth per CPU (3 DDR3 memory channels per CPU).

 I'd be very interested to see the results of your research and benchmarks
 posted here on pgsql-performance.
 
 Me too.  I'm gonna spend some time this summer benchmarking and tuning
 the database servers that I pretty much had to burn in and put in
 production this year due to time pressures.
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Raid 10 chunksize

2009-03-24 Thread Mark Kirkwood
I'm trying to pin down some performance issues with a machine where I 
work; we are seeing (read only) query response times blow out by an 
order of magnitude or more at busy times. Initially we blamed 
autovacuum, but after a tweak of the cost_delay it is *not* the 
problem. Then I looked at checkpoints... and although there was some 
correlation with them and the query response, I'm thinking that the 
raid chunksize may well be the issue.
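
One cheap way to double-check the checkpoint angle is to snapshot the 
background-writer counters before and after a busy window and see whether 
checkpoint writes line up with the slow periods. A sketch against 8.3's 
pg_stat_bgwriter view (diagnostic only, not a fix):

-- Take one snapshot before and one after a busy period and compare;
-- large jumps in buffers_checkpoint during slow spells implicate checkpoints.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;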


Fortunately there is an identical DR box, so I could do a little 
testing. Details follow:


Sun 4140 2x quad-core opteron 2356 16G RAM,  6x 15K 140G SAS
Debian Lenny
Pg 8.3.6

The disk is laid out using software (md) raid:

4 drives raid 10 *4K* chunksize with database files (ext3 ordered, noatime)
2 drives raid 1 with database transaction logs (ext3 ordered, noatime)

The relevant non default .conf params are:

shared_buffers = 2048MB   
work_mem = 4MB   
maintenance_work_mem = 1024MB   
max_fsm_pages = 153600   
bgwriter_lru_maxpages = 200   
wal_buffers = 2MB   
checkpoint_segments = 32  
effective_cache_size = 4096MB
autovacuum_vacuum_scale_factor = 0.1   
autovacuum_vacuum_cost_delay = 60# This is high, but seemed to help...


I've run pgbench:

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 655.335102 (including connections establishing)
tps = 655.423232 (excluding connections establishing)


Looking at iostat while it is running shows (note sda-sdd raid10, sde 
and sdf raid 1):


Device:  rrqm/s   wrqm/s   r/s    w/s      rMB/s   wMB/s   avgrq-sz  avgqu-sz  await   svctm  %util
sda      0.00     56.80    0.00   579.00   0.00    2.47    8.74      133.76    235.10  1.73   100.00
sdb      0.00     45.60    0.00   583.60   0.00    2.45    8.59      52.65     90.03   1.71   100.00
sdc      0.00     49.00    0.00   579.80   0.00    2.45    8.66      72.56     125.09  1.72   100.00
sdd      0.00     58.40    0.00   565.00   0.00    2.42    8.79      135.31    235.52  1.77   100.00
sde      0.00     0.00     0.00   0.00     0.00    0.00    0.00      0.00      0.00    0.00   0.00
sdf      0.00     0.00     0.00   0.00     0.00    0.00    0.00      0.00      0.00    0.00   0.00

Device:  rrqm/s   wrqm/s   r/s    w/s      rMB/s   wMB/s   avgrq-sz  avgqu-sz  await   svctm  %util
sda      0.00     12.80    0.00   23.40    0.00    0.15    12.85     3.04      103.38  4.27   10.00
sdb      0.00     12.80    0.00   22.80    0.00    0.14    12.77     2.31      73.51   3.58   8.16
sdc      0.00     12.80    0.00   21.40    0.00    0.13    12.86     2.38      79.21   3.63   7.76
sdd      0.00     12.80    0.00   21.80    0.00    0.14    12.70     2.66      90.02   3.93   8.56
sde      0.00     2546.80  0.00   146.80   0.00    10.53   146.94    0.97      6.38    5.34   78.40
sdf      0.00     2546.80  0.00   146.60   0.00    10.53   147.05    0.97      6.38    5.53   81.04

Device:  rrqm/s   wrqm/s   r/s    w/s      rMB/s   wMB/s   avgrq-sz  avgqu-sz  await   svctm  %util
sda      0.00     231.40   0.00   566.80   0.00    3.16    11.41     124.92    228.26  1.76   99.52
sdb      0.00     223.00   0.00   558.00   0.00    3.06    11.23     46.64     83.55   1.70   94.88
sdc      0.00     230.60   0.00   551.60   0.00    3.07    11.40     94.38     171.54  1.76   96.96
sdd      0.00     231.40   0.00   528.60   0.00    2.94    11.37     122.55    220.81  1.83   96.48
sde      0.00     1495.80  0.00   99.00    0.00    6.23    128.86    0.81      8.15    7.76   76.80
sdf      0.00     1495.80  0.00   99.20    0.00    6.26    129.24    0.73      7.40    7.10   70.48


Top looks like:

Cpu(s):  2.5%us,  1.9%sy,  0.0%ni, 71.9%id, 23.4%wa,  0.2%hi,  0.2%si,  
0.0%st

Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
Swap:  2104440k total,  944k used,  2103496k free, 13552720k cached

It looks to me like we are maxing out the raid 10 array, and I suspect 
the chunksize (4K) is the culprit. However as this is a pest to change 
(!) I'd like some opinions on whether I'm jumping to conclusions. I'd 
also appreciate comments about what chunksize to use (I've tended to use 
256K in the past, but what are folks preferring these days?)


regards

Mark



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Raid 10 chunksize

2009-03-24 Thread Scott Carey

On 3/24/09 6:09 PM, Mark Kirkwood mar...@paradise.net.nz wrote:

 I'm trying to pin down some performance issues with a machine where I
 work, we are seeing (read only) query response times blow out by an
 order of magnitude or more at busy times. Initially we blamed
 autovacuum, but after a  tweak of the cost_delay it is *not* the
 problem. Then I looked at checkpoints... and altho there was some
 correlation with them and the query response - I'm thinking that the
 raid chunksize may well be the issue.
 
 Fortunately there is an identical DR box, so I could do a little
 testing. Details follow:
 
 Sun 4140 2x quad-core opteron 2356 16G RAM,  6x 15K 140G SAS
 Debian Lenny
 Pg 8.3.6
 
 The disk is laid out using software (md) raid:
 
 4 drives raid 10 *4K* chunksize with database files (ext3 ordered, noatime)
 2 drives raid 1 with database transaction logs (ext3 ordered, noatime)
 

 
 Top looks like:
 
 Cpu(s):  2.5%us,  1.9%sy,  0.0%ni, 71.9%id, 23.4%wa,  0.2%hi,  0.2%si,
 0.0%st
 Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
 Swap:  2104440k total,  944k used,  2103496k free, 13552720k cached
 
 It looks to me like we are maxing out the raid 10 array, and I suspect
 the chunksize (4K) is the culprit. However as this is a pest to change
 (!) I'd like some opinions on whether I'm jumping to conclusions. I'd
 also appreciate comments about what chunksize to use (I've tended to use
 256K in the past, but what are folks preferring these days?)
 
 regards
 
 Mark
 
 

md tends to work great at 1MB chunk sizes with RAID 1 or 10 for whatever
reason.  Unlike a hardware raid card, smaller chunks aren't going to help
random i/o as it won't read the whole 1MB or bother caching much.  Make sure
any partitions built on top of md are 1MB aligned if you go that route.
Random I/O on files smaller than 1MB would be affected -- but that's not a
problem on a 16GB RAM server running a database that won't fit in RAM.

Your xlogs are occasionally close to max usage too -- which is suspicious at
10MB/sec.  There is no reason for them to be on ext3 since they are a
transaction log that syncs writes, so file system journaling doesn't mean
anything.  Ext2 there will lower the sync times and reduce i/o utilization.

I also tend to use xfs if sequential access is important at all (obviously
not so in pg_bench).  ext3 is slightly safer in a power failure with unsynced
data, but Postgres has that covered with its own journal anyway so those
differences are irrelevant.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Raid 10 chunksize

2009-03-24 Thread David Rees
On Tue, Mar 24, 2009 at 6:48 PM, Scott Carey sc...@richrelevance.com wrote:
 Your xlogs are occasionally close to max usage too -- which is suspicious at
 10MB/sec.  There is no reason for them to be on ext3 since they are a
 transaction log that syncs writes so file system journaling doesn't mean
 anything.  Ext2 there will lower the sync times and reduced i/o utilization.

I would tend to recommend ext3 in data=writeback and make sure that
it's mounted with noatime over using ext2 - for the sole reason that
if the system shuts down unexpectedly, you don't have to worry about a
long fsck when bringing it back up.

Performance between the two filesystems should really be negligible
for Postgres logging.

-Dave

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Raid 10 chunksize

2009-03-24 Thread Scott Marlowe
On Tue, Mar 24, 2009 at 7:09 PM, Mark Kirkwood mar...@paradise.net.nz wrote:
 I'm trying to pin down some performance issues with a machine where I work,
 we are seeing (read only) query response times blow out by an order of
 magnitude or more at busy times. Initially we blamed autovacuum, but after a
  tweak of the cost_delay it is *not* the problem. Then I looked at
 checkpoints... and altho there was some correlation with them and the query
 response - I'm thinking that the raid chunksize may well be the issue.

Sounds to me like you're mostly just running out of bandwidth on your
RAID array.  Whether or not you can tune it to run faster is the real
issue.  This problem becomes worse as you add clients and the RAID
array starts to thrash.  Thrashing is likely to be worse with a small
chunk size, so that's definitely worth a look at fixing.

 Fortunately there is an identical DR box, so I could do a little testing.

Can you try changing the chunksize on the test box you're testing on
to see if that helps?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan

2009-03-24 Thread Josh Berkus

Bryan,


I hate to nag, but could anybody help me with this?  We have a few
related queries that are causing noticeable service delays in our
production system.  I've tried a number of different things, but I'm
running out of ideas and don't know what to do next.


For some reason, your first post didn't make it to the list, which is 
why nobody responded.




I've got a query on our production system that isn't choosing a good
plan.  I can't see why it's choosing to do a sequential scan on the
ItemExperienceLog table.  That table is about 800mb and has about 2.5
million records.  This example query only returns 4 records.  I've
tried upping the statistics for ItemExperienceLog.VisitorId and
ItemExperienceLog.ItemId to 1000 (from our default of 100) with no
success.


Yes, that is kind of inexplicable.  For some reason, it's assigning a 
very high cost to the nestloops, which is why it wants to avoid them 
with a seq scan.  Can you try lowering cpu_index_cost to 0.001 and see 
how that affects the plan?


--Josh

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan

2009-03-24 Thread marcin mank
There is one thing I don`t understand:

              ->  Nested Loop  (cost=0.00..180564.28 rows=1806
width=37) (actual time=0.192..60.214 rows=3174 loops=1)
                    ->  Index Scan using visitors_userid_index2 on
visitors v  (cost=0.00..2580.97 rows=1300 width=33) (actual
time=0.052..2.342 rows=897 loops=1)
                          Index Cond: (userid =
'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
                    ->  Index Scan using
itemexperiencelog__index__visitorid on itemexperiencelog l
(cost=0.00..134.04 rows=230 width=70) (actual time=0.013..0.040 rows=4
loops=897)
                          Index Cond: (l.visitorid = v.id)

If it expects 1300 visitors with the userid, and for each of them to
have 230 entries in itemexperiencelog, how can it come up with 1806
returned rows (and be about right!)?

Greetings
Marcin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Raid 10 chunksize

2009-03-24 Thread Greg Smith

On Tue, 24 Mar 2009, David Rees wrote:


I would tend to recommend ext3 in data=writeback and make sure that
it's mounted with noatime over using ext2 - for the sole reason that
if the system shuts down unexpectedly, you don't have to worry about a
long fsck when bringing it back up.


Well, Mark's system is already using noatime, and if you believe 
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/ 
there's little difference between writeback and ordered on the WAL disk. 
Might squeeze out some improvements with ext2 though, and if there's 
nothing besides the WAL on there fsck isn't ever going to take very long 
anyway--not much of a directory tree to traverse there.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan

2009-03-24 Thread Bryan Murphy
On Tue, Mar 24, 2009 at 10:04 PM, marcin mank marcin.m...@gmail.com wrote:
 There is one thing I don`t understand:

               ->  Nested Loop  (cost=0.00..180564.28 rows=1806
 width=37) (actual time=0.192..60.214 rows=3174 loops=1)
                     ->  Index Scan using visitors_userid_index2 on
 visitors v  (cost=0.00..2580.97 rows=1300 width=33) (actual
 time=0.052..2.342 rows=897 loops=1)
                           Index Cond: (userid =
 'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
                     ->  Index Scan using
 itemexperiencelog__index__visitorid on itemexperiencelog l
 (cost=0.00..134.04 rows=230 width=70) (actual time=0.013..0.040 rows=4
 loops=897)
                           Index Cond: (l.visitorid = v.id)

 If it expects 1300 visitors with the userid, and for each of them to
 have 230 entries in itemexperiencelog, how can it come up with 1806
 returned rows (and be about right!)?

I'm not sure I follow what you're saying.

One thing to keep in mind, due to a lapse in our judgement at the
time, this itemexperiencelog table serves as both a current state
table, and a log table.  Therefore, it potentially has multiple
redundant entries, but we typically only look at the most recent entry
to figure out the state of the current item.

We're in the process of re-factoring this now, as well as
denormalizing some of the tables to eliminate unnecessary joins, but I
keep running into these problems and need to understand what is going
on so that I know we're fixing the correct things.
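
One way to see what the planner actually believes about those columns, and 
whether the redundant log entries are skewing its estimates, is to look at 
the gathered statistics directly. A sketch against the standard pg_stats 
view (the column list shown is just the interesting subset):

-- What the planner thinks it knows about the join/filter columns
SELECT tablename, attname, n_distinct, null_frac, correlation
FROM pg_stats
WHERE tablename IN ('itemexperiencelog', 'visitors')
  AND attname IN ('visitorid', 'itemid', 'userid');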

Thanks,
Bryan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan

2009-03-24 Thread Bryan Murphy
On Tue, Mar 24, 2009 at 9:30 PM, Josh Berkus j...@agliodbs.com wrote:
 For some reason, your first post didn't make it to the list, which is why
 nobody responded.

Weird... I've been having problems with gmail and google reader all week.

 I've got a query on our production system that isn't choosing a good
 plan.  I can't see why it's choosing to do a sequential scan on the
 ItemExperienceLog table.  That table is about 800mb and has about 2.5
 million records.  This example query only returns 4 records.  I've
 tried upping the statistics for ItemExperienceLog.VisitorId and
 ItemExperienceLog.ItemId to 1000 (from our default of 100) with no
 success.

 Yes, that is kind of inexplicable.  For some reason, it's assigning a very
 high cost to the nestloops, which is why it wants to avoid them with a seq
 scan.  Can you try lowering cpu_index_cost to 0.001 and see how that affects
 the plan?

I'm assuming you meant cpu_index_tuple_cost.  I changed that to 0.001
as you suggested, forced postgres to reload its configuration and I'm
still getting the same execution plan.

Looking through our configuration one more time, I see that at some
point I set random_page_cost to 2.0, but I don't see any other changes
to query planner settings from their default values.
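
For experimenting, those planner settings can also be changed per-session, 
so nothing has to be reloaded; a hedged sketch re-running the original 
query (the values are only for testing, not recommendations):

-- Try alternative cost settings in this session only, then recheck the plan
SET cpu_index_tuple_cost = 0.001;
SET random_page_cost = 1.5;
EXPLAIN ANALYZE
SELECT MAX(l.Id) AS Id, l.ItemId
FROM ItemExperienceLog l
INNER JOIN Items_Primary p ON p.Id = l.ItemId
INNER JOIN Feeds f ON f.Id = p.FeedId
INNER JOIN Visitors v ON v.Id = l.VisitorId
WHERE v.UserId = 'fbe2537f21d94f519605612c0bf7c2c5'
  AND LOWER(f.Slug) = LOWER('Wealth_Building_by_NightingaleConant')
GROUP BY l.ItemId;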

Bryan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Raid 10 chunksize

2009-03-24 Thread Mark Kirkwood

Scott Marlowe wrote:

On Tue, Mar 24, 2009 at 7:09 PM, Mark Kirkwood mar...@paradise.net.nz wrote:
  

I'm trying to pin down some performance issues with a machine where I work,
we are seeing (read only) query response times blow out by an order of
magnitude or more at busy times. Initially we blamed autovacuum, but after a
 tweak of the cost_delay it is *not* the problem. Then I looked at
checkpoints... and altho there was some correlation with them and the query
response - I'm thinking that the raid chunksize may well be the issue.



Sounds to me like you're mostly just running out of bandwidth on your
RAID array.  Whether or not you can tune it to run faster is the real
issue.  This problem becomes worse as you add clients and the RAID
array starts to thrash.  Thrashing is likely to be worse with a small
chunk size, so that's definitely worth a look at fixing.

  


Yeah, I was wondering if we are maxing out the bandwidth...

Fortunately there is an identical DR box, so I could do a little testing.



Can you try changing the chunksize on the test box you're testing on
to see if that helps?

  


Yes - or I am hoping to anyway (part of posting here was to collect some 
outside validation for the idea). Thanks for your input!



Cheers

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance