Re: [PERFORM] vacuum in Postgresql 8.0.x slowing down the database

2008-03-28 Thread Vinubalaji Gopal
On Wed, 26 Mar 2008 13:02:13 -0700
Joshua D. Drake [EMAIL PROTECTED] wrote:


 The slowness is likely attributed to Vacuum's use of I/O. When vacuum
 is running what does iostat -k 10 say?

Seems to be higher than normal - here is the output with vacuum run
without the other queries and the default vacuum taking ~1 hr:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  13.300.004.50   25.910.00   56.29
Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
md23356.94  2005.59 12945.45  20076 129584


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  16.200.006.32   24.890.00   52.59

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
md2 461.70   667.20  1512.00   6672  15120


I don't know if the output helps much since the vacuum took some time
and I lost more than half of my iostat -k screen output. (I scrolled up
- but got only some of the data)

If vacuum does affect the io what are the ways to reduce the io during
vacuum (the delay and cost parameter did not help that much - should I
consider reducing the cost even further)? Should I consider
partitioning the table?

Thank you.

--
Vinu

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Planning a new server - help needed

2008-03-28 Thread Laszlo Nagy

Hello,

I need to install a new server for postgresql 8.3. It will run two 
databases, web server and some background programs. We already have a 
server but it is becoming slow and we would like to have something that 
is faster. It is a cost sensitive application, and I would like to get 
your opinion in some questions.


The database itself is an OLTP system. There are many smaller tables, 
and some bigger ones (biggest table with 1.2 million records, table size 
966MB, indexes size 790MB). In the bigger tables there are only a few 
records updated frequently, most of the other records are not changed. 
The smaller tables are updated continuously.


Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs 
say that it is better to use FreeBSD because it can alter the I/O 
priority of processes dynamically. The latest legacy release is 6.3 
which is probably more stable. However, folks say that 7.0 has superior 
performance on the same hardware. Can I use 7.0 on a production server?


Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller 
with 8 SATA 2 disks. The operating system would be on another disk pair, 
connected to the motherboard's controller. I wonder if I can get more 
performance with SCSI, for the same amount of money? (I can spend about 
$1500 on the controller and the disks, that would cover 10 SATA 2 disks 
and the controller.)


Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS 
2 + soft updates will be better, but I'm not sure. Which is better?


Question 4. How to make the partitions? This is the hardest question. 
Here is my plan:


- the OS resides on 2 disks, RAID 1
- the databases should go on 8 disks, RAID 0 + 1

However, the transaction log file should be on a separate disk and maybe 
I could gain more performance by putting indexes on a separate drive, 
but I do not want to reduce the number of disks in the RAID 0+1 array. 
Should I put indexes and transaction log on the RAID 1 array? Or should 
I invest a bit more money, add an SATA RAID controller with 16 channels 
and add more disks? Would it pay the bill? Another alternative is to put 
the biggest tables on a separate array so that it will be faster when we 
join these tables with other tables.


I know that it is hard to answer without knowing the structure of the 
databases. :-( I can make tests with different configurations later, but 
I would like to know your opinion first - what should I try?


Thanks,

  Laszlo


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planning a new server - help needed

2008-03-28 Thread Claus Guttesen
  I need to install a new server for postgresql 8.3. It will run two
  databases, web server and some background programs. We already have a
  server but it is becoming slow and we would like to have something that
  is faster. It is a cost sensitive application, and I would like to get
  your opinion in some questions.

  The database itself is an OLTP system. There are many smaller tables,
  and some bigger ones (biggest table with 1.2 million records, table size
  966MB, indexes size 790MB). In the bigger tables there are only a few
  records updated frequently, most of the other records are not changed.
  The smaller tables are updated continuously.

  Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs
  say that it is better to use FreeBSD because it can alter the I/O
  priority of processes dynamically. The latest legacy release is 6.3
  which is probably more stable. However, folks say that 7.0 has superior
  performance on the same hardware. Can I use 7.0 on a production server?

I guess you mean postgresql 8.3.1? :-)

I use FreeBSD 7 release on a 8-way HP DL360 G5 with a ciss controller.
Works out of the box and I haven't had any issue with 7.0 at all.

  Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller
  with 8 SATA 2 disks. The operating system would be on another disk pair,
  connected to the motherboard's controller. I wonder if I can get more
  performance with SCSI, for the same amount of money? (I can spend about
  $1500 on the controller and the disks, that would cover 10 SATA 2 disks
  and the controller.)

SAS would probably be the way to go. I haven't tried the
rocketraid-controller. I use the built-in p400i-controller on my
servers using the ciss-driver. I've heard many positive remarks about
areca.

  Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS
  2 + soft updates will be better, but I'm not sure. Which is better?

I'd stick with ufs2 atm. There are some issues with zfs which probably
have been ironed out by now but ufs2 has been deployed for a longer
time. Performance-wise they are about the same.

  Question 4. How to make the partitions? This is the hardest question.
  Here is my plan:

  - the OS resides on 2 disks, RAID 1
  - the databases should go on 8 disks, RAID 0 + 1

If you have enough disks raid-6 should perform almost as good as raid
1+0. I've setup 11 disks in raid-6 plus one hotspare so I can get more
space out of it. Enough disks are approx. eight and up.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planning a new server - help needed

2008-03-28 Thread Reko Turja
Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg 
docs say that it is better to use FreeBSD because it can alter the 
I/O priority of processes dynamically. The latest legacy release is 
6.3 which is probably more stable. However, folks say that 7.0 has 
superior performance on the same hardware. Can I use 7.0 on a 
production server?


FreeBSD 7.x is pretty stable, and it has the advantage of having the 
new ULE and other things that can't be MFC'd to 6.x branch. And as a 
long time FreeBSD enthusiast having Cisco sponsoring Dtrace, Nokia 
sponsoring scheduler development etc. 7.x is definitely in my opinion 
now the branch to install and start following for ease of upgrading 
later. Of course, as always check that your intended hardware is 
supported.


ULE which is pretty much the key for performance boost in 7.x branch 
isn't yet the default scheduler, but will be in 7.1 and afterwards. 
This means you have to roll custom kernel if you want to use ULE.


Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that 
UFS 2 + soft updates will be better, but I'm not sure. Which is 
better?


For now I'd choose between UFS+gjournal or plain UFS, although with 
bigger disks journaling is a boon compared to fsck'ing plain UFS 
partition. ZFS isn't yet ready for production I think, but it seems to 
be getting there. This is opinion based on bug reports and discussions 
in stable and current mailing lists, not on personal testing though. 
My experiences with gjournal have been positive so far.


On the drives and controller - I'm not sure whether SCSI/SAS will give 
any noticeable boost over SATA, but based on personal experience SCSI 
is still ahead on terms of drive reliability. Whatever technology I'd 
choose, for production server getting decent battery backed controller 
would be the start. And of course a controller that does the RAID's in 
hardware.


-Reko 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planning a new server - help needed

2008-03-28 Thread Laszlo Nagy



I guess you mean postgresql 8.3.1? :-)
  

Yep. Sorry.

 Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS
 2 + soft updates will be better, but I'm not sure. Which is better?



I'd stick with ufs2 atm. There are some issues with zfs which probably
have been ironed out by now but ufs2 has been deployed for a longer
time. Performance-wise they are about the same.
  
Thank you. I suspected the same but it was good to get positive 
confirmation.

 Question 4. How to make the partitions? This is the hardest question.
 Here is my plan:

 - the OS resides on 2 disks, RAID 1
 - the databases should go on 8 disks, RAID 0 + 1



If you have enough disks raid-6 should perform almost as good as raid
1+0. 
Hmm, I have heard that RAID 1 or RAID 1 + 0 should be used for 
databases,  never RAID 5. I know nothing about RAID 6. I guess I must 
accept your suggestion since you have more experience than I have. :-) 
Obviously, it would be easier to manage a single RAID 6 array.

I've setup 11 disks in raid-6 plus one hotspare so I can get more
space out of it. Enough disks are approx. eight and up.
  
The RAID controller that I have selected can only handle 8 disks. I 
guess I need to find a different one with 16 channels and use more 
disks. So are you saying that with all disks in a bigger RAID 6 array, I 
will get the most out of the hardware? In that case, I'll try to get a 
bit more money from the management and build RAID 6 with 12 disks.


I also feel that I need to use a separate RAID 1 array (I prefer 
gmirror) for the base system.


Thanks,

  Laszlo


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] using like in a prepare doesnt' use the right index

2008-03-28 Thread Gavin Sherry
On Thu, Mar 27, 2008 at 03:14:49PM -0400, Dave Cramer wrote:
 I have a query which is
 
 prepare s_18 as select uid from user_profile where name like  
 $1::varchar and isactive=$2 order by name asc limit 250;
 
 explain analyze execute s_18 ('atxchery%','t');
QUERY 
 PLAN
 -
  Limit  (cost=0.00..7965.22 rows=250 width=14) (actual  
 time=301.714..3732.269 rows=1 loops=1)
-  Index Scan using user_profile_name_key on user_profile   
 (cost=0.00..404856.37 rows=12707 width=14) (actual  
 time=301.708..3732.259 rows=1 loops=1)
  Filter: (((name)::text ~~ $1) AND (isactive = $2))
  Total runtime: 3732.326 ms
 
 without prepared statements we get
 
 explain analyze select uid from user_profile where name like 'foo%'  
 and isactive='t' order by name asc limit 250;
   
 QUERY PLAN
 -
  Limit  (cost=293.89..294.08 rows=73 width=14) (actual  
 time=5.947..6.902 rows=250 loops=1)
-  Sort  (cost=293.89..294.08 rows=73 width=14) (actual  
 time=5.943..6.258 rows=250 loops=1)
  Sort Key: name
  Sort Method:  top-N heapsort  Memory: 38kB
  -  Bitmap Heap Scan on user_profile  (cost=5.36..291.64  
 rows=73 width=14) (actual time=0.394..2.481 rows=627 loops=1)
Filter: (isactive AND ((name)::text ~~ 'foo%'::text))
-  Bitmap Index Scan on user_profile_name_idx   
 (cost=0.00..5.34 rows=73 width=0) (actual time=0.307..0.307 rows=628  
 loops=1)
  Index Cond: (((name)::text ~=~ 'foo'::text) AND  
 ((name)::text ~~ 'fop'::text))
 
 
 There are two indexes on it
 
 user_profile_name_idx UNIQUE, btree (name varchar_pattern_ops)
 user_profile_name_key UNIQUE, btree (name)
 
 one for equality, one for like

This is behaving as designed because the planner transforms the
predicate in the second query: Index Cond: (((name)::text ~=~
'foo'::text) AND ((name)::text ~~ 'fop'::text)).

It cannot make this transformation for a prepared statement where the 
LIKE argument is a PREPARE parameter (the first query), since the
transformation depends on inspecting the actual string.

You could probably continue using prepared statements and make this
transformation yourself but you'll have to be careful about creating the
'greater' string (see make_greater_string()).

Come to think of it, it'd easier to just make a set returning function
which executes this query, if you need to stick with prepare/execute.

Thanks,

Gavin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planning a new server - help needed

2008-03-28 Thread Greg Smith

On Fri, 28 Mar 2008, Laszlo Nagy wrote:

We already have a server but it is becoming slow and we would like to 
have something that is faster.


What's it slow at?  Have you identified the bottlenecks and current 
sources of sluggish behavior?  That sort of thing is much more informative 
to look into in regards to redesigning for new hardware than trivia like 
disk layout.  For all we know you're CPU bound.


The database itself is an OLTP system. There are many smaller tables, and 
some bigger ones (biggest table with 1.2 million records, table size 966MB, 
indexes size 790MB).


The total database size is the interesting number you left out here.  And 
you didn't mention how much RAM either.  That ratio has a lot of impact on 
how hard you'll push the disks.


Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs say 
that it is better to use FreeBSD because it can alter the I/O priority of 
processes dynamically.


You shouldn't make an OS decision based on a technical detail that small. 
I won't knock FreeBSD because it's a completely reasonable choice, but 
there's no credible evidence it's a better performer for the workload you 
expect than, say, Linux or even Solaris x64.  (The benchmarks the FreeBSD 
team posted as part of their 7.0 fanfare are not representative of real 
PostgreSQL performance, and are read-only as well).


All the reasonable OS choices here are close enough to one another (as 
long as you get FreeBSD 7, earlier versions are really slow) that you 
should be thinking in terms of reliability, support, and features rather 
than viewing this from a narrow performance perspective.  There's nothing 
about what you've described that sounds like it needs bleeding-edge 
performance to achieve. For reliability, I first look at how good the disk 
controller and its matching driver in the OS used is, which brings us to:


Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller with 8 
SATA 2 disks. The operating system would be on another disk pair, connected 
to the motherboard's controller. I wonder if I can get more performance with 
SCSI, for the same amount of money? (I can spend about $1500 on the 
controller and the disks, that would cover 10 SATA 2 disks and the 
controller.)


Highpoint has traditionally made disk controllers that were garbage.  The 
3520 is from a relatively new series of products from them, and it seems 
like a reasonable unit.  However:  do you want to be be deploying your 
system on a new card with zero track record for reliability, and from a 
company that has never done a good job before?  I can't think of any 
reason at all why you should take that risk.


The standard SATA RAID controller choices people suggest here are 3ware, 
Areca, and LSI Logic.  Again, unless you're really pushing what the 
hardware is capable of these are all close to each other performance-wise 
(see http://femme.tweakblogs.net/blog/196/highpoint-rocketraid-3220.html 
for something that include the Highpoint card).  You should be thinking in 
terms of known reliability and stability when you select a database 
controller card, and Highpoint isn't even on the list of vendors to 
consider yet by those standards.


As for SCSI vs. SATA, I collected up the usual arguments on both sides at 
http://www.postgresqldocs.org/index.php/SCSI_vs._IDE/SATA_Disks


However, the transaction log file should be on a separate disk and maybe I 
could gain more performance by putting indexes on a separate drive, but I do 
not want to reduce the number of disks in the RAID 0+1 array.


If you're looking at 8+ disks and have a caching controller with a battery 
backup, which appears to be your target configuration, there little reason 
to expect a big performance improvement from splitting the transaction log 
out onto a seperate disk.  As you note, doing that will reduce the spread 
of disk for the database which may cost you more in performance than 
seperate transaction logs gain.


It is worth considering creating a seperate filesystem on the big array to 
hold the xlog data through, because that gives you more flexibility in 
terms of mount parameters there.  For example, you can always turn off 
atime updates on the transaction log filesystem, and in many cases the 
filesystem journal updates can be optimized more usefully (the xlog 
doesn't require them).


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planning a new server - help needed

2008-03-28 Thread Dan Harris

Laszlo Nagy wrote:


Question 4. How to make the partitions? This is the hardest question. 
Here is my plan:


- the OS resides on 2 disks, RAID 1
- the databases should go on 8 disks, RAID 0 + 1
Make sure you understand the difference between RAID 1+0 and RAID 0+1.. 
I suspect you'll end up going with 1+0 instead.


-Dan



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planning a new server - help needed

2008-03-28 Thread Weinzierl Stefan

Laszlo Nagy schrieb:
[...]
The RAID controller that I have selected can only handle 8 disks. I 
guess I need to find a different one with 16 channels and use more 
disks. So are you saying that with all disks in a bigger RAID 6 array, I 
will get the most out of the hardware? In that case, I'll try to get a 
bit more money from the management and build RAID 6 with 12 disks.


Here a good SATA-Controllers for 4/8/12/16-Disks:
http://www.tekram.com/product2/product_detail.asp?pid=51

Stefan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance