Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Scott Marlowe schrieb:

> On Tue, Dec 2, 2008 at 2:22 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
> > Has anyone benchmarked this controller (PCIe/4x, 512 MB BBC)? We try to use
> > it with 8x SATA 1TB drives in RAID-5 mode under Linux, and measure strange
> > values. An individual drive is capable of delivering 91 MB/sec sequential
> > read performance, and we get values of ~102 MB/sec out of an 8-drive
> > RAID-5, which seems ridiculously slow. Write performance is much better,
> > ~300 MB/sec - seems OK to me.
> >
> > I guess I must be doing something wrong; I cannot believe that a 500 €
> > controller delivers such poor performance.
>
> A few suggestions...  Try to find the latest driver for your card, try
> using the card as nothing but a caching controller and run your RAID
> in software under Linux (or whatever OS you're on).  Test a 2-drive
> RAID-0 to see what kind of performance increase you get.  If you can't
> dd a big file off a RAID-0 at about 2x the rate of a single drive,
> then something IS wrong with it.  Try RAID 10.  Try RAID-1 sets on the
> controller and RAID-0 over that in software.


I've already tried software RAID with individual drives, and it performs much
better. However, software RAID is not an option for us, so I'm stuck. The
card has the latest firmware installed, and no drivers are needed; they're
already included in the Linux kernel.


I still think we must be doing something wrong here, I googled the 
controller and Linux, and did not find anything indicating a problem. 
The HP SmartArray series is quite common, so a lot of users would have 
the same problem.


Thanks!

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Kevin Grittner schrieb:
> Mario Weilguni <[EMAIL PROTECTED]> wrote:
> > Has anyone benchmarked this controller (PCIe/4x, 512 MB BBC)? We try to
> > use it with 8x SATA 1TB drives in RAID-5 mode under Linux, and measure
> > strange values. An individual drive is capable of delivering 91 MB/sec
> > sequential read performance, and we get values of ~102 MB/sec out of an
> > 8-drive RAID-5, which seems ridiculously slow. Write performance seems
> > to be much better, ~300 MB/sec - seems OK to me.
>
> What's your stripe size?
>
> -Kevin

We used the default setting, 64k. Might a bigger value help here?
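As a back-of-the-envelope check: with a 64 kB chunk, an 8-drive RAID-5 stores seven data chunks (plus one of parity) per stripe, so a full stripe is 448 kB, and any read smaller than that cannot engage all spindles at once:

```shell
# Full-stripe size for an 8-drive RAID-5 with a 64 kB chunk:
# 7 data disks per stripe (one disk's worth holds parity) * 64 kB each.
chunk_kb=64
data_disks=7
echo "full stripe: $((chunk_kb * data_disks)) kB"
```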



Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Alan Hodgson schrieb:
> Mario Weilguni <[EMAIL PROTECTED]> wrote:
> > strange values. An individual drive is capable of delivering 91 MB/sec
> > sequential read performance, and we get values of ~102 MB/sec out of an
> > 8-drive RAID-5, which seems ridiculously slow.
>
> What command are you using to test the reads?
>
> Some recommendations to try:
>
> 1) /sbin/blockdev --setra 2048 device (where device is the partition or LVM
> volume)
>
> 2) Use XFS, and make sure your stripe settings match the RAID.
>
> Having said that, 102 MB/sec sounds really low for any modern controller
> with 8 drives, regardless of tuning or filesystem choice.

First, thanks a lot for this and all the other answers.

I measured the raw device performance:
dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null

I get poor performance when all 8 drives are configured as one large
RAID-5, and slightly poorer performance when configured as JBOD. In
production we use XFS as the filesystem, but I doubt this has anything to do
with FS tuning.
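Spelling out the two recommendations from above as commands (a sketch; the device and partition names are the ones from this thread, the su/sw values assume an 8-drive RAID-5 with 64 kB chunks, and both commands need root):

```shell
# 1) Raise the block-device readahead: the unit is 512-byte sectors,
#    so 2048 sectors = 1 MB of readahead.
blockdev --getra /dev/cciss/c0d0        # show the current setting
blockdev --setra 2048 /dev/cciss/c0d0

# 2) Build XFS with stripe hints matching the RAID geometry:
#    su = chunk size, sw = number of data disks (8 drives - 1 parity).
mkfs.xfs -d su=64k,sw=7 /dev/cciss/c0d0p1
```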






Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Gregory Stark

Greg Smith <[EMAIL PROTECTED]> writes:

> On Mon, 8 Dec 2008, Merlin Moncure wrote:
>
>> I wonder if shared_buffers has any effect on how far you can go before
>> you hit the 'tipping point'.
>
> If your operating system has any reasonable caching itself, not so much at
> first.  As long as the index on the account table fits in shared_buffers, even
> the basic sort of caching logic an OS uses is perfectly functional for 
> swapping
> the individual pages of the account table in and out, the main limiting factor
> on pgbench performance.

I would expect higher shared_buffers to raise the curve before the first
breakpoint but after the first breakpoint make the drop steeper and deeper.
The equilibrium where the curve becomes flatter should be lower.

That is, as long as the database fits entirely in RAM having more of the
buffers be immediately in shared buffers is better. Once there's contention
for the precious cache stealing some of it for duplicated buffers will only
hurt.

> There is a further out tipping point I've theorized about but not really
> explored:  the point where even the database indexes stop fitting in memory
> usefully.  As you get closer to that, I'd expect that the clock sweep 
> algorithm
> used by shared_buffers should make it a bit more likely that those important
> blocks would hang around usefully if you put them there, rather than giving
> most of the memory to the OS to manage.

Hm, sounds logical. At that point the slow drop-off should become even
shallower and possibly become completely flat. Greater shared_buffers might
start helping again at that point.
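One way to map these curves empirically (a sketch; the shared_buffers values, client count, run length, and database name are placeholders to adapt):

```shell
# Hypothetical sweep: rerun the same pgbench workload at several
# shared_buffers settings and record the TPS for each.
for sb in 128MB 512MB 2GB; do
  pg_ctl -D "$PGDATA" -o "-c shared_buffers=$sb" -w restart
  echo -n "$sb: "
  pgbench -c 8 -T 60 pgbench | grep "tps = "
done
```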

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Jean-David Beyer


Greg Smith wrote:
| On Mon, 8 Dec 2008, Merlin Moncure wrote:
|
|> I wonder if shared_buffers has any effect on how far you can go before
|> you hit the 'tipping point'.
|
| If your operating system has any reasonable caching itself, not so much at
| first.  As long as the index on the account table fits in shared_buffers,
| even the basic sort of caching logic an OS uses is perfectly functional
| for swapping the individual pages of the account table in and out, the
| main limiting factor on pgbench performance.
|
| There is a further out tipping point I've theorized about but not really
| explored:  the point where even the database indexes stop fitting in
| memory usefully.  As you get closer to that, I'd expect that the clock
| sweep algorithm used by shared_buffers should make it a bit more likely
| that those important blocks would hang around usefully if you put them
| there, rather than giving most of the memory to the OS to manage.

I am by no means an expert at this.

But one thing that can matter is whether you want to improve just the
performance of the dbms, or the performance of the entire system, on which
the dbms runs. Because if you want to improve the whole system, you would
want as much of the caching to take place in the system's buffers so the use
of the memory could be optimized over the entire workload, not just the load
of the dbms itself. I suppose on a dedicated system with only one dbms
running with only one database open (at a time, anyway), this might be moot,
but not otherwise.

Now I agree that it would be good to get the entire index (or at least the
working set of the index) into the memory of the computer. But does it
really matter if it is in the system's cache, or the postgres cache? Is it
any more likely to be in postgres's cache than in the system cache if the
system is hurting for memory? I would think the system would be equally
likely to page out "idle" pages no matter where they are unless they are
locked to memory, and I do not know if all operating systems can do this,
and even if they can, I do not know if postgres uses that ability. I doubt
it, since I believe (at least in Linux) a process can do that only if run as
root, which I imagine few (if any) users do.
|
| Since the data is about 7.5X as large as the indexes, that point is way
| further out than the basic bottlenecks.  And if you graph pgbench results
| on a scale that usefully shows the results for in-memory TPS scores, you
| can barely see that part of the chart as well.  One day I may get to
| mapping that out better, and if I do it will be interesting to see if the
| balance of shared_buffers to OS cache works the way I expect.  I was
| waiting until I finished the pgtune program for that, that's building some
| of the guts I wanted to make it easier to tweak postgresql.conf settings
| programmatically in between pgbench runs.
|
| --
| * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
|


--
~  .~.  Jean-David Beyer          Registered Linux User 85642.
~  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
~ /( )\ Shrewsbury, New Jersey    http://counter.li.org
~ ^^-^^ 07:55:02 up 5 days, 18:13, 4 users, load average: 4.18, 4.17, 4.11



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Simon Waters
On Tuesday 09 December 2008 13:08:14 Jean-David Beyer wrote:
> 
> and even if they can, I do not know if postgres uses that ability. I doubt
> it, since I believe (at least in Linux) a process can do that only if run
> as root, which I imagine few (if any) users do.

Disclaimer: I'm not a system programmer... 

I believe that at Linux kernel revision 2.6.8 and before, processes need the
POSIX capability CAP_IPC_LOCK to lock memory at all; from 2.6.9 on they need
CAP_IPC_LOCK only to lock more than RLIMIT_MEMLOCK.

It is a capability, so a process can run as any user assuming it is started 
with or gained the capability.

No idea if Postgres uses any of this, other than to protect security of 
certain password operations there is probably not much point. If key parts of 
your database are being paged out, get more RAM, if idle parts of your 
database are paged out, you probably could more usefully apply that RAM for 
something else.
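On Linux the limit in question is easy to inspect from a shell. The setcap line below is a hypothetical illustration of granting the capability to a binary (the path is made up), not something Postgres actually does:

```shell
# RLIMIT_MEMLOCK: how much a non-root process may mlock() without
# CAP_IPC_LOCK on 2.6.9+ kernels (often a small default such as 32-64 kB).
ulimit -l

# Hypothetical: grant CAP_IPC_LOCK to a binary so it can lock past the
# limit without running as root (needs root to execute):
# setcap cap_ipc_lock=+ep /usr/local/pgsql/bin/postgres
```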

The Varnish cache design is the place to look for enlightenment on relying on 
the kernel paging (using memory mapped files) rather than trying to do it 
yourself, but then a proxy server is a lot simpler than a RDBMS. That said, 
Varnish is fast at what it does (reverse HTTP proxy) !



Re: [PERFORM] file system and raid performance

2008-12-09 Thread M. Edward (Ed) Borasky
Scott Marlowe wrote:
> On Sun, Dec 7, 2008 at 10:59 PM, M. Edward (Ed) Borasky
> <[EMAIL PROTECTED]> wrote:
>> Ah, but shouldn't a PostgreSQL (or any other database, for that matter)
>> have its own set of filesystems tuned to the application's I/O patterns?
>> Sure, there are some people who need to have all of their eggs in one
>> basket because they can't afford multiple baskets. For them, maybe the
>> OS defaults are the right choice. But if you're building a
>> database-specific server, you can optimize the I/O for that.
> 
> It's really about a cost / benefits analysis.  20 years ago file
> systems were slow and buggy and a database could, with little work,
> outperform them.  Nowadays, not so much.  I'm guessing that the extra
> cost and effort of maintaining a file system for pgsql outweighs any
> real gain you're likely to see performance wise.
> 
> But I'm sure that if you implemented one that outran XFS / ZFS / ext3
> et. al. people would want to hear about it.
> 
I guess I wasn't clear -- I didn't mean a PostgreSQL-specific filesystem
design, although BTRFS does have some things that are "RDBMS-friendly".
I meant that one should hand-tune existing filesystems / hardware for
optimum performance on specific workloads. The tablespaces in PostgreSQL
give you that kind of potential granularity, I think.
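The tablespace granularity mentioned above looks like this in practice; the paths, database, and index names here are made up for illustration:

```shell
# Hypothetical: put hot indexes on a filesystem tuned for random I/O,
# leaving bulk table data on the large sequential-friendly array.
psql -d mydb <<'SQL'
CREATE TABLESPACE fast_ix LOCATION '/mnt/ssd_raid10/pg_ix';
ALTER INDEX accounts_pkey SET TABLESPACE fast_ix;
SQL
```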

-- 
M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P), WOM

"A mathematician is a device for turning coffee into theorems." --
Alfréd Rényi via Paul Erdős




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Greg Stark
On Sun, Dec 7, 2008 at 7:38 PM, Josh Berkus <[EMAIL PROTECTED]> wrote:
>
> Also, the following patches currently still have bugs, but when the bugs are
> fixed I'll be looking for performance testers, so please either watch the
> wiki or watch this space:
>...
> -- posix_fadvise (Gregory Stark)

Eh? Quite possibly, but none that I'm aware of. The only problem is a
couple of trivial bits of bitrot. I'll post an update now if you want.





-- 
greg



Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Scott Marlowe
On Tue, Dec 9, 2008 at 5:17 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
> Alan Hodgson schrieb:
>>
>> Mario Weilguni <[EMAIL PROTECTED]> wrote:
>>

>>> strange values. An individual drive is capable of delivering 91 MB/sec
>>> sequential read performance, and we get values ~102MB/sec out of a
>>> 8-drive RAID5, seems to be ridiculous slow.
>>
>>
>> What command are you using to test the reads?
>>
>> Some recommendations to try:
>>
>> 1) /sbin/blockdev --setra 2048 device (where device is the partition or
>> LVM volume)
>>
>> 2) Use XFS, and make sure your stripe settings match the RAID.
>>
>> Having said that, 102MB/sec sounds really low for any modern controller
>> with 8 drives, regardless of tuning or filesystem choice.
>>
>>
>
> First, thanks alot for this and all the other answers.
>
> I measured the raw device performance:
> dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null
>
> I get poor performance when all 8 drives are configured as one, large
> RAID-5, and slightly poorer performance when configured as JBOD. In
> production, we use XFS as FS, but I doubt this has anything to do with FS
> tuning.

Yeah, having just trawled the pgsql-performance archives, there are
plenty of instances of people having terrible performance from HP
smart array controllers before the P800.  Is it possible for you to
trade up to a better RAID controller?  Whichever salesman sold you the
P400 should take one for the team and make this right for you.



Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Joshua D. Drake
On Tue, 2008-12-09 at 13:10 +0100, Mario Weilguni wrote:
> Scott Marlowe schrieb:
> > On Tue, Dec 2, 2008 at 2:22 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:

> I still think we must be doing something wrong here, I googled the 
> controller and Linux, and did not find anything indicating a problem. 
> The HP SmartArray series is quite common, so a lot of users would have 
> the same problem.

Yes, the SmartArray series is quite common and actually known to perform
reasonably well in RAID 10. You still appear to be trying RAID 5.

Joshua D. Drake


> 
> Thanks!
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Gabriele Turchi
We reached fairly good performance on a P400 controller (8 SATA 146GB
2.5" 10k rpm drives) with RAID-5 or RAID-6 Linux software RAID: the write
bandwidth reached about 140 MB/s sustained throughput (the hardware
RAID-5 gave a sustained 20 MB/s...). With a second, identical controller
(16 disks) we reached about 200 MB/s sustained with RAID-6 spanning all
16 disks.

The CPU load is negligible. Read performance is about 20% better.

Best regards and my apologies for my bad English.

GT

P.S.: on a P800 with 12 SATA 750GB 3.5" 7200 rpm drives, the hardware RAID-5
write performance was about 30 MB/s; software RAID-5 was between 60 and 80 MB/s.




Scott Marlowe ha scritto:
> On Tue, Dec 9, 2008 at 5:17 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
> > Alan Hodgson schrieb:
> > > Mario Weilguni <[EMAIL PROTECTED]> wrote:
> > > > strange values. An individual drive is capable of delivering 91 MB/sec
> > > > sequential read performance, and we get values ~102MB/sec out of a
> > > > 8-drive RAID5, seems to be ridiculous slow.
> > >
> > > What command are you using to test the reads?
> > >
> > > Some recommendations to try:
> > >
> > > 1) /sbin/blockdev --setra 2048 device (where device is the partition or
> > > LVM volume)
> > >
> > > 2) Use XFS, and make sure your stripe settings match the RAID.
> > >
> > > Having said that, 102MB/sec sounds really low for any modern controller
> > > with 8 drives, regardless of tuning or filesystem choice.
> >
> > First, thanks alot for this and all the other answers.
> >
> > I measured the raw device performance:
> > dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null
> >
> > I get poor performance when all 8 drives are configured as one, large
> > RAID-5, and slightly poorer performance when configured as JBOD. In
> > production, we use XFS as FS, but I doubt this has anything to do with FS
> > tuning.
>
> Yeah, having just trawled the pgsql-performance archives, there are
> plenty of instances of people having terrible performance from HP
> smart array controllers before the P800.  Is it possible for you to
> trade up to a better RAID controller?  Whichever salesman sold you the
> P400 should take one for the team and make this right for you.





Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Aidan Van Dyk
* Joshua D. Drake <[EMAIL PROTECTED]> [081209 11:01]:
 
> Yes the SmartArray series is quite common and actually know to perform
> reasonably well, in RAID 10. You still appear to be trying RAID 5.

*boggle* 

Are people *still* using raid5?

/me gives up!

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Scott Marlowe
On Tue, Dec 9, 2008 at 9:03 AM, Gabriele Turchi
<[EMAIL PROTECTED]> wrote:
> We reached a fairly good performance on a P400 controller (8 SATA 146GB 2,5"
> 10k rpm) with raid5 or raid6 Linux software raid: the writing bandwidth
> reached about 140 MB/s sustained throughput (the hardware raid5 gave a
> sustained 20 MB/s...). With a second, equal controller (16 disks) we reached
> (raid6 spanning all 16 disks) about 200 MB/s sustained.

That's better than you were getting but still quite slow.  I was
bothered that my 12x15k4 SAS RAID-10 array could only sustain about
350Megs/second sequential read, thinking that each drive should be
able to approach 80 or so megs/second and I was only getting about
60...

This sounds more and more like HP is trying to undercompete along with
Dell in the RAID controller market or at least the raid controller
driver market.



Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Peter Eisentraut

Aidan Van Dyk wrote:
> * Joshua D. Drake <[EMAIL PROTECTED]> [081209 11:01]:
> > Yes the SmartArray series is quite common and actually know to perform
> > reasonably well, in RAID 10. You still appear to be trying RAID 5.
>
> *boggle*
>
> Are people *still* using raid5?
>
> /me gives up!

What do you suggest when there is not enough room for a RAID 10?



Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Aidan Van Dyk
* Peter Eisentraut <[EMAIL PROTECTED]> [081209 11:28]:

> What do you suggest when there is not enough room for a RAID 10?

More disks ;-)

But if you've given up on performance and reliability in favour of
cheaper storage, I guess RAID-5 is OK.  But then I'm not sure what the
point of asking about its poor performance is...

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Carey
> From: [EMAIL PROTECTED] [EMAIL PROTECTED] On Behalf Of Jean-David Beyer
> [EMAIL PROTECTED]
> Sent: Tuesday, December 09, 2008 5:08 AM
> To: [email protected]
> Subject: Re: [PERFORM] Need help with 8.4 Performance Testing

> But one thing that can matter is whether you want to improve just the
> performance of the dbms, or the performance of the entire system, on which
> the dbms runs. Because if you want to improve the whole system, you would
> want as much of the caching to take place in the system's buffers so the use
> of the memory could be optimized over the entire workload, not just the load
> of the dbms itself. I suppose on a dedicated system with only one dbms
> running with only one database open (at a time, anyway), this might be moot,
> but not otherwise.

Yes, the OS is in better position to arbitrate between multiple things.  Of 
course, we aren't talking about the highest performance databases if we are 
talking about mixed use systems though.
Additionally, the OS can never really get it right, with a DB or other apps.
Any app can behave badly, grabbing too much RAM and accessing it regularly
enough for it not to be 'idle' much, while giving the OS VM fits trying to
figure out whether it's important relative to other processes.

> Now I agree that it would be good to get the entire index (or at least the
> working set of the index) into the memory of the computer. But does it
> really matter if it is in the system's cache, or the postgres cache? Is it
> any more likely to be in postgres's cache than in the system cache if the
> system is hurting for memory? I would think the system would be equally
> likely to page out "idle" pages no matter where they are unless they are
> locked to memory, and I do not know if all operating systems can do this,
> and even if they can, I do not know if postgres uses that ability. I doubt
> it, since I believe (at least in Linux) a process can do that only if run as
> root, which I imagine few (if any) users do.

The problem is when none of them are really 'idle'.  When the OS has to decide
which pages, all of which have been accessed recently, to evict, most OS's
will make bad choices if the load is mixed random and sequential access, as
they treat all pages equally with respect to freshness versus eviction.
Another problem is that there IS a difference between being in postgres'
cache and the OS cache.  One is more expensive to retrieve than the other.
Significantly.

Accessing buffers in shared_buffers, in process, uses a good chunk less CPU
(and data copy and shared buffer eviction overhead) than going over the
syscall to the OS.

And as far as I can tell, even after the 8.4 fadvise patch, all I/O is in 
block_size chunks.  (hopefully I am wrong)
My system is now CPU bound; the I/O can do sequential reads of more than
1.2GB/sec but Postgres can't do a seqscan 30% as fast because it eats up CPU
like crazy just reading and identifying tuples.  It does seqscans ~25% faster
if it's from shared_buffers than from the OS's page cache though.  Seqscans
are between 250MB/sec and 400MB/sec peak, from mem or disk, typically showing
no more than 35% iostat utilization of the array if off disk -- so we run a
few concurrently where we can.

In addition to the fadvise patch, postgres needs to merge adjacent I/O's into
larger ones to reduce the overhead.  It only really needs to merge up to sizes
of about 128k or 256k, and gain an 8x to 16x drop in syscall overhead, and
additionally potentially save code trips down the shared buffer management
code paths.  At least, that's my guess; I haven't looked at any code and could
be wrong.
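The syscall-count side of that argument is easy to see from a shell: reading the same data with 8 kB versus 256 kB requests changes only the number of read() calls (32x fewer), which is where the claimed 8x-16x overhead reduction comes from. A sketch against a scratch file:

```shell
# Create a 256 MB scratch file, then read it back with small and large
# request sizes; dd's "records in" line shows the number of read() calls.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=256 2>/dev/null
dd if="$f" of=/dev/null bs=8k      # 32768 read() calls
dd if="$f" of=/dev/null bs=256k    # 1024 read() calls
rm -f "$f"
```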


Additionally, the "If your operating system has any reasonable caching itself"
comment earlier in this conversation --- Linux (2.6.18, Centos 5.2) does NOT.
I can easily make it spend 100% CPU in system time trying to figure out what
to do with the system cache for an hour.  Just do large seqscans with memory
pressure from work_mem or other forces that the OS will not deem 'idle'.  Once
the requested memory is ~75% of the system total, it will freak out.  Linux
simply will not give up that last 25% or so of the RAM for anything but page
cache, even though the disk subsystem is very fast and most of the access is
sequential, marginalizing the benefit of the cache.  Depending on how you tune
it, it will either spin system cpu or swap storm, but the system cpu spin
times for the same workload are a lot shorter than an equivalent swap storm.
Mount the data drive with O_DIRECT and the problem vanishes.  I've been told
that this problem may be gone in some of the latest kernels.  I have seriously
considered bumping shared_buffers up a lot and mounting the thing direct --
but then we lose the useful scheduler and readahead algorithms.  The other way
around (small shared_buffers, let the OS do it) hurts performance.
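With GNU dd, the page-cache-versus-direct-I/O comparison described above can be approximated without remounting anything (the device path is the one from the other thread; iflag=direct requires a kernel/filesystem that supports O_DIRECT, and reads need root on a raw device):

```shell
# Sequential read through the page cache vs. bypassing it with O_DIRECT.
dd if=/dev/cciss/c0d0 of=/dev/null bs=1M count=1024               # buffered
dd if=/dev/cciss/c0d0 of=/dev/null bs=1M count=1024 iflag=direct  # O_DIRECT
```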

Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Joshua D. Drake
On Tue, 2008-12-09 at 18:27 +0200, Peter Eisentraut wrote:
> Aidan Van Dyk wrote:
> > * Joshua D. Drake <[EMAIL PROTECTED]> [081209 11:01]:
> >  
> >> Yes the SmartArray series is quite common and actually know to perform
> >> reasonably well, in RAID 10. You still appear to be trying RAID 5.
> > 
> > *boggle* 
> > 
> > Are people *still* using raid5?
> > 
> > /me gives up!
> 
> What do you suggest when there is not enough room for a RAID 10?

RAID 1.

Joshua D. Drake


> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Joshua D. Drake
On Tue, 2008-12-09 at 09:25 -0700, Scott Marlowe wrote:
> On Tue, Dec 9, 2008 at 9:03 AM, Gabriele Turchi
> <[EMAIL PROTECTED]> wrote:
> > We reached a fairly good performance on a P400 controller (8 SATA 146GB 2,5"
> > 10k rpm) with raid5 or raid6 Linux software raid: the writing bandwidth
> > reached about 140 MB/s sustained throughput (the hardware raid5 gave a
> > sustained 20 MB/s...). With a second, equal controller (16 disks) we reached
> > (raid6 spanning all 16 disks) about 200 MB/s sustained.
> 
> That's better than you were getting but still quite slow.  I was
> bothered that my 12x15k4 SAS RAID-10 array could only sustain about
> 350Megs/second sequential read, thinking that each drive should be
> able to approach 80 or so megs/second and I was only getting about
> 60...
> 
> This sounds more and more like HP is trying to undercompete along with
> Dell in the RAID controller market or at least the raid controller
> driver market.

It is certainly possible. The 400 is the higher end of the lower end
with HP... 200, 400, 600, 800 (800 is a nice controller).

Joshua D. Drake


> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Marlowe
On Tue, Dec 9, 2008 at 9:37 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

> As for tipping points and pg_bench -- It doesn't seem to reflect the kind of 
> workload we use postgres for at all, though my workload does a lot of big 
> hashes and seqscans, and I'm curious how much improved those may be due to 
> the hash improvements.  32GB RAM and 3TB data (about 250GB scanned regularly) 
> here.  And yes, we are almost completely CPU bound now except for a few 
> tasks.  Iostat only reports above 65% disk utilization for about 5% of the 
> workload duty-cycle, and is regularly < 20%.  COPY doesn't get anywhere near 
> platter speeds, on indexless bulk transfer.  The highest disk usage spikes 
> occur when some of our random-access data/indexes get shoved out of cache.  
> These aren't too large, but high enough seqscan load will cause postgres and 
> the OS to dump them from cache.  If we put these on some SSD's the disk 
> utilization % would drop a lot further.

It definitely reflects our usage pattern, which is very random and
involves tiny bits of data scattered throughout the database.  Our
current database is about 20-25 Gig, which means it's quickly reaching
the point where it will not fit in our 32G of ram, and it's likely to
grow too big for 64Gig before a year or two is out.

> I feel confident in saying that in about a year, I could spec out a medium 
> sized budget for hardware ($25k) for almost any postgres setup and make it 
> almost pure CPU bound.
> SSDs and hybrid tech such as ZFS L2ARC make this possible with easy access to 
> 10k+ iops, and it it will take no more than 12 SATA drives in raid 10 next 
> year (and a good controller or software raid) to get 1GB/sec sequential reads.

Lucky you, having needs that are fulfilled by sequential reads.  :)

I wonder how many hard drives it would take to be CPU bound on random
access patterns?  About 40 to 60?  And probably 15k / SAS drives to
boot.  Cause that's what we're looking at in the next few years where
I work.



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Joshua D. Drake
On Tue, 2008-12-09 at 10:21 -0700, Scott Marlowe wrote:
> On Tue, Dec 9, 2008 at 9:37 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

> Lucky you, having needs that are fulfilled by sequential reads.  :)
> 
> I wonder how many hard drives it would take to be CPU bound on random
> access patterns?  About 40 to 60?  And probably 15k / SAS drives to
> boot.  Cause that's what we're looking at in the next few years where
> I work.

I was able to achieve only 10-20% IO/Wait even after beating the heck
out of the machine with 50 spindles (of course it does have 16 CPUs):

http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/


> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Carey
> Lucky you, having needs that are fulfilled by sequential reads.  :)

> I wonder how many hard drives it would take to be CPU bound on random
> access patterns?  About 40 to 60?  And probably 15k / SAS drives to
> boot.  Cause that's what we're looking at in the next few years where
> I work.

About $3000 worth of Intel mainstream SSDs = 240GB of space (6 in RAID 10) 
today, and 2x to 3x that storage area in 1 year.

Random reads are even easier, provided you don't need more than 500GB or so.

And with something like ZFS + L2ARC you can back your data with large, slow-IOPS 
disks and have cached access to data without requiring mirrors on the cache ($3k 
of SSDs for that covers 2x the area, then).
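For reference, the arithmetic behind those figures can be sketched as follows (the ~$500 per-drive price and 80 GB X25-M capacity are assumed late-2008 numbers, not taken from this thread):

```python
# Sketch of the RAID-10 capacity/cost arithmetic above.
# Assumptions (late-2008 ballpark figures): Intel X25-M, 80 GB usable,
# ~$500 per drive.
drives = 6
capacity_gb = 80   # assumed per-drive capacity
price_usd = 500    # assumed per-drive price

# RAID 10 stripes across mirrored pairs, so half the raw capacity is usable.
usable_gb = drives * capacity_gb // 2
total_cost = drives * price_usd

print(usable_gb, total_cost)  # 240 3000 -- i.e. 240GB for ~$3000, as above
```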



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Matthew Wakeling

On Tue, 9 Dec 2008, Scott Marlowe wrote:

I wonder how many hard drives it would take to be CPU bound on random
access patterns?  About 40 to 60?  And probably 15k / SAS drives to
boot.  Cause that's what we're looking at in the next few years where
I work.


There's a problem with that thinking. That is, in order to exercise many 
spindles, you will need to have just as many (if not more) concurrent 
requests. And if you have many concurrent requests, then you can spread 
them over multiple CPUs. So it's more a case of "How many hard drives PER 
CPU". It also becomes a matter of whether Postgres can scale that well.


Matthew

--
Those who do not understand Unix are condemned to reinvent it, poorly.
   -- Henry Spencer



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Marlowe
On Tue, Dec 9, 2008 at 10:35 AM, Matthew Wakeling <[EMAIL PROTECTED]> wrote:
> On Tue, 9 Dec 2008, Scott Marlowe wrote:
>>
>> I wonder how many hard drives it would take to be CPU bound on random
>> access patterns?  About 40 to 60?  And probably 15k / SAS drives to
>> boot.  Cause that's what we're looking at in the next few years where
>> I work.
>
> There's a problem with that thinking. That is, in order to exercise many
> spindles, you will need to have just as many (if not more) concurrent
> requests. And if you have many concurrent requests, then you can spread them
> over multiple CPUs. So it's more a case of "How many hard drives PER CPU".
> It also becomes a matter of whether Postgres can scale that well.

For us, all that is true.  We typically have a dozen or more
concurrent requests running at once.  We'll likely see that increase
linearly with our increase in users over the next year or so.  We
bought the machines with dual quad core opterons knowing the 6,8 and
12 core opterons were due out on the same socket design in the next
year or so and we could upgrade those too if needed.  PostgreSQL seems
to scale well in most tests I've seen to at least 16 cores, and after
that it's anyone's guess.  The Sparc Niagara seems capable of scaling
to 32 threads on 8 cores with pgsql 8.2 quite well.

I worry about the Linux kernel scaling that well, and we might have to
look at OpenSolaris or something like the Solaris kernel under an Ubuntu
distro to get better scaling.



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Carey
Let me re-phrase this.

For today, at 200GB or less of required space, and 500GB or less next year.

"Where we're going, we don't NEED spindles."


Seriously, go down to the store and get 6 X25-M's; they're as cheap as $550 
each and will be sub-$500 soon.  These are more than sufficient for all but 
heavy write workloads (each can withstand ~600+ TB of writes in a lifetime, and 
SMART will tell you before they go).  6 in a RAID 10 will give you 750MB/sec 
read, and equivalent MB/sec in random reads.  I've tested them.  Random writes 
are very, very fast too, faster than any SAS drive.
Put this in your current system, and you won't need to upgrade the RAM unless 
you need items in cache to reduce CPU load or need it for the work_mem space.

Spindles will soon be only for capacity and sequential access performance 
requirements.  Solid state will be for IOPS, and I would argue that for most 
Postgres installations, already is (now that the Intel SSD drive, which does 
random writes and read/write concurrency well, has arrived - more such next gen 
drives are on the way).
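To put the ~600 TB endurance figure in perspective, a rough sketch (the 5 MB/s sustained average write rate is an assumption chosen for illustration, not a measurement):

```python
# Rough lifetime estimate from the ~600 TB write-endurance claim above.
endurance_tb = 600
avg_write_mb_per_s = 5  # assumed sustained average write rate, for illustration

seconds_of_writes = endurance_tb * 1e12 / (avg_write_mb_per_s * 1e6)
years = seconds_of_writes / (365 * 24 * 3600)

print(round(years, 1))  # ~3.8 years of continuous 5 MB/s writes
```

Real database write loads are bursty, so actual lifetime under a given average load should be longer than this worst-case constant-rate figure.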


On 12/9/08 9:28 AM, "Scott Carey" <[EMAIL PROTECTED]> wrote:

> Lucky you, having needs that are fulfilled by sequential reads.  :)

> I wonder how many hard drives it would take to be CPU bound on random
> access patterns?  About 40 to 60?  And probably 15k / SAS drives to
> boot.  Cause that's what we're looking at in the next few years where
> I work.

About $3000 worth of Intel mainstream SSDs = 240GB of space (6 in RAID 10) 
today, and 2x to 3x that storage area in 1 year.

Random reads are even easier, provided you don't need more than 500GB or so.

And with something like ZFS + L2ARC you can back your data with large, slow-IOPS 
disks and have cached access to data without requiring mirrors on the cache ($3k 
of SSDs for that covers 2x the area, then).




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Marlowe
On Tue, Dec 9, 2008 at 11:01 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
> Let me re-phrase this.
>
> For today, at 200GB or less of required space, and 500GB or less next year.
>
> "Where we're going, we don't NEED spindles."

Those intel SSDs sound compelling.  I've been waiting for SSDs to get
competitive price and performance wise for a while, and when the
intels came out and I read the first benchmarks I immediately began
scheming.  Sadly, that was right after we'd ordered our new 16 drive
servers, and I didn't have time to try something new and hope it would
work.  Now that the servers are up and running, we'll probably look at
adding the SSDs next summer before our high load period begins.



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Joshua D. Drake
On Tue, 2008-12-09 at 11:08 -0700, Scott Marlowe wrote:
> On Tue, Dec 9, 2008 at 11:01 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
> > Let me re-phrase this.
> >
> > For today, at 200GB or less of required space, and 500GB or less next year.
> >
> > "Where we're going, we don't NEED spindles."
> 
> Those intel SSDs sound compelling.  I've been waiting for SSDs to get
> competitive price and performance wise for a while, and when the
> intels came out and I read the first benchmarks I immediately began
> scheming.  Sadly, that was right after we'd ordered our new 16 drive
> servers, and I didn't have time to try something new and hope it would
> work.  Now that the servers are up and running, we'll probably look at
> adding the SSDs next summer before our high load period begins.
> 

The idea of SSDs is interesting. However I think I will wait for all the
other early adopters to figure out the problems before I start
suggesting them to clients.

Hard drives work; they're cheap and fast. I can get 25 spindles, 15k in a
3U with controller and battery backed cache for <$10k.

Joshua D. Drake



-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Merlin Moncure
On Tue, Dec 9, 2008 at 1:11 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Hard drives work; they're cheap and fast. I can get 25 spindles, 15k in a
> 3U with controller and battery backed cache for <$10k.

While I agree with your general sentiments about early adoption, etc
(the intel ssd products are the first flash drives that appear to have
real promise in the enterprise), the numbers tell a different story.
A *single* X25-E will give similar sustained write IOPS as your tray
for far less price and a much better worst case read latency.  All
this without the 25 sets of whizzing ball bearings, painful spin-up
times, fanning, RAID controller firmware, and various other sundry
technologies to make the whole thing work.

The main issue that I see with flash SSD is if the promised wear
lifetimes are believable in high load environments and the mechanism
of failure (slowly degrade into read only) is accurate.

So, at least in relative terms, 15k sas drives are not 'fast'. They
are terribly, awfully, painfully slow.  They are also not cheap in
terms of $/IOPS.  The end is near.
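A back-of-envelope $/IOPS comparison illustrates the point. All inputs here are assumed 2008-era ballpark figures (≈180 random IOPS per 15k drive, ≈3,000 sustained random write IOPS and ≈$800 for an X25-E), not measurements from this thread:

```python
# $/IOPS sketch: 25-spindle 15k SAS tray vs. a single Intel X25-E.
# All inputs are assumed ballpark figures for illustration.
tray_drives = 25
iops_per_15k_drive = 180      # assumed random IOPS per 15k RPM drive
tray_cost = 10_000            # tray + controller + BBU, from the quote above

ssd_iops = 3_000              # assumed sustained random write IOPS (X25-E)
ssd_cost = 800                # assumed street price

tray_cost_per_iops = tray_cost / (tray_drives * iops_per_15k_drive)
ssd_cost_per_iops = ssd_cost / ssd_iops

print(round(tray_cost_per_iops, 2), round(ssd_cost_per_iops, 2))  # 2.22 0.27
```

Under those assumptions the flash drive comes out roughly an order of magnitude cheaper per random IOPS, which is the whole argument in miniature.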

merlin



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Joshua D. Drake
On Tue, 2008-12-09 at 15:07 -0500, Merlin Moncure wrote:
> On Tue, Dec 9, 2008 at 1:11 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> > Hard drives work; they're cheap and fast. I can get 25 spindles, 15k in a
> > 3U with controller and battery backed cache for <$10k.
> 
> While I agree with your general sentiments about early adoption, etc
> (the intel ssd products are the first flash drives that appear to have
> real promise in the enterprise), the numbers tell a different story.

Oh I have read about them and I am excited. I am just saying that there
are plenty of people who can take advantage of the unknown without the
worry of the pain that it can cause. My client can't.

> 
> The main issue that I see with flash SSD is if the promised wear
> lifetimes are believable in high load environments and the mechanism
> of failure (slowly degrade into read only) is accurate.
> 

Right.

> So, at least in relative terms, 15k sas drives are not 'fast'. They
> are terribly, awfully, painfully slow.  They are also not cheap in
> terms of $/IOPS.  The end is near.
> 

No doubt about it. I give it 24 months tops.

Joshua D. Drake


> merlin
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Carey
Which brings this back around to the point I care the most about:

I/O per second will diminish as the most common database performance limiting 
factor in Postgres 8.4's lifetime, and become almost irrelevant in 8.5's.
Becoming more CPU efficient will become very important, and for some, already 
is.  The community needs to be proactive on this front.
This turns a lot of old assumptions on their head, from the database down 
through the OS and filesystem.  We're bound to run into many surprises due to 
this major shift in something that has had its performance characteristics 
taken for granted for decades.

> On 12/9/08 12:20 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:

> > So, at least in relative terms, 15k sas drives are not 'fast'. They
> > are terribly, awfully, painfully slow.  They are also not cheap in
> > terms of $/IOPS.  The end is near.
> >

> No doubt about it. I give it 24 months tops.
>
> Joshua D. Drake




[PERFORM] query plan with index having a btrim is different for strings of different length

2008-12-09 Thread Richard Yen

Hi,

I've discovered a peculiarity with using btrim in an index and was  
wondering if anyone has any input.


My table is like this:
 Table "public.m_object_paper"
        Column       |          Type          | Modifiers
---------------------+------------------------+-----------
 id  | integer| not null
 title   | character varying(200) | not null
 x_firstname | character varying(50)  |
 x_lastname  | character varying(50)  |
<...snip...>
 page_count  | smallint   |
 compare_to_database | bit varying| not null
Indexes:
"m_object_paper_pkey" PRIMARY KEY, btree (id)
"last_name_fnc_idx" btree (lower(btrim(x_lastname::text)))
"m_object_paper_assignment_idx" btree (assignment)
"m_object_paper_owner_idx" btree (owner) CLUSTER
<...snip to end...>

My query is like this:
SELECT m_object_paper.id FROM m_object_paper, m_assignment WHERE  
m_object_paper.assignment = m_assignment.id AND  
m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND  
lower(btrim(x_firstname)) = lower(btrim($FIRSTNAME)) and  
lower(btrim(x_lastname)) = lower(btrim($LASTNAME));


Strangely, if $LASTNAME is 5 chars, the query plan looks like this:
tii=# explain SELECT m_object_paper.id FROM m_object_paper,  
m_assignment WHERE m_object_paper.assignment = m_assignment.id AND  
m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND  
lower(btrim(x_firstname)) = lower(btrim('Jordan')) and  
lower(btrim(x_lastname)) = lower(btrim('Smith'));

  QUERY PLAN
---
 Hash Join  (cost=181704.85..291551.77 rows=1 width=4)
   Hash Cond: (m_object_paper.assignment = m_assignment.id)
   ->  Bitmap Heap Scan on m_object_paper  (cost=181524.86..291369.66  
rows=562 width=8)
 Recheck Cond: ((lower(btrim((x_lastname)::text)) =  
'smith'::text) AND (owner = (-1)))

 Filter: (lower(btrim((x_firstname)::text)) = 'jordan'::text)
 ->  BitmapAnd  (cost=181524.86..181524.86 rows=112429 width=0)
   ->  Bitmap Index Scan on last_name_fnc_idx   
(cost=0.00..5468.29 rows=496740 width=0)
 Index Cond: (lower(btrim((x_lastname)::text)) =  
'smith'::text)
   ->  Bitmap Index Scan on m_object_paper_owner_idx   
(cost=0.00..176056.04 rows=16061244 width=0)

 Index Cond: (owner = (-1))
   ->  Hash  (cost=177.82..177.82 rows=174 width=4)
 ->  Index Scan using m_assignment_class_idx on m_assignment   
(cost=0.00..177.82 rows=174 width=4)

   Index Cond: (class = 2450798)
(13 rows)

However, if $LASTNAME is != 5 chars (1 char, 100 chars, doesn't make a  
difference), the query plan looks like this:
tii=# explain SELECT m_object_paper.id FROM m_object_paper,  
m_assignment WHERE m_object_paper.assignment = m_assignment.id AND  
m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND  
lower(btrim(x_firstname)) = lower(btrim('Jordan')) and  
lower(btrim(x_lastname)) = lower(btrim('Smithers'));

QUERY PLAN
---
 Nested Loop  (cost=0.00..10141.06 rows=1 width=4)
   ->  Index Scan using last_name_fnc_idx on m_object_paper   
(cost=0.00..10114.24 rows=11 width=8)
 Index Cond: (lower(btrim((x_lastname)::text)) =  
'smithers'::text)
 Filter: ((owner = (-1)) AND  
(lower(btrim((x_firstname)::text)) = 'jordan'::text))
   ->  Index Scan using m_assignment_pkey on m_assignment   
(cost=0.00..2.43 rows=1 width=4)

 Index Cond: (m_assignment.id = m_object_paper.assignment)
 Filter: (m_assignment.class = 2450798)
(7 rows)

In practice, the difference is 300+ seconds when $LASTNAME == 5 chars  
and <1 second when $LASTNAME != 5 chars.


Would anyone know what's going on here?  Is there something about the  
way btrim works, or perhaps with the way indexes are created?  It's  
strange that the query plan would change for just one case ("Jones,"  
"Smith," "Brown," etc., all cause the query plan to use that extra  
heap scan).


Thanks for any help!
--Richard



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Tom Lane
Scott Carey <[EMAIL PROTECTED]> writes:
> Which brings this back around to the point I care the most about:
> I/O per second will diminish as the most common database performance limiting 
> factor in Postgres 8.4's lifetime, and become almost irrelevant in 8.5's.
> Becoming more CPU efficient will become very important, and for some, already 
> is.  The community needs to be proactive on this front.
> This turns a lot of old assumptions on their head, from the database down 
> through the OS and filesystem.  We're bound to run into many surprises due to 
> this major shift in something that has had its performance characteristics 
> taken for granted for decades.

Hmm ... I wonder whether this means that the current work on
parallelizing I/O (the posix_fadvise patch in particular) is a dead
end.  Because what that is basically going to do is expend more CPU
to improve I/O efficiency.  If you believe this thesis then that's
not the road we want to go down.

regards, tom lane



Re: [PERFORM] query plan with index having a btrim is different for strings of different length

2008-12-09 Thread David Wilson
On Tue, Dec 9, 2008 at 2:56 PM, Richard Yen <[EMAIL PROTECTED]> wrote:

> In practice, the difference is 300+ seconds when $LASTNAME == 5 chars and <1
> second when $LASTNAME != 5 chars.
>
> Would anyone know what's going on here?  Is there something about the way
> btrim works, or perhaps with the way indexes are created?  It's strange that
> the query plan would change for just one case ("Jones," "Smith," "Brown,"
> etc., all cause the query plan to use that extra heap scan).

Those are likely common names, and may be showing up in the table
stats as common values, causing the planner to change things around.
Does this hold even for non-existent 5-character lastname strings?

Speaking of table statistics, might be worth upping the statistics
target on that table/column, analyzing, and seeing if you get
different results.

-- 
- David T. Wilson
[EMAIL PROTECTED]



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Robert Haas
> Hmm ... I wonder whether this means that the current work on
> parallelizing I/O (the posix_fadvise patch in particular) is a dead
> end.  Because what that is basically going to do is expend more CPU
> to improve I/O efficiency.  If you believe this thesis then that's
> not the road we want to go down.

I don't believe the thesis.  The gap between disk speeds and memory
speeds may narrow over time, but I doubt it's likely to disappear
altogether any time soon, and certainly not for all users.

Besides which, I believe the CPU overhead of that patch is pretty darn
small when the feature is not enabled.

...Robert



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Joshua D. Drake
On Tue, 2008-12-09 at 17:38 -0500, Tom Lane wrote:
> Scott Carey <[EMAIL PROTECTED]> writes:
> > Which brings this back around to the point I care the most about:
> > I/O per second will diminish as the most common database performance 
> > limiting factor in Postgres 8.4's lifetime, and become almost irrelevant in 
> > 8.5's.
> > Becoming more CPU efficient will become very important, and for some, 
> > already is.  The community needs to be proactive on this front.
> > This turns a lot of old assumptions on their head, from the database down 
> > through the OS and filesystem.  We're bound to run into many surprises due 
> > to this major shift in something that has had its performance 
> > characteristics taken for granted for decades.
> 
> Hmm ... I wonder whether this means that the current work on
> parallelizing I/O (the posix_fadvise patch in particular) is a dead
> end.  Because what that is basically going to do is expend more CPU
> to improve I/O efficiency.  If you believe this thesis then that's
> not the road we want to go down.

The per-CPU performance increase and the per-I/O-system performance
increase are going to follow vastly different curves. Anything that
reduces overall I/O is going to help (remember, you can never go too
fast).

The idea that I/O per second will somehow diminish as the most common
database performance factor is IMO a pipe dream. Even as good as SSDs
are getting, they still have to go through the bus; that is something
CPUs handle better (especially those CPUs that connect to memory
directly, without the bus).

In 5 years maybe, in the next two postgresql releases, not likely. Not
to mention all of this is around the idea of a different class of
hardware than 99% of our community will be running.


Sincerely,

Joshua D. Drake


> 
>   regards, tom lane
> 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Ben Chobot

On Tue, 9 Dec 2008, Robert Haas wrote:


I don't believe the thesis.  The gap between disk speeds and memory
speeds may narrow over time, but I doubt it's likely to disappear
altogether any time soon, and certainly not for all users.


I think the "not for all users" is the critical part. In 2 years, we may 
(or may not) start using SSD instead of traditional drives for new 
installs, but we certainly won't be throwing out our existing servers any 
time soon just because something (much) better is now available.




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread justin

Tom Lane wrote:

Scott Carey <[EMAIL PROTECTED]> writes:
  

Which brings this back around to the point I care the most about:
I/O per second will diminish as the most common database performance limiting 
factor in Postgres 8.4's lifetime, and become almost irrelevant in 8.5's.
Becoming more CPU efficient will become very important, and for some, already 
is.  The community needs to be proactive on this front.
This turns a lot of old assumptions on their head, from the database down 
through the OS and filesystem.  We're bound to run into many surprises due to 
this major shift in something that has had its performance characteristics 
taken for granted for decades.



Hmm ... I wonder whether this means that the current work on
parallelizing I/O (the posix_fadvise patch in particular) is a dead
end.  Because what that is basically going to do is expend more CPU
to improve I/O efficiency.  If you believe this thesis then that's
not the road we want to go down.

regards, tom lane
  


What does the CPU/memory/bus performance road map look like?

Is the I/O performance of storage devices, whatever they may be, going to
keep pace with the above and cause this problem?

Once I/O performance numbers start jumping up, I think DBAs will be
tempted to leave more and more data in the production database instead of
moving it out, or to start consolidating databases onto fewer servers,
again pushing more load onto the I/O.


Re: [PERFORM] query plan with index having a btrim is different for strings of different length

2008-12-09 Thread Tom Lane
Richard Yen <[EMAIL PROTECTED]> writes:
> I've discovered a peculiarity with using btrim in an index and was  
> wondering if anyone has any input.

What PG version is this?

In particular, I'm wondering if it's one of the early 8.2.x releases,
which had some bugs in and around choose_bitmap_and() that caused
them to sometimes make weird choices of indexes in a BitmapAnd plan
structure ...

Like David, I'm pretty dubious that the behavior has anything to do with
strings being 5 characters long exactly; but it could very much depend
on whether the string you choose is a common last name or not.  That
would change the estimated number of matching rows and hence affect the
apparent relative attractiveness of different indexes.

regards, tom lane



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Carey
Prefetch CPU cost should be rather low in the grand scheme of things, and 
prefetching does help performance even for very fast I/O.  I would not expect a 
very large CPU use increase from that sort of patch - there is a lot that is 
more expensive to do on a per-block basis.

There are two ways to look at non-I/O bound performance:
* Aggregate performance across many concurrent activities - here you want the 
least CPU used possible per action, and the least collisions on locks or shared 
data structures.  Using resources for as short of an interval as possible also 
helps a lot here.
* Single query performance, where you want to shorten the query time, perhaps 
at the cost of more average CPU.  Here, something like the fadvise stuff helps 
- as would any thread parallelism.  Perhaps less efficient in aggregate, but 
more efficient for a single query.

Then there is the overall CPU cost of accessing and reading data.  If this comes from disk, the 
big gains will be along the whole chain:  Driver to file system cache, file 
system cache to process, process specific tasks (cache eviction, placement, 
tracking), examining page tuples, locating tuples within pages, etc.   Anything 
that currently occurs on a per-block basis that could be done in a larger batch 
or set of blocks may be a big gain.  Another place that commonly consumes CPU 
in larger software projects is memory allocation if more advanced allocation 
techniques are not used.  I have no idea what Postgres uses here however.  I do 
know that commercial databases have extensive work in this area for 
performance, as well as reliability (harder to cause a leak, or easier to 
detect) and ease of use (don't have to even bother to free in certain contexts).

> On 12/9/08 2:58 PM, "Robert Haas" <[EMAIL PROTECTED]> wrote:

> I don't believe the thesis.  The gap between disk speeds and memory
> speeds may narrow over time, but I doubt it's likely to disappear
> altogether any time soon, and certainly not for all users.

Well, when select count(1) reads pages slower than my disk, it's 16x+ slower 
than my RAM.  Until one can demonstrate that the system can even read pages in 
RAM faster than what disks will do next year, it doesn't matter much that RAM 
is faster.   It does matter that RAM is faster for sorts, hashes, and other 
operations, but at the current time it does not for the raw pages themselves, 
from what I can measure.

This is in fact, central to my point.  Things will be CPU bound, not I/O bound. 
 It is mentioned that we still have to access things over the bus, and memory 
is faster, etc.  But Postgres is too CPU bound on page access to take advantage 
of the fact that memory is faster (for reading data pages).

The biggest change is not just that disks are getting closer to RAM, but that 
the random I/O penalty is diminishing significantly.  Low latencies makes 
seek-driven queries that used to consume mostly disk time consume CPU time 
instead.  High CPU costs for accessing pages makes a fast disk surprisingly 
close to RAM speed.

> Besides which, I believe the CPU overhead of that patch is pretty darn
> small when the feature is not enabled.

> ...Robert

I doubt it is much CPU, on or off.  It will help with SSD's when optimizing a 
single query, it may not help much if a system has enough 'natural' parallelism 
from other concurrent queries.  However there is a clear CPU benefit for 
getting individual queries out of the way faster, and occupying precious 
work_mem or other resources for a shorter time.  Occupying resources for a 
shorter period always translates to some CPU savings on a machine running at 
its limit with high concurrency.


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Gregory Stark
Scott Carey <[EMAIL PROTECTED]> writes:

> And as far as I can tell, even after the 8.4 fadvise patch, all I/O is in
> block_size chunks. (hopefully I am wrong) 
>...
> In addition to the fadvise patch, postgres needs to merge adjacent I/O's
> into larger ones to reduce the overhead. It only really needs to merge up to
> sizes of about 128k or 256k, and gain a 8x to 16x drop in syscall overhead,
> and additionally potentially save code trips down the shared buffer
> management code paths. At least, that's my guess; I haven't looked at any code
> and could be wrong.

There are a lot of assumptions here that I would be interested in seeing
experiments to back up.

FWIW when I was doing testing of posix_fadvise I did a *lot* of experiments
though only with a couple systems. One had a 3-drive array and one with a
15-drive array, both running Linux. I sometimes could speed up the sequential
scan by about 10% but not consistently. It was never more than about 15% shy
of the highest throughput from dd. And incidentally the throughput from dd
didn't seem to depend much at all on the blocksize.

On your system, do "dd bs=8k" and "dd bs=128k" really have an 8x performance
difference?
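For what it's worth, the syscall-count arithmetic behind the quoted 8x-16x figure is simple; whether the per-call overhead actually matters on modern kernels is exactly the open question:

```python
# How many read() calls it takes to pull 1 GiB at different block sizes.
total_bytes = 1 << 30          # 1 GiB sequential read

calls_8k = total_bytes // (8 * 1024)      # Postgres-sized 8 kB reads
calls_128k = total_bytes // (128 * 1024)  # merged 128 kB reads

print(calls_8k, calls_128k, calls_8k // calls_128k)  # 131072 8192 16
```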

In short, at least from the evidence available, this all seems like it might
be holdover beliefs from the olden days of sysadmining where syscalls were
much slower and OS filesystem caches much dumber.

I'm still interested in looking into it but I'll have to see actual vmstat or
iostat output while it's happening, preferably some oprofile results too. And
how many drives do you actually need to get into this situation. Also, what is
the output of "vacuum verbose" on the table?


> Additionally, the "If your operating system has any reasonable caching
> itself" comment earlier in this conversation --- Linux (2.6.18, Centos 5.2)
> does NOT. I can easily make it spend 100% CPU in system time trying to
> figure out what to do with the system cache for an hour. Just do large
> seqscans with memory pressure from work_mem or other forces that the OS will
> not deem 'idle'. Once the requested memory is ~75% of the system total, it
> will freak out. Linux simply will not give up that last 25% or so of the RAM
> for anything but page cache

This seems like just a misconfigured system. Linux and most Unixen definitely
expect to have a substantial portion of RAM dedicated to disk cache. Keep in
mind all your executable pages count towards this page cache too. You can
adjust this to some extent with the "swappiness" variable in Linux -- but I
doubt you'll be happy with the results regardless.

> The other way around (small shared_buffers, let the OS do it) hurts
> performance overall quite a bit -- randomly accessed pages get pushed out to
> the OS cache more often, and the OS tosses thouse out when a big seqscan
> occurs, resulting in a lot more random access from disk and more disk bound
> periods of time. Great wonder, this operating system caching, eh?

How do you observe this?


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Gregory Stark

Matthew Wakeling <[EMAIL PROTECTED]> writes:

> On Tue, 9 Dec 2008, Scott Marlowe wrote:
>> I wonder how many hard drives it would take to be CPU bound on random
>> access patterns?  About 40 to 60?  And probably 15k / SAS drives to
>> boot.  Cause that's what we're looking at in the next few years where
>> I work.
>
> There's a problem with that thinking. That is, in order to exercise many
> spindles, you will need to have just as many (if not more) concurrent 
> requests.
> And if you have many concurrent requests, then you can spread them over
> multiple CPUs. So it's more a case of "How many hard drives PER CPU". It also
> becomes a matter of whether Postgres can scale that well.

Well:

$ units
2445 units, 71 prefixes, 33 nonlinear units
You have: 8192 byte/5ms
You want: MB/s
* 1.6384
/ 0.61035156

At 1.6MB/s per drive, if you find Postgres is CPU-bound doing sequential scans at
1GB/s you'll need about 640 drives to keep one CPU satisfied doing random I/O
-- assuming you have perfect read-ahead and the read-ahead itself doesn't add
CPU overhead. Both of which are false of course, but at least in theory that's
what it'll take.
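The same arithmetic, spelled out (a sketch; whether "1 GB" means 10^9 or 2^30 bytes moves the figure between roughly 610 and 655 drives, hence the rough 640 above):

```shell
# One 8 kB random read per 5 ms seek, versus a 1 GB/s sequential scan rate.
awk 'BEGIN {
    per_drive = 8192 / 0.005                      # 1638400 bytes/s per drive
    printf "per-drive: %.4f MB/s\n", per_drive / 1e6
    printf "drives for 1 GB/s: %.0f\n", 1e9 / per_drive
}'
```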


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Ron Mayer

Tom Lane wrote:
> Scott Carey <[EMAIL PROTECTED]> writes:
>> Which brings this back around to the point I care the most about:
>> I/O per second will diminish as the most common database performance
>> limiting factor in Postgres 8.4's lifetime, and become almost irrelevant
>> in 8.5's. Becoming more CPU efficient will become very important, and
>> for some, already is.  The community needs to be proactive on this front.
>> This turns a lot of old assumptions on their head, from the database down
>> through the OS and filesystem.  We're bound to run into many surprises
>> due to this major shift in something that has had its performance
>> characteristics taken for granted for decades.
>
> Hmm ... I wonder whether this means that the current work on
> parallelizing I/O (the posix_fadvise patch in particular) is a dead
> end.  Because what that is basically going to do is expend more CPU
> to improve I/O efficiency.  If you believe this thesis then that's
> not the road we want to go down.

I imagine the larger postgres installations will still benefit from
this patch - because I imagine they will stay on hard disks for
quite some time; simply because the cost of 70TB of disks seems like
it'll be lower than RAM for at least the intermediate term.

I imagine the smaller postgres installations will also still benefit
from this patch - because my postgres installations with the most
painful I/O bottlenecks are small virtual machines without dedicated
drives where I/O (I guess emulated by the virtual machine software)
is very painful.

Perhaps there's a mid-sized system that won't benefit from fadvise()
in the intermediate term -- where the size of the database is about
the same size as a cost-effective flash drive -- but I don't have
any databases in that range now.




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Craig James

justin wrote:
> Tom Lane wrote:
>> Hmm ... I wonder whether this means that the current work on
>> parallelizing I/O (the posix_fadvise patch in particular) is a dead
>> end.  Because what that is basically going to do is expend more CPU
>> to improve I/O efficiency.  If you believe this thesis then that's
>> not the road we want to go down.
>>
>> regards, tom lane
>
> What does the CPU/memory/bus performance roadmap look like?
>
> Is the I/O performance of storage devices, whatever form they take,
> going to keep pace with the above and so cause this problem?


Flash memory will become just a fourth layer in the memory caching system 
(on-board CPU, high-speed secondary cache, main memory, and persistent memory). 
 The idea of external storage will probably disappear altogether -- computers 
will just have memory, and won't forget anything when you turn them off.  Since 
most computers are 64 bits these days, all data and programs will just hang out 
in memory at all times, and be directly addressable by the CPU.

The distinction between disk and memory arose from the fact that disks were large, slow 
devices relative to "core" memory and had to be connected by long wires, hence 
the need for I/O subsystems.  As flash memory becomes mainstream, I expect this 
distinction to disappear.

Craig



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Robert Haas
> Well, when select count(1) reads pages slower than my disk, it's 16x+ slower
> than my RAM.  Until one can demonstrate that the system can even read pages
> in RAM faster than what disks will do next year, it doesn't matter much that
> RAM is faster.   It does matter that RAM is faster for sorts, hashes, and
> other operations, but at the current time it does not for the raw pages
> themselves, from what I can measure.
>
> This is in fact, central to my point.  Things will be CPU bound, not I/O
> bound.  It is mentioned that we still have to access things over the bus,
> and memory is faster, etc.  But Postgres is too CPU bound on page access to
> take advantage of the fact that memory is faster (for reading data pages).

As I understand it, a big part of the reason for the posix_fadvise
patch is that the current system doesn't do a good job leveraging many
spindles in the service of a single query.  So the problem is not that
the CPU overhead is too large in some general sense but that the disk
and CPU operations get serialized, leading to an overall loss of
performance.  On the other hand, there are certainly cases (such as a
database which is entirely in RAM, or all the commonly used parts are
in RAM) where there really isn't very much I/O, and in those cases of
course the CPU cost will dominate.

...Robert



[PERFORM] Degenerate Performance Problem

2008-12-09 Thread Vincent Predoehl
I have postgresql 8.3.5 installed on MacOS X / Darwin.  I remember
setting shared memory buffer parameters and that solved the initial
performance problem, but after running several tests, the performance
goes way, way down. Restarting the server doesn't seem to help.


I'm using pqxx to access the database, if that makes any difference.

--
Vincent





Re: [PERFORM] Degenerate Performance Problem

2008-12-09 Thread Scott Marlowe
On Tue, Dec 9, 2008 at 8:16 PM, Vincent Predoehl
<[EMAIL PROTECTED]> wrote:
> I have postgresql 8.3.5 installed on MacOS X / Darwin.  I remember setting
> shared memory buffer parameters and that solved the initial performance
> problem, but after running several tests, the performance goes way, way
> down. Restarting the server doesn't seem to help.
> I'm using pqxx to access the database, if that makes any difference.

Could be a vacuuming issue.

What does

vacuum verbose;

on the database say?



Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Scott Carey
Just to clarify, I'm not talking about random I/O bound loads today, on hard 
drives, targetted by the fadvise stuff - these aren't CPU bound, and they will 
be helped by it.

For sequential scans, this situation is different, since the OS has sufficient 
read-ahead prefetching algorithms of its own for sequential reads, and the CPU 
work and I/O work ends up happening in parallel due to that.

For what it is worth, you can roughly double to triple the iops of an Intel 
X-25M on pure random reads if you queue up multiple concurrent reads rather 
than serialize them.  But it is not due to spindles, it is due to the latency 
of the SATA interface and the ability of the controller chip to issue reads to 
flash devices on different banks concurrently to some extent.
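The queue-depth effect is easy to sketch with plain shell, reading the same blocks serially and then concurrently (a scratch file stands in for the SSD; on a real device you'd bypass the cache and time both loops):

```shell
# Build a 64 MB scratch file, then read eight scattered 1 MB blocks,
# first one at a time (queue depth 1), then all at once (queue depth ~8).
dd if=/dev/zero of=/tmp/qdtest bs=1M count=64 2>/dev/null
for skip in 5 17 3 42 9 28 51 60; do
    dd if=/tmp/qdtest of=/dev/null bs=1M count=1 skip=$skip 2>/dev/null
done
for skip in 5 17 3 42 9 28 51 60; do
    dd if=/tmp/qdtest of=/dev/null bs=1M count=1 skip=$skip 2>/dev/null &
done
wait
rm -f /tmp/qdtest
```

On a drive whose controller can service several flash banks at once, the second loop finishes in a fraction of the first loop's wall time; on cached /tmp data both are near-instant.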


On 12/9/08 7:06 PM, "Robert Haas" <[EMAIL PROTECTED]> wrote:

>> Well, when select count(1) reads pages slower than my disk, it's 16x+ slower
>> than my RAM.  Until one can demonstrate that the system can even read pages
>> in RAM faster than what disks will do next year, it doesn't matter much that
>> RAM is faster.   It does matter that RAM is faster for sorts, hashes, and
>> other operations, but at the current time it does not for the raw pages
>> themselves, from what I can measure.
>>
>> This is in fact, central to my point.  Things will be CPU bound, not I/O
>> bound.  It is mentioned that we still have to access things over the bus,
>> and memory is faster, etc.  But Postgres is too CPU bound on page access to
>> take advantage of the fact that memory is faster (for reading data pages).
>
> As I understand it, a big part of the reason for the posix_fadvise
> patch is that the current system doesn't do a good job leveraging many
> spindles in the service of a single query.  So the problem is not that
> the CPU overhead is too large in some general sense but that the disk
> and CPU operations get serialized, leading to an overall loss of
> performance.  On the other hand, there are certainly cases (such as a
> database which is entirely in RAM, or all the commonly used parts are
> in RAM) where there really isn't very much I/O, and in those cases of
> course the CPU cost will dominate.
>
> ...Robert



Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Scott Marlowe schrieb:
> On Tue, Dec 9, 2008 at 5:17 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
>> Alan Hodgson schrieb:
>>> Mario Weilguni <[EMAIL PROTECTED]> wrote:
>>>> strange values. An individual drive is capable of delivering 91 MB/sec
>>>> sequential read performance, and we get values ~102MB/sec out of an
>>>> 8-drive RAID5, seems to be ridiculously slow.
>>>
>>> What command are you using to test the reads?
>>>
>>> Some recommendations to try:
>>>
>>> 1) /sbin/blockdev --setra 2048 device (where device is the partition or
>>> LVM volume)
>>>
>>> 2) Use XFS, and make sure your stripe settings match the RAID.
>>>
>>> Having said that, 102MB/sec sounds really low for any modern controller
>>> with 8 drives, regardless of tuning or filesystem choice.
>>
>> First, thanks a lot for this and all the other answers.
>>
>> I measured the raw device performance:
>> dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null
>>
>> I get poor performance when all 8 drives are configured as one large
>> RAID-5, and slightly poorer performance when configured as JBOD. In
>> production we use XFS as the filesystem, but I doubt this has anything
>> to do with FS tuning.
>
> Yeah, having just trawled the pgsql-performance archives, there are
> plenty of instances of people having terrible performance from HP
> smart array controllers before the P800.  Is it possible for you to
> trade up to a better RAID controller?  Whichever salesman sold you the
> P400 should take one for the team and make this right for you.

A customer of ours uses the P400 on a different machine with 8 SAS drives
(RAID 5 as well), and the performance is very, very good. So we thought
it was a good choice. Maybe the SATA drives are the root of this problem?
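One more thing worth double-checking in the dd invocation as quoted: with bs=64k, a count of 10 reads only 640 KB, far too little for a sustained-throughput figure. Something along these lines (a sketch; substitute the raw device, e.g. /dev/cciss/c0d0, for the scratch file on the real box) reads enough data for a stable number:

```shell
# Sequential read over a couple hundred MB so startup costs amortize.
dd if=/dev/zero of=/tmp/seqtest bs=1M count=128 2>/dev/null
sync
dd if=/tmp/seqtest of=/dev/null bs=1M 2>&1 | tail -n1   # throughput summary
rm -f /tmp/seqtest
```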






Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Scott Carey
It could be the drives, it could be a particular interaction between them and 
the drivers or firmware.

Do you know if NCQ is activated for them?
Can you test a single drive JBOD through the array to the same drive through 
something else, perhaps the motherboard's SATA port?

You may also have better luck with software RAID-0 on top of two 4-disk RAID-5s
or RAID-10s.  But not if a single-disk JBOD still performs well under par.  You
may need new drivers for the card, or firmware for the drive and/or card.  Or,
the card may simply be incompatible with those drives.  I've seen several hard
drive / RAID card incompatibilities before.



On 12/9/08 11:45 PM, "Mario Weilguni" <[EMAIL PROTECTED]> wrote:

Scott Marlowe schrieb:
> On Tue, Dec 9, 2008 at 5:17 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
>
>> Alan Hodgson schrieb:
>>
>>> Mario Weilguni <[EMAIL PROTECTED]> wrote:
>>>> strange values. An individual drive is capable of delivering 91 MB/sec
>>>> sequential read performance, and we get values ~102MB/sec out of an
>>>> 8-drive RAID5, seems to be ridiculously slow.
>>>
>>> What command are you using to test the reads?
>>>
>>> Some recommendations to try:
>>>
>>> 1) /sbin/blockdev --setra 2048 device (where device is the partition or
>>> LVM volume)
>>>
>>> 2) Use XFS, and make sure your stripe settings match the RAID.
>>>
>>> Having said that, 102MB/sec sounds really low for any modern controller
>>> with 8 drives, regardless of tuning or filesystem choice.
>>>
>>>
>>>
>> First, thanks alot for this and all the other answers.
>>
>> I measured the raw device performance:
>> dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null
>>
>> I get poor performance when all 8 drives are configured as one, large
>> RAID-5, and slightly poorer performance when configured as JBOD. In
>> production, we use XFS as FS, but I doubt this has anything to do with FS
>> tuning.
>>
>
> Yeah, having just trawled the pgsql-performance archives, there are
> plenty of instances of people having terrible performance from HP
> smart array controllers before the P800.  Is it possible for you to
> trade up to a better RAID controller?  Whichever salesman sold you the
> P400 should take one for the team and make this right for you.
>
>
A customer of ours uses the P400 on a different machine with 8 SAS drives
(RAID 5 as well), and the performance is very, very good. So we thought
it was a good choice. Maybe the SATA drives are the root of this problem?


