Re: [PERFORM] What is the max number of database I can create in an instance of pgsql?

2005-11-18 Thread Jaime Casanova
On 11/18/05, anon permutation <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> We want to create a database for each one of our departments, but we only
> want to have one instance of postgresql running.  There are about 10-20
> departments.  I can easily use createdb to create these databases.  However,
>

Before doing this, think about whether you will want to run queries
across the data of some or all of the departments (and you will). If that
is the case, the better option is to create schemas instead of separate
databases...
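
For example, a minimal sketch of that approach (the database, schema and
table names here are invented for illustration):

psql -d company <<'SQL'
CREATE SCHEMA sales;
CREATE SCHEMA hr;

CREATE TABLE hr.employees (emp_id serial PRIMARY KEY, name text);
CREATE TABLE sales.orders (order_id serial PRIMARY KEY,
                           emp_id   int REFERENCES hr.employees,
                           amount   numeric);

-- a cross-department query is then an ordinary join across schemas,
-- which is not possible across separate databases:
SELECT e.name, count(*), sum(o.amount)
  FROM hr.employees e JOIN sales.orders o ON o.emp_id = e.emp_id
 GROUP BY e.name;
SQL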

> what is the max number of databases I can create before performance goes
> down?
>

the problem isn't the number of databases but the number of concurrent
users... after all, you will have the same resources whether you have 1
database or 100. What matters is the number of users, the amount of data
they will process on a normal day, and the complexity of your queries.

> Assuming each database is performing well alone, how would putting 10-20 of
>
> them together in one instance affect postgres?
>
> In terms of getting a new server for this project, how do I gauge how
> powerful a server I should get?
>
> Thanks.
>
>


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] What is the max number of database I can create in an instance of pgsql?

2005-11-18 Thread Michael Glaesemann


On Nov 19, 2005, at 12:24 , anon permutation wrote:

However, what is the max number of databases I can create before
performance goes down?


Assuming each database is performing well alone, how would putting  
10-20 of them together in one instance affect postgres?


In terms of getting a new server for this project, how do I gauge
how powerful a server I should get?


I'm sure those wiser than me will chime in with specifics. I think
you should think of usage not in terms of the number of databases but
in terms of connection rates, database size (numbers of tables and
tuples) and the types of queries that will be run. While there may be
a little overhead from having a number of databases in the
cluster, I think this is probably going to be insignificant in
comparison to these other factors. A better idea of what the usage
will be will guide you in choosing your hardware.



Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] What is the max number of database I can create in an instance of pgsql?

2005-11-18 Thread anon permutation


Hi,

We want to create a database for each one of our departments, but we only 
want to have one instance of postgresql running.  There are about 10-20 
departments.  I can easily use createdb to create these databases.  However, 
what is the max number of databases I can create before performance goes 
down?


Assuming each database is performing well alone, how would putting 10-20 of 
them together in one instance affect postgres?


In terms of getting a new server for this project, how do I gauge how 
powerful a server I should get?


Thanks.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,

On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

If you alter this to involve more complex joins (e.g 4. way star) and
(maybe add a small number of concurrent executors too) - is it still the
case?


I may not have listened to you - are you asking about whether the 
readahead works for these cases?


I’ll be running some massive TPC-H benchmarks on these machines soon – 
we’ll see then.



That too, meaning the business of 1 executor randomly reading a given 
relation file whilst another is sequentially scanning (some other) part 
of it.


Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan



Mark,

On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

If you alter this to involve more complex joins (e.g 4. way star) and
(maybe add a small number of concurrent executors too) - is it still the
case?

I may not have listened to you - are you asking about whether the readahead works for these cases?

I’ll be running some massive TPC-H benchmarks on these machines soon – we’ll see then.

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan
Mark,

On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:
 
> If you alter this to involve more complex joins (e.g 4. way star) and
> (maybe add a small number of concurrent executors too) - is it still the
> case?

4-way star, same result, that's part of my point.  With Bizgres MPP, the
4-way star uses 4 concurrent scanners, though not all are active all the
time.  And that's per segment instance - we normally use one segment
instance per CPU, so our concurrency is NCPUs plus some.

The trick is the "small number of concurrent executors" part.  The only way
to get this with normal postgres is to have concurrent users, and normally
they are doing different things, scanning different parts of the disk.
These are competing things, and for concurrency enhancement something like
"sync scan" would be an effective optimization.

But in reporting, business analytics and warehousing in general, there are
reports that take hours to run.  If you can knock that down by factors of 10
using parallelism, it's a big win.  That's the reason that Teradata did $1.5
Billion in business last year.

More importantly - that's the kind of work that everyone using internet data
for analytics wants right now.

- Luke



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] ERROR: no value found for parameter 1 with JDBC and Explain Analyze

2005-11-18 Thread Tom Lane
"Virag Saksena" <[EMAIL PROTECTED]> writes:
> ERROR: no value found for parameter 1

> Here is sample code which causes this exception ...
>   pst=prodconn.prepareStatement("explain analyze select count(*) from 
> jam_heaprel r where heap_id = ? and parentaddr = ?");

I don't think EXPLAIN can take parameters (most of the "utility"
statements don't take parameters).

The usual workaround is to use PREPARE:

PREPARE foo(paramtype,paramtype) AS SELECT ...;
EXPLAIN EXECUTE foo(x,y);

This will generate the same parameterized plan as you'd get from the
other way, so it's a reasonable approximation to the behavior with
JDBC parameters.
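
Applied to the query from the original post, that workaround might look
like the following (just a sketch; the parameter types are assumed to be
integers and the database name is made up):

psql -d proddb <<'SQL'
PREPARE jam_plan(int, int) AS
    SELECT count(*) FROM jam_heaprel r
     WHERE heap_id = $1 AND parentaddr = $2;
EXPLAIN ANALYZE EXECUTE jam_plan(1, 0);
DEALLOCATE jam_plan;
SQL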

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Mark Kirkwood

Luke Lonergan wrote:

(mass snippage) 
time psql -c "select count(*) from ivp.bigtable1" dgtestdb

[EMAIL PROTECTED] IVP]$ cat sysout3
  count   
--

 8000
(1 row)


real1m9.875s
user0m0.000s
sys 0m0.004s
[EMAIL PROTECTED] IVP]$ !du
du -sk dgtestdb/base
17021260dgtestdb/base


Summary:

 OK – you can get more I/O bandwidth out of the current 
I/O path for sequential scan if you tune the filesystem for large 
readahead.  This is a cheap alternative to overhauling the executor to 
use asynch I/O.


Still, there is a CPU limit here – this is not I/O bound, it is CPU 
limited as evidenced by the sensitivity to readahead settings.   If the 
filesystem could do 1GB/s, you wouldn’t go any faster than 244MB/s.





Luke,

Interesting - but possibly only representative for a workload consisting 
entirely of one executor doing "SELECT ... FROM my_single_table".


If you alter this to involve more complex joins (e.g 4. way star) and 
(maybe add a small number of concurrent executors too) - is it still the 
case?


Cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] sort/limit across union all

2005-11-18 Thread Marc Morin



We have a large DB with partitioned tables in postgres.   We have had
trouble with an ORDER/LIMIT type query.  The order and limit are not
pushed down to the sub-tables.

CREATE TABLE base
(
    foo int
);

CREATE TABLE bar_0
(
    extra int
) INHERITS (base);
ALTER TABLE bar_0 ADD PRIMARY KEY (foo);

-- repeated for bar_0 ... bar_40

SELECT foo FROM base ORDER BY foo LIMIT 10;

is real slow. What is required to make the query planner generate the
following instead... (code change i know, but how hard would it be?)

SELECT foo
FROM
(
    SELECT * FROM bar_0 ORDER BY foo LIMIT 10
    UNION ALL
    SELECT * FROM bar_1 ORDER BY foo LIMIT 10
    -- ... and so on for each bar_N
) AS base
ORDER BY foo
LIMIT 10;


[PERFORM] ERROR: no value found for parameter 1 with JDBC and Explain Analyze

2005-11-18 Thread Virag Saksena

Hi,
I am trying to use Explain Analyze to trace a slow SQL statement called from 
JDBC.
The SQL statement with the parameters took 11 seconds. When I run an explain 
analyze from psql, it takes < 50 ms with a reasonable explain plan. However 
when I try to run an explain analyze from JDBC with the parameters, I get 
error :

ERROR: no value found for parameter 1

Here is sample code which causes this exception ...
 pst=prodconn.prepareStatement("explain analyze select count(*) from 
jam_heaprel r where heap_id = ? and parentaddr = ?");

 pst.setInt(1,1);
 pst.setInt(2,0);
 rs=pst.executeQuery();

java.sql.SQLException: ERROR: no value found for parameter 1
	at 
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1471)
	at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1256)
	at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:175)
	at 
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:389)
	at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:330)
	at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:240)

at jsp._testexplain_2ejsp._jspService(_testexplain_2ejsp.java:82)
at org.gjt.jsp.HttpJspPageImpl.service(HttpJspPageImpl.java:75)

Regards,

Virag



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Help speeding up delete

2005-11-18 Thread Leigh Dyer

Steve Wampler wrote:


Is the performance behavior that we're experiencing a known
problem with 7.2 that has been addressed in 7.4?  Or will the
upgrade fix other problems while leaving this one?


I'm pretty sure that in versions earlier than 7.4, IN clauses that use a 
subquery will always use a seqscan, regardless of what indexes are 
available. If you try an IN using explicit values though, it should use 
the index.
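
A rough illustration of the difference (the table and column names here are
invented; on 7.2/7.3 the usual workaround was to rewrite the IN (subquery)
as an equivalent EXISTS):

psql -d mydb <<'SQL'
-- slow on pre-7.4: the IN (subquery) form tends to ignore the index on items.order_id
EXPLAIN DELETE FROM items
 WHERE order_id IN (SELECT id FROM old_orders);

-- usually much faster on those versions: the equivalent EXISTS form
EXPLAIN DELETE FROM items
 WHERE EXISTS (SELECT 1 FROM old_orders o WHERE o.id = items.order_id);
SQL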


Thanks
Leigh

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Hardware/OS recommendations for large databases

2005-11-18 Thread Ron
Breaking the ~120MBps pg IO ceiling by any means 
is an important result.  Particularly when you 
get a ~2x improvement.  I'm curious how far we 
can get using simple approaches like this.


At 10:13 AM 11/18/2005, Luke Lonergan wrote:

Dave,

On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
>
> Now there's an interesting line drawn in the sand. I presume you have
> numbers to back this up ?
>
> This should draw some interesting posts.

Part 2: The answer

System A:
This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel.

On a single table with 15 columns (the Bizgres 
IVP) at a size double memory (2.12GB), Postgres 
8.0.3 with Bizgres enhancements takes 32 seconds 
to scan the table: that’s 66 MB/s.  Not the 
efficiency I’d hope from the onboard SATA 
controller that I’d like, I would have expected 
to get 85% of the 100MB/s raw read performance.
Have you tried the large readahead trick with this system?  It would be
interesting to see how much it would help.  It might even be worth running
the experiment at each of [default, 2x default, 4x default, 8x default,
etc.] readahead settings until either a) you run out of resources to
support the desired readahead, or b) performance levels off.  I can
imagine the results being very enlightening.
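
Something along these lines would run that sweep (a sketch only; it assumes
the table lives on /dev/sdd, that blockdev can be run as root, and it reuses
the database and table names from the test posted earlier in the thread):

#!/bin/bash
# blockdev --setra takes the readahead size in 512-byte sectors
for ra in 256 512 1024 2048 4096 8192 16384 32768; do
    blockdev --setra $ra /dev/sdd
    echo "readahead = $ra sectors"
    time psql -c "select count(*) from ivp.bigtable1" dgtestdb > /dev/null
done
# the table should be much larger than RAM, or page-cache hits will mask
# the effect of the readahead setting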




System B:
This system is running an XFS filesystem, and 
has been tuned to use very large (16MB) 
readahead.  It’s running the Centos 4.1 distro, 
which uses a Linux 2.6.9 kernel.


Same test as above, but with 17GB of data takes 
69.7 seconds to scan (!)  That’s 244.2MB/s, 
which is obviously double my earlier point of 
110-120MB/s.  This system is running with a 16MB 
Linux readahead setting, let’s try it with the 
default (I think) setting of 256KB – AHA! Now we get 171.4 seconds or 99.3MB/s.

The above experiment would seem useful here as well.



Summary:

 OK – you can get more I/O 
bandwidth out of the current I/O path for 
sequential scan if you tune the filesystem for 
large readahead.  This is a cheap alternative to 
overhauling the executor to use asynch I/O.


Still, there is a CPU limit here – this is not 
I/O bound, it is CPU limited as evidenced by the 
sensitivity to readahead settings.   If the 
filesystem could do 1GB/s, you wouldn’t go any faster than 244MB/s.


- Luke


I respect your honesty in reporting results that were different from your
expectations or previously taken stance.  Alan Stange's comment re: the use
of direct IO, along with your comments re: async IO and mem copies, plus
the results of these experiments could very well point us directly at how
to most easily solve pg's CPU-boundness during IO.


[HACKERS] are you watching this?

Ron



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Dave Cramer
Luke,

Interesting numbers. I'm a little concerned about the use of blockdev —setra
16384. If I understand this correctly it assumes that the table is contiguous
on the disk does it not ?

Dave

On 18-Nov-05, at 10:13 AM, Luke Lonergan wrote:

[full quote of Luke's "Part 2: The answer" benchmark post snipped; it appears
in full later in the thread]

Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan



Alan,

On 11/18/05 8:13 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

I told you in my initial post that I was observing numbers in excess of
what you claiming, but you seemed to think I didn't know how to measure
an IO rate.

Prove me wrong, post your data.

I should note too that our system uses about 20% of a single cpu when
performing a table scan at >100MB/s of IO. I think you claimed the
system would be cpu bound at this low IO rate.

See above.

- Luke








Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan



Vivek,

On 11/18/05 8:07 AM, "Vivek Khera" <[EMAIL PROTECTED]> wrote:


On Nov 18, 2005, at 10:13 AM, Luke Lonergan wrote:

Still, there is a CPU limit here – this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings.   If the filesystem could do 1GB/s, you wouldn’t go any faster than 244MB/s.

Yeah, and mysql would probably be faster on your trivial queries.  Try concurrent large joins and updates and see which system is faster.

That’s what we do to make a living.  And it’s Oracle that’s a lot faster, because they implemented a much tighter, optimized I/O path to disk than Postgres.

Since you asked, we bought the 5 systems as a cluster – and with Bizgres MPP we get close to 400MB/s per machine on complex queries.

- Luke  





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Alan Stange

Greg Stark wrote:

Alan Stange <[EMAIL PROTECTED]> writes:

  

Luke Lonergan wrote:


Alan,

On 11/18/05 9:31 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:


  

Here's the output from one iteration of iostat -k 60 while the box is
doing a select count(1) on a 238GB table.

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.990.00   17.97   32.40   48.64

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sdd 345.95130732.53 0.007843952  0

We're reading 130MB/s for a full minute.  About 20% of a single cpu was
being used.   The remainder being idle.



Cool - thanks for the results.  Is that % of one CPU, or of 2?  Was the
system otherwise idle?

  
Actually, this was dual cpu 



I hate to agree with him but that looks like a dual machine with one CPU
pegged. Yes most of the time is being spent in the kernel, but you're still
basically cpu limited.

That said, 130MB/s is nothing to sneeze at, that's maxing out two high end
drives and quite respectable for a 3-disk stripe set, even reasonable for a
4-disk stripe set. If you're using 5 or more disks in RAID-0 or RAID 1+0 and
only getting 130MB/s then it does seem likely the cpu is actually holding you
back here.

Still it doesn't show Postgres being nearly so CPU wasteful as the original
poster claimed.
  
Yes and no.   The one cpu is clearly idle.   The second cpu is 40% busy 
and 60% idle (aka iowait in the above numbers).
Of that 40%, other things were happening as well during the 1 minute 
snapshot.   During some iostat outputs that I didn't post the cpu time 
was ~ 20%.


So, you can take your pick.   The single cpu usage is somewhere between 
20% and 40%.  As I can't remove other users of the system, it's the best 
measurement that I can make right now.


Either way, it's not close to being cpu bound.  This is with Opteron 
248, 2.2Ghz cpus.


Note that the storage system has been a bit disappointing:  it's an IBM 
Fast T600 with a 200MB/s fiber attachment.   It could be better, but 
it's not been the bottleneck in our work, so we haven't put any energy 
into it.  


It's all in the kernel either way; using a different scheduler or file
system would change that result. Even better would be using direct IO to not
flush everything else from memory and avoid some memory copies from kernel
to user space. Note that almost none of the time is user time. Changing
postgresql won't change the cpu useage.


Well changing to direct i/o would still be changing Postgres so that's
unclear. And there are plenty of more mundane ways that Postgres is
responsible for how efficiently or not the kernel is used. Just using fewer
syscalls to do the same amount of reading would reduce cpu consumption.
Absolutely.  This is why we're using a 32KB block size and also switched 
to using O_SYNC for the WAL syncing method.  That's many MB/s that 
don't need to be cached in the kernel (thus evicting other data), and we 
avoid all the fsync/fdatasync syscalls.
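
For reference, the WAL half of that is just a postgresql.conf setting (the
32KB block size is a compile-time change to BLCKSZ and requires a rebuild);
a sketch of the switch, assuming $PGDATA points at the cluster directory:

echo "wal_sync_method = open_sync" >> "$PGDATA/postgresql.conf"
pg_ctl -D "$PGDATA" restart    # a reload may be enough; a restart certainly applies it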


The purpose of direct IO isn't to make the vacuum or analyze faster, but 
to lessen their impact on queries with someone waiting for the 
results.   That's our biggest hit:  running a sequential scan on 240GB 
of data and flushing everything else out of memory.


Now that I think about this a bit, a big chunk of time is probably 
being lost in TLB misses and other virtual memory events that would be 
avoided if a larger page size were being used.


-- Alan 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Alan Stange

Luke Lonergan wrote:

>> I don't know what the system cost.   It was part of block of dual
>> opterons from Sun that we got some time ago.   I think the 130MB/s is
>> slow given the hardware, but it's acceptable.  I'm not too price
>> sensitive; I care much more about reliability, uptime, etc.
>
> Then I know what they cost - we have them too (V20z and V40z).  You should
> be getting 400MB/s+ with external RAID.

Yes, but we don't.   This is where I would normally begin a rant on how 
craptacular Linux can be at times.  But, for the sake of this 
discussion, postgresql isn't reading the data any more slowly than does 
any other program.


And we don't have the time to experiment with the box.

I know it should be better, but it's good enough for our purposes at 
this time.


-- Alan


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan
Greg,

On 11/18/05 11:07 AM, "Greg Stark" <[EMAIL PROTECTED]> wrote:

> That said, 130MB/s is nothing to sneeze at, that's maxing out two high end
> drives and quite respectable for a 3-disk stripe set, even reasonable for a
> 4-disk stripe set. If you're using 5 or more disks in RAID-0 or RAID 1+0 and
> only getting 130MB/s then it does seem likely the cpu is actually holding you
> back here.

With an FC array, it's undoubtedly more like 14 drives, in which case
130MB/s is laughable.  On the other hand, I wouldn't be surprised if it were
a single 200MB/s Fibre Channel attachment.

It does make you wonder why people keep recommending 15K RPM drives, like it
would help *not*.

> Still it doesn't show Postgres being nearly so CPU wasteful as the original
> poster claimed.

It's partly about waste, and partly about lack of a concurrent I/O
mechanism.  We've profiled it for the waste, we've implemented concurrent
I/O to prove the other point.
 
>> It's all in the kernel either way; using a different scheduler or file
>> system would change that result. Even better would be using direct IO to not
>> flush everything else from memory and avoid some memory copies from kernel
>> to user space. Note that almost none of the time is user time. Changing
>> postgresql won't change the cpu useage.
> 
> Well changing to direct i/o would still be changing Postgres so that's
> unclear. And there are plenty of more mundane ways that Postgres is
> responsible for how efficiently or not the kernel is used. Just using fewer
> syscalls to do the same amount of reading would reduce cpu consumption.

Bingo.
 
- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Greg Stark
Alan Stange <[EMAIL PROTECTED]> writes:

> Luke Lonergan wrote:
> > Alan,
> >
> > On 11/18/05 9:31 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
> >
> >
> >> Here's the output from one iteration of iostat -k 60 while the box is
> >> doing a select count(1) on a 238GB table.
> >>
> >> avg-cpu:  %user   %nice%sys %iowait   %idle
> >>0.990.00   17.97   32.40   48.64
> >>
> >> Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
> >> sdd 345.95130732.53 0.007843952  0
> >>
> >> We're reading 130MB/s for a full minute.  About 20% of a single cpu was
> >> being used.   The remainder being idle.
> >>
> >
> > Cool - thanks for the results.  Is that % of one CPU, or of 2?  Was the
> > system otherwise idle?
> >
> Actually, this was dual cpu 

I hate to agree with him but that looks like a dual machine with one CPU
pegged. Yes most of the time is being spent in the kernel, but you're still
basically cpu limited.

That said, 130MB/s is nothing to sneeze at, that's maxing out two high end
drives and quite respectable for a 3-disk stripe set, even reasonable for a
4-disk stripe set. If you're using 5 or more disks in RAID-0 or RAID 1+0 and
only getting 130MB/s then it does seem likely the cpu is actually holding you
back here.

Still it doesn't show Postgres being nearly so CPU wasteful as the original
poster claimed.

> It's all in the kernel either way; using a different scheduler or file
> system would change that result. Even better would be using direct IO to not
> flush everything else from memory and avoid some memory copies from kernel
> to user space. Note that almost none of the time is user time. Changing
> postgresql won't change the cpu useage.

Well changing to direct i/o would still be changing Postgres so that's
unclear. And there are plenty of more mundane ways that Postgres is
responsible for how efficiently or not the kernel is used. Just using fewer
syscalls to do the same amount of reading would reduce cpu consumption.


> One IMHO obvious improvement would be to have vacuum and analyze only do 
> direct
> IO.  Now they appear to be very effective memory flushing tools.  Table scans
> on tables larger than say 4x memory should probably also use direct IO for
> reads.

-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan
Alan,

On 11/18/05 10:30 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

> Actually, this was dual cpu and there was other activity during the full
> minute, but it was on other file devices, which I didn't include in the
> above output.   Given that, and given what I see on the box now I'd
> raise the 20% to 30% just to be more conservative.  It's all in the
> kernel either way; using a different scheduler or file system would
> change that result.  Even better would be using direct IO to not flush
> everything else from memory and avoid some memory copies from kernel to
> user space.  Note that almost none of the time is user time.  Changing
> postgresql won't change the cpu useage.

These are all things that help on the IO wait side possibly, however, there
is a producer/consumer problem in postgres that goes something like this:

- Read some (small number of, sometimes 1) 8k pages
- Do some work on those pages, including lots of copies
- repeat

This back and forth without threading (like AIO, or a multiprocessing
executor) causes cycling and inefficiency that limits throughput.
Optimizing some of the memcopies and other garbage out, plus increasing the
internal (postgres) readahead would probably double the disk bandwidth.

But to be disk-bound (meaning that the disk subsystem is running at full
speed), requires asynchronous I/O.  We do this now with Bizgres MPP, and we
get fully saturated disk channels on every machine.  That means that even on
one machine, we run many times faster than non-MPP postgres.

> One IMHO obvious improvement would be to have vacuum and analyze only do
> direct IO.  Now they appear to be very effective memory flushing tools.
> Table scans on tables larger than say 4x memory should probably also use
> direct IO for reads.

That's been suggested many times prior - I agree, but this also needs AIO to
be maximally effective.

> I don't know what the system cost.   It was part of block of dual
> opterons from Sun that we got some time ago.   I think the 130MB/s is
> slow given the hardware, but it's acceptable.  I'm not too price
> sensitive; I care much more about reliability, uptime, etc.

Then I know what they cost - we have them too (V20z and V40z).  You should
be getting 400MB/s+ with external RAID.

>>> What am I doing wrong?
>>> 
>>> 9 years ago I co-designed a petabyte data store with a goal of 1GB/s IO
>>> (for a DOE lab).   And now I don't know what I'm doing,
>>>
>> Cool.  Would that be Sandia?
>> 
>> We routinely sustain 2,000 MB/s from disk on 16x 2003 era machines on
>> complex queries.
> Disk?!  4 StorageTek tape silos.  That would be .002 TB/s.  One has to
> change how you think when you have that much data. And hope you don't
> have a fire, because there's no backup.   That work was while I was at
> BNL.   I believe they are now at 4PB of tape and 150TB of disk.

We had 1.5 Petabytes on 2 STK Silos at NAVO from 1996-1998 where I ran R&D.
We also had a Cray T932, an SGI Origin 3000 with 256 CPUs, a Cray T3E with
1280 CPUs, 2 Cray J916s with 1 TB of shared disk, a Cray C90-16, a Sun E10K,
etc etc, along with clusters of Alpha machines and lots of SGIs.  It's nice
to work with a $40M annual budget.

Later, working with FSL we implemented a weather forecasting cluster that
ultimately became the #5 fastest computer on the TOP500 supercomputing list
from 512 Alpha cluster nodes.  That machine had a 10-way shared SAN, tape
robotics and a Myrinet interconnect and ran 64-bit Linux (in 1998).

- Luke



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Alan Stange

Luke Lonergan wrote:

Alan,

On 11/18/05 9:31 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

  

Here's the output from one iteration of iostat -k 60 while the box is
doing a select count(1) on a 238GB table.

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.990.00   17.97   32.40   48.64

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sdd 345.95130732.53 0.007843952  0

We're reading 130MB/s for a full minute.  About 20% of a single cpu was
being used.   The remainder being idle.



Cool - thanks for the results.  Is that % of one CPU, or of 2?  Was the
system otherwise idle?
  
Actually, this was dual cpu and there was other activity during the full 
minute, but it was on other file devices, which I didn't include in the 
above output.   Given that, and given what I see on the box now I'd 
raise the 20% to 30% just to be more conservative.  It's all in the 
kernel either way; using a different scheduler or file system would 
change that result.  Even better would be using direct IO to not flush 
everything else from memory and avoid some memory copies from kernel to 
user space.  Note that almost none of the time is user time.  Changing 
postgresql won't change the cpu useage.


One IMHO obvious improvement would be to have vacuum and analyze only do 
direct IO.  Now they appear to be very effective memory flushing tools.  
Table scans on tables larger than say 4x memory should probably also use 
direct IO for reads.


 
  

We've done nothing fancy and achieved results you claim shouldn't be
possible.  This is a system that was re-installed yesterday, no tuning
was done to the file systems, kernel or storage array.



Are you happy with 130MB/s?  How much did you pay for that?  Is it more than
$2,000, or double my 2003 PC?
  
I don't know what the system cost.   It was part of block of dual 
opterons from Sun that we got some time ago.   I think the 130MB/s is 
slow given the hardware, but it's acceptable.  I'm not too price 
sensitive; I care much more about reliability, uptime, etc.  

 
  

What am I doing wrong?

9 years ago I co-designed a petabyte data store with a goal of 1GB/s IO
(for a DOE lab).   And now I don't know what I'm doing,


Cool.  Would that be Sandia?

We routinely sustain 2,000 MB/s from disk on 16x 2003 era machines on
complex queries.
Disk?!  4 StorageTek tape silos.  That would be .002 TB/s.  One has to 
change how you think when you have that much data. And hope you don't 
have a fire, because there's no backup.   That work was while I was at 
BNL.   I believe they are now at 4PB of tape and 150TB of disk.


-- Alan

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Alex Turner
Ok - so I ran the same test on my system and get a total speed of
113MB/sec.  Why is this?  Why is the system so limited to around just
110MB/sec?  I tuned read ahead up a bit, and my results improve a
bit..

Alex


On 11/18/05, Luke Lonergan <[EMAIL PROTECTED]> wrote:
>  Dave,
>
>  On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
>  >
>  > Now there's an interesting line drawn in the sand. I presume you have
>  > numbers to back this up ?
>  >
>  > This should draw some interesting posts.
>
>  Part 2: The answer
>
>  System A:
>
> This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel.
>
>  On a single table with 15 columns (the Bizgres IVP) at a size double memory
> (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan
> the table: that's 66 MB/s.  Not the efficiency I'd hope from the onboard
> SATA controller that I'd like, I would have expected to get 85% of the
> 100MB/s raw read performance.
>
>  So that's $1,200 / 66 MB/s (without adjusting for 2003 price versus now) =
> 18.2 $/MB/s
>
>  Raw data:
>  [EMAIL PROTECTED] IVP]$ cat scan.sh
>  #!/bin/bash
>
>  time psql -c "select count(*) from ivp.bigtable1" dgtestdb
>  [EMAIL PROTECTED] IVP]$ cat sysout1
>count
>  --
>   1000
>  (1 row)
>
>
>  real0m32.565s
>  user0m0.002s
>  sys 0m0.003s
>
>  Size of the table data:
>  [EMAIL PROTECTED] IVP]$ du -sk dgtestdb/base
>  2121648 dgtestdb/base
>
>  System B:
>
> This system is running an XFS filesystem, and has been tuned to use very
> large (16MB) readahead.  It's running the Centos 4.1 distro, which uses a
> Linux 2.6.9 kernel.
>
>  Same test as above, but with 17GB of data takes 69.7 seconds to scan (!)
> That's 244.2MB/s, which is obviously double my earlier point of 110-120MB/s.
>  This system is running with a 16MB Linux readahead setting, let's try it
> with the default (I think) setting of 256KB – AHA! Now we get 171.4 seconds
> or 99.3MB/s.
>
>  So, using the tuned setting of "blockdev —setra 16384" we get $6,000 /
> 244MB/s = 24.6 $/MB/s
>  If we use the default Linux setting it's 2.5x worse.
>
>  Raw data:
>  [EMAIL PROTECTED] IVP]$ cat scan.sh
>  #!/bin/bash
>
>  time psql -c "select count(*) from ivp.bigtable1" dgtestdb
>  [EMAIL PROTECTED] IVP]$ cat sysout3
>count
>  --
>   8000
>  (1 row)
>
>
>  real1m9.875s
>  user0m0.000s
>  sys 0m0.004s
>  [EMAIL PROTECTED] IVP]$ !du
>  du -sk dgtestdb/base
>  17021260dgtestdb/base
>
>  Summary:
>
>   OK – you can get more I/O bandwidth out of the current I/O
> path for sequential scan if you tune the filesystem for large readahead.
> This is a cheap alternative to overhauling the executor to use asynch I/O.
>
>  Still, there is a CPU limit here – this is not I/O bound, it is CPU limited
> as evidenced by the sensitivity to readahead settings.   If the filesystem
> could do 1GB/s, you wouldn't go any faster than 244MB/s.
>
>  - Luke

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan
Bill,

On 11/18/05 7:55 AM, "Bill McGonigle" <[EMAIL PROTECTED]> wrote:
> 
> There is some truth to it.  For an app I'm currently running (full-text
> search using tsearch2 on ~100MB of data) on:

Do you mean 100GB?  Sounds like you are more like a decision support
/warehousing application.
 
> Dev System:
> Asus bare-bones bookshelf case/mobo
> 3GHz P4 w/ HT
> 800MHz memory Bus
> Fedora Core 3 (nightly update)
> 1GB RAM
> 1 SATA Seagate disk (7200RPM, 8MB Cache)
> $800
> worst-case query: 7.2 seconds

About the same machine I posted results for, except I had two faster disks.

> now, the machine I'm deploying to:
> 
> Dell SomthingOrOther
> (4) 2.4GHz Xeons
> 533MHz memory bus
> RedHat Enterprise 3.6
> 1GB RAM
> (5) 15K RPM Ultra SCSI 320 on an Adaptec RAID 5 controller
>> $1
> same worst-case query: 9.6 seconds

Your problem here is the HW RAID controller - if you dump it and use the
onboard SCSI channels and Linux RAID you will see a jump from 40MB/s to
about 220MB/s in read performance and from 20MB/s to 110MB/s write
performance.  It will use less CPU too.
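
For what it's worth, the software RAID setup being suggested looks roughly
like this (a sketch only; the device names, mount point and filesystem
choice are assumptions, and a nested RAID 1+0 is shown because it works on
2.4 as well as 2.6 kernels):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde
mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1
mkfs.xfs /dev/md2                      # or ext3, if XFS isn't available
mount /dev/md2 /var/lib/pgsql/data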
 
> Now it's not apples-to-apples.  There's a kernel 2.4 vs. 2.6 difference
> and the memory bus is much faster and I'm not sure what kind of context
> switching hit you get with the Xeon MP memory controller.  On a
> previous postgresql app I did I ran nearly identically spec'ed machines
> except for the memory bus and saw about a 30% boost in performance just
> with the 800MHz bus.  I imagine the Opteron bus does even better.

Memory bandwidth is so high on both that it's not a factor.  Context
switching / memory bus contention isn't either.
  
> So the small machine is probably slower on disk but makes up for it in
> single-threaded access to CPU and memory speed. But if this app were to
> be scaled it would make much more sense to cluster several $800
> machines than it would to buy 'big-iron'.

Yes it does - by a lot too.  Also, having a multiprocessing executor gets
the most out of each machine by having multiple CPUs scan simultaneously.

- Luke



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan
Alex,

On 11/18/05 8:28 AM, "Alex Turner" <[EMAIL PROTECTED]> wrote:

> Ok - so I ran the same test on my system and get a total speed of 113MB/sec.
> Why is this?  Why is the system so limited to around just 110MB/sec?  I
> tuned read ahead up a bit, and my results improve a bit..

OK! Now we're on the same page.  Finally someone who actually tests!

Check the CPU usage while it's doing the scan.  Know what it's doing?
Memory copies.  We've profiled it extensively.

So - that's the suckage - throwing more CPU power helps a bit, but the
underlying issue is poorly optimized code in the Postgres executor and lack
of I/O asynchrony.

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Alan Stange

Luke Lonergan wrote:

Alan,

On 11/18/05 8:13 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

I told you in my initial post that I was observing numbers in
excess of
what you claiming, but you seemed to think I didn't know how to
measure
an IO rate.

Prove me wrong, post your data.

I should note too that our system uses about 20% of a single cpu when
performing a table scan at >100MB/s of IO. I think you claimed the
system would be cpu bound at this low IO rate.


See above.
Here's the output from one iteration of iostat -k 60 while the box is 
doing a select count(1) on a 238GB table.


avg-cpu:  %user   %nice%sys %iowait   %idle
  0.990.00   17.97   32.40   48.64

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sdd 345.95130732.53 0.007843952  0

We're reading 130MB/s for a full minute.  About 20% of a single cpu was 
being used.   The remainder being idle.


We've done nothing fancy and achieved results you claim shouldn't be 
possible.  This is a system that was re-installed yesterday, no tuning 
was done to the file systems, kernel or storage array.


What am I doing wrong?

9 years ago I co-designed a petabyte data store with a goal of 1GB/s IO 
(for a DOE lab).   And now I don't know what I'm doing,


Cheers,

-- Alan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan



Richard,

On 11/18/05 5:22 AM, "Richard Huxton"  wrote:

Well, I'm prepared to swap Luke *TWO* $1000 systems for one $80,000
system if he's got one going :-)

Finally, a game worth playing!

Except it’s backward – I’ll show you 80 $1,000 systems performing 80 times faster than one $80,000 system.

On your proposition – I don’t have any $80,000 systems for trade, do you?

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Alan Stange

Richard Huxton wrote:

Dave Cramer wrote:


On 18-Nov-05, at 1:07 AM, Luke Lonergan wrote:

Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound  
after

110MB/s of I/O.  This is true of Postgres 7.4, 8.0 and 8.1.

A $1,000 system with one CPU and two SATA disks in a software RAID0  
will
perform exactly the same as a $80,000 system with 8 dual core CPUs  
and the
world's best SCSI RAID hardware on a large database for decision  
support

(what the poster asked about).



Now there's an interesting line drawn in the sand. I presume you 
have  numbers to back this up ?


This should draw some interesting posts.


That's interesting, as I occasionally see more than 110MB/s of 
postgresql IO on our system.  I'm using a 32KB block size, which has 
been a huge win in performance for our usage patterns.   300GB database 
with a lot of turnover.  A vacuum analyze now takes about 3 hours, which 
is much shorter than before.  Postgresql 8.1, dual opteron, 8GB memory, 
Linux 2.6.11, FC drives.


-- Alan

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan
Alan,

On 11/18/05 5:41 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

> 
> That's interesting, as I occasionally see more than 110MB/s of
> postgresql IO on our system.  I'm using a 32KB block size, which has
> been a huge win in performance for our usage patterns.   300GB database
> with a lot of turnover.  A vacuum analyze now takes about 3 hours, which
> is much shorter than before.  Postgresql 8.1, dual opteron, 8GB memory,
> Linux 2.6.11, FC drives.

300GB / 3 hours = 27MB/s.

If you are using the 2.6 linux kernel, you may be fooled into thinking you
burst more than you actually get in net I/O because the I/O stats changed in
tools like iostat and vmstat.

The only meaningful stats are (size of data) / (time to process data).  Do a
sequential scan of one of your large tables that you know the size of, then
divide by the run time and report it.
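
Concretely, something like this (a sketch; the database and table names are
the ones from the test posted earlier in the thread, and the size estimate
assumes the default 8KB block size and a reasonably recent VACUUM or
ANALYZE):

psql -d dgtestdb -c "SELECT relpages * 8 / 1024 AS approx_mb
                       FROM pg_class WHERE relname = 'bigtable1'"
time psql -d dgtestdb -c "SELECT count(*) FROM ivp.bigtable1" > /dev/null
# MB/s = approx_mb divided by the "real" time reported above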

I'm compiling some new test data to make my point now.

Regards,

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Bill McGonigle

On Nov 18, 2005, at 08:00, Dave Cramer wrote:

A $1,000 system with one CPU and two SATA disks in a software RAID0 
will
perform exactly the same as a $80,000 system with 8 dual core CPUs 
and the
world's best SCSI RAID hardware on a large database for decision 
support

(what the poster asked about).


Now there's an interesting line drawn in the sand. I presume you have 
numbers to back this up ?

This should draw some interesting posts.


There is some truth to it.  For an app I'm currently running (full-text 
search using tsearch2 on ~100MB of data) on:


Dev System:
Asus bare-bones bookshelf case/mobo
3GHz P4 w/ HT
800MHz memory Bus
Fedora Core 3 (nightly update)
1GB RAM
1 SATA Seagate disk (7200RPM, 8MB Cache)
$800
worst-case query: 7.2 seconds

now, the machine I'm deploying to:

Dell SomthingOrOther
(4) 2.4GHz Xeons
533MHz memory bus
RedHat Enterprise 3.6
1GB RAM
(5) 15K RPM Ultra SCSI 320 on an Adaptec RAID 5 controller
> $1
same worst-case query: 9.6 seconds

Now it's not apples-to-apples.  There's a kernel 2.4 vs. 2.6 difference 
and the memory bus is much faster and I'm not sure what kind of context 
switching hit you get with the Xeon MP memory controller.  On a 
previous postgresql app I did I ran nearly identically spec'ed machines 
except for the memory bus and saw about a 30% boost in performance just 
with the 800MHz bus.  I imagine the Opteron bus does even better.


So the small machine is probably slower on disk but makes up for it in 
single-threaded access to CPU and memory speed. But if this app were to 
be scaled it would make much more sense to cluster several $800 
machines than it would to buy 'big-iron'.


-Bill
-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Mobile: 603.252.2606
http://www.bfccomputing.com/Pager: 603.442.1833
Jabber: [EMAIL PROTECTED]  Text: [EMAIL PROTECTED]
Blog: http://blog.bfccomputing.com/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Vivek Khera

On Nov 18, 2005, at 1:07 AM, Luke Lonergan wrote:

A $1,000 system with one CPU and two SATA disks in a software RAID0  
will
perform exactly the same as a $80,000 system with 8 dual core CPUs  
and the
world's best SCSI RAID hardware on a large database for decision  
support

(what the poster asked about).


Hahahahahahahahahahahahaha! Whooo... needed to fall out of my chair  
laughing this morning.


I can tell you from direct personal experience that you're just plain  
wrong.


I've had to move my primary DB server from a dual P3 1GHz with 4-disk  
RAID10 SCSI, to Dual P3 2GHz with 14-disk RAID10 and faster drives,  
to Dual Opteron 2GHz with 8-disk RAID10 and even faster disks to keep  
up with my load on a 60+ GB database.  The Dual opteron system has  
just a little bit of extra capacity if I offload some of the  
reporting operations to a replicated copy (via slony1).  If I run all  
the queries on the one DB it can't keep up.


One of the most telling points about the difference in speed is that the 
14-disk array system cannot keep up with the replication being generated 
by the dual opteron, even when it is not doing any other queries of 
its own.  The I/O system just ain't fast enough.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Vivek Khera
On Nov 18, 2005, at 10:13 AM, Luke Lonergan wrote:

> Still, there is a CPU limit here – this is not I/O bound, it is CPU
> limited as evidenced by the sensitivity to readahead settings.   If the
> filesystem could do 1GB/s, you wouldn’t go any faster than 244MB/s.

Yeah, and mysql would probably be faster on your trivial queries.  Try
concurrent large joins and updates and see which system is faster.

Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Dave Cramer
On 18-Nov-05, at 8:30 AM, Luke Lonergan wrote:

> Richard,
>
> On 11/18/05 5:22 AM, "Richard Huxton"  wrote:
>> Well, I'm prepared to swap Luke *TWO* $1000 systems for one $80,000
>> system if he's got one going :-)
>
> Finally, a game worth playing!
>
> Except it’s backward – I’ll show you 80 $1,000 systems performing 80 times
> faster than one $80,000 system.

Now you wouldn't happen to be selling a system that would enable this for
postgres, now would ya ?

> On your proposition – I don’t have any $80,000 systems for trade, do you?
>
> - Luke

Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan



Dave,


On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
> 
> Now there's an interesting line drawn in the sand. I presume you have 
> numbers to back this up ?
> 
> This should draw some interesting posts.

OK, here we go:

The $1,000 system (System A):

- I bought 16 of these in 2003 for $1,200 each. They have Intel or Asus motherboards, Intel P4 3.0GHz CPUs with an 800MHz FSB.  They have a system drive and two RAID0 SATA drives, the Western Digital 74GB Raptor (10K RPM).  They have 1GB of RAM.

A test of write and read performance on the RAID0:

[EMAIL PROTECTED] raid0]$ time dd if=/dev/zero of=bigfile bs=8k count=25
25+0 records in
25+0 records out

real    0m17.453s
user    0m0.249s
sys 0m10.246s

[EMAIL PROTECTED] raid0]$ time dd if=bigfile of=/dev/null bs=8k
25+0 records in
25+0 records out

real    0m18.930s
user    0m0.130s
sys 0m3.590s

So, the write performance is 114MB/s and read performance is 106MB/s.

The $6,000 system (System B):

I just bought 5 of these systems for $6,000 each.  They are dual Opteron systems with 8GB of RAM and 2x 250 model CPUs, which are close to the fastest.  They have the new 3Ware 9550SX SATA RAID adapters coupled to  Western Digital 400GB RE2 model hard drives.  They are organized as a RAID5.

A test of write and read performance on the RAID5:

[EMAIL PROTECTED] dbfast1]# time dd if=/dev/zero of=bigfile bs=8k count=200
200+0 records in
200+0 records out

real    0m51.441s
user    0m0.288s
sys 0m29.119s

[EMAIL PROTECTED] dbfast1]# time dd if=bigfile of=/dev/null bs=8k
200+0 records in
200+0 records out

real    0m39.605s
user    0m0.244s
sys 0m19.207s

So, the write performance is 314MB/s and read performance is 404MB/s (!)  This is the fastest I’ve seen 8 disk drives perform.

So, the question is: which of these systems (A or B) can scan a large table faster using non-MPP postgres?  How much faster would you wager?

Send your answer, and I’ll post the result.

Regards,

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases

2005-11-18 Thread Ron
While I agree with you in principle that pg becomes CPU bound 
relatively easily compared to other DB products (at ~110-120MBps 
according to a recent thread), there's a bit of hyperbole in your post.


a. There's a big difference between the worst performing 1C x86 ISA 
CPU available and the best performing 2C one (IIRC, that's the 
2.4GHz, 1MB L2 cache AMDx2 4800+ as of this writing)


b. Two 2C CPU's vs one 1C CPU means that a pg process will almost 
never be waiting on other non pg processes.  It also means that 3-4 
pg processes, CPU bound or not, can execute in parallel.  Not an 
option with one 1C CPU.


c. Mainboards with support for multiple CPUs and lots' of RAM are 
_not_ the cheap ones.


d.  No one should ever use RAID 0 for valuable data.  Ever.  So at 
the least you need 4 HD's for a RAID 10 set (RAID 5 is not a good 
option unless write performance is unimportant.  4HD RAID 5 is 
particularly not a good option.)


e. The server usually needs to talk to things over a network 
connection.  Often performance here matters.  Mainboards with 2 1GbE 
NICs and/or PCI-X (or PCI-E) slots for 10GbE cards are not the cheap ones.


f. Trash HDs mean poor IO performance and lower reliability.  While 
TOTL 15Krpm 4Gb FC HDs are usually overkill (Not always.  It depends 
on context.),
you at least want SATA II HDs with NCQ or TCQ support.  And you want 
them to have a decent media warranty- preferably a 5 year one if you 
can get it.  Again, these are not the cheapest HD's available.


g. Throughput limitations say nothing about latency 
considerations.  OLTP-like systems _want_ HD spindles.  AMAP.  Even 
non OLTP-like systems need a fair number of spindles to optimize HD 
IO: dedicated WAL set, multiple dedicated DB sets, dedicated OS and 
swap space set, etc, etc.  At 50MBps ASTR, you need 16 HD's operating 
in parallel to saturate the bandwidth of a PCI-X channel.
That's ~8 independent pg tasks (queries using different tables, 
dedicated WAL IO, etc) running in parallel.  Regardless of application domain.


h. Decent RAID controllers and HBAs are not cheap either.  Even SW 
RAID benefits from having a big dedicated RAM buffer to talk to.


While the above may not cost you $80K, it sure isn't costing you $1K either.
Maybe ~$15-$20K, but not $1K.

Ron


At 01:07 AM 11/18/2005, Luke Lonergan wrote:

Greg,


On 11/17/05 9:17 PM, "Greg Stark" <[EMAIL PROTECTED]> wrote:

> Ok, a more productive point: it's not really the size of the database that
> controls whether you're I/O bound or CPU bound. It's the available I/O
> bandwidth versus your CPU speed.

Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound after
110MB/s of I/O.  This is true of Postgres 7.4, 8.0 and 8.1.

A $1,000 system with one CPU and two SATA disks in a software RAID0 will
perform exactly the same as a $80,000 system with 8 dual core CPUs and the
world's best SCSI RAID hardware on a large database for decision support
(what the poster asked about).

Regards,

- Luke



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match





---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Luke Lonergan



Dave,

On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
> 
> Now there's an interesting line drawn in the sand. I presume you have 
> numbers to back this up ?
> 
> This should draw some interesting posts.

Part 2: The answer

System A:
This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel.

On a single table with 15 columns (the Bizgres IVP) at a size double memory (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan the table: that’s 66 MB/s.  Not the efficiency I’d hope from the onboard SATA controller that I’d like, I would have expected to get 85% of the 100MB/s raw read performance.

So that’s $1,200 / 66 MB/s (without adjusting for 2003 price versus now) = 18.2 $/MB/s

Raw data:
[EMAIL PROTECTED] IVP]$ cat scan.sh 
#!/bin/bash

time psql -c "select count(*) from ivp.bigtable1" dgtestdb
[EMAIL PROTECTED] IVP]$ cat sysout1
  count   
--
 1000
(1 row)


real    0m32.565s
user    0m0.002s
sys 0m0.003s

Size of the table data:
[EMAIL PROTECTED] IVP]$ du -sk dgtestdb/base
2121648 dgtestdb/base

System B:
This system is running an XFS filesystem, and has been tuned to use very large (16MB) readahead.  It’s running the Centos 4.1 distro, which uses a Linux 2.6.9 kernel.

Same test as above, but with 17GB of data takes 69.7 seconds to scan (!)  That’s 244.2MB/s, which is obviously double my earlier point of 110-120MB/s.  This system is running with a 16MB Linux readahead setting, let’s try it with the default (I think) setting of 256KB – AHA! Now we get 171.4 seconds or 99.3MB/s.

So, using the tuned setting of “blockdev —setra 16384” we get $6,000 / 244MB/s = 24.6 $/MB/s
If we use the default Linux setting it’s 2.5x worse.

Raw data:
[EMAIL PROTECTED] IVP]$ cat scan.sh 
#!/bin/bash

time psql -c "select count(*) from ivp.bigtable1" dgtestdb
[EMAIL PROTECTED] IVP]$ cat sysout3
  count   
--
 8000
(1 row)


real    1m9.875s
user    0m0.000s
sys 0m0.004s
[EMAIL PROTECTED] IVP]$ !du
du -sk dgtestdb/base
17021260    dgtestdb/base

Summary:

 OK – you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead.  This is a cheap alternative to overhauling the executor to use asynch I/O.

Still, there is a CPU limit here – this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings.   If the filesystem could do 1GB/s, you wouldn’t go any faster than 244MB/s.

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Alan Stange

Luke Lonergan wrote:

Alan,

On 11/18/05 5:41 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

  

That's interesting, as I occasionally see more than 110MB/s of
postgresql IO on our system.  I'm using a 32KB block size, which has
been a huge win in performance for our usage patterns.   300GB database
with a lot of turnover.  A vacuum analyze now takes about 3 hours, which
is much shorter than before.  Postgresql 8.1, dual opteron, 8GB memory,
Linux 2.6.11, FC drives.



300GB / 3 hours = 27MB/s.
  
That's 3 hours under load, with 80 compute clients beating on the 
database at the same time.   We have the stats turned way up, so the 
analyze tends to read a big chunk of the tables a second time as 
well.  We typically don't have three hours a day of idle time.


-- Alan

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Richard Huxton

Dave Cramer wrote:


On 18-Nov-05, at 1:07 AM, Luke Lonergan wrote:


Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound  after
110MB/s of I/O.  This is true of Postgres 7.4, 8.0 and 8.1.

A $1,000 system with one CPU and two SATA disks in a software RAID0  will
perform exactly the same as a $80,000 system with 8 dual core CPUs  
and the

world's best SCSI RAID hardware on a large database for decision  support
(what the poster asked about).



Now there's an interesting line drawn in the sand. I presume you have  
numbers to back this up ?


This should draw some interesting posts.


Well, I'm prepared to swap Luke *TWO* $1000 systems for one $80,000 
system if he's got one going :-)


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Dave Cramer


On 17-Nov-05, at 2:50 PM, Alex Turner wrote:


Just pick up a SCSI drive and a consumer ATA drive.

Feel their weight.

You don't have to look inside to tell the difference.
At one point stereo manufacturers put weights in the case just to  
make them heavier.

The older ones weighed more and the consumer liked heavy stereos.

Be careful what you measure.

Dave


Alex

On 11/16/05, David Boreham <[EMAIL PROTECTED]> wrote:



 I suggest you read this on the difference between enterprise/SCSI  
and

desktop/IDE drives:

http://www.seagate.com/content/docs/pdf/whitepaper/ 
D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf



 This is exactly the kind of vendor propaganda I was talking about
 and it proves my point quite well : that there's nothing specific  
relating
 to reliability that is different between SCSI and SATA drives  
cited in that

paper.
 It does have a bunch of FUD such as 'oh yeah we do a lot more
 drive characterization during manufacturing'.






---(end of  
broadcast)---

TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Dave Cramer


On 18-Nov-05, at 1:07 AM, Luke Lonergan wrote:


Greg,


On 11/17/05 9:17 PM, "Greg Stark" <[EMAIL PROTECTED]> wrote:

Ok, a more productive point: it's not really the size of the  
database that
controls whether you're I/O bound or CPU bound. It's the available  
I/O

bandwidth versus your CPU speed.


Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound  
after

110MB/s of I/O.  This is true of Postgres 7.4, 8.0 and 8.1.

A $1,000 system with one CPU and two SATA disks in a software RAID0  
will
perform exactly the same as a $80,000 system with 8 dual core CPUs  
and the
world's best SCSI RAID hardware on a large database for decision  
support

(what the poster asked about).


Now there's an interesting line drawn in the sand. I presume you have  
numbers to back this up ?


This should draw some interesting posts.

Dave


Regards,

- Luke



---(end of  
broadcast)---

TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly