Re: [PERFORM] What is the max number of database I can create in an instance of pgsql?
On 11/18/05, anon permutation <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We want to create a database for each one of our departments, but we only want to have one instance of postgresql running. There are about 10-20 departments. I can easily use createdb to create these databases.

After doing this, you have to think about whether you will want to run queries across the info of some or all of the databases (and you will). If that is the case, the best thing you can do is create schemas instead of databases...

> However, what is the max number of database I can create before performance goes down?

The problem isn't the number of databases but concurrent users... After all, you will have the same resources for 1 or 100 databases. The important things are the number of users, the amount of data normal users will process in a normal day, and the complexity of your queries.

> Assuming each database is performing well alone, how would putting 10-20 of them together in one instance affect postgres?
>
> In terms of getting a new server for this project, how do I gauge how powerful of a server should I get?
>
> Thanks.

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 6: explain analyze is your friend
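A minimal sketch of the schema-per-department layout Jaime suggests (the department and user names here are made-up examples, not from the thread):

```sql
-- One database, one schema per department.
CREATE SCHEMA sales;
CREATE SCHEMA hr;

CREATE TABLE sales.employees (id int PRIMARY KEY, name text);
CREATE TABLE hr.employees    (id int PRIMARY KEY, name text);

-- Cross-department queries become ordinary joins, which separate
-- databases would not allow:
SELECT s.name
FROM sales.employees s
JOIN hr.employees h USING (id);

-- Each department's users can default to their own schema:
ALTER USER sales_user SET search_path TO sales, public;
```

With separate databases, the join above would require dumping and reloading data or an external tool, since a backend connects to exactly one database.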
Re: [PERFORM] What is the max number of database I can create in an instance of pgsql?
On Nov 19, 2005, at 12:24 , anon permutation wrote:
> However, what is the max number of database I can create before performance goes down? Assuming each database is performing well alone, how would putting 10-20 of them together in one instance affect postgres? In terms of getting a new server for this project, how do I gauge how powerful of a server should I get?

I'm sure those wiser than me will chime in with specifics. I think you should be thinking of usage not in terms of the number of databases but in terms of connection rates, database size (numbers of tables and tuples), and the types of queries that will be run. While there may be a little overhead from having a number of databases in the cluster, I think this is probably going to be insignificant in comparison to these other factors. A better idea of what the usage will be will guide you in choosing your hardware.

Michael Glaesemann
grzm myrealbox com
[PERFORM] What is the max number of database I can create in an instance of pgsql?
Hi,

We want to create a database for each one of our departments, but we only want to have one instance of postgresql running. There are about 10-20 departments. I can easily use createdb to create these databases. However, what is the max number of database I can create before performance goes down?

Assuming each database is performing well alone, how would putting 10-20 of them together in one instance affect postgres?

In terms of getting a new server for this project, how do I gauge how powerful of a server should I get?

Thanks.
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> Mark,
>
> On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:
>> If you alter this to involve more complex joins (e.g 4-way star) and (maybe add a small number of concurrent executors too) - is it still the case?
>
> I may not have listened to you - are you asking about whether the readahead works for these cases? I'll be running some massive TPC-H benchmarks on these machines soon - we'll see then.

That too, meaning the business of one executor random-reading a given relation file whilst another is sequentially scanning (some other) part of it.

Cheers

Mark
Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark,

On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:
> If you alter this to involve more complex joins (e.g 4-way star) and (maybe add a small number of concurrent executors too) - is it still the case?

I may not have listened to you - are you asking about whether the readahead works for these cases? I'll be running some massive TPC-H benchmarks on these machines soon - we'll see then.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark,

On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:
> If you alter this to involve more complex joins (e.g 4-way star) and (maybe add a small number of concurrent executors too) - is it still the case?

4-way star, same result; that's part of my point. With Bizgres MPP, the 4-way star uses 4 concurrent scanners, though not all are active all the time. And that's per segment instance - we normally use one segment instance per CPU, so our concurrency is NCPUs plus some.

The trick is the "small number of concurrent executors" part. The only way to get this with normal postgres is to have concurrent users, and normally they are doing different things, scanning different parts of the disk. These are competing things, and for concurrency enhancement something like "sync scan" would be an effective optimization.

But in reporting, business analytics and warehousing in general, there are reports that take hours to run. If you can knock that down by factors of 10 using parallelism, it's a big win. That's the reason that Teradata did $1.5 Billion in business last year.

More importantly - that's the kind of work that everyone using internet data for analytics wants right now.

- Luke
Re: [PERFORM] ERROR: no value found for parameter 1 with JDBC and Explain Analyze
"Virag Saksena" <[EMAIL PROTECTED]> writes:
> ERROR: no value found for parameter 1
> Here is sample code which causes this exception ...
> pst=prodconn.prepareStatement("explain analyze select count(*) from
> jam_heaprel r where heap_id = ? and parentaddr = ?");

I don't think EXPLAIN can take parameters (most of the "utility" statements don't take parameters). The usual workaround is to use PREPARE:

	PREPARE foo(paramtype,paramtype) AS SELECT ...;
	EXPLAIN EXECUTE foo(x,y);

This will generate the same parameterized plan as you'd get from the other way, so it's a reasonable approximation to the behavior with JDBC parameters.

			regards, tom lane
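Applied to the query from this thread, the workaround would look something like the following (run from psql; the parameter types are inferred from the setInt() calls in the Java code, so treat them as an assumption):

```sql
-- Prepare a parameterized statement matching the JDBC query.
PREPARE heap_count(int, int) AS
  SELECT count(*) FROM jam_heaprel r
  WHERE heap_id = $1 AND parentaddr = $2;

-- EXPLAIN (or EXPLAIN ANALYZE) the parameterized plan, supplying
-- the same sample values the JDBC code used.
EXPLAIN ANALYZE EXECUTE heap_count(1, 0);

-- Clean up the prepared statement when done.
DEALLOCATE heap_count;
```

Because PREPARE plans the query once with parameter placeholders, the plan shown here matches what the JDBC prepared statement would get, rather than the (often better) plan psql produces when the literal values are visible to the planner.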
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:

(mass snippage)

> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
> [EMAIL PROTECTED] IVP]$ cat sysout3
> count
> --
> 8000
> (1 row)
>
> real    1m9.875s
> user    0m0.000s
> sys     0m0.004s
>
> [EMAIL PROTECTED] IVP]$ !du
> du -sk dgtestdb/base
> 17021260  dgtestdb/base
>
> Summary:
>
> OK - you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead. This is a cheap alternative to overhauling the executor to use asynch I/O.
>
> Still, there is a CPU limit here - this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s.

Luke,

Interesting - but possibly only representative for a workload consisting entirely of one executor doing "SELECT ... FROM my_single_table".

If you alter this to involve more complex joins (e.g 4-way star) and (maybe add a small number of concurrent executors too) - is it still the case?

Cheers

Mark
[PERFORM] sort/limit across union all
We have a large DB with partitioned tables in postgres. We have had trouble with an ORDER/LIMIT type query. The order and limit are not pushed down to the sub-tables...

CREATE TABLE base (
    foo int
);

CREATE TABLE bar_0 (
    extra int
) INHERITS (base);

ALTER TABLE bar_0 ADD PRIMARY KEY (foo);
-- repeated for bar_0 ... bar_40

SELECT foo FROM base ORDER BY foo LIMIT 10;

is real slow. What is required to make the query planner generate the following instead... (code change i know, but how hard would it be?)

SELECT foo FROM (
    SELECT * FROM bar_0 ORDER BY foo LIMIT 10
    UNION ALL
    SELECT * FROM bar_1 ORDER BY foo LIMIT 10
) AS base ORDER BY foo LIMIT 10;
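Until the planner learns to push the sort/limit down into the inheritance children, the manual rewrite can be hidden behind a view so applications keep issuing the simple form. A sketch (only two of the 41 partitions shown; the view name is made up):

```sql
-- Hand-rolled push-down: each child supplies its own top 10 via its
-- primary-key index on foo, so the outer sort only ever sees 10 rows
-- per partition instead of the whole table.
CREATE VIEW base_top10 AS
  SELECT foo FROM (
    SELECT foo FROM bar_0 ORDER BY foo LIMIT 10
    UNION ALL
    SELECT foo FROM bar_1 ORDER BY foo LIMIT 10
    -- ... one branch per partition, bar_2 through bar_40
  ) AS u
  ORDER BY foo LIMIT 10;

SELECT foo FROM base_top10;
```

The obvious downside is that the view must be regenerated whenever a partition is added or dropped, which is exactly why a planner-level fix would be preferable.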
[PERFORM] ERROR: no value found for parameter 1 with JDBC and Explain Analyze
Hi,

I am trying to use Explain Analyze to trace a slow SQL statement called from JDBC. The SQL statement with the parameters took 11 seconds. When I run an explain analyze from psql, it takes < 50 ms with a reasonable explain plan. However, when I try to run an explain analyze from JDBC with the parameters, I get the error:

ERROR: no value found for parameter 1

Here is sample code which causes this exception ...

pst=prodconn.prepareStatement("explain analyze select count(*) from jam_heaprel r where heap_id = ? and parentaddr = ?");
pst.setInt(1,1);
pst.setInt(2,0);
rs=pst.executeQuery();

java.sql.SQLException: ERROR: no value found for parameter 1
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1471)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1256)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:175)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:389)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:330)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:240)
at jsp._testexplain_2ejsp._jspService(_testexplain_2ejsp.java:82)
at org.gjt.jsp.HttpJspPageImpl.service(HttpJspPageImpl.java:75)

Regards,

Virag
Re: [PERFORM] Help speeding up delete
Steve Wampler wrote:
> Is the performance behavior that we're experiencing a known problem with 7.2 that has been addressed in 7.4? Or will the upgrade fix other problems while leaving this one?

I'm pretty sure that in versions earlier than 7.4, IN clauses that use a subquery will always use a seqscan, regardless of what indexes are available. If you try an IN using explicit values though, it should use the index.

Thanks

Leigh
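The classic pre-7.4 workaround (not spelled out in this thread, so treat it as a general suggestion; the table and column names are hypothetical) was to rewrite `IN (subquery)` as a correlated `EXISTS`, which the old planner could turn into an index probe:

```sql
-- Slow on 7.2/7.3: the planner handles IN (subquery) poorly and
-- falls back to scanning rather than probing an index.
SELECT * FROM big_table
WHERE id IN (SELECT id FROM small_table WHERE flag = true);

-- Equivalent EXISTS form: the pre-7.4 planner can use an index on
-- big_table.id (or small_table.id) for the correlated probe.
SELECT * FROM big_table b
WHERE EXISTS (SELECT 1 FROM small_table s
              WHERE s.id = b.id AND s.flag = true);
```

From 7.4 onward the planner treats `IN (subquery)` much better (hashed subplans), so the rewrite is mainly of interest for installations stuck on 7.2 like Steve's.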
Re: [PERFORM] Hardware/OS recommendations for large databases
Breaking the ~120MBps pg IO ceiling by any means is an important result. Particularly when you get a ~2x improvement. I'm curious how far we can get using simple approaches like this.

At 10:13 AM 11/18/2005, Luke Lonergan wrote:
> Dave,
>
> On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
>>
>> Now there's an interesting line drawn in the sand. I presume you have numbers to back this up ?
>>
>> This should draw some interesting posts.
>
> Part 2: The answer
>
> System A:
>
> This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel. On a single table with 15 columns (the Bizgres IVP) at a size double memory (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan the table: that's 66 MB/s. Not the efficiency I'd hope from the onboard SATA controller that I'd like, I would have expected to get 85% of the 100MB/s raw read performance.

Have you tried the large read ahead trick with this system? It would be interesting to see how much it would help. It might even be worth it to do the experiment at all of [default, 2x default, 4x default, 8x default, etc] read ahead until either a) you run out of resources to support the desired read ahead, or b) performance levels off. I can imagine the results being very enlightening.

> System B:
>
> This system is running an XFS filesystem, and has been tuned to use very large (16MB) readahead. It's running the Centos 4.1 distro, which uses a Linux 2.6.9 kernel.
>
> Same test as above, but with 17GB of data takes 69.7 seconds to scan (!) That's 244.2MB/s, which is obviously double my earlier point of 110-120MB/s. This system is running with a 16MB Linux readahead setting, let's try it with the default (I think) setting of 256KB - AHA! Now we get 171.4 seconds or 99.3MB/s.

The above experiment would seem useful here as well.

> Summary:
>
> OK - you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead.
> This is a cheap alternative to overhauling the executor to use asynch I/O. Still, there is a CPU limit here - this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s.
>
> - Luke

I respect your honesty in reporting results that were different than your expectations or previously taken stance. Alan Stange's comment re: the use of direct IO, along with your comments re: async IO and mem copies, plus the results of these experiments, could very well point us directly at how to most easily solve pg's CPU boundness during IO.

[HACKERS] are you watching this?

Ron
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke,

Interesting numbers. I'm a little concerned about the use of blockdev --setra 16384. If I understand this correctly it assumes that the table is contiguous on the disk, does it not?

Dave

On 18-Nov-05, at 10:13 AM, Luke Lonergan wrote:
> Dave,
>
> On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
>>
>> Now there's an interesting line drawn in the sand. I presume you have numbers to back this up ?
>>
>> This should draw some interesting posts.
>
> Part 2: The answer
>
> System A:
>
> This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel.
>
> On a single table with 15 columns (the Bizgres IVP) at a size double memory (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan the table: that's 66 MB/s. Not the efficiency I'd hope from the onboard SATA controller that I'd like, I would have expected to get 85% of the 100MB/s raw read performance.
>
> So that's $1,200 / 66 MB/s (without adjusting for 2003 price versus now) = 18.2 $/MB/s
>
> Raw data:
> [EMAIL PROTECTED] IVP]$ cat scan.sh
> #!/bin/bash
>
> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
> [EMAIL PROTECTED] IVP]$ cat sysout1
> count
> --
> 1000
> (1 row)
>
> real    0m32.565s
> user    0m0.002s
> sys     0m0.003s
>
> Size of the table data:
> [EMAIL PROTECTED] IVP]$ du -sk dgtestdb/base
> 2121648 dgtestdb/base
>
> System B:
>
> This system is running an XFS filesystem, and has been tuned to use very large (16MB) readahead. It's running the Centos 4.1 distro, which uses a Linux 2.6.9 kernel.
>
> Same test as above, but with 17GB of data takes 69.7 seconds to scan (!) That's 244.2MB/s, which is obviously double my earlier point of 110-120MB/s. This system is running with a 16MB Linux readahead setting, let's try it with the default (I think) setting of 256KB - AHA! Now we get 171.4 seconds or 99.3MB/s.
>
> So, using the tuned setting of "blockdev --setra 16384" we get $6,000 / 244MB/s = 24.6 $/MB/s
>
> If we use the default Linux setting it's 2.5x worse.
> Raw data:
> [EMAIL PROTECTED] IVP]$ cat scan.sh
> #!/bin/bash
>
> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
> [EMAIL PROTECTED] IVP]$ cat sysout3
> count
> --
> 8000
> (1 row)
>
> real    1m9.875s
> user    0m0.000s
> sys     0m0.004s
>
> [EMAIL PROTECTED] IVP]$ !du
> du -sk dgtestdb/base
> 17021260 dgtestdb/base
>
> Summary:
>
> OK - you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead. This is a cheap alternative to overhauling the executor to use asynch I/O.
>
> Still, there is a CPU limit here - this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s.
>
> - Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alan,

On 11/18/05 8:13 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
> I told you in my initial post that I was observing numbers in excess of what you claiming, but you seemed to think I didn't know how to measure an IO rate.

Prove me wrong, post your data.

> I should note too that our system uses about 20% of a single cpu when performing a table scan at >100MB/s of IO. I think you claimed the system would be cpu bound at this low IO rate.

See above.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Vivek,

On 11/18/05 8:07 AM, "Vivek Khera" <[EMAIL PROTECTED]> wrote:
> On Nov 18, 2005, at 10:13 AM, Luke Lonergan wrote:
>> Still, there is a CPU limit here - this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s.
>
> Yeah, and mysql would probably be faster on your trivial queries. Try concurrent large joins and updates and see which system is faster.

That's what we do to make a living. And it's Oracle that's a lot faster because they implemented a much tighter, optimized I/O path to disk than Postgres. Since you asked, we bought the 5 systems as a cluster - and with Bizgres MPP we get close to 400MB/s per machine on complex queries.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Greg Stark wrote:
> Alan Stange <[EMAIL PROTECTED]> writes:
>> Luke Lonergan wrote:
>>> Alan,
>>>
>>> On 11/18/05 9:31 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
>>>> Here's the output from one iteration of iostat -k 60 while the box is doing a select count(1) on a 238GB table.
>>>>
>>>> avg-cpu:  %user  %nice   %sys  %iowait  %idle
>>>>            0.99   0.00  17.97    32.40  48.64
>>>>
>>>> Device:  tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
>>>> sdd      345.95  130732.53       0.00  7843952        0
>>>>
>>>> We're reading 130MB/s for a full minute. About 20% of a single cpu was being used. The remainder being idle.
>>>
>>> Cool - thanks for the results. Is that % of one CPU, or of 2? Was the system otherwise idle?
>>
>> Actually, this was dual cpu
>
> I hate to agree with him but that looks like a dual machine with one CPU pegged. Yes most of the time is being spent in the kernel, but you're still basically cpu limited.
>
> That said, 130MB/s is nothing to sneeze at, that's maxing out two high end drives and quite respectable for a 3-disk stripe set, even reasonable for a 4-disk stripe set. If you're using 5 or more disks in RAID-0 or RAID 1+0 and only getting 130MB/s then it does seem likely the cpu is actually holding you back here.
>
> Still it doesn't show Postgres being nearly so CPU wasteful as the original poster claimed.

Yes and no. The one cpu is clearly idle. The second cpu is 40% busy and 60% idle (aka iowait in the above numbers). Of that 40%, other things were happening as well during the 1 minute snapshot. During some iostat outputs that I didn't post the cpu time was ~ 20%. So, you can take your pick. The single cpu usage is somewhere between 20% and 40%. As I can't remove other users of the system, it's the best measurement that I can make right now.

Either way, it's not close to being cpu bound. This is with Opteron 248, 2.2Ghz cpus.

Note that the storage system has been a bit disappointing: it's an IBM Fast T600 with a 200MB/s fiber attachment. It could be better, but it's not been the bottleneck in our work, so we haven't put any energy into it.
>> It's all in the kernel either way; using a different scheduler or file system would change that result. Even better would be using direct IO to not flush everything else from memory and avoid some memory copies from kernel to user space. Note that almost none of the time is user time. Changing postgresql won't change the cpu usage.
>
> Well changing to direct i/o would still be changing Postgres so that's unclear. And there are plenty of more mundane ways that Postgres is responsible for how efficiently or not the kernel is used. Just using fewer syscalls to do the same amount of reading would reduce cpu consumption.

Absolutely. This is why we're using a 32KB block size and also switched to using O_SYNC for the WAL syncing method. That's many MB/s that don't need to be cached in the kernel (thus evicting other data), and we avoid all the fsync/fdatasync syscalls.

The purpose of direct IO isn't to make the vacuum or analyze faster, but to lessen their impact on queries with someone waiting for the results. That's our biggest hit: running a sequential scan on 240GB of data and flushing everything else out of memory.

Now that I'm thinking about this a bit, a big chunk of time is probably being lost in TLB misses and other virtual memory events that would be avoided if a larger page size was being used.

-- Alan
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
>> I don't know what the system cost. It was part of block of dual opterons from Sun that we got some time ago. I think the 130MB/s is slow given the hardware, but it's acceptable. I'm not too price sensitive; I care much more about reliability, uptime, etc.
>
> Then I know what they cost - we have them too (V20z and V40z). You should be getting 400MB/s+ with external RAID.

Yes, but we don't. This is where I would normally begin a rant on how craptacular Linux can be at times. But, for the sake of this discussion, postgresql isn't reading the data any more slowly than does any other program. And we don't have the time to experiment with the box.

I know it should be better, but it's good enough for our purposes at this time.

-- Alan
Re: [PERFORM] Hardware/OS recommendations for large databases (
Greg,

On 11/18/05 11:07 AM, "Greg Stark" <[EMAIL PROTECTED]> wrote:
> That said, 130MB/s is nothing to sneeze at, that's maxing out two high end drives and quite respectable for a 3-disk stripe set, even reasonable for a 4-disk stripe set. If you're using 5 or more disks in RAID-0 or RAID 1+0 and only getting 130MB/s then it does seem likely the cpu is actually holding you back here.

With an FC array, it's undoubtedly more like 14 drives, in which case 130MB/s is laughable. On the other hand, I wouldn't be surprised if it were a single 200MB/s Fibre Channel attachment. It does make you wonder why people keep recommending 15K RPM drives, like it would help *not*.

> Still it doesn't show Postgres being nearly so CPU wasteful as the original poster claimed.

It's partly about waste, and partly about lack of a concurrent I/O mechanism. We've profiled it for the waste, we've implemented concurrent I/O to prove the other point.

>> It's all in the kernel either way; using a different scheduler or file system would change that result. Even better would be using direct IO to not flush everything else from memory and avoid some memory copies from kernel to user space. Note that almost none of the time is user time. Changing postgresql won't change the cpu usage.
>
> Well changing to direct i/o would still be changing Postgres so that's unclear. And there are plenty of more mundane ways that Postgres is responsible for how efficiently or not the kernel is used. Just using fewer syscalls to do the same amount of reading would reduce cpu consumption.

Bingo.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alan Stange <[EMAIL PROTECTED]> writes:
> Luke Lonergan wrote:
>> Alan,
>>
>> On 11/18/05 9:31 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
>>> Here's the output from one iteration of iostat -k 60 while the box is doing a select count(1) on a 238GB table.
>>>
>>> avg-cpu:  %user  %nice   %sys  %iowait  %idle
>>>            0.99   0.00  17.97    32.40  48.64
>>>
>>> Device:  tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
>>> sdd      345.95  130732.53       0.00  7843952        0
>>>
>>> We're reading 130MB/s for a full minute. About 20% of a single cpu was being used. The remainder being idle.
>>
>> Cool - thanks for the results. Is that % of one CPU, or of 2? Was the system otherwise idle?
>
> Actually, this was dual cpu

I hate to agree with him but that looks like a dual machine with one CPU pegged. Yes most of the time is being spent in the kernel, but you're still basically cpu limited.

That said, 130MB/s is nothing to sneeze at, that's maxing out two high end drives and quite respectable for a 3-disk stripe set, even reasonable for a 4-disk stripe set. If you're using 5 or more disks in RAID-0 or RAID 1+0 and only getting 130MB/s then it does seem likely the cpu is actually holding you back here.

Still it doesn't show Postgres being nearly so CPU wasteful as the original poster claimed.

> It's all in the kernel either way; using a different scheduler or file system would change that result. Even better would be using direct IO to not flush everything else from memory and avoid some memory copies from kernel to user space. Note that almost none of the time is user time. Changing postgresql won't change the cpu usage.

Well changing to direct i/o would still be changing Postgres so that's unclear. And there are plenty of more mundane ways that Postgres is responsible for how efficiently or not the kernel is used. Just using fewer syscalls to do the same amount of reading would reduce cpu consumption.
> One IMHO obvious improvement would be to have vacuum and analyze only do direct IO. Now they appear to be very effective memory flushing tools. Table scans on tables larger than say 4x memory should probably also use direct IO for reads.

-- greg
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alan,

On 11/18/05 10:30 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
> Actually, this was dual cpu and there was other activity during the full minute, but it was on other file devices, which I didn't include in the above output. Given that, and given what I see on the box now I'd raise the 20% to 30% just to be more conservative. It's all in the kernel either way; using a different scheduler or file system would change that result. Even better would be using direct IO to not flush everything else from memory and avoid some memory copies from kernel to user space. Note that almost none of the time is user time. Changing postgresql won't change the cpu useage.

These are all things that help on the IO wait side possibly, however, there is a producer/consumer problem in postgres that goes something like this:

- Read some (small number of, sometimes 1) 8k pages
- Do some work on those pages, including lots of copies
- repeat

This back and forth without threading (like AIO, or a multiprocessing executor) causes cycling and inefficiency that limits throughput. Optimizing some of the memcopies and other garbage out, plus increasing the internal (postgres) readahead would probably double the disk bandwidth.

But to be disk-bound (meaning that the disk subsystem is running at full speed), requires asynchronous I/O. We do this now with Bizgres MPP, and we get fully saturated disk channels on every machine. That means that even on one machine, we run many times faster than non-MPP postgres.

> One IMHO obvious improvement would be to have vacuum and analyze only do direct IO. Now they appear to be very effective memory flushing tools. Table scans on tables larger than say 4x memory should probably also use direct IO for reads.

That's been suggested many times prior - I agree, but this also needs AIO to be maximally effective.

> I don't know what the system cost. It was part of block of dual opterons from Sun that we got some time ago.
> I think the 130MB/s is slow given the hardware, but it's acceptable. I'm not too price sensitive; I care much more about reliability, uptime, etc.

Then I know what they cost - we have them too (V20z and V40z). You should be getting 400MB/s+ with external RAID.

>>> What am I doing wrong?
>>>
>>> 9 years ago I co-designed a petabyte data store with a goal of 1GB/s IO (for a DOE lab). And now I don't know what I'm doing,
>>
>> Cool. Would that be Sandia?
>>
>> We routinely sustain 2,000 MB/s from disk on 16x 2003 era machines on complex queries.
>
> Disk?! 4 StorageTek tape silos. That would be .002 TB/s. One has to change how you think when you have that much data. And hope you don't have a fire, because there's no backup. That work was while I was at BNL. I believe they are now at 4PB of tape and 150TB of disk.

We had 1.5 Petabytes on 2 STK Silos at NAVO from 1996-1998 where I ran R&D. We also had a Cray T932, an SGI Origin 3000 with 256 CPUs, a Cray T3E with 1280 CPUs, 2 Cray J916s with 1 TB of shared disk, a Cray C90-16, a Sun E10K, etc etc, along with clusters of Alpha machines and lots of SGIs. It's nice to work with a $40M annual budget.

Later, working with FSL we implemented a weather forecasting cluster that ultimately became the #5 fastest computer on the TOP500 supercomputing list from 512 Alpha cluster nodes. That machine had a 10-way shared SAN, tape robotics and a Myrinet interconnect and ran 64-bit Linux (in 1998).

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> Alan,
>
> On 11/18/05 9:31 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
>> Here's the output from one iteration of iostat -k 60 while the box is doing a select count(1) on a 238GB table.
>>
>> avg-cpu:  %user  %nice   %sys  %iowait  %idle
>>            0.99   0.00  17.97    32.40  48.64
>>
>> Device:  tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
>> sdd      345.95  130732.53       0.00  7843952        0
>>
>> We're reading 130MB/s for a full minute. About 20% of a single cpu was being used. The remainder being idle.
>
> Cool - thanks for the results. Is that % of one CPU, or of 2? Was the system otherwise idle?

Actually, this was dual cpu and there was other activity during the full minute, but it was on other file devices, which I didn't include in the above output. Given that, and given what I see on the box now I'd raise the 20% to 30% just to be more conservative. It's all in the kernel either way; using a different scheduler or file system would change that result. Even better would be using direct IO to not flush everything else from memory and avoid some memory copies from kernel to user space. Note that almost none of the time is user time. Changing postgresql won't change the cpu usage.

One IMHO obvious improvement would be to have vacuum and analyze only do direct IO. Now they appear to be very effective memory flushing tools. Table scans on tables larger than say 4x memory should probably also use direct IO for reads.

We've done nothing fancy and achieved results you claim shouldn't be possible. This is a system that was re-installed yesterday, no tuning was done to the file systems, kernel or storage array.

> Are you happy with 130MB/s? How much did you pay for that? Is it more than $2,000, or double my 2003 PC?

I don't know what the system cost. It was part of block of dual opterons from Sun that we got some time ago. I think the 130MB/s is slow given the hardware, but it's acceptable. I'm not too price sensitive; I care much more about reliability, uptime, etc.

What am I doing wrong?
9 years ago I co-designed a petabyte data store with a goal of 1GB/s IO (for a DOE lab). And now I don't know what I'm doing,

Cool. Would that be Sandia?

We routinely sustain 2,000 MB/s from disk on 16x 2003 era machines on complex queries.

Disk?! 4 StorageTek tape silos. That would be .002 TB/s. One has to change how you think when you have that much data. And hope you don't have a fire, because there's no backup.

That work was while I was at BNL. I believe they are now at 4PB of tape and 150TB of disk.

-- Alan

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
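[Editorial aside: a minimal sketch of the measurement Alan describes above - sample iostat while a sequential scan runs, then convert the interval's kB_read counter into MB/s. The device name, database, and table names here are assumptions, not taken from the thread.]

```shell
#!/bin/bash
# Hypothetical reproduction of Alan's iostat measurement. Run a scan in
# the background, sample disk stats over a 60-second interval, and
# compute the read rate. DEV/DB/table are placeholders.
DEV=sdd     # assumed device backing the tablespace
DB=mydb     # hypothetical database name

psql -d "$DB" -c "select count(1) from bigtable" > /dev/null &   # start the scan
iostat -k 60 2 > iostat.out    # first report is since-boot, second is the interval
wait

# kB read during the interval: column 5 of the device line in the last report
KB=$(awk -v dev="$DEV" '$1 == dev { kb = $5 } END { print kb }' iostat.out)

# Convert kB over 60 s into binary MB/s
awk -v kb="$KB" 'BEGIN { printf "%.1f MB/s\n", kb / 60 / 1024 }'
```

For Alan's numbers (7,843,952 kB in 60 s) this works out to about 127.7 MB/s in binary units; his "130MB/s" is the same figure in decimal MB (130,732 kB/s).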
Re: [PERFORM] Hardware/OS recommendations for large databases (
Ok - so I ran the same test on my system and get a total speed of 113MB/sec. Why is this? Why is the system so limited to around just 110MB/sec? I tuned read ahead up a bit, and my results improve a bit..

Alex

On 11/18/05, Luke Lonergan <[EMAIL PROTECTED]> wrote:
> Dave,
>
> On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
> >
> > Now there's an interesting line drawn in the sand. I presume you have
> > numbers to back this up ?
> >
> > This should draw some interesting posts.
>
> Part 2: The answer
>
> System A:
>
> This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel.
>
> On a single table with 15 columns (the Bizgres IVP) at a size double memory
> (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan
> the table: that's 66 MB/s. Not the efficiency I'd hoped for from the onboard
> SATA controller; I would have expected to get 85% of the
> 100MB/s raw read performance.
>
> So that's $1,200 / 66 MB/s (without adjusting for 2003 price versus now) =
> 18.2 $/MB/s
>
> Raw data:
> [EMAIL PROTECTED] IVP]$ cat scan.sh
> #!/bin/bash
>
> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
> [EMAIL PROTECTED] IVP]$ cat sysout1
>  count
> --
>  1000
> (1 row)
>
> real    0m32.565s
> user    0m0.002s
> sys     0m0.003s
>
> Size of the table data:
> [EMAIL PROTECTED] IVP]$ du -sk dgtestdb/base
> 2121648 dgtestdb/base
>
> System B:
>
> This system is running an XFS filesystem, and has been tuned to use very
> large (16MB) readahead. It's running the Centos 4.1 distro, which uses a
> Linux 2.6.9 kernel.
>
> Same test as above, but with 17GB of data takes 69.7 seconds to scan (!)
> That's 244.2MB/s, which is obviously double my earlier point of 110-120MB/s.
> This system is running with a 16MB Linux readahead setting, let's try it
> with the default (I think) setting of 256KB - AHA! Now we get 171.4 seconds
> or 99.3MB/s.
>
> So, using the tuned setting of "blockdev --setra 16384" we get $6,000 /
> 244MB/s = 24.6 $/MB/s
>
> If we use the default Linux setting it's 2.5x worse.
>
> Raw data:
> [EMAIL PROTECTED] IVP]$ cat scan.sh
> #!/bin/bash
>
> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
> [EMAIL PROTECTED] IVP]$ cat sysout3
>  count
> --
>  8000
> (1 row)
>
> real    1m9.875s
> user    0m0.000s
> sys     0m0.004s
> [EMAIL PROTECTED] IVP]$ !du
> du -sk dgtestdb/base
> 17021260 dgtestdb/base
>
> Summary:
>
> OK - you can get more I/O bandwidth out of the current I/O
> path for sequential scan if you tune the filesystem for large readahead.
> This is a cheap alternative to overhauling the executor to use asynch I/O.
>
> Still, there is a CPU limit here - this is not I/O bound, it is CPU limited
> as evidenced by the sensitivity to readahead settings. If the filesystem
> could do 1GB/s, you wouldn't go any faster than 244MB/s.
>
> - Luke

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
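[Editorial aside: Luke's readahead experiment can be sketched roughly as below. The device path is an assumption; blockdev needs root, and note that --setra takes 512-byte sectors, so 16384 sectors is the value from the post.]

```shell
#!/bin/bash
# Hypothetical version of Luke's readahead experiment. /dev/sda is a
# placeholder for the device under the database; run as root.
DEV=/dev/sda

blockdev --getra "$DEV"        # current readahead, in 512-byte sectors
blockdev --setra 16384 "$DEV"  # the tuned value used in the post

# Time a full sequential scan with the new setting
time psql -c "select count(*) from ivp.bigtable1" dgtestdb

# Throughput = table size / wall-clock time; e.g. for Luke's System B:
awk 'BEGIN { printf "%.1f MB/s\n", 17021260 / 69.7 / 1000 }'
```

Plugging in Luke's raw data, 17,021,260 kB over 69.7 s is about 244 MB/s in decimal units, matching his 244.2MB/s figure.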
Re: [PERFORM] Hardware/OS recommendations for large databases (
Bill,

On 11/18/05 7:55 AM, "Bill McGonigle" <[EMAIL PROTECTED]> wrote:
>
> There is some truth to it. For an app I'm currently running (full-text
> search using tsearch2 on ~100MB of data) on:

Do you mean 100GB? Sounds like you are more like a decision support / warehousing application.

> Dev System:
> Asus bare-bones bookshelf case/mobo
> 3GHz P4 w/ HT
> 800MHz memory Bus
> Fedora Core 3 (nightly update)
> 1GB RAM
> 1 SATA Seagate disk (7200RPM, 8MB Cache)
> $800
> worst-case query: 7.2 seconds

About the same machine I posted results for, except I had two faster disks.

> now, the machine I'm deploying to:
>
> Dell SomthingOrOther
> (4) 2.4GHz Xeons
> 533MHz memory bus
> RedHat Enterprise 3.6
> 1GB RAM
> (5) 15K RPM Ultra SCSI 320 on an Adaptec RAID 5 controller
> > $1
> same worst-case query: 9.6 seconds

Your problem here is the HW RAID controller - if you dump it and use the onboard SCSI channels and Linux RAID you will see a jump from 40MB/s to about 220MB/s in read performance and from 20MB/s to 110MB/s write performance. It will use less CPU too.

> Now it's not apples-to-apples. There's a kernel 2.4 vs. 2.6 difference
> and the memory bus is much faster and I'm not sure what kind of context
> switching hit you get with the Xeon MP memory controller. On a
> previous postgresql app I did I ran nearly identically spec'ed machines
> except for the memory bus and saw about a 30% boost in performance just
> with the 800MHz bus. I imagine the Opteron bus does even better.

Memory bandwidth is so high on both that it's not a factor. Context switching / memory bus contention isn't either.

> So the small machine is probably slower on disk but makes up for it in
> single-threaded access to CPU and memory speed. But if this app were to
> be scaled it would make much more sense to cluster several $800
> machines than it would to buy 'big-iron'.

Yes it does - by a lot too.
Also, having a multiprocessing executor gets the most out of each machine by having multiple CPUs scan simultaneously.

- Luke

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alex, On 11/18/05 8:28 AM, "Alex Turner" <[EMAIL PROTECTED]> wrote: > Ok - so I ran the same test on my system and get a total speed of 113MB/sec. > Why is this? Why is the system so limited to around just 110MB/sec? I > tuned read ahead up a bit, and my results improve a bit.. OK! Now we're on the same page. Finally someone who actually tests! Check the CPU usage while it's doing the scan. Know what it's doing? Memory copies. We've profiled it extensively. So - that's the suckage - throwing more CPU power helps a bit, but the underlying issue is poorly optimized code in the Postgres executor and lack of I/O asynchrony. - Luke ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
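[Editorial aside: Luke's "check the CPU usage while it's doing the scan" can be done with vmstat alongside the query, as sketched below. The database and table names are placeholders.]

```shell
#!/bin/bash
# Sketch: watch user/system/idle/iowait CPU while a sequential scan runs.
# If the scan is CPU limited, user+sys will be high while iowait stays low.
psql -d dgtestdb -c "select count(*) from ivp.bigtable1" > /dev/null &
SCAN_PID=$!

# vmstat fields 13-16 on a 2.6 kernel are us / sy / id / wa
vmstat 5 | awk 'NR > 2 { printf "user=%s%% sys=%s%% idle=%s%% iowait=%s%%\n", $13, $14, $15, $16 }' &
VM_PID=$!

wait "$SCAN_PID"
kill "$VM_PID" 2>/dev/null
```

The first two vmstat lines are headers, hence the NR > 2 filter; each subsequent line is a 5-second average.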
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:

Alan,

On 11/18/05 8:13 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

I told you in my initial post that I was observing numbers in excess of what you were claiming, but you seemed to think I didn't know how to measure an IO rate.

Prove me wrong, post your data.

I should note too that our system uses about 20% of a single cpu when performing a table scan at >100MB/s of IO. I think you claimed the system would be cpu bound at this low IO rate.

See above.

Here's the output from one iteration of iostat -k 60 while the box is doing a select count(1) on a 238GB table.

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.99    0.00   17.97   32.40   48.64

Device:    tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd     345.95    130732.53         0.00    7843952          0

We're reading 130MB/s for a full minute. About 20% of a single cpu was being used. The remainder being idle.

We've done nothing fancy and achieved results you claim shouldn't be possible. This is a system that was re-installed yesterday, no tuning was done to the file systems, kernel or storage array.

What am I doing wrong? 9 years ago I co-designed a petabyte data store with a goal of 1GB/s IO (for a DOE lab). And now I don't know what I'm doing,

Cheers,

-- Alan

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Hardware/OS recommendations for large databases (
Richard,

On 11/18/05 5:22 AM, "Richard Huxton" wrote:

Well, I'm prepared to swap Luke *TWO* $1000 systems for one $80,000 system if he's got one going :-)

Finally, a game worth playing!

Except it's backward - I'll show you 80 $1,000 systems performing 80 times faster than one $80,000 system.

On your proposition - I don't have any $80,000 systems for trade, do you?

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Richard Huxton wrote: Dave Cramer wrote: On 18-Nov-05, at 1:07 AM, Luke Lonergan wrote: Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound after 110MB/s of I/O. This is true of Postgres 7.4, 8.0 and 8.1. A $1,000 system with one CPU and two SATA disks in a software RAID0 will perform exactly the same as a $80,000 system with 8 dual core CPUs and the world's best SCSI RAID hardware on a large database for decision support (what the poster asked about). Now there's an interesting line drawn in the sand. I presume you have numbers to back this up ? This should draw some interesting posts. That's interesting, as I occasionally see more than 110MB/s of postgresql IO on our system. I'm using a 32KB block size, which has been a huge win in performance for our usage patterns. 300GB database with a lot of turnover. A vacuum analyze now takes about 3 hours, which is much shorter than before. Postgresql 8.1, dual opteron, 8GB memory, Linux 2.6.11, FC drives. -- Alan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alan, On 11/18/05 5:41 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote: > > That's interesting, as I occasionally see more than 110MB/s of > postgresql IO on our system. I'm using a 32KB block size, which has > been a huge win in performance for our usage patterns. 300GB database > with a lot of turnover. A vacuum analyze now takes about 3 hours, which > is much shorter than before. Postgresql 8.1, dual opteron, 8GB memory, > Linux 2.6.11, FC drives. 300GB / 3 hours = 27MB/s. If you are using the 2.6 linux kernel, you may be fooled into thinking you burst more than you actually get in net I/O because the I/O stats changed in tools like iostat and vmstat. The only meaningful stats are (size of data) / (time to process data). Do a sequential scan of one of your large tables that you know the size of, then divide by the run time and report it. I'm compiling some new test data to make my point now. Regards, - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
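[Editorial aside: the measurement Luke asks for - scan a table of known size, divide size by wall-clock time - can be sketched as below. Database and table names are hypothetical; pg_relation_size is available as a built-in starting in 8.1, the version under discussion.]

```shell
#!/bin/bash
# Net sequential-scan throughput = (size of data) / (time to process data).
DB=mydb        # hypothetical database name
TABLE=bigtable # hypothetical table name

# On-disk size of the relation in kB (8.1's pg_relation_size; on older
# releases, du -sk on the database directory works too)
SIZE_KB=$(psql -At -d "$DB" -c "select pg_relation_size('$TABLE') / 1024")

START=$(date +%s)
psql -d "$DB" -c "select count(*) from $TABLE" > /dev/null
END=$(date +%s)

awk -v kb="$SIZE_KB" -v s="$((END - START))" \
    'BEGIN { printf "%.1f MB/s\n", kb / 1024 / s }'
```

By this arithmetic, Alan's 300GB vacuum analyze in 3 hours is roughly 300,000 MB / 10,800 s, or about 28 MB/s, which is where Luke's 27MB/s figure comes from.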
Re: [PERFORM] Hardware/OS recommendations for large databases (
On Nov 18, 2005, at 08:00, Dave Cramer wrote:

A $1,000 system with one CPU and two SATA disks in a software RAID0 will perform exactly the same as a $80,000 system with 8 dual core CPUs and the world's best SCSI RAID hardware on a large database for decision support (what the poster asked about).

Now there's an interesting line drawn in the sand. I presume you have numbers to back this up ?

This should draw some interesting posts.

There is some truth to it. For an app I'm currently running (full-text search using tsearch2 on ~100MB of data) on:

Dev System:
Asus bare-bones bookshelf case/mobo
3GHz P4 w/ HT
800MHz memory Bus
Fedora Core 3 (nightly update)
1GB RAM
1 SATA Seagate disk (7200RPM, 8MB Cache)
$800
worst-case query: 7.2 seconds

now, the machine I'm deploying to:

Dell SomthingOrOther
(4) 2.4GHz Xeons
533MHz memory bus
RedHat Enterprise 3.6
1GB RAM
(5) 15K RPM Ultra SCSI 320 on an Adaptec RAID 5 controller
> $1
same worst-case query: 9.6 seconds

Now it's not apples-to-apples. There's a kernel 2.4 vs. 2.6 difference and the memory bus is much faster and I'm not sure what kind of context switching hit you get with the Xeon MP memory controller. On a previous postgresql app I did I ran nearly identically spec'ed machines except for the memory bus and saw about a 30% boost in performance just with the 800MHz bus. I imagine the Opteron bus does even better.

So the small machine is probably slower on disk but makes up for it in single-threaded access to CPU and memory speed. But if this app were to be scaled it would make much more sense to cluster several $800 machines than it would to buy 'big-iron'.

-Bill

-
Bill McGonigle, Owner          Work: 603.448.4440
BFC Computing, LLC             Home: 603.448.1668
[EMAIL PROTECTED]              Mobile: 603.252.2606
http://www.bfccomputing.com/   Pager: 603.442.1833
Jabber: [EMAIL PROTECTED]      Text: [EMAIL PROTECTED]
Blog: http://blog.bfccomputing.com/

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Hardware/OS recommendations for large databases (
On Nov 18, 2005, at 1:07 AM, Luke Lonergan wrote:

A $1,000 system with one CPU and two SATA disks in a software RAID0 will perform exactly the same as a $80,000 system with 8 dual core CPUs and the world's best SCSI RAID hardware on a large database for decision support (what the poster asked about).

Hahahahahahahahahahahahaha! Whooo... needed to fall out of my chair laughing this morning.

I can tell you from direct personal experience that you're just plain wrong. I've had to move my primary DB server from a dual P3 1GHz with 4-disk RAID10 SCSI, to a dual P3 2GHz with 14-disk RAID10 and faster drives, to a dual Opteron 2GHz with 8-disk RAID10 and even faster disks to keep up with my load on a 60+ GB database. The dual Opteron system has just a little bit of extra capacity if I offload some of the reporting operations to a replicated copy (via slony1). If I run all the queries on the one DB it can't keep up.

The most telling point about the difference in speed is that the 14-disk array system cannot keep up with the replication being generated by the dual Opteron, even when it is not doing any other queries of its own. The I/O system just ain't fast enough.

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Hardware/OS recommendations for large databases (
On Nov 18, 2005, at 10:13 AM, Luke Lonergan wrote:

Still, there is a CPU limit here - this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s.

Yeah, and mysql would probably be faster on your trivial queries. Try concurrent large joins and updates and see which system is faster.
Re: [PERFORM] Hardware/OS recommendations for large databases (
On 18-Nov-05, at 8:30 AM, Luke Lonergan wrote:

Richard,

On 11/18/05 5:22 AM, "Richard Huxton" wrote:

Well, I'm prepared to swap Luke *TWO* $1000 systems for one $80,000 system if he's got one going :-)

Finally, a game worth playing!

Except it's backward - I'll show you 80 $1,000 systems performing 80 times faster than one $80,000 system.

Now you wouldn't happen to be selling a system that would enable this for postgres, now would ya ?

On your proposition - I don't have any $80,000 systems for trade, do you?

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Dave,

On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
>
> Now there's an interesting line drawn in the sand. I presume you have
> numbers to back this up ?
>
> This should draw some interesting posts.

OK, here we go:

The $1,000 system (System A):

I bought 16 of these in 2003 for $1,200 each. They have Intel or Asus motherboards, Intel P4 3.0GHz CPUs with an 800MHz FSB. They have a system drive and two RAID0 SATA drives, the Western Digital 74GB Raptor (10K RPM). They have 1GB of RAM.

A test of write and read performance on the RAID0:

[EMAIL PROTECTED] raid0]$ time dd if=/dev/zero of=bigfile bs=8k count=25
25+0 records in
25+0 records out

real    0m17.453s
user    0m0.249s
sys     0m10.246s

[EMAIL PROTECTED] raid0]$ time dd if=bigfile of=/dev/null bs=8k
25+0 records in
25+0 records out

real    0m18.930s
user    0m0.130s
sys     0m3.590s

So, the write performance is 114MB/s and read performance is 106MB/s.

The $6,000 system (System B):

I just bought 5 of these systems for $6,000 each. They are dual Opteron systems with 8GB of RAM and 2x 250 model CPUs, which are close to the fastest. They have the new 3Ware 9550SX SATA RAID adapters coupled to Western Digital 400GB RE2 model hard drives. They are organized as a RAID5.

A test of write and read performance on the RAID5:

[EMAIL PROTECTED] dbfast1]# time dd if=/dev/zero of=bigfile bs=8k count=200
200+0 records in
200+0 records out

real    0m51.441s
user    0m0.288s
sys     0m29.119s

[EMAIL PROTECTED] dbfast1]# time dd if=bigfile of=/dev/null bs=8k
200+0 records in
200+0 records out

real    0m39.605s
user    0m0.244s
sys     0m19.207s

So, the write performance is 314MB/s and read performance is 404MB/s (!) This is the fastest I've seen 8 disk drives perform.

So, the question is: which of these systems (A or B) can scan a large table faster using non-MPP postgres? How much faster would you wager?

Send your answer, and I'll post the result.

Regards,

- Luke
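[Editorial aside: a hedged version of Luke's dd test. The mount point and file size here are assumptions; the count is chosen to produce roughly a 2GB file, which is consistent with his reported 114MB/s over 17.45 seconds on a 1GB-RAM box, and the file should be at least 2x RAM so the read pass is not served from the page cache.]

```shell
#!/bin/bash
# Sketch of a raw sequential write/read test with dd.
DIR=/mnt/raid0        # placeholder mount point for the array under test
COUNT=250000          # 250000 x 8kB = ~2GB; scale up to 2x your RAM

time dd if=/dev/zero of="$DIR/bigfile" bs=8k count="$COUNT"
time dd if="$DIR/bigfile" of=/dev/null bs=8k
rm -f "$DIR/bigfile"

# MB/s = (count * 8 kB) / elapsed seconds / 1000 (decimal MB, as in the post)
awk -v c="$COUNT" -v t=17.453 'BEGIN { printf "%.1f MB/s\n", c * 8 / t / 1000 }'
```

With a ~2GB file and Luke's 17.453 s write time, the arithmetic gives 114.6 MB/s, matching his "114MB/s" write figure for System A.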
Re: [PERFORM] Hardware/OS recommendations for large databases
While I agree with you in principle that pg becomes CPU bound relatively easily compared to other DB products (at ~110-120MBps according to a recent thread), there's a bit of hyperbole in your post.

a. There's a big difference between the worst performing 1C x86 ISA CPU available and the best performing 2C one (IIRC, that's the 2.4GHz, 1MB L2 cache AMDx2 4800+ as of this writing).

b. Two 2C CPUs vs one 1C CPU means that a pg process will almost never be waiting on other non-pg processes. It also means that 3-4 pg processes, CPU bound or not, can execute in parallel. Not an option with one 1C CPU.

c. Mainboards with support for multiple CPUs and lots of RAM are _not_ the cheap ones.

d. No one should ever use RAID 0 for valuable data. Ever. So at the least you need 4 HDs for a RAID 10 set (RAID 5 is not a good option unless write performance is unimportant. 4HD RAID 5 is particularly not a good option.)

e. The server usually needs to talk to things over a network connection. Often performance here matters. Mainboards with 2 1GbE NICs and/or PCI-X (or PCI-E) slots for 10GbE cards are not the cheap ones.

f. Trash HDs mean poor IO performance and lower reliability. While TOTL 15Krpm 4Gb FC HDs are usually overkill (not always; it depends on context), you at least want SATA II HDs with NCQ or TCQ support. And you want them to have a decent media warranty - preferably a 5 year one if you can get it. Again, these are not the cheapest HDs available.

g. Throughput limitations say nothing about latency considerations. OLTP-like systems _want_ HD spindles, AMAP. Even non-OLTP-like systems need a fair number of spindles to optimize HD IO: dedicated WAL set, multiple dedicated DB sets, dedicated OS and swap space set, etc, etc. At 50MBps ASTR, you need 16 HDs operating in parallel to saturate the bandwidth of a PCI-X channel. That's ~8 independent pg tasks (queries using different tables, dedicated WAL IO, etc) running in parallel. Regardless of application domain.

h. Decent RAID controllers and HBAs are not cheap either. Even SW RAID benefits from having a big dedicated RAM buffer to talk to.

While the above may not cost you $80K, it sure isn't costing you $1K either. Maybe ~$15-$20K, but not $1K.

Ron

At 01:07 AM 11/18/2005, Luke Lonergan wrote:

Greg,

On 11/17/05 9:17 PM, "Greg Stark" <[EMAIL PROTECTED]> wrote:
> Ok, a more productive point: it's not really the size of the database that
> controls whether you're I/O bound or CPU bound. It's the available I/O
> bandwidth versus your CPU speed.

Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound after 110MB/s of I/O. This is true of Postgres 7.4, 8.0 and 8.1.

A $1,000 system with one CPU and two SATA disks in a software RAID0 will perform exactly the same as a $80,000 system with 8 dual core CPUs and the world's best SCSI RAID hardware on a large database for decision support (what the poster asked about).

Regards,

- Luke

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Hardware/OS recommendations for large databases (
Dave,

On 11/18/05 5:00 AM, "Dave Cramer" <[EMAIL PROTECTED]> wrote:
>
> Now there's an interesting line drawn in the sand. I presume you have
> numbers to back this up ?
>
> This should draw some interesting posts.

Part 2: The answer

System A:

This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel.

On a single table with 15 columns (the Bizgres IVP) at a size double memory (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan the table: that's 66 MB/s. Not the efficiency I'd hoped for from the onboard SATA controller; I would have expected to get 85% of the 100MB/s raw read performance.

So that's $1,200 / 66 MB/s (without adjusting for 2003 price versus now) = 18.2 $/MB/s

Raw data:

[EMAIL PROTECTED] IVP]$ cat scan.sh
#!/bin/bash
time psql -c "select count(*) from ivp.bigtable1" dgtestdb
[EMAIL PROTECTED] IVP]$ cat sysout1
 count
--
 1000
(1 row)

real    0m32.565s
user    0m0.002s
sys     0m0.003s

Size of the table data:

[EMAIL PROTECTED] IVP]$ du -sk dgtestdb/base
2121648 dgtestdb/base

System B:

This system is running an XFS filesystem, and has been tuned to use very large (16MB) readahead. It's running the Centos 4.1 distro, which uses a Linux 2.6.9 kernel.

Same test as above, but with 17GB of data takes 69.7 seconds to scan (!) That's 244.2MB/s, which is obviously double my earlier point of 110-120MB/s. This system is running with a 16MB Linux readahead setting, let's try it with the default (I think) setting of 256KB - AHA! Now we get 171.4 seconds or 99.3MB/s.

So, using the tuned setting of "blockdev --setra 16384" we get $6,000 / 244MB/s = 24.6 $/MB/s

If we use the default Linux setting it's 2.5x worse.
Raw data:

[EMAIL PROTECTED] IVP]$ cat scan.sh
#!/bin/bash
time psql -c "select count(*) from ivp.bigtable1" dgtestdb
[EMAIL PROTECTED] IVP]$ cat sysout3
 count
--
 8000
(1 row)

real    1m9.875s
user    0m0.000s
sys     0m0.004s
[EMAIL PROTECTED] IVP]$ !du
du -sk dgtestdb/base
17021260 dgtestdb/base

Summary:

OK - you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead. This is a cheap alternative to overhauling the executor to use asynch I/O.

Still, there is a CPU limit here - this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:

Alan,

On 11/18/05 5:41 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

That's interesting, as I occasionally see more than 110MB/s of postgresql IO on our system. I'm using a 32KB block size, which has been a huge win in performance for our usage patterns. 300GB database with a lot of turnover. A vacuum analyze now takes about 3 hours, which is much shorter than before. Postgresql 8.1, dual opteron, 8GB memory, Linux 2.6.11, FC drives.

300GB / 3 hours = 27MB/s.

That's 3 hours under load, with 80 compute clients beating on the database at the same time. We have the stats turned way up, so the analyze tends to read a big chunk of the tables a second time as well. We typically don't have three hours a day of idle time.

-- Alan

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware/OS recommendations for large databases (
Dave Cramer wrote: On 18-Nov-05, at 1:07 AM, Luke Lonergan wrote: Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound after 110MB/s of I/O. This is true of Postgres 7.4, 8.0 and 8.1. A $1,000 system with one CPU and two SATA disks in a software RAID0 will perform exactly the same as a $80,000 system with 8 dual core CPUs and the world's best SCSI RAID hardware on a large database for decision support (what the poster asked about). Now there's an interesting line drawn in the sand. I presume you have numbers to back this up ? This should draw some interesting posts. Well, I'm prepared to swap Luke *TWO* $1000 systems for one $80,000 system if he's got one going :-) -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Hardware/OS recommendations for large databases (
On 17-Nov-05, at 2:50 PM, Alex Turner wrote:

Just pick up a SCSI drive and a consumer ATA drive. Feel their weight. You don't have to look inside to tell the difference.

At one point stereo manufacturers put weights in the case just to make them heavier. The older ones weighed more and the consumer liked heavy stereos. Be careful what you measure.

Dave

Alex

On 11/16/05, David Boreham <[EMAIL PROTECTED]> wrote:

I suggest you read this on the difference between enterprise/SCSI and desktop/IDE drives:
http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

This is exactly the kind of vendor propaganda I was talking about and it proves my point quite well: that there's nothing specific relating to reliability that is different between SCSI and SATA drives cited in that paper. It does have a bunch of FUD such as 'oh yeah we do a lot more drive characterization during manufacturing'.

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Hardware/OS recommendations for large databases (
On 18-Nov-05, at 1:07 AM, Luke Lonergan wrote: Greg, On 11/17/05 9:17 PM, "Greg Stark" <[EMAIL PROTECTED]> wrote: Ok, a more productive point: it's not really the size of the database that controls whether you're I/O bound or CPU bound. It's the available I/O bandwidth versus your CPU speed. Postgres + Any x86 CPU from 2.4GHz up to Opteron 280 is CPU bound after 110MB/s of I/O. This is true of Postgres 7.4, 8.0 and 8.1. A $1,000 system with one CPU and two SATA disks in a software RAID0 will perform exactly the same as a $80,000 system with 8 dual core CPUs and the world's best SCSI RAID hardware on a large database for decision support (what the poster asked about). Now there's an interesting line drawn in the sand. I presume you have numbers to back this up ? This should draw some interesting posts. Dave Regards, - Luke ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly