Re: [PERFORM] Final decision
Joshua D. Drake wrote: > Dave Page wrote: > > > > > > > >>-Original Message- > >>From: Joshua D. Drake [mailto:[EMAIL PROTECTED] > >>Sent: 27 April 2005 17:46 > >>To: Dave Page > >>Cc: Josh Berkus; Joel Fradkin; PostgreSQL Perform > >>Subject: Re: [PERFORM] Final decision > >> > >> > >>>It is? No-one told the developers... > >> > >>We have mentioned it on the list. > > > > > > Err, yes. But that's not quite the same as core telling us the current > > driver is being replaced. > > Well I don't think anyone knew that the current driver is still being > maintained? We have been looking for someone to take over ODBC and Pervasive agreed to do it, but there wasn't a big announcement about it. I have discussed this with them. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: ODBC driver overpopulation (was Re: [PERFORM] Final decision)
Joshua D. Drake wrote: > >>Mind you, having 2 different teams working on two different ODBC drivers is > >>a > >>problem for another list ... > > > > > > Only two? I thought another commercial entity was also working on their > > own ODBC driver, so there may be three of them. > > Well I only know of one company actually working on ODBC actively and > that is Command Prompt, If there are others I would like to hear about > it because I would rather work with someone than against them. Well, you should talk to Pervasive because they have a team working on improving the existing driver. I am sure they would want to work together too. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Final decision
Dave Page wrote: -Original Message- From: Joshua D. Drake [mailto:[EMAIL PROTECTED] Sent: 27 April 2005 17:46 To: Dave Page Cc: Josh Berkus; Joel Fradkin; PostgreSQL Perform Subject: Re: [PERFORM] Final decision It is? No-one told the developers... We have mentioned it on the list. Err, yes. But that's not quite the same as core telling us the current driver is being replaced. Well I don't think anyone knew that the current driver is still being maintained? Sincerely, Joshua D. Drake Regards, Dave. -- Your PostgreSQL solutions provider, Command Prompt, Inc. 24x7 support - 1.800.492.2240, programming, and consulting Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit http://www.commandprompt.com / http://www.postgresql.org ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: ODBC driver overpopulation (was Re: [PERFORM] Final decision)
Mind you, having 2 different teams working on two different ODBC drivers is a problem for another list ... Only two? I thought another commercial entity was also working on their own ODBC driver, so there may be three of them. Well I only know of one company actually working on ODBC actively and that is Command Prompt, If there are others I would like to hear about it because I would rather work with someone than against them. Sincerely, Joshua D. Drake -- Your PostgreSQL solutions provider, Command Prompt, Inc. 24x7 support - 1.800.492.2240, programming, and consulting Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit http://www.commandprompt.com / http://www.postgresql.org ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
ODBC driver overpopulation (was Re: [PERFORM] Final decision)
On Wed, Apr 27, 2005 at 08:09:27PM -0700, Josh Berkus wrote: > Mind you, having 2 different teams working on two different ODBC drivers is a > problem for another list ... Only two? I thought another commercial entity was also working on their own ODBC driver, so there may be three of them. -- Alvaro Herrera (<[EMAIL PROTECTED]>) "Always assume the user will do much worse than the stupidest thing you can imagine."(Julien PUYDT) ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Final decision
Dave, folks, > Err, yes. But that's not quite the same as core telling us the current > driver is being replaced. Sorry, I spoke off the cuff.I also was unaware that work on the current driver had renewed. Us Core people are not omnicient, believe it or don't. Mind you, having 2 different teams working on two different ODBC drivers is a problem for another list ... -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Why is this system swapping?
Greg, > In fact I think it's generally superior to having a layer like pgpool > having to hand off all your database communication. Having to do an extra > context switch to handle every database communication is crazy. Although, one of their issues is that their database connection pooling is per-server.Which means that a safety margin of pre-allocated connections (something they need since they get bursts of 1000 new users in a few seconds) has to be maintained per server, increasing the total number of connections. So a pooling system that allowed them to hold 100 free connections centrally rather than 10 per server might be a win. Better would be getting some of this stuff offloaded onto database replication slaves. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Why is this system swapping?
Jeff <[EMAIL PROTECTED]> writes: > Are you (Anjan) using real or fake connection pooling - ie pgpool versus php's > persistent connections ? I'd strongly recommend looking at pgpool. it does > connection pooling correctly (A set of X connections shared among the entire > box rather than 1 per web server) Having one connection per web process isn't "fake connection pooling", it's a completely different arrangement. And there's nothing "incorrect" about it. In fact I think it's generally superior to having a layer like pgpool having to hand off all your database communication. Having to do an extra context switch to handle every database communication is crazy. For typical web sites where the database is the only slow component there's not much point in having more web server processes than connections anyways, All your doing is transferring the wait time from waiting for a web server process to waiting for a database process. Most applications that find they need connection pooling are using it to work around a poorly architected system that is mixing static requests (like images) and database driven requests in the same web server. However, your application sounds like it's more involved than a typical web server. If it's handling many slow resources, such as connections to multiple databases, SOAP services, mail, or other network services then you may well need that many processes. In which case you'll need something like pgpool. -- greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
BTW, http://stats.distributed.net/~decibel/base.log is a test I ran; select count(*) was ~6x faster than explain analyze select *. On Tue, Apr 26, 2005 at 07:46:52PM -0700, Kevin Brown wrote: > Josh Berkus wrote: > > Jim, Kevin, > > > > > > Hrm... I was about to suggest that for timing just the query (and not > > > > output/data transfer time) using explain analyze, but then I remembered > > > > that explain analyze can incur some non-trivial overhead with the timing > > > > calls. Is there a way to run the query but have psql ignore the output? > > > > If so, you could use \timing. > > > > > > Would timing "SELECT COUNT(*) FROM (query)" work? > > > > Just \timing would work fine; PostgreSQL doesn't return anything until it > > has > > the whole result set. > > Hmm...does \timing show the amount of elapsed time between query start > and the first results handed to it by the database (even if the > database itself has prepared the entire result set for transmission by > that time), or between query start and the last result handed to it by > the database? > > Because if it's the latter, then things like server<->client network > bandwidth are going to affect the results that \timing shows, and it > won't necessarily give you a good indicator of how well the database > backend is performing. I would expect that timing SELECT COUNT(*) > FROM (query) would give you an idea of how the backend is performing, > because the amount of result set data that has to go over the wire is > trivial. > > Each is, of course, useful in its own right, and you want to be able > to measure both (so, for instance, you can get an idea of just how > much your network affects the overall performance of your queries). > > > > That's why MSSQL vs. PostgreSQL timing comparisons are > > deceptive unless you're careful: MSSQL returns the results on block at a > > time, and reports execution time as the time required to return the *first* > > block, as opposed to Postgres which reports the time required to return the > > whole dataset. > > Interesting. I had no idea MSSQL did that, but I can't exactly say > I'm surprised. :-) > > > -- > Kevin Brown [EMAIL PROTECTED] > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly > -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Why is this system swapping?
Using Resin's connection pooling. We are looking into pgpool alongside slony to separate some reporting functionality. -anjan -Original Message- From: Jeff [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 27, 2005 3:29 PM To: Greg Stark Cc: Anjan Dave; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Why is this system swapping? On Apr 27, 2005, at 2:29 PM, Greg Stark wrote: > "AI would seriously look at tuning those connection pools down. A lot. > If your > server processes are sitting idle over half the time I would at least > cut it > by a factor of 2. > Are you (Anjan) using real or fake connection pooling - ie pgpool versus php's persistent connections ? I'd strongly recommend looking at pgpool. it does connection pooling correctly (A set of X connections shared among the entire box rather than 1 per web server) -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Why is this system swapping?
Yes, HT is turned off (I haven't seen any recommendations to keep it on). This is when we were seeing 30 to 50% less traffic (users) than today - we didn't want the idle connections in the pool to expire too soon (default 30 secs, after which it goes back to pool) and reopen it quickly, or not have sufficient available (default 20 conns, we raised it to 50), so we figured a number per app server (50) and set that to expire after a very long time, so as to avoid any overhead, and always have the connection available whenever needed, without opening a new one. But now, for *some* reason, in some part of the day, we use up almost all connections in each app's pool. After that since they are set to expire after a long time, they remain there, taking up DB resources. I will be trimming down the idle-timeout to a few minutes first, see if that helps. Thanks, Anjan -Original Message- From: Greg Stark [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 27, 2005 2:29 PM To: Anjan Dave Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Why is this system swapping? "Anjan Dave" <[EMAIL PROTECTED]> writes: > Some background: > > This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy > on concurrent usage. With peak traffic (db allows 1000 connections, in > line with the number of app servers and connection pools for each) > following is from 'top' (sorted by mem) Shared_buffers is 170MB, > sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel > storage, RAID10. > > 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped > > CPU states: cpuusernice systemirq softirq iowait idle >total 57.2%0.0% 23.2% 0.0% 3.6% 82.8% 232.4% This looks to me like most of your server processes are sitting around idle most of the time. > 21397 postgres 22 0 181M 180M 175M D25.9 1.5 85:17 0 > postmaster > > 23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 > postmaster So each process is taking up 8-11M of ram beyond the shared memory. 1,000 x 10M is 10G. Add in some memory for page tables and kernel data structures, as well as the kernel's need to keep some memory set aside for filesystem buffers (what you really want all that memory being used for anyways) and you've used up all your 12G. I would seriously look at tuning those connection pools down. A lot. If your server processes are sitting idle over half the time I would at least cut it by a factor of 2. Working the other direction: you have four processors (I guess you have hyperthreading turned off?) so ideally what you want is four runnable processes at all times and as few others as possible. If your load typically spends about half the time waiting on i/o (which is what that top output says) then you want a total of 8 connections. Realistically you might not be able to predict which app server will be providing the load at any given time, so you might want 8 connections per app server. And you might have some load that's more i/o intensive than the 50% i/o load shown here. Say you think some loads will be 80% i/o, you might want 20 connections for those loads. If you had 10 app servers with 20 connections each for a total of 200 connections I suspect that would be closer to right than having 1,000 connections. 200 connections would consume 2G of ram leaving you with 10G of filesystem cache. Which might in turn decrease the percentage of time waiting on i/o, which would decrease the number of processes you need even further... -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Final decision
> -Original Message- > From: Joshua D. Drake [mailto:[EMAIL PROTECTED] > Sent: 27 April 2005 17:46 > To: Dave Page > Cc: Josh Berkus; Joel Fradkin; PostgreSQL Perform > Subject: Re: [PERFORM] Final decision > > > It is? No-one told the developers... > > We have mentioned it on the list. Err, yes. But that's not quite the same as core telling us the current driver is being replaced. Regards, Dave. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Final decision
Joel Fradkin wrote: ... I am guessing our app is like 75% data entry and 25% reporting, but the reporting is taking the toll SQL wise. This was from my insert test with 15 users. Test type: Dynamic Simultaneous browser connections: 15 Warm up time (secs): 0 Test duration: 00:00:03:13 Test iterations: 200 Detailed test results generated: Yes Response Codes Response Code: 403 - The server understood the request, but is refusing to fulfill it. Count: 15 Percent (%): 0.29 Response Code: 302 - The requested resource resides temporarily under a different URI (Uniform Resource Identifier). Count: 200 Percent (%): 3.85 Response Code: 200 - The request completed successfully. Count: 4,980 Percent (%): 95.86 My select test with 25 users had this Properties Test type: Dynamic Simultaneous browser connections: 25 Warm up time (secs): 0 Test duration: 00:00:06:05 Test iterations: 200 Detailed test results generated: Yes Summary Total number of requests: 187 Total number of connections: 200 Average requests per second: 0.51 Average time to first byte (msecs): 30,707.42 Average time to last byte (msecs): 30,707.42 Average time to last byte per iteration (msecs): 28,711.44 Number of unique requests made in test: 1 Number of unique response codes: 1 Well, having a bandwidth of 392Bps seems *really* low. I mean that is a very old modem speed (3200 baud). I'm wondering if you are doing a lot of aggregating in the web server, and if you couldn't move some of that into the database by using plpgsql functions. That would take some of the load off of your IIS servers, and possibly improve your overall bandwidth. But I do agree, it looks like the select side is where you are hurting. If I understand the numbers correctly, you can do 5k inserts in 3min, but are struggling to do 200 selects in 6min. John =:-> Errors Counts HTTP: 0 DNS: 0 Socket: 26 Additional Network Statistics Average bandwidth (bytes/sec): 392.08 Number of bytes sent (bytes): 64,328 Number of bytes received (bytes): 78,780 Average rate of sent bytes (bytes/sec): 176.24 Average rate of received bytes (bytes/sec): 215.84 Number of connection errors: 0 Number of send errors: 13 Number of receive errors: 13 Number of timeout errors: 0 Response Codes Response Code: 200 - The request completed successfully. Count: 187 Percent (%): 100.00 Joel signature.asc Description: OpenPGP digital signature
Re: [PERFORM] Why is this system swapping?
On Apr 27, 2005, at 2:29 PM, Greg Stark wrote: "AI would seriously look at tuning those connection pools down. A lot. If your server processes are sitting idle over half the time I would at least cut it by a factor of 2. Are you (Anjan) using real or fake connection pooling - ie pgpool versus php's persistent connections ? I'd strongly recommend looking at pgpool. it does connection pooling correctly (A set of X connections shared among the entire box rather than 1 per web server) -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Final decision
Joshua, This article was in July 2002, so is there update to this information? When will a new ODBC driver be available for testing? Is there a release of the ODBC driver with better performance than 7.0.3.0200 for a 7.4.x database? Steve Poe We have mentioned it on the list. http://www.linuxdevcenter.com/pub/a/linux/2002/07/16/drake.html Regards, Dave [and yes, I know Joshua said Command Prompt are rewriting /their/ driver] :) No we are rewriting a complete OSS driver. Sincerely, Joshua D. Drake Command Prompt, Inc. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Why is this system swapping?
Sorry, I didn't attach vmstat, the system does actively swap pages. Not to the point where it crawls, but for some brief periods the console becomes a bit unresponsive. I am taking this as a sign to prevent future problems. anjan -Original Message- From: Jeff [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 27, 2005 2:30 PM To: Anjan Dave Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Why is this system swapping? On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote: > As you can see the system starts utilizing swap at some point, with so > many processes. Some time ago we had decided to keep the connections > from the pool open for longer You've shown the system has used swap but not that it is swapping. Having swap in use is fine - there is likely plenty of code and whatnot that is not being used so it dumped it out to swap. However if you are actively moving data to/from swap that is bad. Very bad. Especially on linux. To tell if you are swapping you need to watch the output of say, vmstat 1 and look at the si and so columns. Linux is very swap happy and likes to swap things for fun and profit. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Final decision
Just realize, you probably *don't* want to set that in postgresql.conf. You just want to issue an "SET enable_seqscan TO off" before issuing one of the queries that are mis-planned. I believe all the tested queries (90 some odd views) saw an improvement. I will however take the time to verify this and take your suggestion as I can certainly put the appropriate settings in each as opposed to using the config option, Thanks for the good advice (I believe Josh from Commandprompt.com also suggested this approach and I in my lazy self some how blurred the concept.) Also, I second the notion of getting a confidentiality contract. There have been several times where someone had a pathological case, and by sending the data to someone (Tom Lane), they were able to track down and fix the problem. Excellent point, Our data is confidential, but I should write something to allow me to ship concept without confidential, so in the future I can just send a backup and not have it break our agreements, but allow minds greater then my own to see, and feel my issues. What do you mean by "blew up"? IIS testing was being done with an old 2300 and a optiplex both machines reached 100%CPU utilization and the test suite (ASP code written in house by one of programmers) was not returning memory correctly, so it ran out of memory and died. Prior to death I did see cpu utilization on the 4proc linux box running postgres fluctuate and at times hit the 100% level, but the server seemed very stable. I did fix the memory usage of the suite and was able to see 50 concurrent users with fairly high RPS especially on select testing, the insert and update seemed to fall apart (many 404 errors etc) I assume you have IIS on a different machine than the database. Are you saying that the database slowed down dramatically, or that the machine crashed, or just that the web interface became unresponsive? Just the web interface. It probably depends on what queries are being done, and what kind of times you need. Usually the update machine needs the stronger hardware, so that it can do the writing. But it depends if you can wait longer to update data than to query data, obviously the opposite is true. It all depends on load, and that is pretty much application defined. I am guessing our app is like 75% data entry and 25% reporting, but the reporting is taking the toll SQL wise. This was from my insert test with 15 users. Test type: Dynamic Simultaneous browser connections: 15 Warm up time (secs): 0 Test duration: 00:00:03:13 Test iterations: 200 Detailed test results generated: Yes Response Codes Response Code: 403 - The server understood the request, but is refusing to fulfill it. Count: 15 Percent (%): 0.29 Response Code: 302 - The requested resource resides temporarily under a different URI (Uniform Resource Identifier). Count: 200 Percent (%): 3.85 Response Code: 200 - The request completed successfully. Count: 4,980 Percent (%): 95.86 My select test with 25 users had this Properties Test type: Dynamic Simultaneous browser connections: 25 Warm up time (secs): 0 Test duration: 00:00:06:05 Test iterations: 200 Detailed test results generated: Yes Summary Total number of requests: 187 Total number of connections: 200 Average requests per second: 0.51 Average time to first byte (msecs): 30,707.42 Average time to last byte (msecs): 30,707.42 Average time to last byte per iteration (msecs): 28,711.44 Number of unique requests made in test: 1 Number of unique response codes: 1 Errors Counts HTTP: 0 DNS: 0 Socket: 26 Additional Network Statistics Average bandwidth (bytes/sec): 392.08 Number of bytes sent (bytes): 64,328 Number of bytes received (bytes): 78,780 Average rate of sent bytes (bytes/sec): 176.24 Average rate of received bytes (bytes/sec): 215.84 Number of connection errors: 0 Number of send errors: 13 Number of receive errors: 13 Number of timeout errors: 0 Response Codes Response Code: 200 - The request completed successfully. Count: 187 Percent (%): 100.00 Joel ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Why is this system swapping?
"Anjan Dave" <[EMAIL PROTECTED]> writes: > Some background: > > This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy > on concurrent usage. With peak traffic (db allows 1000 connections, in > line with the number of app servers and connection pools for each) > following is from 'top' (sorted by mem) Shared_buffers is 170MB, > sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel > storage, RAID10. > > 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped > > CPU states: cpuusernice systemirq softirq iowaitidle >total 57.2%0.0% 23.2% 0.0% 3.6% 82.8% 232.4% This looks to me like most of your server processes are sitting around idle most of the time. > 21397 postgres 22 0 181M 180M 175M D25.9 1.5 85:17 0 > postmaster > > 23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 > postmaster So each process is taking up 8-11M of ram beyond the shared memory. 1,000 x 10M is 10G. Add in some memory for page tables and kernel data structures, as well as the kernel's need to keep some memory set aside for filesystem buffers (what you really want all that memory being used for anyways) and you've used up all your 12G. I would seriously look at tuning those connection pools down. A lot. If your server processes are sitting idle over half the time I would at least cut it by a factor of 2. Working the other direction: you have four processors (I guess you have hyperthreading turned off?) so ideally what you want is four runnable processes at all times and as few others as possible. If your load typically spends about half the time waiting on i/o (which is what that top output says) then you want a total of 8 connections. Realistically you might not be able to predict which app server will be providing the load at any given time, so you might want 8 connections per app server. And you might have some load that's more i/o intensive than the 50% i/o load shown here. Say you think some loads will be 80% i/o, you might want 20 connections for those loads. If you had 10 app servers with 20 connections each for a total of 200 connections I suspect that would be closer to right than having 1,000 connections. 200 connections would consume 2G of ram leaving you with 10G of filesystem cache. Which might in turn decrease the percentage of time waiting on i/o, which would decrease the number of processes you need even further... -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Why is this system swapping?
On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote: As you can see the system starts utilizing swap at some point, with so many processes. Some time ago we had decided to keep the connections from the pool open for longer You've shown the system has used swap but not that it is swapping. Having swap in use is fine - there is likely plenty of code and whatnot that is not being used so it dumped it out to swap. However if you are actively moving data to/from swap that is bad. Very bad. Especially on linux. To tell if you are swapping you need to watch the output of say, vmstat 1 and look at the si and so columns. Linux is very swap happy and likes to swap things for fun and profit. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Final decision
BTW, your performance troubleshooting will continue to be hampered if you can't share actual queries and data structure. I strongly suggest that you make a confidentiality contract with a support provider so that you can give them detailed (rather than general) problem reports. I am glad to hear your perspective, maybe my rollout is not as off base as I thought. FYI it is not that I can not share specifics (I have posted a few table structures and views here and on pgsql, I just can not backup the entire database and ship it off to a consultant. What I had suggested with Commandprompt was to use remote connectivity for him to have access to our server directly. In this way I can learn by watching what types of test he does and it allows him to do tests with our data set. Once I am in production that will not be something I want tests done on, so it may have to wait until we get a development box with a similar deployment (at the moment development is on a XP machine and production will be on Linux (The 4 proc is linux and will be our production). Thank you for letting me know what I can hope to see in the way of disk access on the next hardware procurement, I may email you off list to get the specific brands etc that you found that kind of through put with. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Why is this system swapping?
Hello, I am trying to understand what I need to do for this system to stop using swap. Maybe it’s something simple, or obvious for the situation. I’d appreciate some thoughts/suggestions. Some background: This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4…pretty heavy on concurrent usage. With peak traffic (db allows 1000 connections, in line with the number of app servers and connection pools for each) following is from ‘top’ (sorted by mem) Shared_buffers is 170MB, sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel storage, RAID10. 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped CPU states: cpu user nice system irq softirq iowait idle total 57.2% 0.0% 23.2% 0.0% 3.6% 82.8% 232.4% cpu00 22.0% 0.0% 9.1% 0.1% 0.9% 18.7% 48.8% cpu01 17.5% 0.0% 5.8% 0.0% 2.3% 19.7% 54.4% cpu02 7.8% 0.0% 3.7% 0.0% 0.0% 20.8% 67.5% cpu03 9.7% 0.0% 4.4% 0.0% 0.5% 23.6% 61.5% Mem: 12081744k av, 12055220k used, 26524k free, 0k shrd, 71828k buff 9020480k actv, 1741348k in_d, 237396k in_c Swap: 4096532k av, 472872k used, 3623660k free 9911176k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 21397 postgres 22 0 181M 180M 175M D 25.9 1.5 85:17 0 postmaster 23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 postmaster 24428 postgres 15 0 178M 177M 175M S 0.0 1.5 1:35 3 postmaster 24392 postgres 15 0 178M 177M 175M S 2.7 1.5 2:07 2 postmaster 23610 postgres 15 0 178M 177M 175M S 0.0 1.5 0:29 2 postmaster 24395 postgres 15 0 178M 177M 175M S 0.0 1.5 1:12 1 postmaster … … -bash-2.05b$ free total used free shared buffers cached Mem: 12081744 12055536 26208 0 66704 9943988 -/+ buffers/cache: 2044844 10036900 Swap: 4096532 512744 3583788 As you can see the system starts utilizing swap at some point, with so many processes. Some time ago we had decided to keep the connections from the pool open for longer periods of time, possibly to avoid connection maintenance overhead on the db. At that time the traffic was not as high as it is today, which might be causing this, because for the most part, non-idle postmaster processes are only a few, except when the system becomes busy and suddenly you see a lot of selects piling up, and load averages shooting upwards. I am thinking closing out connections sooner might help the system release some memory to the kernel. Swapping adds up to the IO, although OS is on separate channel than postgres. I can add more memory, but I want to make sure I haven’t missed out something obvious. Thanks! Anjan **This e-mail and any files transmitted with it are intended for the use of the addressee(s) only and may be confidential and covered by the attorney/client and other privileges. If you received this e-mail in error, please notify the sender; do not disclose, copy, distribute, or take any action in reliance on the contents of this information; and delete it from your system. Any other use of this e-mail is prohibited.**
Re: [PERFORM] Final decision
Joel Fradkin wrote: I spent a great deal of time over the past week looking seriously at Postgres and MYSQL. Objectively I am not seeing that much of an improvement in speed with MYSQL, and we have a huge investment in postgrs. So I am planning on sticking with postgres fro our production database (going live this weekend). Glad to hear it. Good luck. ... Things I still have to make better are my settings in config, I have it set to no merge joins and no seq scans. Just realize, you probably *don't* want to set that in postgresql.conf. You just want to issue an "SET enable_seqscan TO off" before issuing one of the queries that are mis-planned. Because there are lots of times when merge join and seq scan is actually faster than the alternatives. And since I don't think you tested every query you are going to run, you probably want to let the planner handle the ones it gets right. (Usually it doesn't quite a good job.) Also, I second the notion of getting a confidentiality contract. There have been several times where someone had a pathological case, and by sending the data to someone (Tom Lane), they were able to track down and fix the problem. I am going to have to use flattened history files for reporting (I saw huge difference here the view for audit cube took 10 minutes in explain analyze and the flattened file took under one second). I understand both of these practices are not desirable, but I am at a place where I have to get it live and these are items I could not resolve. Nothing wrong with a properly updated flattened table. You just need to be careful to keep it consistent with the rest of the data. (Update triggers/lazy materialization, etc) I may try some more time with Commanpromt.com, or seek other professional help. In stress testing I found Postgres was holding up very well (but my IIS servers could not handle much of a load to really push the server). I have a few desktops acting as IIS servers at the moment and if I pushed past 50 consecutive users it pretty much blew the server up. On inserts that number was like 7 consecutive users and updates was also like 7 users. I believe that was totally IIS not postgres, but I am curious as to if using postgres odbc will put more stress on the IIS side then MSSQL did. What do you mean by "blew up"? I assume you have IIS on a different machine than the database. Are you saying that the database slowed down dramatically, or that the machine crashed, or just that the web interface became unresponsive? I did have a question if any folks are using two servers one for reporting and one for data entry what system should be the beefier? I have a 2proc machine I will be using and I can either put Sears off by themselves on this machine or split up functionality and have one for reporting and one for inserts and updates; so not sure which machine would be best for which spot (reminder the more robust is a 4proc with 8 gigs and 2 proc is 4 gigs, both dells). It probably depends on what queries are being done, and what kind of times you need. Usually the update machine needs the stronger hardware, so that it can do the writing. But it depends if you can wait longer to update data than to query data, obviously the opposite is true. It all depends on load, and that is pretty much application defined. Thank you for any ideas in this arena. Joel Fradkin John =:-> signature.asc Description: OpenPGP digital signature
Re: [PERFORM] Suggestions for a data-warehouse migration routine
Richard Rowell wrote: I've ported enough of my companies database to Postgres to make warehousing on PG a real possibility. I thought I would toss my data migration architecture ideas out for the list to shoot apart.. 1. Script on production server dumps the production database (MSSQL) to a set of delimited text files. 2. Script on production server moves files via FTP to a Postgres database server. 3. File Alteration Monitor trigger on PG server executes script when last file is transferred. 4. Script on PG server drops the target database (by issuing a "dropdb" command). 5. Script on PG server re-creates target database. (createdb command) 6. Script on PG server re-creates the tables. 7. Script on PG server issues COPY commands to import data. 8. Script on PG server indexes tables. 9. Script on PG server builds de-normalized reporting tables. 10. Script on PG server indexes the reporting tables. 11. Script on PG server creates needed reporting functions. 12. Vacuum analyze? My question revolves around the drop/create for the database. Is their significant downside to this approach? I'm taking this approach because it is simpler from a scripting point of view to simply start from scratch on each warehouse update. If I do not drop the database I would need to delete the contents of each table and drop all indexes prior to the COPY/data import. My assumption is all the table deletes and index drops would be more expensive then just droping/re-creating the entire database. I believe you are correct. If you are going to completely wipe the database, just drop it and re-create. Deleting is much slower than dropping. (One of the uses of partitioning is so that you can just drop one of the tables, rather than deleting the entries). Dropping the whole db skips any Foreign Key checks, etc. Also, is the Vacuum analyze step needed on a freshly minted database where the indexes have all been newly created? Thanks in advance for all feedback. ANALYZE is needed, since you haven't updated any of your statistics yet. So the planner doesn't really know how many rows there are. VACUUM probably isn't since everything should be pretty well aligned. John =:-> signature.asc Description: OpenPGP digital signature
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
"Dave Held" <[EMAIL PROTECTED]> writes: > > Actually, it's more to characterize how large of a sample > > we need. For example, if we sample 0.005 of disk pages, and > > get an estimate, and then sample another 0.005 of disk pages > > and get an estimate which is not even close to the first > > estimate, then we have an idea that this is a table which > > defies analysis based on small samples. > > I buy that. Better yet is to use the entire sample you've gathered of .01 and then perform analysis on that sample to see what the confidence interval is. Which is effectively the same as what you're proposing except looking at every possible partition. Unfortunately the reality according to the papers that were sent earlier is that you will always find the results disappointing. Until your sample is nearly the entire table your estimates for n_distinct will be extremely unreliable. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Final decision
It is? No-one told the developers... We have mentioned it on the list. http://www.linuxdevcenter.com/pub/a/linux/2002/07/16/drake.html Ooops ;) http://archives.postgresql.org/pgsql-odbc/2005-03/msg00109.php Sincerely, Joshua D. Drake Command Prompt, Inc. -- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support Managed Services, Shared and Dedication Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Final decision
Dave Page wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Josh Berkus Sent: 27 April 2005 17:14 To: Joel Fradkin Cc: PostgreSQL Perform Subject: Re: [PERFORM] Final decision Actually, I think the problem may be ODBC. Our ODBC driver is not the best and is currently being re-built from scratch. It is? No-one told the developers... We have mentioned it on the list. http://www.linuxdevcenter.com/pub/a/linux/2002/07/16/drake.html Regards, Dave [and yes, I know Joshua said Command Prompt are rewriting /their/ driver] :) No we are rewriting a complete OSS driver. Sincerely, Joshua D. Drake Command Prompt, Inc. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support Managed Services, Shared and Dedication Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Final decision
Dave, > > Actually, I think the problem may be ODBC. Our ODBC driver > > is not the best > > and is currently being re-built from scratch. > > It is? No-one told the developers... > > Regards, Dave > > [and yes, I know Joshua said Command Prompt are rewriting /their/ > driver] OK. Well, let's put it this way: the v3 and v3.5 drivers will not be based on the current driver, unless you suddenly have a bunch of free time. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Final decision
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Josh Berkus > Sent: 27 April 2005 17:14 > To: Joel Fradkin > Cc: PostgreSQL Perform > Subject: Re: [PERFORM] Final decision > > Actually, I think the problem may be ODBC. Our ODBC driver > is not the best > and is currently being re-built from scratch. It is? No-one told the developers... Regards, Dave [and yes, I know Joshua said Command Prompt are rewriting /their/ driver] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Final decision
Joel, > So I am planning on sticking with postgres fro our production database > (going live this weekend). Glad to have you. > I did not find any resolutions to my issues with Commandprompt.com (we only > worked together 2.5 hours). BTW, your performance troubleshooting will continue to be hampered if you can't share actual queries and data structure. I strongly suggest that you make a confidentiality contract with a support provider so that you can give them detailed (rather than general) problem reports. > Most of my application is working about the same speed as MSSQL server > (unfortunately its twice the speed box, but as many have pointed out it > could be an issue with the 4 proc dell). I spent considerable time with > Dell and could see my drives are delivering 40 meg per sec. FWIW, on a v40z I get 180mb/s. So your disk array on the Dell is less than ideal ... basically, what you have is a more expensive box, not a faster one :-( > Things I still have to make better are my settings in config, I have it set > to no merge joins and no seq scans. Yeah, I'm also finding that our estimator underestimates the real cost of merge joins on some systems.Basically we need a sort-cost variable, because I've found an up to 2x difference in sort cost depending on architecture. > I am going to have to use flattened history files for reporting (I saw huge > difference here the view for audit cube took 10 minutes in explain analyze > and the flattened file took under one second). > I understand both of these practices are not desirable, but I am at a place > where I have to get it live and these are items I could not resolve. Flattening data for reporting is completely reasonable; I do it all the time. > I believe that was totally IIS not postgres, but I am curious as to if > using postgres odbc will put more stress on the IIS side then MSSQL did. Actually, I think the problem may be ODBC. Our ODBC driver is not the best and is currently being re-built from scratch. Is using npgsql, a much higher-performance driver (for .NET) out of the question? According to one company, npgsql performs better than drivers supplied by Microsoft. > I did have a question if any folks are using two servers one for reporting > and one for data entry what system should be the beefier? Depends on the relative # of users.This is often a good approach, because the requirements for DW reporting and OLTP are completely different. Basically: OLTP: Many slow processors, disk array set up for fast writes, moderate shared mem, low work_mem. DW: Few fast processors, disk array set up for fast reads, high shared mem and work mem. If reporting is at least 1/4 of your workload, I'd suggest spinning that off to the 2nd machine before putting one client on that machine.That way you can also use the 2nd machine as a failover back-up. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[PERFORM] Suggestions for a data-warehouse migration routine
I've ported enough of my companies database to Postgres to make warehousing on PG a real possibility. I thought I would toss my data migration architecture ideas out for the list to shoot apart.. 1. Script on production server dumps the production database (MSSQL) to a set of delimited text files. 2. Script on production server moves files via FTP to a Postgres database server. 3. File Alteration Monitor trigger on PG server executes script when last file is transferred. 4. Script on PG server drops the target database (by issuing a "dropdb" command). 5. Script on PG server re-creates target database. (createdb command) 6. Script on PG server re-creates the tables. 7. Script on PG server issues COPY commands to import data. 8. Script on PG server indexes tables. 9. Script on PG server builds de-normalized reporting tables. 10. Script on PG server indexes the reporting tables. 11. Script on PG server creates needed reporting functions. 12. Vacuum analyze? My question revolves around the drop/create for the database. Is their significant downside to this approach? I'm taking this approach because it is simpler from a scripting point of view to simply start from scratch on each warehouse update. If I do not drop the database I would need to delete the contents of each table and drop all indexes prior to the COPY/data import. My assumption is all the table deletes and index drops would be more expensive then just droping/re-creating the entire database. Also, is the Vacuum analyze step needed on a freshly minted database where the indexes have all been newly created? Thanks in advance for all feedback. -- ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
> -Original Message- > From: Josh Berkus [mailto:[EMAIL PROTECTED] > Sent: Wednesday, April 27, 2005 10:25 AM > To: Andrew Dunstan > Cc: Mischa Sandberg; pgsql-perform; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks > suggested? > > [...] > Actually, it's more to characterize how large of a sample > we need. For example, if we sample 0.005 of disk pages, and > get an estimate, and then sample another 0.005 of disk pages > and get an estimate which is not even close to the first > estimate, then we have an idea that this is a table which > defies analysis based on small samples. I buy that. > Wheras if the two estimates are < 1.0 stdev apart, we can > have good confidence that the table is easily estimated. I don't buy that. A negative indication is nothing more than proof by contradiction. A positive indication is mathematical induction over the set, which in this type of context is logically unsound. There is no reason to believe that two small samples with a small difference imply that a table is easily estimated rather than that you got unlucky in your samples. > [...] > Yes, actually. We need 3 different estimation methods: > 1 for tables where we can sample a large % of pages > (say, >= 0.1) > 1 for tables where we sample a small % of pages but are > "easily estimated" > 1 for tables which are not easily estimated by we can't > afford to sample a large % of pages. I don't buy that the first and second need to be different estimation methods. I think you can use the same block sample estimator for both, and simply stop sampling at different points. If you set the default to be a fixed number of blocks, you could get a large % of pages on small tables and a small % of pages on large tables, which is exactly how you define the first two cases. However, I think such a default should also be overridable to a % of the table or a desired accuracy. Of course, I would recommend the distinct sample technique for the third case. > If we're doing sampling-based estimation, I really don't > want people to lose sight of the fact that page-based random > sampling is much less expensive than row-based random > sampling. We should really be focusing on methods which > are page-based. Of course, that savings comes at the expense of having to account for factors like clustering within blocks. So block sampling is more efficient, but can also be less accurate. Nonetheless, I agree that of the sampling estimators, block sampling is the better technique. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Final decision
> > I did have a question if any folks are using two servers one for > reporting and one for data entry what system should be the beefier? Yeah. We started putting up slaves for reporting purposes and application specific areas using Slony replicating partial data sets to various locations -- some for reporting. If your reports have a long runtime and don't require transactional safety for writes (daily summary written or results aren't recorded in the DB at all) this is probably something to consider. I understand that PGAdmin makes Slony fairly painless to setup, but it can be time consuming to get going and Slony can add new complications depending on the data size and what you're doing with it -- but they're working hard to reduce the impact of those complications. > I have a 2proc machine I will be using and I can either put Sears off > by themselves on this machine or split up functionality and have one > for reporting and one for inserts and updates; so not sure which > machine would be best for which spot (reminder the more robust is a > 4proc with 8 gigs and 2 proc is 4 gigs, both dells). > > > > Thank you for any ideas in this arena. > > > > Joel Fradkin > > > > > > > > > -- ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
Mischa, > >Perhaps I can save you some time (yes, I have a degree in Math). If I > >understand correctly, you're trying extrapolate from the correlation > >between a tiny sample and a larger sample. Introducing the tiny sample > >into any decision can only produce a less accurate result than just > >taking the larger sample on its own; GIGO. Whether they are consistent > >with one another has no relationship to whether the larger sample > >correlates with the whole population. You can think of the tiny sample > >like "anecdotal" evidence for wonderdrugs. Actually, it's more to characterize how large of a sample we need. For example, if we sample 0.005 of disk pages, and get an estimate, and then sample another 0.005 of disk pages and get an estimate which is not even close to the first estimate, then we have an idea that this is a table which defies analysis based on small samples. Wheras if the two estimates are < 1.0 stdev apart, we can have good confidence that the table is easily estimated. Note that this doesn't require progressively larger samples; any two samples would work. > I'm with Tom though in being very wary of solutions that require even > one-off whole table scans. Maybe we need an additional per-table > statistics setting which could specify the sample size, either as an > absolute number or as a percentage of the table. It certainly seems that > where D/N ~ 0.3, the estimates on very large tables at least are way way > out. Oh, I think there are several other cases where estimates are way out. Basically the estimation method we have doesn't work for samples smaller than 0.10. > Or maybe we need to support more than one estimation method. Yes, actually. We need 3 different estimation methods: 1 for tables where we can sample a large % of pages (say, >= 0.1) 1 for tables where we sample a small % of pages but are "easily estimated" 1 for tables which are not easily estimated by we can't afford to sample a large % of pages. If we're doing sampling-based estimation, I really don't want people to lose sight of the fact that page-based random sampling is much less expensive than row-based random sampling. We should really be focusing on methods which are page-based. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Final decision
Sorry I am using Redhat AS4 and postgres 8.0.2 Joel You didnt tell us what OS are you using, windows? If you want good performance you must install unix on that machine, ---
Re: [PERFORM] Final decision
-Original Message-From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]On Behalf Of Joel FradkinSent: Wednesday, April 27, 2005 9:02 AMTo: PostgreSQL PerformSubject: [PERFORM] Final decision I spent a great deal of time over the past week looking seriously at Postgres and MYSQL. Objectively I am not seeing that much of an improvement in speed with MYSQL, and we have a huge investment in postgrs. So I am planning on sticking with postgres fro our production database (going live this weekend). Many people have offered a great deal of help and I appreciate all that time and energy. I did not find any resolutions to my issues with Commandprompt.com (we only worked together 2.5 hours). Most of my application is working about the same speed as MSSQL server (unfortunately its twice the speed box, but as many have pointed out it could be an issue with the 4 proc dell). I spent considerable time with Dell and could see my drives are delivering 40 meg per sec. Things I still have to make better are my settings in config, I have it set to no merge joins and no seq scans. I am going to have to use flattened history files for reporting (I saw huge difference here the view for audit cube took 10 minutes in explain analyze and the flattened file took under one second). I understand both of these practices are not desirable, but I am at a place where I have to get it live and these are items I could not resolve. I may try some more time with Commanpromt.com, or seek other professional help. In stress testing I found Postgres was holding up very well (but my IIS servers could not handle much of a load to really push the server). I have a few desktops acting as IIS servers at the moment and if I pushed past 50 consecutive users it pretty much blew the server up. On inserts that number was like 7 consecutive users and updates was also like 7 users. I believe that was totally IIS not postgres, but I am curious as to if using postgres odbc will put more stress on the IIS side then MSSQL did. I did have a question if any folks are using two servers one for reporting and one for data entry what system should be the beefier? I have a 2proc machine I will be using and I can either put Sears off by themselves on this machine or split up functionality and have one for reporting and one for inserts and updates; so not sure which machine would be best for which spot (reminder the more robust is a 4proc with 8 gigs and 2 proc is 4 gigs, both dells). Thank you for any ideas in this arena. Joel Fradkin You didnt tell us what OS are you using, windows? If you want good performance you must install unix on that machine, ---
[PERFORM] Final decision
I spent a great deal of time over the past week looking seriously at Postgres and MYSQL. Objectively I am not seeing that much of an improvement in speed with MYSQL, and we have a huge investment in postgrs. So I am planning on sticking with postgres fro our production database (going live this weekend). Many people have offered a great deal of help and I appreciate all that time and energy. I did not find any resolutions to my issues with Commandprompt.com (we only worked together 2.5 hours). Most of my application is working about the same speed as MSSQL server (unfortunately its twice the speed box, but as many have pointed out it could be an issue with the 4 proc dell). I spent considerable time with Dell and could see my drives are delivering 40 meg per sec. Things I still have to make better are my settings in config, I have it set to no merge joins and no seq scans. I am going to have to use flattened history files for reporting (I saw huge difference here the view for audit cube took 10 minutes in explain analyze and the flattened file took under one second). I understand both of these practices are not desirable, but I am at a place where I have to get it live and these are items I could not resolve. I may try some more time with Commanpromt.com, or seek other professional help. In stress testing I found Postgres was holding up very well (but my IIS servers could not handle much of a load to really push the server). I have a few desktops acting as IIS servers at the moment and if I pushed past 50 consecutive users it pretty much blew the server up. On inserts that number was like 7 consecutive users and updates was also like 7 users. I believe that was totally IIS not postgres, but I am curious as to if using postgres odbc will put more stress on the IIS side then MSSQL did. I did have a question if any folks are using two servers one for reporting and one for data entry what system should be the beefier? I have a 2proc machine I will be using and I can either put Sears off by themselves on this machine or split up functionality and have one for reporting and one for inserts and updates; so not sure which machine would be best for which spot (reminder the more robust is a 4proc with 8 gigs and 2 proc is 4 gigs, both dells). Thank you for any ideas in this arena. Joel Fradkin
Re: [PERFORM] What needs to be done for real Partitioning?
Hi, On Sun, Mar 20, 2005 at 06:01:49PM -0500, Tom Lane wrote: > Global indexes would seriously reduce the performance of both vacuum and > cluster for a single partition, and if you want seq scans you don't need > an index for that at all. So the above doesn't strike me as a strong > argument for global indexes ... I'd like to describe a usecase where a global index is usefull. We have a datawarehouse with invoices for a rolling window of a few years. Each invoice has several positions so a uk is (invoice,position). Dur to the fact that most of the queries are only on a few months or some quarters of a year, our pk starts with the time-attribute (followed by the dimension ids) which is the partition key (range). During the nightly update, we receive each updated invoice so we have to update that special (global unique) row which is resolved very fast by using the uk. So you can see, that there is a usefull case for providing a global index while using partitining and local indexes as well. Regards, Yann ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote: > 2. In a single scan, it is possible to estimate n_distinct by using > a very simple algorithm: > > "Distinct sampling for highly-accurate answers to distinct value > queries and event reports" by Gibbons, VLDB 2001. > > http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf That looks like the one... ...though it looks like some more complex changes to the current algorithm to use it, and we want the other stats as well... > 3. In fact, Gibbon's basic idea has been extended to "sliding windows" > (this extension is useful in streaming systems like Aurora / Stream): > > "Distributed streams algorithms for sliding windows" > by Gibbons and Tirthapura, SPAA 2002. > > http://home.eng.iastate.edu/~snt/research/tocs.pdf > ...and this offers the possibility of calculating statistics at load time, as part of the COPY command Best Regards, Simon Riggs ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])