[PERFORM] Read only transactions - Commit or Rollback
Hello,

We have a database containing PostGIS MAP data; it is accessed mainly via JDBC. There are multiple simultaneous read-only connections taken from the JBoss connection pool, and there usually are no active writers. We use connection.setReadOnly(true).

Now my question is what is best performance-wise, if it makes any difference at all:

- Having autocommit on or off? (I presume off.)
- Using commit or rollback?
- Committing / rolling back occasionally (e.g. when returning the connection to the pool) or not at all (until the pool closes the connection)?

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
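For concreteness, here is a minimal JDBC sketch of the pattern being asked about; the DataSource wiring and the query are invented for illustration, and the rollback-on-return shown is just one of the options Markus lists, not a recommendation from the thread:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class ReadOnlyQuery {
    // "pool" stands in for the JBoss-managed DataSource.
    static int countRoads(DataSource pool) throws Exception {
        Connection conn = pool.getConnection();
        try {
            conn.setReadOnly(true);    // we never write on these connections
            conn.setAutoCommit(false); // option: autocommit off
            Statement st = conn.createStatement();
            ResultSet rs = st.executeQuery("SELECT count(*) FROM roads"); // hypothetical table
            rs.next();
            int count = rs.getInt(1);
            rs.close();
            st.close();
            return count;
        } finally {
            if (!conn.getAutoCommit()) {
                conn.rollback();       // option: end the read-only transaction here
            }
            conn.close();              // hands the connection back to the pool
        }
    }
}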
Re: [PERFORM] Read only transactions - Commit or Rollback
afaik, this should be completely negligible.

Starting a transaction implies write access. If there is none, you do not need to think about transactions, because there are none. Postgres needs to schedule the writing transactions with the reading ones anyway.

But I am not that much of a performance professional anyway ;-)

regards,
Marcus
Re: [PERFORM] Read only transactions - Commit or Rollback
Nörder-Tuitje wrote:
| afaik, this should be completely negligible.
|
| starting a transaction implies write access. if there is none, you do
| not need to think about transactions, because there are none.

Hello, Marcus, Nörder, list.

What about isolation? For several dependent calculations, you don't get a consistent MVCC snapshot across them with autocommit turned on, right?

Cheers,
--
~Grega Bremec
~gregab at p0f dot net
Re: [PERFORM] Read only transactions - Commit or Rollback
Mmmm, good question.

MVCC blocks reading processes when data is modified. Using autocommit implies that each modification statement is an atomic operation. On a massive read-only table, where no data is altered, MVCC shouldn't have any effect (but this is only an assumption).

Based on http://en.wikipedia.org/wiki/Mvcc, using row-level locks for write access should leave most of the table available to read-only sessions, but this is an assumption only, too.

Maybe the community knows a little more ;-)

regards,
marcus
Re: [PERFORM] Read only transactions - Commit or Rollback
Hi, Marcus,

Nörder-Tuitje wrote:
> afaik, this should be completely negligible. starting a transaction
> implies write access. if there is none, you do not need to think about
> transactions, because there are none.

Hmm, I always thought that the transaction will be opened at the first statement, because there _could_ be a parallel writing transaction started later.

> postgres needs to schedule the writing transactions with the reading
> ones, anyway.

As I said, there usually are no writing transactions on the same database.

Btw, there's another setting that might make a difference: isolation level SERIALIZABLE or READ COMMITTED?

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
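The isolation level Markus mentions can be chosen per connection from JDBC, using only the standard java.sql API; a minimal sketch (connection URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mapdb", "user", "password");
        conn.setAutoCommit(false);

        // READ COMMITTED is the PostgreSQL default; SERIALIZABLE keeps one
        // snapshot for the whole transaction, so several dependent queries
        // see consistent data.
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);

        // ... run the dependent read-only queries here ...

        conn.rollback();
        conn.close();
    }
}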
Re: [PERFORM] Read only transactions - Commit or Rollback
Markus Schaber wrote:
> Having autocommit on or off? (I presume off)

If you are using large ResultSets, it is interesting to know that Statement.setFetchSize() does not do anything as long as you have autocommit on. So you might want to always disable autocommit and set a reasonable fetch size with large results, or otherwise run into serious memory problems in Java/JDBC.
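Spelled out, the streaming pattern looks roughly like this with the PostgreSQL JDBC driver (connection details, table name and fetch size are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mapdb", "user", "password"); // placeholders
        conn.setAutoCommit(false);  // required: with autocommit on, the driver
                                    // buffers the whole ResultSet in memory
        Statement st = conn.createStatement();
        st.setFetchSize(1000);      // stream roughly 1000 rows at a time
        ResultSet rs = st.executeQuery("SELECT * FROM big_map_table");
        while (rs.next()) {
            // process one row; only one batch is held in memory at a time
        }
        rs.close();
        st.close();
        conn.rollback();            // read-only work, per the main thread
        conn.close();
    }
}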
Re: [PERFORM] Read only transactions - Commit or Rollback
Markus Schaber writes:
> As I said, there usually are no writing transactions on the same
> database.
> Btw, there's another setting that might make a difference: isolation
> level SERIALIZABLE or READ COMMITTED?

Well, if non-repeatable or phantom reads would pose a problem because of those occasional writes, you wouldn't be considering autocommit for performance reasons either, would you?

regards,
Andreas
Re: [PERFORM] Read only transactions - Commit or Rollback
On 12/20/05, Nörder-Tuitje, Marcus [EMAIL PROTECTED] wrote:
> MVCC blocks reading processes when data is modified.

That is incorrect. The main difference between 2PL and MVCC is that readers are never blocked under MVCC.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html
Re: [PERFORM] filesystem performance with lots of files
David Lang wrote:
> ext3 has an option to make searching directories faster (htree), but
> enabling it kills performance when you create files. And this doesn't
> help with large files.

The ReiserFS white paper talks about the data structure he uses to store directories (some kind of tree), and he says it's quick to both read and write. Don't forget, if you find ls slow, that could just be ls, since it's ls, not the fs, that sorts the files into alphabetical order.

> how long would it take to do a tar-ftp-untar cycle with no smarts

Note that you can do the tarring, zipping, copying and untarring concurrently. I can't remember the exact netcat command line options, but it goes something like this:

Box1: tar czvf - myfiles/* | nc myserver 12345
Box2: nc -l -p 12345 | tar xzvf -

Not only do you gain from doing it all concurrently, but not writing a temp file means that disk seeks are reduced too, if you have a one-spindle machine.

Also consider just copying the files onto a network mount. It may not be as fast as the above, but it will be faster than rsync, which has high CPU usage and thus is not a good choice on a LAN.

Hmm, sorry this is not directly postgres anymore...

David
Re: [PERFORM] High context switches occurring
Oleg Bartunov oleg@sai.msu.su writes:
> I see a very low performance and high context switches on our dual
> itanium2 slackware box (Linux ptah 2.6.14 #1 SMP) with 8Gb of RAM,
> running 8.1_STABLE. Any tips here ?
>
> [EMAIL PROTECTED]:~/cvs/8.1/pgsql/contrib/pgbench$ time pgbench -s 10 -c 10 -t 3000 pgbench
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 10

You can't expect any different with more clients than scaling factor :-(. Note that -s is only effective when supplied with -i; it's basically ignored during an actual test run.

regards, tom lane
Re: [PERFORM] PostgreSQL and Ultrasparc T1
Jignesh K. Shah wrote:
> I guess it depends on what you term as your metric for measurement. If
> it is just one query execution time .. it may not be the best on
> UltraSPARC T1. But if you have more than 8 complex queries running
> simultaneously, UltraSPARC T1 can do well comparatively, provided the
> application can scale along with it.

I just want to clarify one issue here. It's my understanding that the 8-core, 4-hardware-thread (known as strands) system is seen as a 32-cpu system by Solaris. So, one could have up to 32 postgresql processes running in parallel on the current systems (assuming the application can scale).

-- Alan
Re: [PERFORM] PostgreSQL and Ultrasparc T1
On Tue, 20 Dec 2005, Alan Stange wrote:
> It's my understanding that the 8-core, 4-hardware-thread (known as
> strands) system is seen as a 32-cpu system by Solaris. So, one could
> have up to 32 postgresql processes running in parallel on the current
> systems (assuming the application can scale).

note that like hyperthreading, the strands aren't full processors; their efficiency depends on how much the other threads sharing the core stall waiting for external things.

David Lang
Re: [PERFORM] PostgreSQL and Ultrasparc T1
David Lang wrote:
> note that like hyperthreading, the strands aren't full processors;
> their efficiency depends on how much the other threads sharing the
> core stall waiting for external things.

Exactly.
Re: [PERFORM] Read only transactions - Commit or Rollback
Hi, Tom,

Tom Lane wrote:
> > Some time ago, I had some tests with large bulk insertions, and it
> > turned out that SERIALIZABLE seemed to be 30% faster, which
> > surprised us.
>
> That surprises me too --- can you provide details on the test case so
> other people can reproduce it? AFAIR the only performance difference
> between SERIALIZABLE and READ COMMITTED is the frequency with which
> transaction status snapshots are taken; your report suggests you were
> spending 30% of the time in GetSnapshotData, which is a lot higher
> than I've ever seen in a profile.

It was in my previous job two years ago, so I don't have access to the exact code, and my memory is foggy. It was PostGIS 0.8 and PostgreSQL 7.4.

AFAIR, it was inserting into a table with about 6 columns and some indices, some columns having database-provided values (now() and a SERIAL column), where the other columns (a PostGIS Point, a long, a foreign key into another table) were set via the application. We tried different insertion methods (INSERT, prepared statements, a pgjdbc patch to allow COPY support), different batch sizes and different numbers of parallel connections to get the highest overall insert speed. However, the project never went productive the way it was designed initially.

As you write about transaction snapshots: It may be that the PostgreSQL config was not optimized well enough, and the hard disk was rather slow.

> As to the original question, a transaction that hasn't modified the
> database does not bother to write either a commit or abort record to
> pg_xlog. I think you'd be very hard pressed to measure any speed
> difference between saying COMMIT and saying ROLLBACK after a read-only
> transaction. It'd be worth your while to let transactions run longer
> to minimize their startup/shutdown overhead, but there's a point of
> diminishing returns --- you don't want client code leaving
> transactions open for hours, because of the negative side-effects of
> holding locks that long (eg, VACUUM can't reclaim dead rows).

Okay, so I'll stick with my current behaviour (autocommit off and ROLLBACK after each bunch of work).

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Re: [PERFORM] PostgreSQL and Ultrasparc T1
Jignesh,

Juan says the following below:
> I figured the number of cores on the T1000/2000 processors would be
> utilized by the forked copies of the postgresql server. From the
> comments I have seen so far it does not look like this is the case.

I think this needs to be refuted. Doesn't Solaris switch processes as well as threads (LWPs, whatever) equally well amongst cores? I realize the process context switch is more expensive than the thread switch, but Solaris will utilize all cores as processes or threads become ready to run, correct?

BTW, it's great to see folks with your email address on the list. I feel it points to a brighter future for all involved.

Thanks,
Rick

Jignesh K. Shah wrote:
> I guess it depends on what you term as your metric for measurement. If
> it is just one query execution time .. it may not be the best on
> UltraSPARC T1. But if you have more than 8 complex queries running
> simultaneously, UltraSPARC T1 can do well comparatively, provided the
> application can scale along with it. The best way to approach is to
> figure out your peak workload, find an accurate way to measure the
> true metric and then design a benchmark for it and run it on both
> servers.

Juan Casero wrote:
> Ok. That is what I wanted to know. Right now this database is a
> PostgreSQL 7.4.8 system. I am using it in a sort of DSS role. I have
> weekly summaries of the sales for our division going back three years.
> I have a PHP based webapp that I wrote to give the managers access to
> this data. The webapp lets them make selections for reports and then
> it submits a parameterized query to the database for execution. The
> returned data rows are displayed and formatted in their web browser.
> My largest sales table is about 13 million rows; along with all the
> indexes it takes up about 20 gigabytes. I need to scale this
> application up to nearly 100 gigabytes to handle daily sales
> summaries. Once we start looking at daily sales figures our database
> size could grow ten to twenty times. I use postgresql because it gives
> me the kind of enterprise database features I need to program the
> complex logic for the queries. I also need the transaction isolation
> facilities it provides so I can optimize the queries in plpgsql
> without worrying about multiple users' temp tables colliding with each
> other. Additionally, I hope to rewrite the front end application in
> JSP so maybe I could use the multithreaded features of Java to exploit
> a multicore multi-cpu system. There are almost no writes to the
> database tables. The bulk of the application is just executing
> parameterized queries and returning huge amounts of data. I know
> bizgres is supposed to be better at this but I want to stay away from
> anything that is beta. I cannot afford for this thing to go wrong. My
> reasoning for looking at the T1000/2000 was simply the large number of
> cores. I know postgresql uses a super server that forks copies of
> itself to handle incoming requests on port 5432. But I figured the
> number of cores on the T1000/2000 processors would be utilized by the
> forked copies of the postgresql server. From the comments I have seen
> so far it does not look like this is the case. We had originally sized
> up a dual processor dual core AMD opteron system from HP for this but
> I thought I could get more bang for the buck on a T1000/2000. It now
> seems I may have been wrong. I am stronger in Linux than Solaris so I
> am not upset, I am just trying to find the best hardware for the
> anticipated needs of this application.
>
> Thanks,
> Juan
Re: [PERFORM] PostgreSQL and Ultrasparc T1
[EMAIL PROTECTED] wrote:
> Doesn't Solaris switch processes as well as threads (LWPs, whatever)
> equally well amongst cores?

But yes, all LWPs (processes and threads) are switched across virtual CPUs. There is intelligence built into Solaris to understand which strands are executing on which cores, and it will balance out the cores too: if there are only 8 threads running, they will essentially run on separate cores rather than 2 cores with 8 threads.

The biggest limitation is application scaling. pgbench shows that more processes bottlenecking on the same files will probably not perform better unless you tune your storage/file system. Those are the issues which we typically try to solve with community partners (vendors, open source), since that gives the biggest benefits.

A good test to verify this in such a multi-process environment: do you see greater than 60% average CPU utilization on your dual/quad config Xeons/Itaniums? Then a Sun Fire T2000 will help you a lot. However, if you are stuck below 50% (for dual) or 25% (for quad), which means you are pretty much stuck at 1-CPU performance and/or probably have more IO-related contention, then it won't help you with these systems. I hope you get the idea of when a workload will perform better on a Sun Fire T2000 without burning hands.

I will try to test some more with PostgreSQL on these systems to kind of highlight what can work or what will not work. Is pgbench the workload that you prefer? (It already has issues with pg_xlog, so my guess is it probably won't scale much.) If you have other workload information, let me know.

Thanks.

Regards,
Jignesh
[PERFORM] What's the best hardware for PostgreSQL 8.1?
Hi!

What do you suggest for the following problem? We have complex databases with some 100 million rows (2-3 million new records per month). Our current servers are struggling to keep up these days, so we have to buy new hardware for the database server. Some weeks ago we started to work with PostgreSQL 8.1, which solved the problem for some months.

There are some massive, heavy queries, which are too slow (5-10 or more minutes). Parallel processing is infrequent (rarely more than 4-5 parallel queries). So we need high performance in query execution with medium parallel processability.

What's your opinion, what products could help us? What is the best, or just a better, choice? The budget line is about 30 000$ - 40 000$.

Regards,
Atesz
Re: [PERFORM] Overriding the optimizer
On Sat, Dec 17, 2005 at 07:31:40AM -0500, Jaime Casanova wrote:
> > > Yeah it would - an implementation I have seen that I like is where
> > > the developer can supply the *entire* execution plan with a query.
> > > This is complex enough to make casual use unlikely :-), but
> > > provides the ability to try out other plans, and also fix that
> > > vital query that must run today.
> >
> > Being able to specify an exact plan would also provide for query
> > plan stability; something that is critically important in certain
> > applications. If you have to meet a specific response time
> > requirement for a query, you can't afford to have the optimizer
> > suddenly decide that some other plan might be faster when in fact
> > it's much slower.
>
> Plan stability doesn't mean time response stability... The plan that
> today is almost instantaneous tomorrow can take hours...

Sure, if your underlying data changes that much, but that's often not going to happen in production systems (especially OLTP where this is most important). Of course if you have a proposal for ensuring that a query always finishes in X amount of time, rather than always using the same plan, I'd love to hear it. ;)

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PERFORM] make bulk deletes faster?
On Mon, Dec 19, 2005 at 11:10:50AM -0800, James Klo wrote:
> Yes, I've considered partitioning as a long term change. I was
> thinking about this for other reasons - mainly performance. If I go
> the partitioning route, would I need to even perform archival?

No. The idea is that you have your table split up into date ranges (perhaps each week gets its own table). IE: table_2005w01, table_2005w02, etc. You can do this with either inheritance or individual tables and a UNION ALL view. In your case, inheritance is probably the better way to go.

Now, if you have everything broken down by weeks and you typically only need to access 7 days worth of data, then generally you will only be reading from two tables, so those two tables should stay in memory, and indexes on them will be smaller. If desired, you can also play tricks on the older tables such as vacuum full or cluster to further reduce space usage and improve performance.

> The larger problem that I need to solve is really twofold:
> 1. Need to keep reads on timeblocks that are from the current day
>    through the following seven days very fast, especially current day
>    reads.
> 2. Need to be able to maintain the timeblocks for reporting purposes,
>    for at least a year (potentially more). This could probably be
>    better handled by performing aggregate analysis, but this isn't on
>    my current radar.

I've written an RRD-like implementation in SQL that might interest you; it's at http://rrs.decibel.org (though the svn web access appears to be down right now...)

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
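To make the weekly-inheritance layout concrete, here is a minimal sketch, written in Java/JDBC like the other examples in this digest; every table, column and date range is invented, and it only illustrates the structure Jim describes (the application inserts into the current week's child, while a SELECT on the parent sees all children):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class WeeklyPartitions {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/appdb", "user", "password"); // placeholders
        Statement st = conn.createStatement();

        // Parent table holds no rows itself; children carry the data.
        st.executeUpdate("CREATE TABLE timeblocks ("
                + " id bigint NOT NULL,"
                + " starts timestamptz NOT NULL)");

        // One child per week; the CHECK constraint documents the range.
        st.executeUpdate("CREATE TABLE timeblocks_2005w51 ("
                + " CHECK (starts >= '2005-12-19' AND starts < '2005-12-26')"
                + " ) INHERITS (timeblocks)");
        st.executeUpdate("CREATE INDEX timeblocks_2005w51_starts_idx"
                + " ON timeblocks_2005w51 (starts)");

        // Inserts target the current week's child table directly;
        // "SELECT ... FROM timeblocks" scans the parent plus all children.
        st.executeUpdate("INSERT INTO timeblocks_2005w51"
                + " VALUES (1, '2005-12-20 10:00:00')");

        st.close();
        conn.close();
    }
}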
Re: [PERFORM] PostgreSQL and Ultrasparc T1
On Tue, Dec 20, 2005 at 12:20:55PM -0500, Jignesh K. Shah wrote:
> Is pgbench the workload that you prefer? (It already has issues with
> pg_xlog, so my guess is it probably won't scale much.) If you have
> other workload information, let me know.

From what the user described, dbt3 would probably be the best benchmark to use. Note that their queries are basically read-only, which is absolutely not what pgbench does.

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PERFORM] PostgreSQL and Ultrasparc T1
On Sun, Dec 18, 2005 at 11:35:15AM -0500, Juan Casero wrote:
> Can anyone tell me how well PostgreSQL 8.x performs on the new Sun
> Ultrasparc T1 processor and architecture on Solaris 10? I have a
> custom built retail sales reporting application that I developed using
> PostgreSQL 7.4.8 and PHP on a Fedora ...

People have seen some pretty big gains going from 7.4 to 8.1. I recently migrated http://stats.distributed.net and the daily processing (basically OLAP) times were cut in half.

As someone else mentioned, IO is probably going to be your biggest consideration, unless you have a lot of queries running at once. Probably your best bang for the buck will be from an Opteron-based server with a good number of internal drives.

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PERFORM] Any way to optimize GROUP BY queries?
On Mon, Dec 19, 2005 at 03:47:35PM -0500, Greg Stark wrote:
> Increase your work_mem (or sort_mem in older postgres versions), you
> can do this for the server as a whole or just for this one session and
> set it back after this one query. You can increase it up until it
> starts causing swapping at which point it would be counter productive.

Just remember that work_mem is per-operation, so it's easy to push the box into swapping if the workload increases. You didn't say how much memory you have, but I'd be careful if work_mem * max_connections gets very much larger than your total memory.

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PERFORM] separate drives for WAL or pgdata files
On Mon, Dec 19, 2005 at 07:20:56PM -0800, David Lang wrote:
> for persistent storage you can replicate from your ram-based system to
> a disk-based system, and as long as your replication messages hit disk
> quickly you can allow the disk-based version to lag behind in its
> updates during your peak periods (as long as it is able to catch up
> with the writes overnight), and as the disk-based version won't have
> to do the seeks for the reads it will be considerably faster than if
> it was doing all the work (especially if you have good, large
> battery-backed disk caches to go with those drives to consolidate the
> writes)

Huh? Unless you're doing a hell of a lot of writing, just run a normal instance and make sure you have enough bandwidth to the drives with pg_xlog on them. Make sure those drives are using a battery-backed RAID controller too. You'll also need to tune things to make sure that checkpoints never have much (if any) work to do when they occur, but you should be able to set that up with proper bgwriter tuning.

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PERFORM] filesystem performance with lots of files
On Tue, Dec 20, 2005 at 01:26:00PM +0000, David Roussel wrote:
> Note that you can do the tarring, zipping, copying and untarring
> concurrently. I can't remember the exact netcat command line options,
> but it goes something like this:
>
> Box1: tar czvf - myfiles/* | nc myserver 12345
> Box2: nc -l -p 12345 | tar xzvf -

You can also use ssh... something like:

tar -cf - blah/* | ssh machine tar -xf -

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PERFORM] What's the best hardware for PostgreSQL 8.1?
On Tue, Dec 20, 2005 at 07:27:15PM +0100, Antal Attila wrote:
> We have complex databases with some 100 million rows (2-3 million new

How much space does that equate to?

> records per month). Our current servers are struggling to keep up
> these days, so we have to buy new hardware for the database server.
> Some weeks ago we started to work with PostgreSQL 8.1, which solved
> the problem for some months. There are some massive, heavy queries,
> which are too slow (5-10 or more minutes). Parallel processing is
> infrequent (rarely more than 4-5 parallel queries). So we need high
> performance in query execution with medium parallel processability.
> What's your opinion, what products could help us? What is the best, or
> just a better, choice? The budget line is about 30 000$ - 40 000$.

Have you optimized the queries?

Items that generally have the biggest impact on performance, in decreasing order:

1. System architecture
2. Database design
3. (for long-running/problem queries) Query plans
4. Disk I/O
5. Memory
6. CPU

So, I'd make sure that the queries have been optimized (and that includes tuning postgresql.conf) before assuming you need more hardware. Based on what you've told us (very little parallelization), your biggest priority is probably either disk IO or memory (or both). Without knowing the size of your database/working set it's difficult to provide more specific advice.

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PERFORM] What's the best hardware for PostgreSQL 8.1?
On Dec 20, 2005, at 1:27 PM, Antal Attila wrote:
> The budget line is about 30 000$ - 40 000$.

Like Jim said, without more specifics it is hard to give more specific recommendations, but I'm architecting something like this for my current app which needs ~100GB disk space. I made room to grow in my configuration:

- dual Opteron 2.2GHz
- 4GB RAM
- LSI MegaRAID 320-2X
- 14-disk SCSI U320 enclosure with 15k RPM drives, 7 connected to each channel on the RAID:
  - 1 pair in RAID1 mirror for OS + pg_xlog
  - rest in RAID10, with each mirrored pair coming from opposite SCSI channels, for data

I run FreeBSD, but whatever you prefer should be sufficient if it is not Windows.

I don't know how prices are in Hungary, but around here something like this with 36GB drives comes to around $11,000 or $12,000. The place I concentrate on is the disk I/O bandwidth, which is why I prefer Opteron over Intel Xeon.
Re: [PERFORM] Any way to optimize GROUP BY queries?
Jim C. Nasby [EMAIL PROTECTED] writes:
> Just remember that work_mem is per-operation, so it's easy to push the
> box into swapping if the workload increases. You didn't say how much
> memory you have, but I'd be careful if work_mem * max_connections gets
> very much larger than your total memory.

It's considered good practice to have a relatively small default work_mem setting (in postgresql.conf), and then let individual sessions push up the value locally with SET work_mem if they are going to execute queries that need it. This works well as long as you only have one or a few such heavy sessions at a time.

regards, tom lane
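A minimal JDBC sketch of that practice (the query and value are invented; note that in the 8.x era work_mem takes an integer number of kilobytes, and pre-8.0 versions call the setting sort_mem):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class BigSortQuery {
    // conn is assumed to be an already-open connection.
    static void runHeavyQuery(Connection conn) throws Exception {
        Statement st = conn.createStatement();
        st.execute("SET work_mem = 65536"); // 64MB, for this session only
        ResultSet rs = st.executeQuery(
                "SELECT customer_id, sum(amount)"
                + " FROM sales GROUP BY customer_id"); // hypothetical query
        while (rs.next()) {
            // consume the aggregated rows
        }
        rs.close();
        st.execute("RESET work_mem");       // back to the postgresql.conf default
        st.close();
    }
}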
Re: [PERFORM] What's the best hardware for PostgreSQL 8.1?
Can you elaborate on the reasons the Opteron is better than the Xeon when it comes to disk IO?

I have a PostgreSQL 7.4.8 box running a DSS. One of our tables is about 13 million rows. I had a number of queries against this table that used inner joins on 5 or 6 tables including the 13-million-row one. The performance was atrocious. The database itself is about 20 gigs, but I want it to scale to 100 gigs.

I tuned postgresql as best I could and gave the server huge amounts of memory for caching as well. I also tweaked the query optimizer's cost parameters for a sequential scan vs. an index scan, and used the query explain mechanism to get some idea of what the optimizer was doing and where I should index the tables. When I added the sixth table to the inner join, the query performance took a nose dive. Admittedly this system is a single PIII 1000 MHz with 1.2 gigs of RAM and no RAID. I do have two Ultra 160 SCSI drives, with the database tables' mount point on a partition of one physical drive and the pg_xlog mount point on a partition of the second drive.

I have been trying to get my employer to spring for new hardware ($8k to $10k), which I had planned to be a dual-processor, dual-core Opteron system from HP. Until they agree to spend the money, I have resorted to writing plpgsql functions to handle the queries. Inside plpgsql I can break the query apart into separate stages, each of which runs much faster. I can use temporary tables to store intermediate results without worrying about temp table collisions with different users, thanks to transaction isolation.

I am convinced we need new hardware to scale this application, *but* I agree with the consensus voiced here that it is more important to optimize the query first before going out to buy new hardware. I was able to do things with PostgreSQL on this cheap server that I could never imagine doing with SQL Server or even Oracle on such a low-end box. My OS is Fedora Core 3, but I wonder if anyone has tested and benchmarked PostgreSQL on the new Sun x64 servers running Solaris 10 x86.

Thanks,
Juan
Re: [PERFORM] What's the best hardware for PostgreSQL 8.1?
On Tue, 20 Dec 2005, Juan Casero wrote:
> Can you elaborate on the reasons the Opteron is better than the Xeon
> when it comes to disk IO?

the opteron is cheaper so you have more money to spend on disks :-)

also when you go into multi-cpu systems the front-side-bus design of the Xeons can easily become your system bottleneck, so that you can't take advantage of all the CPUs because they stall waiting for memory accesses. Opteron systems have a memory bus per socket, so the more CPUs you have the more memory bandwidth you have.

> The database itself is about 20 gigs, but I want it to scale to 100
> gigs.

how large is the working set? in your tests you ran into swapping on your 1.2G system; buying a dual opteron with 16 gigs of ram will allow you to work with much larger sets of data, and you can go beyond that if needed.

David Lang
[PERFORM] efficient query with jdbc
Hi,

I have a java.util.List of values (1) which I wanted to use for a query in the where clause of a simple select statement. Iterating over the list and using a prepared statement is quite slow. Is there a more efficient way to execute such a query?

Thanks for any help.
Johannes

...
List ids = new ArrayList();
// List is filled with 1 values
...
List uuids = new ArrayList();
PreparedStatement pstat = db.prepareStatement(
        "SELECT UUID FROM MDM.KEYWORDS_INFO WHERE KEYWORDS_ID = ?");
for (Iterator iter = ids.iterator(); iter.hasNext();) {
    String id = (String) iter.next();
    pstat.setString(1, id);
    rs = pstat.executeQuery();
    if (rs.next()) {
        uuids.add(rs.getString(1));
    }
    rs.close();
}
...
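One common way to cut down the round trips here — not an answer from the thread itself, just an illustrative sketch using the same table and variable names as the message above — is to send the whole list in a single IN (...) query, with one placeholder per value:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class KeywordLookup {
    // db is an open Connection; ids holds the KEYWORDS_ID strings.
    static List lookupUuids(Connection db, List ids) throws Exception {
        List uuids = new ArrayList();
        // Build "... WHERE KEYWORDS_ID IN (?,?,...,?)" with one placeholder
        // per value, so one round trip replaces one query per id.
        StringBuffer sql = new StringBuffer(
                "SELECT UUID FROM MDM.KEYWORDS_INFO WHERE KEYWORDS_ID IN (");
        for (int i = 0; i < ids.size(); i++) {
            sql.append(i == 0 ? "?" : ",?");
        }
        sql.append(")");
        PreparedStatement pstat = db.prepareStatement(sql.toString());
        for (int i = 0; i < ids.size(); i++) {
            pstat.setString(i + 1, (String) ids.get(i));
        }
        ResultSet rs = pstat.executeQuery();
        while (rs.next()) {
            uuids.add(rs.getString(1));
        }
        rs.close();
        pstat.close();
        return uuids;
    }
}

Very long lists may need to be split into chunks of a few hundred placeholders per statement.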
[PERFORM] Speed of different procedural language
I have a few small functions which I need to write. They will hopefully be quick-running, but will happen on almost every delete, insert and update on my database (for audit purposes).

I know I should be writing these in C, but that's a bit beyond me. I was going to try PL/Python or PL/Perl or even PL/Ruby. Has anyone any idea which language is fastest, or is the data access going to swamp the overhead of small functions?

Thanks,
Ben
Re: [PERFORM] Simple Join
On Wed, 2005-12-14 at 17:47 -0500, Tom Lane wrote:
> That plan looks perfectly fine to me. You could try forcing some other
> choices by fooling with the planner enable switches (eg set
> enable_seqscan = off) but I doubt you'll find much improvement. There
> are too many rows being pulled from ordered_products to make an index
> nestloop a good idea.

Well, I'm no expert either, but if there was an index on ordered_products (paid, suspended_sub, id), it should be mergejoinable with the index on to_ship.ordered_product_id, right? Given the conditions on paid and suspended_sub.

If you (Kevin) try adding such an index, ideally it would get used, given that you're only pulling out a small fraction of the rows in to_ship. If it doesn't get used, then I had a similar issue with 8.0.3 where an index that was mergejoinable (only because of the restrictions in the where clause) wasn't getting picked up.

Mitch

Kevin Brown wrote:

CREATE TABLE to_ship (
  id int8 NOT NULL DEFAULT nextval(('to_ship_seq'::text)::regclass),
  ordered_product_id int8 NOT NULL,
  bounced int4 NOT NULL DEFAULT 0,
  operator_id varchar(20) NOT NULL,
  timestamp timestamptz NOT NULL
    DEFAULT ('now'::text)::timestamp(6) with time zone,
  CONSTRAINT to_ship_pkey PRIMARY KEY (id),
  CONSTRAINT to_ship_ordered_product_id_fkey
    FOREIGN KEY (ordered_product_id) REFERENCES ordered_products (id)
    ON UPDATE RESTRICT ON DELETE RESTRICT
) WITHOUT OIDS;

CREATE TABLE ordered_products (
  id int8 NOT NULL DEFAULT nextval(('ordered_products_seq'::text)::regclass),
  order_id int8 NOT NULL,
  product_id int8 NOT NULL,
  recipient_address_id int8 NOT NULL,
  hide bool NOT NULL DEFAULT false,
  renewal bool NOT NULL DEFAULT false,
  timestamp timestamptz NOT NULL
    DEFAULT ('now'::text)::timestamp(6) with time zone,
  operator_id varchar(20) NOT NULL,
  suspended_sub bool NOT NULL DEFAULT false,
  quantity int4 NOT NULL DEFAULT 1,
  price_paid numeric NOT NULL,
  tax_paid numeric NOT NULL DEFAULT 0,
  shipping_paid numeric NOT NULL DEFAULT 0,
  remaining_issue_obligation int4 NOT NULL DEFAULT 0,
  parent_product_id int8,
  delivery_method_id int8 NOT NULL,
  paid bool NOT NULL DEFAULT false,
  CONSTRAINT ordered_products_pkey PRIMARY KEY (id),
  CONSTRAINT ordered_products_order_id_fkey
    FOREIGN KEY (order_id) REFERENCES orders (id)
    ON UPDATE RESTRICT ON DELETE RESTRICT,
  CONSTRAINT ordered_products_parent_product_id_fkey
    FOREIGN KEY (parent_product_id) REFERENCES ordered_products (id)
    ON UPDATE RESTRICT ON DELETE RESTRICT,
  CONSTRAINT ordered_products_recipient_address_id_fkey
    FOREIGN KEY (recipient_address_id) REFERENCES addresses (id)
    ON UPDATE RESTRICT ON DELETE RESTRICT
) WITHOUT OIDS;

=== The two indexes that should matter ===

CREATE INDEX ordered_product_id_index
  ON to_ship USING btree (ordered_product_id);

CREATE INDEX paid_index
  ON ordered_products USING btree (paid);

ordered_products.id is a primary key, so it should have an implicit index.
[PERFORM] SAN/NAS options
Hello all,

It seems that I'm starting to outgrow our current Postgres setup. We've been running a handful of machines as standalone db servers. This is all in a colocation environment, so everything is stuffed into 1U Supermicro boxes. Our standard build looks like this:

- Supermicro 1U w/SCA backplane and 4 bays
- 2x 2.8 GHz Xeons
- Adaptec 2015S zero channel RAID card
- 2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped)
- 2GB RAM
- FreeBSD 4.11
- PGSQL data from 5-10GB per box

Recently I started studying what we were running up against in our nightly runs that do a ton of updates/inserts to prep things for the tasks the db does during the business day (light mix of selects/inserts/updates). While we have plenty of disk bandwidth (according to bonnie), we are really dying on IOPS. I'm guessing this is a mix of a rather anemic RAID controller (ever notice how Adaptec doesn't publish any real performance specs on RAID cards?) and having only two or four spindles (effectively 1 or 2 on writes).

So that's where we are...

I'm new to the whole SAN thing, but did recently pick up a few used NetApp shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, also used) to toy with. I started wondering if I could put something together to both get our storage on one set of boxes and allow me to get data striped across more drives. Our budget is not huge and we are not averse to getting used gear where appropriate.

What do you folks recommend? I'm just starting to look at what's out there for SANs and NAS, and from what I've seen, our options are:

NetApp Filers - the pluses with these are that if we use NFS, we don't have to worry about either large filesystem support in FreeBSD (2TB practical limit) or limitations on growing partitions, as the NetApp just deals with that. I also understand these make backups a bit simpler. I have a great, trusted, spare-stocking source for these.

Apple X-Serve RAID - well, it's pretty cheap. Honestly, that's all I know about it - they don't talk about IOPS numbers, and I have no idea what lurks in that box as a RAID controller.

SAN box w/integrated RAID - it seems like this might not be a good choice since the RAID hardware in the box may be where I hit any limits. I also imagine I'm probably overpaying for some OEM RAID controller integrated into the box. No idea where to look for used gear.

SAN box, JBOD - this seems like it might be affordable as well. A few big shelves full of drives, a SAN switch to plug all the shelves and hosts into, and a FC RAID card in each host. No idea where to look for used gear here either.

You'll note that I'm being somewhat driven by my OS of choice, FreeBSD. Unlike Solaris or other commercial offerings, there is no nice volume management available. While I'd love to keep managing a dozen or so FreeBSD boxes, I could be persuaded to go to Solaris x86 if the volume management really shines and Postgres performs well on it.

Lastly, one thing that I'm not yet finding in trying to educate myself on SANs is a good overview of what's come out in the past few years that's more affordable than the old big-iron stuff. For example I saw some brief info in this list's archives about the Dell/EMC offerings. Anything else in that vein to look at?

I hope this isn't too far off topic for this list. Postgres is the main application that I'm looking to accommodate. Anything else I can do with whatever solution we find is just gravy...

Thanks!

Charles
Re: [PERFORM] [postgis-users] Is my query planner failing me, or vice versa?
Tom Lane wrote:
> Gregory S. Williamson [EMAIL PROTECTED] writes:
> > Forgive the cross-posting, but I found myself wondering if there
> > might not be some future way of telling the planner that a given
> > table (column?) has a high likelihood of being TOASTed.
>
> What would you expect the planner to do with the information, exactly?
> We could certainly cause ANALYZE to record some estimate of this, but
> I'm not too clear on what happens after that...
>
> regards, tom lane

Gregory S. Williamson wrote:
> Well, what does random_page_cost do internally?
>
> I don't think I'd expect postgres to be able to *do* anything in
> particular, any more than I would expect it to do something about slow
> disk I/O or having limited cache. But it might be useful to EXPLAIN
> ANALYZE in estimating costs of retrieving such data. Admittedly, this
> is not as clear as wanting a sequential scan in preference to indexed
> reads when there are either very few rows or a huge number, but it
> strikes me as useful to me the DBA to have this factoid thrust in
> front of me when considering why a given query is slower than I might
> like. Perhaps an added time based on this factor and the
> random_page_cost value, since lots of TOAST data and a high access
> time would indicate to my (ignorant!) mind that retrieval would be
> slower, especially over large data sets.
>
> Forgive my ignorance ... obviously I am but a humble user. <grin>

As I understood from the original discussions with Markus/Tom, the problem was that the optimizer didn't consider the value of the VacAttrStats stawidth value when calculating the cost of a sequential scan. I don't know if this is still the case though - Tom will probably have a rough idea already, whereas I would need to spend some time sifting through the source. However, I do know that the PostGIS statistics collector does store the average detoasted geometry size in stawidth during ANALYZE, so the value is there if it can be used.

Kind regards,

Mark.

WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
http://www.webbased.co.uk
http://www.infomapper.com
http://www.swtc.co.uk
Re: [PERFORM] SAN/NAS options
On Wed, 14 Dec 2005, Charles Sprickman wrote:
> [big snip]

The list server seems to be regurgitating old stuff, and in doing so it reminded me to thank everyone for their input. I was kind of waiting to see if anyone who was very pro-NAS/SAN was going to pipe up, but it looks like most people are content with per-host storage. You've given me a lot to go on...

Now I'm going to have to do some research as to real-world RAID controller performance. It's vexing (to say the least) that most vendors don't supply any raw throughput or TPS stats on this stuff...

Anyhow, thanks again. You'll probably see me back here in the coming months as I try to shake some mysql info out of my brain as our pgsql DBA gets me up to speed on pgsql and what specifically he's doing to stress things.

Charles
Re: [PERFORM] Simple Join
Mark Kirkwood wrote:
> Kevin Brown wrote:
> > I'll just start by warning that I'm new-ish to postgresql. I'm
> > running 8.1 installed from source on a Debian Sarge server. I have a
> > simple query that I believe I've placed the indexes correctly for,
> > and I still end up with a seq scan. It makes sense, kinda, but it
> > should be able to use the index to gather the right values. I do
> > have a production set of data inserted into the tables, so this is
> > running realistically:
> >
> > dli=# explain analyze SELECT ordered_products.product_id
> > dli-# FROM to_ship, ordered_products
> > dli-# WHERE to_ship.ordered_product_id = ordered_products.id AND
> > dli-# ordered_products.paid = TRUE AND
> > dli-# ordered_products.suspended_sub = FALSE;
>
> You scan 60 rows from to_ship to get about 25000 - so some way to cut
> this down would help.
>
> Try out an explicit INNER JOIN which includes the filter info for paid
> and suspended_sub in the join condition (you may need indexes on each
> of id, paid and suspended_sub, so that the 8.1 optimizer can use a
> bitmap scan):
>
> SELECT ordered_products.product_id
> FROM to_ship INNER JOIN ordered_products
>   ON (to_ship.ordered_product_id = ordered_products.id
>       AND ordered_products.paid = TRUE
>       AND ordered_products.suspended_sub = FALSE);

It has been a quiet day today, so I took another look at this. If the selectivity of the clauses:

  paid = TRUE
  suspended_sub = FALSE

is fairly high, then rewriting as a subquery might help:

SELECT o.product_id
FROM ordered_products o
WHERE o.paid = TRUE
  AND o.suspended_sub = FALSE
  AND EXISTS (
    SELECT 1
    FROM to_ship s
    WHERE s.ordered_product_id = o.id
  );

However it depends on you not needing anything from to_ship in the SELECT list...

Cheers

Mark