Re: [HACKERS] [PERFORM] Slow count(*) again...
On 02.02.11 20:32, Robert Haas wrote: Yeah. Any kind of bulk load into an empty table can be a problem, even if it's not temporary. When you load a bunch of data and then immediately plan a query against it, autoanalyze hasn't had a chance to do its thing yet, so sometimes you get a lousy plan. Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean that any insert/update/delete statement that changes more than x% of a table (and no fewer than y records) should run an ANALYZE right after it finishes. Defaults like x=50, y=1 seem quite good to me. Best regards, Vitalii Tymchyshyn -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
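Vitalii's threshold could be sketched as a simple predicate. This is only an illustration of the proposal above; the x/y parameters come from the message, while the function name and the empty-table special case are my assumptions:

```python
def should_auto_analyze(rows_changed, table_rows, x=50, y=1):
    """Return True if a statement's changes cross the proposed
    AutoAnalyze threshold: more than x% of the table changed,
    and at least y rows changed."""
    if rows_changed < y:
        return False
    if table_rows == 0:
        # A bulk load into an empty table always crosses the threshold.
        return True
    return rows_changed * 100 > table_rows * x
```

With the proposed defaults, loading 600 rows into a 1000-row table would trigger an analyze, while loading 400 would not.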
Re: [PERFORM] Which RAID Controllers to pick/avoid?
--- On Thu, 3/2/11, Greg Smith g...@2ndquadrant.com wrote: The 5405 and 5805 models do have a known problem where they overheat if you don't have enough cooling in the server box, with the 5805 seeming to be the bigger source of such issues. See the reviews at http://www.newegg.com/Product/Product.aspx?Item=N82E16816103099 for example. Scott Marlowe was griping recently about a similar issue in some of the LSI models, too. I suspect it's a problem impacting several of the larger RAID cards that use the big Intel IOP processors for their RAID computations, given that's the part with the heatsink on it. Quick summary: avoid the Adaptec 3405 and 4800. The rest are decent cards. Just make sure you monitor the temperatures in your case (and the card too if arcconf lets you; I haven't checked for that yet) if you end up with a 5405/5805. I can attest to the 5805 and 5805Z cards running a little hot; the ones we're running tend to run in the high 60s and low 70s Celsius with fairly good airflow over them. I've been running some 5805s for 3 years now, and 5805Zs for a year, and they've been really good, stable, fast cards. I monitor everything on them (including temperature) with Nagios and a simple script that uses the arcconf utility. Glyn
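A monitoring script of the kind Glyn describes might look like the following sketch. The 'Temperature : NN C' line format and the alert thresholds here are assumptions — check what your arcconf version actually prints before relying on this:

```python
import re

# Nagios-style exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def parse_controller_temp(arcconf_output):
    """Extract the controller temperature (Celsius) from arcconf
    GETCONFIG output.  The line format matched here is an assumption;
    verify it against your own controller's output."""
    m = re.search(r"Temperature\s*:\s*(\d+)\s*C", arcconf_output)
    return int(m.group(1)) if m else None

def classify_temp(temp_c, warn=70, crit=80):
    """Map a temperature reading to a Nagios state.  The 70/80 degree
    thresholds are illustrative, not a recommendation."""
    if temp_c is None:
        return UNKNOWN
    if temp_c >= crit:
        return CRITICAL
    if temp_c >= warn:
        return WARNING
    return OK
```

A wrapper would run `arcconf GETCONFIG 1 AD` (controller 1), feed its stdout to parse_controller_temp(), and exit with the classify_temp() code so Nagios can alert on it.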
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote: On 02.02.11 20:32, Robert Haas wrote: Yeah. Any kind of bulk load into an empty table can be a problem, even if it's not temporary. When you load a bunch of data and then immediately plan a query against it, autoanalyze hasn't had a chance to do its thing yet, so sometimes you get a lousy plan. Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean that any insert/update/delete statement that changes more than x% of a table (and no fewer than y records) should run an ANALYZE right after it finishes. Defaults like x=50, y=1 seem quite good to me. If I am understanding things correctly, a full ANALYZE goes over all the data in the table to figure out patterns. If this is the case, wouldn't it make sense, in the situation where you are loading an entire table from scratch, to run the ANALYZE as you are processing the data? If you don't want to slow down the main thread that's inserting the data, you could copy the data to a second thread and do the analysis while it's still in RAM rather than having to read it off of disk afterwards. This doesn't make sense for updates to existing databases, but the use case of loading a bunch of data and then querying it right away isn't _that_ uncommon. David Lang
[PERFORM] Get master-detail relationship metadata
Hi All, I'm working on a client program that iterates over master-detail relationships in a loop chain. Pseudo code: for row_1 in table_1: table_2 = get_details(row_1, table2) for row_2 in table_2: row_3 = get_details(row_2, table3) etc. process_data(row_1, row_2, row_3, ...) My task is to write the get_details iterator efficiently. The obvious way to do it is to query the details in every get_details() call, but that is not efficient. We have relationships where one master has only a few details. For 1 million master rows, that would result in the execution of millions of SQL SELECT commands, degrading performance by orders of magnitude. My idea was that the iterator should pre-fetch and cache data for many master records at once. get_details() would then use the cached rows, reducing the number of SQL SELECT statements needed. I actually wrote the iterator, and it works fine in some cases. For example: producers = get_rows(producer) for producer in producers: products = get_details(producer, product) for product in products: prices = get_details(product, prices) for price in prices: process_product_price(producer, product, price) This works fine if one producer has no more than 1000 products and one product has no more than 10 prices; I can easily keep 10,000 records in memory. The actual code executes about 15 SQL queries while iterating over 1 million rows. Compared to the original, obvious method, that is roughly a 15x performance improvement. But sometimes it just doesn't work. If a producer has 1 million products, and one product has 100 prices, then it won't work, because I cannot keep 100 million prices in memory. My program should somehow figure out how many rows it will get for one master, and choose between the cached and non-cached methods. So here is the question: is there a way to get this information from PostgreSQL itself? I know that the query plan contains this information, but I'm not sure how to extract it.
Should I run an EXPLAIN command of some kind, and parse the result as a string? For example: EXPLAIN select * from product where producer_id=1008; QUERY PLAN -- Seq Scan on product (cost=0.00..1018914.74 rows=4727498 width=1400) Filter: (producer_id = 1008) (2 rows) Then I could extract rows=4727498 to get an idea of how many detail rows I'll get for the master. Is there any better way to do it? And how reliable is it? Thanks, Laszlo
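Scraping the estimate out of textual EXPLAIN output can be done with a one-line regex; the sketch below is hypothetical client-side code, using the plan line from the message as its example:

```python
import re

def planned_rows(explain_text):
    """Extract the planner's row estimate from the first line of
    textual EXPLAIN output, e.g.
    'Seq Scan on product (cost=0.00..1018914.74 rows=4727498 width=1400)'.
    Returns None if no estimate is found."""
    m = re.search(r"rows=(\d+)", explain_text)
    return int(m.group(1)) if m else None
```

On PostgreSQL 9.0 and later, EXPLAIN (FORMAT JSON) produces machine-readable output with a "Plan Rows" field, which is more robust than regex scraping. Either way, the number is only the planner's estimate, so it is exactly as reliable as the table's statistics are current.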
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 03, 2011 at 02:11:58AM -0800, da...@lang.hm wrote: On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote: On 02.02.11 20:32, Robert Haas wrote: Yeah. Any kind of bulk load into an empty table can be a problem, even if it's not temporary. When you load a bunch of data and then immediately plan a query against it, autoanalyze hasn't had a chance to do its thing yet, so sometimes you get a lousy plan. Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean that any insert/update/delete statement that changes more than x% of a table (and no fewer than y records) should run an ANALYZE right after it finishes. Defaults like x=50, y=1 seem quite good to me. If I am understanding things correctly, a full ANALYZE goes over all the data in the table to figure out patterns. If this is the case, wouldn't it make sense, in the situation where you are loading an entire table from scratch, to run the ANALYZE as you are processing the data? If you don't want to slow down the main thread that's inserting the data, you could copy the data to a second thread and do the analysis while it's still in RAM rather than having to read it off of disk afterwards. This doesn't make sense for updates to existing databases, but the use case of loading a bunch of data and then querying it right away isn't _that_ uncommon. David Lang +1 for in-flight ANALYZE. This would be great for bulk loads of real tables as well as temp tables. Cheers, Ken
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 7:41 AM, Kenneth Marshall k...@rice.edu wrote: On Thu, Feb 03, 2011 at 02:11:58AM -0800, da...@lang.hm wrote: On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote: On 02.02.11 20:32, Robert Haas wrote: Yeah. Any kind of bulk load into an empty table can be a problem, even if it's not temporary. When you load a bunch of data and then immediately plan a query against it, autoanalyze hasn't had a chance to do its thing yet, so sometimes you get a lousy plan. Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean that any insert/update/delete statement that changes more than x% of a table (and no fewer than y records) should run an ANALYZE right after it finishes. Defaults like x=50, y=1 seem quite good to me. If I am understanding things correctly, a full ANALYZE goes over all the data in the table to figure out patterns. If this is the case, wouldn't it make sense, in the situation where you are loading an entire table from scratch, to run the ANALYZE as you are processing the data? If you don't want to slow down the main thread that's inserting the data, you could copy the data to a second thread and do the analysis while it's still in RAM rather than having to read it off of disk afterwards. This doesn't make sense for updates to existing databases, but the use case of loading a bunch of data and then querying it right away isn't _that_ uncommon. David Lang +1 for in-flight ANALYZE. This would be great for bulk loads of real tables as well as temp tables. Yes, please, that would be really nice. -- Jon
Re: [PERFORM] Which RAID Controllers to pick/avoid?
Dan Birken wrote: - Is the supercap + flash memory considered superior to the BBU in practice? Is that type of system well tested? The main risk is that it's a pretty new approach. The standard BBU setup has been used for a long time now; this whole flash+supercap thing has only showed up in the last couple of years. Theoretically it's better; the #1 weakness of the old battery setup was only surviving an outage of a few days, and storing to flash doesn't have that issue. It's just got the usual risks of something new. - Is the Linux support of the LSI and Adaptec cards comparable? Seems to be. The latest versions of Adaptec's arcconf utility even provide about the same quality of command-line tools as LSI's megactl, after being behind in that area for a while. The only quirk, and I can't say whether this was the manufacturer of the box or not because they installed the base OS, is that the 5405 setup I saw didn't turn off the write caches on the individual drives of the system. That's the standard safe practice, and the default for the LSI cards. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Which RAID Controllers to pick/avoid?
On 03.02.2011 00:15, Dan Birken wrote: I'm setting up a dedicated Linux postgres box with 2x300GB 15k SAS drives in a RAID 1, though if future load dictates we would like to be able to upgrade to RAID 10. The hosting provider offers the following options for a RAID controller (all are the same price): Adaptec at least has good tools for managing the controller, and performance in our RAID-1 (DB) and RAID-5 setups (files) is very good. I don't think you can go wrong with the Adaptec controllers. Can't say much regarding LSI, but avoid cheap HP controllers.
[PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Each night we run over 100,000 saved searches against PostgreSQL 9.0.x. These are all complex SELECTs using cube functions to perform a geo-spatial search to help people find adoptable pets at shelters. All of our machines in development and production have at least 2 cores in them, and I'm wondering about the best way to maximally engage all the processors. Right now we simply run the searches in serial. I realize PostgreSQL may be taking some advantage of the multiple cores in this arrangement, but I'm seeking advice about the possibility and methods for running the searches in parallel. One naive approach I considered was to use parallel cron scripts. One would run the odd searches and the other would run the even searches. This would be easy to implement, but perhaps there is a better way. To those who have covered this area already, what's the best way to put multiple cores to use when running repeated SELECTs with PostgreSQL? Thanks! Mark
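The odd/even cron split could be sketched as a round-robin partition of the saved-search ids (illustrative only; the id-based scheme is an assumption about how the searches are keyed):

```python
def partition_searches(search_ids, n_workers=2):
    """Split saved-search ids round-robin so each worker (e.g. one
    cron script per core) gets an even share.  With n_workers=2 this
    is exactly the odd/even split described above."""
    buckets = [[] for _ in range(n_workers)]
    for sid in search_ids:
        buckets[sid % n_workers].append(sid)
    return buckets
```

Each cron script would then process one bucket, each over its own database connection.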
Re: [PERFORM] Get master-detail relationship metadata
On 2/3/2011 5:40 AM, Laszlo Nagy wrote: Hi All, I'm working on a client program that iterates over master-detail relationships in a loop chain. Pseudo code: for row_1 in table_1: table_2 = get_details(row_1, table2) for row_2 in table_2: row_3 = get_details(row_2, table3) etc. process_data(row_1, row_2, row_3, ...) My task is to write the get_details iterator efficiently. The obvious way to do it is to query the details in every get_details() call, but that is not efficient. We have relationships where one master has only a few details. For 1 million master rows, that would result in the execution of millions of SQL SELECT commands, degrading performance by orders of magnitude. My idea was that the iterator should pre-fetch and cache data for many master records at once. get_details() would then use the cached rows, reducing the number of SQL SELECT statements needed. I actually wrote the iterator, and it works fine in some cases. For example: producers = get_rows(producer) for producer in producers: products = get_details(producer, product) for product in products: prices = get_details(product, prices) for price in prices: process_product_price(producer, product, price) This works fine if one producer has no more than 1000 products and one product has no more than 10 prices; I can easily keep 10,000 records in memory. The actual code executes about 15 SQL queries while iterating over 1 million rows. Compared to the original, obvious method, that is roughly a 15x performance improvement. But sometimes it just doesn't work. If a producer has 1 million products, and one product has 100 prices, then it won't work, because I cannot keep 100 million prices in memory. My program should somehow figure out how many rows it will get for one master, and choose between the cached and non-cached methods. So here is the question: is there a way to get this information from PostgreSQL itself? I know that the query plan contains this information, but I'm not sure how to extract it.
Should I run an EXPLAIN command of some kind, and parse the result as a string? For example: EXPLAIN select * from product where producer_id=1008; QUERY PLAN -- Seq Scan on product (cost=0.00..1018914.74 rows=4727498 width=1400) Filter: (producer_id = 1008) (2 rows) Then I could extract rows=4727498 to get an idea of how many detail rows I'll get for the master. Is there any better way to do it? And how reliable is it? Thanks, Laszlo One way would be to join the master to the detail, and write your code expecting duplicates: q = get_rows(select * from product inner join price ... order by productid, priceid) lastprodid = '' for x in q: prodid = x.prodid if prodid != lastprodid: # a new product has started; prepare to move to the next product lastprodid = prodid ... etc. Is there any better way to do it? And how reliable is it? It makes the SQL really easy, but the code complex... so pick your poison. -Andy
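In Python, Andy's join-and-watch-for-boundaries loop can be expressed with itertools.groupby over the joined, ordered result set (a sketch; the (product_id, price_id) tuples stand in for rows fetched from the join):

```python
from itertools import groupby
from operator import itemgetter

def iterate_master_detail(joined_rows):
    """joined_rows: rows from
    'SELECT ... FROM product JOIN price ... ORDER BY productid, priceid',
    here modeled as (product_id, price_id) tuples.  Because the input
    is ordered by the master key, groupby yields one
    (product_id, [price_ids]) pair per master row, so the caller never
    holds more than one master's details in memory."""
    for product_id, group in groupby(joined_rows, key=itemgetter(0)):
        yield product_id, [price_id for _, price_id in group]
```

Streaming the join this way sidesteps the original problem: memory usage is bounded by the largest single master's detail set, not by the total row count.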
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 4:54 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote: On 02.02.11 20:32, Robert Haas wrote: Yeah. Any kind of bulk load into an empty table can be a problem, even if it's not temporary. When you load a bunch of data and then immediately plan a query against it, autoanalyze hasn't had a chance to do its thing yet, so sometimes you get a lousy plan. Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean that any insert/update/delete statement that changes more than x% of a table (and no fewer than y records) should run an ANALYZE right after it finishes. Defaults like x=50, y=1 seem quite good to me. That would actually be a pessimization for many real world cases. Consider: COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY SELECT -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 5:11 AM, da...@lang.hm wrote: If I am understanding things correctly, a full ANALYZE goes over all the data in the table to figure out patterns. No. It goes over a small, fixed-size sample whose size depends on default_statistics_target but NOT on the table size. It's really important to come up with a solution that isn't susceptible to running ANALYZE over and over again, in many cases unnecessarily. If this is the case, wouldn't it make sense, in the situation where you are loading an entire table from scratch, to run the ANALYZE as you are processing the data? If you don't want to slow down the main thread that's inserting the data, you could copy the data to a second thread and do the analysis while it's still in RAM rather than having to read it off of disk afterwards. Well, that's basically what autoanalyze is going to do anyway, if the table is small enough to fit in shared_buffers. And it's actually usually BAD if it starts running while you're doing a large bulk load, because it competes for I/O bandwidth and the buffer cache and slows things down. Especially when you're bulk loading for a long time and it tries to run over and over. I'd really like to suppress all those asynchronous ANALYZE operations and instead do ONE synchronous one at the end, when we try to use the data. Of course, the devil is in the nontrivial details. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
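For a sense of scale behind Robert's point: ANALYZE targets roughly 300 sampled rows per unit of statistics target (the constant comes from the row-sampling literature used in analyze.c), independent of table size. A hedged sketch of the arithmetic — treat the exact factor as an implementation detail that may change:

```python
def analyze_sample_size(statistics_target=100):
    """Approximate number of rows ANALYZE samples for a column:
    about 300 * statistics_target, regardless of how big the
    table is.  With the default target of 100, that's ~30,000
    rows whether the table holds 50,000 rows or 500 million."""
    return 300 * statistics_target
```

This is why running ANALYZE once at the end of a load is cheap relative to the load itself, and why repeating it mid-load buys little.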
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 2/3/2011 9:08 AM, Mark Stosberg wrote: Each night we run over 100,000 saved searches against PostgreSQL 9.0.x. These are all complex SELECTs using cube functions to perform a geo-spatial search to help people find adoptable pets at shelters. All of our machines in development and production have at least 2 cores in them, and I'm wondering about the best way to maximally engage all the processors. Right now we simply run the searches in serial. I realize PostgreSQL may be taking some advantage of the multiple cores in this arrangement, but I'm seeking advice about the possibility and methods for running the searches in parallel. One naive approach I considered was to use parallel cron scripts. One would run the odd searches and the other would run the even searches. This would be easy to implement, but perhaps there is a better way. To those who have covered this area already, what's the best way to put multiple cores to use when running repeated SELECTs with PostgreSQL? Thanks! Mark 1) I'm assuming this is all server-side processing. 2) One database connection will use one core. To use multiple cores you need multiple database connections. 3) If your jobs are I/O bound, then running multiple jobs may hurt performance. Your naive approach is the best. Just spawn off two jobs (or three, or whatever). I think it's also the only method. (If there is another method, I don't know what it would be.) -Andy
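Andy's point 2 — one connection per core — is the key constraint for a client-side driver. A minimal sketch (run_search is a hypothetical stand-in for code that opens its own database connection and executes one saved search):

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(search_ids, run_search, n_workers=2):
    """Execute saved searches across n_workers workers.  In a real
    driver each worker would open its own database connection,
    since a single PostgreSQL backend uses only one core.  Results
    come back in the order of search_ids."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_search, search_ids))
```

Setting n_workers to the number of cores (or slightly fewer, if the jobs are I/O bound, per point 3) is the tuning knob here.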
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 03.02.11 17:31, Robert Haas wrote: Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean that any insert/update/delete statement that changes more than x% of a table (and no fewer than y records) should run an ANALYZE right after it finishes. Defaults like x=50, y=1 seem quite good to me. That would actually be a pessimization for many real world cases. Consider: COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY SELECT If all the copies are roughly the same size and large, this will make it: COPY ANALYZE COPY ANALYZE COPY COPY ANALYZE COPY COPY COPY COPY ANALYZE COPY COPY COPY COPY COPY SELECT instead of COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY COPY ANALYZE (manual, if one is clever enough) SELECT So, yes, this will add 3 more ANALYZE runs, but: 1) ANALYZE is pretty cheap compared to large data loading; I'd say this would add a few percent of overhead. And NOT doing a manual ANALYZE before the SELECT can raise the SELECT's cost by orders of magnitude. 2) How often in the real world is a single table loaded in many COPY statements? (I don't say it's not often; I really don't know.) At least for a restore it is not the case, is it? 3) The default thresholds are open to discussion. You could make x=90 or x=200 (the latter would make it run only for massive load/insert operations). You could even make it disabled by default for people to test, or enabled by default for temp tables only (with two sets of thresholds). 4) As with most other settings, this threshold could be changed on up to a per-query basis. P.S. I would also like to have an index analyze as part of any CREATE INDEX process. Best regards, Vitalii Tymchyshyn
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySQL/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price. I know this is far from trivial, and my C skills are such that I can offer no help. These machines have been the obvious machine-in-waiting for at least 5 years, and those applications which benefit from parallelism (servers of all kinds, in particular) will filter out the winners and losers based on exploiting this parallelism. Much as it pains me to say it, the Microsoft approach to software (write to the next-generation processor and force users to upgrade) will be the winning strategy for database engines. There's just way too much to gain. -- Robert Original message Date: Thu, 03 Feb 2011 09:44:03 -0600 From: pgsql-performance-ow...@postgresql.org (on behalf of Andy Colson a...@squeakycode.net) Subject: Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements To: Mark Stosberg m...@summersault.com Cc: pgsql-performance@postgresql.org On 2/3/2011 9:08 AM, Mark Stosberg wrote: Each night we run over 100,000 saved searches against PostgreSQL 9.0.x. These are all complex SELECTs using cube functions to perform a geo-spatial search to help people find adoptable pets at shelters. All of our machines in development and production have at least 2 cores in them, and I'm wondering about the best way to maximally engage all the processors. Right now we simply run the searches in serial. I realize PostgreSQL may be taking some advantage of the multiple cores in this arrangement, but I'm seeking advice about the possibility and methods for running the searches in parallel.
One naive approach I considered was to use parallel cron scripts. One would run the odd searches and the other would run the even searches. This would be easy to implement, but perhaps there is a better way. To those who have covered this area already, what's the best way to put multiple cores to use when running repeated SELECTs with PostgreSQL? Thanks! Mark 1) I'm assuming this is all server-side processing. 2) One database connection will use one core. To use multiple cores you need multiple database connections. 3) If your jobs are I/O bound, then running multiple jobs may hurt performance. Your naive approach is the best. Just spawn off two jobs (or three, or whatever). I think it's also the only method. (If there is another method, I don't know what it would be.) -Andy
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 02/03/2011 10:54 AM, Oleg Bartunov wrote: Mark, you could try the gevel module to get the structure of the GiST index and see whether items are distributed more or less homogeneously (look at the different levels). You can visualize the index like http://www.sai.msu.su/~megera/wiki/Rtree_Index Also, if your searches are neighbourhood searches, then you could try KNN, available in the 9.1 development version. Oleg, Those are interesting details to consider. I read more about KNN here: http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/ Will I be able to use it to improve the performance of finding nearby zipcodes? It sounds like KNN has great potential for performance improvements! Mark
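For reference, a KNN-style query (PostgreSQL 9.1+) puts the distance operator directly in the ORDER BY, so a GiST index can return the k nearest rows without scanning everything. The table and column names in this sketch are hypothetical:

```python
# The '<->' operator orders by distance; with a GiST index on the
# point column, 9.1's KNN support satisfies the ORDER BY ... LIMIT
# from the index.  'zipcodes' and 'location' are made-up names.
NEARBY_ZIPCODES_SQL = """
    SELECT zipcode
    FROM zipcodes
    ORDER BY location <-> point(%(lon)s, %(lat)s)
    LIMIT %(k)s
"""
```

A client would pass lon/lat/k as query parameters through its driver; without the index, the same query still works but degrades to sorting every row by distance.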
Re: [HACKERS] [PERFORM] Slow count(*) again...
Greg Smith wrote: Mladen Gogala wrote: The techies at big companies are the guys who will or will not make it happen. And these guys are not beginners. Appeasing them may actually go a long way. The PostgreSQL community isn't real big on appeasing people if it's at the expense of robustness or correctness, and this issue falls into that category. With all due respect, I don't see how the issue of hints falls into this category. As I explained, the mechanisms are already there; they're just not elegant enough. The verb "appease" doesn't convey the meaning that I had in mind quite correctly. The phrase "target population" would have described what I wanted to say much better. There are downsides to that, but good things too. Chasing after whatever made people happy regardless of its impact on the server code itself has in my mind contributed to why Oracle is so bloated and MySQL so buggy, to pick two examples from my favorite horse to whip. Well, those two databases are also used much more widely than Postgres, which means that they're doing something better than Postgres. Hints are not even that complicated to program. The SQL parser should compile the list of hints into a table, and the optimizer should check whether any of the applicable access methods exist in the table. If it does, use it; if not, ignore it. This looks to me like a philosophical issue, not a programming issue. Basically, the current Postgres philosophy can be described like this: if the database were a gas stove, it would occasionally catch fire. However, bundling a fire extinguisher with the stove is somehow seen as bad. When the stove catches fire, users are expected to report the issue and wait for a better stove to be developed. It is a very rough analogy, but a rather accurate one, too. -- Mladen Gogala Sr.
Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mladen Gogala wrote: Greg Smith wrote: Mladen Gogala wrote: The techies at big companies are the guys who will or will not make it happen. And these guys are not beginners. Appeasing them may actually go a long way. The PostgreSQL community isn't real big on appeasing people if it's at the expense of robustness or correctness, and this issue falls into that category. With all due respect, I don't see how the issue of hints falls into this category. As I explained, the mechanisms are already there; they're just not elegant enough. The verb "appease" doesn't convey the meaning that I had in mind quite correctly. The phrase "target population" would have described what I wanted to say much better. The settings are currently there to better model the real world (random_page_cost), or for testing (enable_seqscan). They are not there to force certain plans. They can be used for that, but that is not their purpose, and they would not have been added if that were their purpose. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mladen Gogala wrote: Hints are not even that complicated to program. The SQL parser should compile the list of hints into a table, and the optimizer should check whether any of the applicable access methods exist in the table. If it does, use it; if not, ignore it. This looks to me like a philosophical issue, not a programming issue. Basically, the current Postgres philosophy can be described like this: if the database were a gas stove, it would occasionally catch fire. However, bundling a fire extinguisher with the stove is somehow seen as bad. When the stove catches fire, users are expected to report the issue and wait for a better stove to be developed. It is a very rough analogy, but a rather accurate one, too. That might be true. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 02/03/2011 10:38 AM, Mladen Gogala wrote: With all due respect, I don't see how the issue of hints falls into this category. You have a few good arguments, and if you hadn't said this, it wouldn't have been so obvious that there was a fundamental philosophical disconnect. I asked this same question almost ten years ago, and the answer Tom gave me was more than sufficient. It all boils down to the database. Hints, whether they're well-intentioned or not, effectively cover up bugs in the optimizer, planner, or some other approach the database is using to build its execution plan. Your analogy is that PG is a gas stove, so bundle a fire extinguisher. Well, the devs believe that the stove should be upgraded to electric, or possibly even induction, to remove the need for the extinguisher. If they left hints in, it would just be one more thing to deprecate as the original need for the hint was removed. If you really need hints that badly, EnterpriseDB cobbled the Oracle syntax into the planner, and it seems to work all right. That doesn't mean it's right, just that it works. EnterpriseDB will now have to support those query hints forever, even if the planner gets so advanced they're effectively useless. Well, those two databases are also used much more widely than Postgres, which means that they're doing something better than Postgres. Please don't make arguments like this. "Better" is such a subjective evaluation that it means nothing. Are Honda Accords better than Lamborghini Gallardos because more people buy Accords? The MySQL/PostgreSQL flame war is a long and sometimes bitter one, and bringing it up to try and persuade the devs to see reason is just going to backfire. Hints are not even that complicated to program. Then write a contrib module. It's not part of the core DB, and it probably never will be. This is a *very* old argument. There's literally nothing you can say, no argument you can bring, that hasn't been heard a million times in the last decade.
-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mladen Gogala mladen.gog...@vmsinfo.com writes: "Hints are not even that complicated to program." With all due respect, you don't know what you're talking about. regards, tom lane
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 11:56 AM, Bruce Momjian br...@momjian.us wrote: The settings are currently there to better model the real world (random_page_cost), or for testing (enable_seqscan). They are not there to force certain plans. They can be used for that, but that is not their purpose and they would not have been added if that was their purpose. Sure. But Mladen's point is that this is rather narrow-minded. I happen to agree. We are not building an ivory tower. We are building a program that real people will use to solve real problems, and it is not our job to artificially prevent them from achieving their objectives so that we remain motivated to improve future versions of the code. I don't, however, agree with his contention that this is easy to implement. It would be easy to implement something that sucked. It would be hard to implement something that actually helped in the cases where the existing settings aren't already sufficient. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
Robert Haas robertmh...@gmail.com writes: I don't, however, agree with his contention that this is easy to implement. It would be easy to implement something that sucked. It would be hard to implement something that actually helped in the cases where the existing settings aren't already sufficient. Exactly. A hint system that actually did more good than harm would be a very nontrivial project. IMO such effort is better spent on making the optimizer smarter. regards, tom lane
Re: FW: [PERFORM] Queries becoming slow under heavy load
Thanks to all of you who replied and pointed NFS as a potential culprit. Our issue was that pgsql's temp dir (pgsql_tmp) was set to the default value ($PSQL_DIR/base/pgsql_tmp/) which was located in NFS. Moving the temp dir to local disk got us a huge improvement. Anne -Original Message- From: Shaun Thomas [mailto:stho...@peak6.com] Sent: Friday, January 28, 2011 7:31 AM To: Anne Rosset Cc: pgsql-performance@postgresql.org Subject: Re: FW: [PERFORM] Queries becoming slow under heavy load On 01/27/2011 11:12 PM, Anne Rosset wrote: Thanks for your response. We are over NFS for our storage ... NFS? I'm not sure you know this, but NFS has major locking issues that would make it a terrible candidate for hosting a database. and it's not until around the 221 second mark that we catch it consuming CPU:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7090 root      25   0  689m 399m  10m R 93.4  5.0 3872:07 java
28312 postgres  16   0  396m 225m 204m R  5.7  2.8  0:51.52 postmaster  <- here
 3391 root      15   0 29056 2348 1424 R  1.9  0.0  0:00.01 top
 4297 root      16   0 10228  740  632 D  0.0  0.0 12:53.66 hald-addon-stor
26885 httpd     15   0 2263m 1.5g  16m R  0.0 19.0  0:00.01 java

Note that the load average is fine during this timeframe, ~4 out of 8, so plenty of CPU. Please listen to us. We asked you to use sar, or iostat, to tell us how much the disk IO is being utilized. From your other screenshots, there were at least two other PG processes that were running and could have been thrashing the disk or locking tables your slow query needed. If it's waiting for disk IO, the CPU will remain low until it gets what it needs. Not everything is about the CPU. Especially now that we know your DB is running on top of NFS. Further, or worse yet, this same behavior expands out to multiple processes, producing a true backup. It can look something like this. 
Notice the 0% cpu consumption:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7090 root      22   0  689m 399m  10m R 91.1  5.0 3874:32 java
 4139 root      15   0 29080 2344 1424 R  1.9  0.0  0:00.01 top
 1555 postgres  16   0  474m 258m 162m D  0.0  3.2  0:17.32 postmaster
 1846 postgres  16   0  474m 285m 189m D  0.0  3.6  0:47.43 postmaster
 2713 postgres  16   0  404m 202m 179m D  0.0  2.5  0:33.54 postmaster
 2801 postgres  16   0  391m 146m 131m D  0.0  1.8  0:04.48 postmaster
 2804 postgres  16   0  419m 172m 133m D  0.0  2.2  0:09.41 postmaster
 2825 postgres  16   0  473m 142m  49m D  0.0  1.8  0:04.12 postmaster

Yes. And they could all be waiting for IO. Or NFS locking is blocking the reads. Or... what is that Java app doing? We don't know the state of your IO, and when you have 0% or very low CPU usage, you either have locking contention or you're being IO starved. And what queries are these connections performing? You can check it by getting the contents of the pg_stat_activity system view. If they're selecting and still slow, compare that against the iostat or sar results. For instance, here's an IOSTAT of our system: iostat -dmx dm-9 1

Linux 2.6.18-92.el5 (oslchi6pedb1) 01/28/2011

Device: rrqm/s wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz await svctm %util
dm-9      0.00   0.00 125.46  227.78   4.95   0.89    33.88     0.08  0.19  0.08  2.91

Device: rrqm/s wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz await svctm %util
dm-9      0.00   0.00   5.00    0.00   0.04   0.00    14.40     0.05 10.60 10.60  5.30

Device: rrqm/s wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz await svctm %util
dm-9      0.00   0.00   2.00    0.00   0.02   0.00    16.00     0.01  7.00  7.00  1.40

Device: rrqm/s wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz await svctm %util
dm-9      0.00   0.00   4.00 1184.00   0.04   4.62     8.04    27.23 11.73  0.06  6.80

Device: rrqm/s wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz await svctm %util
dm-9      0.00   0.00  11.00  847.00   0.09   3.31     8.10    29.31 49.65  0.79 67.90

That last column, %util, effectively tells us how saturated the controller is. 
If the percentage is high, it's really working hard to supply the data we're asking for, or trying to write. If it's low, we're probably working from memory cache, or getting fewer requests. There have been times our queries are slow and when we check this stat, it's often at or above 90%, sometimes for minutes at a time. That's almost always a clear indicator that you have IO contention. Queries can't work without the data they need to return your results. Sending us more CPU charts isn't going to help us in helping you. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com
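[Editorial aside: Shaun's %util rule of thumb is easy to script for monitoring. A minimal sketch follows; the `parse_iostat_util` helper and the 50% alert threshold are illustrative assumptions of mine, based on the `iostat -dmx` extended layout shown above where %util is the last column.]

```python
def parse_iostat_util(text):
    """Extract (device, %util) pairs from `iostat -dmx` extended output.

    Assumes the layout shown above: header lines starting with "Device:"
    followed by per-device rows whose final field is %util.
    """
    utils = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and the kernel banner line.
        if len(fields) >= 12 and not fields[0].endswith(":"):
            utils.append((fields[0], float(fields[-1])))
    return utils

sample = (
    "Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util\n"
    "dm-9 0.00 0.00 11.00 847.00 0.09 3.31 8.10 29.31 49.65 0.79 67.90\n"
)

for dev, util in parse_iostat_util(sample):
    if util > 50:  # arbitrary alert threshold for sustained saturation
        print(f"{dev}: {util}% utilized - possible IO contention")
```

A script like this could feed a nagios check, in the same spirit as the arcconf monitoring mentioned earlier in the digest.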
Re: [PERFORM] Exhaustive list of what takes what locks
On Tue, Feb 1, 2011 at 2:18 PM, Nikolas Everett nik9...@gmail.com wrote: This isn't exactly how our workload actually works. Ours is more deadlock prone. We have many connections all querying account and we do the migration in a transaction. It looks as though the AccessExclusiveLock is held until the transaction terminates. Unfortunately, that's necessary for correctness. :-( I'd really like to figure out some way to make these cases work with less locking. 9.1 will have some improvements in this area, as regards ALTER TABLE, but dropping a constraint will still require AccessExclusiveLock. There are even workloads where competition for AccessShareLock on the target table is a performance bottleneck (try pgbench -S -c 36 -j 36 or so). I've been idly mulling over whether there's any way to eliminate that locking or at least make it uncontended in the common case, but so far haven't thought of a solution that I'm entirely happy with. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: "Hints are not even that complicated to program. The SQL parser should compile the list of hints into a table and optimizer should check whether any of the applicable access methods exist in the table. If it does - use it. If not, ignore it." This looks to me like a philosophical issue, not a programming issue. It's worth looking back to what has already been elaborated on in the ToDo. http://wiki.postgresql.org/wiki/Todo --- Optimizer hints (not wanted) Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. --- The complaint is that kludging hints into a particular query attacks the problem from the wrong direction. The alternative recommended is to collect some declarative information, that *won't* be part of the query, that *won't* be processed by the parser, and that *won't* kludge up the query with information that is liable to turn into crud over time. Tom Lane was pretty specific about some kinds of declarative information that seemed useful: http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php On Jeopardy, participants are expected to phrase their answers in the form of a question, and doing so is rewarded. Based on the presence of query hints on the Not Wanted portion of the ToDo list, it's pretty clear that participants here are expected to propose optimizer hints in ways that do NOT involve decorating queries with crud. You'll get a vastly friendlier response if you at least make an attempt to attack the problem in the declarative information fashion. 
Perhaps we're all wrong in believing that pushing query optimization information into application queries by decorating the application with hints isn't the right idea, but it's a belief that certainly seems to be regularly agreed upon by gentle readers. -- cbbrowne,@,linuxdatabases.info The people's revolutionary committee has decided that the name e is retrogressive, unmulticious and reactionary, and has been flushed. Please update your abbrevs.
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 4:57 PM, gnuo...@rcn.com wrote: Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price. this kind of view on what postgres community has to do can only be true if postgres has no intention to support cloud environments or any kind of hardware virtualization. while i'm sure targeting specific hardware features can greatly improve postgres performance it should be an option, not a requirement. forcing users to have specific hardware is basically telling users that you can forget about using postgres in amazon/rackspace cloud environments (or any similar environment). i'm sure that a large part of postgres community doesn't care about cloud environments (although this is only my personal impression) but if plan is to disable postgres usage in such environments you are basically losing a large part of developers/companies targeting global internet consumers with their online products. cloud environments are currently the best platform for internet oriented developers/companies to start a new project or even to migrate from custom hardware/dedicated data center. Much as it pains me to say it, but the MicroSoft approach to software: write to the next generation processor and force users to upgrade, will be the winning strategy for database engines. There's just way too much to gain. it can arguably be said that because of this approach microsoft is losing ground in most of their businesses/strategies. 
Aljosa Mohorovic
Re: [PERFORM] High load,
On Thu, Jan 27, 2011 at 5:31 AM, Michael Kohl michael.k...@tupalo.com wrote: we are running a fairly big Ruby on Rails application on Postgres 8.4. Our traffic grew quite a bit lately, and since then we are facing DB performance issues. System load occasionally explodes (around 170 yesterday on a 16 core system), which seems to be caused by disk I/O (iowait in our Munin graphs goes up significantly during these periods). At other times the load stays rather low under pretty much the same circumstances. [...] [1] http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance [2] http://www.pgcon.org/2010/schedule/events/210.en.html At the risk of shameless self-promotion, you might also find this helpful: http://rhaas.blogspot.com/2010/12/troubleshooting-database.html It's fairly basic but it might at least get you pointed in the right direction... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mladen Gogala wrote: "With all due respect, I don't see how does the issue of hints fall into this category?" As I explained, the mechanisms are already there, they're just not elegant enough. You're making some assumptions about what a more elegant mechanism would look like to develop that are simplifying the actual situation here. If you take a survey of everyone who ever works on this area of the code, and responses to this thread are already approaching a significant percentage of such people, you'll discover that doing what you want is more difficult--and very much not elegant enough from the perspective of the code involved--than you think it would be. It's actually kind of funny...I've run into more than one person who charged into the PostgreSQL source code with the goal of "I'm going to add good hinting!" But it seems like the minute anyone gets enough understanding of how it fits together to actually do that, they realize there are just plain better things to be done in there instead. I used to be in the same situation you're in--thinking that all it would take is a better UI for tweaking the existing parameters. But now that I've actually done such tweaking for long enough to get a feel for what's really wrong with the underlying assumptions, I can name 3 better uses of development resources that I'd rather work on instead. I mentioned incorporating cache visibility already, Robert has talked about improvements to the sensitivity estimates, and the third one is improving pooling of work_mem so individual clients can get more of it safely. "Well, those two databases are also used much more widely than Postgres, which means that they're doing something better than Postgres." Starting earlier is the only "better" here. Obviously Oracle got a much earlier start than either open-source database. The real divergence in MySQL adoption relative to PostgreSQL was when they released a Windows port in January of 1998. 
PostgreSQL didn't really match that with a fully native port until January of 2005. Check out http://www.indeed.com/jobtrends?q=postgres%2C+mysql%2C+oraclerelative=1relative=1 if you want to see the real story here. Oracle has a large installed base, but it's considered a troublesome legacy product being replaced whenever possible now in every place I visit. Obviously my view of the world as seen through my client feedback is skewed a bit toward PostgreSQL adoption. But you would be hard pressed to support any view that suggests Oracle usage is anything other than flat or decreasing at this point. When usage of one product is growing at an exponential rate and the other is not growing at all, eventually the market share curves always cross too. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 8:57 AM, gnuo...@rcn.com wrote: Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And I'm pretty sure multi-core query processing is in the TODO list. Not sure anyone's working on it tho. Writing a big check might help.
Re: FW: [PERFORM] Queries becoming slow under heavy load
Excellent! And you learned a bit more about how to monitor your server while you were at it. Win win! On Thu, Feb 3, 2011 at 10:40 AM, Anne Rosset aros...@collab.net wrote: Thanks to all of you who replied and pointed NFS as a potential culprit. Our issue was that pgsql's temp dir (pgsql_tmp) was set to the default value ($PSQL_DIR/base/pgsql_tmp/) which was located in NFS. Moving the temp dir to local disk got us a huge improvement. Anne [...]
Re: [PERFORM] How to best use 32 15k.7 300GB drives?
2011/1/30 Віталій Тимчишин tiv...@gmail.com: I was thinking if a table file could be deleted if it has no single live row. And if this could be done by vacuum. In this case vacuum on table that was fully updated recently could be almost as good as cluster - any scan would skip such non-existing files really fast. Also almost no disk space would be wasted. VACUUM actually already does something along these lines. If there are 1 or any larger number of entirely-free pages at the end of a table, VACUUM will truncate them away. In the degenerate case where ALL pages are entirely-free, this results in zeroing out the file. The problem with this is that it rarely does much. Consider a table with 1,000,000 pages, 50% of which contain live rows. On average, how many pages will this algorithm truncate away? Answer: if the pages containing live rows are randomly distributed, approximately one. (Proof: There is a 50% chance that the last page will contain live rows. If so, we can't truncate anything. If not, we can truncate one page, and maybe more. Now the chances of the next page being free are 499,999 in 999,999, or roughly one-half. So we have an almost-25% chance of being able to truncate at least two pages. And so on. So you get roughly 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ... = 1 page.) Your idea of having a set of heaps rather than a single heap is an interesting one, but it's pretty much catering to the very specific case of a full-table update. I think the code changes needed would be far too invasive to seriously contemplate doing it just for that one case - although it is an important case that I would like to see us improve. Tom Lane previously objected to the idea of on-line table compaction on the grounds that people's apps might break if CTIDs changed under them, but I think a brawl between all the people who want on-line table compaction and all the people who want to avoid unexpected CTID changes would be pretty short. 
A bigger problem - or at least another problem - is that moving tuples this way is cumbersome and expensive. You basically have to move some tuples (inserting new index entries for them), vacuum away the old index entries (requiring a full scan of every index), and then repeat as many times as necessary to shrink the table. This is not exactly a smooth maintenance procedure, or one that can be done without significant disruption, but AFAIK nobody's come up with a better idea yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
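[Editorial aside: Robert's expected-value argument (roughly one trailing free page when 50% of pages hold live rows, matching the 1/2 + 1/4 + 1/8 + ... sum) can be checked numerically. This is purely an illustrative sketch of the probability argument, not code from PostgreSQL.]

```python
import random

def expected_truncatable(n_pages=1000, live_fraction=0.5, trials=5000):
    """Average number of wholly-free pages at the END of a table,
    i.e. what VACUUM's truncation step could reclaim, when the pages
    containing live rows are randomly distributed."""
    total = 0
    for _ in range(trials):
        free_run = 0
        # Walk backward from the last page until we hit a live one.
        while free_run < n_pages and random.random() >= live_fraction:
            free_run += 1
        total += free_run
    return total / trials

print(expected_truncatable())  # hovers near 1.0, as the geometric series predicts
```

Raising `live_fraction` toward 1.0 drives the expectation toward zero, which is why truncation rarely helps on a table with scattered live rows.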
Re: [PERFORM] How to best use 32 15k.7 300GB drives?
There is a process in Oracle which essentially allows you to do the equivalent of a CLUSTER in Postgres, but without locking the table, so updates can continue throughout the process. It requires a bit of manual setup and fiddling (at least in Oracle 10g), but this would probably scratch a lot of people's itches in this area. Of course, it's not trivial at all to implement :-( The Oracle equivalent of "too many dead rows" is "too many chained rows", and that's where I've seen it used. Cheers Dave 2011/2/3 Robert Haas robertmh...@gmail.com [...]
Re: [HACKERS] [PERFORM] Slow count(*) again...
Chris Browne wrote: It's worth looking back to what has already been elaborated on in the ToDo. And that precisely is what I am trying to contest. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Original message Date: Thu, 3 Feb 2011 18:56:34 +0100 From: pgsql-performance-ow...@postgresql.org (on behalf of Aljoša Mohorović aljosa.mohoro...@gmail.com) Subject: Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements To: gnuo...@rcn.com Cc: pgsql-performance@postgresql.org On Thu, Feb 3, 2011 at 4:57 PM, gnuo...@rcn.com wrote: Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price. this kind of view on what postgres community has to do can only be true if postgres has no intention to support cloud environments or any kind of hardware virtualization. while i'm sure targeting specific hardware features can greatly improve postgres performance it should be an option not a requirement. Being an option is just fine. It's not there now. Asserting that the cloud meme, based on lowest cost marginal hardware, should dictate a database engine is putting the cart before the horse. forcing users to have specific hardware is basically telling users that you can forget about using postgres in amazon/rackspace cloud environments (or any similar environment). Just not on cheap clouds, if they want maximal performance from the engine using BCNF schemas. Replicating COBOL/VSAM/flatfile applications in any relational database engine is merely deluding oneself. 
i'm sure that a large part of postgres community doesn't care about cloud environments (although this is only my personal impression) but if plan is to disable postgres usage in such environments you are basically losing a large part of developers/companies targeting global internet consumers with their online products. cloud environments are currently the best platform for internet oriented developers/companies to start a new project or even to migrate from custom hardware/dedicated data center. Much as it pains me to say it, but the MicroSoft approach to software: write to the next generation processor and force users to upgrade, will be the winning strategy for database engines. There's just way too much to gain. it can arguably be said that because of this approach microsoft is losing ground in most of their businesses/strategies. Not really. MicroSoft is losing ground for the same reason all other client/standalone applications are: such applications don't run any better on multi-core/processor machines. Add in the netbook/phone devices, and that they can't seem to make a version of windows that's markedly better than XP. Arguably MicroSoft is failing *because Office no longer requires* the next generation hardware to run right. Hmm? Linux prospers because it's a server OS, largely. Desktop may, or may not, remain relevant. Linux does make good use of such machines. MicroSoft applications? Not so much. Aljosa Mohorovic
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 03, 2011 at 12:44:23PM -0500, Chris Browne wrote: mladen.gog...@vmsinfo.com (Mladen Gogala) writes: Hints are not even that complicated to program. The SQL parser should compile the list of hints into a table and optimizer should check whether any of the applicable access methods exist in the table. If it does - use it. If not, ignore it. This looks to me like a philosophical issue, not a programming issue. It's worth looking back to what has already been elaborated on in the ToDo. http://wiki.postgresql.org/wiki/Todo --- Optimizer hints (not wanted) Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. And as to the 'wait around for a new version to fix that': there are constantly excellent examples of exactly this happening, all the time with PostgreSQL - most recent example I've seen - http://archives.postgresql.org/pgsql-performance/2011-01/msg00337.php The wait often isn't long, at all. Ross -- Ross Reedstrom, Ph.D. reeds...@rice.edu Systems Engineer Admin, Research Scientist  phone: 713-348-6166 Connexions http://cnx.org  fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, 3 Feb 2011, Robert Haas wrote: On Thu, Feb 3, 2011 at 5:11 AM, da...@lang.hm wrote: If I am understanding things correctly, a full Analyze is going over all the data in the table to figure out patterns. No. It's going over a small, fixed-size sample which depends on default_statistics_target but NOT on the table size. It's really important to come up with a solution that's not susceptible to running ANALYZE over and over again, in many cases unnecessarily. If this is the case, wouldn't it make sense in the situation where you are loading an entire table from scratch to run the Analyze as you are processing the data? If you don't want to slow down the main thread that's inserting the data, you could copy the data to a second thread and do the analysis while it's still in RAM rather than having to read it off of disk afterwards. Well that's basically what autoanalyze is going to do anyway, if the table is small enough to fit in shared_buffers. And it's actually usually BAD if it starts running while you're doing a large bulk load, because it competes for I/O bandwidth and the buffer cache and slows things down. Especially when you're bulk loading for a long time and it tries to run over and over. I'd really like to suppress all those asynchronous ANALYZE operations and instead do ONE synchronous one at the end, when we try to use the data. If the table is not large enough to fit in RAM, then it will compete for I/O, and the user will have to wait. what I'm proposing is that as the records are created, the process doing the creation makes copies of the records (either all of them, or some of them if not all are needed for the analysis, possibly via sharing memory with the analysis process), this would be synchronous with the load, not asynchronous. this would take zero I/O bandwidth, it would take up some RAM, memory bandwidth, and CPU time, but a load of a large table like this is I/O constrained.
it would not make sense for this to be the default, but as an option it should save a significant amount of time. I am making the assumption that an Analyze run only has to go over the data once (a sequential scan of the table if it's in RAM, for example) and gathers stats as it goes. with the current code, this is a completely separate process that knows nothing about the load, so if you kick it off when you start the load, it makes a pass over the table (competing for I/O), finishes, you continue to update the table, so it makes another pass, etc. As you say, this is a bad thing to do. I am saying to have an option that ties the two together, essentially making the data feed into the Analyze run be a fork of the data coming out of the insert run going to disk. So the Analyze run doesn't do any I/O and isn't going to complete until the insert is complete. At which time it will have seen one copy of the entire table. David Lang
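As Robert notes above, ANALYZE reads a bounded random sample whose size is governed by default_statistics_target, not by the table size. A minimal sketch of adjusting that sample size (the table name here is hypothetical):

```sql
-- ANALYZE samples roughly 300 * statistics_target rows, so its cost
-- does not grow with table size.
SHOW default_statistics_target;
SET default_statistics_target = 200;  -- larger sample, better estimates
ANALYZE my_table;                     -- my_table is a hypothetical table
```

A larger target makes estimates more robust at the cost of longer planning and ANALYZE times, which is why it is a knob rather than a fixed value.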
Re: [HACKERS] [PERFORM] Slow count(*) again...
Shaun Thomas wrote: On 02/03/2011 10:38 AM, Mladen Gogala wrote: It all boils down to the database. Hints, whether they're well-intentioned or not, effectively cover up bugs in the optimizer, planner, or some other approach the database is using to build its execution plan. Hints don't cover up bugs, they simply make it possible for the user to circumvent the bugs and keep the users happy. As I hinted before, this is actually a purist argument which was made by someone who has never had to support a massive production database with many users for a living. Your analogy is that PG is a gas stove, so bundle a fire extinguisher. Well, the devs believe that the stove should be upgraded to electric or possibly even induction to remove the need for the extinguisher. In the meantime, the fire is burning. What should the hapless owner of the database application do in the meantime? Tell the users that it will be better in the next version? As I've said before: hints are a make-or-break point. Without hints, I cannot consider Postgres for mission critical projects. I am managing big databases for a living and I flatter myself that after more than two decades of doing it, I am not too bad at it. If they left hints in, it would just be one more thing to deprecate as the original need for the hint was removed. If you really need hints that badly, EnterpriseDB cobbled the Oracle syntax into the planner, and it seems to work alright. That doesn't mean it's right, just that it works. EnterpriseDB will now have to support those query hints forever, even if the planner gets so advanced they're effectively useless. I don't foresee that happening in my lifetime. And I plan to go on for quite a while. There will always be optimizer bugs, and users will be smarter and know more about their data than computer programs for the foreseeable future. What this attitude boils down to is that developers don't trust their users enough to give them control of the execution path.
I profoundly disagree with that type of philosophy. DB2 also has hints: http://tinyurl.com/48fv7w7 So does SQL Server: http://www.sql-server-performance.com/tips/hints_general_p1.aspx Finally, even Postgres's greatest open source competitor, MySQL, supports hints: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, yet in the Postgres community, they are considered bad with almost religious fervor. The Postgres community is quite unique with the fatwa against hints. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, yet in the Postgres community, they are considered bad with almost religious fervor. Postgres community is quite unique with the fatwa against hints. Well, the community declines to add hints until there is actual consensus on a good way to add hints. Nobody has ever proposed a way to add hints where consensus was arrived at that the way was good, so... -- http://linuxfinances.info/info/nonrdbms.html Rules of the Evil Overlord #192. If I appoint someone as my consort, I will not subsequently inform her that she is being replaced by a younger, more attractive woman. http://www.eviloverlord.com/
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mladen Gogala mladen.gog...@vmsinfo.com wrote: In the meantime, the fire is burning. What should the hapless owner of the database application do in the meantime? Tell the users that it will be better in the next version? As I've said before: hints are a make-or-break point. Without hints, I cannot consider Postgres for mission critical projects. I am managing big databases for a living and I flatter myself that after more than two decades of doing it, I am not too bad at it. Well, I've been at it since 1972, and I'm OK with the current situation because I push hard for *testing* in advance of production deployment. So I generally discover that leaving a pan of grease on maximum flame unattended is a bad idea in the test lab, where no serious damage is done. Then I take steps to ensure that this doesn't happen in the user world. We've got about 100 production databases, some at 2TB and growing, and 100 development, testing, and staging databases. About 3,000 directly connected users and millions of web hits per day generating tens of millions of queries. Lots of fun replication and automated interfaces to business partners -- DOT, county sheriffs, local police agencies, district attorneys, public defenders' offices, Department of Revenue (for tax intercept collections), Department of Justice, etc. (That was really just the tip of the iceberg.) Almost all of this was converted inside of a year with minimal fuss and only one user complaint that I can recall. Most users described it as a non-event, with the only visible difference being that applications were snappier than under the commercial database product. One type of query was slow in Milwaukee County (our largest). We tuned seq_page_cost and random_page_cost until all queries were running with good plans. It did not require any down time to sort this out and fix it -- same day turnaround. This is not a matter of hinting; it's a matter of creating a cost model for the planner which matches reality.
(We don't set this or any other hint per query, we tune the model.) When the cost estimates mirror reality, good plans are chosen. -Kevin
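The cost-model tuning Kevin describes is done with ordinary planner parameters rather than per-query hints. A rough sketch; the values are illustrative assumptions, not recommendations:

```sql
-- Make the planner's I/O cost assumptions match the hardware; on a
-- well-cached system a random page fetch is not 4x a sequential one.
SET seq_page_cost = 1.0;
SET random_page_cost = 2.0;  -- default is 4.0; lower when data is mostly cached
```

These can also be set cluster-wide in postgresql.conf so every query benefits, which is precisely the "tune the model, not the query" approach.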
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 02/03/2011 03:01 PM, Mladen Gogala wrote: As I hinted before, this is actually a purist argument which was made by someone who has never had to support a massive production database with many users for a living. Our database handles 9000 transactions per second and over 200-million transactions per day just fine, thanks. It may not be a real database in your world, but it's real enough for us. I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, yet in the Postgres community, they are considered bad with almost religious fervor. Postgres community is quite unique with the fatwa against hints. You missed the argument. The community, or at least the devs, see hints as an ugly hack. Do I agree? Not completely, but I can definitely understand the perspective. Saying every other vendor has hints is really just admitting every other vendor has a crappy optimizer. Is that something to be proud of? In almost every single case I've seen a query with bad performance, it's the fault of the author or the DBA. Not enough where clauses; not paying attention to cardinality or selectivity; inappropriate or misapplied indexes; insufficient table statistics... the list of grievances out there is endless. And here's something I never saw you consider: hints making performance worse. Sure, for now, forcing a sequence scan or forcing it to use indexes on a specific table is faster for some specific edge-case. But hints are like most code, and tend to be missed during refactoring. As the optimizer improves, hints likely won't, meaning code is likely to be slower than if the hints didn't exist. This of course ignores that the contents of a table are likely to evolve or grow in volume, which can also drastically alter the path the optimizer would choose, but can't because a hint is forcing it to take a specific path. Want to remove a reverse index scan? Reindex with DESC on the column being reversed. That was added in 8.3.
Getting too many calls for nested loops when a merge or hash would be faster? Increase the statistics target for the column causing the problems and re-analyze. Find an actual bug in the optimizer? Tell the devs and they'll fix it. Just stay current, and you get all those benefits. This is true for any database; bugs get fixed, things get faster and more secure. Or like I said, if you really need hints that badly, use EnterpriseDB instead. It's basically completely Oracle-compatible at this point. But pestering the PostgreSQL dev community about how inferior they are, and how they're doing it wrong, and how they're just another vendor making a database product that can't support massive production databases, is doing nothing but ensuring they'll ignore you. Flies, honey, vinegar, etc. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
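The remedies Shaun lists can be sketched in plain SQL (table and column names here are hypothetical examples, not from the thread):

```sql
-- Bad join choices from poor estimates: raise the per-column sample size,
-- then re-analyze so the planner sees the better statistics.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;

-- Reverse index scans: build the index in descending order instead (8.3+).
CREATE INDEX orders_created_desc ON orders (created_at DESC);
```

Both change what the planner knows or what access paths exist, rather than forcing a particular plan.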
Re: [HACKERS] [PERFORM] Slow count(*) again...
Chris Browne wrote: Well, the community declines to add hints until there is actual consensus on a good way to add hints. OK. That's another matter entirely. Who should make that decision? Is there a committee or a person who would be capable of making that decision? Nobody has ever proposed a way to add hints where consensus was arrived at that the way was good, so... So, I will have to go back on my decision to use Postgres and re-consider MySQL? I would rather throw away the effort invested in studying Postgres than risk an unfixable application downtime. I am not sure about the world domination thing, though. Optimizer hints are a big feature that everybody else has and Postgres does not have because of religious reasons. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 04/02/11 10:01, Mladen Gogala wrote: In the meantime, the fire is burning. What should the hapless owner of the database application do in the meantime? Tell the users that it will be better in the next version? As I've said before: hints are a make-or-break point. Without hints, I cannot consider Postgres for mission critical projects. I am managing big databases for a living and I flatter myself that after more than two decades of doing it, I am not too bad at it. This is somewhat of a straw man argument. The sort of query that the optimizer does badly usually gets noticed during the test cycle, i.e. before production, so there is some lead time to get a fix into the code, or add/subtract indexes/redesign the query concerned. The cases I've seen in production typically involve outgrowing optimizer parameter settings (e.g. work_mem, effective_cache_size) as the application dataset gets bigger over time. I would note that this is *more* likely to happen with hints, as they lobotomize the optimizer so it *cannot* react to dataset size or distribution changes. regards Mark
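The parameters Mark mentions are ordinary settings that need revisiting as the dataset grows; a sketch with purely illustrative values:

```sql
-- Memory a single sort or hash operation may use (per node, per backend);
-- too small a value pushes the planner toward slower disk-based plans.
SET work_mem = '64MB';

-- The planner's estimate of how much of the database the OS + PostgreSQL
-- caches can hold; it affects index-vs-seqscan choices but allocates nothing.
SET effective_cache_size = '8GB';
```

Because these feed the cost model rather than pin a plan, the planner can still adapt as data size and distribution change.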
Re: [HACKERS] [PERFORM] Slow count(*) again...
Shaun Thomas wrote: You missed the argument. The community, or at least the devs, see hints as an ugly hack. Do I agree? Not completely, but I can definitely understand the perspective. Saying every other vendor has hints is really just admitting every other vendor has a crappy optimizer. Is that something to be proud of? This is funny? Everybody else has a crappy optimizer? That's a funny way of looking at the fact that every other major database supports hints. I would be tempted to call that a major missing feature, but the statement that everybody else has a crappy optimizer sounds kind of funny. No disrespect meant. It's not unlike claiming that the Earth is 6000 years old. And here's something I never saw you consider: hints making performance worse. Sure. If you give me the steering wheel, there is a chance that I might drive the car off a cliff or even run someone over, but that doesn't mean that there is no need for having one. After all, we're talking about the ability to control the optimizer's decisions. Want to remove a reverse index scan? Reindex with DESC on the column being reversed. That was added in 8.3. Getting too many calls for nested loops when a merge or hash would be faster? Increase the statistics target for the column causing the problems and re-analyze. Find an actual bug in the optimizer? Tell the devs and they'll fix it. Just stay current, and you get all those benefits. This is true for any database; bugs get fixed, things get faster and more secure. In the meantime, the other databases provide hints which help me bridge the gap. As I said before: hints are there, even if they were not meant to be used that way. I can do things in a way that I consider very non-elegant. The hints are there because they are definitely needed. Yet, there is a religious zeal and a fatwa against them. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Feb 3, 2011, at 1:50 PM, Mladen Gogala wrote: So, I will have to go back on my decision to use Postgres and re-consider MySQL? I will rather throw away the effort invested in studying Postgres than to risk an unfixable application downtime. I am not sure about the world domination thing, though. Optimizer hints are a big feature that everybody else has and Postgres does not have because of religious reasons. As always, you should use the tool you consider best for the job. If you think MySQL as both a product and a community has a better chance of giving you what you want, then you should use MySQL.
Re: [HACKERS] [PERFORM] Slow count(*) again...
2011/2/3 Mladen Gogala mladen.gog...@vmsinfo.com: Chris Browne wrote: Well, the community declines to add hints until there is actual consensus on a good way to add hints. OK. That's another matter entirely. Who should make that decision? Is there a committee or a person who would be capable of making that decision? Because there is no consensus about hints, hints are not in pg. Community development must be based on consensus; there is no second way. Hints are not a win, for several reasons, but the situation isn't immutable. There are a lot of features that were rejected the first time - like replication. It just needs a different approach: you have to show tests, use cases, and code, and you have to convince people that your request is good and necessary. The argument that other databases have this feature is the last one in the top ten. Nobody has ever proposed a way to add hints where consensus was arrived at that the way was good, so... So, I will have to go back on my decision to use Postgres and re-consider MySQL? I will rather throw away the effort invested in studying Postgres than to risk an unfixable application downtime. I am not sure about the world domination thing, though. Optimizer hints are a big feature that everybody else has and Postgres does not have because of religious reasons. That's not correct. There are real arguments against hints. You can try EDB, or there is another external module: http://postgresql.1045698.n5.nabble.com/contrib-plantuner-enable-PostgreSQL-planner-hints-td1924794.html Regards Pavel Stehule -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 2/3/11 1:18 PM, Chris Browne wrote: mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, I don't think that's actually accurate. Can you give me a list of DBMSes which support hints other than Oracle? Well, the community declines to add hints until there is actual consensus on a good way to add hints. Nobody has ever proposed a way to add hints where consensus was arrived at that the way was good, so... Well, we did actually have some pretty good proposals (IIRC) for selectively adjusting the cost model to take into account DBA knowledge. These needed some refinement, but in general seem like the right way to go. However, since this system wasn't directly compatible with Oracle Hints, folks pushing for hints dropped the solution as unsatisfactory. This is the discussion we have every time: the users who want hints specifically want hints which work exactly like Oracle's, and aren't interested in a system designed for PostgreSQL. It's gotten very boring; it's like the requests to support MySQL-only syntax. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] [PERFORM] Slow count(*) again...
In the meantime, the other databases provide hints which help me bridge the gap. As I said before: hints are there, even if they were not meant to be used that way. I can do things in a way that I consider very non-elegant. The hints are there because they are definitely needed. Yet, there is a religious zeal and a fatwa against them. Other databases have a different development model; it isn't based on consensus. There isn't any commercial model behind PostgreSQL, and it isn't possible to simply pay the programmers. So you can either pay a vendor and, as a customer, be the boss - or use it freely and seek consensus through common discussion. Regards Pavel Stehule -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
Josh Berkus wrote: However, since this system wasn't directly compatible with Oracle Hints, folks pushing for hints dropped the solution as unsatisfactory. This is the discussion we have every time: the users who want hints specifically want hints which work exactly like Oracle's, and aren't interested in a system designed for PostgreSQL. It's gotten very boring; it's like the requests to support MySQL-only syntax. Actually, I don't want Oracle hints. Oracle hints are ugly and cumbersome. I would prefer something like this: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html That should also answer the question about other databases supporting hints. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mladen Gogala wrote: Actually, I don't want Oracle hints. Oracle hints are ugly and cumbersome. I would prefer something like this: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html That should also answer the question about other databases supporting hints. Sorry. I forgot that MySQL too is now an Oracle product. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
The hints are there because they are definitely needed. Yet, there is a religious zeal and a fatwa against them. The opposition is philosophical, not religious. There is no fatwa. If you want a serious discussion, avoid inflammatory terms. --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 www.truviso.com
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 04/02/11 11:08, Josh Berkus wrote: I don't think that's actually accurate. Can you give me a list of DBMSes which support hints other than Oracle? DB2 LUW (Linux, Unix, Win32 code base) has hint profiles: http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html
Re: [HACKERS] [PERFORM] Slow count(*) again...
da...@lang.hm wrote: I am making the assumption that an Analyze run only has to go over the data once (a sequential scan of the table if it's in RAM, for example) and gathers stats as it goes. And that's the part there's some confusion about here. ANALYZE grabs a random set of samples from the table, the number of which is guided by the setting for default_statistics_target. The amount of time it takes is not proportional to the table size; it's only proportional to the sampling size. Adding a process whose overhead is proportional to the table size, such as the continuous analyze idea you're proposing, is quite likely to be a big step backwards relative to just running a single ANALYZE after the loading is finished. What people should be doing if concerned about multiple passes happening is something like this:

CREATE TABLE t (s serial, i integer) WITH (autovacuum_enabled = off);
[populate table]
ANALYZE t;
ALTER TABLE t SET (autovacuum_enabled = on);

I'm not optimistic the database will ever get smart enough to recognize bulk loading and do this sort of thing automatically, but as the workaround is so simple it's hard to get motivated to work on trying. -- Greg Smith 2ndQuadrant US  g...@2ndquadrant.com  Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] [PERFORM] Slow count(*) again...
Maciek Sakrejda wrote: The hints are there because they are definitely needed. Yet, there is a religious zeal and a fatwa against them. The opposition is philosophical, not religious. There is no fatwa. If you want a serious discussion, avoid inflammatory terms. I don't want to insult anybody but the whole thing does look strange. Maybe we can agree to remove that ridiculous we don't want hints note from the Postgresql wiki? That would make it look less like a, hmph, philosophical issue and more like a not-yet-implemented issue, especially if we have in mind that hints are already here, in the form of enable_method switches. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
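For reference, the enable_* switches Mladen alludes to look like this; they are intended for diagnosis rather than as production hints (the table and query here are hypothetical):

```sql
-- Diagnostic only: disable one plan node type for the current session
-- to see what the planner would choose without it.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM t WHERE i = 42;  -- t is a hypothetical table
RESET enable_seqscan;
```

Note that off does not forbid the node type outright; it just assigns it a huge cost penalty, so the planner can still fall back to it when no alternative exists.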
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mark Kirkwood wrote: On 04/02/11 11:08, Josh Berkus wrote: I don't think that's actually accurate. Can you give me a list of DBMSes which support hints other than Oracle? DB2 LUW (Linux, Unix, Win32 code base) has hint profiles: http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html SQL Server and MySQL too. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Feb 3, 2011, at 17:08, Josh Berkus wrote: On 2/3/11 1:18 PM, Chris Browne wrote: mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, I don't think that's actually accurate. Can you give me a list of DBMSes which support hints other than Oracle? 1 minute of Googling shows results for: db2: http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db2.doc.admin/p9li375.htm informix: http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0502fan/0502fan.html sybase: http://searchenterpriselinux.techtarget.com/answer/Query-hints-to-override-optimizer mysql: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html I haven't read much of the rest of this thread, so others may have brought these up before. Michael Glaesemann grzm seespotcode net
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Scott Marlowe wrote: On Thu, Feb 3, 2011 at 8:57 AM, gnuo...@rcn.com wrote: Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And I'm pretty sure multi-core query processing is in the TODO list. Not sure anyone's working on it tho. Writing a big check might help. Work on the exciting parts people are interested in is blocked behind completely mundane tasks like coordinating how the multiple sessions are going to end up with a consistent view of the database. See Export snapshots to other sessions at http://wiki.postgresql.org/wiki/ClusterFeatures for details on that one. Parallel query works well for accelerating CPU-bound operations that are executing in RAM. The reality here is that while the feature sounds important, these situations don't actually show up that often. There are exactly zero clients I deal with regularly who would be helped out by this. The ones running web applications whose workloads do fit into memory are more concerned about supporting large numbers of users, not optimizing things for a single one. And the ones who have so much data that single users running large reports would seemingly benefit from this are usually disk-bound instead. The same sort of situation exists with SSDs. Take out the potential users whose data can fit in RAM instead, take out those who can't possibly get an SSD big enough to hold all their stuff anyway, and what's left in the middle is not very many people. In a database context I still haven't found anything better to do with a SSD than to put mid-sized indexes on them, ones a bit too large for RAM but not so big that only regular hard drives can hold them. I would rather strongly disagree with the suggestion that embracing either of these fancy but not really as functional as they appear at first approaches is critical to PostgreSQL's future. 
They're specialized techniques useful to only a limited number of people. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] [PERFORM] Slow count(*) again...
I don't want to insult anybody, but the whole thing does look strange. Maybe we can agree to remove that ridiculous "we don't want hints" note from the PostgreSQL wiki? That would make it look less like a, hmph, philosophical issue and more like a not-yet-implemented issue, especially if we keep in mind that hints are already here, in the form of enable_method switches. Link? There's a lot of stuff on the wiki. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] Slow count(*) again...
Mladen Gogala mladen.gog...@vmsinfo.com wrote: Maybe we can agree to remove that ridiculous "we don't want hints" note from the PostgreSQL wiki? I'd be against that. This is rehashed less frequently since that went in. Less wasted time and bandwidth with it there. That would make it look less like a, hmph, philosophical issue and more like a not-yet-implemented issue, Exactly what we don't want. especially if we keep in mind that hints are already here, in the form of enable_method switches. Those aren't intended as hints for production use. They're there for diagnostic purposes. In our shop we've never used any of those flags in production. That said, there are ways to force an optimization barrier when needed, which I have occasionally seen people find useful. And there are sometimes provably logically equivalent ways to write a query which result in different plans with different performance. It's rare that someone presents a poorly performing query on the list and doesn't get a satisfactory resolution fairly quickly -- if they present sufficient detail and work nicely with others who are volunteering their time to help. -Kevin
Re: [PERFORM] Exhaustive list of what takes what locks
On Tue, Feb 01, 2011 at 02:18:37PM -0500, Nikolas Everett wrote: Is there an exhaustive list of what takes what locks and how long they last? This documents which commands take each lock type, but it is not exhaustive: http://www.postgresql.org/docs/current/interactive/explicit-locking.html All locks on user-created database objects last until the transaction ends. This does not apply to advisory locks. Also, many commands internally take locks on system catalogs and release those locks as soon as possible. CREATE TABLE account (account_id SERIAL PRIMARY KEY, name CHARACTER VARYING NOT NULL); CREATE TABLE foo (account_id INTEGER NOT NULL REFERENCES account (account_id), stuff CHARACTER VARYING); DROP TABLE foo; That query shows that the DROP takes an AccessExclusiveLock on account. This isn't totally unexpected but it is unfortunate because it means we have to wait for a downtime window to maintain constraints even if they are not really in use. PostgreSQL 9.1 will contain changes to make similar operations, though not that one, take ShareRowExclusiveLock instead of AccessExclusiveLock. Offhand, the same optimization probably could be arranged for it with minimal fuss. If account is heavily queried but seldom changed, that might be enough for you. The internal implementation of a FOREIGN KEY constraint takes the form of triggers on both tables. Each INSERT or UPDATE needs to know definitively whether to fire a given trigger, so adding or removing an arbitrary trigger will continue to require at least ShareRowExclusiveLock. In the abstract, the special case of a FOREIGN KEY constraint could be looser still, but that would be tricky to implement. nm
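Why ShareRowExclusiveLock would help here: unlike AccessExclusiveLock, it does not conflict with the AccessShareLock taken by plain SELECTs, so readers keep running during the DDL. A minimal sketch of the relevant slice of the documented conflict matrix (a toy model for illustration, not PostgreSQL source; only a subset of the lock modes is included):

```python
# Subset of PostgreSQL's documented lock-mode conflict matrix, covering the
# modes discussed in this thread. Conflicts are symmetric, so each pair is
# stored once and checked in both orders.
CONFLICTS = {
    ("AccessShareLock", "AccessExclusiveLock"),
    ("RowShareLock", "ExclusiveLock"),
    ("RowShareLock", "AccessExclusiveLock"),
    ("RowExclusiveLock", "ShareLock"),
    ("RowExclusiveLock", "ShareRowExclusiveLock"),
    ("RowExclusiveLock", "ExclusiveLock"),
    ("RowExclusiveLock", "AccessExclusiveLock"),
    ("ShareRowExclusiveLock", "ShareUpdateExclusiveLock"),
    ("ShareRowExclusiveLock", "ShareLock"),
    ("ShareRowExclusiveLock", "ShareRowExclusiveLock"),
    ("ShareRowExclusiveLock", "ExclusiveLock"),
    ("ShareRowExclusiveLock", "AccessExclusiveLock"),
    ("AccessExclusiveLock", "AccessExclusiveLock"),
}

def conflicts(a: str, b: str) -> bool:
    """True if mode `a` held by one session blocks mode `b` in another."""
    return (a, b) in CONFLICTS or (b, a) in CONFLICTS

# A SELECT (AccessShareLock) is blocked by DROP TABLE (AccessExclusiveLock)
# but not by ShareRowExclusiveLock, which is the point of the 9.1 change:
assert conflicts("AccessShareLock", "AccessExclusiveLock")
assert not conflicts("AccessShareLock", "ShareRowExclusiveLock")
```

Note that ShareRowExclusiveLock still conflicts with RowExclusiveLock, so INSERT/UPDATE/DELETE on the table waits, which matches the "heavily queried but seldom changed" caveat above.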
Re: [HACKERS] [PERFORM] Slow count(*) again...
With all due respect, I consider myself smarter than the optimizer. I'm 6'4, 235LBS so telling me that you disagree and that I am more stupid than a computer program, would not be a smart thing to do. Please, do not misunderestimate me. I don't see computer programs make thinly veiled threats, especially in a public forum. I'll do what you claim is not the smart thing and disagree with you. You are wrong. You are dragging the signal-to-noise ratio of this discussion down. You owe Greg an apology.
Re: [HACKERS] [PERFORM] Slow count(*) again...
Thank you. It appears I owe an apology also, for jumping to that conclusion. It was rash and unfair of me. I am sorry. On Wed, Feb 2, 2011 at 5:03 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: Justin Pitts wrote: With all due respect, I consider myself smarter than the optimizer. I'm 6'4, 235LBS so telling me that you disagree and that I am more stupid than a computer program, would not be a smart thing to do. Please, do not misunderestimate me. I don't see computer programs make thinly veiled threats, especially in a public forum. I'll do what you claim is not the smart thing and disagree with you. You are wrong. You are dragging the signal-to-noise ratio of this discussion down. You owe Greg an apology. I apologize if that was understood as a threat. It was actually a joke. I thought that my using of the word misunderestimate has made it abundantly clear. Apparently, G.W. doesn't have as many fans as I have previously thought. Once again, it was a joke, I humbly apologize if that was misunderstood. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 02/02/2011 07:17 PM, Greg Smith wrote: I direct anyone who thought Mladen was making a serious comment to http://www.nydailynews.com/news/politics/2009/01/08/2009-01-08_misunderestimate_tops_list_of_notable_bu-3.html if you want to get his little joke there. I plan to start using misunderestimate more in the future when talking about planner errors. Might even try to slip it into the docs at some point in the future and see if anybody catches it. My wings take dream ... cheers andrew
Re: [PERFORM] Which RAID Controllers to pick/avoid?
Am 03.02.2011 07:45, schrieb Magnus Hagander: On Thu, Feb 3, 2011 at 07:30, Dan Birkenbir...@gmail.com wrote: - Is the linux support of the LSI and Adaptec cards comparable? Can't comment on that one, sorry. We dropped LSI in favour of Adaptec for exactly this reason. We run hundreds of machines in remote locations, and repeatedly had problems with LSI regarding support of new controller models and kernel versions (esp. on system installation), and because of occasional kernel panics triggered by the LSI driver. The Adaptec support integrated into the Linux kernel source tree does a flawless job here. On the other hand, Adaptec is not perfect either: when we attached a 24-drive SAS storage unit to a 5805Z, it failed to properly handle drive failures in this unit. We are using an LSI again there, which so far works well... Joachim
Re: [HACKERS] [PERFORM] Slow count(*) again...
Josh Berkus wrote: I don't want to insult anybody, but the whole thing does look strange. Maybe we can agree to remove that ridiculous "we don't want hints" note from the PostgreSQL wiki? That would make it look less like a, hmph, philosophical issue and more like a not-yet-implemented issue, especially if we keep in mind that hints are already here, in the form of enable_method switches. Link? There's a lot of stuff on the wiki. http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want No. 2 on the list. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 3:54 PM, da...@lang.hm wrote: with the current code, this is a completely separate process that knows nothing about the load, so if you kick it off when you start the load, it makes a pass over the table (competing for I/O), finishes, you continue to update the table, so it makes another pass, etc. As you say, this is a bad thing to do. I am saying to have an option that ties the two together, essentially making the data feed into the Analyze run be a fork of the data coming out of the insert run going to disk. So the Analyze run doesn't do any I/O and isn't going to complete until the insert is complete. At which time it will have seen one copy of the entire table. Yeah, but you'll be passing the entire table through this separate process that may only need to see 1% of it or less on a large table. If you want to write the code and prove it's better than what we have now, or some other approach that someone else may implement in the meantime, hey, this is an open source project, and I like improvements as much as the next guy. But my prediction for what it's worth is that the results will suck. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
Kevin Grittner wrote: Mladen Gogala mladen.gog...@vmsinfo.com wrote: Maybe we can agree to remove that ridiculous "we don't want hints" note from the PostgreSQL wiki? I'd be against that. This is rehashed less frequently since that went in. Less wasted time and bandwidth with it there. Well, the problem will not go away. As I've said before, all other databases have that feature and none of the reasons listed here convinced me that everybody else has a crappy optimizer. The problem may go away altogether if people stop using PostgreSQL. That would make it look less like a, hmph, philosophical issue and more like a not-yet-implemented issue, Exactly what we don't want. Who is "we"? -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, 2011-02-03 at 18:33 -0500, Mladen Gogala wrote: Exactly what we don't want. Who is we? The majority of long term hackers. -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 6:33 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: Kevin Grittner wrote: Mladen Gogala mladen.gog...@vmsinfo.com wrote: Maybe we can agree to remove that ridiculous "we don't want hints" note from the PostgreSQL wiki? I'd be against that. This is rehashed less frequently since that went in. Less wasted time and bandwidth with it there. Well, the problem will not go away. As I've said before, all other databases have that feature and none of the reasons listed here convinced me that everybody else has a crappy optimizer. The problem may go away altogether if people stop using PostgreSQL. You seem to be asserting that without hints, problem queries can't be fixed. But you haven't offered any evidence for that proposition, and it doesn't match my experience, or the experience of other people on this list who have been using PostgreSQL for a very long time. If you want to seriously advance this conversation, you should (1) learn how people who use PostgreSQL solve these problems and then (2) if you think there are cases where those methods are inadequate, present them, and let's have a discussion about it. People in this community DO change their mind about things - but they do so in response to *evidence*. You haven't presented one tangible example of where the sort of hints you seem to want would actually help anything, and yet you're accusing the people who don't agree with you of being engaged in a religious war. It seems to me that the shoe is on the other foot. Religion is when you believe something first and then look for evidence to support it. Science goes the other direction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
Joshua D. Drake wrote: On Thu, 2011-02-03 at 18:33 -0500, Mladen Gogala wrote: Exactly what we don't want. Who is we? The majority of long term hackers. If that is so, I don't see world domination in the future, exactly the opposite. A database whose creators don't trust their users cannot count on a very bright future. All other databases do have that feature. I must say, this debate gave me a good deal of stuff to think about. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 2/3/11 1:34 PM, Shaun Thomas wrote: I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, yet in the Postgres community, they are considered bad with almost religious fervor. The Postgres community is quite unique with the fatwa against hints. You missed the argument. The community, or at least the devs, see hints as an ugly hack. Let's kill the myth right now that Postgres doesn't have hints. It DOES have hints. Just read this forum for a few days and see how many times there are suggestions like disable nested loops or disable seqscan, or change the random page cost, or change the join collapse limit. All of these options are nothing more than a way of altering the planner's choices so that it will pick the plan that the designer already suspects is more optimal. If that's not a hint, I don't know what is. Craig
Re: [HACKERS] [PERFORM] Slow count(*) again...
All other databases do have that feature. I must say, this debate gave me a good deal of stuff to think about. And, I think we're done here. The idea that the lack of hints will kill PostgreSQL is already demonstrably false. This is sounding more and more like a petulant tantrum. Folks, I apologize for ever taking part in this conversation and contributing to the loss of signal to noise. Please forgive me. -- Shaun Thomas Peak6 | 141 W. Jackson Blvd. | Suite 800 | Chicago, IL 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, 3 Feb 2011, Robert Haas wrote: On Thu, Feb 3, 2011 at 3:54 PM, da...@lang.hm wrote: with the current code, this is a completely separate process that knows nothing about the load, so if you kick it off when you start the load, it makes a pass over the table (competing for I/O), finishes, you continue to update the table, so it makes another pass, etc. As you say, this is a bad thing to do. I am saying to have an option that ties the two together, essentially making the data feed into the Analyze run be a fork of the data coming out of the insert run going to disk. So the Analyze run doesn't do any I/O and isn't going to complete until the insert is complete. At which time it will have seen one copy of the entire table. Yeah, but you'll be passing the entire table through this separate process that may only need to see 1% of it or less on a large table. If you want to write the code and prove it's better than what we have now, or some other approach that someone else may implement in the meantime, hey, this is an open source project, and I like improvements as much as the next guy. But my prediction for what it's worth is that the results will suck. :-) I will point out that 1% of a very large table can still be a lot of disk I/O that is avoided (especially if it's random I/O that's avoided). David Lang
Re: [HACKERS] [PERFORM] Slow count(*) again...
Robert Haas wrote: On Thu, Feb 3, 2011 at 6:33 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: Kevin Grittner wrote: Mladen Gogala mladen.gog...@vmsinfo.com wrote: Maybe we can agree to remove that ridiculous "we don't want hints" note from the PostgreSQL wiki? I'd be against that. This is rehashed less frequently since that went in. Less wasted time and bandwidth with it there. Well, the problem will not go away. As I've said before, all other databases have that feature and none of the reasons listed here convinced me that everybody else has a crappy optimizer. The problem may go away altogether if people stop using PostgreSQL. You seem to be asserting that without hints, problem queries can't be fixed. But you haven't offered any evidence for that proposition, and it doesn't match my experience, or the experience of other people on this list who have been using PostgreSQL for a very long time. If you want to seriously advance this conversation, you should (1) learn how people who use PostgreSQL solve these problems and then (2) if you think there are cases where those methods are inadequate, present them, and let's have a discussion about it. People in this community DO change their mind about things - but they do so in response to *evidence*. You haven't presented one tangible example of where the sort of hints you seem to want would actually help anything, and yet you're accusing the people who don't agree with you of being engaged in a religious war. It seems to me that the shoe is on the other foot. Religion is when you believe something first and then look for evidence to support it. Science goes the other direction. Actually, it is not unlike a religious dogma, only stating that hints are bad. It even says so in the wiki. The arguments are: 1) Refusal to implement hints is motivated by distrust toward users, citing that some people may mess things up. Yes, they can, with and without hints. 2) All other databases have them. 
This is a major feature and if I were in the MySQL camp, I would use it as an argument. Asking me for some proof is missing the point. All other databases have hints precisely because they are useful. The assertion that only Postgres is so smart that it can operate without hints doesn't match reality. As a matter of fact, Oracle RDBMS on the same machine will regularly beat PgSQL in performance. That has been my experience so far. I even posted counting query results. 3) Hints are a make-or-break feature. They're absolutely needed in fire-extinguishing situations. I see no arguments to say otherwise and until that ridiculous "we don't want hints" dogma is on the wiki, this is precisely what it is: a dogma. Dogmas do not change and I am sorry that you don't see it that way. However, this discussion did convince me that I need to take another look at MySQL and tone down my engagement with the PostgreSQL community. This is my last post on the subject because posts are becoming increasingly personal. This level of irritation is also characteristic of a religious community chastising a sinner. Let me remind you again: all other major databases have that possibility: Oracle, MySQL, DB2, SQL Server and Informix. Requiring burden of proof about hints is equivalent to saying that all these databases are developed by idiots and have a crappy optimizer. I am not going to back down, but I may stop using Postgres altogether. If that was your goal, you almost achieved it. Oh yes, and good luck with the world domination. If there is not enough common sense even to take down that stupid dogma on the wiki, there isn't much hope left. With this post, my participation in this group is finished, for the foreseeable future. -- Mladen Gogala Sr. 
Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 2011-02-03 23:29, Robert Haas wrote: Yeah, but you'll be passing the entire table through this separate process that may only need to see 1% of it or less on a large table. It doesn't sound too impossible to pass only a percentage, starting high and dropping towards 1% once the loaded size has become large. -- Jeremy
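Jeremy's suggestion can be sketched concretely: sample every row while the table is small, then let the sampling probability decay toward a floor as the load grows. The sketch below is a toy illustration of that idea, not PostgreSQL code; the target sample size and 1% floor are illustrative assumptions.

```python
import random

def sample_rate(rows_seen, target=30_000, floor=0.01):
    """Probability of feeding the next row to the analyzer: certain while
    the load is smaller than the target sample, then decaying toward a 1%
    floor once the load is much larger (numbers are illustrative)."""
    if rows_seen == 0:
        return 1.0
    return min(1.0, max(floor, target / rows_seen))

def load_and_sample(rows, rng=random.Random(0)):
    """Simulate a bulk load whose stream is probabilistically forked to an
    analyzer, so the analyzer never rescans the table."""
    sample = []
    for i, row in enumerate(rows):
        if rng.random() < sample_rate(i):
            sample.append(row)
    return sample
```

For a 100,000-row load with these parameters the analyzer sees well under the full table, while small loads are sampled completely, which addresses Robert's objection that the fork would otherwise stream the entire table through the auxiliary process.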
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 2011-02-03 21:51, Mark Kirkwood wrote: The cases I've seen in production typically involve outgrowing optimizer parameter settings (e.g. work_mem, effective_cache_size) as the application dataset gets bigger over time. An argument in favour of the DBMS maintaining a running estimate of such things. -- Jeremy
Re: [HACKERS] [PERFORM] Slow count(*) again...
On PostgreSQL, the difference between no hints and hints for that one query with skewed data is that the query finishes a little faster. On some others, which shall remain nameless, it is the difference between finishing in seconds or days, or maybe never. Hints can be useful, but I can also see why they are not a top priority. They are rarely needed, and only when working around a bug. If you want them so badly, you have the source; write a contrib module (can you do that on Oracle or MSSQL?). If I have a choice between the developers spending time on implementing hints, and spending time on improving the optimiser, I'll take the optimiser. Tom Kyte agrees: http://asktom.oracle.com/pls/asktom/f?p=100:11:0P11_QUESTION_ID:8912905298920 http://tkyte.blogspot.com/2006/08/words-of-wisdom.html Oracle can be faster on count queries, but that is the only case I have seen. Generally on most other queries, especially when they involve complex joins, or indexes on text fields, PostgreSQL is faster on the same hardware.
Re: [HACKERS] [PERFORM] Slow count(*) again...
On 04/02/11 13:49, Jeremy Harris wrote: On 2011-02-03 21:51, Mark Kirkwood wrote: The cases I've seen in production typically involve outgrowing optimizer parameter settings: (e.g work_mem, effective_cache_size) as the application dataset gets bigger over time. An argument in favour of the DBMS maintaining a running estimate of such things. That is an interesting idea - I'm not quite sure how it could apply to server config settings (e.g work_mem) for which it would be dangerous to allow the server to increase on the fly, but it sure would be handy to have some sort of query execution memory so that alerts like: STATEMENT: SELECT blah : PARAMETERS blah: using temp file(s), last execution used memory could be generated (this could be quite complex I guess, requiring some sort of long lived statement plan cache). Cheers Mark
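Mark's "query execution memory" idea can be sketched in a few lines: a long-lived cache keyed by normalized statement text that remembers the peak memory each statement needed and flags executions that exceeded work_mem (i.e. likely spilled to temp files). This is a hypothetical illustration of the proposal, not an existing PostgreSQL feature; the class and its alert format are invented for the sketch.

```python
class StatementMemoryCache:
    """Toy model of a long-lived statement memory cache: record peak memory
    per normalized statement and alert when an execution exceeds work_mem."""

    def __init__(self, work_mem_kb):
        self.work_mem_kb = work_mem_kb
        self.peak_kb = {}  # normalized statement text -> peak memory seen (kB)

    def record(self, statement, used_kb):
        # Crude normalization: collapse whitespace and case-fold, so repeated
        # executions of the same statement share one cache entry.
        key = " ".join(statement.split()).lower()
        self.peak_kb[key] = max(used_kb, self.peak_kb.get(key, 0))
        if used_kb > self.work_mem_kb:
            return ("statement used temp file(s): last execution needed "
                    "%d kB, work_mem is %d kB" % (used_kb, self.work_mem_kb))
        return None

cache = StatementMemoryCache(work_mem_kb=4096)
cache.record("SELECT * FROM t ORDER BY x", 1024)      # fits: no alert
alert = cache.record("SELECT * FROM t ORDER BY x", 8192)  # spill: alert
```

As Mark notes, raising work_mem automatically from such data would be dangerous; the cache only surfaces the evidence so a DBA can act on it.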
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 7:39 PM, da...@lang.hm wrote: Yeah, but you'll be passing the entire table through this separate process that may only need to see 1% of it or less on a large table. If you want to write the code and prove it's better than what we have now, or some other approach that someone else may implement in the meantime, hey, this is an open source project, and I like improvements as much as the next guy. But my prediction for what it's worth is that the results will suck. :-) I will point out that 1% of a very large table can still be a lot of disk I/O that is avoided (especially if it's random I/O that's avoided) Sure, but I think that trying to avoid it will be costly in other ways - you'll be streaming a huge volume of data through some auxiliary process, which will have to apply some algorithm that's very different from the one we use today. The reality is that I think there's little evidence that the way we do ANALYZE now is too expensive. It's typically very cheap and works very well. It's a bit annoying when it fires off in the middle of a giant data load, so we might need to change the time of it a little, but if there's a problem with the operation itself being too costly, this is the first I'm hearing of it. We've actually worked *really* hard to make it cheap. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 7:39 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: reality. As a matter of fact, Oracle RDBMS on the same machine will regularly beat PgSQL in performance. That has been my experience so far. I even posted counting query results. It sure is, but those count queries didn't run faster because of query planner hints. They ran faster because of things like index-only scans, fast full index scans, asynchronous I/O, and parallel query. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, 3 Feb 2011, Robert Haas wrote: On Thu, Feb 3, 2011 at 7:39 PM, da...@lang.hm wrote: Yeah, but you'll be passing the entire table through this separate process that may only need to see 1% of it or less on a large table. If you want to write the code and prove it's better than what we have now, or some other approach that someone else may implement in the meantime, hey, this is an open source project, and I like improvements as much as the next guy. But my prediction for what it's worth is that the results will suck. :-) I will point out that 1% of a very large table can still be a lot of disk I/O that is avoided (especially if it's random I/O that's avoided) Sure, but I think that trying to avoid it will be costly in other ways - you'll be streaming a huge volume of data through some auxiliary process, which will have to apply some algorithm that's very different from the one we use today. The reality is that I think there's little evidence that the way we do ANALYZE now is too expensive. It's typically very cheap and works very well. It's a bit annoying when it fires off in the middle of a giant data load, so we might need to change the time of it a little, but if there's a problem with the operation itself being too costly, this is the first I'm hearing of it. We've actually worked *really* hard to make it cheap. I could be misunderstanding things here, but my understanding is that it's 'cheap' in that it has little impact on the database while it is running. 
the issue here is that the workflow is: load data, analyze, start work. So the cost of analyze in this workflow is not a 1% impact on query speed for the next X time; it's that the database can't be used for the next X time while we wait for analyze to finish running. I don't understand why the algorithm would have to be so different than what's done today; surely the analyze thread could easily be tweaked to ignore the rest of the data (assuming we don't have the thread sending the data to analyze do the filtering) David Lang
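The "fork of the insert stream" idea above maps naturally onto single-pass reservoir sampling (the classic Algorithm R): the analyzer keeps a bounded, uniform sample of the rows flowing to disk without issuing any reads of its own. This is a sketch of the technique under discussion, not PostgreSQL's actual ANALYZE implementation, and the sample size is an illustrative assumption.

```python
import random

def reservoir_sample(row_stream, k, rng=random.Random(42)):
    """Algorithm R: maintain a uniform k-row sample of a stream in one pass.
    A hypothetical 'analyze fork' of the load stream could keep this
    up to date for the whole insert without any extra table I/O."""
    reservoir = []
    for n, row in enumerate(row_stream):
        if n < k:
            reservoir.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, n)          # row n survives with probability k/(n+1)
            if j < k:
                reservoir[j] = row
    return reservoir

# Sample 300 rows from a simulated million-row load in a single pass.
sample = reservoir_sample(range(1_000_000), k=300)
```

This also illustrates Robert's cost concern: every loaded row still passes through the sampler even though only k rows are kept, so the saving is in avoided table reads, not in per-row CPU work.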
Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...
http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want No. 2 on the list. Heck, *I* wrote that text. I quote: Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. That seems pretty straightforward. There are even links to prior discussions about what kind of system would work. I don't think this text needs any adjustment; that's our clear consensus on the hint issue: we want a tool which works better than what we've seen in other databases. Quite frankly, the main reason why most DBMSes have a hinting system has nothing to do with the quality of the optimizer and everything to do with DBAs who think they're smarter than the optimizer (incorrectly). Oracle has a darned good query optimizer, and SQL Server's is even better. However, there are a lot of undereducated or fossilized DBAs out there who don't trust the query planner and want to override it in fairly arbitrary ways; I refer you to the collected works of Dan Tow, for example. In many cases hints are used by DBAs in emergency situations because they are easier than figuring out what the underlying issue is, even when that could be done relatively simply. Improving diagnostic query tools would be a much better approach here; for example, the team working on hypothetical indexes has a lot to offer. If you can figure out what's really wrong with the query in 10 minutes, you don't need a hint. Yes, I occasionally run across cases where having a query tweaking system would help me fix a pathological failure in the planner. However, even on data warehouses that's less than 0.1% of the queries I deal with, so this isn't exactly a common event. And any hinting system we develop needs to address those specific cases, NOT a hypothetical case which can't be tested. 
Otherwise we'll implement hints which actually don't improve queries. Commercial DBMSes have to give in to what their big paying customers want, no matter how stupid it is. I'm grateful that I can work on a DBMS -- the third most popular SQL DBMS in the world -- which can focus on quality instead. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 8:37 PM, da...@lang.hm wrote: On Thu, 3 Feb 2011, Robert Haas wrote: On Thu, Feb 3, 2011 at 7:39 PM, da...@lang.hm wrote: Yeah, but you'll be passing the entire table through this separate process that may only need to see 1% of it or less on a large table. If you want to write the code and prove it's better than what we have now, or some other approach that someone else may implement in the meantime, hey, this is an open source project, and I like improvements as much as the next guy. But my prediction for what it's worth is that the results will suck. :-) I will point out that 1% of a very large table can still be a lot of disk I/O that is avoided (especially if it's random I/O that's avoided) Sure, but I think that trying to avoid it will be costly in other ways - you'll be streaming a huge volume of data through some auxiliary process, which will have to apply some algorithm that's very different from the one we use today. The reality is that I think there's little evidence that the way we do ANALYZE now is too expensive. It's typically very cheap and works very well. It's a bit annoying when it fires off in the middle of a giant data load, so we might need to change the time of it a little, but if there's a problem with the operation itself being too costly, this is the first I'm hearing of it. We've actually worked *really* hard to make it cheap. I could be misunderstanding things here, but my understanding is that it's 'cheap' in that it has little impact on the database while it is running. I mean that it's cheap in that it usually takes very little time to complete. The issue here is that the workflow is: load data; analyze; start work. So the cost of analyze in this workflow is not a 1% impact on query speed for the next X time, it's that the database can't be used for the next X time while we wait for analyze to finish running. OK.
I don't understand why the algorithm would have to be so different from what's done today; surely the analyze thread could easily be tweaked to ignore the rest of the data (assuming we don't have the thread sending the data to analyze do the filtering). If you want to randomly pick 10,000 rows out of all the rows that are going to be inserted in the table without knowing in advance how many there will be, how do you do that? Maybe there's an algorithm, but it's not obvious to me. But mostly, I question how expensive it is to have a second process looking at the entire table contents vs. going back and rereading a sample of rows at the end. I can't remember anyone ever complaining ANALYZE took too long to run. I only remember complaints of the form I had to remember to manually run it and I wish it had just happened by itself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
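For what it's worth, there is a textbook answer to Robert's question: reservoir sampling keeps a fixed-size uniform random sample of a stream without knowing its length in advance. A minimal sketch in Python (this is just the classic algorithm, not a claim about how ANALYZE is or should be implemented):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, row in enumerate(stream):
        if i < k:
            sample.append(row)        # fill the reservoir first
        else:
            # replace a random slot with probability k/(i+1)
            j = random.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# e.g. sample 100 "rows" from a stream of 1,000,000 without buffering them all
sample = reservoir_sample(range(1_000_000), 100)
```

Each incoming row is seen once, memory stays bounded at k rows, and every row of the stream ends up in the sample with equal probability, which is exactly the "pick 10,000 rows without knowing the total" case.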
Re: [HACKERS] [PERFORM] Slow count(*) again...
I can't remember anyone ever complaining ANALYZE took too long to run. I only remember complaints of the form I had to remember to manually run it and I wish it had just happened by itself. Robert, This sounds like an argument in favor of an implicit ANALYZE after all COPY statements, and/or an implicit autoanalyze check after all INSERT/UPDATE statements. -Conor
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, 2011-02-03 at 18:12 -0800, Conor Walsh wrote: I can't remember anyone ever complaining ANALYZE took too long to run. I only remember complaints of the form I had to remember to manually run it and I wish it had just happened by itself. Robert, This sounds like an argument in favor of an implicit ANALYZE after all COPY statements, and/or an implicit autoanalyze check after all INSERT/UPDATE statements. Well, that already happens. Assuming you insert, update, or copy more rows than the threshold implied by autovacuum_analyze_scale_factor, autovacuum is going to analyze on the next run. The default is 0.1, so it certainly doesn't take much. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
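Concretely, the autoanalyze check JD describes compares tuples changed since the last analyze against a base threshold plus the scale factor times the table size. A quick sketch of that arithmetic (the helper name is mine; the 50 and 0.1 defaults match the documented settings of this era):

```python
def needs_autoanalyze(tuples_changed, reltuples,
                      analyze_threshold=50, scale_factor=0.1):
    # Autoanalyze fires when changed tuples exceed
    # autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples
    return tuples_changed > analyze_threshold + scale_factor * reltuples

# A 1,000,000-row table needs more than 100,050 changed tuples to qualify,
# while a freshly loaded empty table qualifies after just 51 rows.
big_table = needs_autoanalyze(100_051, 1_000_000)
empty_table = needs_autoanalyze(51, 0)
```

This is why bulk loads into an empty table trip the check almost immediately; the catch discussed upthread is only that the daemon evaluates it on its next wakeup, not synchronously with the load.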
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 6:33 PM, Joshua D. Drake j...@commandprompt.com wrote: Well that already happens... My understanding is that auto-analyze will fire only after my transaction is completed, because it is a separate daemon. If I do like so: BEGIN; COPY ...; -- Dangerously un-analyzed SELECT complicated-stuff ...; END; Auto-analyze does not benefit me, or might not because it won't fire often enough. I agree that analyze is very fast, and it often seems to me like the cost/benefit ratio suggests making auto-analyze even more aggressive. Disclaimer/disclosure: I deal exclusively with very large data sets these days, so analyzing all the time is almost a highly effective worst-case amortization. I understand that constant analyze is not so great in, say, an OLTP setting. But if the check is cheap, making auto-analyze more integrated and less daemon-driven might be a net win. I'm not sure. -Conor
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 5:39 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: Actually, it is not unlike a religious dogma, only stating that hints are bad. It even says so in the wiki. The arguments are There's been considerably more output than hints bad! Hulk Smash! 1) Refusal to implement hints is motivated by distrust toward users, citing that some people may mess things up. It's more about creating a knob that will create more problems than it solves. Which I get. And making sure that if you make such a knob that it'll do the least damage and give the most usefulness. Until a good proposal and some code to do it shows up, we're all just waving our hands around describing different parts of the elephant. 2) All other databases have them. This is a major feature and if I were in the MySQL camp, I would use it as an argument. Asking me for some proof is missing the point. All other databases have hints precisely because they are useful. Uh, two points. 1: Argumentum Ad Populum. Just because it's popular doesn't mean it's right. 2: Other databases have them because their optimizers can't make the right decision even most of the time. Yes they're useful, but like a plastic bad covering a broken car window, they're useful because they cover something that's inherently broken. Assertion that only Postgres is so smart that can operate without hints doesn't match the reality. Again, you're twisting what people have said. The point being that while PostgreSQL makes mistakes, we'd rather concentrate on making the planner smarter than giving it a lobotomy and running it remotely like a robot. As a matter of fact, Oracle RDBMS on the same machine will regularly beat PgSQL in performance. Yes. And this has little to do with hints. It has to do with years of development lead with THOUSANDS of engineers who can work on the most esoteric corner cases in their spare time. Find the pg project a couple hundred software engineers and maybe we'll catch Oracle a little quicker.
Otherwise we'll have to marshal our resources to do the best we can on the project, and that means avoiding maintenance black holes and having the devs work on the things that give the most benefit for the cost. Hints are something only a tiny percentage of users could actually use and use well. Write a check, hire some developers and get the code done and present it to the community. If it's good and works it'll likely get accepted. Or use EDB, since it has Oracle compatibility in it. That has been my experience so far. I even posted counting query results. 3) Hints are a make-it-or-break-it feature. They're absolutely needed in the fire-extinguishing situations. I've been using pg since 6.5.2. I've used Oracle since version 8 or so. I have never been in a situation with PostgreSQL where I couldn't fix the problem with either tuning, query editing, or asking Tom for a patch for a problem I found in it. Turnaround time on the last patch that was made to fix my problem was somewhere in the 24-hour range. If Oracle can patch their planner that fast, let me know. I see no arguments to say otherwise and until that ridiculous we don't want hints dogma is on the wiki, this is precisely what it is: a dogma. Dogmas do not change and I am sorry that you don't see it that way. However, this discussion No, it's not dogma, you need to present a strong coherent argument, not threaten people on the list etc.
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, 2011-02-03 at 16:50 -0500, Mladen Gogala wrote: Chris Browne wrote: Well, the community declines to add hints until there is actual consensus on a good way to add hints. OK. That's another matter entirely. Who should make that decision? Is there a committee or a person who would be capable of making that decision? Admittedly I haven't read this whole discussion, but it seems like hints might be too poorly defined right now. If by hints we mean some mechanism to influence the planner in a more fine-grained way, I could imagine that some proposal along those lines might gain significant support. But, as always, it depends on the content and quality of the proposal more than the title. If someone has a thoughtful proposal that tries to balance things like:
* DBA control versus query changes/comments
* compatibility across versions versus finer plan control
* allowing the existing optimizer to optimize portions of the query while controlling other portions
* indicating costs and cardinalities versus plans directly
I am confident that such a proposal will gain traction among the community as a whole. However, a series of proposals for individual hacks for specific purposes will probably be rejected. I am in no way implying that you are approaching it this way -- I am just trying to characterize an approach that won't make progress. Regards, Jeff Davis
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 7:05 PM, Robert Haas robertmh...@gmail.com wrote: If you want to randomly pick 10,000 rows out of all the rows that are going to be inserted in the table without knowing in advance how many there will be, how do you do that? Maybe you could instead just have it use some % of the rows going by? Just a guess.
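Keeping some fixed percentage of the rows going by is indeed trivial to do in one pass (Bernoulli sampling), but unlike a fixed-size sample its size grows with the load, which matters when the statistics target is a fixed row count. A small sketch of that trade-off (the `rng` parameter is only there to make the behavior testable):

```python
import random

def percent_sample(stream, fraction, rng=random.random):
    """Keep each row independently with the given probability (Bernoulli sampling)."""
    return [row for row in stream if rng() < fraction]

# With a 1% rate, loading 10x more rows yields a ~10x larger sample --
# so the memory needed is unbounded, unlike a fixed-size reservoir.
small = percent_sample(range(10_000), 0.01)
```

That unbounded growth is presumably why a fixed-size scheme is the more natural fit for the use case Robert describes.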
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 02/03/2011 04:56 PM, Greg Smith wrote: Scott Marlowe wrote: On Thu, Feb 3, 2011 at 8:57 AM, gnuo...@rcn.com wrote: Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And I'm pretty sure multi-core query processing is in the TODO list. Not sure anyone's working on it tho. Writing a big check might help. Work on the exciting parts people are interested in is blocked behind completely mundane tasks like coordinating how the multiple sessions are going to end up with a consistent view of the database. See Export snapshots to other sessions at http://wiki.postgresql.org/wiki/ClusterFeatures for details on that one. Parallel query works well for accelerating CPU-bound operations that are executing in RAM. The reality here is that while the feature sounds important, these situations don't actually show up that often. There are exactly zero clients I deal with regularly who would be helped out by this. The ones running web applications whose workloads do fit into memory are more concerned about supporting large numbers of users, not optimizing things for a single one. And the ones who have so much data that single users running large reports would seemingly benefit from this are usually disk-bound instead. The same sort of situation exists with SSDs. Take out the potential users whose data can fit in RAM instead, take out those who can't possibly get an SSD big enough to hold all their stuff anyway, and what's left in the middle is not very many people. In a database context I still haven't found anything better to do with an SSD than to put mid-sized indexes on it, ones a bit too large for RAM but not so big that only regular hard drives can hold them.
I would rather strongly disagree with the suggestion that embracing either of these fancy but not really as functional as they appear at first approaches is critical to PostgreSQL's future. They're specialized techniques useful to only a limited number of people. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books 4 cores is cheap and popular now, 6 in a bit, 8 next year, 16/24 cores in 5 years. You can do 16 cores now, but it's a bit expensive. I figure hundreds of cores will be expensive in 5 years, but possible, and available. CPUs won't get faster, but HDs and SSDs will. To have one database connection, which runs one query, run fast, it's going to need multi-core support. That's not to say we need parallel queries. Or we need multiple backends to work on one query. We need one backend, working on one query, using mostly the same architecture, to just use more than one core. You'll notice I used _mostly_ and _just_, and have no knowledge of PG internals, so I fully expect to be wrong. My point is, there must be levels of threading, yes? If a backend has data to sort, has it collected, nothing locked, what would it hurt to use multi-core sorting? -- OR -- Threading (and multicore), to me, always mean queues. What if new types of backends were created that did simple things, that normal backends could distribute work to, then go off and do other things, and come back to collect the results? I thought I read a paper someplace that said shared-cache (L1/L2/etc) multicore CPUs would start getting really slow at 16/32 cores, and that message passing was the way forward past that. If PG started aiming for 128 core support right now, it should use some kinda message passing with queues thing, yes?
-Andy
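The multi-core sorting idea Andy floats is usually structured as sort independent chunks, then merge the sorted runs, essentially a parallel external sort. A toy illustration of that shape (the chunk sorts run serially here for simplicity; they are the step a backend could hand to worker processes -- this is not PostgreSQL code):

```python
import heapq

def chunked(seq, n):
    """Split a sequence into consecutive chunks of at most n items."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

def parallel_style_sort(rows, chunk_size=4):
    # Each chunk sort is independent of the others, so these are the
    # units that could be farmed out to separate cores.
    runs = [sorted(chunk) for chunk in chunked(rows, chunk_size)]
    # A single cheap k-way merge combines the sorted runs at the end.
    return list(heapq.merge(*runs))

result = parallel_style_sort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0])
```

The merge step is sequential but linear-time, so most of the O(n log n) work sits in the independently parallelizable chunk sorts.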
Re: [HACKERS] [PERFORM] Slow count(*) again...
Scott Marlowe wrote: Yes they're useful, but like a plastic bad covering a broken car window, they're useful because they cover something that's inherently broken. Awesome. Now we have a car anology, with a funny typo no less. Plastic bad, I love it. This is real progress toward getting all the common list argument idioms aired out. All we need now is a homage to Mike Godwin and we can close this down. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Does auto-analyze work on dirty writes? (was: Re: [HACKERS] [PERFORM] Slow count(*) again...)
On 02/03/2011 09:45 PM, Conor Walsh wrote: My understanding is that auto-analyze will fire only after my transaction is completed, because it is a separate daemon. If I do like so: BEGIN; COPY ...; -- Dangerously un-analyzed SELECT complicated-stuff ...; END; Auto-analyze does not benefit me, or might not because it won't fire often enough. I agree that analyze is very fast, and it often seems to me like the cost/benefit ratio suggests making auto-analyze even more aggressive. The count discussion is boring. Nothing new there. But auto-analyze on dirty writes does interest me. :-) My understanding is: 1) Background daemon wakes up and checks whether a number of changes have happened to the database, irrespective of transaction boundaries. 2) Background daemon analyzes a percentage of rows in the database for statistical data, irrespective of row visibility. 3) Analyze is important for both visible rows and invisible rows, as plan execution is impacted by invisible rows. As long as they are part of the table, they may impact the queries performed against the table. 4) It doesn't matter if the invisible rows are invisible because they are not yet committed, or because they are not yet vacuumed. Would somebody in the know please confirm the above understanding for my own peace of mind? Thanks, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 8:40 PM, Greg Smith g...@2ndquadrant.com wrote: Scott Marlowe wrote: Yes they're useful, but like a plastic bad covering a broken car window, they're useful because they cover something that's inherently broken. Awesome. Now we have a car anology, with a funny typo no less. Plastic bad, I love it. This is real progress toward getting all the common list argument idioms aired out. All we need now is a homage to Mike Godwin and we can close this down. It's not so much a car analogy as a plastic bad analogy.
Re: [HACKERS] [PERFORM] Slow count(*) again...
Scott Marlowe wrote: It's not so much a car analogy as a plastic bad analogy. Is that like a Plastic Ono Band? Because I think one of those is the only thing holding the part of my bumper I smashed in the snow on right now. I could be wrong about the name. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] [PERFORM] Slow count(*) again...
On Thu, Feb 3, 2011 at 8:56 PM, Greg Smith g...@2ndquadrant.com wrote: Scott Marlowe wrote: It's not so much a car analogy as a plastic bad analogy. Is that like a Plastic Ono Band? Because I think one of those is the only thing holding the part of my bumper I smashed in the snow on right now. I could be wrong about the name. No, that's a plastic oh no! band you have.
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
Andy Colson wrote: CPUs won't get faster, but HDs and SSDs will. To have one database connection, which runs one query, run fast, it's going to need multi-core support. My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers, they're the ones that have 25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once. That's not to say we need parallel queries. Or we need multiple backends to work on one query. We need one backend, working on one query, using mostly the same architecture, to just use more than one core. That's exactly what we mean when we say parallel query in the context of a single server. My point is, there must be levels of threading, yes? If a backend has data to sort, has it collected, nothing locked, what would it hurt to use multi-core sorting? Optimizer nodes don't run that way. The executor pulls rows out of the top of the node tree, which then pulls from its children, etc. If you just blindly ran off and executed every individual node to completion in parallel, that's not always going to be faster--could be a lot slower, if the original query never even needed to execute portions of the tree. When you start dealing with all of the types of nodes that are out there it gets very messy in a hurry. Decomposing the nodes of the query tree into steps that can be executed in parallel usefully is the hard problem hiding behind the simple idea of use all the cores! I thought I read a paper someplace that said shared-cache (L1/L2/etc) multicore CPUs would start getting really slow at 16/32 cores, and that message passing was the way forward past that.
If PG started aiming for 128 core support right now, it should use some kinda message passing with queues thing, yes? There already is a TupleStore type that is going to serve as the message being sent between the client backends. Unfortunately we won't get anywhere near 128 cores without addressing the known scalability issues that are in the code right now, ones you can easily run into even with 8 cores. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
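The pull model Greg describes (often called the Volcano iterator model) is easy to picture as a tree of generators, where a parent only pulls the rows it actually needs, so children never blindly run to completion. A hypothetical sketch, not PostgreSQL's executor:

```python
def seq_scan(table):
    # Leaf node: yields one row at a time on demand.
    for row in table:
        yield row

def filter_node(child, predicate):
    # Pulls from its child, passing through only matching rows.
    for row in child:
        if predicate(row):
            yield row

def limit_node(child, n):
    # Stops pulling after n rows -- the child never runs to completion.
    for i, row in enumerate(child):
        if i >= n:
            return
        yield row

table = range(1_000_000)
plan = limit_node(filter_node(seq_scan(table), lambda r: r % 2 == 0), 3)
rows = list(plan)  # only a handful of table rows are ever scanned
```

This shows why "run every node to completion in parallel" can lose badly: the limit node here makes most of the scan's work unnecessary, and a pull-based tree never performs it.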
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 9:00 PM, Greg Smith g...@2ndquadrant.com wrote: Andy Colson wrote: CPUs won't get faster, but HDs and SSDs will. To have one database connection, which runs one query, run fast, it's going to need multi-core support. My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers, they're the ones that have 25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once. The most common case where I can use more than one core is loading data, and pg_restore supports parallel restore threads, so that takes care of that pretty well.
Re: [HACKERS] [PERFORM] Slow count(*) again...
Scott Marlowe wrote: No, that's a plastic oh no! band you have. Wow, right you are. So with this type holding together my Japanese car, if it breaks and parts fall off, I'm supposed to yell Oh, no! There goes Tokyo!, yes? -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On 02/03/2011 10:00 PM, Greg Smith wrote: Andy Colson wrote: CPUs won't get faster, but HDs and SSDs will. To have one database connection, which runs one query, run fast, it's going to need multi-core support. My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers, they're the ones that have 25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once. Yes, I agree... for today. If you gaze into 5 years... double the core count (but not the speed), double the IO rate. What do you see? My point is, there must be levels of threading, yes? If a backend has data to sort, has it collected, nothing locked, what would it hurt to use multi-core sorting? Optimizer nodes don't run that way. The executor pulls rows out of the top of the node tree, which then pulls from its children, etc. If you just blindly ran off and executed every individual node to completion in parallel, that's not always going to be faster--could be a lot slower, if the original query never even needed to execute portions of the tree. When you start dealing with all of the types of nodes that are out there it gets very messy in a hurry. Decomposing the nodes of the query tree into steps that can be executed in parallel usefully is the hard problem hiding behind the simple idea of use all the cores! What if... the nodes were run in separate threads, and interconnected via queues? A node would not have to run to completion either. A queue could be set up to have a max number of items. When a node adds 5 out of 5 items it would go to sleep. Its parent node, removing one of the items, could wake it up.
-Andy
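Andy's bounded-queue idea is the classic producer/consumer pattern with backpressure: the producing node blocks once the queue holds its maximum, and is woken when the consumer drains an item. A toy sketch with Python threads, illustrating only the sleep/wake mechanics he describes, nothing about PostgreSQL's actual executor:

```python
import queue
import threading

def producer(out_q, n):
    for i in range(n):
        out_q.put(i)        # blocks (goes to sleep) whenever the queue is full
    out_q.put(None)         # sentinel: no more rows

def consumer(in_q, results):
    while True:
        item = in_q.get()   # removing an item wakes a sleeping producer
        if item is None:
            break
        results.append(item * 10)   # stand-in for per-row work in the parent node

q = queue.Queue(maxsize=5)  # Andy's "max items" cap provides the backpressure
results = []
t1 = threading.Thread(target=producer, args=(q, 20))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
```

Here 20 items flow through a queue that never holds more than 5, so neither side runs to completion ahead of the other; that bounded buffer is exactly what keeps memory use flat in a pipelined node tree.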
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
On Thu, Feb 3, 2011 at 9:19 PM, Andy Colson a...@squeakycode.net wrote: On 02/03/2011 10:00 PM, Greg Smith wrote: Andy Colson wrote: CPUs won't get faster, but HDs and SSDs will. To have one database connection, which runs one query, run fast, it's going to need multi-core support. My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers, they're the ones that have 25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once. Yes, I agree... for today. If you gaze into 5 years... double the core count (but not the speed), double the IO rate. What do you see? I run a cluster of pg servers under Slony replication, and we have 112 cores between three servers, soon to go to 144 cores. We have no need for individual queries to span the cores, honestly. Our real limit is the ability to get all those cores working at the same time on individual queries efficiently without thundering herd issues. Yeah, it's only one datapoint, but for us, with a lot of cores, we need each one to run one query as fast as it can.