Re: [PERFORM] Odd behavior with temp usage logging
Josh Berkus wrote:
> Folks,
>
> I turned on temp file logging for PostgreSQL to see if I needed to
> adjust work_mem. Oddly, it's logging file usage up to 33 times per
> executed query (and no, the query isn't large enough to need 33
> separate sorts).

Are you sure there's not a sort happening inside a loop?

It might help if you posted the actual log messages along with the
output of `EXPLAIN ANALYZE' on your query.

--
Craig Ringer
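For anyone reproducing Josh's setup, a minimal sketch of the logging
being discussed; log_temp_files is the standard GUC for this (added in
8.3), and the values shown are illustrative, not recommendations:

    -- Log every temp file a backend spills (0 = no size threshold;
    -- a positive value logs only files larger than that many kB).
    SET log_temp_files = 0;

    -- Sorts and hashes that exceed work_mem are what spill to temp
    -- files on disk.
    SET work_mem = '4MB';  -- illustrative value only

    -- The matching server-log output then looks roughly like:
    -- LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp12345.0", size 194080000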
Re: [PERFORM] [PERFORMANCE] Buying hardware
>> 4) Use software raid unless you have the money to buy a raid
>> controller, in which case here is the ranking of them
>
> Areca and 3ware/Escalade are the two best controllers for the money
> out right now. They tend to take turns being the absolute best as
> they release new cards. Newer Arecas (the 1680 series) use an
> ethernet port for traps and such, so there's no need for special
> software that might be kernel version dependent.
>
> Both cost about the same for their top of the line cards.
>
> Make sure you have battery backed cache.

While browsing the net I found a server with this raid controller:

HP Smart Array P400/512MB BBWC Controller

How does one find out what this actually is, and whether it is any
good? I guess they just stuck their "HP" label onto some other raid
controller? I could write to HP, but I doubt that would help much. I
could also look through the archives of this mailing list; when I find
the time, I'll do so and try to create a wiki page.

The problem with this kind of built-in hardware is that it might suck,
and then you can't plug any other hardware into the box.
Re: [PERFORM] [PERFORMANCE] Buying hardware
da...@lang.hm wrote:
> On Mon, 26 Jan 2009, David Rees wrote:
>
>> And yes, the more memory you can squeeze into the machine, the
>> better, though you'll find that after a certain point, price starts
>> going up steeply. Of course, if you only have a 15GB database, once
>> you reach 16GB of memory you've pretty much hit the point of
>> diminishing returns.
>
> actually, you need more memory than that. besides the data itself you
> would want memory for several other things, among them:
>
> 1. your OS
> 2. your indexes
> 3. your per-request memory allocations (for sorting, etc) -- this is
>    highly dependent on your workload (type and number of parallel
>    requests)
> 4. 'dead' tuples in your tables (that will be cleared by a vacuum,
>    but haven't been yet)
>
> and probably other things as well.
>
> I don't know how large a database will fit in 16G of ram, but I
> suspect it's closer to 8G than 15G.
>
> any experts want to throw out a rule-of-thumb here?
>
> David Lang

It depends on what else the server is doing. If you're running the
whole LAPP stack on a single box, for example, the PHP interpreter will
need space for intermediate data; Apache and the Linux kernel will use
less space. If PostgreSQL is the only thing running on the server,
though, assuming 64-bit Linux, most of the RAM in a box that large
should be in either memory you've deliberately set aside for internal
PostgreSQL data structures or the Linux page cache. Tools are starting
to be built that will show you how RAM is allocated, now that recent
kernels (2.6.25+) do a better job of accounting for RAM pages. So I
would expect the total memory dedicated to the database functionality
to be much closer to 15 GB than 8 GB.

Given large amounts of RAM and only PostgreSQL running on the server,
the interesting trade-offs become:

a. How little memory can you buy without putting your service level
   agreements at risk?

b. How do you allocate the PostgreSQL-specific memory buffers at the
   expense of the Linux page cache for optimum performance?

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
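As a starting point for trade-off (b), the PostgreSQL side of the
allocation can be read straight out of pg_settings; a minimal sketch
(the parameter names are standard GUCs, the commentary is mine):

    -- The main memory-related knobs. shared_buffers is memory
    -- PostgreSQL manages itself; effective_cache_size is only a
    -- planner hint describing how much Linux page cache you expect
    -- to have available.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem',
                   'maintenance_work_mem', 'effective_cache_size');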
Re: [PERFORM] [PERFORMANCE] Buying hardware
M. Edward (Ed) Borasky wrote:
> Given large amounts of RAM and only PostgreSQL running on the server,
> the interesting trade-offs become:
>
> a. How little memory can you buy without putting your service level
>    agreements at risk?
>
> b. How do you allocate the PostgreSQL-specific memory buffers at the
>    expense of the Linux page cache for optimum performance?

c. What do I do with the idle cores? :) (Or: how can I exploit them by
changing my database design or the PostgreSQL architecture?)

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
Re: [PERFORM] [PERFORMANCE] Buying hardware
On Tue, 27 Jan 2009, M. Edward (Ed) Borasky wrote:

> da...@lang.hm wrote:
>> On Mon, 26 Jan 2009, David Rees wrote:
>>> And yes, the more memory you can squeeze into the machine, the
>>> better, though you'll find that after a certain point, price starts
>>> going up steeply. Of course, if you only have a 15GB database, once
>>> you reach 16GB of memory you've pretty much hit the point of
>>> diminishing returns.
>>
>> actually, you need more memory than that. besides the data itself
>> you would want memory for several other things, among them:
>>
>> 1. your OS
>> 2. your indexes
>> 3. your per-request memory allocations (for sorting, etc) -- this is
>>    highly dependent on your workload (type and number of parallel
>>    requests)
>> 4. 'dead' tuples in your tables (that will be cleared by a vacuum,
>>    but haven't been yet)
>>
>> and probably other things as well.
>>
>> I don't know how large a database will fit in 16G of ram, but I
>> suspect it's closer to 8G than 15G.
>>
>> any experts want to throw out a rule-of-thumb here?
>
> There are starting to be some tools built that will show you how RAM
> is allocated, now that recent kernels (2.6.25+) do a better job of
> accounting for RAM pages. So I would expect the total memory
> dedicated to the database functionality to be much closer to 15 GB
> than 8 GB.

that's not quite the opposite of the statement that I was trying to
make.

assuming that you are not running anything else on the system, how
much data can you put on the system and still run entirely out of ram?

the database has its overhead (sort buffers, indexes, per-request
buffers, 'dead tuples', etc), which means that if you have a database
whose uncompressed dump takes 8G, you need substantially more than 8G
of ram to avoid using the disks (other than to store changes).

how much more is the question. I know it is going to vary from
installation to installation, but are there any guidelines that people
can start with?

David Lang
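One concrete way to put numbers on the overhead listed above is to ask
the database itself; a minimal sketch using the standard size
functions (pg_total_relation_size includes indexes and TOAST; the
n_dead_tup column assumes a server new enough, 8.3+, to report it in
pg_stat_user_tables):

    -- Total on-disk footprint of the current database.
    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- Per-table breakdown: heap alone vs. heap plus indexes and
    -- TOAST, plus the dead tuples awaiting vacuum.
    SELECT relname,
           pg_size_pretty(pg_relation_size(relid))       AS heap,
           pg_size_pretty(pg_total_relation_size(relid)) AS with_indexes,
           n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY pg_total_relation_size(relid) DESC
    LIMIT 10;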
Re: [PERFORM] [PERFORMANCE] Buying hardware
da...@lang.hm wrote:
> that's not quite the opposite of the statement that I was trying to
> make.
>
> assuming that you are not running anything else on the system, how
> much data can you put on the system and still run entirely out of
> ram?
>
> the database has its overhead (sort buffers, indexes, per-request
> buffers, 'dead tuples', etc), which means that if you have a database
> whose uncompressed dump takes 8G, you need substantially more than 8G
> of ram to avoid using the disks (other than to store changes).
>
> how much more is the question. I know it is going to vary from
> installation to installation, but are there any guidelines that
> people can start with?

I'm not sure there are any rules of thumb / guidelines for that. My
experience has been that doing no disk I/O, except writing logs to
disk when creating and updating rows, is an unrealistic expectation,
even for "small" databases. The cost is prohibitive, for one thing.
And for capacity planning, what's probably more important is whether
the service level agreements are being met, not whether you're meeting
them purely in RAM or by re-reading data from disk sometimes.

I think it's "easy", however, to solve the inverse problem. Borrow a
huge-memory server from your vendor, put your small database up on it,
run benchmarks, and gradually reduce the amount of memory available
until the performance becomes unacceptable. The tools exist to measure
memory allocations while the benchmarks are running. If you get enough
data points (about five for the simplest models) you can build a model
that you could then "invert" to go the other way -- take a database
size and figure out how much RAM is needed to meet the SLAs.

You don't necessarily have to reboot to reduce available memory --
there are ways to tie up memory without consuming processor or disk
time to do so. But you would need to "poison" the caches between runs,
and restart PostgreSQL if you're modifying its memory allocations.

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
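On the PostgreSQL side, the contrib module pg_buffercache is one of
the tools for measuring memory allocations while a benchmark runs; a
minimal sketch (assumes the module is installed, and covers only
PostgreSQL's own shared_buffers, not the Linux page cache):

    -- Top 10 relations by pages currently cached in shared_buffers.
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    JOIN pg_database d ON b.reldatabase = d.oid
                      AND d.datname = current_database()
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;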
Re: [PERFORM] Odd behavior with temp usage logging
Craig Ringer wrote:
> Josh Berkus wrote:
>> Folks,
>>
>> I turned on temp file logging for PostgreSQL to see if I needed to
>> adjust work_mem. Oddly, it's logging file usage up to 33 times per
>> executed query (and no, the query isn't large enough to need 33
>> separate sorts).
>
> Are you sure there's not a sort happening inside a loop?
>
> It might help if you posted the actual log messages along with the
> output of `EXPLAIN ANALYZE' on your query.

Hmmm, it's possible. The sort sizes seem very large for the query in
question, though. I will have to check when I get back on the system.

That would make a good TODO for tempfile logging, though ... it would
be *far* more useful to collect summary stats per query than to log
each individual sort.

--Josh
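For reference, a sort running inside a loop shows up in EXPLAIN
ANALYZE through the loops= counter on the Sort node. A purely
hypothetical sketch of the pattern Craig is asking about (the table
and column names are invented, not from Josh's workload):

    -- Hypothetical: a correlated subquery re-runs its inner sort once
    -- per outer row, so EXPLAIN ANALYZE reports loops=33 on the Sort
    -- node, and temp file logging could fire once per iteration.
    EXPLAIN ANALYZE
    SELECT o.id,
           (SELECT i.val
            FROM inner_tab i
            WHERE i.ref = o.id
            ORDER BY i.val DESC
            LIMIT 1) AS top_val
    FROM outer_tab o;
    --   ...
    --   ->  Sort  (... rows=... loops=33)
    --         Sort Method:  external merge  Disk: ...kB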
[PERFORM] LIKE Query performance
Hi,

I am relatively new to PostgreSQL (8.1) and facing the following
problem.

We have indexes defined on timestamp and description:

create index description_idx on event
    using btree (description varchar_pattern_ops);

EXPLAIN ANALYZE SELECT event_id, category, current_session_number,
description, event_type_id, realm_name, root_session_number, severity,
source_name, target_key, target_name, timestamp, jdo_version
FROM event WHERE description like '%mismatch%'
ORDER BY timestamp desc;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36267.09..36272.73 rows=2256 width=314) (actual time=19255.075..20345.774 rows=647537 loops=1)
   Sort Key: "timestamp"
   Sort Method:  external merge  Disk: 194080kB
   ->  Seq Scan on event  (cost=0.00..36141.44 rows=2256 width=314) (actual time=0.080..1475.041 rows=647537 loops=1)
         Filter: ((description)::text ~~ '%mismatch%'::text)
 Total runtime: 22547.292 ms
(6 rows)

But a starts-with query does use the index:

EXPLAIN ANALYZE SELECT event_id, category, current_session_number,
description, event_type_id, realm_name, root_session_number, severity,
source_name, target_key, target_name, timestamp, jdo_version
FROM event WHERE description like 'mismatch%'
ORDER BY timestamp desc;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9.26..9.27 rows=1 width=314) (actual time=0.766..0.766 rows=0 loops=1)
   Sort Key: "timestamp"
   Sort Method:  quicksort  Memory: 17kB
   ->  Index Scan using description_idx on event  (cost=0.00..9.25 rows=1 width=314) (actual time=0.741..0.741 rows=0 loops=1)
         Index Cond: (((description)::text ~>=~ 'mismatch'::text) AND ((description)::text ~<~ 'mismatci'::text))
         Filter: ((description)::text ~~ 'mismatch%'::text)
 Total runtime: 0.919 ms
(7 rows)

Are there any tweaks to force pgsql to use the index on description?

Balaji

P.S. The event table has 700k records.
Re: [PERFORM] LIKE Query performance
Only wildspeed (http://www.sai.msu.su/~megera/wiki/wildspeed) has index
support for %text%. But it has limitations.

Oleg

On Tue, 27 Jan 2009, Hari, Balaji wrote:

> Hi,
>
> I am relatively new to PostgreSQL (8.1) and facing the following
> problem.
>
> [...]
>
> Are there any tweaks to force pgsql to use the index on description?
>
> Balaji
>
> P.S. The event table has 700k records.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
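A minimal sketch of what using wildspeed might look like, assuming its
GIN operator class is named wildcard_ops as the wiki page above
describes (unverified here; the module's README is authoritative for
the exact name and install steps):

    -- Assumption: wildspeed provides a GIN opclass named wildcard_ops
    -- for text; the cast is needed because description is varchar.
    CREATE INDEX description_wild_idx ON event
        USING gin ((description::text) wildcard_ops);

    -- With such an index in place, a leading-wildcard LIKE can use
    -- an index scan instead of a seq scan:
    SELECT event_id, description
    FROM event
    WHERE description LIKE '%mismatch%';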