Re: [PERFORM] bizarre query performance question
H. William Connors II wrote:
> fa_assignment has 44184945 records
> fa_assignment_detail has 82196027 records
>
> explain select * from fa_assignment fa
>   JOIN fa_assignment_detail fad ON (fad.assignment_id = fa.assignment_id)
>   where fa.scenario_id = 0;
>
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=581289.72..4940729.76 rows=9283104 width=91)

Are you really expecting 9 million rows in the result? If so, this is
probably a reasonable plan.

>    Hash Cond: (fad.assignment_id = fa.assignment_id)
>    ->  Seq Scan on fa_assignment_detail fad  (cost=0.00..1748663.60 rows=82151360 width=61)
>    ->  Hash  (cost=484697.74..484697.74 rows=4995439 width=30)
>          ->  Bitmap Heap Scan on fa_assignment fa  (cost=93483.75..484697.74 rows=4995439 width=30)
>                Recheck Cond: (scenario_id = 0)
>                ->  Bitmap Index Scan on fa_assignment_idx2  (cost=0.00..92234.89 rows=4995439 width=0)
>                      Index Cond: (scenario_id = 0)

It's restricting on scenario_id, building a bitmap to identify which
disk-blocks will contain one or more matching rows, and then scanning
those. If those 5 million scenario_id=0 rows are spread over 10% of the
blocks then that's a good idea. If it was expecting only a handful of
rows with scenario_id=0 then I'd expect it to switch to a standard
index scan.

If your work_mem is small, try something like:

  set work_mem = '50MB';

before running the query - maybe even larger.

--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
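A minimal sketch of the experiment Richard suggests: the SET is
session-local, so nothing in postgresql.conf changes, and EXPLAIN
ANALYZE (rather than plain EXPLAIN) shows whether the hash actually
fits in memory.

```sql
-- Raise work_mem for this session only, then compare timings/plans.
SET work_mem = '50MB';

EXPLAIN ANALYZE
SELECT *
FROM fa_assignment fa
JOIN fa_assignment_detail fad ON fad.assignment_id = fa.assignment_id
WHERE fa.scenario_id = 0;

RESET work_mem;   -- back to the server default
```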
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
Thanks,

Unfortunately, I can't update pgsql to 8.3 since it's not in debian
stable. So I'm going to play with work_mem and shared_buffers.

With a big shared_buffers pgsql tells me:

  shmget(cle=5432001, taille=11183431680, 03600)

(cle = key, taille = size), so I do:

  echo 13183431680 > /proc/sys/kernel/shmmax

(10Go + 2Go just in case), but pgsql tells me again that there's not
enough shm. How can I compute the right shmmax for my server?

On Wed, 01 Oct 2008 12:36:48 +0100, Richard Huxton [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
>> Hello
>>
>> I'm running pgsql 8.1.11 (from debian stable) on a server with 16GB
>> RAM (Linux helios 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008
>> x86_64 GNU/Linux).
>
> Unless you're committed to this version, I'd seriously look into 8.3
> from backports (or compiled yourself). I'd expect some serious
> performance improvements for the workload you describe.
>
>> I have a table "tickets" with 1 000 000 inserts per month (~2600
>> every 2 hours) (for the moment 1300 rows for 5GB) and I have to
>> extract statistics (number of calls, number of calls less than X
>> seconds, number of new calls, number of calls from new callers, ...)
>
> OK, so not a lot of updates, but big aggregation queries. You might
> want to pre-summarise older data as the system gets larger.
>
>> 1°) The server will handle max 15 queries at a time. So this is my
>> postgresql.conf:
>>
>>   max_connections = 15
>
> Well, I'd allow 20 - just in case.
>
>>   shared_buffers = 995600      # ~1Go
>>   temp_buffers = 1000
>>   work_mem = 512000            # ~512Ko
>
> I'd be tempted to increase work_mem by a lot, possibly even at the
> expense of shared_buffers. You're going to be summarising large
> amounts of data, so the larger the better, particularly as your
> database is currently smaller than RAM. Start with 5MB then try 10MB,
> 20MB and see what difference it makes.
>
>>   maintenance_work_mem = 1048576   # 1Mo
>>   max_fsm_pages = 41522880         # ~40Mo
>>   max_fsm_relations = 8000
>
> See what a vacuum full verbose says for how much free space you need
> to track.
>>   checkpoint_segments = 10
>>   checkpoint_timeout = 3600
>
> With your low rate of updates, shouldn't matter.
>
>>   effective_cache_size = 13958643712   # 13Go
>
> Assuming that's based on what top or free say, that's fine. Don't
> forget it will need to be reduced if you increase work_mem or
> shared_buffers.
>
>>   stats_start_collector = on
>>   stats_command_string = on
>>   stats_block_level = on
>>   stats_row_level = on
>>   autovacuum = off
>
> Make sure you're vacuuming if autovacuum is off.
>
>> How can I optimize the configuration?
>
> Looks reasonable, so far as you can tell from an email. Try playing
> with work_mem though.
>
>> 2°) My queries look like:
>>
>>   SELECT tday AS n,
>>          COUNT(DISTINCT(a.appelant)) AS new_callers,
>>          COUNT(a.appelant) AS new_calls
>>   FROM cirpacks.tickets AS a
>>   WHERE LENGTH(a.appelant) > 4
>>     AND a.service_id IN ( 95, 224, 35, 18 )
>>     AND a.exploitant_id = 66
>>     AND a.tyear = 2008
>>     AND a.tmonth = 08
>
> Index on (tyear,tmonth) might pay off, or one on exploitant_id
> perhaps.
>
>>     AND EXISTS ( SELECT 1 FROM cirpacks.clients AS b
>>                  WHERE b.appelant = a.appelant
>>                    AND b.service_id IN ( 95, 224, 35, 18 )
>>                    AND b.heberge_id = 66
>>                  HAVING to_char(MIN(b.premier_appel), 'MMDD')
>>                       = to_char(a.date, 'MMDD') )
>
> It looks like you're comparing two dates by converting them to text.
> That's probably not the most efficient way of doing it. Might not be
> an issue here.
>
>>   GROUP BY n ORDER BY n;
>>
>> or
>>
>>   select ... SUM( CASE WHEN condition THEN value ELSE 0 END ) ...
>>   FROM cirpacks.tickets
>>   WHERE tyear = ... AND tmonth = ... AND tday = ...
>>     AND audiotel IN ( '...', '...' );
>>
>> or
>>
>>   select ... SUM( CASE WHEN condition THEN value ELSE 0 END ) ...
>>   FROM cirpacks.tickets
>>   WHERE '2007-01-01' <= date AND date <= '2008-08-31'
>>     AND audiotel IN ( '...', '...' );
>>
>> Which indexes are the best?
>
> The only way to find out is to test. You'll want to run EXPLAIN after
> adding each index to see what difference it makes. Then you'll want
> to see what impact this has on overall workload.
>
> Mostly though, I'd try out 8.3 and see if that buys you a free
> performance boost.
> --
>   Richard Huxton
>   Archonet Ltd
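On Richard's point about the to_char() comparison: assuming
premier_appel and date are timestamp or date columns, the same per-day
test can usually be expressed with a cast, which avoids formatting both
sides as text for every row. A sketch against the poster's schema
(untested; note also that 'MMDD' ignores the year, which may or may not
be intended):

```sql
-- Same subquery with the text formatting replaced by a date cast;
-- ::date truncates a timestamp to its day.
AND EXISTS ( SELECT 1 FROM cirpacks.clients AS b
             WHERE b.appelant = a.appelant
               AND b.service_id IN ( 95, 224, 35, 18 )
               AND b.heberge_id = 66
             HAVING MIN(b.premier_appel)::date = a.date::date )
```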
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
[EMAIL PROTECTED] wrote:
> Thanks,
>
> Unfortunately, I can't update pgsql to 8.3 since it's not in debian
> stable.

That's why backports.org was invented :-)
Or does "can't" mean "not allowed to"?

> So I'm going to play with work_mem and shared_buffers.
>
> With a big shared_buffers pgsql tells me:
>
>   shmget(cle=5432001, taille=11183431680, 03600)
>
> so I do:
>
>   echo 13183431680 > /proc/sys/kernel/shmmax
>
> (10Go + 2Go just in case), but pgsql tells me again that there's not
> enough shm. How can I compute the right shmmax for my server?

I'm not seeing anything terribly wrong there. Are you hitting a limit
with shmall?

Oh - and I'm not sure there's much point in having more shared_buffers
than you have data.

Try much larger work_mem first, I think that's the biggest gain for
you.

--
  Richard Huxton
  Archonet Ltd
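For the "how do I compute shmmax" question, a back-of-envelope sketch
(not the exact formula 8.1 uses - the real segment also includes
per-connection and FSM overhead, so these are lower bounds):
shared_buffers is counted in 8KB pages, shmmax is in bytes, and
kernel.shmall on most Linux kernels is counted in 4KB pages.

```shell
# shared_buffers = 995600 (8KB pages) from the posted postgresql.conf:
echo $((995600 * 8192))              # 8155955200 bytes (~7.6GB) for the
                                     # buffer pool alone - not ~1Go
# The failed shmget() asked for 11183431680 bytes, so shmmax must be at
# least that, and kernel.shmall needs at least that many 4KB pages:
echo $(((11183431680 + 4095) / 4096))   # 2730330
```

kernel.shmall can be inspected with `sysctl kernel.shmall` and, like
shmmax, raised via /proc or /etc/sysctl.conf.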
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
Richard Huxton wrote:
> [EMAIL PROTECTED] wrote:
>> Thanks,
>>
>> Unfortunately, I can't update pgsql to 8.3 since it's not in debian
>> stable.
>
> That's why backports.org was invented :-)
> Or does "can't" mean "not allowed to"?

Well, running production servers from backports can be a risky
proposition too, and can land you in situations like the one discussed
in "Debian packages for Postgres 8.2" on the General list.

--
Tommy Gildseth
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
Tommy Gildseth wrote:
> Richard Huxton wrote:
>> [EMAIL PROTECTED] wrote:
>>> Thanks,
>>>
>>> Unfortunately, I can't update pgsql to 8.3 since it's not in debian
>>> stable.
>>
>> That's why backports.org was invented :-)
>> Or does "can't" mean "not allowed to"?
>
> Well, running production servers from backports can be a risky
> proposition too, and can land you in situations like the one discussed
> in "Debian packages for Postgres 8.2" on the General list.

Well, there's a reason why "stable" is a popular choice for production
servers. I must admit that I build from source for my PostgreSQL
packages (because I care which version I run). I was reading one of the
Perl fellows recommending the same.

--
  Richard Huxton
  Archonet Ltd
[PERFORM] Delete performance again
Hello. I have a database with a company table that has a bunch of
related (on delete cascade) tables. It also has a 1-M relation to a
company_descr table.

At some point we found that roughly half of our companies have no
description and we would like to remove them. The first thing I tried
was:

  delete from company where id not in (select company_id from company_descr);

I tried to analyze the command, but unlike other RDBMSs I've used, the
plan did not include the cascaded deletes/checks. That is the first
problem. It was SLOW. To make it faster I did this:

  create temporary table comprm(id) as select id from company;
  delete from comprm where id in (select company_id from company_descr);
  delete from company where id in (select id from comprm);

That was much better. So the question is why postgresql can't do such a
thing itself. But it was only better while the set being removed was
small (~5% of the whole table). As soon as I tried to remove 50%, I ran
into speed problems again. I made sure I have indexes on both ends of
every foreign key. I tried to remove all the cascaded entries myself,
e.g.:

  create temporary table comprm(id) as select id from company;
  delete from comprm where id in (select company_id from company_descr);
  delete from company_alias where company_id in (select id from comprm);
  ...
  delete from company where id in (select id from comprm);

It did not help until I dropped all the constraints before and
recreated them after. Now it runs in 15 minutes, while previously it
could not finish in a day. Is that OK? I'd say some (if not all) of
these optimizations could be done by the postgresql optimizer.
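One more formulation worth trying before the temp-table workaround:
rewriting the NOT IN as an anti-join. On older versions in particular,
NOT IN over a large subquery optimizes poorly, and it also behaves
surprisingly if company_id can be NULL. A sketch using the thread's
table names:

```sql
-- Anti-join instead of NOT IN: lets the planner pick a hash or merge
-- strategy, and is not tripped up by NULLs in company_id.
DELETE FROM company
WHERE NOT EXISTS (
    SELECT 1 FROM company_descr d
    WHERE d.company_id = company.id
);
```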
Re: [PERFORM] Delete performance again
=?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?= [EMAIL PROTECTED] writes:
> delete from company where id not in (select company_id from company_descr);
>
> I've tried to analyze the command, but unlike other RDBMSs I've used,
> the plan did not include the cascaded deletes/checks. That is the
> first problem. It was SLOW.

Usually the reason for that is having forgotten to make an index on the
referencing column(s)?

			regards, tom lane
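To illustrate Tom's point: each foreign key that cascades from company
needs an index on the *referencing* (child) column, otherwise every
deleted company row costs a scan of each child table. A sketch -
company_descr and company_alias are table names from the thread; the
index names are made up:

```sql
-- Without these, each cascaded delete probes the child tables by
-- sequential scan. One such index per referencing column is needed.
CREATE INDEX company_descr_company_id_idx ON company_descr (company_id);
CREATE INDEX company_alias_company_id_idx ON company_alias (company_id);
```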
Re: [PERFORM] dedicated server postgresql 8.1 conf tunning
I played with work_mem, and setting work_mem to more than 256000 does
not change the performance.

I tried to upgrade to 8.3 using etch-backports, but it's a new install,
not an upgrade. So I have to create users and permissions and import
the data again; that scared me, so I want to find other solutions
first. But now I'll try 8.3.

On Thu, 02 Oct 2008 10:36:50 +0200, Tommy Gildseth [EMAIL PROTECTED] wrote:
> Richard Huxton wrote:
>> [EMAIL PROTECTED] wrote:
>>> Thanks,
>>>
>>> Unfortunately, I can't update pgsql to 8.3 since it's not in debian
>>> stable.
>>
>> That's why backports.org was invented :-)
>> Or does "can't" mean "not allowed to"?
>
> Well, running production servers from backports can be a risky
> proposition too, and can land you in situations like the one discussed
> in "Debian packages for Postgres 8.2" on the General list.
>
> --
> Tommy Gildseth
Re: [PERFORM] Delete performance again
2008/10/2 Tom Lane [EMAIL PROTECTED]:
> =?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?= [EMAIL PROTECTED] writes:
>> delete from company where id not in (select company_id from company_descr);
>>
>> I've tried to analyze the command, but unlike other RDBMSs I've
>> used, the plan did not include the cascaded deletes/checks. That is
>> the first problem. It was SLOW.
>
> Usually the reason for that is having forgotten to make an index on
> the referencing column(s)?

Not at all. As you can see below in the original message, simply
extending the query to what should have been done by the optimizer
helps. I'd say the optimizer always uses a fixed plan, not taking into
account that this is a massive update: it does an index lookup of the
child records for each parent record, while it would be much more
effective to remove all the child records in a single table scan. It's
like a trigger "for each record" instead of "for each statement".