Re: [PERFORM] bizarre query performance question

2008-10-02 Thread Richard Huxton
H. William Connors II wrote:
 fa_assignment has 44184945 records
 fa_assignment_detail has 82196027 records
 
 explain select * from fa_assignment fa JOIN fa_assignment_detail fad ON
 (fad.assignment_id = fa.assignment_id) where fa.scenario_id = 0;
 
                                QUERY PLAN
 ---------------------------------------------------------------------------
  Hash Join  (cost=581289.72..4940729.76 rows=9283104 width=91)

Are you really expecting 9 million rows in the result? If so, this is
probably a reasonable plan.
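
A quick way to check is to count instead of fetching everything (this is just
your own query with count(*) in place of *):

  SELECT count(*)
  FROM fa_assignment fa
  JOIN fa_assignment_detail fad ON (fad.assignment_id = fa.assignment_id)
  WHERE fa.scenario_id = 0;

If that comes back far from 9 million, re-run ANALYZE on the tables so the
planner's estimates catch up.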

    Hash Cond: (fad.assignment_id = fa.assignment_id)
    ->  Seq Scan on fa_assignment_detail fad  (cost=0.00..1748663.60 rows=82151360 width=61)
    ->  Hash  (cost=484697.74..484697.74 rows=4995439 width=30)
          ->  Bitmap Heap Scan on fa_assignment fa  (cost=93483.75..484697.74 rows=4995439 width=30)
                Recheck Cond: (scenario_id = 0)
                ->  Bitmap Index Scan on fa_assignment_idx2  (cost=0.00..92234.89 rows=4995439 width=0)
                      Index Cond: (scenario_id = 0)

It's restricting on scenario_id, building a bitmap to identify which
disk-blocks will contain one or more matching rows and then scanning
those. If those 5 million scenario_id=0 rows are spread over 10% of the
blocks then that's a good idea.

If it was expecting only a handful of rows with scenario_id=0 then I'd
expect it to switch to a standard index scan.

If your work_mem is small try something like:
  set work_mem = '50MB';
before running the query - maybe even larger.
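
You can then see what difference it actually makes by re-running your own
query under EXPLAIN ANALYZE (no assumptions here, it's just your query with
timing):

  explain analyze
  select * from fa_assignment fa
  JOIN fa_assignment_detail fad ON (fad.assignment_id = fa.assignment_id)
  where fa.scenario_id = 0;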

-- 
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] dedicated server postgresql 8.1 conf tunning

2008-10-02 Thread paul
Thanks,

Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.

So I'm going to play with work_mem & shared_buffers.

With big shared_buffers pgsql tells me it failed on
shmget(key=5432001, size=11183431680, 03600),
so I do echo 13183431680 > /proc/sys/kernel/shmmax (10GB + 2GB just
in case)

but pgsql tells me again that there's not enough shm.
How can I compute the right shmmax for my server?

On Wed, 01 Oct 2008 12:36:48 +0100, Richard Huxton [EMAIL PROTECTED]
wrote:
 [EMAIL PROTECTED] wrote:
 Hello
 
 I'm running pgsql 8.1.11 (from debian stable) on a server with 16GB RAM
 (Linux helios 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64
 GNU/Linux).
 
 Unless you're committed to this version, I'd seriously look into 8.3
 from backports (or compiled yourself). I'd expect some serious
 performance improvements for the workload you describe.
 
 I have a table tickets with 1 000 000 inserts per month (~2600 every
 2 hours) (for the moment 1300 rows for 5GB)
 and I have to extract statistics (number of calls, number of calls less
 than X seconds, number of new calls, number of calls from new
 callers, ...)
 
 OK, so not a lot of updates, but big aggregation queries. You might want
 to pre-summarise older data as the system gets larger.
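 For example, a rough sketch of that kind of pre-summarising (the summary
 table name and the particular aggregates are only illustrations, adjust them
 to the statistics you actually report):
 
   CREATE TABLE cirpacks.tickets_monthly AS
     SELECT tyear, tmonth, service_id, exploitant_id,
            count(*)                 AS nb_calls,
            count(DISTINCT appelant) AS nb_callers
     FROM cirpacks.tickets
     GROUP BY tyear, tmonth, service_id, exploitant_id;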
 
 1°) The server will handle max 15 queries at a time.
 So this is my postgresql.conf
 
 max_connections = 15
 
 Well, I'd allow 20 - just in case.
 
 shared_buffers =  995600 # ~1Go
 temp_buffers = 1000
 work_mem = 512000 # ~512Ko
 
 I'd be tempted to increase work_mem by a lot, possibly even at the
 expense of shared_buffers. You're going to be summarising large amounts
 of data so the larger the better, particularly as your database is
 currently smaller than RAM. Start with 5MB then try 10MB, 20MB and see
 what difference it makes.
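 For example, per session (if I remember right, 8.1 wants a plain integer in
 kB here; the 'MB' suffix only arrived in later versions):
 
   SET work_mem = 5120;    -- ~5MB, for this session only
   SET work_mem = 20480;   -- then ~20MB, and compare timings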
 
 maintenance_work_mem = 1048576 # 1Mo
 
 max_fsm_pages = 41522880 # ~40Mo
 max_fsm_relations = 8000 
 
 See what a vacuum full verbose says for how much free space you need to
 track.
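 For example, a database-wide:
 
   VACUUM VERBOSE;
 
 The INFO/DETAIL lines at the very end should report how many free-space page
 slots are in use and how many are needed, which you can compare against
 max_fsm_pages.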
 
 checkpoint_segments = 10
 checkpoint_timeout = 3600
 
 With your low rate of updates shouldn't matter.
 
 effective_cache_size = 13958643712 # 13Go
 
 Assuming that's based on what top or free say, that's fine. Don't
 forget it will need to be reduced if you increase work_mem or
 shared_buffers.
 
 stats_start_collector = on
 stats_command_string = on
 stats_block_level = on
 stats_row_level = on
 autovacuum = off
 
 Make sure you're vacuuming if autovacuum is off.
 
 How can i optimize the configuration?
 
 Looks reasonable, so far as you can tell from an email. Try playing with
 work_mem though.
 
 2°) My queries look like
 SELECT tday AS n,
 COUNT(DISTINCT(a.appelant)) AS new_callers,
 COUNT(a.appelant) AS new_calls
 FROM cirpacks.tickets AS a
 WHERE LENGTH(a.appelant) > 4
 AND a.service_id IN ( 95, 224, 35, 18 )
 AND a.exploitant_id = 66
 AND a.tyear = 2008
 AND a.tmonth = 08
 
 Index on (tyear,tmonth) might pay off, or one on exploitant_id perhaps.
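 For example (just candidates to test, and the index names are made up):
 
   CREATE INDEX tickets_tyear_tmonth_idx ON cirpacks.tickets (tyear, tmonth);
   CREATE INDEX tickets_exploitant_idx   ON cirpacks.tickets (exploitant_id);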
 
 AND EXISTS ( SELECT 1 FROM cirpacks.clients AS b WHERE b.appelant =
 a.appelant AND b.service_id IN ( 95, 224, 35, 18 ) AND b.heberge_id = 66
 HAVING to_char(MIN(b.premier_appel), 'MMDD') = to_char(a.date,
 'MMDD') )
 
 It looks like you're comparing two dates by converting them to text.
 That's probably not the most efficient way of doing it. Might not be an
 issue here.
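 For instance, if the intent is "same calendar day", something along these
 lines avoids the text round-trip (assuming premier_appel and date really are
 date/timestamp columns; it replaces the to_char() comparison in the HAVING
 clause above):
 
   HAVING date_trunc('day', MIN(b.premier_appel)) = date_trunc('day', a.date)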
 
 GROUP BY n
 ORDER BY n;
 
 or select ... SUM( CASE WHEN condition THEN value ELSE 0 END ) ... FROM
 cirpacks.tickets WHERE tyear = ... and tmonth = ... and tday = ... AND
 audiotel IN ( '...', '...' );
 or select ... SUM( CASE WHEN condition THEN value ELSE 0 END ) ... FROM
 cirpacks.tickets WHERE '2007-01-01' <= date AND date <= '2008-08-31' AND
 audiotel IN ( '...', '...' );
 
 
 which indexes are the best ?
 
 The only way to find out is to test. You'll want to run EXPLAIN after
 adding each index to see what difference it makes. Then you'll want to
 see what impact this has on overall workload.
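 i.e. after each CREATE INDEX, something like this (a cut-down version of
 your first query, used only to compare plans and timings):
 
   ANALYZE cirpacks.tickets;
   EXPLAIN ANALYZE
     SELECT count(*) FROM cirpacks.tickets a
     WHERE a.exploitant_id = 66 AND a.tyear = 2008 AND a.tmonth = 8;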
 
 Mostly though, I'd try out 8.3 and see if that buys you a free
 performance boost.
 
 -- 
   Richard Huxton
   Archonet Ltd
 





Re: [PERFORM] dedicated server postgresql 8.1 conf tunning

2008-10-02 Thread Richard Huxton
[EMAIL PROTECTED] wrote:
 Thanks,
 
 Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.

That's why backports.org was invented :-)
Or does "can't" mean "not allowed to"?

 So I'm going to play with work_mem & shared_buffers.
 
 With big shared_buffers pgsql tells me it failed on
 shmget(key=5432001, size=11183431680, 03600),
 so I do echo 13183431680 > /proc/sys/kernel/shmmax (10GB + 2GB just
 in case)
 
 but pgsql tells me again that there's not enough shm.
 How can I compute the right shmmax for my server?

I'm not seeing anything terribly wrong there. Are you hitting a limit
with shmall?
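
shmmax is in bytes, but shmall is counted in pages (usually 4kB), which
catches people out. Something along these lines should cover a ~10GB segment
(figures rounded up; double-check your page size with getconf PAGE_SIZE):

  echo 13183431680 > /proc/sys/kernel/shmmax   # largest single segment, bytes
  echo 4194304 > /proc/sys/kernel/shmall       # total shm, in 4kB pages (16GB)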

Oh - and I'm not sure there's much point in having more shared-buffers
than you have data.

Try much larger work_mem first, I think that's the biggest gain for you.

-- 
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] dedicated server postgresql 8.1 conf tunning

2008-10-02 Thread Tommy Gildseth

Richard Huxton wrote:

[EMAIL PROTECTED] wrote:

Thanks,

Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.


That's why backports.org was invented :-)
Or does "can't" mean "not allowed to"?



Well, running production servers from backports can be a risky
proposition too, and can land you in situations like the one discussed
in "Debian packages for Postgres 8.2" from the General list.



--
Tommy Gildseth



Re: [PERFORM] dedicated server postgresql 8.1 conf tunning

2008-10-02 Thread Richard Huxton
Tommy Gildseth wrote:
 Richard Huxton wrote:
 [EMAIL PROTECTED] wrote:
 Thanks,

 Unfortunately, I can't update pgsql to 8.3 since it's not in debian stable.

 That's why backports.org was invented :-)
 Or does "can't" mean "not allowed to"?
 
 Well, running production servers from backports can be a risky
 proposition too, and can land you in situations like the one discussed
 in "Debian packages for Postgres 8.2" from the General list.

Well, there's a reason why stable is a popular choice for production
servers. I must admit that I build from source for my PostgreSQL
packages (because I care which version I run). I was reading one of the
Perl fellows recommending the same.

-- 
  Richard Huxton
  Archonet Ltd



[PERFORM] Delete performance again

2008-10-02 Thread Віталій Тимчишин
Hello.

I have a database with a company table that has a bunch of related
(ON DELETE CASCADE) tables.
It also has a 1-M relation to the company_descr table.
We've found that ~half of our companies do not have any description and
we would like to remove them.
The first thing I tried was
delete from company where id not in (select company_id from company_descr);
I've tried to EXPLAIN ANALYZE the command, but unlike other RDBMSs I've used,
it did not include the cascaded deletes/checks in the query plan. That is the
first problem. It was SLOW. To make it faster I did the following:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company where id in (select id from comprm);

That was much better. So the question is why postgresql can't do such a
thing itself.
But it was better only while the set of rows to remove was small (~5% of the
table).
As soon as I tried to remove 50% I got speed problems. I've ensured I
have indexes on both ends of every foreign key.
I've tried to remove all the cascaded entries myself, e.g.:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company_alias where company_id in (select id from comprm);
...
delete from company where id in (select id from comprm);

It did not help until I dropped all the constraints beforehand and recreated
them afterwards.
Now it runs in 15 minutes, while previously it could not finish in a day.

Is this OK? I'd say some (if not all) of these optimizations could be done by
the postgresql optimizer.


Re: [PERFORM] Delete performance again

2008-10-02 Thread Tom Lane
Віталій Тимчишин [EMAIL PROTECTED] writes:
 delete from company where id not in (select company_id from company_descr);
 I've tried to EXPLAIN ANALYZE the command, but unlike other RDBMSs I've used,
 it did not include the cascaded deletes/checks in the query plan. That is the
 first problem. It was SLOW.

Usually the reason for that is having forgotten to make an index on the
referencing column(s) ?
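
For the tables mentioned in this thread that would mean something like (the
index name is just an example), and similarly for every other table that
references company:

  CREATE INDEX company_alias_company_id_idx ON company_alias (company_id);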

regards, tom lane



Re: [PERFORM] dedicated server postgresql 8.1 conf tunning

2008-10-02 Thread paul
I played with work_mem, and setting work_mem higher than 256000 does not
change the performance.

I tried to upgrade to 8.3 using etch-backports, but it's a new install, not
an upgrade.
So I have to create users and permissions and import the data again; that
scared me, so I wanted to find another solution first.
But now I'll try 8.3.


On Thu, 02 Oct 2008 10:36:50 +0200, Tommy Gildseth
[EMAIL PROTECTED] wrote:
 Richard Huxton wrote:
 [EMAIL PROTECTED] wrote:
 Thanks,

 Unfortunately, I can't update pgsql to 8.3 since it's not in debian
 stable.

 That's why backports.org was invented :-)
 Or does "can't" mean "not allowed to"?
 
 
 Well, running production servers from backports can be a risky
 proposition too, and can land you in situations like the one discussed
 in "Debian packages for Postgres 8.2" from the General list.
 
 
 --
 Tommy Gildseth




Re: [PERFORM] Delete performance again

2008-10-02 Thread Віталій Тимчишин
2008/10/2 Tom Lane [EMAIL PROTECTED]

 Віталій Тимчишин [EMAIL PROTECTED] writes:
  delete from company where id not in (select company_id from company_descr);
  I've tried to EXPLAIN ANALYZE the command, but unlike other RDBMSs I've
  used, it did not include the cascaded deletes/checks in the query plan.
  That is the first problem. It was SLOW.

 Usually the reason for that is having forgotten to make an index on the
 referencing column(s) ?


Not at all. As you can see in the original message below, simply extending the
query to do what the optimizer should have done helps. I'd say the optimizer
always uses a fixed plan, not taking into account that this is a massive
delete, and it does an index lookup of the child records for each parent
record, while it would be much more effective to remove all the child records
in a single table scan.

It's like a FOR EACH ROW trigger instead of a FOR EACH STATEMENT trigger.