[PERFORM] UNSUBSCRIBE

2005-10-28 Thread Nick Howden
UNSUBSCRIBE


Re: [PERFORM] How much memory?

2005-10-28 Thread Alex Turner
Reasons not to buy from Sun or Compaq: why get an Opteron 252 when a
240 will do just fine for a fraction of the cost - which of course they
don't stock. White box all the way, baby ;). A box from Sun or Compaq
or IBM is 2x the white-box cost because you can't buy apples to apples.
We have a bitchin' DB server for $7.5k.

Alex

On 10/27/05, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> On Thu, Oct 27, 2005 at 06:39:33PM -0400, Ron Peacetree wrote:
> > Databases basically come in 4 sizes:
> >
> > 1= The entire DB fits into memory.
> > 2= The performance critical table(s) fit(s) into memory
> > 3= The indexes of the performance critical table(s) fit into memory.
> > 4= Neither the performance critical tables nor their indexes fit into 
> > memory.
> >
> > Performance decreases (exponentially), and development + maintenance 
> > cost/difficulty/pain increases (exponentially), as you go down the list.
> >
> > While it is often not possible to be in class "1" above, do everything you 
> > can to be in at least class "3" and do everything you can to avoid class 
> > "4".
> >
> > At ~$75-$150 per GB as of this post, RAM is the cheapest investment you can
> > make in a high performance, low hassle DBMS.  IWill's and Tyan's 16 DIMM
> > slot mainboards are worth every penny.
>
> And note that your next investment after RAM should be better disk IO.
> More CPUs *generally* don't buy you much (if anything). My rule of
> thumb: the only time your database should be CPU-bound is if you've got
> a bad design*.
>
> *NOTE: before everyone goes off about query parallelism and big
> in-memory sorts and what-not, keep in mind I said "rule of thumb". :)
> --
> Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
> Pervasive Software  http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461


[PERFORM] How long it takes to vacuum a big table

2005-10-28 Thread Csaba Nagy
Hi all,

I wonder what the main driving factor for vacuum's duration is: the
size of the table, or the number of dead tuples it has to clean?

We have a few big tables which are also heavily updated, and I couldn't
figure out a way to properly vacuum them. Vacuuming any of them takes a
very long time (I started one this morning and after ~5h30min it's
still running - and it's not even the biggest or most updated table),
which I can't really afford, because it prevents other vacuum processes
on smaller tables from doing their job, due to the transaction held
open for the long-running vacuum.
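As a point of reference, this is the kind of command I'm running (the
table name is made up); VACUUM VERBOSE at least reports, per table and
index, how many pages were scanned and how many dead row versions were
removed, which should show which of the two factors dominates:

VACUUM VERBOSE big_table;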

BTW, is it in any way feasible to make one vacuum not block other
vacuums from cleaning dead tuples created after the first one started?
I know it's the transaction, not the vacuum, which blocks, but wouldn't
there be a way to run vacuum somehow in an "out of transaction context"
mode?

Another issue: vacuum is not responding to cancel requests, at least not
in a reasonable amount of time...

Thanks in advance,
Csaba.

Re: [PERFORM] How long it takes to vacuum a big table

2005-10-28 Thread Jan Peterson
We've also experienced problems with VACUUM running for a long time. 
A VACUUM on our pg_largeobject table, for example, can take over 24
hours to complete (pg_largeobject in our database has over 45 million
rows).  With our other tables, we've been able to partition them
(using inheritance) to keep any single table from getting "too large",
but we've been unable to do that with pg_largeobject.  Currently,
we're experimenting with moving some of our bulk (large object) data
outside of the database and storing it in the filesystem directly.
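For what it's worth, the basic shape of the partitioning we use looks
roughly like this (names made up, details simplified):

CREATE TABLE history (
    id          integer,
    created_at  timestamp,
    data        text
);

-- one child table per time range; the CHECK constraint documents the
-- range, and with constraint_exclusion (new in 8.1) it also lets the
-- planner skip children that can't match the query's WHERE clause
CREATE TABLE history_2005_10 (
    CHECK (created_at >= '2005-10-01' AND created_at < '2005-11-01')
) INHERITS (history);

-- the application inserts into the appropriate child directly;
-- SELECTs against the parent automatically see all children.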

I know that Hannu Krosing has developed some patches that allow
concurrent VACUUMs to run more effectively.  Unfortunately, these
patches didn't get into 8.1 so far as I know.  You can search the
performance mailing list for more information.

-jan-
--
Jan L. Peterson
<[EMAIL PROTECTED]>


[PERFORM] Best way to check for new data.

2005-10-28 Thread Rodrigo Madera
I have a table that holds entries as in a fictitious table Log(id integer, msg text).

Let's say then that I have a program log_tail whose sole purpose is to print newly added data elements.

What is the best solution in terms of performance?

Thank you for your time,
Rodrigo


[PERFORM] Simple query: how to optimize

2005-10-28 Thread Collin Peters
I have two tables, one is called 'users' the other is 'user_activity'.
 The 'users' table simply contains the users in the system; there are
about 30,000 rows.  The 'user_activity' table stores the activities
the user has taken.  This table has about 430,000 rows and also
(notably) has a column which tracks the type of activity.  90% of the
table is type 7, which indicates the user logged into the system.

I am trying to write a simple query that returns the last time each
user logged into the system.  This is how the query looks at the
moment:

SELECT u.user_id, MAX(ua.activity_date)
FROM pp_users u
LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
ua.user_activity_type_id = 7)
WHERE u.userstatus_id <> 4
AND age(u.joined_date) < interval '30 days'
GROUP BY u.user_id

The above query takes about 5 seconds but I'm wondering how it can be
optimized.  When the query is formatted as above it does use an index
on the user_id column of the user_activity table... but the cost is
huge (cost=0.00..1396700.80).

I have tried formatting it another way with a sub-query, but it takes
about the same amount of time to complete:

SELECT u.user_id, ua.last
FROM pp_users u
LEFT OUTER JOIN (SELECT max(activity_date) as last, user_id FROM
user_activity WHERE user_activity_type_id = 7 GROUP BY user_id) as ua
ON (u.user_id = ua.user_id)
WHERE u.userstatus_id <> 4
AND age(u.joined_date) < interval '30 days'

Can anybody offer any pointers on this scenario?

Regards,
Collin


Re: [PERFORM] Best way to check for new data.

2005-10-28 Thread Havasvölgyi Ottó
Rodrigo,

You could use LISTEN + NOTIFY with triggers: in an AFTER INSERT
statement-level trigger you notify a listener, and the client can
query for the new data immediately.

Best Regards,
Otto
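P.S. A minimal sketch of what I mean (the names are made up, and it
assumes PL/pgSQL is installed):

CREATE FUNCTION notify_log_insert() RETURNS trigger AS $$
BEGIN
    -- wake up every session that has executed LISTEN log_inserted
    NOTIFY log_inserted;
    RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_insert_notify
    AFTER INSERT ON log
    FOR EACH STATEMENT
    EXECUTE PROCEDURE notify_log_insert();

-- in the log_tail client session:
LISTEN log_inserted;
-- then, on each notification received, select and print the rows
-- added since the last id already seen.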
 

- Original Message -
From: Rodrigo Madera
To: pgsql-performance@postgresql.org
Sent: Friday, October 28, 2005 11:39 PM
Subject: [PERFORM] Best way to check for new data.

I have a table that holds entries as in a fictitious table Log(id integer, msg text).

Let's say then that I have a program log_tail whose sole purpose is to print newly added data elements.

What is the best solution in terms of performance?

Thank you for your time,
Rodrigo


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Roger Hand
On October 28, 2005 2:54 PM
Collin Peters wrote:
> I have two tables, one is called 'users' the other is 'user_activity'.
...
> I am trying to write a simple query that returns the last time each
> user logged into the system.  This is how the query looks at the
> moment:
> 
> SELECT u.user_id, MAX(ua.activity_date)
> FROM pp_users u
> LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
> ua.user_activity_type_id = 7)
> WHERE u.userstatus_id <> 4
> AND age(u.joined_date) < interval '30 days'
> GROUP BY u.user_id

You're first joining against the entire user table, then filtering out the users
you don't need.

Instead, filter out the users you don't need first, then do the join:

SELECT users.user_id, MAX(ua.activity_date)
FROM 
(SELECT u.user_id 
FROM pp_users u
WHERE u.userstatus_id <> 4
AND age(u.joined_date) < interval '30 days'
) users
LEFT OUTER JOIN user_activity ua 
  ON (users.user_id = ua.user_id 
  AND ua.user_activity_type_id = 7)
GROUP BY users.user_id

(disclaimer: I haven't actually tried this sql)


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Collin Peters
These two queries execute at exactly the same speed.  When I run
EXPLAIN on them both they return the *exact* same query plan as well.
I find this strange... but it is also kind of what I expected from
reading up on various things.  I am under the impression that
PostgreSQL will break up your query and run it as it sees best.  So
in the case of these two queries... it seems it is actually almost
converting one into the other.  Maybe I am wrong.
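For reference, this is how I'm comparing them (same query as in my
first mail):

EXPLAIN
SELECT u.user_id, MAX(ua.activity_date)
FROM pp_users u
LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
ua.user_activity_type_id = 7)
WHERE u.userstatus_id <> 4
AND age(u.joined_date) < interval '30 days'
GROUP BY u.user_id;

-- EXPLAIN ANALYZE instead of plain EXPLAIN also runs the query and
-- reports the actual time spent in each plan node.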

Is there a good resource list somewhere for PostgreSQL query
optimization?  There are entire books devoted to the subject for
Oracle, but I can't find more than a few small articles on PostgreSQL
query optimization on the web.

Regards,
Collin

On 10/28/05, Roger Hand <[EMAIL PROTECTED]> wrote:
> > SELECT u.user_id, MAX(ua.activity_date)
> > FROM pp_users u
> > LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
> > ua.user_activity_type_id = 7)
> > WHERE u.userstatus_id <> 4
> > AND age(u.joined_date) < interval '30 days'
> > GROUP BY u.user_id
>
> You're first joining against the entire user table, then filtering out the 
> users
> you don't need.
>
> Instead, filter out the users you don't need first, then do the join:
>
> SELECT users.user_id, MAX(ua.activity_date)
> FROM
> (SELECT u.user_id
> FROM pp_users u
> WHERE u.userstatus_id <> 4
> AND age(u.joined_date) < interval '30 days'
> ) users
> LEFT OUTER JOIN user_activity ua
>   ON (users.user_id = ua.user_id
>   AND ua.user_activity_type_id = 7)
> GROUP BY users.user_id
>
> (disclaimer: I haven't actually tried this sql)
>


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Collin Peters
A little bit more on my last post that I forgot to mention.  The two
queries run at the same speed and have the same plan only if I have an
index on the user_activity.user_id column.  Otherwise they run at
different speeds.  The query you gave me actually runs slower without
the index.  All this is making my head spin!! :O

On 10/28/05, Collin Peters <[EMAIL PROTECTED]> wrote:
> These two queries execute at exactly the same speed.  When I run
> EXPLAIN on them both they return the *exact* same query plan as well.
> I find this strange... but it is also kind of what I expected from
> reading up on various things.  I am under the impression that
> PostgreSQL will break up your query and run it as it sees best.  So
> in the case of these two queries... it seems it is actually almost
> converting one into the other.  Maybe I am wrong.
>
> Is there a good resource list somewhere for PostgreSQL query
> optimization?  There are entire books devoted to the subject for
> Oracle, but I can't find more than a few small articles on PostgreSQL
> query optimization on the web.
>
> Regards,
> Collin
>
> On 10/28/05, Roger Hand <[EMAIL PROTECTED]> wrote:
> > > SELECT u.user_id, MAX(ua.activity_date)
> > > FROM pp_users u
> > > LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
> > > ua.user_activity_type_id = 7)
> > > WHERE u.userstatus_id <> 4
> > > AND age(u.joined_date) < interval '30 days'
> > > GROUP BY u.user_id
> >
> > You're first joining against the entire user table, then filtering out the 
> > users
> > you don't need.
> >
> > Instead, filter out the users you don't need first, then do the join:
> >
> > SELECT users.user_id, MAX(ua.activity_date)
> > FROM
> > (SELECT u.user_id
> > FROM pp_users u
> > WHERE u.userstatus_id <> 4
> > AND age(u.joined_date) < interval '30 days'
> > ) users
> > LEFT OUTER JOIN user_activity ua
> >   ON (users.user_id = ua.user_id
> >   AND ua.user_activity_type_id = 7)
> > GROUP BY users.user_id
> >
> > (disclaimer: I haven't actually tried this sql)
> >
>


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Steinar H. Gunderson
On Fri, Oct 28, 2005 at 03:40:40PM -0700, Roger Hand wrote:
> You're first joining against the entire user table, then filtering out the 
> users
> you don't need.

That's just wrong, sorry -- the planner is perfectly able to push the WHERE
down before the join.

I'd guess the problem is the age() query; age() doesn't really return what
you'd expect, and I don't think it can use an index easily (I might be wrong
here, though). Instead, try something like

  WHERE u.joined_date >= current_date - interval '30 days'

except that if you're running pre-8.0, you might want to precalculate the
right-hand side on the client.
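For instance (untested, using the names from your query):

  -- the rewritten predicate is a plain range condition, so the planner
  -- can use an ordinary btree index on joined_date; age() can't
  CREATE INDEX pp_users_joined_date_idx ON pp_users (joined_date);

  SELECT u.user_id, MAX(ua.activity_date)
  FROM pp_users u
  LEFT OUTER JOIN user_activity ua
    ON (u.user_id = ua.user_id AND ua.user_activity_type_id = 7)
  WHERE u.userstatus_id <> 4
    AND u.joined_date >= current_date - interval '30 days'
  GROUP BY u.user_id;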

I couldn't see EXPLAIN ANALYZE of your query, BTW -- having it would be
useful.

/* Steinar */
-- 
Homepage: http://www.sesse.net/


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread PostgreSQL
Postgres is somewhat speed-challenged on aggregate functions.
The most commonly suggested workaround is something like:

SELECT u.user_id,
(SELECT activity_date
  FROM user_activity
  WHERE user_activity.user_id = u.user_id
  AND user_activity_type_id = 7
  ORDER BY activity_date DESC
  LIMIT 1)
FROM pp_users u
WHERE u.userstatus_id <> 4
AND age(u.joined_date) < interval '30 days'

(code above is untested) I've read that aggregate functions are
improved in the 8.1 code.  I'm running 8.1beta3 on one machine
but haven't experimented to verify the claimed improvements.
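Another idiom worth trying (also untested) is DISTINCT ON, which keeps
only the first row per user_id in the given sort order, i.e. the
latest type 7 activity:

SELECT DISTINCT ON (user_id) user_id, activity_date
FROM user_activity
WHERE user_activity_type_id = 7
ORDER BY user_id, activity_date DESC;

-- join this back to pp_users to apply the status and join-date
-- filters, as in the subquery variant from the original mail.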

Martin Nickel

"Collin Peters" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>I have two tables, one is called 'users' the other is 'user_activity'.
> The 'users' table simply contains the users in the system; there are
> about 30,000 rows.  The 'user_activity' table stores the activities
> the user has taken.  This table has about 430,000 rows and also
> (notably) has a column which tracks the type of activity.  90% of the
> table is type 7, which indicates the user logged into the system.
>
> I am trying to write a simple query that returns the last time each
> user logged into the system.  This is how the query looks at the
> moment:
>
> SELECT u.user_id, MAX(ua.activity_date)
> FROM pp_users u
> LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
> ua.user_activity_type_id = 7)
> WHERE u.userstatus_id <> 4
> AND age(u.joined_date) < interval '30 days'
> GROUP BY u.user_id
>
> The above query takes about 5 seconds but I'm wondering how it can be
> optimized.  When the query is formatted as above it does use an index
> on the user_id column of the user_activity table... but the cost is
> huge (cost=0.00..1396700.80).
>
> I have tried formatting it another way with a sub-query, but it takes
> about the same amount of time to complete:
>
> SELECT u.user_id, ua.last
> FROM pp_users u
> LEFT OUTER JOIN (SELECT max(activity_date) as last, user_id FROM
> user_activity WHERE user_activity_type_id = 7 GROUP BY user_id) as ua
> ON (u.user_id = ua.user_id)
> WHERE u.userstatus_id <> 4
> AND age(u.joined_date) < interval '30 days'
>
> Can anybody offer any pointers on this scenario?
>
> Regards,
> Collin


