Re: [GENERAL] How can this be optimized, if possible?

2005-07-05 Thread Greg Stark
Net Virtual Mailing Lists <[EMAIL PROTECTED]> writes:

 The query I want to run against these two tables is something like this:
 
 SELECT
  count(*) as count,
  category.category,
  nlevel(category.category) AS level,
  subpath(category.category,0,nlevel(category.category)-1) as parent,
  category.head_title,
  category.cat_title,
  category.subcat_title
 FROM
  test,
  category
 WHERE
  test.category <@ category.category
 GROUP BY
   category.category, category.head_title, category.cat_title,
 category.subcat_title
 
 
 Many times the WHERE clause will contain additional search criteria on
 the 'test' table.  What I am trying to get is a count of how many rows
 from the test table fall into each category, being limited by the search
 criteria.

Post the output of EXPLAIN ANALYZE SELECT ...

Also list any indexes you have on the tables. Do you have a GIST index on
the ltree column in test?
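
If you don't, that is probably the first thing to try; a minimal sketch
(the index name here is invented):

CREATE INDEX test_category_gist_idx ON test USING gist (category);

The ltree contrib module ships GiST operator classes for both ltree and
ltree[] columns, so an index like this can serve the @>/<@ containment
operators.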

There are a number of ways of rewriting this query; you'll probably have some
success with one of them.

You could, for example, write it:

SELECT *,
       (SELECT count(*)
          FROM test
         WHERE test.category <@ category.category
           [AND search criteria...]) AS count
  FROM category

Normally I would say your form with the join gives the planner the maximum
flexibility, but I don't think the planner is going to be able to do any
better than nested loops with a join clause like that, so I don't think this
will be any worse than the join. And it might have a better chance of using an
index on test.category.

But note that it still has to do some 300 scans of the test index. If each one
takes 0.5s then this query is still going to take 150s or so. But with a GIST
index on the test.category column it may be more like 10s total. It will depend
partly on how many categories you have that span a large number of records in
test; that is, on how many parent categories you have.


-- 
greg




Re: [GENERAL] How can this be optimized, if possible?

2005-07-05 Thread Dann Corbit
Would it help if you created a trigger that maintains a per-category count in
a statistics table?  Perhaps you could gather most of the information you
need from such a process.  If an estimate is good enough, and you are using
the statistics collector and have an index on that column, perhaps you could
even query the system statistics tables.
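
A minimal sketch of that trigger idea (all names are invented; it assumes
PL/pgSQL is installed, that category_stats is pre-seeded with one row per
category, and that UPDATEs to test.category would need similar handling):

CREATE TABLE category_stats (
    category  ltree PRIMARY KEY,
    row_count bigint NOT NULL DEFAULT 0
);

CREATE FUNCTION maintain_category_stats() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- bump every category that is an ancestor of (or equal to)
        -- one of the new row's paths
        UPDATE category_stats
           SET row_count = row_count + 1
         WHERE category @> ANY (NEW.category);
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE category_stats
           SET row_count = row_count - 1
         WHERE category @> ANY (OLD.category);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_category_stats
    AFTER INSERT OR DELETE ON test
    FOR EACH ROW EXECUTE PROCEDURE maintain_category_stats();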

Do you have a large number of categories, or is it only a few?

If it is only 10 categories or so, and if they do not change, then
perhaps you can reformulate as a collection of equal joins.
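
For instance, something along these lines (just a sketch; the two category
paths are made up):

SELECT 'top.jobs'::ltree AS category, count(*) AS count
  FROM test WHERE test.category <@ 'top.jobs'::ltree
UNION ALL
SELECT 'top.resumes'::ltree, count(*)
  FROM test WHERE test.category <@ 'top.resumes'::ltree;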

What does the query plan look like?  What sort of hardware are you
using?  What version of PostgreSQL are you using?

Your query speed does seem shockingly slow for only 30,000 rows in the
largest table.

 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:pgsql-general-[EMAIL PROTECTED]] On Behalf Of Net Virtual Mailing Lists
 Sent: Monday, July 04, 2005 9:47 PM
 To: pgsql-general@postgresql.org
 Subject: [GENERAL] How can this be optimized, if possible?
 



[GENERAL] How can this be optimized, if possible?

2005-07-04 Thread Net Virtual Mailing Lists
Hello,


My database has grown far faster than expected, and a query which used to
run acceptably now does not.  I'm trying to figure out a way to make
this operate faster and scale better.  I'm very open to the idea that
this does not need to be done using a SQL query at all - right now I'm
really just in need of some conceptual/architectural help on this one.


So I have two tables:

        Table "category"
    Column    |       Type        | Modifiers
--------------+-------------------+-----------
 head_title   | character varying |
 cat_title    | character varying |
 subcat_title | character varying |
 category     | ltree             |



                            Table "test"
  Column  |  Type   |                      Modifiers
----------+---------+-----------------------------------------------------
 id       | integer | not null default nextval('master.test_id_seq'::text)
 category | ltree[] |


... there are other fields in the test table, but these are really the
only two relevant to this.


The query I want to run against these two tables is something like this:

SELECT
 count(*) as count,
 category.category,
 nlevel(category.category) AS level,
 subpath(category.category,0,nlevel(category.category)-1) as parent,
 category.head_title,
 category.cat_title,
 category.subcat_title
FROM
 test,
 category
WHERE
 test.category <@ category.category
GROUP BY
  category.category, category.head_title, category.cat_title,
 category.subcat_title


Many times the WHERE clause will contain additional search criteria on
the 'test' table.  What I am trying to get is a count of how many rows
from the test table fall into each category, being limited by the search
criteria.

This query is starting to take an enormous amount of time (30+ seconds)
and I really need the results of this in a couple seconds tops.  I can do
a SELECT category FROM test and it completes in about 0.5 seconds.  The
category table currently has only 225 rows, the test table approximately
30,000.  SELECT count(category), category FROM test GROUP BY category is
quite slow, and I thought of making a materialized view of this, but then
of course I don't see any way to make that result limited by my search
criteria.
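
(For reference, the precomputed summary being described might look
something like this - a sketch with an invented table name, and, as noted
above, it cannot reflect per-query search criteria:

CREATE TABLE category_counts AS
SELECT category.category, count(*) AS count
  FROM test, category
 WHERE test.category <@ category.category
 GROUP BY category.category;

The table would then have to be rebuilt or trigger-maintained as test
changes.)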

I am completely open to re-architecting this entirely; performance of
this query is critical to my application, and I really just am not sure
where to start.  It seems like everything I do is worse than what I
started with.

... It *almost* seems as if I need to build some sort of
search-engine-like tool which performs all the queries against the
database, has tons of memory, and caches the category attributes for
each record in memory.  This sure seems like a lot of work though - I
sincerely hope there is an easier way.

Thanks for your help, as always!

- Greg

