Re: [GENERAL] Unnecessary scan on a partial index slows down query dramatically

2008-04-25 Thread Jimmy Choi
Thanks. I'm not really in a position to upgrade at the moment. I guess
in the short-term I'll tweak the query to work around this (e.g.
removing "status = 3" or adding more refining conditions both seem to
work).

Please let me know if there are configuration settings I should try.

Thanks,
Jimmy


On Fri, Apr 25, 2008 at 12:13 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Jimmy Choi" <[EMAIL PROTECTED]> writes:
>  > A simple query is executing much slower than expected. When looking at
>  > the query plan, I see a bitmap index scan on a partial index that does
>  > not have any associated index condition. How could that happen?
>
>  Easily --- it thinks that the partial index predicate is useful in
>  itself.
>
>
>  > "vacuum analyze" doesn't solve the problem. I'm running 8.1.5.
>
>  choose_bitmap_and was rewritten (again) in 8.1.9.  Try a newer
>  version.
>
>  http://archives.postgresql.org/pgsql-committers/2007-04/msg00233.php
>
> regards, tom lane
>
>  --
>  Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>  To make changes to your subscription:
>  http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Unnecessary scan on a partial index slows down query dramatically

2008-04-25 Thread Jimmy Choi
Hello,

A simple query is executing much slower than expected. When looking at
the query plan, I see a bitmap index scan on a partial index that does
not have any associated index condition. How could that happen?

The query is:

select id from test_run_results where test_run_id = 12902 and status = 3

The query plan is:

"Bitmap Heap Scan on test_run_results  (cost=3240.97..3963.72
rows=2556 width=250)"
"  Recheck Cond: ((test_run_id = 12902) AND (status = 3))"
"  ->  BitmapAnd  (cost=3240.97..3240.97 rows=243 width=0)"
"->  Bitmap Index Scan on trr_same_status_in_run_index
(cost=0.00..24.33 rows=2556 width=0)"
"  Index Cond: ((test_run_id = 12902) AND (status = 3))"
"->  Bitmap Index Scan on
trr_same_failure_reason_for_owner_index  (cost=0.00..3216.39
rows=884694 width=0)"

My question refers to the last bitmap index scan which does not have
an associated index cond line.

trr_same_status_in_run_index is defined on (test_run_id, status)

and

trr_same_failure_reason_for_owner_index is a partial index defined on
(owner_id, failure_reason) where status = 3.

"vacuum analyze" doesn't solve the problem. I'm running 8.1.5.

As you can see, there really isn't much reason to use the partial index at all.

Remove "status = 3" from the query gets rid of the useless index scan
and makes the query much faster.

Thank,
Jimmy

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "vacuum" and "cluster"

2008-04-16 Thread Jimmy Choi
Presumably, even if CLUSTER does reindexing internally, it only does
that for the index used for clustering. Since REINDEX includes all
indices, CLUSTER cannot truly replace REINDEX. Correct?

Jimmy

On Wed, Apr 16, 2008 at 12:06 PM, Alvaro Herrera
<[EMAIL PROTECTED]> wrote:
> Craig Ringer escribió:
>
>
>  > It's not stated explicitly, but I'm pretty sure discussion here has
>  > mentioned that too. Given that, VACUUM FULL on a just-CLUSTERed table
>  > should be redundant.
>
>  It is, and a REINDEX is redundant too because CLUSTER does it
>  internally.
>
>  --
>  Alvaro Herrerahttp://www.CommandPrompt.com/
>  PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] "vacuum" and "cluster"

2008-04-16 Thread Jimmy Choi
Hello,

Does running "cluster" remove the need to run "vacuum"? 

I get a feeling that since cluster is already physically reordering the
rows, it may as well remove the dead rows... no?

My second question is, if vacuum is still needed, does it matter whether
I run vacuum first or cluster first? 

Here's our current weekly db maintenance routine:

1. vacuum full
2. cluster
3. reindex
4. analyze

Thanks,
Jimmy Choi




Confidentiality Notice.  This message may contain information that is 
confidential or otherwise protected from disclosure.
If you are not the intended recipient, you are hereby notified that any use, 
disclosure, dissemination, distribution, 
or copying of this message, or any attachments, is strictly prohibited.  If you 
have received this message in error, 
please advise the sender by reply e-mail, and delete the message and any 
attachments.  Thank you.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] "vacuum" and "cluster"

2008-04-16 Thread Jimmy Choi
Hello,

Does running "cluster" remove the need to run "vacuum"?

I get a feeling that since cluster is already physically reordering
the rows, it may as well remove the dead rows... no?

My second question is, if vacuum is still needed, does it matter
whether I run vacuum first or cluster first?

Here's our current weekly db maintenance routine:

1. vacuum full
2. cluster
3. reindex
4. analyze

Thanks,
Jimmy

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Guideline on use of temporary tables

2007-10-12 Thread Jimmy Choi
I'm looking for general guideline on the use of temporary tables.

I would like to use temporary table as a caching mechanism to speed up
queries within the same session. Specifically, a temporary table is
created to store a subset of data from a possibly large table, and
subsequent queries select from the temporary table instead of
re-applying the same complex filters on the actual table again and
again.

Is this what temporary table is designed for? Are there caveats that I
should be aware of? Can you think of other better alternatives?

Thank you very much.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Unexpected behavior with CASE statement

2007-10-03 Thread Jimmy Choi
This will work for this particular example. But what if my case
statement is more complicated than that? Example:

select
   metric_type,
   case metric_type
  when 0 then
 sum (1 / val)
  when 1 then
 sum (val)
  when 2 then
 max (val)
  when 3 then
 min (val)
   end as result
from metrics
group by metric_type

Thanks!

On 10/3/07, Rodrigo De León <[EMAIL PROTECTED]> wrote:
> On 10/3/07, Jimmy Choi <[EMAIL PROTECTED]> wrote:
> > I expect to get the following result set:
> >
> > metric_type | result
> > +---
> > 0   |   2
> > 1   |   3
>
> Try:
>
> SELECT   metric_type
>, SUM(CASE metric_type
>WHEN 0
>  THEN 1 / val
>WHEN 1
>  THEN val
>  END) AS RESULT
> FROM metrics
> GROUP BY metric_type
> ORDER BY metric_type
>

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] Unexpected behavior with CASE statement

2007-10-03 Thread Jimmy Choi
Suppose I have the following table named "metrics":

metric_type | val
+-
 0  | 1
 0  | 1
 1  | 0
 1  | 3

Now suppose I run the following simple query:

select
   metric_type,
   case metric_type
  when 0 then
 sum (1 / val)
  when 1 then
 sum (val)
   end as result
from metrics
group by metric_type

I expect to get the following result set:

metric_type | result
+---
0   |   2
1   |   3

But in reality I get the following error:

ERROR: division by zero
SQL state: 22012

So it appears that Postgres executes all cases and select the result
in the end. Is this expected behavior?

Thanks
- Jimmy

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Unexpected behavior with CASE statement

2007-10-03 Thread Jimmy Choi
Suppose I have the following table named "metrics":

metric_type | val
+-
 0  | 1
 0  | 1
 1  | 0
 1  | 3

Now suppose I run the following simple query:

select 
   metric_type,
   case metric_type
  when 0 then
 sum (1 / val)
  when 1 then
 sum (val)
   end as result
from metrics
group by metric_type

I expect to get the following result set:

metric_type | result
+---
0   |   2
1   |   3

But in reality I get the following error:

ERROR: division by zero
SQL state: 22012

So it appears that Postgres executes all cases and select the result in
the end. Is this expected behavior? 

Thanks
- Jimmy


Confidentiality Notice.  This message may contain information that is 
confidential or otherwise protected from disclosure.
If you are not the intended recipient, you are hereby notified that any use, 
disclosure, dissemination, distribution, 
or copying of this message, or any attachments, is strictly prohibited.  If you 
have received this message in error, 
please advise the sender by reply e-mail, and delete the message and any 
attachments.  Thank you.




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Help tuning a large table off disk and into RAM

2007-09-26 Thread Jimmy Choi
Have you tried clustering tables based on the most-frequently used
indexes to improve locality?

http://www.postgresql.org/docs/8.2/static/sql-cluster.html

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bill Moran
Sent: Wednesday, September 26, 2007 11:24 AM
To: James Williams
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Help tuning a large table off disk and into RAM

In response to "James Williams" <[EMAIL PROTECTED]>:

> I'm stuck trying to tune a big-ish postgres db and wondering if anyone
> has any pointers.
> 
> I cannot get Postgres to make good use of plenty of available RAM and
> stop thrashing the disks.
> 
> One main table. ~30 million rows, 20 columns all integer, smallint or
> char(2).  Most have an index.  It's a table for holding webserver
> logs.  The main table is all foreign key ids.  Row size is ~100bytes.
> 
> The typical query is an aggregate over a large number of rows (~25%
say).
> 
>  SELECT COUNT(*), COUNT(DISTINCT user_id)
>  FROM table
>  WHERE epoch > ...
>  AND epoch < ...
>  AND country = ...
> 
> The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5.  We
> wanted fast query/lookup.  We know we can get fast disk IO.
> 
> Running a typical query like above seems to:
> 
> * hardly tax a single CPU
> * plenty of RAM free
> * disks thrash about
> 
> The last is based mostly on the observation that another tiddly
> unrelated mysql db which normally runs fast, grinds to a halt when
> we're querying the postgres db (and cpu, memory appear to have spare
> capacity).
> 
> We've currently got these settings, and have tried doubling/halving
> them, restarted and benchmarked a test query.  They don't appear to
> materially alter our query time.
> 
>  shared_buffers  = 128MB

shared_buffers = 1.5GB

Unless you've got a lot of stuff other than PostgreSQL on this machine.

>  temp_buffers= 160MB
>  work_mem= 200MB
>  max_stack_depth = 7MB

These look reasonable, although I can't be sure without more details.

> 
> We're less concerned about insert speed.  Typically 1 or 2 users, but
> want fast queries.
> 
> Perhaps a little extreme, but I'm trying to find a way to express this
> in a way that Postgres understands:
> 
> * Load this table, and one or two indexes (epoch, user_id) into RAM.

Give it enough shared_buffers and it will do that.  You're estimating
the size of your table @ 3G (try a pg_relation_size() on it to get an
actual size)  If you really want to get _all_ of it in all the time,
you're probably going to need to add RAM to the machine.  With 8G, you
could allocate about 3G to shared_buffers, but that would be ignoring
the size of indexes.

However, I think you'll be surprised how much performance improves
with 1.5G of shared_buffers.  You may not need any more.  128M is
really forcing PG to work within limited space.

> * All of the table, all of those indexes.
> * Keep them there, but keep a disk based backup for integrity.
> * Run all selects against the in RAM copy.  Always.

This is what PG does if you allocate enough shared_buffers.

-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/



Confidentiality Notice.  This message may contain information that is 
confidential or otherwise protected from disclosure.
If you are not the intended recipient, you are hereby notified that any use, 
disclosure, dissemination, distribution, 
or copying of this message, or any attachments, is strictly prohibited.  If you 
have received this message in error, 
please advise the sender by reply e-mail, and delete the message and any 
attachments.  Thank you.




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Short circuit evaluation of expressions in query

2006-06-13 Thread Jimmy Choi








Say I have the following parameterized query in a function:

 

select * from foo where ($1 = -1 or
foo.status = $1) and (…)

 

where the (…) part consists of more parameterized
conditions similar to the first one.

 

Suppose that at runtime, $1 is supplied a value of -1, does
the foo.status = $1 condition still have to be evaluated? 

 

Assuming the $1 = -1 case, should the above query be as fast
as the following?

 

select * from foo where (…)

 

Thanks
Jimmy

 

 








[GENERAL] Multiple-index optimization not working for = ANY operator

2006-02-15 Thread Jimmy Choi
Hello,

From Section 11.4 of the Postgres 8.1 documentation, a new optimization
is shipped in the latest release. 

"... a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be
broken down into four separate scans of an index on x, each scan using
one of the query clauses. The results of these scans are then ORed
together to produce the result."

While the feature works fine for cases such as:

  SELECT * FROM foo WHERE id IN (1, 2);

And

  SELECT * FROM foo WHERE id = 1 OR id = 2;

I find that it doesn't work (i.e. index is not used and a sequential
scan is needed), if I have the following instead:

  SELECT * FROM foo WHERE id = ANY (ARRAY[1,2]);

Is this expected? The reason I would like the last case to work is that
my plpgsql function takes as input an array of IDs, and so I cannot
write my query using the first two forms above.

Any idea on how I can get around this is greatly appreciated.

Thanks
Jimmy



---(end of broadcast)---
TIP 6: explain analyze is your friend