Re: [PERFORM] Why does a simple query not use an obvious index?

2004-08-30 Thread Pierre-Frdric Caillaud
Another primary key trick :
	If you insert records with a serial primary key, and rarely delete them  
or update the timestamp, you can use the primary key to compute an  
approximate number of rows.

	a := SELECT pkey FROM table WHERE timestamp()  threshold ORDER BY  
timestamp ASC LIMIT 1;
	b := SELECT pkey FROM table WHERE ORDER BY pkey DESC LIMIT 1;

(b-a) is an approximate count.
	Performance is great because you only fetch two rows. Index scan is  
guaranteed (LIMIT 1). On the downside, you get an approximation, and this  
only works for tables where timestamp is a date of INSERT, timestamp  
worrelated wiht pkey) not when timestamp is a date of UPDATE (uncorrelated  
with pkey).

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] seqscan instead of index scan

2004-08-30 Thread Pierre-Frdric Caillaud

create index t_idx on t((c+d));
select * from t where c+d  0;
Why not :
	select ((select * from t where c0::bigint) UNION (select * from t where  
d0::bigint))
	group by whatever;

or someting ?
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] seqscan instead of index scan

2004-08-30 Thread Stephan Szabo
On Mon, 30 Aug 2004, Martin Sarsale wrote:

 On Mon, 2004-08-30 at 15:06, Merlin Moncure wrote:
  create function is_somethingable (ctype, dtype) returns boolean as

 Thanks, but I would prefer a simpler solution.

 I would like to know why this uses a seqscan instead of an index scan:

 create index t_idx on t((c+d));
 select * from t where c+d  0;

As a geuss, since 7.4 and earlier have no statistics on the distribution
of c+d it has to guess about how likely that is to be true and is probably
overestimating. 8.0beta might handle this better.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Why does a simple query not use an obvious index?

2004-08-30 Thread Greg Stark


[I'm actually responding to the previous post from Tom Lane, but I've deleted
it and the archives seem to be down again.]


The assumption being made is that the first provided result is representative
of all future results. I don't see any reason that making this assumption of
all stable functions should be less scary than making the assumption about
user provided parameters.

However I have the complementary reaction. I find peeking at the first
bind parameter to be scary as hell. Functions seem slightly less scary.

On Oracle Peeking at bind parameters is a feature explicitly intended for DSS
data warehouse type systems. The use of placeholders there was purely for
security and programming ease, not efficiency, since the queries are only
planned executed a small number of times per plan. These are systems that
suffered enormously without the parameter values. They often involved full
table scans or bitmap index scans and without the statistics produced awful
plans.

For OLTP systems peeking at placeholders is more a danger than a benefit. The
query will be executed thousands of times and if it's planned based on a
single unusual value initially the entire system could fail.

Consider the following scenario which isn't farfetched at all. In fact I think
it well describes my current project:

I have a table with a few million records. 99% of the time users are working
with only a few hundred records at most. There's an index on the column
they're keying off of. 1% of the key values have an unusually large number of
records.

Without peeking at placeholders the system should see that virtually all the
key values are well under the threshold for an index scan to be best. So it
always uses an index scan. 1% of the time it takes longer than that it would
have with a sequential scan, but only by a small factor. (On the whole we're
probably still better off avoiding the cache pollution anyways.)

With peeking at placeholders 99% of the backends would perform the same way.
However 1 backend in 100 sees one of these unusual values for its first query.
This backend will use a sequential scan for *every* request. Executing a
sequential table scan of this big table once a second this backend will drive
the entire system into the ground.

This means every time I start the system up I stand a small but significant
chance of it just completely failing to perform properly. Worse, apache is
designed to periodically start new processes, so at any given time the system
could just randomly fall over and die.

I would rather incur a 10% penalty on every query than have a 1% chance of it
keeling over and dieing. Given this I would when I upgrade to 8.0 have to
ensure that my application driver is either not using placeholders at all (at
the protocol level -- I always prefer them at the api level) or ensure that
postgres is *not* peeking at the value.

I like the feature but I just want to be sure that it's optional.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] seqscan instead of index scan

2004-08-30 Thread Tom Lane
Martin Sarsale [EMAIL PROTECTED] writes:
 I indexed columns c and d (separately) but this query used the slow
 seqscan instead of the index scan:

 select * from t where c0 or d0;

 After playing some time, I noticed that if I change the or for an
 and, pg used the fast index scan (but the query w/ 'and' was not what
 I as looking for).

I don't think so.   is not an indexable operator --- it appears
nowhere in the index operator classes.  It would help if you showed us
*exactly* what you did instead of a not-very-accurate filtered version.
I'm going to assume that you meant  ...

 Now: I've no idea why it chooses to use a seq scan instead of the index
 scan (yes, I've just vacuum analyzed the table before running the
 query).

Because 7.4 doesn't have statistics about expression indexes, so it has
no idea that there are only a few rows with c+d  0.

What I'd suggest is

select * from t where c0 union select * from t where d0

with separate indexes on c and d.

Another possibility is a partial index on exactly the condition you
want:

create index nonzero on t(c) where c0 or d0;

although I'm not certain if 7.4 has enough stats to recognize this as a win.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Why does a simple query not use an obvious index?

2004-08-30 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 However I have the complementary reaction. I find peeking at the first
 bind parameter to be scary as hell. Functions seem slightly less scary.

FWIW, we only do it in the context of unnamed parameterized queries.
As the protocol docs say, those are optimized on the assumption that
they will be executed only once.  It seems entirely legitimate to me 
to use the parameter values in such a case.

We might in future get braver about using sample parameter values,
but 8.0 is conservative about it.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster