Re: [PERFORM] Why does a simple query not use an obvious index?
Another primary key trick : If you insert records with a serial primary key, and rarely delete them or update the timestamp, you can use the primary key to compute an approximate number of rows. a := SELECT pkey FROM table WHERE timestamp() threshold ORDER BY timestamp ASC LIMIT 1; b := SELECT pkey FROM table WHERE ORDER BY pkey DESC LIMIT 1; (b-a) is an approximate count. Performance is great because you only fetch two rows. Index scan is guaranteed (LIMIT 1). On the downside, you get an approximation, and this only works for tables where timestamp is a date of INSERT, timestamp worrelated wiht pkey) not when timestamp is a date of UPDATE (uncorrelated with pkey). ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] seqscan instead of index scan
create index t_idx on t((c+d)); select * from t where c+d 0; Why not : select ((select * from t where c0::bigint) UNION (select * from t where d0::bigint)) group by whatever; or someting ? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] seqscan instead of index scan
On Mon, 30 Aug 2004, Martin Sarsale wrote: On Mon, 2004-08-30 at 15:06, Merlin Moncure wrote: create function is_somethingable (ctype, dtype) returns boolean as Thanks, but I would prefer a simpler solution. I would like to know why this uses a seqscan instead of an index scan: create index t_idx on t((c+d)); select * from t where c+d 0; As a geuss, since 7.4 and earlier have no statistics on the distribution of c+d it has to guess about how likely that is to be true and is probably overestimating. 8.0beta might handle this better. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Why does a simple query not use an obvious index?
[I'm actually responding to the previous post from Tom Lane, but I've deleted it and the archives seem to be down again.] The assumption being made is that the first provided result is representative of all future results. I don't see any reason that making this assumption of all stable functions should be less scary than making the assumption about user provided parameters. However I have the complementary reaction. I find peeking at the first bind parameter to be scary as hell. Functions seem slightly less scary. On Oracle Peeking at bind parameters is a feature explicitly intended for DSS data warehouse type systems. The use of placeholders there was purely for security and programming ease, not efficiency, since the queries are only planned executed a small number of times per plan. These are systems that suffered enormously without the parameter values. They often involved full table scans or bitmap index scans and without the statistics produced awful plans. For OLTP systems peeking at placeholders is more a danger than a benefit. The query will be executed thousands of times and if it's planned based on a single unusual value initially the entire system could fail. Consider the following scenario which isn't farfetched at all. In fact I think it well describes my current project: I have a table with a few million records. 99% of the time users are working with only a few hundred records at most. There's an index on the column they're keying off of. 1% of the key values have an unusually large number of records. Without peeking at placeholders the system should see that virtually all the key values are well under the threshold for an index scan to be best. So it always uses an index scan. 1% of the time it takes longer than that it would have with a sequential scan, but only by a small factor. (On the whole we're probably still better off avoiding the cache pollution anyways.) With peeking at placeholders 99% of the backends would perform the same way. However 1 backend in 100 sees one of these unusual values for its first query. This backend will use a sequential scan for *every* request. Executing a sequential table scan of this big table once a second this backend will drive the entire system into the ground. This means every time I start the system up I stand a small but significant chance of it just completely failing to perform properly. Worse, apache is designed to periodically start new processes, so at any given time the system could just randomly fall over and die. I would rather incur a 10% penalty on every query than have a 1% chance of it keeling over and dieing. Given this I would when I upgrade to 8.0 have to ensure that my application driver is either not using placeholders at all (at the protocol level -- I always prefer them at the api level) or ensure that postgres is *not* peeking at the value. I like the feature but I just want to be sure that it's optional. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] seqscan instead of index scan
Martin Sarsale [EMAIL PROTECTED] writes: I indexed columns c and d (separately) but this query used the slow seqscan instead of the index scan: select * from t where c0 or d0; After playing some time, I noticed that if I change the or for an and, pg used the fast index scan (but the query w/ 'and' was not what I as looking for). I don't think so. is not an indexable operator --- it appears nowhere in the index operator classes. It would help if you showed us *exactly* what you did instead of a not-very-accurate filtered version. I'm going to assume that you meant ... Now: I've no idea why it chooses to use a seq scan instead of the index scan (yes, I've just vacuum analyzed the table before running the query). Because 7.4 doesn't have statistics about expression indexes, so it has no idea that there are only a few rows with c+d 0. What I'd suggest is select * from t where c0 union select * from t where d0 with separate indexes on c and d. Another possibility is a partial index on exactly the condition you want: create index nonzero on t(c) where c0 or d0; although I'm not certain if 7.4 has enough stats to recognize this as a win. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Why does a simple query not use an obvious index?
Greg Stark [EMAIL PROTECTED] writes: However I have the complementary reaction. I find peeking at the first bind parameter to be scary as hell. Functions seem slightly less scary. FWIW, we only do it in the context of unnamed parameterized queries. As the protocol docs say, those are optimized on the assumption that they will be executed only once. It seems entirely legitimate to me to use the parameter values in such a case. We might in future get braver about using sample parameter values, but 8.0 is conservative about it. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster