[SQL] Seq Scans when index expected to be used

2003-11-29 Thread ow
pgSql 7.4.0

Hi,

Am trying to find duplicate values in a large table (about 80M records).
Somehow, none of the two (2) queries (see below) is using the index "I_bca"
that, I believe, should've been used.

Any ideas? Thanks

--

CREATE TABLE te.test
(
  id te.didlong NOT NULL,
  a te.dtimestamp,
  b te.didint NOT NULL,
  c te.didint NOT NULL,
  d te.dstring,
) WITHOUT OIDS;

-- attempt to create AK_bca failed due to duplicate values
-- alter table te.test
--   add constraint AK_bca unique (b, c, a);

create index I_bca on te.test (
b, c, a
);

alter table te.test
   add constraint PK_id primary key (id);

analyze te.test;

--
--  first attempt to find duplicate values
explain select b, c, a
from test
group by b, c, a
having count(*) > 1

QUERY PLAN
"GroupAggregate  (cost=19644987.88..21026410.30 rows=78938424 width=16)"
"  Filter: (count(*) > 1)"
"  ->  Sort  (cost=19644987.88..19842333.94 rows=78938424 width=16)"
"Sort Key: b, c, a"
"->  Seq Scan on test  (cost=0.00..1589706.24 rows=78938424 width=16)"


--
--  second attempt to find duplicate values
explain select DV1.b, DV1.c, DV1.a
from test DV1, test DV2
where DV1.b = DV2.b
  and DV1.c = DV2.c
  and DV1.a = DV2.a
  and DV1.id <> DV2.id

QUERY PLAN
"Merge Join  (cost=42373495.75..45309925.87 rows=95424260 width=16)"
"  Merge Cond: (("outer"."?column5?" = "inner"."?column5?") AND
("outer"."?column6?" = "inner"."?column6?") AND ("outer"."?column7?" =
"inner"."?column7?"))"
"  Join Filter: (("outer".id)::bigint <> ("inner".id)::bigint)"
"  ->  Sort  (cost=21186747.88..21384093.94 rows=78938424 width=24)"
"Sort Key: (dv1.a)::timestamp without time zone, (dv1.c)::integer,
(dv1.b)::integer"
"->  Seq Scan on test dv1  (cost=0.00..1589706.24 rows=78938424
width=24)"
"  ->  Sort  (cost=21186747.88..21384093.94 rows=78938424 width=24)"
"Sort Key: (dv2.a)::timestamp without time zone, (dv2.c)::integer,
(dv2.b)::integer"
"->  Seq Scan on test dv2  (cost=0.00..1589706.24 rows=78938424
width=24)"






__
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Seq Scans when index expected to be used

2003-11-29 Thread jasiek
On Sat, 29 Nov 2003 08:49:24 -0800 (PST), ow wrote
> explain select b, c, a
> from test
> group by b, c, a
> having count(*) > 1

I'm not sure about 7.4 aggregate improvements, but <=7.3 didn't work good 
with aggregates at all. Maybe it's not directly an answer to your question, 
but try theses queries:

select t1.b,t1.c,t1.a from 
 test t1 
 join test t2 using (b,c,a)
where t2.id<>t1.id
group by t1.b,t1.c,t1.a

or

select a,b,c from test t1
where exists (select * from test t2 where t2.a=t1.a and t2.b=t1.b and 
t2.c=t1.c and t1.id<>t2.id)
group by a,b,c

Regards,
Tomasz Myrta

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] Seq Scans when index expected to be used

2003-11-29 Thread Tom Lane
ow <[EMAIL PROTECTED]> writes:
> Am trying to find duplicate values in a large table (about 80M records).
> Somehow, none of the two (2) queries (see below) is using the index "I_bca"
> that, I believe, should've been used.

Those plans are perfectly reasonable (particularly the GroupAggregate one).

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] Seq Scans when index expected to be used

2003-11-29 Thread Joe Conway
jasiek wrote:
On Sat, 29 Nov 2003 08:49:24 -0800 (PST), ow wrote

explain select b, c, a
from test
group by b, c, a
having count(*) > 1
Why would you expect this to use an index scan when it needs to read the 
entire table? If you read the whole table (or even a significant 
fraction of it), a seq scan is faster.

Joe



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] Seq Scans when index expected to be used

2003-11-29 Thread Stephan Szabo

On Sat, 29 Nov 2003, ow wrote:

> Am trying to find duplicate values in a large table (about 80M records).
> Somehow, none of the two (2) queries (see below) is using the index "I_bca"
> that, I believe, should've been used.

As a side note, to help determine if it should be used, you probably want
explain analyze not just explain. In general the index is only going to
help you if the sort is very expensive, not for the initial scan (where
the seq scan is generally faster).

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Equivalent of Reverse() functions

2003-11-29 Thread greg

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
 
 
> I am migrating an SQL Server 2000 database to Postgres 7.3.4 running
> on RH Linux 7.2. While migrating I encounter SQL Server REVERSE( )
> function, seems no such functions at Postgres.
>
> Is there a equivalent function available at Postgres? Please
> shed some light
 
No. If you have plperl installed, as somebody already posted, you can
take advantage of perl's built in reverse() function. Here is a
plpgsql version:
  
CREATE OR REPLACE FUNCTION reverse(TEXT) RETURNS TEXT AS '
  DECLARE
original ALIAS FOR $1;
reversed TEXT := \'\';
onechar  VARCHAR;
myposINTEGER;
  BEGIN
SELECT LENGTH(original) INTO mypos;
LOOP
  EXIT WHEN mypos < 1;
  SELECT substring(original FROM mypos FOR 1) INTO onechar;
  reversed := reversed || onechar;
  mypos := mypos -1;
END LOOP;
RETURN reversed;
  END
' LANGUAGE plpgsql IMMUTABLE RETURNS NULL ON NULL INPUT;
  
pg=> SELECT reverse('A man, a plan, a canal, Panama');
 
reverse
- 
 amanap ,lanac a ,nalp a ,nam A
 
 
- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200311291246
-BEGIN PGP SIGNATURE-
 
iD8DBQE/yNwwvJuQZxSWSsgRAnTyAJ9TqV0D3pV4Cv2b0VZfb8TxuvgxKgCeNBN+
OoFWwoD3omlLw+MUxcWZkT0=
=JtRf
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] Seq Scans when index expected to be used

2003-11-29 Thread ow

--- Joe Conway <[EMAIL PROTECTED]> wrote:
> > 
> >>explain select b, c, a
> >>from test
> >>group by b, c, a
> >>having count(*) > 1
> 
> Why would you expect this to use an index scan when it needs to read the 
> entire table? If you read the whole table (or even a significant 
> fraction of it), a seq scan is faster.

My impression was that the index "I_bca" covers the query, hence there should
not be a need to go to the table itself. Why would it?

P.S.
explain analyze vs explain. Normally, would've used "explain analyze" but in
this case it's taking way too long so I used "explain".

Thanks





__
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[SQL]

2003-11-29 Thread Andreas Heissenberger



 


[SQL] Updating session id based on accesstimeout

2003-11-29 Thread Andreas Heissenberger



Hi,
 
I want to set a session id based on a session 
timeout per user. I have a very large logfile with more than 1.3 Mio records and 
look for a very efficient solution - maybe with PL/pgSQL 
.
 
The important columns in the table are: access 
timestamp and userid sorted by userid and 
access time.
The logic for the session id is: when ever the 
userid changes or the time between two following records is greater than 60 sec 
create a new session id and update the record with this id.
 
I tried to solve this problem with Perl and the DBI 
interface - but it took about 12 hours to update all records.
The fastes solution I found until now was not to 
use any data base, sort the data with the unix command sort and process the 
logfile text directly with perl.
 
Thanks in advance!
 
Andreas
 


Re: [SQL] Seq Scans when index expected to be used

2003-11-29 Thread Joe Conway
ow wrote:
My impression was that the index "I_bca" covers the query, hence there should
not be a need to go to the table itself. Why would it?
Postgres always has to go to the table. The ability to read data 
directly from indexes (ala other RDBMSs) has been discussed, but not 
implemented. IIRC it's a hard problem due to the way Postgres does MVCC. 
Check the archives.

explain analyze vs explain. Normally, would've used "explain analyze" but in
this case it's taking way too long so I used "explain".
I can understand that, but most people will ask for explain analyze 
anyway ;-)

Joe

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] Updating session id based on accesstimeout

2003-11-29 Thread Jamie Lawrence
On Sat, 29 Nov 2003, Andreas Heissenberger wrote:

> Hi,
> 
> I want to set a session id based on a session timeout per user. I have a very large 
> logfile with more than 1.3 Mio records and look for a very efficient solution - 
> maybe with PL/pgSQL .
> 
> The important columns in the table are: access timestamp and userid sorted by userid 
> and access time.
> The logic for the session id is: when ever the userid changes or the time between 
> two following records is greater than 60 sec create a new session id and update the 
> record with this id.
> 
> I tried to solve this problem with Perl and the DBI interface - but it took about 12 
> hours to update all records.
> The fastes solution I found until now was not to use any data base, sort the data 
> with the unix command sort and process the logfile text directly with perl.

You won't find a method faster than updating text files, until you stop
brute forcing the data.

Databases don't provide speed, at least in this case. They provide 
ACID compliance, and other features.

Perhaps, if you revisit your assumptions, you might find that your need
is solved by a different notion of the problem.

-j



-- 
Jamie Lawrence[EMAIL PROTECTED]
"God created the integers, all else is the work of man."
   - Kronecker 



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org