[SQL] Seq Scans when index expected to be used
PostgreSQL 7.4.0

Hi,

I am trying to find duplicate values in a large table (about 80M records).
Somehow, neither of the two queries below uses the index "I_bca" that, I
believe, should have been used. Any ideas? Thanks

CREATE TABLE te.test (
    id  te.didlong NOT NULL,
    a   te.dtimestamp,
    b   te.didint NOT NULL,
    c   te.didint NOT NULL,
    d   te.dstring
) WITHOUT OIDS;

-- attempt to create AK_bca failed due to duplicate values
-- alter table te.test
--     add constraint AK_bca unique (b, c, a);

create index I_bca on te.test (b, c, a);

alter table te.test
    add constraint PK_id primary key (id);

analyze te.test;

-- first attempt to find duplicate values
explain
select b, c, a
from test
group by b, c, a
having count(*) > 1

QUERY PLAN
GroupAggregate  (cost=19644987.88..21026410.30 rows=78938424 width=16)
  Filter: (count(*) > 1)
  ->  Sort  (cost=19644987.88..19842333.94 rows=78938424 width=16)
        Sort Key: b, c, a
        ->  Seq Scan on test  (cost=0.00..1589706.24 rows=78938424 width=16)

-- second attempt to find duplicate values
explain
select DV1.b, DV1.c, DV1.a
from test DV1, test DV2
where DV1.b = DV2.b
  and DV1.c = DV2.c
  and DV1.a = DV2.a
  and DV1.id <> DV2.id

QUERY PLAN
Merge Join  (cost=42373495.75..45309925.87 rows=95424260 width=16)
  Merge Cond: (("outer"."?column5?" = "inner"."?column5?") AND ("outer"."?column6?" = "inner"."?column6?") AND ("outer"."?column7?" = "inner"."?column7?"))
  Join Filter: (("outer".id)::bigint <> ("inner".id)::bigint)
  ->  Sort  (cost=21186747.88..21384093.94 rows=78938424 width=24)
        Sort Key: (dv1.a)::timestamp without time zone, (dv1.c)::integer, (dv1.b)::integer
        ->  Seq Scan on test dv1  (cost=0.00..1589706.24 rows=78938424 width=24)
  ->  Sort  (cost=21186747.88..21384093.94 rows=78938424 width=24)
        Sort Key: (dv2.a)::timestamp without time zone, (dv2.c)::integer, (dv2.b)::integer
        ->  Seq Scan on test dv2  (cost=0.00..1589706.24 rows=78938424 width=24)

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] Seq Scans when index expected to be used
On Sat, 29 Nov 2003 08:49:24 -0800 (PST), ow wrote:

> explain select b, c, a
> from test
> group by b, c, a
> having count(*) > 1

I'm not sure about the 7.4 aggregate improvements, but <= 7.3 didn't work
well with aggregates at all. Maybe it's not directly an answer to your
question, but try these queries:

select t1.b, t1.c, t1.a
from test t1
join test t2 using (b, c, a)
where t2.id <> t1.id
group by t1.b, t1.c, t1.a

or

select a, b, c
from test t1
where exists
  (select * from test t2
   where t2.a = t1.a and t2.b = t1.b and t2.c = t1.c
     and t1.id <> t2.id)
group by a, b, c

Regards,
Tomasz Myrta
Re: [SQL] Seq Scans when index expected to be used
ow <[EMAIL PROTECTED]> writes:

> Am trying to find duplicate values in a large table (about 80M records).
> Somehow, none of the two (2) queries (see below) is using the index "I_bca"
> that, I believe, should've been used.

Those plans are perfectly reasonable (particularly the GroupAggregate one).

regards, tom lane
Re: [SQL] Seq Scans when index expected to be used
jasiek wrote:

> On Sat, 29 Nov 2003 08:49:24 -0800 (PST), ow wrote:
>> explain select b, c, a
>> from test
>> group by b, c, a
>> having count(*) > 1

Why would you expect this to use an index scan when it needs to read the
entire table? If you read the whole table (or even a significant fraction
of it), a seq scan is faster.

Joe
Re: [SQL] Seq Scans when index expected to be used
On Sat, 29 Nov 2003, ow wrote:

> Am trying to find duplicate values in a large table (about 80M records).
> Somehow, none of the two (2) queries (see below) is using the index "I_bca"
> that, I believe, should've been used.

As a side note, to help determine if it should be used, you probably want
explain analyze, not just explain. In general the index is only going to
help you if the sort is very expensive, not for the initial scan (where
the seq scan is generally faster).
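One way to see what the planner thinks an index-based plan would cost - a
diagnostic sketch only, not something to leave enabled - is to turn off seq
scans for the session and re-run explain. If the forced plan's estimated
cost comes out higher than the seq scan plan's, the planner's choice was
reasonable:

```sql
-- Diagnostic: make seq scans artificially unattractive so the planner
-- shows its best index-based plan for cost comparison.
SET enable_seqscan = off;

EXPLAIN
SELECT b, c, a FROM test GROUP BY b, c, a HAVING count(*) > 1;

-- Restore the default afterwards.
SET enable_seqscan = on;
```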
Re: [SQL] Equivalent of Reverse() functions
> I am migrating an SQL Server 2000 database to Postgres 7.3.4 running
> on RH Linux 7.2. While migrating I encountered the SQL Server REVERSE()
> function; there seems to be no such function in Postgres.
>
> Is there an equivalent function available in Postgres? Please
> shed some light.

No. If you have plperl installed, as somebody already posted, you can take
advantage of perl's built-in reverse() function. Here is a plpgsql version:

CREATE OR REPLACE FUNCTION reverse(TEXT) RETURNS TEXT AS '
DECLARE
    original ALIAS FOR $1;
    reversed TEXT := \'\';
    onechar  VARCHAR;
    mypos    INTEGER;
BEGIN
    SELECT LENGTH(original) INTO mypos;
    LOOP
        EXIT WHEN mypos < 1;
        SELECT substring(original FROM mypos FOR 1) INTO onechar;
        reversed := reversed || onechar;
        mypos := mypos - 1;
    END LOOP;
    RETURN reversed;
END
' LANGUAGE plpgsql IMMUTABLE RETURNS NULL ON NULL INPUT;

pg=> SELECT reverse('A man, a plan, a canal, Panama');
            reverse
--------------------------------
 amanaP ,lanac a ,nalp a ,nam A

--
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8
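As an aside, one common use of a reverse() function like the one above is
indexed suffix matching. A sketch - the table t, its name column, and the
index name are illustrative, not from the original post; the function must
be IMMUTABLE for the functional index, and in 7.x a LIKE prefix match can
only use a btree index when the database was initialized with the C locale:

```sql
-- Hypothetical table t(name text): index the reversed values.
CREATE INDEX i_t_name_rev ON t (reverse(name));

-- "name ends with 'example.com'" becomes a prefix match on the
-- reversed value, which the btree index can satisfy (C locale caveat):
SELECT *
FROM t
WHERE reverse(name) LIKE reverse('example.com') || '%';
```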
Re: [SQL] Seq Scans when index expected to be used
--- Joe Conway <[EMAIL PROTECTED]> wrote:

>>> explain select b, c, a
>>> from test
>>> group by b, c, a
>>> having count(*) > 1
>
> Why would you expect this to use an index scan when it needs to read the
> entire table? If you read the whole table (or even a significant
> fraction of it), a seq scan is faster.

My impression was that the index "I_bca" covers the query, hence there
should not be a need to go to the table itself. Why would it?

P.S. Re explain analyze vs explain: normally I would've used "explain
analyze", but in this case it's taking way too long, so I used "explain".

Thanks
[SQL] Updating session id based on accesstimeout
Hi,

I want to set a session id based on a session timeout per user. I have a
very large logfile with more than 1.3 million records and am looking for a
very efficient solution - maybe with PL/pgSQL.

The important columns in the table are the access timestamp and the userid,
sorted by userid and access time. The logic for the session id is: whenever
the userid changes, or the time between two consecutive records is greater
than 60 sec, create a new session id and update the record with this id.

I tried to solve this problem with Perl and the DBI interface - but it took
about 12 hours to update all records. The fastest solution I have found so
far was not to use any database at all: sort the data with the unix sort
command and process the logfile text directly with Perl.

Thanks in advance!
Andreas
Re: [SQL] Seq Scans when index expected to be used
ow wrote:

> My impression was that the index "I_bca" covers the query, hence there
> should not be a need to go to the table itself. Why would it?

Postgres always has to go to the table. The ability to read data directly
from indexes (a la other RDBMSs) has been discussed, but not implemented.
IIRC it's a hard problem due to the way Postgres does MVCC. Check the
archives.

> explain analyze vs explain. Normally, would've used "explain analyze"
> but in this case it's taking way too long so I used "explain".

I can understand that, but most people will ask for explain analyze
anyway ;-)

Joe
Re: [SQL] Updating session id based on accesstimeout
On Sat, 29 Nov 2003, Andreas Heissenberger wrote:

> I want to set a session id based on a session timeout per user. I have a
> very large logfile with more than 1.3 Mio records and look for a very
> efficient solution - maybe with PL/pgSQL.
>
> The important columns in the table are: access timestamp and userid
> sorted by userid and access time. The logic for the session id is:
> when ever the userid changes or the time between two following records
> is greater than 60 sec create a new session id and update the record
> with this id.
>
> I tried to solve this problem with Perl and the DBI interface - but it
> took about 12 hours to update all records. The fastest solution I found
> until now was not to use any data base, sort the data with the unix
> command sort and process the logfile text directly with perl.

You won't find a method faster than updating text files until you stop
brute-forcing the data. Databases don't provide speed, at least in this
case. They provide ACID compliance, and other features.

Perhaps, if you revisit your assumptions, you might find that your need is
solved by a different notion of the problem.

-j

--
Jamie Lawrence  [EMAIL PROTECTED]
"God created the integers, all else is the work of man." - Kronecker
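That said, if the work must stay in the database, a single server-side pass
avoids the per-row client round trips that make Perl/DBI slow. A minimal
PL/pgSQL sketch, assuming a table weblog(userid integer, access timestamp,
session_id integer) - the table and column names are illustrative, and the
UPDATE assumes (userid, access) is unique; per-row UPDATEs are still the
dominant cost:

```sql
CREATE OR REPLACE FUNCTION assign_sessions() RETURNS integer AS '
DECLARE
    rec        RECORD;
    last_user  INTEGER   := NULL;
    last_time  TIMESTAMP := NULL;
    sid        INTEGER   := 0;
BEGIN
    -- Walk the log in (userid, access) order, exactly as the poster sorts it.
    FOR rec IN SELECT userid, access FROM weblog
               ORDER BY userid, access LOOP
        -- New session on the first row, on a user change, or on a gap > 60 sec.
        IF last_user IS NULL
           OR rec.userid <> last_user
           OR rec.access - last_time > INTERVAL ''60 seconds'' THEN
            sid := sid + 1;
        END IF;
        UPDATE weblog SET session_id = sid
         WHERE userid = rec.userid AND access = rec.access;
        last_user := rec.userid;
        last_time := rec.access;
    END LOOP;
    RETURN sid;  -- number of sessions assigned
END
' LANGUAGE plpgsql;

SELECT assign_sessions();
```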