Dear all,

I have 2 tables in my database name clause2( 4900 MB) & page_content(1582 MB).

My table definations are as :

*page_content :-

*CREATE TABLE page_content
(
 content_id integer,
 wkb_geometry geometry,
 link_level integer,
 isprocessable integer,
 isvalid integer,
 isanalyzed integer,
 islocked integer,
 content_language character(10),
 url_id integer,
 publishing_date character(40),
 heading character(150),
 category character(150),
 crawled_page_url character(500),
 keywords character(500),
 dt_stamp timestamp with time zone,
 "content" character varying,
 crawled_page_id bigint,
 id integer
)
WITH (
 OIDS=FALSE
);

*Indexes on it :-*
CREATE INDEX idx_page_id  ON page_content  USING btree  (crawled_page_id);
CREATE INDEX idx_page_id_content ON page_content USING btree (crawled_page_id, content_language, publishing_date, isprocessable); CREATE INDEX pgweb_idx ON page_content USING gin (to_tsvector('english'::regconfig, content::text));

*clause 2:-
*CREATE TABLE clause2
(
 id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
 source_id integer,
 sentence_id integer,
 clause_id integer,
 tense character varying(30),
 clause text,
 CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)WITH ( OIDS=FALSE);

*Indexes on it :

*CREATE INDEX idx_clause2_march10
 ON clause2
 USING btree
 (id, source_id);*

*I perform a join query on it as :

* explain analyze select distinct(p.crawled_page_id) from page_content p , clause2 c where p.crawled_page_id != c.source_id ;

*What it takes more than 1 hour to complete. As I issue the explain analyze command and cannot able to wait for output but I send my explain output as : QUERY PLAN --------------------------------------------------------------------------------------------------------
Unique  (cost=927576.16..395122387390.13 rows=382659 width=8)
-> Nested Loop (cost=927576.16..360949839832.15 rows=13669019023195 width=8)
        Join Filter: (p.crawled_page_id <> c.source_id)
-> Index Scan using idx_page_id on page_content p (cost=0.00..174214.02 rows=428817 width=8)
        ->  Materialize  (cost=927576.16..1370855.12 rows=31876196 width=4)
-> Seq Scan on clause2 c (cost=0.00..771182.96 rows=31876196 width=4)
(6 rows)


Please guide me how to make the above query run faster as I am not able to do that.


Thanks, Adarsh

*


*


Reply via email to