I have the two tables shown below. All threads add jobs to the queue
(Que), and when a thread processes a URL it is added to the fetched
table. I want to select the first record from Que that contains the
given host and is not in the fetched table. I use the following query
to get the result, but as the tables get bigger it sometimes takes 50
to 60 seconds to complete. Is there a way to reduce this time?

SELECT id, url, anchor, pid
FROM spider.Que
WHERE id = (
    SELECT MIN(id) AS id
    FROM spider.Que
    WHERE url LIKE 'host %'
      AND id >= 5
      AND url NOT IN (SELECT url FROM spider.fetched)
);

CREATE TABLE `Que` (
  `id` int(11) NOT NULL auto_increment,
  `url` text NOT NULL,
  `anchor` text,
  `pid` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `newindex` (`url`(250))
) ENGINE=MyISAM DEFAULT CHARSET=utf8
  COMMENT='index of urls for future sessions';

CREATE TABLE `fetched` (
  `id` int(11) default NULL,
  `url` text NOT NULL,
  `anchor` text,
  `content` text,
  `pid` int(11) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

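For comparison, the same selection can also be written as an anti-join. This is only a sketch and has not been benchmarked. It assumes that a prefix index is added on fetched.url (fetched has no index at all in the definition above); the index name url_prefix is made up for illustration. Because fetched.url is declared NOT NULL, the LEFT JOIN ... IS NULL form should return the same row as the NOT IN form. Whether it is actually faster would need to be checked with EXPLAIN on the real data.

```sql
-- Assumption: add a prefix index on fetched.url, mirroring the one on Que.url.
-- The index name `url_prefix` is hypothetical.
ALTER TABLE spider.fetched ADD INDEX url_prefix (url(250));

-- Anti-join form of the query: the lowest-id Que row for the host
-- that has no row with the same url in fetched.
SELECT q.id, q.url, q.anchor, q.pid
FROM spider.Que AS q
LEFT JOIN spider.fetched AS f ON f.url = q.url
WHERE q.url LIKE 'host %'
  AND q.id >= 5
  AND f.url IS NULL      -- keep only Que rows with no match in fetched
ORDER BY q.id
LIMIT 1;
```

Prefixing either SELECT with EXPLAIN shows whether the index on fetched is being used for the lookup.
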
Nurullah Akkaya
[EMAIL PROTECTED]
Registered Linux User #301438

WARNING: all messages containing attachments or html will be
silently deleted. Send only plain text.

What lies behind us and what lies before us are tiny matters
compared to what lies within us.

"If at first an idea is not absurd, there is no hope for it"
Albert Einstein

Because the people who are crazy enough to think
they can change the world, are the ones who do.....