You don't need to do anything special; just don't execute the rest of the function. I think that if you put a 'return' statement in place of your comment, the callback will stop there and the spider will carry on with the next queued page, which is what you want.
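Here is a minimal sketch of the early return, written as a plain generator so it runs without Scrapy. The (url, body) signature and the REPLY_RE name are assumptions for illustration only; in the real spider the callback would take the response and yield a Request instead of a bare link. It is written for Python 3, so it imports urljoin from urllib.parse rather than the urlparse module used in the quoted code.

```python
import re
from urllib.parse import urljoin  # Python 3 home of urlparse.urljoin

# Same pattern as in the quoted spider, compiled once (hypothetical name).
REPLY_RE = re.compile(r"/reply/.+/\d+")


def parse_an_ad(url, body):
    """Yield the absolute reply link for a page, or nothing at all."""
    reply = REPLY_RE.search(body)
    try:
        link = urljoin(url, reply.group())
    except AttributeError:
        # re.search returned None, so there is no reply link on this page.
        # A bare return ends the generator without yielding anything.
        return
    yield link


if __name__ == "__main__":
    print(list(parse_an_ad("http://example.com/ads/1.html",
                           "contact: /reply/abc/123")))
    print(list(parse_an_ad("http://example.com/ads/2.html",
                           "no contact details here")))
```

Only AttributeError is caught here, so any unrelated error in the callback would still surface instead of being silently swallowed.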
Another approach is to first check if the regex matches, and if so then parse the page. Catching and ignoring all exceptions is usually considered a bad idea as it can hide other bugs.

On 1 July 2014 06:52, Duy Nguyen <[email protected]> wrote:
> Hi guys,
>
> I have a spider which crawl thousands of post. The requirement is the post
> must have a contact email. If spider detects no valid email within the
> post, spider should discard the page and move to next in-queued page.
>
> here is the code
>
> # this is the individual ad page
> def parse_an_ad(self, response):
>     reply = re.search("\/reply\/.+/\d+", response.body)
>     try:
>         link = urlparse.urljoin(response.url, reply.group())
>     except:
>         #what to do here to tell spider discard the current page and move onto next in-queued page
>
>     hxs = Selector(response)
>     post_title = hxs.xpath('//h2/text()').extract()[1].strip()
>     description_list = hxs.xpath('//section[@id=\'postingbody\']/text()').extract()
>     description = ''.join(description_list).strip()
>
>     yield Request(link, callback=self.parse_reply_page,
>                   meta={'post_title':post_title,'link':response.url,
>                         'description':description})
>
>
> I tried
>
> * raise CloseSpider("no contact info is found") - this will kill the
>   spider which I dont want to
>
> * raise IgnoreRequest() - this give me Spider error processing because
>   IgnoreRequest should be call within scheduler or middleware
>
> What do I do ?
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
