Hello,
I have a method that drills down into menus and, when it reaches an index
page, passes that page to a method that gets the products from the index.
The problem is that sometimes the response is the last menu before the
index, and sometimes it is the index page itself. I have tackled the issue
this way:
def drill_down(self, response):
    hxs = HtmlXPathSelector(response)
    xpath = (".//dl[@id='categories_menu']//dt[contains(text(),'Category')]"
             "/following-sibling::dd[count(preceding-sibling::dt) = 1]//a/@href")
    if hxs.select(xpath):
        for submenu in hxs.select(xpath).extract():
            self.log('dl index page: %s' % self.get_abs_url(submenu),
                     level=log.DEBUG)
            yield Request(url=self.get_abs_url(submenu),
                          callback=self.get_products_from_index)
    else:
        self.log('reg index page: %s' % response.url, level=log.DEBUG)
        # ERROR: Spider must return Request, BaseItem or None, got 'generator'
        yield self.get_products_from_index(response)
        # SyntaxError: 'return' with argument inside generator
        #return self.get_products_from_index(response)
        # gets filtered as duplicate request
        #yield Request(url=response.url,
        #              callback=self.get_products_from_index)
Anyone know the best way to deal with this?
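One approach that might work is to loop over the inner generator and
re-yield each result, rather than yielding the generator object itself.
Below is a minimal sketch without Scrapy, assuming get_products_from_index
is itself a generator. The dict-based "page" and the ("request", url) tuples
are hypothetical stand-ins for the real response and Request objects.

```python
def get_products_from_index(page):
    # Stand-in for the real callback: a generator yielding one item per product.
    for product in page["products"]:
        yield {"name": product}


def drill_down(page):
    # Hypothetical simplified page: a dict with "submenus" and "products" lists.
    if page["submenus"]:
        for submenu in page["submenus"]:
            # In the real spider this would be a Request with a callback.
            yield ("request", submenu)
    else:
        # Re-yield each result instead of yielding the generator object,
        # so the caller only ever sees individual items.
        for item in get_products_from_index(page):
            yield item


menu_page = {"submenus": ["/cat/1", "/cat/2"], "products": []}
index_page = {"submenus": [], "products": ["a", "b"]}

menu_results = list(drill_down(menu_page))
index_results = list(drill_down(index_page))
```

On Python 3.3 and later, "yield from get_products_from_index(page)" is
equivalent to the inner loop. The duplicate-filtered Request variant could
probably also be made to work by passing dont_filter=True, at the cost of
downloading the same page a second time.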
Thanks,
Bill