I want to scrape the TOP TEN RELEASES table from torrenting.com <https://www.torrenting.com>. I have written a crawler for that purpose, but the site requires you to be logged in first. My first attempt scraped basically nothing, so I started rebuilding my `torrent_spider.py` to handle the login. I am new to web scraping and I am stuck on this issue.
I am reading the Scrapy docs on this and I have found that `start_requests()` should let me log in to torrenting and then start scraping the table. My question is: can someone explain how I request the `https://www.torrenting.com/browse.php` page after my spider is logged in, so I can start scraping the data I want? This is `torrent_spider.py`:

    from scrapy import FormRequest, Spider
    from scrapy.selector import Selector


    class TorrentSpider(Spider):
        """Spider that will scrape the Top Ten Releases table."""

        name = "torrenting"
        allowed_domains = ["torrenting.com"]
        # Not used once start_requests() is overridden below.
        start_urls = [
            "https://www.torrenting.com/browse.php",
        ]

        def start_requests(self):
            # Submit the login form first; logged_in() receives the response.
            return [FormRequest(
                "https://www.torrenting.com/login.php?returnto=Login",
                formdata={'user': 'example', 'pass': 'somepass'},
                callback=self.logged_in)]

        def logged_in(self, response):
            # How do I get browse.php from here?
            pass

        def parse(self, response):
            pass
