Proper way of constructing scrapy start_requests()

Petar Pilipovic Wed, 10 Jun 2015 02:08:20 -0700

I want to scrape the TOP TEN RELEASES table from torrenting.com 
<https://www.torrenting.com>, and I have written a crawler for that purpose, 
but you first need to be logged in to the site. My first attempt scraped 
basically nothing, so I started rebuilding my `torrent_spider.py` so that it 
logs in first. Because I am new to web scraping, I am stuck on this issue.


I have been reading the Scrapy docs on this and found that overriding 
`start_requests()` should let me log in to torrenting.com before I start 
scraping the table.

My question is: can someone explain how I request the 
`https://www.torrenting.com/browse.php` page after my spider has logged 
in, so I can start scraping the data I want?

This is `torrent_spider.py`:

  
    import scrapy
    from scrapy import Spider
    from scrapy.selector import Selector
    
    
    class TorrentSpider(Spider):
        """TorrentSpider, which will scrape the Top Ten Releases table."""
        name = "torrenting"
        allowed_domains = ["torrenting.com"]
        start_urls = [
            "https://www.torrenting.com/browse.php",
        ]
    
        def start_requests(self):
            # Log in first; the login response is passed to logged_in().
            return [scrapy.FormRequest(
                "https://www.torrenting.com/login.php?returnto=Login",
                formdata={'user': 'example', 'pass': 'somepass'},
                callback=self.logged_in)]
    
        def logged_in(self, response):
            # This is where I want to move on to browse.php.
            pass
    
        def parse(self, response):
            pass







 
