Setting CONCURRENT_REQUESTS=1 isn't a good fix. Your code looks fine in principle. This
<https://github.com/scrapy/scrapy/blob/ebef6d7c6dd8922210db8a4a44f48fe27ee0cd16/scrapy/spiders/crawl.py#L42>
is the default CrawlSpider parse() method, and it is what handles e.g. the response
to your FormRequest.from_response(). From that response onwards your responses
are handled by the Rules you set; before that they aren't. Since you have my
e-mail, give me access to the code so I can have a look/run it and we can update
this thread.
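
If you want the Rules to only kick in after the login has gone through, one
way is to chain the requests yourself: do the login in start_requests() /
its callback, check the login response, and only then yield a plain Request
for the board index *without* a callback, so it falls through to
CrawlSpider's default parse() and your Rules take over. Rough sketch of that
idea (the spider name, the after_login callback, the "Logout" success check
and formnumber are placeholders I made up, not taken from your code):

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor
    from scrapy.http import FormRequest, Request

    class PhpbbSpider(CrawlSpider):
        name = "phpbb"

        # Rules only apply to responses that reach CrawlSpider's default
        # parse(), i.e. requests issued without an explicit callback.
        rules = (
            Rule(LinkExtractor(), callback='parse_standard', follow=True),
        )

        def start_requests(self):
            # Only the login page goes out first.
            yield Request("http://<mysite>/phpBB3", callback=self.parse_welcome)

        def parse_welcome(self, response):
            yield FormRequest.from_response(
                response,
                formnumber=1,  # placeholder: pick the real login form
                formdata={"username": "rightusername",
                          "password": "rightpassword"},
                callback=self.after_login,
            )

        def after_login(self, response):
            # Placeholder check: adjust to whatever your board shows when
            # you are logged in.
            if b"Logout" not in response.body:
                self.logger.error("Login failed, not starting the crawl")
                return
            # No callback here, so this response goes to CrawlSpider's
            # default parse() and the Rules above take over; from here on
            # Scrapy can schedule the extracted links concurrently again.
            yield Request("http://<mysite>/phpBB3", dont_filter=True)

        def parse_standard(self, response):
            self.logger.info("scraping %s", response.url)

That way CONCURRENT_REQUESTS can stay at its default: only the login
sequence is serialized by the callback chain, and everything after it is
crawled concurrently by the Rules.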
On Thursday, May 26, 2016 at 11:32:45 AM UTC+1, Massimo Canonico wrote:
>
> Hi all,
>
> I've made some progress with my problem by setting
> "CONCURRENT_REQUESTS=1" in the settings, but I would like to:
>
> - run the authentication request first, and
>
> - then run the requests for the pages to scrape concurrently
>
> Could you help me, please?
>
> M
>
>
> On 19/05/16 16:59, Massimo Canonico wrote:
> > Hi all,
> >
> > I'm using Scrapy on a site with a bulletin board (phpBB) and I would
> > like to start scraping the pages ONLY after the authentication has
> > succeeded.
> >
> > In my code, the authentication is done inside the start_requests() method:
> >
> >     def start_requests(self):
> >         self.log("start_requests called")
> >         return [
> >             Request(
> >                 "http://<mysite>/phpBB3",
> >                 callback=self.parse_welcome,
> >                 priority=100
> >             )
> >         ]
> >
> >     def parse_welcome(self, response):
> >         self.log("parse_welcome called")
> >         request = FormRequest.from_response(
> >             response,
> >             formnumber=1,
> >             formdata={"username": "rightusername",
> >                       "password": "rightpassword"}
> >         )
> >         return request
> >
> >     rules = (
> >         Rule(LinkExtractor(), callback='parse_standard', follow=True),
> >     )
> >
> > [cut]
> >
> >
> > From the output that I got, it seems that some pages are scraped
> > without the authentication happening first.
> >
> > Am I wrong/missing something? Am I using "priority" in the right way?
> >
> > Thanks,
> >
> > Massimo
> >
>
>