Re: How to use scrapy crawl content pagination and merge it?

Dimitris Kouzis - Loukas Wed, 03 Feb 2016 03:27:41 -0800


   - Extract the body text from the current page into a variable, e.g. 
   page_info.
   - Run response.css(".z_list_page").xpath(".//a/@href") to extract the 
   pagination links, and select the one that points to the next page.
   - For every page except the last, return a Request for the next page and 
   set its request.meta['page_info'] = page_info (see 
   http://doc.scrapy.org/en/latest/topics/request-response.html?highlight=meta#scrapy.http.Request.meta).
   - In the callback for the next page, read that value back from 
   response.meta (see 
   http://doc.scrapy.org/en/latest/topics/request-response.html?highlight=meta#scrapy.http.Response.meta) 
   and append the new page's text to it, so that consecutive pages are merged.
   - On the last page, return the merged document.
   



On Sunday, January 17, 2016 at 8:16:03 PM UTC, Steven Almeroth wrote:
>
> What have you tried so far?
>
> On Sunday, December 27, 2015 at 6:25:25 AM UTC-7, xiao liu wrote:
>>
>> To improve the user experience and increase page views, some sites 
>> split the content of a long article across several pages. Take this 
>> article: http://www.cs.com.cn/xwzx/hg/201409/t20140924_4521344.html 
>> How can I use Scrapy to crawl the content of the other two pages and 
>> merge everything into one complete article?
>>
>>
