Re: It seems my rule does not work. May anyone help me ?

vivian Y Sat, 04 Oct 2014 16:33:53 -0700

And the log output is shown as follows:

2014-10-04 18:09:34-0500 [yhdspider] INFO: Dumping Scrapy stats:
>>>     {'downloader/request_bytes': 492,
>>>      'downloader/request_count': 2,
>>>      'downloader/request_method_count/GET': 2,
>>>      'downloader/response_bytes': 65464,
>>>      'downloader/response_count': 2,
>>>      'downloader/response_status_count/200': 2,
>>>      'finish_reason': 'finished',
>>>      'finish_time': datetime.datetime(2014, 10, 4, 23, 9, 34, 189848),
>>>      'item_scraped_count': 1,
>>>      'log_count/DEBUG': 5,
>>>      'log_count/INFO': 8,
>>>      'request_depth_max': 1,
>>>      'response_received_count': 2,
>>>      'scheduler/dequeued': 2,
>>>      'scheduler/dequeued/memory': 2,
>>>      'scheduler/enqueued': 2,
>>>      'scheduler/enqueued/memory': 2,
>>>      'start_time': datetime.datetime(2014, 10, 4, 23, 9, 32, 555975)}
>>> 2014-10-04 18:09:34-0500 [yhdspider] INFO: Spider closed (finished)
On Saturday, October 4, 2014 at 6:01:21 PM UTC-5, vivian Y wrote:
>
> Hello guys, 
>
> I am new to Scrapy, and my script does not request any more URLs. Can
> anyone help me? Thanks.
>
> My code is shown as follows:
>
> My test2.py file:   
>> from scrapy.contrib.spiders import CrawlSpider, Rule
>> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
>> from try_yhd.items import TryYhdItem
>> from scrapy.selector import Selector
>>
>> class MySpider(CrawlSpider):
>>     name = "yhdspider"
>>     allowed_domains = ["yhd.com"]
>>     start_urls = ["http://item.yhd.com/item/30838751"]
>>     rules = [
>>         Rule(SgmlLinkExtractor(allow=['^http://item.yhd.com/item/\d+']),
>>              callback="parse_items", follow=True),
>>     ]
>>
>>     def parse_items(self,response):
>>         print "Hello this is the url %s" % response.url
>>         hxs = Selector(response)
>>         # items = []
>>         # find the price and product id.
>>         item = TryYhdItem()
>>         item['url'] = response.url
>>         item['price'] = hxs.xpath("//span[@id='current_price']").extract()
>>         item['productId'] = hxs.xpath("//p[@class='product_id']/text()").extract()
>>         item['title'] = hxs.xpath("//h1[@id = 'productMainName']").extract()
>>         yield item
>>
>>
>> my middlewares.py file: 
>>
>
> SPIDER_MIDDLEWARES = {
>     'try_yhd.middlewares.CustomSpiderMiddleware': 543,
>     # 'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware': None,
> }
>    
> My items.py file:
> # -*- coding: utf-8 -*-
>
> # Define here the models for your scraped items
> #
> # See documentation in:
> # http://doc.scrapy.org/en/latest/topics/items.html
> from scrapy.item import Item, Field
> import scrapy
>
> class TryYhdItem(Item):
>     # define the fields for your item here like:
>     price = Field()
>     productId = Field()
>     url = Field()
>     title = Field()
>
>
>
>
>
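
One way to narrow down a rule like the one quoted above is to try the allow pattern on a few URLs outside Scrapy. The sketch below uses only Python's standard re module and assumes the link extractor treats each allow entry as a regular expression searched against the absolute URL. The helper name matches_allow and every candidate URL except the start URL are made up for illustration, and the dots are escaped here so they match literally, which the original pattern does not do.

```python
import re

# Same pattern as in the quoted rule, with the dots escaped (an assumption
# made for this sketch; the original pattern leaves them unescaped).
ALLOW_PATTERN = re.compile(r'^http://item\.yhd\.com/item/\d+')


def matches_allow(url):
    """Return True if the URL would pass the allow pattern (hypothetical helper)."""
    return ALLOW_PATTERN.search(url) is not None


# Only the first URL comes from the spider; the others are invented examples.
candidates = [
    "http://item.yhd.com/item/30838751",
    "http://item.yhd.com/item/lp/123",
    "http://www.yhd.com/",
]

for url in candidates:
    print("%s -> %s" % (url, matches_allow(url)))
```

If links that appear on the product page fail this check, the rule has nothing to follow, and the pattern is the first thing to adjust.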


