Note that if I just run

scrapy crawl govcrawl_main

with all of DomainSpider's attributes hard-coded in the class, I get no 
"Filtered offsite request" message. That message is what usually shows up in 
these questions, because the "allow" parameter is not set correctly. That does 
not appear to be the case here; see the log:

2014-11-04 14:48:45-0500 [scrapy] INFO: Scrapy 0.24.4 started (bot: govcrawl)
2014-11-04 14:48:45-0500 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-11-04 14:48:45-0500 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'govcrawl.spiders', 'DEPTH_LIMIT': 3, 'SPIDER_MODULES': ['govcrawl.spiders'], 'BOT_NAME': 'govcrawl', 'DOWNLOAD_TIMEOUT': 60, 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0', 'DOWNLOAD_DELAY': 1.5}
2014-11-04 14:48:45-0500 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-11-04 14:48:46-0500 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-11-04 14:48:46-0500 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-11-04 14:48:46-0500 [scrapy] INFO: Enabled item pipelines: DomainPipeline
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Spider opened
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-11-04 14:48:46-0500 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-11-04 14:48:46-0500 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-11-04 14:48:46-0500 [govcrawl_main] DEBUG: Crawled (200) <GET http://www.mass.gov/eea/agencies/dfg/der/> (referer: None)
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: URL: http://www.mass.gov/eea/agencies/dfg/der/ (0) Crawled 1 pages. To Crawl: 0
2014-11-04 14:48:46-0500 [govcrawl_main] DEBUG: Scraped from <200 http://www.mass.gov/eea/agencies/dfg/der/>
    None
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Closing spider (finished)
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 274,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 24320,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2014, 11, 4, 19, 48, 46, 156057),
     'item_scraped_count': 1,
     'log_count/DEBUG': 4,
     'log_count/INFO': 8,
     'pages_crawled': 1,
     'response_received_count': 1,
     'scheduler/dequeued': 1,
     'scheduler/dequeued/memory': 1,
     'scheduler/enqueued': 1,
     'scheduler/enqueued/memory': 1,
     'start_time': datetime.datetime(2014, 11, 4, 19, 48, 46, 61865)}
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Spider closed (finished)
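
For context, the "Filtered offsite request" message comes from OffsiteMiddleware, which appears among the enabled spider middlewares in the log above. The snippet below is a minimal, stdlib-only Python 3 sketch of the kind of hostname check that middleware performs. It assumes the rule is a suffix match of the request's hostname against each entry in allowed_domains, with an empty list allowing everything. It is not Scrapy's actual code, and the function names build_host_regex and is_offsite are hypothetical.

```python
import re
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    """Compile a pattern matching each allowed domain or any of its subdomains.

    Hypothetical helper approximating the offsite check; not Scrapy's code.
    """
    if not allowed_domains:
        # No restriction configured: every host counts as on-site.
        return re.compile("")
    alternatives = "|".join(re.escape(d) for d in allowed_domains)
    # Optional "anything ending in a dot" prefix, then an allowed domain, then end of host.
    return re.compile(r"^(.*\.)?(%s)$" % alternatives)


def is_offsite(url, allowed_domains):
    """Return True if the URL's hostname falls outside allowed_domains."""
    host = urlparse(url).hostname or ""
    return not build_host_regex(allowed_domains).search(host)


if __name__ == "__main__":
    start = "http://www.mass.gov/eea/agencies/dfg/der/"
    print(is_offsite(start, ["mass.gov"]))          # bare domain: on-site
    print(is_offsite(start, ["www.mass.gov/eea"]))  # path in the entry: offsite
```

Under this assumption, an entry that contains a scheme or a path never matches a bare hostname. That kind of misconfiguration is what typically produces "Filtered offsite request" in other threads.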

Thanks!
Michele C

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.