Hi Travis, thanks for the response. Not sure why it's not able to find it, it's there, see below:
pawnbahnimac:spiders pawnbahn$ pwd
/Users/pawnbahn/tm/tm/spiders
pawnbahnimac:spiders pawnbahn$ ls
Books        Resources    __init__.py    __init__.pyc    items.json    tm_spider.py    tm_spider.pyc
pawnbahnimac:spiders pawnbahn$

It only behaves like this on this site for some reason. Running the dmoz example works fine.

pawnbahnimac:spiders pawnbahn$ scrapy crawl tm
:0: UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'. Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
2015-04-02 14:56:01-0500 [scrapy] INFO: Scrapy 0.24.5 started (bot: tm)
2015-04-02 14:56:01-0500 [scrapy] INFO: Optional features available: ssl, http11
2015-04-02 14:56:01-0500 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tm.spiders', 'SPIDER_MODULES': ['tm.spiders'], 'BOT_NAME': 'tm'}
2015-04-02 14:56:01-0500 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-04-02 14:56:01-0500 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-04-02 14:56:01-0500 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-04-02 14:56:01-0500 [scrapy] INFO: Enabled item pipelines:
2015-04-02 14:56:01-0500 [tm] INFO: Spider opened
2015-04-02 14:56:01-0500 [tm] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-04-02 14:56:01-0500 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-04-02 14:56:01-0500 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-04-02 14:56:01-0500 [tm] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2015-04-02 14:56:01-0500 [tm] INFO: Closing spider (finished)
2015-04-02 14:56:01-0500 [tm] INFO: Dumping Scrapy stats:
	{'downloader/request_bytes': 260,
	 'downloader/request_count': 1,
	 'downloader/request_method_count/GET': 1,
	 'downloader/response_bytes': 6234,
	 'downloader/response_count': 1,
	 'downloader/response_status_count/200': 1,
	 'finish_reason': 'finished',
	 'finish_time': datetime.datetime(2015, 4, 2, 19, 56, 1, 861714),
	 'log_count/DEBUG': 3,
	 'log_count/INFO': 7,
	 'response_received_count': 1,
	 'scheduler/dequeued': 1,
	 'scheduler/dequeued/memory': 1,
	 'scheduler/enqueued': 1,
	 'scheduler/enqueued/memory': 1,
	 'start_time': datetime.datetime(2015, 4, 2, 19, 56, 1, 494696)}
2015-04-02 14:56:01-0500 [tm] INFO: Spider closed (finished)

On Thursday, April 2, 2015 at 11:30:41 AM UTC-5, Travis Leleu wrote:
>
> Python can't find the file whose path is stored in filename, used on line
> 13 of your spider. Read your Scrapy debug output to find out more
> information.
>
>     File "/Users/pawnbahn/tm/tm/spiders/tm_spider.py", line 13, in parse
>       with open(filename, 'wb') as f:
>     exceptions.IOError: [Errno 2] No such file or directory: ''
>
> On Wed, Apr 1, 2015 at 10:38 PM, Troy Perkins <[email protected]> wrote:
>
>> Greetings all:
>>
>> I'm new to Scrapy and managed to get everything installed and working.
>> However, my simple test project has proven not so simple, at least for me.
>>
>> I simply want to request the home page of t 1 c k e t m a s t e r  d o t  c o m,
>> click the red Just Announced tab down the middle of the page, and send the
>> list of results out to an email address once a day via cron. I want to be
>> able to keep up with the announcements because their mailing lists simply
>> don't send them soon enough.
>>
>> Here is my starting spider, which I've tested with other sites, and it
>> works fine there. I believe the error is due to it being a
>> JavaScript-rendered site. I've used Firebug to look for clues, but I'm too
>> new at this (and at JavaScript) to understand what I'm seeing. I'm hoping
>> someone would be willing to point this noob in a direction. I've also
>> tried removing middleware in the settings.py file, with the same results.
>>
>> I've purposely masked out the site address; though I don't mean any harm,
>> I'm not quite sure of their ToS as of yet. I plan to poll only once a day
>> anyway, for personal use.
>>
>> import scrapy
>>
>> from tm.items import TmItem
>>
>> class TmSpider(scrapy.Spider):
>>     name = "tm"
>>     allowed_domains = ["www.************.com"]
>>     start_urls = [
>>         "http://www.***********.com"
>>     ]
>>
>>     def parse(self, response):
>>         filename = response.url.split("/")[-2]
>>         with open(filename, 'wb') as f:
>>             f.write(response.body)
>>
>> scrapy crawl tm results in the following:
>>
>> :0: UserWarning: You do not have a working installation of the
>> service_identity module: 'No module named service_identity'. Please
>> install it from <https://pypi.python.org/pypi/service_identity> and make
>> sure all of its dependencies are satisfied. Without the service_identity
>> module and a recent enough pyOpenSSL to support it, Twisted can perform
>> only rudimentary TLS client hostname verification. Many valid
>> certificate/hostname mappings may be rejected.
>> 2015-04-02 00:30:12-0500 [scrapy] INFO: Scrapy 0.24.5 started (bot: tm)
>> 2015-04-02 00:30:12-0500 [scrapy] INFO: Optional features available: ssl, http11
>> 2015-04-02 00:30:12-0500 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tm.spiders', 'SPIDER_MODULES': ['tm.spiders'], 'BOT_NAME': 'tm'}
>> 2015-04-02 00:30:12-0500 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
>> 2015-04-02 00:30:12-0500 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
>> 2015-04-02 00:30:12-0500 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
>> 2015-04-02 00:30:12-0500 [scrapy] INFO: Enabled item pipelines:
>> 2015-04-02 00:30:12-0500 [tm] INFO: Spider opened
>> 2015-04-02 00:30:12-0500 [tm] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
>> 2015-04-02 00:30:12-0500 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
>> 2015-04-02 00:30:12-0500 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
>> 2015-04-02 00:30:13-0500 [tm] DEBUG: Crawled (200) <GET http://www.****************.com> (referer: None)
>> 2015-04-02 00:30:13-0500 [tm] ERROR: Spider error processing <GET http://www.****************.com>
>> 	Traceback (most recent call last):
>> 	  File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
>> 	    self.runUntilCurrent()
>> 	  File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
>> 	    call.func(*call.args, **call.kw)
>> 	  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 383, in callback
>> 	    self._startRunCallbacks(result)
>> 	  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 491, in _startRunCallbacks
>> 	    self._runCallbacks()
>> 	--- <exception caught here> ---
>> 	  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 578, in _runCallbacks
>> 	    current.result = callback(current.result, *args, **kw)
>> 	  File "/Users/pawnbahn/tm/tm/spiders/tm_spider.py", line 13, in parse
>> 	    with open(filename, 'wb') as f:
>> 	exceptions.IOError: [Errno 2] No such file or directory: ''
>> 2015-04-02 00:30:13-0500 [tm] INFO: Closing spider (finished)
>> 2015-04-02 00:30:13-0500 [tm] INFO: Dumping Scrapy stats:
>> 	{'downloader/request_bytes': 219,
>> 	 'downloader/request_count': 1,
>> 	 'downloader/request_method_count/GET': 1,
>> 	 'downloader/response_bytes': 73266,
>> 	 'downloader/response_count': 1,
>> 	 'downloader/response_status_count/200': 1,
>> 	 'finish_reason': 'finished',
>> 	 'finish_time': datetime.datetime(2015, 4, 2, 5, 30, 13, 3001),
>> 	 'log_count/DEBUG': 3,
>> 	 'log_count/ERROR': 1,
>> 	 'log_count/INFO': 7,
>> 	 'response_received_count': 1,
>> 	 'scheduler/dequeued': 1,
>> 	 'scheduler/dequeued/memory': 1,
>> 	 'scheduler/enqueued': 1,
>> 	 'scheduler/enqueued/memory': 1,
>> 	 'spider_exceptions/IOError': 1,
>> 	 'start_time': datetime.datetime(2015, 4, 2, 5, 30, 12, 344868)}
>> 2015-04-02 00:30:13-0500 [tm] INFO: Spider closed (finished)
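For what it's worth, the empty filename looks like it comes from the URL split itself rather than from the site being JavaScript-rendered: splitting a bare domain URL (no path after the host) on "/" leaves an empty string between "http:" and the hostname, so [-2] picks up '' and open('', 'wb') raises exactly the IOError shown above. A quick check in a plain Python 2 shell, using a stand-in URL since the real one is masked in this thread, shows the difference from the dmoz tutorial URL:

    # Stand-in for the masked start URL (no path after the domain).
    url_bare = "http://www.example.com"
    # The dmoz tutorial URL, which does have a path.
    url_dmoz = "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"

    print url_bare.split("/")      # ['http:', '', 'www.example.com']
    print url_bare.split("/")[-2]  # '' -> the "filename" the spider tries to open
    print url_dmoz.split("/")[-2]  # 'Resources' -> why the dmoz example works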
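And a minimal sketch of how parse() could avoid the empty filename, assuming the goal for now is still just to dump the response body to disk. The domain is a placeholder for the masked site, and this only addresses the IOError, not extracting the JavaScript-rendered "Just Announced" content:

    import scrapy
    from urlparse import urlparse  # Python 2, matching the python2.7 paths in the traceback


    class TmSpider(scrapy.Spider):
        name = "tm"
        allowed_domains = ["www.example.com"]      # placeholder for the masked domain
        start_urls = ["http://www.example.com"]    # placeholder for the masked domain

        def parse(self, response):
            # Build the filename from the hostname, with a fixed fallback,
            # so a bare domain URL can never produce an empty string.
            filename = urlparse(response.url).netloc or "index.html"
            with open(filename, 'wb') as f:
                f.write(response.body)
            self.log("Saved %s" % filename)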
