Hello.
I'm using Ubuntu 14.04 and Scrapy 0.24.5. I run a simple scrapy crawl:
# -*- coding: utf-8 -*-
import scrapy


class GoogleSpider(scrapy.Spider):
    name = "google"
    allowed_domains = ["https://google.com"]
    start_urls = (
        'https://google.com/',
    )

    def parse(self, response):
        pass
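(A side note, separate from the crash below: as far as I understand, Scrapy's allowed_domains expects bare hostnames like "google.com", not full URLs. If it helps, a quick way to get the host part out of a URL — shown in Python 3 syntax here; on Python 2 the same function lives in the urlparse module:)

```python
from urllib.parse import urlparse

# allowed_domains should hold bare hostnames; extract one from a URL.
url = "https://google.com/"
host = urlparse(url).hostname
print(host)  # → google.com
```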
and get the following error:
2015-03-03 14:03:40+0200 [scrapy] INFO: Scrapy 0.24.5 started (bot: myproject)
2015-03-03 14:03:40+0200 [scrapy] INFO: Optional features available: ssl, django
2015-03-03 14:03:40+0200 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'myproject.spiders', 'SPIDER_MODULES': ['myproject.spiders'], 'BOT_NAME': 'myproject'}
2015-03-03 14:03:40+0200 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-03-03 14:03:40+0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-03-03 14:03:40+0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-03-03 14:03:40+0200 [scrapy] INFO: Enabled item pipelines:
2015-03-03 14:03:40+0200 [google] INFO: Spider opened
2015-03-03 14:03:40+0200 [google] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-03-03 14:03:40+0200 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-03-03 14:03:40+0200 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-03-03 14:03:40+0200 [ScrapyHTTPPageGetter,client] Unhandled Error
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/python/log.py", line 84, in callWithLogger
        return callWithContext({"system": lp}, func, *args, **kw)
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/python/log.py", line 69, in callWithContext
        return context.call({ILogContext: newCtx}, func, *args, **kw)
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 59, in callWithContext
        return self.currentContext().callWithContext(ctx, func, *args, **kw)
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 37, in callWithContext
        return func(*args,**kw)
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/internet/selectreactor.py", line 146, in _doReadOrWrite
        why = getattr(selectable, method)()
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 177, in doWrite
        return Connection.doWrite(self)
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 428, in doWrite
        result = abstract.FileDescriptor.doWrite(self)
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/internet/abstract.py", line 115, in doWrite
        l = self.writeSomeData(self.dataBuffer)
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 181, in writeSomeData
        return Connection.writeSomeData(self, data)
      File "/usr/local/lib/python2.7/site-packages/Twisted-10.0.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 474, in writeSomeData
        return self.socket.send(buffer(data, 0, self.SEND_LIMIT))
      File "build/bdist.linux-x86_64/egg/OpenSSL/SSL.py", line 947, in send
    exceptions.TypeError: data must be a byte string
2015-03-03 14:03:40+0200 [google] ERROR: Error downloading <GET https://google.com/>: data must be a byte string
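If I read the last two frames right, Twisted passes a buffer slice of the outgoing data to pyOpenSSL's send(), and the OpenSSL/SSL.py in that egg accepts only plain byte strings. A minimal sketch of that type check (my own illustration, with Python 3's memoryview standing in for the Python 2 buffer object):

```python
def send(data):
    # Sketch of the old pyOpenSSL behaviour: anything other than an exact
    # byte string (e.g. the buffer slice Twisted passes in) is rejected.
    if not isinstance(data, bytes):
        raise TypeError("data must be a byte string")
    return len(data)

payload = b"GET / HTTP/1.1\r\n"
view = memoryview(payload)  # analogue of buffer(data, 0, SEND_LIMIT)

print(send(payload))        # plain bytes are accepted
try:
    send(view)              # a buffer-like view raises the error above
except TypeError as exc:
    print(exc)
```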
2015-03-03 14:03:40+0200 [google] INFO: Closing spider (finished)
2015-03-03 14:03:40+0200 [google] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/exceptions.TypeError': 1,
'downloader/request_bytes': 209,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2015, 3, 3, 12, 3, 40, 319498),
'log_count/DEBUG': 2,
'log_count/ERROR': 2,
'log_count/INFO': 7,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2015, 3, 3, 12, 3, 40, 255178)}
2015-03-03 14:03:40+0200 [google] INFO: Spider closed (finished)
I tried reinstalling OpenSSL and Scrapy, but it didn't help.
If someone could help me, I would be incredibly grateful.
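In case it helps: the traceback imports Twisted from a Twisted-10.0.0 .egg under /usr/local, and as far as I know reinstalling does not always remove stale .egg directories that shadow a newer copy. A quick standard-library check for leftover eggs on the import path (a diagnostic sketch, not output from the run above):

```python
import sys

# Leftover .egg entries earlier on sys.path can shadow a freshly
# reinstalled package such as Twisted.
eggs = [p for p in sys.path if p.endswith(".egg")]
for path in eggs:
    print(path)
```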