Maybe it's too late, but I think that it needs to be answered.
To understand what's happening, you need to know one thing:
for HTTP 1.0 requests, Scrapy uses the twisted.web.http.HTTPClient class
(http://twistedmatrix.com/documents/8.1.0/api/twisted.web.http.HTTPClient.html),
but for HTTP 1.1 it uses a higher-level client, twisted.web.client.Agent
(http://twistedmatrix.com/documents/13.1.0/api/twisted.web.client.Agent.html).
That's why there is no HTTPClientFactory to override.
So, to extend the default functionality, you need to override the 'http' and
'https' handlers in the DOWNLOAD_HANDLERS setting, like this:
# settings.py
DOWNLOAD_HANDLERS = {
    'http': 'myproject.downloadhandlers.http11.MyHTTP11DownloadHandler',
    'https': 'myproject.downloadhandlers.http11.MyHTTP11DownloadHandler',
}
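Scrapy picks a download handler by the scheme of the request URL, and the entries you put in DOWNLOAD_HANDLERS override the built-in defaults key by key (assigning None to a scheme disables it). The snippet below is a simplified model of that lookup, not Scrapy's actual code: BASE_HANDLERS is a hypothetical default table, handler_class_for is a hypothetical helper, and stdlib classes stand in for real handler classes so the example runs without Scrapy installed.

```python
import importlib
from urllib.parse import urlparse

# Hypothetical defaults: stdlib classes stand in for Scrapy's real handlers
BASE_HANDLERS = {
    'http': 'http.client.HTTPConnection',
    'https': 'http.client.HTTPSConnection',
    'ftp': 'ftplib.FTP',
}


def load_object(path):
    """Import 'package.module.Name' and return the Name attribute."""
    module_path, _, name = path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, name)


def handler_class_for(url, user_handlers):
    """Return the handler class for the URL's scheme.

    user_handlers overrides BASE_HANDLERS key by key; a value of None
    disables that scheme.
    """
    handlers = dict(BASE_HANDLERS)
    handlers.update(user_handlers)
    scheme = urlparse(url).scheme
    path = handlers.get(scheme)
    if path is None:
        # Scrapy reports unsupported schemes with its own exception;
        # ValueError is used here only to keep the sketch self-contained
        raise ValueError('no download handler for scheme %r' % scheme)
    return load_object(path)
```

For example, handler_class_for('http://example.com/', {'http': 'urllib.request.HTTPHandler'}) returns urllib.request.HTTPHandler, while 'https' URLs still resolve to the default entry. This is why overriding only 'http' would leave HTTPS requests on the stock handler, and both keys are listed in the settings above.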
And your myproject/downloadhandlers/http11.py will look like this:
from scrapy.core.downloader.handlers.http11 import (
    HTTP11DownloadHandler,
    ScrapyAgent,
)

# Maximum allowed response size in bytes (example value, adjust as needed)
MAX_RESPONSE_SIZE = 10 * 1024 * 1024


class MyHTTP11DownloadHandler(HTTP11DownloadHandler):

    def download_request(self, request, spider):
        """Return a deferred for the HTTP download"""
        # Same as the parent method, but uses our agent instead of ScrapyAgent
        agent = MyScrapyAgent(contextFactory=self._contextFactory,
                              pool=self._pool)
        return agent.download_request(request)


class MyScrapyAgent(ScrapyAgent):

    def _cb_bodyready(self, txresponse, request):
        """
        Prevents body downloading if content-length
        is more than MAX_RESPONSE_SIZE
        """
        content_length = int(txresponse.headers.getRawHeaders(
            "content-length", [0])[0])
        if content_length > MAX_RESPONSE_SIZE:
            # Skip the body: (response, empty body, no flags)
            return txresponse, '', None
        return super(MyScrapyAgent, self)._cb_bodyready(txresponse, request)
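The size check inside _cb_bodyready can be pulled out into a plain function, which makes its edge cases easy to see and test. This is a sketch: exceeds_max_size is a hypothetical helper, and its argument mimics what Twisted's Headers.getRawHeaders() returns (a list of raw values, or None when the header is absent). Unlike the handler code, it treats a malformed Content-Length as "don't skip" instead of raising.

```python
def exceeds_max_size(raw_content_length, max_size):
    """Return True if the declared body size is larger than max_size.

    raw_content_length is a list of raw header values (str or bytes),
    or None when the Content-Length header is missing.
    """
    # A missing header falls back to 0, like getRawHeaders(name, [0])
    values = raw_content_length or [0]
    try:
        declared = int(values[0])
    except ValueError:
        # Malformed header: let the normal download proceed
        return False
    return declared > max_size
```

Note that a response sent without a Content-Length header (for example with chunked transfer encoding) falls back to 0, so this check never skips it; limiting such responses would require counting bytes while the body is being read.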
That's all!
If you need to extend this in a more complex way, just look at
scrapy/core/downloader/handlers/http11.py and read the Twisted
documentation linked above.
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.