Maybe it's too late, but I think this question still deserves an answer.

To understand what's happening, you need to know one thing:
for HTTP 1.0 requests, Scrapy uses the twisted.web.http.HTTPClient class
(http://twistedmatrix.com/documents/8.1.0/api/twisted.web.http.HTTPClient.html),
but for HTTP 1.1 it uses a higher-level client, twisted.web.client.Agent
(http://twistedmatrix.com/documents/13.1.0/api/twisted.web.client.Agent.html).
That's why the HTTP 1.1 handler has no HTTPClientFactory to override.

So, to extend the default behaviour, you need to override the 'http' and
'https' handlers in the DOWNLOAD_HANDLERS setting.
Like this:
DOWNLOAD_HANDLERS = {
    'http': 'myproject.downloadhandlers.http11.MyHTTP11DownloadHandler',
    'https': 'myproject.downloadhandlers.http11.MyHTTP11DownloadHandler',
}

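And your http11.py will look like this:

from scrapy.core.downloader.handlers.http11 import (
    HTTP11DownloadHandler,
    ScrapyAgent,
)

# Example limit in bytes; choose whatever value suits your crawl.
MAX_RESPONSE_SIZE = 1024 * 1024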

class MyHTTP11DownloadHandler(HTTP11DownloadHandler):

    def download_request(self, request, spider):
        """Return a deferred for the HTTP download"""

        # Same as the parent implementation, but uses our own agent class.
        agent = MyScrapyAgent(contextFactory=self._contextFactory,
                              pool=self._pool)
        return agent.download_request(request)


class MyScrapyAgent(ScrapyAgent):

    def _cb_bodyready(self, txresponse, request):
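        """
        Skip downloading the body if the Content-Length header
        exceeds MAX_RESPONSE_SIZE.
        """

        # getRawHeaders returns the list of values for the header, or the
        # default ([0]) when the server sent no Content-Length.
        content_length = int(
            txresponse.headers.getRawHeaders("content-length", [0])[0])

        if content_length > MAX_RESPONSE_SIZE:
            # Don't read the body: return the response with an empty body.
            return txresponse, '', None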

        return super(MyScrapyAgent, self)._cb_bodyready(txresponse, request)
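
The int(...) call above raises ValueError if a server sends a malformed
Content-Length value. A slightly more defensive check could be moved into a
plain function. This is a minimal sketch: content_length_exceeds is a
hypothetical helper name, and it assumes it receives whatever
getRawHeaders("content-length") returns, i.e. a list of values or None when
the header is absent.

```python
def content_length_exceeds(raw_values, max_size):
    """Return True if the first Content-Length value is larger than max_size.

    raw_values is assumed to be the result of
    Headers.getRawHeaders("content-length"): a list of header values,
    or None when the header is missing. A missing or malformed value is
    treated as "unknown" and therefore as not exceeding the limit.
    """
    if not raw_values:
        return False
    value = raw_values[0]
    if isinstance(value, bytes):
        # Header values may arrive as bytes; decode them before parsing.
        value = value.decode("ascii", "replace")
    try:
        length = int(value)
    except (TypeError, ValueError):
        return False
    return length > max_size
```

Inside _cb_bodyready it could replace the int(...) line, for example
content_length_exceeds(txresponse.headers.getRawHeaders("content-length"),
MAX_RESPONSE_SIZE). A response with a missing or unparsable header then just
takes the normal download path instead of raising.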

That's all!
If you need to extend this in a more complex way, look at
scrapy/core/downloader/handlers/http11.py and read the Twisted
documentation linked above.
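
Note that the Content-Length check only helps when the server actually sends
that header. A chunked response, for example, has none, so it falls through
to the normal download. One way to cover that case is to count bytes as they
arrive: Twisted hands a response body to an IProtocol via
Response.deliverBody, calling dataReceived for each chunk and connectionLost
at the end. The class below is a hypothetical, Twisted-free sketch of such a
consumer (SizeLimitedBody is an illustrative name, not a Scrapy or Twisted
class); in real code you would also drop the connection when the limit is
hit instead of only setting a flag.

```python
class SizeLimitedBody:
    """Collect body chunks, giving up once more than max_size bytes arrive.

    Hypothetical sketch: the dataReceived/connectionLost names mirror
    Twisted's IProtocol, but this class does not depend on Twisted.
    """

    def __init__(self, max_size):
        self.max_size = max_size
        self.received = 0      # total bytes seen so far
        self.chunks = []       # collected chunks while under the limit
        self.aborted = False   # set once the limit has been exceeded
        self.finished = False  # set when the body stream ends

    def dataReceived(self, data):
        if self.aborted:
            return  # ignore anything that arrives after giving up
        self.received += len(data)
        if self.received > self.max_size:
            # A real Twisted protocol would abort the transport here.
            self.aborted = True
            self.chunks = []
            return
        self.chunks.append(data)

    def connectionLost(self, reason=None):
        self.finished = True

    def body(self):
        """Return the collected body, or b"" if the limit was exceeded."""
        return b"" if self.aborted else b"".join(self.chunks)
```

Keeping the counting logic in a plain class makes it easy to unit-test.
Wiring it into MyScrapyAgent would mean replacing the body reader that
ScrapyAgent creates internally, so check http11.py for your Scrapy version
before relying on any particular internal name.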

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.