A bit more detail, in case you noticed that the response.headers representation seems to be missing some Set-Cookie values: a response can in fact carry multiple Set-Cookie headers, and the dict-like representation only shows one of them, so you need to use .getlist(headername) to get them all.
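If you want those values as a name-to-value mapping rather than raw header strings, here is a minimal sketch using only the standard library's http.cookies.SimpleCookie. It assumes the input is the list returned by response.headers.getlist("Set-Cookie"), which holds str in the Scrapy 0.24 / Python 2 session below but, as far as I know, bytes in Python 3 versions of Scrapy. The helper name cookies_from_set_cookie is hypothetical, not a Scrapy API.

```python
from http.cookies import SimpleCookie


def cookies_from_set_cookie(values):
    """Map cookie name -> value from a list of Set-Cookie header values.

    Assumption: `values` is response.headers.getlist("Set-Cookie"),
    a list of str (older Scrapy) or bytes (Python 3 Scrapy).
    """
    jar = {}
    for raw in values:
        # Header bytes are decoded as latin-1, which never fails.
        text = raw.decode("latin-1") if isinstance(raw, bytes) else raw
        parsed = SimpleCookie()
        parsed.load(text)  # path/domain/expires become Morsel attributes
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar
```

Only the cookie values are kept here; if you also need path, domain or expires, keep the Morsel objects (e.g. morsel["domain"]) instead of morsel.value.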
Same example with Amazon.com and COOKIES_DEBUG enabled:

$ scrapy shell "http://www.amazon.com" --set USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36" --set COOKIES_DEBUG=1
2014-07-02 15:56:32+0200 [scrapy] INFO: Scrapy 0.24.1 started (bot: scrapybot)
2014-07-02 15:56:33+0200 [default] DEBUG: Received cookies from: <200 http://www.amazon.com>
Set-Cookie: skin=noskin; path=/; domain=.amazon.com
Set-Cookie: x-wl-uid=1TzxsioAAJu0q37UxjzEKb4UNs0KLyIW8rCypLuAVZpMc8uplJgfLcrbX2StxWEpT59BoUyDBl5A=; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT
Set-Cookie: session-id-time=2082787201l; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT
Set-Cookie: session-id=182-4946683-0637966; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT
2014-07-02 15:56:33+0200 [default] DEBUG: Crawled (200) <GET http://www.amazon.com> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f1ce9d8fbd0>
[s]   item       {}
[s]   request    <GET http://www.amazon.com>
[s]   response   <200 http://www.amazon.com>
[s]   settings   <scrapy.settings.Settings object at 0x7f1cea430d50>
[s]   spider     <Spider 'default' at 0x7f1ce94e5e50>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

In [1]: response.headers
Out[1]:
{'Cache-Control': 'no-cache',
 'Content-Type': 'text/html; charset=ISO-8859-1',
 'Date': 'Wed, 02 Jul 2014 13:56:32 GMT',
 'Expires': '-1',
 'P3P': 'policyref="http://www.amazon.com/w3c/p3p.xml",CP="CAO DSP LAW CUR ADM IVAo IVDo CONo OTPo OUR DELi PUBi OTRi BUS PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA HEA PRE LOC GOV OTC "',
 'Pragma': 'no-cache',
 'Server': 'Server',
 'Set-Cookie': 'session-id=182-4946683-0637966; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT',
 'Vary': 'Accept-Encoding,User-Agent',
 'X-Amz-Id-1': '1ZAYQZK49NGDTCJPSH1C',
 'X-Amz-Id-2': 'puwShmgjkOwsTu9o4UP22PoJMqv9eeh0EOI52svdSdZ96b9VtkJbPKdwDHuojOay',
 'X-Frame-Options': 'SAMEORIGIN'}

In [2]: type(response.headers)
Out[2]: scrapy.http.headers.Headers

In [3]: response.headers.getlist("Set-Cookie")
Out[3]:
['skin=noskin; path=/; domain=.amazon.com',
 'x-wl-uid=1TzxsioAAJu0q37UxjzEKb4UNs0KLyIW8rCypLuAVZpMc8uplJgfLcrbX2StxWEpT59BoUyDBl5A=; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT',
 'session-id-time=2082787201l; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT',
 'session-id=182-4946683-0637966; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT']

And the cookies Scrapy sends:

In [4]: fetch('http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27')
2014-07-02 15:59:24+0200 [default] DEBUG: Sending cookies to: <GET http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27>
Cookie: session-id=182-4946683-0637966; session-id-time=2082787201l; x-wl-uid=1TzxsioAAJu0q37UxjzEKb4UNs0KLyIW8rCypLuAVZpMc8uplJgfLcrbX2StxWEpT59BoUyDBl5A=; skin=noskin
2014-07-02 15:59:25+0200 [default] DEBUG: Received cookies from: <200 http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27>
Set-Cookie: ubid-main=183-9706629-1828940; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT
Set-Cookie: session-id-time=2082787201l; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT
Set-Cookie: session-id=182-4946683-0637966; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT
2014-07-02 15:59:25+0200 [default] DEBUG: Crawled (200) <GET http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f1ce9d8fbd0>
[s]   item       {}
[s]   request    <GET http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27>
[s]   response   <200 http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27>
[s]   settings   <scrapy.settings.Settings object at 0x7f1cea430d50>
[s]   spider     <Spider 'default' at 0x7f1ce94e5e50>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

In [5]: response.request.headers.getlist("Cookie")
Out[5]: ['session-id=182-4946683-0637966; session-id-time=2082787201l; x-wl-uid=1TzxsioAAJu0q37UxjzEKb4UNs0KLyIW8rCypLuAVZpMc8uplJgfLcrbX2StxWEpT59BoUyDBl5A=; skin=noskin']

In [6]: response.headers.getlist("Set-Cookie")
Out[6]:
['ubid-main=183-9706629-1828940; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT',
 'session-id-time=2082787201l; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT',
 'session-id=182-4946683-0637966; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT']

On Wednesday, July 2, 2014 3:51:00 PM UTC+2, Paul Tremberth wrote:
>
> You can get "Set-Cookie" headers from the responses:
>
> $ scrapy shell "http://www.amazon.com" --set USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"
> 2014-07-02 14:53:12+0200 [scrapy] INFO: Scrapy 0.24.1 started (bot: scrapybot)
> 2014-07-02 14:53:12+0200 [scrapy] INFO: Optional features available: ssl, http11, boto
> 2014-07-02 14:53:12+0200 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36'}
> 2014-07-02 14:53:12+0200 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
> 2014-07-02 14:53:12+0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
> 2014-07-02 14:53:12+0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
> 2014-07-02 14:53:12+0200 [scrapy] INFO: Enabled item pipelines:
> 2014-07-02 14:53:12+0200 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
> 2014-07-02 14:53:12+0200 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
> 2014-07-02 14:53:12+0200 [default] INFO: Spider opened
> 2014-07-02 14:53:13+0200 [default] DEBUG: Crawled (200) <GET http://www.amazon.com> (referer: None)
> [s] Available Scrapy objects:
> [s]   crawler    <scrapy.crawler.Crawler object at 0x7f9ff6894bd0>
> [s]   item       {}
> [s]   request    <GET http://www.amazon.com>
> [s]   response   <200 http://www.amazon.com>
> [s]   settings   <scrapy.settings.Settings object at 0x7f9ff6f35d50>
> [s]   spider     <Spider 'default' at 0x7f9ff5feae50>
> [s] Useful shortcuts:
> [s]   shelp()           Shell help (print this help)
> [s]   fetch(req_or_url) Fetch request (or URL) and update local objects
> [s]   view(response)    View response in a browser
>
> In [1]: response.headers
> Out[1]:
> {'Cache-Control': 'no-cache',
>  'Content-Type': 'text/html; charset=ISO-8859-1',
>  'Date': 'Wed, 02 Jul 2014 12:53:13 GMT',
>  'Expires': '-1',
>  'P3P': 'policyref="http://www.amazon.com/w3c/p3p.xml",CP="CAO DSP LAW CUR ADM IVAo IVDo CONo OTPo OUR DELi PUBi OTRi BUS PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA HEA PRE LOC GOV OTC "',
>  'Pragma': 'no-cache',
>  'Server': 'Server',
>  'Set-Cookie': 'session-id=185-4345826-3198169; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT',
>  'Vary': 'Accept-Encoding,User-Agent',
>  'X-Amz-Id-1': '0HSR62FXE7WW8GGJ3003',
>  'X-Amz-Id-2': 'TX9doI/wHzZDQLi61C/nIydE0Sv7wjkhNs30li5KMVSEWLqRqVSvL03WYmkTnASu',
>  'X-Frame-Options': 'SAMEORIGIN'}
>
> And "Cookie" headers from response.request:
>
> In [2]: fetch('http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27')
> 2014-07-02 15:47:23+0200 [default] DEBUG: Crawled (200) <GET http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27> (referer: None)
> [s] Available Scrapy objects:
> [s]   crawler    <scrapy.crawler.Crawler object at 0x7f9ff6894bd0>
> [s]   item       {}
> [s]   request    <GET http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27>
> [s]   response   <200 http://www.amazon.com/gp/goldbox/ref=cs_top_nav_gb27>
> [s]   settings   <scrapy.settings.Settings object at 0x7f9ff6f35d50>
> [s]   spider     <Spider 'default' at 0x7f9ff5feae50>
> [s] Useful shortcuts:
> [s]   shelp()           Shell help (print this help)
> [s]   fetch(req_or_url) Fetch request (or URL) and update local objects
> [s]   view(response)    View response in a browser
>
> In [3]: response.headers
> Out[3]:
> {'Content-Type': 'text/html; charset=ISO-8859-1',
>  'Date': 'Wed, 02 Jul 2014 13:47:22 GMT',
>  'P3P': 'policyref="http://www.amazon.com/w3c/p3p.xml",CP="CAO DSP LAW CUR ADM IVAo IVDo CONo OTPo OUR DELi PUBi OTRi BUS PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA HEA PRE LOC GOV OTC "',
>  'Server': 'Server',
>  'Set-Cookie': 'session-id=185-4345826-3198169; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT',
>  'Vary': 'Accept-Encoding,User-Agent',
>  'X-Amz-Id-1': '0C0QXN1ZK555MP10HWB5',
>  'X-Amz-Id-2': 'CcXo3odRFUSFkmnICLBbdhYKKmiygNJ/b7c3s74p2mWaRnqldFyDmhrdB9PPVK6O',
>  'X-Frame-Options': 'SAMEORIGIN'}
>
> In [4]: response.request.headers
> Out[4]:
> {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
>  'Accept-Encoding': 'gzip,deflate',
>  'Accept-Language': 'en',
>  'Cookie': 'session-id=185-4345826-3198169; session-id-time=2082787201l; x-wl-uid=1/kDeNun+YQYYmW1esQBg6XsiW68oMT1FJXDavoxODm1tzaDnaKf1KOMU+Jmni6iWQngWZhCnOjI=; skin=noskin',
>  'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36'}
>
> On Wednesday, July 2, 2014 7:18:31 AM UTC+2, Reggie wrote:
>>
>> I want to read cookies when I parse a response, but I can't find cookies in either response.meta or response.headers. How can I read cookies?
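To answer the original question from inside a parse callback: the Cookie header Scrapy actually sent is on response.request.headers. Below is a minimal, standard-library-only sketch that turns it into a dict, assuming the input is response.request.headers.getlist("Cookie") (str on the Scrapy 0.24 / Python 2 setup shown here, bytes on Python 3 Scrapy). cookies_from_cookie_header is a hypothetical helper name, not part of Scrapy.

```python
def cookies_from_cookie_header(values):
    """Map cookie name -> value from the Cookie header(s) a request sent.

    Assumption: `values` is response.request.headers.getlist("Cookie"),
    a list of str (older Scrapy) or bytes (Python 3 Scrapy).
    """
    jar = {}
    for raw in values:
        text = raw.decode("latin-1") if isinstance(raw, bytes) else raw
        # A request Cookie header is just "name=value" pairs joined by ";".
        for pair in text.split(";"):
            # Split on the first "=" only, so base64 values ending in "=" survive.
            name, sep, value = pair.strip().partition("=")
            if sep:  # skip empty or malformed fragments
                jar[name] = value
    return jar
```

A plain split is used instead of SimpleCookie because a request Cookie header carries no attributes (path, domain, expires), only name/value pairs. In a callback you would call it as cookies_from_cookie_header(response.request.headers.getlist("Cookie")); keep in mind this is what was sent with the request, not what the response set.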
--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.
