I have made some headway.

It seems things are not working because Scrapy / Splash is sending a POST 
request to Splash, as seen in the Splash log further down.

Scrapy output:

2016-06-03 19:43:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-03 19:43:37 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET https://sapui5.hana.ondemand.com/robots.txt> (referer: None)
2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/> (referer: None)
2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET http://localhost:8050/robots.txt> (referer: None)
2016-06-03 19:43:42 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html via http://localhost:8050/render.html> (referer: None)


Splash log:

2016-06-04 02:43:42.574895 [pool] [140619310439728] SLOT 10 done with <splash.qtrender.HtmlRender object at 0x7fe4341c00b8>
2016-06-04 02:43:42.576237 [events] {"active": 0, "path": "/render.html", "rendertime": 5.003755807876587, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "POST", "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "timestamp": 1465008222, "load": [0.09, 0.05, 0.05], "status_code": 200, "fds": 19, "_id": 140619310439728, "args": {"height": 768, "headers": {"Accept-Encoding": "gzip,deflate", "Referer": "https://sapui5.hana.ondemand.com/", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en"}, "uid": 140619310439728, "png": 1, "iframes": 1, "wait": 5.0, "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "http_method": "GET", "timeout": 10, "script": 1, "width": 1024, "html": 1, "console": 1}}
2016-06-04 02:43:42.576691 [-] "172.17.0.1" - - [04/Jun/2016:02:43:41 +0000] "POST /render.html HTTP/1.1" 200 1830 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
2016-06-04 02:43:42.577109 [pool] SLOT 10 is available


When I use this curl request:

curl 'http://localhost:8050/render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5'

a GET request is sent, and the data is rendered appropriately. 
Splash log for the curl request:


2016-06-04 02:45:06.550405 [pool] [140619313333752] SLOT 11 done with <splash.qtrender.HtmlRender object at 0x7fe47c038390>
2016-06-04 02:45:06.551410 [events] {"active": 0, "path": "/render.html", "rendertime": 0.7969868183135986, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "GET", "user-agent": "curl/7.47.0", "timestamp": 1465008306, "load": [0.23, 0.11, 0.07], "status_code": 200, "fds": 19, "_id": 140619313333752, "args": {"height": "768", "console": "1", "iframe": "1", "uid": 140619313333752, "png": "1", "width": "1024", "wait": "0.5", "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "timeout": "10", "script": "1", "html": "1"}}
2016-06-04 02:45:06.552238 [-] "172.17.0.1" - - [04/Jun/2016:02:45:05 +0000] "GET /render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5 HTTP/1.1" 200 5562 "-" "curl/7.47.0"
2016-06-04 02:45:06.552681 [pool] SLOT 11 is available
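
To make the differences between the two Splash log entries easier to see, 
here is a minimal sketch that diffs their "args". The dictionaries are copied 
from the two [events] entries, leaving out url (identical in both), uid 
(unique per request) and headers (present only in the POST). The helper names 
diff_args and _norm are hypothetical, not part of Scrapy or Splash.

```python
# Args copied from the two Splash [events] log entries in this message.
# Left out: "url" (identical), "uid" (per request), "headers" (POST only).
post_args = {  # request sent by the spider (method: POST, JSON body)
    "height": 768, "png": 1, "iframes": 1, "wait": 5.0,
    "http_method": "GET", "timeout": 10, "script": 1,
    "width": 1024, "html": 1, "console": 1,
}
get_args = {  # request sent by curl (method: GET, values arrive as strings)
    "height": "768", "console": "1", "iframe": "1", "png": "1",
    "width": "1024", "wait": "0.5", "timeout": "10", "script": "1",
    "html": "1",
}


def _norm(value):
    """Normalize a value so that 768 and "768" compare as equal."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


def diff_args(a, b):
    """Return {key: (a_value, b_value)} for keys that differ or are missing."""
    out = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if _norm(va) != _norm(vb):
            out[key] = (va, vb)
    return out


for key, (spider_value, curl_value) in diff_args(post_args, get_args).items():
    print(key, "spider:", spider_value, "curl:", curl_value)
```

Apart from the HTTP method used to talk to Splash, the two requests also 
differ in wait (5.0 vs 0.5) and in the spelling iframes vs iframe, so the 
curl call and the spider call are not otherwise identical.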




No matter what I try in my spider, it always sends a POST request. 
Here is my latest code:

    # at the top of the spider module
    from scrapy_splash import SplashRequest

    # inside the spider class
    def parse(self, response):
        # url = 'https://sapui5.hana.ondemand.com/sdk/#docs/api/symbols/sap.html'
        url = 'https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html'
        yield SplashRequest(url, self.parse_page,
                            args={
                                'http_method': 'GET',  # intended to force GET
                                'timeout': 10,
                                'wait': 5.0,
                                'iframes': 1,
                                'html': 1,
                                'png': 1,
                                'script': 1,
                                'console': 1,
                                'width': 1024,
                                'height': 768,
                            },
                            endpoint='render.html')


Whether I use render.json or render.html, the result is the same: a POST 
request is sent.
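
For reference, a minimal sketch of building the same GET URL that the curl 
command used, with Python's standard library only. The helper name 
splash_get_url and the constant SPLASH_RENDER are hypothetical. Passing the 
resulting URL to a plain scrapy.Request (which issues a GET by default) 
instead of SplashRequest is an assumption and has not been tried here.

```python
from urllib.parse import urlencode

# Splash endpoint taken from the curl command in this message.
SPLASH_RENDER = "http://localhost:8050/render.html"


def splash_get_url(target_url, **params):
    """Build a render.html URL with all Splash arguments in the query string."""
    query = {"url": target_url}
    query.update(params)
    # safe=":/" leaves the target URL unescaped, as in the curl command;
    # characters such as "&" or "?" in values are still percent-encoded.
    return SPLASH_RENDER + "?" + urlencode(query, safe=":/")


get_url = splash_get_url(
    "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html",
    iframe=1, html=1, png=1, width=1024, height=768,
    script=1, console=1, timeout=10, wait=0.5,
)
print(get_url)
```

The keyword arguments here mirror the curl query string, not the args dict 
in the spider, so wait is 0.5 and the parameter is spelled iframe.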

Any idea how to change that?

David

