Hi, here is the code and logs: https://gist.github.com/rolando/e3da0515aff240dde3e790196809b4d6
I had to increase the wait time to 10 as I was getting an empty result with 5.

Best,
Rolando

On Fri, Jun 3, 2016 at 11:46 PM, David Fishburn <[email protected]> wrote:

> I have made some headway.
>
> It seems things are not working since Scrapy / Splash is sending a POST
> request as seen in the Splash log:
>
> Scrapy output:
>
> 2016-06-03 19:43:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
> 2016-06-03 19:43:37 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
> 2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET https://sapui5.hana.ondemand.com/robots.txt> (referer: None)
> 2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/> (referer: None)
> 2016-06-03 19:43:37 [scrapy] DEBUG: Crawled (404) <GET http://localhost:8050/robots.txt> (referer: None)
> 2016-06-03 19:43:42 [scrapy] DEBUG: Crawled (200) <GET https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html via http://localhost:8050/render.html> (referer: None)
>
> Splash Window
>
> 2016-06-04 02:43:42.574895 [pool] [140619310439728] SLOT 10 done with <splash.qtrender.HtmlRender object at 0x7fe4341c00b8>
> 2016-06-04 02:43:42.576237 [events] {"active": 0, "path": "/render.html", "rendertime": 5.003755807876587, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "POST", "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "timestamp": 1465008222, "load": [0.09, 0.05, 0.05], "status_code": 200, "fds": 19, "_id": 140619310439728, "args": {"height": 768, "headers": {"Accept-Encoding": "gzip,deflate", "Referer": "https://sapui5.hana.ondemand.com/", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en"}, "uid": 140619310439728, "png": 1, "iframes": 1, "wait": 5.0, "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "http_method": "GET", "timeout": 10, "script": 1, "width": 1024, "html": 1, "console": 1}}
> 2016-06-04 02:43:42.576691 [-] "172.17.0.1" - - [04/Jun/2016:02:43:41 +0000] "POST /render.html HTTP/1.1" 200 1830 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
> 2016-06-04 02:43:42.577109 [pool] SLOT 10 is available
>
> When I use this curl request:
>
> curl 'http://localhost:8050/render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5'
>
> When I use curl, it uses a GET request, and the data is rendered
> appropriately.
>
> 2016-06-04 02:45:06.550405 [pool] [140619313333752] SLOT 11 done with <splash.qtrender.HtmlRender object at 0x7fe47c038390>
> 2016-06-04 02:45:06.551410 [events] {"active": 0, "path": "/render.html", "rendertime": 0.7969868183135986, "maxrss": 94368, "client_ip": "172.17.0.1", "qsize": 0, "method": "GET", "user-agent": "curl/7.47.0", "timestamp": 1465008306, "load": [0.23, 0.11, 0.07], "status_code": 200, "fds": 19, "_id": 140619313333752, "args": {"height": "768", "console": "1", "iframe": "1", "uid": 140619313333752, "png": "1", "width": "1024", "wait": "0.5", "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html", "timeout": "10", "script": "1", "html": "1"}}
> 2016-06-04 02:45:06.552238 [-] "172.17.0.1" - - [04/Jun/2016:02:45:05 +0000] "GET /render.html?url=https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html&iframe=1&html=1&png=1&width=1024&height=768&script=1&console=1&timeout=10&wait=0.5 HTTP/1.1" 200 5562 "-" "curl/7.47.0"
> 2016-06-04 02:45:06.552681 [pool] SLOT 11 is available
>
> No matter what I try in my spider, it always sends a POST request;
> Here is my latest code:
>
>     def parse(self, response):
>         #url = 'https://sapui5.hana.ondemand.com/sdk/#docs/api/symbols/sap.html'
>         url = 'https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html'
>         yield SplashRequest(url, self.parse_page,
>             args={
>                 'http_method': 'GET',
>                 'timeout': 10,
>                 'wait': 5.,
>                 'iframes': 1,
>                 'html': 1,
>                 'png': 1,
>                 'script': 1,
>                 'console': 1,
>                 'width': 1024,
>                 'height': 768,
>             },
>             endpoint='render.html')
>
> Whether I use render.json or render.html, the result is the same: a POST
> request is sent.
>
> Any idea how to change that?
>
> David
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
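
A note on the POST question, as far as I understand the Splash HTTP API: the "method": "POST" in the Splash log is only how the client talks to Splash. Splash accepts its arguments either in a GET query string (what curl did) or as a JSON body sent by POST (what the spider did), and the separate http_method argument, already "GET" in the logged args, is what Splash uses for the page itself. So the two forms should describe the same render job. Below is a rough, stdlib-only sketch that builds both forms for comparison. The helper names are made up for illustration, the wait of 10 follows the reply above, and the timeout of 30 is my own assumption (chosen only to be larger than the wait).

```python
import json
from urllib.parse import urlencode

# Splash address taken from the logs in this thread.
SPLASH_RENDER_HTML = "http://localhost:8050/render.html"


def splash_get_url(args):
    """Hypothetical helper: GET form, arguments go into the query string."""
    return SPLASH_RENDER_HTML + "?" + urlencode(args)


def splash_post_request(args):
    """Hypothetical helper: POST form, the same arguments as a JSON body."""
    headers = {"Content-Type": "application/json"}
    body = json.dumps(args, sort_keys=True)
    return SPLASH_RENDER_HTML, headers, body


# Arguments for the render job. "http_method" describes the request Splash
# makes to the target page, not the request made to Splash itself.
args = {
    "url": "https://sapui5.hana.ondemand.com/sdk/docs/api/symbols/sap.html",
    "http_method": "GET",
    "wait": 10,      # value reported to work in the reply above
    "timeout": 30,   # assumption: any value comfortably above "wait"
}

get_url = splash_get_url(args)
endpoint, headers, body = splash_post_request(args)

print(get_url)
print(endpoint, headers, body)
```

If both forms carry the same arguments, the difference in the rendered output is more likely down to the wait value than to GET versus POST, which would match the empty result seen with a wait of 5.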
