On Tuesday, November 4, 2014 4:10:59 PM UTC+1, Kiuhnm wrote:
> On Tuesday, November 4, 2014 4:00:51 PM UTC+1, Chris Angelico wrote:
> > On Wed, Nov 5, 2014 at 1:53 AM, Kiuhnm <gandal...@mail.com> wrote:
> > > I wish to automate the downloading from a particular site which has some
> > > ads and which requires clicking on a lot of buttons before the download
> > > starts.
> > >
> > > What library should I use to handle HTTP?
> > > Also, I need to support big files (> 1 GB) so the library should hand the
> > > data to me chunk by chunk.
> >
> > You may be violating the site's terms of service, so be aware of what
> > you're doing.
> >
> > This could be a really simple job (just figure out what the last HTTP
> > query is, and replicate that), or it could be insanely complicated
> > (crypto, JavaScript, and/or timestamped URLs could easily be
> > involved). To start off, I would recommend not writing a single line
> > of Python code, but just pulling up Mozilla Firefox with Firebug, or
> > Google Chrome with in-built inspection tools, or some equivalent, and
> > watching the exact queries that go through. Once you figure out what
> > queries are happening, you can figure out how to do them in Python.
> >
> > ChrisA
>
> It'll be tricky. I'm sure of that, but if the browser can do it, so can I :)
> Fortunately, there are no captchas.
There are no captchas, but the site is behind Cloudflare (DDoS protection). Anyway, I now know what to do. To deal with Cloudflare's JavaScript challenge I'm going to use jsdb, a neat little JavaScript interpreter.

By the way, I'm using requests instead of urllib, but I still need to figure out how to download big files and write them to disk.

-- 
https://mail.python.org/mailman/listinfo/python-list
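For the big-file part, here is a minimal sketch with requests. It assumes the earlier steps (the Cloudflare challenge, the button clicks) were replayed through the same `requests.Session`, so any cookies they set carry over to the final download. The names `download` and `write_chunks`, the 1 MiB chunk size, the 30-second timeout and the `.part` suffix are my own hypothetical choices, not anything the site or requests requires. `stream=True` keeps requests from reading the whole body into memory, and `iter_content()` then hands it over piece by piece.

```python
import os

# 1 MiB per read; an arbitrary choice, tune to taste.
CHUNK_SIZE = 1024 * 1024


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to `path` and return the byte count.

    Data goes to `path + ".part"` first and is renamed only after every chunk
    has been written, so an interrupted download never looks complete.
    """
    tmp_path = os.fspath(path) + ".part"
    written = 0
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    os.replace(tmp_path, path)  # rename over the target (atomic on one filesystem)
    return written


def download(url, path, session=None, chunk_size=CHUNK_SIZE, timeout=30):
    """Stream `url` to `path` without holding the whole body in memory."""
    import requests  # third-party; imported here so write_chunks() needs only the stdlib

    own_session = session is None
    if own_session:
        session = requests.Session()
    try:
        # stream=True defers downloading the body until we iterate over it.
        with session.get(url, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()  # don't save an HTML error page as the file
            return write_chunks(resp.iter_content(chunk_size=chunk_size), path)
    finally:
        if own_session:
            session.close()
```

Usage would be something like `download(url, "big.iso", session=s)`, where `url` is whatever final query shows up in the browser's network inspector and `s` is the session used for the earlier steps. `iter_content()` decodes a gzip/deflate `Content-Encoding`; copying `resp.raw` with `shutil.copyfileobj` would also stream, but it writes the still-encoded bytes.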